The project of Friendly AI would benefit from being approached in a much more down-to-earth way. Discourse about the subject seems to be dominated by a set of possibilities which are given far too much credence:
- A single AI will take over the world
- A future galactic civilization depends on 21st-century Earth
- 10^n-year lifespans are at stake, for n greater than or equal to 3
- We might be living in a simulation
- Acausal deal-making
- Multiverse theory
Add up all of that, and you have a great recipe for enjoyable irrelevance. Negate every single one of those ideas, and you have an alternative set of working assumptions that are still consistent with the idea that Friendly AI matters, and which are much more suited to practical success:
- There will always be multiple centers of power
- What's at stake is, at most, the future centuries of a solar-system civilization
- No assumption that individual humans can survive even for hundreds of years, or that they would want to
- Assume that the visible world is the real world
- Assume that life and intelligence are about causal interaction
- Assume that the single visible world is the only world we affect or have reason to care about
The simplest reason to care about Friendly AI is that we are going to be coexisting with AI, and so we should want it to be something we can live with. I don't see that anything important would be lost by strongly foregrounding the second set of assumptions, and treating the first set of possibilities just as possibilities, rather than as the working hypothesis about reality.
[Earlier posts on related themes: practical FAI, FAI without "outsourcing".]
One of these is not like the others. Whether a single AI takes over the world, or whether there are always multiple centers of power, is something we choose. The rest are properties of the universe waiting to be discovered (or which we think we've discovered, as the case may be).
I think it's an especially important choice, and I think that the world ends up looking much better, by my values, with one dominant AI rather than power diffused among many. This is not to say that one AI would make all the decisions, but rather that something has to be powerful enough to veto all the really bad ideas, and notice when others are playing with fire; and I don't think very many entities can stably have that power at once. In human societies, spreading power out more reduces the damage if some of the humans turn out to be bad. Among AIs, the failure modes are different and I don't think that's the case any more; I think an insane AI does a more or less constant amount of damage (it destroys everything), so spreading out the power just increases the risk of one of them being bad.
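The argument in the last two sentences can be made concrete with a toy probability model (my own illustration, not from the post): if an insane AI destroys everything regardless of its share of power, then under an assumed independent per-AI failure probability p, the chance of total catastrophe only grows as power is spread across more AIs.

```python
def p_catastrophe(p: float, n: int) -> float:
    """Toy model: probability that at least one of n independent AIs
    goes insane, where any single insane AI destroys everything.
    The value of p and the independence assumption are illustrative."""
    return 1 - (1 - p) ** n

# Spreading power across more AIs monotonically raises the risk:
for n in (1, 2, 10, 100):
    print(n, p_catastrophe(0.01, n))
```

Contrast this with the human case described above, where a bad actor's damage is roughly proportional to its share of power, so dividing power caps the expected harm instead of compounding the risk.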
Whether recursive self-improvement is possible or not is a property of the universe, as are other details, like how much additional time and energy is necessary for each further increase in intelligence.
The answer to this question can make some outcomes more likely. For example, if recursive self-improvement is possible, and at some level you can get a huge increase in intelligence very quickly and relatively cheaply, one of the centers of power could easily overpower the others. Perhaps even in a situation where every super-agent reads and analyzes the source code of all the other super-agents all the time: the increased intelligence could allow one of them to make changes that seem harmless to the others.
On the other hand, the multiple-centers-of-power scenario is more likely if humankind spreads to many planets and there is some natural limit on how high an intelligence can become before it somehow collapses or starts requiring insane amounts of energy; in that case, no super-agent could be smart enough to conquer the rest of the world.