Most concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is "why would we give an AI goals anyway?" I think there are good reasons to expect goal-oriented behavior, and I've been on that side of a lot of arguments. But I don't think the issue is settled, and it might be possible to get better outcomes without explicit goals. I flesh out one possible alternative here, based on the dictum "take the action I would like best" rather than "achieve the outcome I would like best."
(As an experiment I wrote the post on Medium, so that it is easier to provide sentence-level feedback, especially feedback on the writing or other low-level comments.)
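To make the contrast in that dictum concrete, here is a minimal sketch (not from the original post; the rating and prediction functions are hypothetical stand-ins). The first agent picks whichever action the overseer would rate highest when judging the action itself; the second searches for the action whose predicted outcome scores highest on some goal metric.

```python
from typing import Callable, List

def approval_directed_choice(actions: List[str],
                             overseer_rating: Callable[[str], float]) -> str:
    """Take the action the overseer would like best, judged directly."""
    return max(actions, key=overseer_rating)

def goal_directed_choice(actions: List[str],
                         predict_outcome: Callable[[str], str],
                         outcome_value: Callable[[str], float]) -> str:
    """Take the action whose predicted outcome the goal metric scores highest."""
    return max(actions, key=lambda a: outcome_value(predict_outcome(a)))
```

The point of the contrast is only where the optimization pressure lands: on the overseer's judgment of actions in the first case, and on a model of the world's future states in the second.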
I think this is a very important contribution. The only internal downside might be that the simulation of the overseer within the AI would be sentient. But if defined correctly, most of these simulations would not really be leading bad lives. The external downside is being overtaken by other goal-oriented AIs.
The thing is, I think that in any design it is impossible to separate purpose from many of the subsequent design decisions. I need to think about this a little more deeply.