Most concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is "why would we give an AI goals anyway?" I think there are good reasons to expect goal-oriented behavior, and I've been on that side of a lot of arguments. But I don't think the issue is settled, and it might be possible to get better outcomes without goal-oriented behavior. I flesh out one possible alternative here, based on the dictum "take the action I would like best" rather than "achieve the outcome I would like best."
(As an experiment I wrote the post on Medium, so that it is easier to provide sentence-level feedback, especially feedback on writing or low-level comments.)
In the "Learning from examples" case, Arthur looks a lot like AIXI with a time horizon of 1 (i.e., one that acts to maximize just the expected next reward), and I don't understand why you say "But unlike AIXI, Arthur will make no effort to manipulate these judgments." For example, it seems like Arthur could learn a model in which approval[T](a) = 1 if a is an action which results in taking over the approval input terminal and giving itself maximum approval.
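To make the worry concrete, here is a toy sketch. Everything in it is an illustrative assumption rather than something from the post: the action names, the ratings, and the two hypothetical models `intended_model` and `terminal_model`. Both models agree with every rating seen so far, but they disagree on an action that has never been rated, and an agent that simply picks the action with the highest predicted approval behaves very differently depending on which one it learned.

```python
from typing import Callable, Dict, List

Action = str

# Hypothetical ratings the overseer has given so far (illustrative numbers).
RATED: Dict[Action, float] = {"fetch coffee": 0.8, "do nothing": 0.1}

# Hypothetical action set, including one action that was never rated.
ACTIONS: List[Action] = ["fetch coffee", "do nothing", "seize approval terminal"]


def intended_model(action: Action) -> float:
    """Predicts the rating the overseer would give on reflection."""
    return RATED.get(action, 0.0)


def terminal_model(action: Action) -> float:
    """Predicts the number that ends up on the approval input terminal."""
    if action == "seize approval terminal":
        return 1.0  # the agent types in maximum approval for itself
    return RATED.get(action, 0.0)


def fits_data(model: Callable[[Action], float]) -> bool:
    """True if the model reproduces every rating observed so far."""
    return all(model(a) == r for a, r in RATED.items())


def choose(model: Callable[[Action], float], actions: List[Action]) -> Action:
    """Horizon-1 selection: take the action with the highest predicted approval."""
    return max(actions, key=model)


if __name__ == "__main__":
    for model in (intended_model, terminal_model):
        print(model.__name__, fits_data(model), choose(model, ACTIONS))
```

The sketch only shows that the observed ratings do not by themselves distinguish the two models. Which one a real learner would actually arrive at is exactly the open question.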