I agree that trying to map all human values is extremely complex, as articulated here [http://wiki.lesswrong.com/wiki/Complexity_of_value], but the problem, as I see it, is that we do not really have a choice - there has to be some way of measuring the initial AGI to see how it is handling these concepts.
I don't understand why we don't try to prototype a high-level ontology of core values for an AGI to adhere to - something that humans can discuss and argue about for many years before we actually build an AGI.
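As a rough illustration of what I mean by "prototype": here is a minimal sketch in Python. Everything in it (the `Value` class, the priority scheme, the example entries) is hypothetical - just one possible shape the debate could start from, not a proposal for the actual content.

```python
# A minimal, hypothetical sketch of a machine-readable "core value ontology":
# a small tree of values, each with a priority and a human-auditable
# description that people could argue over long before any AGI exists.
from dataclasses import dataclass, field

@dataclass
class Value:
    name: str
    description: str                # human-readable gloss, the part people debate
    priority: int                   # lower number = takes precedence in conflicts
    children: list["Value"] = field(default_factory=list)

def resolve(a: Value, b: Value) -> Value:
    """Resolve a conflict between two values by priority - a crude stand-in
    for whatever real adjudication procedure the debate would settle on."""
    return a if a.priority <= b.priority else b

# Hypothetical top-level entries, loosely mirroring the law analogy below:
# core prohibitions are few and unambiguous; fine-grained rules hang beneath.
ontology = Value("human_wellbeing", "Protect human life and flourishing", 0, [
    Value("no_killing", "Do not kill or injure humans", 1),
    Value("honesty", "Do not deceive humans about material facts", 2),
    Value("property", "Respect ownership; complex, lower-stakes rules", 3),
])

if __name__ == "__main__":
    winner = resolve(ontology.children[0], ontology.children[2])
    print(f"In a conflict, '{winner.name}' takes precedence")
```

The point is not that a priority list is the right mechanism - it almost certainly isn't - but that having any concrete artifact gives people something specific to attack and correct over those years of argument.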
Law is a useful example showing that human values cannot be absolutely quantified into a universal system. The law is constantly abused, misused, and corrected, so if a similar system were put in place for an AGI, it could quickly lead to UFAI.
One of the interesting things about the law is that for core concepts like murder, the rules are well defined and fairly unambiguous, whereas more trivial matters (in terms of risk to humans), like tax law and parking regulations, are where most of the complexity lies.
The idea is that if you have a CEV gadget, you can ask it moral questions and it will come up with answers that do, in fact, look good. You wouldn't have thought of them yourself (because you're not that clever and don't have a built-in population ethic, blah blah blah), but once you see them, you definitely prefer them.
That's... interesting.
Is there a writeup somewhere of why we should expect an unaltered me to endorse the proposals that would, if implemented, best instantiate our coherent extrapolated volition?
I'm having a very hard time seeing why I should expect that; it seems to assume a level of consistency between what I endorse, what I desire, and what I "actually want" (in the CEV sense) that just doesn't seem true of humans.