Value is Fragile

Value Theory

Followup to: The Fun Theory Sequence, Fake Fake Utility Functions, Joy in the Merely Good, The Hidden Complexity of Wishes, The Gift We Give To Tomorrow, No Universally Compelling Arguments, Anthropomorphic Optimism, Magical Categories, ...

If I had to pick a single statement that relies on more Overcoming Bias content I've written than any other, that statement would be:

Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.

"Well," says the one, "maybe according to your provincial human values, you wouldn't like it.  But I can easily imagine a galactic civilization full of agents who are nothing like you, yet find great value and interest in their own goals.  And that's fine by me.  I'm not so bigoted as you are.  Let the Future go its own way, without trying to bind it forever to the laughably primitive prejudices of a pack of four-limbed Squishy Things -"

My friend, I have no problem with the thought of a galactic civilization vastly unlike our own... full of strange beings who look nothing like me even in their own imaginations... pursuing pleasures and experiences I can't begin to empathize with... trading in a marketplace of unimaginable goods... allying to pursue incomprehensible objectives... people whose life-stories I could never understand.

That's what the Future looks like if things go right.

If the chain of inheritance from human (meta)morals is broken, the Future does not look like this.  It does not end up magically, delightfully incomprehensible.

With very high probability, it ends up looking dull.  Pointless.  Something whose loss you wouldn't mourn.

Seeing this as obvious, is what requires that immense amount of background explanation.

And I'm not going to iterate through all the points and winding pathways of argument here, because that would take us back through 75% of my Overcoming Bias posts.  Except to remark on how many different things must be known to constrain the final answer.

Consider the incredibly important human value of "boredom" - our desire not to do "the same thing" over and over and over again.  You can imagine a mind that contained almost the whole specification of human value, almost all the morals and metamorals, but left out just this one thing -

- and so it spent until the end of time, and until the farthest reaches of its light cone, replaying a single highly optimized experience, over and over and over again.

Or imagine a mind that contained almost the whole specification of which sort of feelings humans most enjoy - but not the idea that those feelings had important external referents.  So that the mind just went around feeling like it had made an important discovery, feeling it had found the perfect lover, feeling it had helped a friend, but not actually doing any of those things - having become its own experience machine.  And if the mind pursued those feelings and their referents, it would be a good future and true; but because this one dimension of value was left out, the future became something dull.  Boring and repetitive, because although this mind felt that it was encountering experiences of incredible novelty, this feeling was in no wise true.

Or the converse problem - an agent that contains all the aspects of human value, except the valuation of subjective experience.  So that the result is a nonsentient optimizer that goes around making genuine discoveries, but the discoveries are not savored and enjoyed, because there is no one there to do so.  This, I admit, I don't quite know to be possible.  Consciousness does still confuse me to some extent.  But a universe with no one to bear witness to it, might as well not be.

Value isn't just complicated, it's fragile.  There is more than one dimension of human value, where if just that one thing is lost, the Future becomes null.  A single blow and all value shatters.  Not every single blow will shatter all value - but more than one possible "single blow" will do so.

And then there are the long defenses of this proposition, which rely on 75% of my Overcoming Bias posts, so that it would be more than one day's work to summarize all of it.  Maybe some other week.  There are so many branches I've seen that discussion tree go down.

After all - a mind shouldn't just go around having the same experience over and over and over again.  Surely no superintelligence would be so grossly mistaken about the correct action?

Why would any supermind want something so inherently worthless as the feeling of discovery without any real discoveries?  Even if that were its utility function, wouldn't it just notice that its utility function was wrong, and rewrite it?  It's got free will, right?

Surely, at least boredom has to be a universal value.  It evolved in humans because it's valuable, right?  So any mind that doesn't share our dislike of repetition, will fail to thrive in the universe and be eliminated...

If you are familiar with the difference between instrumental values and terminal values, and familiar with the stupidity of natural selection, and you understand how this stupidity manifests in the difference between executing adaptations versus maximizing fitness, and you know this turned instrumental subgoals of reproduction into decontextualized unconditional emotions...

...and you're familiar with how the tradeoff between exploration and exploitation works in Artificial Intelligence...

...then you might be able to see that the human form of boredom that demands a steady trickle of novelty for its own sake, isn't a grand universal, but just a particular algorithm that evolution coughed out into us.  And you might be able to see how the vast majority of possible expected utility maximizers would engage in only so much efficient exploration, and spend most of their time exploiting the best alternative found so far, over and over and over.
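The exploration/exploitation point above can be made concrete with a toy sketch (the payoff numbers, action names, and exploration budget here are all hypothetical, chosen only for illustration): a generic expected utility maximizer samples each option a bounded number of times, then repeats the best alternative found, indefinitely.

```python
import random

random.seed(0)
true_payoffs = {"a": 0.3, "b": 0.9, "c": 0.5}   # unknown to the agent

EXPLORE_PULLS = 50      # a fixed, finite exploration budget per action
TOTAL_STEPS = 10000

# Phase 1: bounded, efficient exploration - sample each action a fixed
# number of times and estimate its expected payoff from noisy rewards.
estimates = {}
for action, mean in true_payoffs.items():
    samples = [mean + random.gauss(0, 0.1) for _ in range(EXPLORE_PULLS)]
    estimates[action] = sum(samples) / len(samples)

# Phase 2: exploitation - repeat the best alternative found, forever.
best = max(estimates, key=estimates.get)
exploit_steps = TOTAL_STEPS - EXPLORE_PULLS * len(true_payoffs)

print(best)                         # which single action gets repeated
print(exploit_steps / TOTAL_STEPS)  # fraction of time spent repeating it
```

With these numbers, 98.5% of the agent's lifetime is spent replaying one action: nothing in expected utility maximization itself demands novelty for its own sake.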

That's a lot of background knowledge, though.

And so on and so on and so on through 75% of my posts on Overcoming Bias, and many chains of fallacy and counter-explanation.  Some week I may try to write up the whole diagram.  But for now I'm going to assume that you've read the arguments, and just deliver the conclusion:

We can't relax our grip on the future - let go of the steering wheel - and still end up with anything of value.

And those who think we can -

- they're trying to be cosmopolitan.  I understand that.  I read those same science fiction books as a kid:  The provincial villains who enslave aliens for the crime of not looking just like humans.  The provincial villains who enslave helpless AIs in durance vile on the assumption that silicon can't be sentient.  And the cosmopolitan heroes who understand that minds don't have to be just like us to be embraced as valuable -

I read those books.  I once believed them.  But the beauty that jumps out of one box, is not jumping out of all boxes.  (This being the moral of the sequence on Lawful Creativity.)  If you leave behind all order, what is left is not the perfect answer, what is left is perfect noise.  Sometimes you have to abandon an old design rule to build a better mousetrap, but that's not the same as giving up all design rules and collecting wood shavings into a heap, with every pattern of wood as good as any other.  The old rule is always abandoned at the behest of some higher rule, some higher criterion of value that governs.

If you loose the grip of human morals and metamorals - the result is not mysterious and alien and beautiful by the standards of human value.  It is moral noise, a universe tiled with paperclips.  To change away from human morals in the direction of improvement rather than entropy, requires a criterion of improvement; and that criterion would be physically represented in our brains, and our brains alone.

Relax the grip of human value upon the universe, and it will end up seriously valueless.  Not, strange and alien and wonderful, shocking and terrifying and beautiful beyond all human imagination.  Just, tiled with paperclips.

It's only some humans, you see, who have this idea of embracing manifold varieties of mind - of wanting the Future to be something greater than the past - of being not bound to our past selves - of trying to change and move forward.

A paperclip maximizer just chooses whichever action leads to the greatest number of paperclips.

No free lunch.  You want a wonderful and mysterious universe?  That's your value.  You work to create that value.  Let that value exert its force through you who represents it, let it make decisions in you to shape the future.  And maybe you shall indeed obtain a wonderful and mysterious universe.

No free lunch.  Valuable things appear because a goal system that values them takes action to create them.  Paperclips don't materialize from nowhere for a paperclip maximizer.  And a wonderfully alien and mysterious Future will not materialize from nowhere for us humans, if our values that prefer it are physically obliterated - or even disturbed in the wrong dimension.  Then there is nothing left in the universe that works to make the universe valuable.

You do have values, even when you're trying to be "cosmopolitan", trying to display a properly virtuous appreciation of alien minds.  Your values are then faded further into the invisible background - they are less obviously human.  Your brain probably won't even generate an alternative so awful that it would wake you up, make you say "No!  Something went wrong!" even at your most cosmopolitan.  E.g. "a nonsentient optimizer absorbs all matter in its future light cone and tiles the universe with paperclips".  You'll just imagine strange alien worlds to appreciate.

Trying to be "cosmopolitan" - to be a citizen of the cosmos - just strips off a surface veneer of goals that seem obviously "human".

But if you wouldn't like the Future tiled over with paperclips, and you would prefer a civilization of...

...sentient beings...

...with enjoyable experiences...

...that aren't the same experience over and over again...

...and are bound to something besides just being a sequence of internal pleasurable feelings...

...learning, discovering, freely choosing...

...well, I've just been through the posts on Fun Theory that went into some of the hidden details on those short English words.

Values that you might praise as cosmopolitan or universal or fundamental or obvious common sense, are represented in your brain just as much as those values that you might dismiss as merely human.  Those values come of the long history of humanity, and the morally miraculous stupidity of evolution that created us.  (And once I finally came to that realization, I felt less ashamed of values that seemed 'provincial' - but that's another matter.)

These values do not emerge in all possible minds.  They will not appear from nowhere to rebuke and revoke the utility function of an expected paperclip maximizer.

Touch too hard in the wrong dimension, and the physical representation of those values will shatter - and not come back, for there will be nothing left to want to bring it back.

And the referent of those values - a worthwhile universe - would no longer have any physical reason to come into being.

Let go of the steering wheel, and the Future crashes.

Comments


Robin, I discussed this in The Gift We Give To Tomorrow as a "moral miracle" that of course isn't really a miracle at all. We're judging the winding path that evolution took to human value, and judging it as fortuitous using our human values. (See also, "Where Recursive Justification Hits Bottom", "The Ultimate Source", "Created Already In Motion", etcetera.)

Pearson, it's not that kind of chaining. More like trying to explain to someone why their randomly chosen lottery ticket won't win (big space, small target, poor aim) when their brain manufactures argument after argument after different argument for why they'll soon be rich.

The core problem is simple.  If the targeting information disappears, so does the good outcome.  Knowing enough to refute every fallacious remanufacturing of the value-information from nowhere, is the hard part.

What are the odds that every proof of God's existence is wrong, when there are so many proofs? Pretty high. A selective search for plausible-sounding excuses won't change reality itself. But knowing the specific refutations - being able to pinpoint the flaws in every supposed proof - that might take some study.

I think Eliezer is due for congratulation here. This series is nothing short of a mammoth intellectual achievement, integrating modern academic thought about ethics, evolutionary psychology and biases with the provocative questions of the transhumanist movement. I've learned a staggering amount from reading this OB series, especially about human values and my own biases and mental blank spots.

I hope we can all build on this. Really. There's a lot left to do, especially for transhumanists and those who hope for a significantly better future than the best available in today's world. For those who have more pedestrian ambitions for the future (i.e. most of the world), this series provides a stark warning as to how the well intentioned may destroy everything.

Bravo!

[crosspost from h+ goodness]

Probability of an evolved alien species:

(A) Possessing analogues of pleasure and pain: HIGH. Reinforcement learning is simpler than consequentialism for natural selection to stumble across.

(B) Having a human idiom of boredom that desires a steady trickle of novelty: MEDIUM. This has to do with acclimation and adjustment as a widespread neural idiom, and the way that we try to abstract that as a moral value. It's fragile but not impossible.

(C) Having a sense of humor: LOW.

Probability of an expected paperclip maximizer having analogous properties, if it originated as a self-improving code soup (rather than by natural selection), or if it was programmed over a competence threshold by foolish humans and then exploded:

(A) MEDIUM

(B) LOW

(C) LOW

I suspect it gets worse. Eliezer seems to lean heavily on the psychological unity of humankind, but there's a lot of room for variance within that human dot. My morality is a human morality, but that doesn't mean I'd agree with a weighted sum across all possible extrapolated human moralities. So even if you preserve human morals and metamorals, you could still end up with a future we'd find horrifying (albeit better than a paperclip galaxy). It might be said that that's only a Weirdtopia, that you're horrified at first, but then you see that it's actually for the best after all. But if "the utility function [really] isn't up for grabs," then I'll be horrified for as long as I damn well please.

I'll be horrified for as long as I damn well please.

Well, okay, but the Weirdtopia thesis under consideration makes the empirically falsifiable prediction that "as long as you damn well please" isn't actually a very long time. Also, I call scope neglect: your puny human brain can model some aspects of your local environment, which is a tiny fraction of this Earth, but you're simply not competent to judge the entire future, which is much larger.

I would like to point out that you're probably replying to your past self. This gives me significant amusement.

A utility function is like a program in a Turing-complete language. If the behaviour can be computed at all, it can be computed by a utility function.

Tim, I've seen you state this before, but it's simply wrong. A utility function is not like a Turing-complete language. It imposes rather strong constraints on possible behavior.

Consider a program which when given the choices (A,B) outputs A. If you reset it and give it choices (B,C) it outputs B. If you reset it again and give it choices (C,A) it outputs C. The behavior of this program cannot be reproduced by a utility function.

Here's another example: When given (A,B) a program outputs "indifferent". When given (equal chance of A or B, A, B) it outputs "equal chance of A or B". This is also not allowed by EU maximization.

@Jotaf, "Order > chaos, no?"

Imagine God shows up tomorrow. "Everyone, hey, yeah. So I've got this other creation and they're super moral. Man, moral freaks, let me tell you. Make Mennonites look Shintoist. And, sure, I like them better than you. It's why I'm never around, sorry. Thing is, their planet is about to get eaten by a supernova. So.. I'm giving them the moral green light to invade Earth. It's been real."

I'd be the first to sign up for the resistance. Who cares about moral superiority? Are we more moral than a paperclip maximizer? Are human ideals 'better'? Who cares? I don't want an OfficeMax universe, so I'll take up arms against a paperclip maximizer, whether its blessed by God or not.

This is a critical post. I disagree with where Eliezer has gone from here; but I'm with him up to and including this point. This post is a good starting point for a dialogue.

  • likely values for all intelligent beings and optimization processes (power, resources)

Agree.

  • likely values for creatures with roughly human-level brain power (boredom, knowledge)

Disagree. Maybe we don't mean the same thing by boredom?

  • likely values for all creatures under evolutionary competition (reproduction, survival, family/clan/tribe)

Mostly agree. Depends somewhat on definition of evolution. Some evolved organisms pursue only 1 or 2 of these but all pursue at least one.

  • likely values for creatures under evolutionary competition who cannot copy their minds (individual identity, fear of personal death)

Disagree. Genome equivalents which don't generate terminally valued individual identity in the minds they describe should outperform those that do.

  • likely values for creatures under evolutionary competition who cannot wirehead (pain, pleasure)

Disagree. Why not just direct expected utility? Pain and pleasure are easy to find but don't work nearly as well.

  • likely values for creatures with sexual reproduction (beauty, status, sex)

Define sexual. Most sexual creatures are too simple to value the first two. Most plausible posthumans aren't sexual in a traditional sense.

  • likely values for intelligent creatures with sexual reproduction (music, art, literature, humor)

Disagree.

  • likely values for intelligent creatures who cannot directly prove their beliefs (honesty, reputation, piety)

Agree assuming that they aren't singletons. Even then for sub-components.

  • values caused by idiosyncratic environmental characteristics (salt, sugar)

Agree.

  • values caused by random genetic/memetic drift and co-evolution (Mozart, Britney Spears, female breasts, devotion to specific religions)

Agree. Some caveats about Mozart.

We can sort the values evolution gave us into the following categories (not necessarily exhaustive). Note that only the first category of values is likely to be preserved without special effort, if Eliezer is right and our future is dominated by singleton FOOM scenarios. But many other values are likely to survive naturally in alternative futures.

  • likely values for all intelligent beings and optimization processes (power, resources)
  • likely values for creatures with roughly human-level brain power (boredom, knowledge)
  • likely values for all creatures under evolutionary competition (reproduction, survival, family/clan/tribe)
  • likely values for creatures under evolutionary competition who cannot copy their minds (individual identity, fear of personal death)
  • likely values for creatures under evolutionary competition who cannot wirehead (pain, pleasure)
  • likely values for creatures with sexual reproduction (beauty, status, sex)
  • likely values for intelligent creatures with sexual reproduction (music, art, literature, humor)
  • likely values for intelligent creatures who cannot directly prove their beliefs (honesty, reputation, piety)
  • values caused by idiosyncratic environmental characteristics (salt, sugar)
  • values caused by random genetic/memetic drift and co-evolution (Mozart, Britney Spears, female breasts, devotion to specific religions)

The above probably isn't controversial, rather the disagreement is mainly on the following:

  • the probabilities of various future scenarios
  • which values, if any, can be preserved using approaches such as FAI
  • which values, if any, we should try to preserve

I agree with Roko that Eliezer has made his case in an impressive fashion, but it seems that many of us are still not convinced on these three key points.

Take the last one. I agree with those who say that human values do not form a consistent and coherent whole. Another way of saying this is that human beings are not expected utility maximizers, not as individuals and certainly not as societies. Nor do most of us desire to become expected utility maximizers. Even amongst the readership of this blog, where one might logically expect to find the world's largest collection of EU-maximizer wannabes, few have expressed this desire. But there is no principled way to derive a utility function from something that is not an expected utility maximizer!

Is there any justification for trying to create an expected utility maximizer that will forever have power over everyone else, whose utility function is derived using a more or less arbitrary method from the incoherent values of those who happen to live in the present? That is, besides the argument that it is the only feasible alternative to a null future. Many of us are not convinced of this, neither the "only" nor the "feasible".

Wei_Dai2, it looks like you missed Eliezer's main point:

Value isn't just complicated, it's fragile. There is more than one dimension of human value, where if just that one thing is lost, the Future becomes null. A single blow and all value shatters. Not every single blow will shatter all value - but more than one possible "single blow" will do so.

It doesn't matter that "many" values survive, if Eliezer's "value is fragile" thesis is correct, because we could lose the whole future if we lose just a single critical value. Do we have such critical values? Maybe, maybe not, but you didn't address that issue.

I like the idea of replying to past selves and think it should be encouraged.

"Yeah, past me is terrible, but don't even get me started on future me, sheesh!"

Quite. I never expected LW to resemble classic scenes from Homestuck... except, you know, way more functional.

Carl:

I don't think that automatic fear, suspicion and hatred of outsiders is a necessary prerequisite to a special consideration for close friends, family, etc. Also, yes, outgroup hatred makes cooperation on large-scale Prisoner's Dilemmas even harder than it generally is for humans.

But finally, I want to point out that we are currently wired so that we can't get as motivated to face a huge problem if there's no villain to focus fear and hatred on. The "fighting" circuitry can spur us to superhuman efforts and successes, but it doesn't seem to trigger without an enemy we can characterize as morally evil.

If a disease of some sort threatened the survival of humanity, governments might put up a fight, but they'd never ask (and wouldn't receive) the level of mobilization and personal sacrifice that they got during World War II - although if they were crafty enough to say that terrorists caused it, they just might. Concern for loved ones isn't powerful enough without an idea that an evil enemy threatens them.

Wouldn't you prefer to have that concern for loved ones be a sufficient motivating force?

Ian C.: "Yes, a designed life form can have paper clip values, but I don't think we'll encounter any naturally occurring beings like this. So our provincial little values may not be so provincial after all, but common on many planets."  Almost all life forms (especially simpler ones) are sort of paperclip maximizers: they just make copies of themselves ad infinitum. If life could leave this planet and use materials more efficiently, it would consume everything. Good for us that evolution couldn't optimize them to such an extent.

Evolution (as an algorithm) doesn't work on the indestructible. Therefore all naturally-evolved beings must be fragile to some extent, and must have evolved to value protecting their fragility.

Yes, a designed life form can have paper clip values, but I don't think we'll encounter any naturally occurring beings like this. So our provincial little values may not be so provincial after all, but common on many planets.

I imagine a distant future with just a smattering of paper clip maximizers -- having risen in different galaxies with slightly different notions of what a paperclip is -- might actually be quite interesting. But even so, so what? Screw the paperclips, even if they turn out to be more elegant and interesting than us!

"Except to remark on how many different things must be known to constrain the final answer."

What would you estimate the probability of each thing being correct is?

Regarding this post and the complexity of value:

Taking a paperclip maximizer as a starting point, the machine can be divided up into two primary components: the value function, which dictates that more paperclips is a good thing, and the optimizer that increases the universe's score with respect to that value function. What we should aim for, in my opinion, is to become the value function to a really badass optimizer. If we build a machine that asks us how happy we are, and then does everything in its power to improve that rating (so long as it doesn't involve modifying our values or controlling our ability to report them), that is the only way we can build a machine that reliably encompasses all of our human values.

Any other route and we are only steering the future by proxy - via an approximation to our values that may be fatally flawed and make it impossible for us to regain control when things go wrong. Even if we could somehow perfectly capture all of our values in a single function, there is still the matter of how that value function is embedded via our perceptions, which may differ from the machine's, the fact that our values may continue to change over time and thereby invalidate that function, and the fact that we each have our own unique variation on those values to start with. So yes, we should definitely keep our hands on the steering wheel.

The analogy between the theory that humans behave like expected utility maximisers - and the theory that atoms behave like billiard balls could be criticised - but it generally seems quite appropriate to me.

In dealing with your example, I didn't "change the space of states or choices". All I did was specify a utility function. The input states and output states were exactly as you specified them to be. The agent could see what choices were available, and then it picked one of them - according to the maximum value of the utility function I specified.

The corresponding real world example is an agent that prefers Boston to Atlanta, Chicago to Boston, and Atlanta to Chicago. I simply showed how a utility maximiser could represent such preferences. Such an agent would drive in circles - but that is not necessarily irrational behaviour.
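That representational move can be sketched directly (the city names are from the comment; the utility numbers and helper function are hypothetical): widen the domain from bare cities to (current city, destination) pairs, and a maximizer over that richer domain does reproduce the driving-in-circles behavior.

```python
# Utility defined over (current state, choice) pairs rather than over
# bare outcomes - the move that lets a maximizer represent the cycle.
u = {
    ("Atlanta", "Boston"): 1, ("Atlanta", "Chicago"): 0,
    ("Boston", "Chicago"): 1, ("Boston", "Atlanta"): 0,
    ("Chicago", "Atlanta"): 1, ("Chicago", "Boston"): 0,
}

def next_city(here, options):
    # Pick whichever available destination maximizes state-dependent utility.
    return max(options, key=lambda there: u[(here, there)])

route = ["Atlanta"]
for _ in range(6):
    here = route[-1]
    options = [c for c in ("Atlanta", "Boston", "Chicago") if c != here]
    route.append(next_city(here, options))
print(route)   # cycles: Atlanta -> Boston -> Chicago -> Atlanta -> ...
```

Whether this counts as rescuing "utility maximization" or as quietly changing the subject (since the function is no longer over outcomes) is exactly the point in dispute between the two commenters.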

Of course much of the value of expected utility theory arises when you use short and simple utility functions - however, if you are prepared to use more complex utility functions, there really are very few limits on what behaviours can be represented.

The possibility of using complex utility functions does not in any way negate the value of the theory for providing a model of rational economic behaviour. In economics, the utility function is pretty fixed: maximise profit, with specified risk aversion and future discounting. That specifies an ideal which real economic agents approximate. Plugging in an arbitrary utility function is simply an illegal operation in that context.

@Jotaf: No, you misunderstood - guess I got double-transparent-deluded. I'm saying this:

  • Probability is subjectively objective
  • Probability is about something external and real (called truth)
  • Therefore you can take a belief and call it "true" or "false" without comparing it to another belief
  • If you don't match truth well enough (if your beliefs are too wrong), you die
  • So if you're still alive, you're not too stupid - you were born with a smart prior, so justified in having it
  • So I'm happy with probability being subjectively objective, and I don't want to change my beliefs about the lottery. If the paperclipper had stupid beliefs, it would be dead - but it doesn't, it has evil morals.

  • Morality is subjectively objective

  • Morality is about some abstract object, a computation that exists in Formalia but nowhere in the actual universe
  • Therefore, if you take a morality, you need another morality (possibly the same one) to assess it, rather than a nonmoral object
  • Even if there was some light in the sky you could test morality against, it wouldn't kill you for your morality being evil
  • So I don't feel on better moral ground than the paperclipper. It has human_evil morals, but I have paperclipper_evil morals - we are exactly equally horrified.

Jordan: "I imagine a distant future with just a smattering of paper clip maximizers -- having risen in different galaxies with slightly different notions of what a paperclip is -- might actually be quite interesting."

That's exactly how I imagine the distant future. And I very much like to point to the cyclic cellular automaton (java applet) as a visualization. Actually, I speculate that we live in a small part of the space-time continuum not yet eaten by a paper clip maximizer. Now you may ask: Why don't we see huge blobs of paper clip maximizers expanding on the night sky? My answer is that they are expanding at the speed of light in every direction.

Note: I abused the term paper clip maximizer somewhat. Originally I called these things Expanding Space Amoebae, but PCM is more OB.

@Eliezer: Can you expand on the "less ashamed of provincial values" part?

@Carl Shuman: I don't know about him, but for myself, HELL YES I DO. Family - they're just randomly selected by the birth lottery. Lovers - falling in love is some weird stuff that happens to you regardless of whether you want it, reaching into your brain to change your values: like, dude, ew - I want affection and tenderness and intimacy and most of the old interpersonal fun and much more new interaction, but romantic love can go right out of the window with me. Friends - I do value friendship; I'm confused; maybe I just value having friends, and it'd rock to be close friends with every existing mind; maybe I really value preferring some people to others; but I'm sure about this: I should not, and do not want to, worry more about a friend with the flu than about a stranger with cholera.

@Robin Hanson: HUH? You'd really expect natural selection to come up with minds who enjoy art, mourn dead strangers and prefer a flawed but sentient woman to a perfect catgirl on most planets?

This talk about "'right' means right" still makes me damn uneasy. I don't have more to show for it than "still feels a little forced" - when I visualize a humane mind (say, a human) and a paperclipper (a sentient, moral one) looking at each other in horror and knowing there is no way they could agree about whether to use atoms to feed babies or to make paperclips, I feel wrong. I think about the paperclipper in exactly the same way it thinks about me! Sure, that's also what happens when I talk to a creationist, but we're trying to approximate external truth; and if our priors were too stupid, our genetic line would be extinct (or at least that's what I think) - but morality doesn't work like probability, it's not trying to approximate anything external. So I don't feel so happier about the moral miracle that made us than about the one that makes the paperclipper.

Roko:

Not so fast. We like some of our evolved values at the expense of others. Ingroup-outgroup dynamics, the way we're most motivated only when we have someone to fear and hate: this too is an evolved value, and most of the people here would prefer to do away with it if we can.

The interesting part of moral progress is that the values etched into us by evolution don't really need to be consistent with each other, so as we become more reflective and our environment changes to force new situations upon us, we realize that they conflict with one another. The analysis of which values have been winning and which have been losing (in different times and places) is another fascinating subject...

RH: "I have read and considered all of Eliezer's posts, and still disagree with him on this his grand conclusion. Eliezer, do you think the universe was terribly unlikely and therefore terribly lucky to have coughed up human-like values, rather than some other values?"

Yes, it almost certainly was, because of the way we evolved. There are two distinct events here:

  1. A species evolves to intelligence with the particular values we have.

  2. Given that a species evolves to intelligence with some particular values, it decides that it likes those values.

1 is an extremely unlikely event. 2 is essentially a certainty.

One might call this "the ethical anthropic argument".

I have read and considered all of Eliezer's posts, and still disagree with him on this his grand conclusion. Eliezer, do you think the universe was terribly unlikely and therefore terribly lucky to have coughed up human-like values, rather than some other values? Or is it only in the stage after ours where such rare good values were unlikely to exist?

What about "near-human" morals, like, say, Kzinti: where the best of all possible worlds contains hierarchies, duels to the death, and subsentient females; along with exploration, technology, and other human-like activities. Though I find their morality repugnant for humans, I can see that they have the moral "right" to it. Is human morality, then, in some deep sense better than those?

It is better in the sense that it is ours. It is an inescapable quality of life, as an agent with values embedded in a much greater universe that might contain other agents with other values, that ultimately the only thing that makes one particular set of values matter more to that agent is that those are the values that belong to that agent.

We happen to have as one of our values, to respect others' values. But this particular value happens to be self-contradictory when taken to its natural conclusion. To take it to its conclusion would be to say that nothing matters in the end, not even what we ourselves care about. Consider the case of an alien being whose values include disrespecting others' values. Is the human value placed on respecting others' values in some deep sense better than this being's?

At some point you have to stop and say, "Sorry, my own values take precedence over yours when they are incompatible to this degree. I cannot respect this value of yours." And what gives you the justification to do this? Because it is your choice, your values. Ultimately, we must be chauvinists on some level if we are to have any values at all. Otherwise, what's wrong with a sociopath who murders for joy? How can we say that their values are wrong, except to say that their values contradict our own?

Is human morality, then, in some deep sense better than those?

No. The point of meta-ethics as outlined on LW is that there is no "deeper sense", no outside perspective from which to judge moral views against one another.

I'm not sure 'fragile' is the right word. Removing one component might be devastating, but in my opinion that reflects more on the importance of each piece than on the fragility of the actual system. The way I see it, it's something like a tower with 4 large beams for support: if one takes out a single piece, it would be worse than, say, removing a piece from a tower supported by 25 smaller beams.

But other than that, thank you very much for the informative article.

This is the key point on which I disagree with Eliezer. I don't disagree with what he literally says here, but with what he implies and what he concludes. The key context he isn't giving is that what he says only applies fully to a hard-takeoff AI scenario. Consider what he says about boredom:

Surely, at least boredom has to be a universal value. It evolved in humans because it's valuable, right? So any mind that doesn't share our dislike of repetition, will fail to thrive in the universe and be eliminated...

If you are familiar with the difference between instrumental values and terminal values, and familiar with the stupidity of natural selection, and you understand how this stupidity manifests in the difference between executing adaptations versus maximizing fitness, and you know this turned instrumental subgoals of reproduction into decontextualized unconditional emotions...

The things he lists here make an argument that, in the absence of competition, an existing value system can drift into one that doesn't mind boredom. But none of them address the argument he's supposedly addressing, that bored creatures will fail to compete and be eliminated. I infer that he dismisses that argument because the thing he imagines undergoing value drift - the thing that appears in the next line - is a hard-takeoff AI that doesn't have to compete.

The right way to begin asking whether minds can evolve not to be bored is to sample minds that have evolved, preferably independently, and find how many of them mind boredom.

Earth has no independently-evolved minds that I know of; all intelligent life is metazoans, and all metazoans are intelligent. This does show us, however, that intelligence is an evolutionary ratchet. Unlike other phenotypic traits, it doesn't disappear from any lines after evolving. That's remarkable, and relevant: Intelligence doesn't disappear. So we can strike off "universe of mindless plankton" from our list of moderately-probable fears.

Some metazoans live lives of extreme boredom. For instance, spiders, sea urchins, molluscs, most fish, maybe alligators. Others suffer physically from boredom: parrots, humans, dogs, cats. What distinguishes these two categories?

Animals that don't become bored are generally small, small-brained, have low metabolisms, short lifespans, a large number of offspring, and live in a very narrow range of environmental conditions. Animals that become bored are just the opposite. There are exceptions: fish and alligators have long lifespans, and alligators are large. But we can see how these traits conspire to produce an organism that can afford to be bored:

Small-brained, short lifespan, large number of offspring, narrow range of environmental conditions: These are all conditions under which it is better for the species to adapt to the environment by selection or by environment-directed development than by learning. Insects and nematodes can't learn much except via selection; their brains appear to have identical neuron number and wiring within a species. Alligator brains weigh about 8 grams.

Low metabolism merely correlates with low activity, which is how I identified most of these organisms, equating "not moving" with "not minding boredom." Small correlates with short lifespan and small-brained.

These things require learning: long lifespan, a changing environment, and minimizing reproduction time. If an organism will need to compete in a changing environment or across many different environments, as birds and mammals do, they'll need to learn. If a mother's knowledge is encoded in a form that she can't transmit to her children, they'll need to learn.

This business of having children is difficult to translate to a world of AIs. But the business of adaptation is clear. Given that active, curious, intelligent, environment-transforming minds already exist, and given continued competition, only minds that can adapt to rapid change will be able to remain powerful. So we can also strike "world dominated by beings who build paperclips" off our list of fears, provided those conditions are maintained. All we need do is ensure continued competition. Intelligence will not de-evolve, and intelligence will keep the environment changing rapidly enough that constant learning will be necessary, and so will be boredom.

The space of possible minds is large. The space of possible evolved minds is much smaller. The space of possible minds co-evolved with competition is much smaller than that. The space X of possible co-evolved minds capable of dominating the human race is much smaller than that.

Let Y = the set of value systems that might be produced from trying to enumerate human "final" values and put them in a utility function which will be evaluated by a single symbolic logic engine, incorporating all types of values above the level of the gene (body, mind, conscious mind, kin group, social group, for starters), with context-free set-membership functions that classify percepts into a finite set of atomic symbols prior to considering the context those symbols will be used in, and that will be designed to prevent final values from changing. I take that as roughly Eliezer's approach.

Let f(Z) be a function over sets of possible value systems, which tells how many of them are not repugnant to us.

My estimation is that f(X) / |X| >> f(Y) / |Y|. Therefore, the best approach is not to try to enumerate human final values and code them into an AI, but to study how co-evolution works, and what conditions give rise to the phenomena we value such as intelligence, consciousness, curiosity, and affection. Then try to direct the future to stay within those conditions.
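To make the shape of that estimate concrete, here is a minimal sketch; all of the numbers in it are hypothetical, chosen only to illustrate what f(X)/|X| >> f(Y)/|Y| asserts - namely that the claim is about the *fraction* of non-repugnant value systems in each design space, not about their absolute counts:

```python
def non_repugnant_fraction(f_of_set: int, set_size: int) -> float:
    """f(Z) / |Z|: the share of value systems in space Z that are not
    repugnant to us. f_of_set is f(Z), set_size is |Z|."""
    return f_of_set / set_size

# X: minds co-evolved with competition that could dominate the human race.
# A small space, assumed (hypothetically) to be enriched in recognizable values.
fx, x_size = 30, 100            # hypothetical: 30% acceptable

# Y: hand-enumerated utility functions locked into a symbolic logic engine.
# A vast space, assumed (hypothetically) to be almost entirely alien.
fy, y_size = 1, 1_000_000       # hypothetical: one in a million acceptable

# Even though |Y| dwarfs |X|, the fraction of acceptable outcomes in X
# is far larger - which is the sense in which f(X)/|X| >> f(Y)/|Y|.
assert non_repugnant_fraction(fx, x_size) > non_repugnant_fraction(fy, y_size)
```

Note that the inequality says nothing about how to *find* a point in either space; it only compares the odds of an acceptable outcome if one samples from each.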