Secrets of the eliminati

Anyone who does not believe mental states are ontologically fundamental - i.e. anyone who denies the reality of something like a soul - has two choices about where to go next. They can try reducing mental states to smaller components, or they can stop talking about them entirely.

In a utility-maximizing AI, mental states can be reduced to smaller components. The AI will have goals, and those goals, upon closer examination, will be lines in a computer program.

But in the blue-minimizing robot, its "goal" isn't even a line in its program. There's nothing that looks remotely like a goal in its programming, and goals appear only when you make rough generalizations from its behavior in limited cases.
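
For concreteness, here is roughly what such a program might look like - a made-up sketch with hypothetical helpers like sweep_camera and fire_laser_at, not anyone's actual robot firmware:

    # Toy sketch of the blue-minimizing robot's control loop. Every function
    # here is a hypothetical stand-in for hardware; the point is that nothing
    # below encodes a goal like "minimize blue" - just a threshold test
    # followed by an action.

    BLUE_THRESHOLD = 200  # arbitrary illustrative cutoff

    def run(robot):
        while True:
            robot.move_forward()
            for region, (r, g, b) in robot.sweep_camera():  # (image region, average RGB)
                if b > BLUE_THRESHOLD:
                    robot.fire_laser_at(region)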

Philosophers are still very much arguing about whether this applies to humans; the two schools call themselves reductionists and eliminativists (with a third school of wishy-washy half-and-half people calling themselves revisionists). Reductionists want to reduce things like goals and preferences to the appropriate neurons in the brain; eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

I took a similar tack answering ksvanhorn's question in yesterday's post - how can you get a more accurate picture of what your true preferences are? I said:

I don't think there are true preferences. In one situation you have one tendency, in another situation you have another tendency, and "preference" is what it looks like when you try to categorize tendencies. But categorization is a passive and not an active process: if every day of the week I eat dinner at 6, I can generalize to say "I prefer to eat dinner at 6", but it would be non-explanatory to say that a preference toward dinner at 6 caused my behavior on each day. I think the best way to salvage preferences is to consider them as tendencies currently in reflective equilibrium.


A more practical example: when people discuss cryonics or anti-aging, the following argument usually comes up in one form or another: if you were in a burning building, you would try pretty hard to get out. Therefore, you must strongly dislike death and want to avoid it. But if you strongly dislike death and want to avoid it, you must be lying when you say you accept death as a natural part of life and think it's crass and selfish to try to cheat the Reaper. And therefore your reluctance to sign up for cryonics violates your own revealed preferences! You must just be trying to signal conformity or something.

The problem is that not signing up for cryonics is also a "revealed preference". "You wouldn't sign up for cryonics, which means you don't really fear death so much, so why bother running from a burning building?" is an equally good argument, although no one except maybe Marcus Aurelius would take it seriously.

Both these arguments assume that somewhere, deep down, there's a utility function with a single term for "death" in it, and all decisions just call upon this particular level of death or anti-death preference.

More explanatory of the way people actually behave is that there's no unified preference for or against death, but rather a set of behaviors. Being in a burning building activates fleeing behavior; contemplating death from old age does not activate cryonics-buying behavior. People guess at their opinions about death by analyzing these behaviors, usually with a bit of signalling thrown in. If they desire consistency - and most people do - maybe they'll change some of their other behaviors to conform to their hypothesized opinion.

One more example. I've previously brought up the case of a rationalist who knows there's no such thing as ghosts, but is still uncomfortable in a haunted house. So does he believe in ghosts or not? If you insist on there being a variable somewhere in his head marked $belief_in_ghosts = (0,1) then it's going to be pretty mysterious when that variable looks like zero when he's talking to the Skeptics Association, and one when he's running away from a creaky staircase at midnight.

But it's not at all mysterious that the thought "I don't believe in ghosts" gets reinforced because it makes him feel intelligent and modern, and staying around a creaky staircase at midnight gets punished because it makes him afraid.

Behaviorism was one of the first and most successful eliminativist theories. I've so far ignored the most modern and exciting eliminativist theory, connectionism, because it involves a lot of math and is very hard to process on an intuitive level. In the next post, I want to try to explain the very basics of connectionism, why it's so exciting, and why it helps justify discussion of behaviorist principles.

Comments


I wonder: if you had an agent that obviously did have goals (let's say, a player in a game, whose goal is to win, and who plays the optimal strategy), could you deduce those goals from behavior alone?

Let's say you're studying the game of Connect Four, but you have no idea what constitutes "winning" or "losing." You watch enough games that you can map out a game tree. In state X of the world, a player chooses option A over other possible options, and so on. From that game tree, can you deduce that the goal of the game was to get four pieces in a row?

I don't know the answer to this question. But it seems important. If it's possible to identify, given a set of behaviors, what goal they're aimed at, then we can test behaviors (human, animal, algorithmic) for hidden goals. If it's not possible, that's very important as well, because it means that even in a simple game, where we know by construction that the players are "rational" goal-maximizing agents, we can't detect what their goals are from their behavior.

That would mean that behaviors that "seem" goal-less, programs that have no line of code representing a goal, may in fact be behaving in a way that corresponds to maximizing the likelihood of some event; we just can't deduce what that "goal" is. In other words, it's not as simple as saying "That program doesn't have a line of code representing a goal." Its behavior may encode a goal indirectly. Detecting such goals seems like a problem we would really want to solve.

From that game tree, can you deduce that the goal of the game was to get four pieces in a row?

One method that would work for this example is to iterate over all possible goals in ascending complexity, and check which one would generate that game tree. How to apply this idea to humans is unclear. See here for a previous discussion.
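
A toy version of that method, shrunk from Connect Four down to a two-move counting game so the whole check fits in a short script - the game, the two candidate goals, and the "observed play" are all invented for illustration:

    # Toy sketch: watch an optimal player, then iterate over candidate goals
    # and keep the ones under which every observed move is optimal.
    # The game: players alternately add 1 or 2 to a running total; play ends
    # when the total reaches TARGET. Who "wins" is exactly what we are
    # trying to infer from the observed moves.

    TARGET = 7

    def moves(total):
        """Legal moves: add 1 or 2, never past TARGET."""
        return [m for m in (1, 2) if total + m <= TARGET]

    def value(total, to_move, goal):
        """Minimax value of the position from player 0's point of view.
        `goal(last_mover)` names the player the candidate goal declares the winner."""
        if total == TARGET:
            return 1 if goal(1 - to_move) == 0 else -1
        vals = [value(total + m, 1 - to_move, goal) for m in moves(total)]
        return max(vals) if to_move == 0 else min(vals)

    def optimal_moves(total, to_move, goal):
        """Moves that are minimax-optimal for the player to move under `goal`."""
        scored = [(value(total + m, 1 - to_move, goal), m) for m in moves(total)]
        best = max(s for s, _ in scored) if to_move == 0 else min(s for s, _ in scored)
        return {m for s, m in scored if s == best}

    # Candidate goals, in (very rough) ascending order of complexity.
    CANDIDATE_GOALS = {
        "last mover wins": lambda last_mover: last_mover,
        "last mover loses": lambda last_mover: 1 - last_mover,
    }

    def observe_games(true_goal):
        """Record every (position, chosen move) of player 0, who plays optimally
        under the hidden true goal, across all possible opponent responses."""
        observations = []
        def play(total, to_move):
            if total == TARGET:
                return
            if to_move == 0:
                m = min(optimal_moves(total, 0, true_goal))  # deterministic tie-break
                observations.append((total, m))
                play(total + m, 1)
            else:
                for m in moves(total):  # the opponent might do anything
                    play(total + m, 0)
        play(0, 0)
        return observations

    def consistent(goal, observations):
        """Is every observed choice optimal under this candidate goal?"""
        return all(m in optimal_moves(total, 0, goal) for total, m in observations)

    obs = observe_games(CANDIDATE_GOALS["last mover wins"])
    for name, goal in CANDIDATE_GOALS.items():
        print(name, "- consistent with observed play:", consistent(goal, obs))

Note that this inherits the parent comment's assumption that the observed player is, by construction, a rational goal-maximizing agent; whether the candidate goals could realistically be enumerated for anything as messy as a human is the open question.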

Human games (of the explicit recreational kind) tend to have stopping rules isomorphic with the game's victory conditions. We would typically refer to those victory conditions as the objective of the game, and the goal of the participants. Given a complete decision tree for a game, even a messy stochastic one like Canasta, it seems possible to deduce the conditions necessary for the game to end.

An algorithm that doesn't stop (such as the blue-minimising robot) can't have anything analogous to the victory condition of a game. In that sense, its goals can't be analysed in the same way as those of a Connect Four-playing agent.

What I've heard is that, for an intelligent entity, it's easier to predict what will happen from its goals than from the details of what it does.

For example, with the Connect Four game: if you manage to figure out that your opponent always seems to get four in a row, and you never do when you play against them, then you know their goal before you can figure out what their strategy is.

Although you might have just identified an instrumental subgoal.

I suspect that "has goals" is ultimately a model, rather than a fact. To the extent that an agent's behavior maximizes a particular function, that agent can be usefully modeled as an optimizer. To the extent that an agent's behavior exhibits signs of poor strategy, such as vulnerability to Dutch books, that agent may be better modeled as an algorithm-executor.

This suggests that "agentiness" is strongly tied to whether we are smart enough to win against it.

Reductionists want to reduce things like goals and preferences to the appropriate neurons in the brain; eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

Surely you mean that eliminativists take actions which, in their typical contexts, tend to result in proving that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

Surely you mean that there are just a bunch of atoms which, when interpreted as a human category, can be grouped together to form a being classifiable as "an eliminativist".

eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

Just because something only exists at high levels of abstraction doesn't mean it's not real or explanatory. Surely the important question is whether humans genuinely have preferences that explain their behaviour (or at least whether a preference system can occasionally explain their behaviour - even if their behaviour is truly explained by the interaction of numerous systems) rather than how these preferences are encoded.

The information in a JPEG file that indicates a particular pixel should be red cannot be analysed down to a single bit that doesn't do anything else, but that doesn't mean that there isn't a sense in which the red pixel genuinely exists. Preferences could exist and be encoded holographically in the brain. Whether you can find a specific neuron or not is completely irrelevant to their reality.

Just because something only exists at high levels of abstraction doesn't mean it's not real or explanatory.

I have often stated that, as a physicalist, the mere fact that something does not independently exist -- that is, it has no physically discrete existence -- does not mean it isn't real. The number three is real -- but does not exist. It cannot be touched, sensed, or measured; yet if there are three rocks there really are three rocks. I define "real" as "a pattern that proscriptively constrains that which exists". A human mind is real; but there is no single part of your physical body you can point to and say, "this is your mind". You are the pattern that your physical components conform to.

It seems very often that objections to reductionism are founded in a problem of scale: the inability to recognize that things which are real from one perspective remain real at that perspective even if we consider a different scale.

It would seem, to me, that "eliminativism" is essentially a redux of this quandary but in terms of patterns of thought rather than discrete material. It's still a case of missing the forest for the trees.

But if whenever I eat dinner at 6 I sleep better than when eating dinner at 8, can I not say that I prefer dinner at 6 over dinner at 8? Which would be one step removed from saying I prefer to sleep well rather than not.

I think we could get a better view if we consider many preferences in action. Taking your cryonics example, maybe I prefer to live (to a certain degree), prefer to conform, and prefer to procrastinate. In the burning-building situation, the living preference is acting more or less alone, while in the cryonics situation, the preferences interact somewhat like opposing forces, and motion happens toward the winning side. Maybe this is what makes preferences seem to vary?

More explanatory of the way people actually behave is that there's no unified preference for or against death, but rather a set of behaviors. Being in a burning building activates fleeing behavior; contemplating death from old age does not activate cryonics-buying behavior.

YES. This so much.

Contemplating death from old age does activate fleeing behavior, though (at least in me), which is another of those silly bugs in the human brain. If I found a way to fix it to activate cryonics-buying behavior instead, I would probably have found a way to afford life insurance by now.

Eliminativism is all well and good if all one wants to do is predict. However, it doesn't help answer questions like "What should I do?", or "What utility function should we give the FAI?"

The same might be said of evolutionary psychology. In which case I would respond that evolutionary psychology helped us stop thinking in a certain stupid way.

Once, we thought that men were attracted to pretty women because there was some inherent property called "beauty", or that people helped their neighbors because there was a universal Moral Law to which all minds would have access. Once it was the height of sophistication to argue whether people were truly good but corrupted by civilization, or truly evil but restrained by civilization.

Evolutionary psychology doesn't answer "What utility function should we give the FAI?", but it gives good reasons to avoid the "solution": 'just tell it to look for the Universal Moral Law accessible to all minds, and then do that.' And I think a lot of philosophy progresses by closing off all possible blind alleys until people grudgingly settle on the truth because they have no other alternative.

I am less confident in my understanding of eliminativism than of evo psych, so I am less willing to speculate on it. But since one common FAI proposal is "find out human preferences, and then do those", if it turns out human preferences don't really exist in a coherent way, that sounds like an important thing to know.

I think many people have alluded to this problem before, and that the people seriously involved in the research don't actually expect it to be that easy, but a clear specification of all the different ways in which it is not quite that easy is still useful. The same is true for "what should I do?"

Interesting post throughout, but don't you overplay your hand a bit here?

There's nothing that looks remotely like a goal in its programming, [...]

An IF-THEN piece of code comparing a measured RGB value to a threshold value for firing the laser would look at least remotely like a goal to my mind.