Followup to: Empty Labels

In the game Taboo (by Hasbro), the objective is for a player to have their partner guess a word written on a card, without using that word or five additional words listed on the card.  For example, you might have to get your partner to say "baseball" without using the words "sport", "bat", "hit", "pitch", "base" or of course "baseball".

As soon as I see a problem like that, I at once think, "An artificial group conflict in which you use a long wooden cylinder to whack a thrown spheroid, and then run between four safe positions."  It might not be the most efficient strategy to convey the word 'baseball' under the stated rules - that might be, "It's what the Yankees play" - but the general skill of blanking a word out of my mind was one I'd practiced for years, albeit with a different purpose.

Yesterday we saw how replacing terms with definitions could reveal the empirical unproductivity of the classical Aristotelian syllogism.  All humans are mortal (and also, apparently, featherless bipeds); Socrates is human; therefore Socrates is mortal.  When we replace the word 'human' by its apparent definition, the following underlying reasoning is revealed:

All [mortal, ~feathers, biped] are mortal;
Socrates is a [mortal, ~feathers, biped];
Therefore Socrates is mortal.

But the principle of replacing words by definitions applies much more broadly:

Albert:  "A tree falling in a deserted forest makes a sound."
Barry:  "A tree falling in a deserted forest does not make a sound."

Clearly, since one says "sound" and one says "not sound", we must have a contradiction, right?  But suppose that they both dereference their pointers before speaking:

Albert:  "A tree falling in a deserted forest matches [membership test: this event generates acoustic vibrations]."
Barry:  "A tree falling in a deserted forest does not match [membership test: this event generates auditory experiences]."

Now there is no longer an apparent collision—all they had to do was prohibit themselves from using the word sound. If "acoustic vibrations" came into dispute, we would just play Taboo again and say "pressure waves in a material medium"; if necessary we would play Taboo again on the word "wave" and replace it with the wave equation.  (Play Taboo on "auditory experience" and you get "That form of sensory processing, within the human brain, which takes as input a linear time series of frequency mixes...")
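The pointer metaphor can be made literal. In this minimal sketch (the event fields and predicate names are illustrative inventions, not anything canonical), the two speakers' uses of "sound" dereference to different membership tests, and the apparent contradiction dissolves:

```python
# A tree falls in a deserted forest: acoustic vibrations occur,
# but no listener is present to have an auditory experience.
event = {"acoustic_vibrations": True, "auditory_experiences": False}

# Albert's "sound" and Barry's "sound" point at different tests.
def albert_sound(e):   # membership test: pressure waves in a medium
    return e["acoustic_vibrations"]

def barry_sound(e):    # membership test: processing in a hearer's brain
    return e["auditory_experiences"]

# Both claims are true at once; only the shared label made them collide.
albert_sound(event)  # True  -- "makes a sound" in Albert's sense
barry_sound(event)   # False -- "makes no sound" in Barry's sense
```

Substituting the test for the label is exactly the move of replacing the word with its definition; the dispute only looked substantive while both predicates wore the same name.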

But suppose, on the other hand, that Albert and Barry were to have the argument:

Albert:  "Socrates matches the concept [membership test: this person will die after drinking hemlock]."
Barry:  "Socrates matches the concept [membership test: this person will not die after drinking hemlock]."

Now Albert and Barry have a substantive clash of expectations; a difference in what they anticipate seeing after Socrates drinks hemlock.  But they might not notice this, if they happened to use the same word "human" for their different concepts.

You get a very different picture of what people agree or disagree about, depending on whether you take a label's-eye-view (Albert says "sound" and Barry says "not sound", so they must disagree) or a test's-eye-view (Albert's membership test is acoustic vibrations, Barry's is auditory experience).

Get together a pack of soi-disant futurists and ask them if they believe we'll have Artificial Intelligence in thirty years, and I would guess that at least half of them will say yes.  If you leave it at that, they'll shake hands and congratulate themselves on their consensus.  But make the term "Artificial Intelligence" taboo, and ask them to describe what they expect to see, without ever using words like "computers" or "think", and you might find quite a conflict of expectations hiding under that featureless standard word.  Likewise that other term.  And see also Shane Legg's compilation of 71 definitions of "intelligence".

The illusion of unity across religions can be dispelled by making the term "God" taboo, and asking them to say what it is they believe in; or making the word "faith" taboo, and asking them why they believe it. Though mostly they won't be able to answer at all, because it is mostly profession in the first place, and you cannot cognitively zoom in on an audio recording.

When you find yourself in philosophical difficulties, the first line of defense is not to define your problematic terms, but to see whether you can think without using those terms at all.  Or any of their short synonyms.  And be careful not to let yourself invent a new word to use instead.  Describe outward observables and interior mechanisms; don't use a single handle, whatever that handle may be.

Albert says that people have "free will".  Barry says that people don't have "free will".  Well, that will certainly generate an apparent conflict.  Most philosophers would advise Albert and Barry to try to define exactly what they mean by "free will", on which topic they will certainly be able to discourse at great length.  I would advise Albert and Barry to describe what it is that they think people do, or do not have, without using the phrase "free will" at all.  (If you want to try this at home, you should also avoid the words "choose", "act", "decide", "determined", "responsible", or any of their synonyms.)

This is one of the nonstandard tools in my toolbox, and in my humble opinion, it works way way better than the standard one.  It also requires more effort to use; you get what you pay for.

 

Part of the sequence A Human's Guide to Words

Next post: "Replace the Symbol with the Substance"

Previous post: "Empty Labels"

Comments


But that's not the rationalist's version of the game. The rationalist's game involves seeing at a lower level of detail. Not thinking up synonyms and keywords that weren't on the card.

As g mentions, your description also describes rounders. Even if you defined all the words in your description ever more precisely, you could still be thinking of a different game.

Presumably at some point you would discover that, when your expectations of what was going to happen differed. Depending on what you're discussing, that could happen very soon, or not for a long time.

How does the rationalist in the game know when to stop defining and start adding characteristics/keywords?

Easy PK. An optimization process that brings the universe towards the target of shared strong attractors in human high-level reflective aspiration.

Sounds interesting. We must now verify if it works for useful questions.

Could someone explain what FAI is without using the words "Friendly", or any synonyms?

An AI which acts toward whatever the observer deems to be beneficial to the human condition. It's impossible to put it into falsifiable criteria if you can't define what is (and on what timescale?) beneficial to the human race. And I'm pretty confident nobody knows what's beneficial to the human condition on the longest term, because that's the problem we're building the FAI to solve.

In the end, we will have to build an AI as best we can and trust its judgement. Or not build it. It's a cosmic gamble.

This strategy can't be that nonstandard, as it is the strategy I've always used when a conversation gets stuck on some word. But now that I think about it, people usually aren't that interested in following my lead in this direction, so it isn't very common either.

In one class in high school, we were supposed to make our classmates guess a word using hand gestures. I drew letters in the air.

Who do you think invented Friendly AI?

You haven't invented Friendly AI. You've created a name for a concept you can only vaguely describe and cannot define operationally.

Who do you think taught you the technique?

Isn't it just a bit presumptuous to conclude you're the first to teach the technique?

Yeah, but when playing actual Taboo "rational agents should WIN" (Yudkowsky, E.) and therefore favour "nine innings and three outs" over your definition (which would also cover some related-but-different games such as rounders, I think). I suspect something like "Babe Ruth" would in fact lead to a quicker win.

None of which is relevant to your actual point, which I think is a very good one. I don't think the tool is all that nonstandard; e.g., it's closely related to the positivist/verificationist idea that a statement has meaning only if it can be paraphrased in terms of directly (ha!) observable stuff.

Consider a hypothetical debate between two decision theorists who happen to be Taboo fans:

A: It's rational to two-box in Newcomb's problem.
B: No, one-boxing is rational.
A: Let's taboo "rational" and replace it with math instead. What I meant was that two-boxing is what CDT recommends.
B: Oh, what I meant was that one-boxing is what EDT recommends.
A: Great, it looks like we don't disagree after all!

What did these two Taboo'ers do wrong, exactly?

They stopped talking after they taboo'd "rational". Both can agree that CDT recommends one thing, and EDT recommends another, but if you dropped them into Omega's lap right now they would still disagree over which decision theory to use. They replaced the word with their own respective spins on its meaning, but they failed to address the real hidden query in the label: Is this the best course of action for a reasonable person to take?

They still haven't explained why A two-boxes and B one-boxes.

A: Let's taboo "rational" and replace it with math instead. What I meant was that two-boxing yields more money.
B: Oh, what I meant was that one-boxing yields more money.
A: We don't disagree about what "more money" means, do we?
B: Don't think so. Okay, so...
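The "yields more money" claims can even be made numeric. In this sketch, the payoffs are the standard Newcomb ones ($1,000 in the transparent box; $1,000,000 in the opaque box iff one-boxing was predicted), but the 99% predictor accuracy is an assumed figure for illustration:

```python
# Probability that Omega correctly predicted your actual choice.
# (Any accuracy above ~50.05% gives the same ordering of outcomes.)
p = 0.99

# Evidential expected value: condition the box contents on your choice.
ev_one_box = p * 1_000_000                 # predicted one-box -> opaque box full
ev_two_box = (1 - p) * 1_000_000 + 1_000   # predicted two-box -> opaque box empty

# Causal reasoning instead holds the box contents fixed, so two-boxing
# always adds $1,000 -- which is why CDT two-boxes despite these numbers.
```

So even after tabooing "rational", A and B still have a live disagreement: not over the arithmetic, but over which expectation calculation to feed it into.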

Oh, yes, I forgot to mention one of the most important rules in Rationalist Taboo:

You can't Taboo math.

Stating an equation is always allowed.

But of course, you can still point to an element of a mathematical formula and ask "What does this term apply to? Answer without saying..."

Y'know, the 'Taboo game' seems like an effective way to improve the clarity of meaning for individual words - if you have enough clear and precise words to describe those particular words in the first place.

If there isn't a threshold number of agreed-upon meanings, the language doesn't have enough power for Taboo to work. You can't improve one word without already having a suite of sufficiently-good words to work with.

The game can keep a language system above that minimum threshold, but can't be used to bootstrap the system above that threshold. If you're just starting out, you need to use different methods.

In fiction writing, this is known as Show, Don't Tell. Instead of using all-encompassing, succinct abstractions to present the reader with a predigested conclusion (Character X is a jerk, Place Y is scary, Character Z is afraid), the writer is encouraged to show the reader evidence of X's jerkiness, Y's scariness, or Z's fear, and leave it to them to infer from that evidence what is going on. Effectively, what one is doing is tabooing judgments and subjective perceptions such as "jerky", "scary" or "afraid", and replacing them with a list of jerky actions, scary traits, and symptoms of fear.

Hollerith: I do not have a high degree of confidence that I know how to choose wisely, but (at least until I become aware of the existence of nonhuman intelligent beings) I do know that if there exists wisdom enough to choose wisely, that wisdom resides among the humans. So, I will choose to steer the future into a possible world in which a vast amount of rational attention is focused on the humans...

and lo the protean opaque single thing was taken out of one box and put into another

PK: Thank you. However, merely putting the technique into the "toolbox" and never looking back is not enough. We must go further and actually use the technique, at which point we will either reach new insights or falsify the method. Would you care to illustrate what FAI means to you, Eliezer? (Others are also invited to do so.)

Don't underestimate me so severely. You think I don't know how to define "Friendly" without using synonyms? Who do you think taught you the technique? Who do you think invented Friendly AI?

But I came to understand, after a period of failure in trying to explain, that I could not tell people where I had climbed to, without giving them the ladder. People just went off into Happy Death Spirals around their favorite things, or proffered instrumental goals instead of terminal goals, or made wishes to genies, or said "Why don't you just build an AI to do the right thing instead of this whole selfish human-centered business?", or...

I've covered some of the ladder I used to climb to Friendly AI. But not all of it. So I'm not going to try to explain FAI as yet; more territory left to go.

But you (PK) are currently applying the Taboo technique correctly, which is the preliminary path I followed at the analogous point in my own reasoning; and I'm interested in seeing you follow it as far as you can. Maybe you can invent the rest of the ladder on your own. You're doing well so far. Maybe you'll even reach a different but valid destination.

The game is not over! Michael Vassar said: "[FAI is ..] An optimization process that brings the universe towards the target of shared strong attractors in human high-level reflective aspiration."

For the sake of not dragging out the argument too much, let's assume I know what an optimization process and a human are.

What are "shared strong attractors"? You can't use the words "shared", "strong", "attractor" or any synonyms.

What's a "high-level reflective aspiration"? You can't use the words "high-level", "reflective", "aspiration" or any synonyms.


Caledonian said: "Then declaring the intention to create such a thing takes for granted that there are shared strong attractors."

We can't really say if there are "shared strong attractors" one way or the other until we agree on what that means. Otherwise it's like arguing about whether falling trees make "sound" in the forest. We must let the Taboo game play out before we start arguing about things.

This is one of the nonstandard tools in my toolbox, and in my humble opinion, it works way way better than the standard one.

Yudkowsky, 2008.

To bring out the role of pointlessness, it is worth noting that when faced with a potentially verbal dispute we often ask: what turns on this?

...

Typically, a broadly verbal dispute is one that can be resolved by attending to language and resolving metalinguistic differences over meaning. For example, these disputes can sometimes be resolved by settling the facts about the meaning of key terms in our community...[which] may take substantive empirical investigation.

Distinguishing senses of a key term is particularly difficult when these senses do not correspond to clear explicit definitions. More generally, we are not always able to give a good articulation of what our terms mean, and it is often far from obvious whether or not two speakers disagree about meaning. So it is useful to have a method that does not directly depend on the analysis of meaning in this way.

An alternative heuristic for detecting and dealing with verbal disputes is what we might call the method of elimination. Here, the key idea is that one eliminates use of the key term, and one attempts to determine whether any substantive dispute remains.

...

The method of elimination can be applied to many disputes in philosophy. To illustrate a possible use, I will start with an issue that has often been accused of giving rise to verbal disputes, and in which proponents are relatively sophisticated about these issues: the question of free will and determinism.

Suppose that a compatibilist says ‘Free will is compatible with determinism’, and an incompatibilist says ‘No, free will is not compatible with determinism’. A challenger may suggest that the dispute is verbal, and that the dispute arises only because the parties mean different things by ‘free will’.

We can then apply the method of elimination: bar the term ‘free will’, and see whether there are residual disputes. There are various possible outcomes, depending on the compatibilist and the incompatibilist in question. One possible outcome is that the parties will disagree over a sentence such as ‘Moral responsibility is incompatible with determinism’ as part of the original dispute. If so, this is a prima facie indication that the dispute is non-verbal—though one may want to reapply the method to ‘moral responsibility’ to be sure. Another possible outcome is that there will be no such residual disagreement. For example, the parties might agree on “Determinism is compatible with degree D of moral responsibility”, “Determinism is not compatible with a higher degree D’ of moral responsibility” (for example, a degree involving desert that warrants retributive punishment), and other relevant sentences. This outcome is a prima facie indication that the dispute is verbal, resting on a disagreement about whether the meaning of “free will” requires more than degree D of moral responsibility.

...

In the Socratic tradition, the paradigmatic philosophical questions take the form “What is X?”. These questions are the focus of many philosophical debates today: What is free will? What is knowledge? What is justification? What is justice? What is law? What is confirmation? What is causation? What is color? What is a concept? What is meaning? What is action? What is life? What is logic? What is self-deception? What is group selection? What is science? What is art? What is consciousness? And indeed: What is a verbal dispute? Despite their traditional centrality, disputes over questions like this are particularly liable to involve verbal disputes. So these disputes are particularly good candidates for the method of elimination. For disputes of this form, we can apply a special case of the method, which we can call the subscript gambit.

For example, in the dispute over free will, one party might say “Freedom is the ability to do what one wants”, while the other says “Freedom is the ability to ultimately originate one’s choices”. We can then introduce “freedom1” and “freedom2” for the two right-hand-sides here, and ask: do the parties differ over freedom1 and freedom2? Perhaps they will disagree over “Freedom2 is required for moral responsibility”, or over “Freedom1 is what we truly value”. If so, this clarifies the debate...

...

The method of elimination can be useful even when a debate is not verbal. If two philosophers have conceptual mastery of exactly the same concept of physicalism, but one asserts ‘Physicalism is true’ and the other rejects it, then asking them to state relevant disagreements without using the term ‘physicalism’ is nevertheless likely to clarify what is at issue. Likewise, the method of elimination can usefully be applied even to philosophical assertions made by a single party, not in the context of a dispute. If a compatibilist is asked to state their thesis, or relevant aspects of their thesis, without using the term ‘free will’, this may well clarify that thesis for an audience and may help boil the thesis down to the key underlying issues.

Of course the parties might disagree on whether physicalism1 or physicalism2 best fits the use of “physicalism” in a certain community, or over whether semantics1 or semantics2 best fits the use of “semantics” in a given community. To resolve these issues of usage, one can do sociology, anthropology, linguistics, or experimental philosophy. Once one has agreed on the first-order properties of physicalism1 and physicalism2, it is hard to see that anything else in the first-order domain will rest on these questions of usage.

This picture leads to a certain deflationism about the role of conceptual analysis (whether a priori or a posteriori), and about the interest of questions such as “What is X?” or “What is it to be X?” Some component of these questions is inevitably terminological, and the non-terminological residue can be found without using ‘X’.

...

The model is not completely deflationary about conceptual analysis. On this model, the analysis of words and the associated concepts is relatively unimportant in understanding a first-order domain. But it is still interesting and important to analyze conceptual spaces: the spaces of concepts (and of the entities they pick out) that are relevant to a domain, determining which concepts can play which roles, what the relevant dimensions of variation are, and so on.

...there are important normative questions about what expressions ought to mean. These questions comprise what Peirce called “the ethics of terminology”. Ideal agents might be unaffected by which terms are used for which concepts, but for nonideal agents such as ourselves, the accepted meaning for a key term will make a difference to which concepts are highlighted, which questions can easily be raised, and which associations and inferences are naturally made. Following the “ameliorative” project of Haslanger (2005), one might argue that expressions such as ‘gender’ and ‘race’ play a certain practical role for us, and that role is played better by some conceptions than others, so ‘race’ and ‘gender’ ought to have certain meanings. The manifestly verbal dispute among astronomers about whether Pluto is a planet is best understood as a debate in the ethics of terminology: given the scientific and cultural roles that ‘planet’ plays, should ‘planet’ be used to include Pluto or exclude it? In philosophy, ‘meaning’ functions as something of an honorific (it attracts people to its study), so if one thinks that meaning1 is more important than meaning2, one might hold that ‘meaning’ ought to be used for meaning1.

Chalmers, 2009. (Emphasis added.)

Anyone want to assign a probability to Chalmers having been inspired by this post?

Also: Yudkowsky's informal writing style is a significant improvement over formal academic writing when it comes to teaching rationality. Had I read only this essay by Chalmers, I doubt the lesson would have clicked as well as it did from reading this post.

I'll just chime in at this point to note that PK's application of the technique is exactly correct.

Julian Morrison said: "FAI is: a search amongst potentials which will find the reality in which humans best prosper." What is "prospering best"? You can't use "prospering", "best" or any synonyms.

Let's use the Taboo method to figure out FAI.

I'd have to agree with PK's protest. This isn't Hasbro's version of the game; you're not trying to help someone figure out that you're talking about a "Friendly AI" without using five words written on a card.

Oh, and there's no time limit.

I'm not trying to under/over/middle-estimate you, only theories which you publicly write about. Sometimes I'm a real meanie with theories, shoving hot pokers into them and all sorts of other nasty things. To me, theories have no rights.

I know. But come on, you don't think the thought would ever have occurred to me, "I wonder if I can define Friendly AI without saying 'Friendly'?" It's not as if I invented the phrase first and only then thought to ask myself what it meant.

Moral, right, correct, wise, are all fine words for humans to use, but you have to break something down into ones and zeroes before it can be programmed. In a sense, the whole art of AGI is playing rationalist-Taboo with all words that refer to aspects of mind.

So are you saying that if at present you played a Taboo game to communicate what "FAI" means to you, the effort would fail? I am interested in the intricacies of the Taboo game, including its failure modes.

It has an obvious failure mode if you try to communicate something too difficult without requisite preliminaries, like calculus without algebra. Taboo isn't magic, it won't let you cross a gap of months in an hour.

I actually already have a meaning for FAI in my head. It seems different from the way other people try to describe it. It's more concrete but seems less virtuous. It's something along the lines of "obey me".

Really? That's your concept of how to steer the future of Earth-originating intelligent life? "Shut up and do what I say"? Would you want someone else to follow that strategy, say Archimedes of Syracuse, if the future fell into their hands?

So you play Taboo well, but you don't seem to see the difficulties that require a solution deeper than "obey me", and it's hard to explain an answer before explaining the question. Just like if you don't know about the game of Taboo, someone answers "Just build an AI to do the nice thing!"

You should consider looking for problems and failure modes in your own answer, rather than waiting for someone else to do it. What could go wrong if an AI obeyed you?

Suppose you learn of a powerful way to steer the future into any target you choose as long as that target is specified in the language of mathematics or with the precision needed to write a computer program. What target to choose? One careful and thoughtful choice would go as follows. I do not have a high degree of confidence that I know how to choose wisely, but (at least until I become aware of the existence of nonhuman intelligent beings) I do know that if there exists wisdom enough to choose wisely, that wisdom resides among the humans. So, I will choose to steer the future into a possible world in which a vast amount of rational attention is focused on the humans, on human knowledge and on the potential that the humans have for affecting the far future. This vast inquiry will ask not only what future the humans would create if the humans have the luxury of avoiding unfortunate circumstances that no serious sane human observer would want the humans to endure, but also what future would be created by whatever intelligent agents ("choosers") the humans would create for the purpose of creating the future if the humans had the luxury . . . and also what future would be created by whatever choosers would be created by whatever choosers the humans would create . . . This "looping back" can be repeated many times.

I managed to avoid "desire", "want" and "volition". Unfortunately I only have time to write one of these today. I would do well to write a dozen.

Good point, especially since the most common words become devalued or politicized ("surge", "evil", "terror" &c.) but...

The existence of this game surprised me, when I discovered it. Why wouldn't you just say "An artificial group conflict in which you use a long wooden cylinder to whack a thrown spheroid, and then run between four safe positions?"

So what was your score?

(Did you cut your enemy?)

This method of elimination can be usefully applied both to verbal disagreements (where the real debate is only over terminology) and to non-verbal disagreements (where parties fundamentally disagree about the things themselves, and not just labels). Besides separating the two to clarify the real disagreement, it can also be usefully applied to one’s own internal dialogue.

However, how do we know when to apply this technique? With external debates, it is easy enough to suspect when a disagreement is only verbal, or when the terms argued over have constituent parts. Such cues might be of limited help if one’s internal logic differs notably from a similar line of reasoning in a book or other non-verbal source of information. When one is considering completely new ideas, what cues might cause us to use this method? As Yudkowsky points out, this method is much more cognitively intensive than simply defining terms, so it is necessary to use it sparingly rather than all the time.

One cue might be that a large portion of one’s argument hinges around one particular word or term. Another might simply be noticing that one is using a term in slightly different contexts, such as using the word “rational” with regard to both economics and morality. A morally rational being might be a philanthropist economically rather than an investor trying to make money. Similarly, an economically rational being might never tip waiters, or might favor Enron-style economics.

A key term or concept such as these should be examined with all available tools in one’s toolbox. These include defining the term formally, explaining things without using the term and its synonyms, identifying the constituent components of the term, and asking if there are measurable differences between various definitions. The goal is to find any inconsistencies in the way one is using the term.

@Richard Hollerith: Skipping all the introductory stuff to the part which tries to define FAI(I think), I see two parts. Richard Hollerith said:

"This vast inquiry[of the AI] will ask not only what future the humans would create if the humans have the luxury of [a)] avoiding unfortunate circumstances that no serious sane human observer would want the humans to endure, but also [b)] what future would be created by whatever intelligent agents ("choosers") the humans would create for the purpose of creating the future if the humans had the luxury"

a) What's a "serious sane human observer"? Taboo the words and synonyms. What are "unfortunate circumstances" that s/he would like to avoid? Taboo...

b) What is "the future humans would choose for the purpose of creating the future"? In what way exactly would they "choose" it? Taboo...

Good luck :-)

Eliezer Yudkowsky said: "Don't underestimate me so severely. You think I don't know how to define "Friendly" without using synonyms? Who do you think taught you the technique? Who do you think invented Friendly AI?"

I'm not trying to under/over/middle-estimate you, only theories which you publicly write about. Sometimes I'm a real meanie with theories, shoving hot pokers into to them and all sorts of other nasty things. To me theories have no rights.

"... I've covered some of the ladder I used to climb to Friendly AI. But not all of it. So I'm not going to try to explain FAI as yet; more territory left to go." So are you saying that if at present you played a taboo game to communicate what "FAI" means to you, the effort would fail? I am interested in the intricacies of the taboo game including it's failure modes.

"But you (PK) are currently applying the Taboo technique correctly, which is the preliminary path I followed at the analogous point in my own reasoning; and I'm interested in seeing you follow it as far as you can. Maybe you can invent the rest of the ladder on your own. You're doing well so far. Maybe you'll even reach a different but valid destination." I actually already have a meaning for FAI in my head. It seems different from the way other people try to describe it. It's more concrete but seems less virtuous. It's something along the lines of "obey me".

^^^^Thank you. However merely putting the technique into the "toolbox" and never looking back is not enough. We must go further. This technique should be used at which point we will either reach new insights or falsely the method. Would you care to illustrate what FAI means to you Eliezer?(others are also invited to do so)

Maybe the comment section of a blog isn't even the best medium for playing Taboo. I don't know. I'm brainstorming productive ways/mediums to play Taboo (assuming the method itself leads to something productive).

I'm brainstorming productive ways/mediums to play Taboo (assuming the method itself leads to something productive).

Taboowiki?

Three separate comments here:

1) Eliezer_Yudkowsky: Why wouldn't you just say "An artificial group conflict in which you use a long wooden cylinder to whack a thrown spheroid, and then run between four safe positions"?

To phrase brent's objection a little more precisely: Because people don't normally think of baseball in those terms, and you're constrained on time, so you have to say something that makes them think of baseball quickly. Tom_Crispin's idea is much more effective at that. Or were you just trying to criticize baseball fans for not seeing the game that way?

2) Barry: "A tree falling in a deserted forest does not match [membership test: this event generates auditory experiences]."

But that doesn't help either as a scientific test, since you reference qualia. Er, I mean, that doesn't help either as a scientific test, since you reference something that [membership test: incommunicable except by experience, non-interpersonally-comparable].

3) I've used the taboo method recently. On a libertarian mailing list, I claimed the economic calculation argument favors intellectual property because its absence creates a kind of calculational chaos. Because the debate devolved very quickly into multiple definitional arguments, I said something like, "Okay, argue your position without using the terms '[economic] good, scarce, or property.' I'll start [...]" No takers =-(
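The "dereference your pointers" move in point 2 can be sketched in code. This is purely my own illustration (the event representation and predicate names are assumptions, not from the post): Albert and Barry apply different membership tests to the same event, so their "sound"/"not sound" verdicts answer different questions and never actually contradict each other.

```python
def generates_acoustic_vibrations(event):
    """Albert's test: does the event produce pressure waves in the air?"""
    return event["pressure_waves"]

def generates_auditory_experience(event):
    """Barry's test: does any listener have a subjective experience of hearing?"""
    return event["pressure_waves"] and event["listeners_present"]

# A tree falls in a deserted forest: vibrations, but nobody around.
tree_fall = {"pressure_waves": True, "listeners_present": False}

albert_says_sound = generates_acoustic_vibrations(tree_fall)   # True
barry_says_sound = generates_auditory_experience(tree_fall)    # False

# Different membership tests, different answers, no contradiction.
assert albert_says_sound and not barry_says_sound
```

Once each speaker's word is replaced by their actual test, the apparent disagreement dissolves into two compatible claims about the same event.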

I first read this about two years ago, and it has been an invaluable tool. I'm sure it has saved countless hours of pointless arguments around the world.

When I realise that an inconsistency in how we interpret a specific word is the problem in a certain argument and apply this tool, it instantly makes arguments that really are about the meaning of the word far more productive (it turns out it can be unobvious that the actual disagreement is about what a specific word means). In other cases it just helps us get back on the right track instead of getting distracted by what we mean by a certain word, when that is actually beside the point.

It does occasionally take a while to convince the other party to the argument that I'm not trying to fool or trick them when I ask for us to apply this method. Another observation is that the article on Empty Labels has transformed my attitude towards the meaning of words, so when it turns out we disagree about meanings, I instantly lose interest and this can confuse the other party.

Albert says that people have "free will". Barry says that people don't have "free will". Well, that will certainly generate an apparent conflict. Most philosophers would advise Albert and Barry to try to define exactly what they mean by "free will", on which topic they will certainly be able to discourse at great length. I would advise Albert and Barry to describe what it is that they think people do, or do not have, without using the phrase "free will" at all. (If you want to try this at home, you should also avoid the words "choose", "act", "decide", "determined", "responsible", or any of their synonyms.)

Careful here. You may sometimes find that there was no coherent concept there to begin with, that the notion was simply semantic cotton candy whipped up out of the ambiguity of language.

Caledonian,

they tend to have high-quality definitions that are difficult to match in quality without lots of effort.

All well and good, and useful in their way. But still just a list of synonyms and definitions. You can describe 'tree' using other English words any which way you want; you're still only accounting for a minuscule fraction of the possible minds the universe could contain. You're still not really much closer to universal conveyance of the concept. Copy the OED out in a hundred languages; decent step in the right direction. To take the next big step, though... well, that's the question, huh?

We cannot take PK's suggestion and use it to figure out FAI, because we have no idea what FAI really means.

Not quite accurate. I pretty much know what I want a Friendly AI to be - we probably all do. The problem is couching it in terms that said AI would grok, with no danger of a catastrophic misunderstanding (or getting told off by Eliezer).

Eliezer Yudkowsky said: It has an obvious failure mode if you try to communicate something too difficult without requisite preliminaries, like calculus without algebra. Taboo isn't magic, it won't let you cross a gap of months in an hour.

Fair enough. I accept this reason for not having your explanation of FAI before me at this very moment. However I'm still in "Hmmmm...scratches chin" mode. I will need to see said explanation before I will be in "Whoa! This is really cool!" mode.

Really? That's your concept of how to steer the future of Earth-originating intelligent life? "Shut up and do what I say"? Would you want someone else to follow that strategy, say Archimedes of Syracuse, if the future fell into their hands?

First of all I would like to say that I don't spend a huge amount of time thinking of how to make an AGI "friendly" since I am busy with other things in my life. So forgive me if my reasoning has some obvious flaw(s) I overlooked. You would need to point out the flaws before I agree with you however.

If I was writing an AGI I would start with "obey me" as the meta instruction. Why? Because "obey me" is very simple and allows for corrections. If the AGI acts in some unexpected way I could change it or halt it. Anything can be added as a subgoal to "obey me". On the other hand, if I use some different algorithm and the AGI starts acting in some weird way because I overlooked something, well, the situation is fubar. I'm locked out.

You should consider looking for problems and failure modes in your own answer, rather than waiting for someone else to do it. What could go wrong if an AI obeyed you? There are plenty of things that could go wrong. For instance, the AGI might obey me but not in the way I expected. Or the consequences of my request might be unexpected and irreversible. This can be mitigated by asking for forecasts before asking for actions.

As I'm writing this I keep thinking of a million possible objections and rebuttals but that would make my post very very long.

P.S. Caledonian's post disappeared. May I suggest a YouTube-style system where posts that are considered bad are folded instead of deleted? This way you get free speech while keeping the signal-to-noise ratio in check.

The hemlock example demonstrates tcpkac's point well. How do you decide to conclude that Albert and Barry expect different results from the same action? To me, it seems obvious that they should taboo the word hemlock, and notice that one correctly expects Socrates to die from a drink made from an herb in the carrot family, and the other correctly expects Socrates to be unharmed by tea made from a coniferous tree. But it's not clear why Eliezer ought to have the knowledge needed to choose to taboo the word hemlock.

An optimization process that brings the universe towards the target of shared strong attractors in human high-level reflective aspiration.

Then declaring the intention to create such a thing takes for granted that there are shared strong attractors.

What was that about the hidden assumptions in words, again?

"Nine innings and three outs" works much better to elicit "baseball".

Not for somebody unfamiliar with the details of how the game is played. I would have guessed cricket.

In fact, thinking about EY's definition, I think it fits better (for me) because I would be able to recognise a game of baseball after watching only a single game... even if I didn't have anybody around to explain the rules to me.

When I read the post, I immediately thought: just say “home-run”! — I’ve been playing taboo for a long time, I’ve occasionally elicited the correct response from the other players by saying just one or two words :)