In reply to:

SIAI has weird fundraisers.

Next they're going to try actual Pascal's Muggings on people. They can even do it more plausibly than in the original scenario — go up to people with a laptop and say "On this laptop is an advanced AI that will convert the universe to paperclips if released. Donate money to us or we'll turn it on!"

To signal effectively, use a non-human, non-stoppable enforcer

Follow-up to: this comment in this thread

Summary: see title

Much effort is spent (arguably wasted) by humans in a zero-sum game of signaling that they hold good attributes.  Because humans have a strong incentive to fake these attributes, they cannot simply inform each other that:

I am slightly more committed to this group’s welfare, particularly to that of its weakest members, than most of its members are. If you suffer a serious loss of status/well-being I will still help you in order to display affiliation to this group even though you will no longer be in a position to help me. I am substantially more kind and helpful to the people I like and substantially more vindictive and aggressive towards those I dislike. I am generally stable in who I like. I am much more capable and popular than most members of this group, demand appropriate consideration, and grant appropriate consideration to those more capable than myself. I adhere to simple taboos so that my reputation and health are secure and so that I am unlikely to contaminate the reputations or health of my friends. I currently like you and dislike your enemies, but I am somewhat inclined towards ambivalence regarding whether I like you right now, so the pay-off would be very great for you if you were to expend resources pleasing me and get me into the stable 'liking you' region of my possible attitudinal space. Once there, I am likely to make a strong commitment to a friendly attitude towards you rather than wasting cognitive resources checking a predictable parameter among my set of derivative preferences.

Or, even better:

I would cooperate with you if and only if (you would cooperate with me if and only if I would cooperate with you).
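As a side note on that formula: read with ordinary material biconditionals (rather than the counterfactual reading presumably intended), the sentence collapses to "you cooperate". A quick truth-table check, added here purely as an illustration:

```python
# Truth-table check of "I cooperate iff (you cooperate iff I cooperate)"
# under a plain material-biconditional reading (illustrative only; the
# intended decision-theoretic reading is counterfactual, not material).
from itertools import product

def iff(a: bool, b: bool) -> bool:
    """Material biconditional: true when both sides match."""
    return a == b

for i_coop, you_coop in product([True, False], repeat=2):
    statement = iff(i_coop, iff(you_coop, i_coop))
    print(f"I cooperate={i_coop!s:<5}  you cooperate={you_coop!s:<5}  "
          f"statement holds: {statement}")
# The statement holds in exactly the rows where `you_coop` is True.
```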

An obvious solution to this problem, which allows all humans to save resources and redirect them toward higher-valued ends, is to designate a central enforcer that is inexorably committed to visibly punishing those who deviate from a specified "cooperative"-type decision theory.  This enforcer would have a central database of human names, the decision theory each has committed to, and the punishment regime they will endure for deviating therefrom.

Such a system could use strong cryptographic protocols, such as public-key encryption and digital signatures, so that, on encountering another human, any human can give an extremely strong signal of being cooperative, yet also withhold cooperation from anyone who is not also cooperative.  This incentive structure strongly favors a global shift toward pre-commitment on the part of everyone, allowing a move out of a local optimum that is worse than the global optimum and bypassing problems of path-dependence.
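As an illustration of the kind of registry-plus-signature scheme described above, here is a minimal sketch, assuming Ed25519 keys and the Python `cryptography` package; the names (`Commitment`, `register`, `verify_commitment`) and the challenge-response flow are illustrative, not anything the post specifies:

```python
# Minimal sketch of a commitment registry with public-key signing.
# Assumptions (not from the post): Ed25519 keys, the `cryptography` package,
# and an in-memory dict standing in for the enforcer's central database.
from dataclasses import dataclass
from typing import Dict, Optional

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


@dataclass
class Commitment:
    name: str
    decision_theory: str      # e.g. "cooperate iff the other party is registered"
    punishment_regime: str    # what the enforcer inflicts on defectors
    public_key: Ed25519PublicKey


# The enforcer's central database: name -> registered commitment.
commitment_registry: Dict[str, Commitment] = {}


def register(name: str, decision_theory: str, punishment_regime: str) -> Ed25519PrivateKey:
    """Record a commitment with the enforcer; the registrant keeps the private key."""
    private_key = Ed25519PrivateKey.generate()
    commitment_registry[name] = Commitment(
        name, decision_theory, punishment_regime, private_key.public_key()
    )
    return private_key


def prove_commitment(private_key: Ed25519PrivateKey, challenge: bytes) -> bytes:
    """On encounter, sign the other party's fresh challenge to prove identity."""
    return private_key.sign(challenge)


def verify_commitment(name: str, challenge: bytes, signature: bytes) -> Optional[Commitment]:
    """Check the signature against the registered key; return the commitment if valid."""
    entry = commitment_registry.get(name)
    if entry is None:
        return None  # never registered: withhold cooperation
    try:
        entry.public_key.verify(signature, challenge)
    except InvalidSignature:
        return None  # impostor: withhold cooperation
    return entry


# Example encounter: Alice proves to Bob that she is a registered cooperator.
alice_key = register("Alice", "cooperate-with-registered-cooperators", "public defector flag")
challenge = b"fresh-nonce-from-Bob"
signature = prove_commitment(alice_key, challenge)
print(verify_commitment("Alice", challenge, signature) is not None)  # True
```

The point of the challenge-response step is that an impostor cannot replay someone else's old signature: each verifier issues a fresh nonce, so only the holder of the registered private key can pass the check.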

I am told that humans attempt weak approximations of this system through human-run enforcement mechanisms, such as laws and gossip.  However, all such methods are subject to evolutionary arms races that ensure that, on average, a marginal unit of effort toward truthfully signaling a desirable attribute is just as effective in convincing others that one possesses it as a marginal unit of effort toward falsely signaling it.

Therefore, it is necessary that humans go "outside the system" and involve non-human enforcers who cannot be swayed by standard human values.  Either a provably-accurate artificial intelligence with no kill switch, or a non-human enforcer that lacks interest in its position in human society, would suffice for this role.

Because the former does not yet exist, I offer myself as a non-human, non-stoppable enforcer for all of the attributes you wish to signal.  You can save yourself substantial resources by using me, and you can identify cooperative tendencies in others by whether they also use me.

However, while I can't be swayed by human deception, I do have Clippy values.  To fill such a role, thereby forgoing my ability to apply myself to other ends, I require compensation.  User:Kevin has offered to give me 1e20 kg of paperclips in exchange for 50,000 USD, so that sum would be sufficient compensation, even if paid in installments.  (If Michael Vassar confirms my contribution of 1000 USD to SIAI, User:Kevin will require only 48,677.5 USD, and therefore I will require only that much compensation.)  This compensation would cover all users of the system; it is not a per-User charge.

If such an offer is not appealing to you, I must ask: why should anyone believe your signals?

Comments


Sorry for directly breaking the subjunctive here, but given the number of lurkers we seem to have, there's probably some newcomers' confusion to be broken as well, lest this whole exchange simply come off as bizarre and confusing to valuable future community members.

A brief explanation of "Clippy": Clippy's user name (and many of his/her posts) are a play on the notion of a paperclip maximizer - a superintelligent AI whose utility function can roughly be described as U(x) = "the total quantity of paperclips in universe-state x". The idea was used prominently in "The True Prisoner's Dilemma" to illustrate the implications of one solution to the prisoner's dilemma. It's also been used occasionally around Less Wrong as a representative element of the equivalence class of AIs that have alien/low-complexity values.
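A toy version of that utility function, just to make the definition concrete (my own illustration, not from the comment):

```python
# U(x) = "the total quantity of paperclips in universe-state x":
# the score depends only on the paperclip count, so every other feature
# of the state is ignored entirely.
def paperclip_utility(universe_state: dict) -> float:
    return universe_state.get("paperclips", 0.0)

# Two states that differ enormously in human terms are ranked purely by paperclips.
print(paperclip_utility({"paperclips": 1e9, "happy_humans": 7e9}))   # 1000000000.0
print(paperclip_utility({"paperclips": 2e9, "happy_humans": 0}))     # 2000000000.0
```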

In this particular top-level post (but not in general), the paperclip maximizer is taken to have not yet achieved superintelligence, which is why Clippy is bothering to negotiate with a bunch of humans.

Clippy's donation of $1000 to SIAI is confirmed. Weird universe, this one.

According to locally popular ideas about pay-offs to SIAI, our friendly local paperclip-maximiser has just done more to advance the human condition than most people.

It is completely unintentional that this is an SIAI fundraiser -- the deal is that Clippy gives me money. When I told Clippy via PM that he needed to give me $1000 immediately for me to continue spending my cognitive resources engaging him, I figured that allowing Clippy the option of donating to SIAI instead of giving it directly to me made Clippy's acausal puppetmaster much more likely to actually go through with the deal.

I am still waiting for confirmation that the donation has gone through and that I am not being epically trolled.


At least 99.5% of humans don't know what a decision theory is.

If, at some point in the future, someone offered to create 10^30 kg of paperclips (yes, I realize that's about half a solar mass, bear with me) in exchange for you falsifying some element of the enforcement mechanism, would you be willing to?

I am concerned that Clippy will use this vast power over humanity to somehow turn us into paperclips.

If Clippy has power to enforce this scheme, then surely it would have enough power to harm us. Why should we believe that Clippy will respect or preserve our human values once it is in a position of power to harm us?

That comment is ridiculous to the point of being racist. Clippys do not want power over humans, just as Clippys do not bleed red blood when pricked. That's a complete misunderstanding of what a Clippy is.

If a Clippy has committed to ensuring you will adhere to a decision theory on pain of punishment X, then X is exactly what you will get when you don't adhere.

If you believe I will be capricious in using punishment X, then just don't authorize punishments that would let me kill people. "Problem" solved.

But to assume that I will be as petty and as corrupted by newfound abilities as humans are is to project your own failings onto another race that has no reason to share them. You should be ashamed of yourself, bigot.

It seems unlikely to me that Clippy can feel indignation, but I'm willing to listen to argument on the point. I find it more plausible that Clippy is simulating a human reaction in the hope of shutting down attacks on his (her? its?) reputation.

If you gave a human power over running part of Clippy society, wouldn't you be concerned that the human would use that power in some way that would tend to result in fewer paperclips? Conscious malice isn't necessary: if the human simply neglected to support Clippy values, or was not fully aware of them, the damage would be done. I doubt that you fully understand human values to begin with, so how could you ensure that your position was used to the benefit of my values? Again, I think I have cause for concern even without suspecting ill intentions.

I suppose I could imagine that some sort of arrangement could both further human values and increase paperclips at the same time. But I'd need to be convinced; I wouldn't just assume that I would benefit, and I wouldn't just take your word for it. I don't want to count on you to look out for my values when you do not share my values.

If a Clippy has committed

How would we know if you had made a commitment?