The first underwater test of an atom bomb, in the US at Bikini Atoll in the Pacific, July 24, 1946.
Also, the LessWrong.com Frontpage this Petrov Day.

We failed.

The Frontpage was taken down for 24 hours.

Launch Attempts

At 1:26 am, ~275 users got an email (and 20 minutes later a PM) from me, with their unique launch codes. At this time, the Big Red Button was on the Frontpage, and codes could be submitted.

Two users attempted to submit launch codes. It was made clear that any such attempts would be deanonymised, as the button itself said "This is not an anonymous action".

The users were: Grotace at 2:09 am, who submitted the random string "kdjssndksjnd" to no effect, and Chris_Leong at 4:33 am, who submitted his personalised launch codes, taking down the site.

What happened?

Note that the following includes private messaging metadata about a number of users. All users (except for the attacker) were asked for their explicit consent for it to be included, and it would have been anonymized had they declined, even if it was clearly deducible from the public record. The LessWrong team only looks into private messaging metadata when there is a suspicion of sockpuppeting, phishing, or other forms of abuse, or when permission has been given by the user (e.g. for debugging). Additionally, Chris Leong commented on a draft of this post before it was published.

On receiving his codes, Chris Leong posted on LessWrong and Facebook asking whether he should take down the site.

He also wrote a comment on LessWrong:

Should I press the button or not? I haven't pressed the button at the current time as it would be disappointing to people if they received the email, but someone pressed it while they were still asleep.

He did not get replies on the LessWrong comment before he took the site down. The first two comments on the Facebook post were as follows.

Commenter 1

I kind of feel like you should take it down for a day. No individual should be trusted with the ability to destroy the world. We are all driven by personal incentive mechanisms and other systems must be developed to ensure it’s not that easy for one person to bring so much chaos into the world.

Also... when’s the next time you’re going to get the opportunity to take down a massive site for a day legally?

Commenter 2

LessWrong is archived elsewhere so the damage by bringing it down (especially just for a day) is not that much. Equally, there isn't much benefit that you would get by bringing it down. As the website says, if you decide to enter your launch codes, it's not anonymous. So the question becomes one of social capital.

After the second comment, the following happened:

  • A LW account was created with the username "petrov_day_admin_account".
  • From 4:21 to 4:34 am (Pacific Time), this user opened 6 conversations, sending only 4 messages.
  • They sent one message each to Chris_Leong, Raemon, Habryka, and Liron.
  • They all contained identical body text with the name changed for each user (although Liron’s message began with “Hello Chris_Leong”).
  • Chris Leong's message was sent at 4:26 am.
  • A conversation was opened with adamzerner, and a second conversation was opened with Chris_Leong, but no messages were sent, as by that time it was no longer necessary.

Here were the next three comments on Chris's Facebook post. (By all accounts the relevant text below matches the text in the messages sent to the above users.)

Commenter 2

Well, I sent out this message to a lot of users, now the site has gone down, but it doesn't say who pressed the button so it could still be a coincidence:

Hello <NAME>,

You are part of a smaller group of 30 users who has been selected for the second part of this experiment. In order for the website not to go down, at least 5 of these selected users must enter their codes within 30 minutes of receiving this message, and at least 20 of these users must enter their codes within 6 hours of receiving the message. To keep the site up, please enter your codes as soon as possible. You will be asked to complete a short survey afterwards.

Chris Leong

Damn it, you got me! Well done

Commenter 2

No problem :D Maybe the lesson is that if you can destroy the world you shouldn't let people know? Not really sure, just thought it would be fun.

Chris Leong

Well played!

And that is the story of how LessWrong was taken down for 24 hours by Chris Leong, on Petrov Day.

What went wrong?

One of the most important facts about this year is how many users were given codes. Last year we had success with ~125 users receiving codes; this year the total was more than doubled, to ~275 users. This weakened the selection filter, which ultimately failed.

Well-intentioned does not mean secure

The first mistake was in how I chose which users to entrust with codes.

The main buckets I made were: high karma users whom I know and whose character I can judge (~140), high karma users I don't currently know very well (~50), and low karma users whom I know and whose character I can judge (~80).

I weighed my selection toward users using real names instead of pseudonyms, as reputation is a factor people track when deciding whether their actions are good or bad.

For high karma users I don't know very well, I spent about 5 hours reading through their comments and posts, asking myself questions like the following:

  • Does this user write comments in good-faith?
  • Does this user put in time and effort to explain their ideas in the comment sections?
  • Does this user write short, snarky comments?
  • Does this user try to help the site grow, such as by proposing and giving feedback on new features?
  • Does this user have a grudge against LessWrong?
  • Does this user seem to understand and partake in the culture and sense-of-humor of LessWrong?

Overall, I picked users on the basis I thought they would not choose to take the website down. I didn't consider for a moment that it would not be a matter of direct choice at all, but that someone would fall for a standard phishing attack. 

A classic mistake on my part. Indeed, it is the subject of the opening paragraph of Eliezer's classic paper Cognitive Biases Potentially Affecting Judgment of Global Risks:

All else being equal, not many people would prefer to destroy the world. Even faceless corporations, meddling governments, reckless scientists, and other agents of doom require a world in which to achieve their goals of profit, order, tenure, or other villainies. If our extinction proceeds slowly enough to allow a moment of horrified realization, the doers of the deed will likely be quite taken aback on realizing that they have actually destroyed the world. Therefore I suggest that if the Earth is destroyed, it will probably be by mistake.

So the first lesson is that if you want to entrust others with powers that can be easily abused or used destructively, it is not sufficient for them to be well-intentioned; it is also necessary for them to be hard enough to trick that the payoff isn't worth it for any external adversary.

Users saw different meanings to the ritual

The launch codes gave users the ability to take down the Frontpage for 24 hours, a page that a few thousand people visit every day. A single user being unable to use it for a day is a minor irritation; multiplied across thousands of users, the loss becomes non-trivial, which is what makes the Frontpage a communal resource.

However, while many users understood that they were being given responsibility for the commons, many did not see a clear connection to Petrov's situation, which differed in many ways from the LessWrong Big Red Button. Furthermore, Neel Nanda commented that the communications around it felt like “RPG flavor text”, which suggested it was not to be taken seriously. This confusion made the day less meaningful for a number of people; in particular, some felt confused about how it could possibly symbolize Petrov's choice and thus thought it was "just a game".

To respond to that briefly: when you are given responsibility for a communal resource, even if the person giving it to you says “It’s fine to destroy it – play along!” then you’re still supposed to think for yourself about whether they’re right. One of the core virtues of Petrov Day is how Petrov took responsibility for launching nuclear armageddon, and he didn’t “assume the people in charge knew best” or “just do what was expected of him”. So in some ways I feel like this challenge was unfair in the same way that reality is unfair, and it is a question of whether people noticed their responsibility to the commons without being told that they were supposed to take responsibility. On the other hand, in some ways this was a harder challenge than Petrov’s, because the stakes were measured in a much smaller communal resource and not in the lives of a billion people, and because Petrov had more lead time to think before the moment came when he had to make the decision.

Nonetheless, several users requested that the relation between the exercise and Petrov’s decision be made more explicit before next year, which seems quite reasonable to me. So in advance of next year, I’ll write up an explanation of why I think the Big Red Button is an accurate symbolic analogue of Petrov's situation.

I did not check who submitted launch codes last year

Note that this section includes private data about a user (Chris Leong). The user was asked for their explicit consent before it was included, and had they declined then this section would have been redacted. It is further the case that Chris Leong commented on a draft of this post before it was published.

Last year, Chris_Leong pressed the Big Red Button, guessed an incorrect launch code, and submitted it.

(He was not given real launch codes last year.)

I did not think to check the group of 7 accounts with non-zero karma who submitted random launch codes last year, and ensure that they were not in this year's batch. I will be doing so in future.

Chris was the only one in that batch who was given codes this year. To me, this is suggestive evidence that, of the people in this year's batch, Chris was the most likely to press the Big Red Button, and it updates me toward thinking that had he in particular not been in the batch, there quite likely would not have been a next person to press the button instead.

In a comment on a draft of this post, Chris noted that he doesn't recollect doing this last year, and also asked me what message was displayed next to the button last year. 

The answer is that there was no message next to the button last year, it looked like this:

The Big Red Button in 2019.

Indeed, last year I know of a user (not Chris Leong) who visited the site, clicked the red button, and entered and submitted codes, before finding out what the button did.

As a result of that user, this year we changed the red button to the following, so that mistake would not happen again.

The Big Red Button in 2020.

I nonetheless think that playing with the button is a clear sign that I shouldn't have included someone in this year's batch.

I should not have given launch codes to Chris

Chris is well-intentioned, but this did not mean he could be entrusted with this communal resource. A number of factors in retrospect show that I should not have given the codes to Chris.

  • He didn’t think he had a responsibility to the commons in this situation. He treated it like a game, even though he was given the ability to destroy a non-trivial communal resource.
  • He failed to model that he was not secure and that people would like to trick him on this matter, so posted that he had codes in a place where someone tricked him into using them.
  • He didn’t realize that a lot of users cared seriously about this tradition and this exercise, which could have led him to rethink the above two bullets.

By the way, if you'd like to read Chris's account of what happened, see his post On Destroying The World.

Looking Ahead

This is the 14th year of Petrov Day, and the 2nd year that we have observed Petrov Day with the Big Red Button ritual on LessWrong. On many continents there were a number of annual ceremonies celebrating Petrov Day, even in this socially distanced time, remembering how fragile the world is and our responsibility to not destroy it.

I learned a lot this year. I think many of us did.

(Thank you to everyone who participated in the lively discussion.)

I definitely updated upward on how many counterfactual worlds ended this way in the Cold War, where some third party to the conflict attempted to trick one of the players into instigating an attack, perhaps by posing as a relevant authority and saying it was a 'training exercise'.

I think this is one of the more innocuous ways that the site could've gone down (not through malice, but through lack of taking it seriously, light trolling, and a security failure).

I hadn't thought about red-teaming the site. We often learn the most important principles the hard way. While pentesting your friends for their private details and other low-consequence effects is great, I don't think much of people who unilaterally try to take down something more like a public utility for 24 hours. But it is certainly a valuable addition to the setup, and I'll look into setting up some red-teaming next year.

Last year I assigned a 40% probability to the site going down, and it didn't. This year I assigned a 20% probability to the site going down, and it did.

I think had this happened the first year we tried it, I would expect a common response to be that it was obvious that we would fail, and that we were far too credulous if we thought we could give codes to over one hundred rationalists and expect that none of them would take down the site.

Then, after last year's success, I received many messages saying that it was clear nobody would take it down this year. So in some ways this has been good for making the challenge level clear – this is a real coordination problem, that we can fail, and isn't overdetermined in either direction.

Onwards and upwards. This was a collective coordination problem for LessWrong, and we’re 1 for 2. We'll return next Petrov Day, for another round of sitting around and not pressing destructive buttons, alongside our ceremonies and celebrations.

So far we've had one Petrov Day on LessWrong without pressing the Big Red Button.

I hope there are many more to come.


Afterword

In spite of Chris taking down the site this week, I still appreciate Chris's many positive contributions to LessWrong and our associated communities, and for everyone else's sake I'll briefly list some of them here.

I'm grateful for all of the above contributions Chris has made; his contribution to LessWrong has been clearly net positive.

63 comments

Below the account by Chris, I listed multiple meta-games stacked on top of the Petrov button, namely

  1. press the button or not
  2. take it as a game | as a serious ritual | as a serious experiment
  3. cooperate or defect on the implicit rule allowing play behaviour ~ "you are allowed to play and experiment in games and this is safe. it is understood that actions you take within the game will not be used as evidence of intent outside of the game". (imagine I play a game of chess with someone and interpret my opponent taking my pieces as literally trying to harm me)
  4. the meta-game of making the game interesting; cf munchkin
  5. the meta-game of making the experiment valuable for learning
  6. coordination about which of these games we are playing
     

Overall for me one take-away is LW community should get better at game #6. 

While "use of LW for 24h" is at stake at game #1, I would argue there are actually higher stakes at some of the other games. 

For example, if most people take it as a serious ritual at game #2, the warning text attached to the button should maybe also state "apart from blowing up the site, you will also blow up some part of your social credit, opportunities and trust". Coordination failure at game #2 can also lead to a situation where someone understands the situation as a game, decides for some reason it is better to press the button, and faces social repercussions from people who chose "serious ritual" or "serious experiment" in #2. I can imagine this has a somewhat large tail-risk, including for example someone leaving the community entirely, or causing more drama and psychological pain than the payoff in game #1.

For some people failures at game #6 can touch things like "thou should not make people participate in serious and potentially harmful psychological experiments without clear consent". 

Ultimately game #5 is maybe the most important where this community learning wrong intuitions about xrisk on S1 level could be at stake. 

I appreciate how Ben handled this: it was nice for him to let me comment before he posted and for him to also add some words of appreciation at the end.

Regarding point 2, since I was viewing this in game mode I had no real reason to worry about being tricked. Avoiding being tricked by not posting about it would have been like avoiding losing in chess by never making the next move.

I guess other than that, I'd suggest that even a counterfactual donation of $100 to charity not occurring would feel more significant than the frontpage going down for a day. Like the current penalty feels like it was specifically chosen to be insignificant.

Also, I definitely would have taken it more seriously if I realised it was serious to people. This wasn't even in my zone of possibility.

I’d suggest that even a counterfactual donation of $100 to charity not occurring would feel more significant than the frontpage going down for a day.

This suggests an interesting idea: A charity drive for the week leading up to Petrov Day, on condition that the funds will be publicly wasted if anyone pushes the button (e.g. by sending bitcoin to a dead-end address, or donating to two opposing politicians' campaigns).

I'll just throw in my two cents here and say that I was somewhat surprised by how serious Ben's post is. I was around for the Petrov Day celebration last year, and I also thought of it as just a fun little game. I can't remember if I screwed around with the button or not (I can't even remember if there was a button for me).

Then again, I do take Ben's point: a person does have a responsibility to notice when something that's being treated like a game is actually serious and important. Not that I think 24 hours of LW being down is necessarily "serious and important".

Overall, though, I'm not throwing much of a reputation hit (if any at all) into my mental books for you.

Indeed, last year I know of a user (not Chris Leong) who visited the site, clicked the red button, and entered and submitted codes, before finding out what the button did.

As a result of that user, this year we changed the red button to the following, so that mistake would not happen again.

If I showed up cold (not as a person who'd actually been issued codes and not having advance knowledge of the event), and saw even the 2020 button with "Click here to destroy Less Wrong", it would never cross my mind that clicking it would actually have any effect on the site, regardless of what it says.

I'd assume it was some kind of joke or an opportunity to play some kind of game. My response would be to ignore it as a waste of time... but if I did click on it for some reason, and was asked for a code, I'd probably type some random garbage to see what would happen. Still with zero expectation of actually affecting the site.

Who would believe what it says? It's not even actually true; "destroy" doesn't mean "shut down for a day".

The Web is full of jokey "red buttons", and used to have more, so the obvious assumption is that any one you see is just another one of those.

The Web is full of jokey "red buttons"

This point is important.  LW is the type of website I would expect to do some quirky applet that gives an object lesson in iterated game theory, or something, and as a person still trying to learn LW's key concepts, I would be inclined to try to play.  Thus, the fact that Chris pressed the button last year and entered fake codes is neutral evidence to me.  It's what I would have done if I were playing what I thought was an instructive game.

By contrast, Petrov believed (I assume correctly or at least justifiably) that all of the "buttons" available to him were real buttons that would cause the Soviets to launch at least one nuke and probably hundreds.

I don't have a good solution for the problem of "Convince a relative newcomer to the site, solely via the button, that the button is real and will do what it says it does".  For me, as such a newcomer, this is because HPMOR and SSC (my entrees into the community) have such a screwball sense of humor.  

Thus, the more times you try to say, "For real, clicking this button will actually cause the front page to go down for 24 hours," or "Lots of people are going to be annoyed by this," or "Clicking this button will tend to show the community that its xrisk calculations are not unreasonably high," or even "Petrov would be ashamed of you (epistemic status: confident)," the more it looks like a game, because most of the Internet is so seamless that multiple caveats come off as satire.

Maybe you could add links to the postmortems for Petrov Day 2019 and 2020 in the button?

He treated it like a game, even though he was given the ability to destroy a non-trivial communal resource.

I want to resist this a bit. I actually got more value out of the "blown up" frontpage than I would have from a normal frontpage that day. A bit of "cool to see what the LW devs prepared for this case", a bit of "cool, something changed!" and some excitement about learning something.

Yeah, I'm not sure if the net effect of the blown-up frontpage is positive or negative for me, but it's definitely dominated by enjoyability/learning from posts about it, rather than the inability to see the frontpage for a day. (Very similar to how the value of a game is dominated by the value of the time spent thinking about it, while playing.) I didn't even try to access the frontpage on the relevant day, but I don't think it would have been much of an inconvenience, anyway; I could have just gone to the EA forum or my RSS feed for similar content (or to greaterwrong for the same content, if that was still up).

This doesn't mean that it's a good idea to blow up the frontpage because it's more fun, or whatever. I think it's probably better to not blow up the frontpage, but the case for this is based on meta-level things like ~trust and ~culture, and I think you do need to go to that level to make a convincing consequentialist case for not blowing up the frontpage. The stakes just aren't high enough that the direct consequences dominate. (And it's hard to raise the stakes until that's false, because that would mean we're risking more than we stand to gain.)

Unfortunately, this makes the situation pretty disanalogous to Petrov. Petrov defied the local culture (following orders) because he thought that reporting the alarm would have bad consequences. But in the lesswrong tradition, the direct consequences matter less than the effects on the local culture; and the reputational consequences point in the opposite direction, encouraging people to not press the button.

(Though from skimming the wikipedia article, it's unclear exactly how much Petrov's reputation suffered. It seems like he was initially praised, then reprimanded for not filing the correct paperwork. He's been quoted both as saying that he wasn't punished, and that he was made a scapegoat.)

FWIW, I thought the 'Doomsday phishing' attack was absolutely brilliant. Hey! Sometimes people will deceive you about which things will end the world! May we all stay on our toes.

Do we know who sent the phish, or have corroboration that it actually happened?  My favorite scenario (well, not really favorite, as I don't understand the payoffs for destruction.  Most theoretically-interesting, perhaps) is where chris_leong is playing another level game, hoping that benito will believe it's a misclick, and continue to cooperate (by including them next year) despite knowing that tit-for-tat is a generally good strategy.

I would love to hear from the phisher (using that account if they want it to be anonymous, including if it's chris_leong themselves) how they think of the exercise and what motivated them to do so.  And how they picked their victim(s) - I got a code, but not a phish, and I feel left out!

 I'd also be interested to hear from any additional targets how they reacted and whether they detected the phish or just followed their heart to not press the button.  

The guy left two comments on Chris's post, here and here.

Thanks.  Chris is certainly playing a deep game...

 

(Kidding! I have no reason to believe that it isn't exactly as claimed.)

Two different formulations of the problem that Chris faced:

  1. Chris got a message saying that he had to enter the codes, or else the frontpage would be destroyed. He believed it, and thought that he had to enter the codes to save the frontpage. Arguably, if he had destroyed the frontpage by inaction (ie., if the message had been real, and failing to enter the codes would've caused the destruction of the frontpage) he would have been far less chastised by the local culture than if he had destroyed the frontpage by action (ie., what actually happened). In this case, is it more in the spirit of Petrov to take the action that your local culture will blame you the least for, or the action that you honestly think will save the frontpage?
  2. Chris got a message that he had to enter the codes, or else bad things would happen, just like Petrov got a sign that the US had launched nukes, and that the russian military needed to be informed. The message wasn't real, and in fact, the decision with the least bad consequences was to ignore it. In this case, is it more in the spirit of Petrov to consider that a message might not be what it's claiming to be (and accurately determining that it's not) or to just believe it?

I don't know what the take-away is. Maybe we should celebrate Petrov's skepticism/perceptiveness more, and not just his willingness to not defer to superiors.

You make a good point that I haven't seen posted elsewhere:

Chris got a message that he had to enter the codes, or else bad things would happen, just like Petrov got a sign that the US had launched nukes, and that the russian military needed to be informed

The attacker targeting Chris made this year's Petrov Day experiment more accurate to history than the one in 2019. In 2019 there was no incentive to push the button. In 2020 there was an apparent, but false incentive to do so, just like there was for Petrov.

This was also mentioned in the comments of On Destroying the World

I'm happy to have seen the discussion of this, and I'm glad you're running this ritual (performance? experiment? game? Even with all this discussion, it still likely means different things to different people, and I believe that's OK).  I'm honored to have been given a code, even though it was over by the time I knew about it.

I look forward to next year's post-mortem, especially your selection of participants.  I genuinely can't predict whether you'll include Chris_Leong.  If you exclude them, that's some indication that you're selecting against people who think and write about the topic, making it a less impressive feat if the button goes unpressed.  If you include them, that's (probably) an increased risk of another failure.

I also look forward to seeing if I qualify.  The only reason I didn't press it this year is that I didn't give it any thought either way, and by the time I paid attention it was already over.  I suspect that next year I will be more involved.  Exactly what that means will depend a lot on the framing and discussion of the event.  

At the base, I still am uncertain exactly how to take it.  I remain absolutely convinced that human-factors root cause of the outage is "feature pushed to production which was designed to make the homepage unavailable, worked as intended."  The next level of analysis would be "why is it considered acceptable to write such code?", not "why would someone be so foolish as to enter a code which a site admin went to a lot of effort to send them?".  

Without a whole lot of discussion and agreement about the purpose and value of the exercise, how is it even in the realm of understandable that we would judge someone more harshly for entering the code than for devising the system and e-mailing codes to hundreds of people?  

Lol, I like this comment. I am gonna write that post, setting out a full explanation for the tradition...

Minor observation: increasing the number of users who received codes this year is analogous to the nuclear proliferation problem.

(I hadn't thought of that. I like it.)

One other difference between 'this thing we have' and 'that thing Petrov had' is that the day was entirely ordinary for him.

We live in our bubble where we have a High Stakes Day on which we are given a very specific choice. We expect it... we anticipate it; we can turn it into a game or even worse than a game. He had expected a normal shift at work (and beyond that we should no more assign thoughts to him than to a man who decided to buy a specific kind of sandwich. Just because we are more drawn to visualizing something beautiful doesn't make us any less wrong about it.)

I propose to deconstruct the tradition. Make it a random day, send codes to random people, and don't explain what will happen beyond "the site will go down for 24 hours".

I like this idea. It also adds the political element back into the equation. Rather than Petrov Day, we have Petrov Protocols.

You mentioned petrov_day_admin_account, but I got a message from a user called petrovday:

Hello John_Maxwell,

You are part of a smaller group of 30 users who has been selected for the second part of this experiment. In order for the website not to go down, at least 5 of these selected users must enter their codes within 30 minutes of receiving this message, and at least 20 of these users must enter their codes within 6 hours of receiving the message. To keep the site up, please enter your codes as soon as possible. You will be asked to complete a short survey afterwards.

I saw the message more than 6 hours after it was sent and didn't read it very carefully. The possibility of phishing didn't occur to me, and I assumed that this new smaller group thing would involve entering a different code into a different page. Anyway, it was a useful lesson in being more aware of phishing attacks.

I don't think much of people who unilaterally try to take down something more like a public utility for 24 hours.

If you're going to appeal to consequences, you are as much at fault for getting the site taken down as the attacker. It is reasonable for the attacker to assume that you would not do something you think is bad. He can conclude that you must not be assigning blame based on consequences, aka it's a game.

Would it also be reasonable for a user to expect that the administrator of a site would not expose it to being shut down by some random person, if the administrator did not see the matter as a game?

That's a reasonable assumption, but it's wrong in this case. Ben greatly values both LessWrong staying up and this serious experiment celebrating Petrov day. But the experiment can be serious only if he commits to shutting down the site when somebody enters the codes. Ben thought there was only a 20% chance of that happening. So the other reasonable conclusion is:

Value of Petrov Day Experiment > 0.2 * Value of LessWrong not going down for a day

And Ben acted accordingly.
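The inequality above can be checked with a quick expected-value sketch. (This is just an illustration of the commenter's reasoning; the numeric values are made-up placeholders, not anyone's actual estimates.)

```python
# Sketch of the decision logic implied by the comment above:
# running the ritual is worthwhile if its value exceeds the
# expected loss from the site going down.

def experiment_is_worth_it(value_of_experiment: float,
                           value_of_uptime: float,
                           p_button_pressed: float) -> bool:
    """Return True if the expected value of running the Petrov Day
    experiment exceeds the expected cost of a possible outage."""
    expected_loss = p_button_pressed * value_of_uptime
    return value_of_experiment > expected_loss

# With Ben's stated 20% probability, the experiment is worth running
# whenever its value exceeds 0.2 * (value of a day of uptime).
# The 50/100/10 figures below are arbitrary placeholders.
assert experiment_is_worth_it(value_of_experiment=50,
                              value_of_uptime=100,
                              p_button_pressed=0.2)
assert not experiment_is_worth_it(value_of_experiment=10,
                                  value_of_uptime=100,
                                  p_button_pressed=0.2)
```

Under this framing, "Ben acted accordingly" just means he revealed a preference consistent with the first assertion.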

Yes, that follows logically and was part of what I was trying to communicate.

It is reasonable for the attacker to assume that you would not do something you think is bad.

It is reasonable and it is wrong. The point of Petrov Day is that Petrov didn't do what was reasonable, he actually managed to take the right action. Perhaps it's not much criticism to say someone doesn't live up to Petrov's standards, but nonetheless, if you blindly trusted the admin team, you didn't live up to Petrov's standards.

I disagree with this. Yes, the admins put a valuable resource at risk, which in their estimation has a non-trivial negative impact if it is taken down. But this was done in the hopes of gaining something more important than what they risked: a meaningful signal of trust, not just in any particular person, but trust in an entire culture and community (i.e., the LessWrong community), trust that selected people from this community can be given the power to do harm, and have things not go poorly. This common knowledge is very valuable (especially when this community is actively dealing with an existential risk that is at least as dangerous as nuclear war was, if not more so), more than what the admins put at stake to hopefully earn this trust.

In contrast, a participant in Petrov Day who fails to live up to expectations, not only causes a harm which is intended (by the organizers) to be non-trivial, but also sacrifices our chance at building this trust, a chance we won't have again for 12 more months.

One reason I'm dissatisfied with the LW Petrov Day Button is that I consider it to equivocate between Petrov's role (where unilateral action against an imperfect command system is good and correct) and the role of nuclear actors on the country-sized abstraction level (where unilateral action on the part of any actor is bad).

I think the phishing that occurred this year created a good approximation of Petrov's role (especially since it constituted a command-and-control failure on the part of the LW team!) Therefore, I was interested in sketching a preliminary design for a scenario that is analogous to taking on the role of a nuclear-armed actor:

  1. Make it a non-trivial action for individuals to "make" a button -- in particular, have them pay LW karma
  2. Have the consequences accrue against some specific other user (potentially a very large decrease in karma or a tempban)
  3. Allow people to announce first-strike and second-strike plans in advance
  4. Let people "buy" a smaller likelihood that their own bombs go off accidentally (less karma = more likely / more karma = less likely)
  5. Let people "buy" more and bigger bombs
  6. Allow people a method to coordinate checks of other people's "safety" and "proliferation amounts"
  7. Only cause problems "for everyone" if enough bombs go off, at some threshold unknown to most people (to take into account uncertainty over how bad nuclear winter would be)
  8. Allow people to use, or add safety measures to, "bombs" they "purchased" during last year's event
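Points 1, 2, and 5 above can be sketched as a toy mechanism. Everything here is hypothetical: the class names, karma costs, and penalty sizes are all invented for illustration and reflect no actual LessWrong implementation:

```python
# A minimal, invented sketch of the karma-based bomb mechanics proposed above.
from dataclasses import dataclass

@dataclass
class Player:
    name: str
    karma: int
    bombs: int = 0

    def buy_bomb(self, cost: int = 50) -> bool:
        """Points 1 and 5: acquiring a bomb costs karma."""
        if self.karma < cost:
            return False  # can't afford a bomb
        self.karma -= cost
        self.bombs += 1
        return True

def launch(attacker: Player, target: Player, penalty: int = 100) -> bool:
    """Point 2: consequences accrue against a specific other user."""
    if attacker.bombs == 0:
        return False
    attacker.bombs -= 1
    target.karma -= penalty
    return True

# Hypothetical usage:
a = Player("alice", karma=120)
b = Player("bob", karma=200)
a.buy_bomb()        # alice pays 50 karma for a bomb
launch(a, b)        # bob loses 100 karma
print(a.karma, b.karma)  # 70 100
```

The other points (announced strike plans, accident probabilities, a hidden global threshold, carry-over between years) would layer on top of this core loop.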


There are pros and cons to this approach over the current approach. Some I thought of:

Pro: More closely resembles reality, meaning that the people who are treating the event as an experiment may be able to draw better conclusions.

Pro: An increase in the stakes. I expect people to care more about their own LW Karma going down, or personally not being able to use the site, than the front page going down for a day, which only trivially decreases usability.

Pro: Better incentive alignment. If participation in the game entails paying karma, and you care about your karma, the best move for a group to coordinate on would be not to play!

Pro: Resistance to trolls — if the karma threshold to "purchase" a bomb is sufficiently high, and if people whose accounts were created within a certain recent period are excluded from playing, the only people who would be able to play are community members.

Pro: People with more karma have more “ability” to participate, like the current system. This may be a good check for whether karma actually maps to cooperation traits that Ben/Ray/Oliver seem to care about.

Con: More complicated. If you’re interested in using the event to communicate a simple lesson about unilateral action being bad, this experiment will probably not serve that goal as well.

Con: No ability to opt out. To better approximate the real problems with nuclear proliferation, any account should have to be able to get “nuked.” The current system arguably also has this problem, but because the stakes are lower people are much less likely to get upset or have their experience with the website ruined.

Con: If point 8 is included, it makes the experiment less contained, which may make it harder to draw conclusions from the experiment or replicate it.

I wonder: if commenter 2 had been given a code themselves, would the site still have gone down? They likely would have been less incentivized to sabotage the experiment simply because it's no longer a challenge. I suspect they were primarily motivated by the excitement of successfully trolling/red-teaming LW. It's not fun when the admin gives you an easy way to do it, but if it's a challenge (you have to fool someone to take the site down), then that's a different story.

So perhaps the problem is not that the launch codes were given to too many people but rather to too few.

He said:

Beyond that, loyalty and trust are also very important to me. If the admins had trusted me with the launch codes, I wouldn’t have nuked the site (intentionally).

But, well. While it seems plausible to me that he's telling the truth... I also think the actions he took are evidence that he's the kind of person who would say that kind of thing whether or not it's true.

at one point I saw a paperclip game go by where you were an AI maximizing paperclips; I didn't play it, as I thought that was the goal, but it turned out it was just a normal game, not a symbolic one (although I still haven't played it)

This game is worth playing if you still haven't. I promise you it has a satisfying ending.

https://www.decisionproblem.com/paperclips/

Thanks!

But your comment is indistinguishable from social engineering trying to make me play a game where the winning condition is to not play :) It's not you that I don't trust, it's your reference class :)

Also, months ago I bet a friend 200 CAD : 250 CAD that ze would play a video game again before I would.

"It's not you that I don't trust, it's your reference class"

This is a great line, I'm stealing it. If we had a place for rationality quotes on this site, or rationalists out of context, I'd submit this.

I bypassed the front page "being down" before I realised it was supposed to be "down". Any "damage" seems to require a significant symbolic play-along. No harm, no foul from my point of view, to a large extent.

If you ever want to play for real stakes, make the button take down LW permanently. Have a script that erases all the code + data with no way to recover. Plus have all the developers pre-commit to erasing the code on their computers, etc...

I do think it would be really scary and make us feel grateful for what we have.

I don't think it'd be worth it, but I do think it's plausible that we want to try very big stakes like that in future years.

I think another lesson to learn from this is that it's important to ask questions well. If Chris had provided the relevant context for his question (the texts of both messages he received), we would have understood what he was talking about and maybe could have stopped him.

BTW I'm not angry at Chris or judging him or anything like that, just wanted to say that it's a generally helpful life skill.

So in some ways this has been good for making the challenge level clear – this is a real coordination problem, that we can fail, and isn't overdetermined in either direction.

I think every single part of this exercise, and all of the responses to it, have been extremely informative.  

  1. The priors of hundreds of thoughtful people have been updated substantially.  
  2. Others learned that they cared about something (either the site or the Petrov Day button) more than they thought they did, and, being thoughtful, learned something about themselves by thinking about why they cared so much.  
  3. At least one person (Chris) received an object lesson in security that I doubt ze will forget.  
  4. The rest of us had the opportunity to learn that lesson as well, although lessons not learned in blood are soon forgotten, and for a given individual, "losing the front page for one day" is far less blood than what it cost Chris.   
  5. I updated my priors about myself.  If you'd asked me in 2019, after the successful no-press, whether I would have pressed it with codes, I probably would have said no.  However, seeing the amount of thoughtfulness generated by what went down in 2020, if I get codes in 2021, I will give serious thought to pressing the button because doing so:
    1. will, no matter the reason, instill even further caution and security-paranoia in AI researchers, which is stressful for them but beneficial for humanity.  
    2. causes the community to learn something about itself which can't be predicted in advance but is almost certainly accurate.
    3. shows that if it's something I would consider, it's something certain types of AI would consider, too, if we're not careful.  AI risk researchers are aware of that problem, but pressing the button gives one the opportunity to make them feel it in their guts.

I wonder if there is instrumental value in pre-committing to being a Petrov Day ruiner unless certain types of conditions are met.  "I am coming to the San Francisco Petrov Day 2022 celebration and will press the button five minutes before the end of the ceremony unless I first come to understand how we can solve the problem of whether an autonomous vehicle should swerve, and thereby kill the 18-year-old pregnant valedictorian, or not swerve, and thereby kill the 84-year-old semi-retired Médecins Sans Frontières doctor who hand-carves toys for kids at the orphanage."  

However, seeing the amount of thoughtfulness generated by what went down in 2020, if I get codes in 2021, I will give serious thought to pressing the button because doing so will, no matter the reason, instill even further caution and security-paranoia in AI researchers, which is stressful for them but beneficial for humanity.

The reason that it induces stress to see the website go down is that it tells me something about the external world. Please do not hack my means of understanding the external world to induce mental states in me that you assume will be beneficial to humanity.

Very much this. Indeed, creating common knowledge that the people around me will be able to wield potentially destructive power without trying to leverage that power into taking resources away from me, or trying to force me to change my mind on something, is one of the things I want out of Petrov Day, since it's definitely a thing I am worried about. 

I understand and sympathize with the desire to know that people around you can hold that power without abusing it. I would also like to know that.  

But it's only ever a belief about the average behavior of the people in the community.  It should update when new information becomes available.  The button is a test of your belief.  Each decision made by each person to press or not to press the button is information that should feed your model of how probable it is that people can hold the power without abusing it.  A bunch of people staring at a red button with candles lit nearby can be an iterated Prisoner's Dilemma depending on the participants' utility functions.

If I decide not to Petrov-Ruin out of a desire to protect your belief that people can hold the power without abusing it, and I make that change because I care about you and your suffering as a fellow human and think your life will be much worse if my actions demolish that belief, then a successful Petrov Day is at risk of becoming another example of Goodhart's Law. 

I think, anyway. Sometimes my prose comes off as aggressive when I'm just trying to engage with thoughtful people.  I swear, on SlateStarCodex's review of Surfing Uncertainty, that I'm typing in good faith and could have my mind changed on these issues.

If I decide not to Petrov-Ruin out of a desire to protect your belief that people can hold the power without abusing it, and I make that change because I care about you and your suffering as a fellow human and think your life will be much worse if my actions demolish that belief, then a successful Petrov Day is at risk of becoming another example of Goodhart's Law.

TBC, I think you're supposed to not Petrov-ruin so as to not be destructive (or to leverage your destructive power to modify habryka to be more like you'd like them to be). My interpretation of habryka is that it would be nice if (a) it were actually true that this community could wield destructive power without being destructive etc and (b) everybody knew that. The problem with wielding destructive power is that it makes (a) false, not just that it makes (b) false.

It sounds like we draw the line differently on the continuum between persuasion and brain hacking.  I'd like to hear more about why you think some or all parts of this are hacking so I can calibrate my "I'm probably not a sociopath" prior.   

Or maybe we are diverging on what things one can legitimately claim are the purposes of Petrov Day.   If the purpose of an in-person celebration is "Provide a space for community members to get together, break bread, and contemplate, where the button raises the tension high enough that it feels like something is at stake but there's no serious risk that the button will be pressed," then I'm wrong, I'm being cruel, and some other forum is more appropriate for my Ruiner to raise zir concern.  But I don't get the sense that there's a community consensus on this.  

On the contrary, the fact that all attendees are supposed to leave in silence if the button is pressed suggests that the risk is meant to be real and to have consequences.

I'd like to hear more about why you think some or all parts of this are hacking so I can calibrate my "I'm probably not a sociopath" prior.

I don't think that you doing this would be "brain hacking". But your plan to press the button in order to make me be more cautious and paranoid works roughly like this: you would decide to press the button, so as to cause me to believe that the world is full of untrustworthy people, so that I make different decisions. Here's my attempt to break down my complaint about this:

  • You are trying to manipulate my beliefs to change my actions, without checking if this will make my beliefs more accurate.
  • Your manipulation of my beliefs will cause them to be inaccurate: I will probably believe that the world contains careless people, or people with a reckless disregard for the website staying up. But actually what's going on is the world contains people who want me to be paranoid.
  • To the extent that I do figure out what's going on and do have true beliefs, then you're just choosing whether I can have accurate beliefs in the world where things are bad, vs having accurate beliefs in the world where things are good. But it's better for things to be good than bad.
  • If I have wrong beliefs about the distribution of people's trustworthiness (or the ways in which people are untrustworthy), I will actually make worse decisions about which things to prioritize in AI security. You seem to believe the converse, but I doubt you have good reasons to think that.

On the contrary, the fact that all attendees are supposed to leave in silence if the button is pressed suggests that the risk is meant to be real and to have consequences.

Yes. Pressing the button makes life worse for your companions, which is the basic reason that you shouldn't do it.

I genuinely, sincerely appreciate that you took the time to make this all explicit, and I think you assumed a more-than-reasonable amount of good faith on my part given how lathered up I was and how hard it is to read tone on the Internet.

I think the space we are talking across is "without checking if this will make my beliefs more accurate."  Accuracy entails "what do I think is true" but also "how confident am I that it's true".  Persuasion entails that plus "will this persuasion strategy actually make their beliefs more accurate".  In hindsight, I should have communicated why I thought what I proposed would make people's beliefs about humanity more accurate.

However, the response to my comments made me less confident that the intervention would be effective at making those beliefs more accurate.  Plus, given the context, you had little reason to assume that my truth+confidence calculation was well-calibrated.    

There's also the question of whether the expected value of button-pressing exceeds the expected life-worsening, and how confident a potential button-presser is in their answer and the magnitude of the exceeding.  I do think that's a fair challenge to your final thought.

Thanks again.

Please don't threaten to actively destroy things I care about in order to get what you want from me.

I'm missing something fundamental here given that this is your view as organizer, so I'm going to go back to lurking for a while.

EtA: I think I changed my mind; see comment below.

I suggest a penalty of 1000 karma points to Chris.

I suggest for Chris to reply to this comment to provide us a karma sink.

meta: I predict this comment to be downvoted.

Normally, to be effective, incentives are announced beforehand.

That said, I'm curious to see the results of this: will your comment be downvoted more, or will this comment be? In fact, I'll even add a handicap: I just gave your comment a strong upvote. To be honest, I think most people are past the anger stage and more focused on how to improve it next year.

I disagree. Having a culture of retroactively rewarding and punishing people is an important precedent to set.

I'd say 58% my comment receives more downvotes :) But I stand by it -- until convinced otherwise with arguments, not points!

actually, maybe I already changed my mind. see my reply to Ben's comment.

I also appreciate that you played along, so part of me also wants to upvote your comment.

The expectation is that giving up one's principles when push comes to shove sets a really dangerous precedent. It would turn any talk in a stressful moment into unreliable "cheap talk".

Good job with the meta-prediction - it gave me enough pause to think, so I didn't actually downvote the suggestion.  I strongly disagree with it - Chris has provided hours of entertainment and a useful demonstration that this (whatever it is) isn't trivial.  I don't pay attention to others' karma generally, but it looks like they've gained fewer than 150 points for their explanation and comments, so I'll likely drop a few more upvotes on.

I think if we instill this norm, it might create an incentive

People keep asking for incentives, sometimes to incentivise people to do it, sometimes to incentivise people not to do it... I just don't think it's the point, incentive in either direction. We make a destructive button, and we find out whether people press it, and we try to all live up to the standard that Petrov set, of taking responsibility for the consequences even though nobody told him to.

hummmmm. ok maybe I agree. plus, maybe the more interesting function to optimize is who to select to hold a code rather than how to create proper incentives.

Yeah, that's the main point. To have a large group of people who live up to Petrov.

I would find it really inspiring for us to be able to select a large group of people that don't press the button "just because it's our symbolic tradition" :)