Announcement: AI alignment prize winners and next round

We (Zvi Mowshowitz, Vladimir Slepnev and Paul Christiano) are happy to announce that the AI Alignment Prize is a success. From November 3 to December 31 we received over 40 entries representing an incredible amount of work and insight. That's much more than we dared to hope for, in both quantity and quality.

In this post we name six winners who will receive $15,000 in total, an increase from the originally planned $5,000.

We're also kicking off the next round of the prize, which will run from today until March 31, under the same rules as before.

The winners

First prize of $5,000 goes to Scott Garrabrant (MIRI) for his post Goodhart Taxonomy, an excellent write-up detailing the possible failures that can arise when optimizing for a proxy instead of the actual goal. Goodhart’s Law is simple to understand, impossible to forget once learned, and applies equally to AI alignment and everyday life. While Goodhart’s Law is widely known, breaking it down in this new way seems very valuable.

Five more participants receive $2,000 each:

  • Tobias Baumann (FRI) for his post Using Surrogate Goals to Deflect Threats. Adding failsafes to the AI's utility function is a promising idea and we're happy to see more detailed treatments of it.
  • Vadim Kosoy (MIRI) for his work on Delegative Reinforcement Learning (1, 2, 3). Proving performance bounds for agents that learn goals from each other is obviously important for AI alignment.
  • John Maxwell (unaffiliated) for his post Friendly AI through Ontology Autogeneration. We aren't fans of John's overall proposal, but the accompanying philosophical ideas are intriguing on their own.
  • Alex Mennen (unaffiliated) for his posts on legibility to other agents and learning goals of simple agents. The first is a neat way of thinking about some decision theory problems, and the second is a potentially good step for real world AI alignment.
  • Caspar Oesterheld (FRI) for his post and paper studying which decision theories would arise from environments like reinforcement learning or futarchy. Caspar's angle of attack is new and leads to interesting results.

We'll be contacting each winner by email to arrange transfer of money.

We would also like to thank everyone who participated. Even if you didn't get one of the prizes today, please don't let that discourage you!

The next round

We are now announcing the next round of the AI alignment prize.

As before, we're looking for technical, philosophical and strategic ideas for AI alignment, posted publicly between now and March 31, 2018. You can submit your entries in the comments here or by email. We may give feedback on early entries to allow improvement, though our ability to do this may become limited by the volume of entries.

The minimum prize pool this time will be $10,000, with a minimum first prize of $5,000. If the entries once again surpass our expectations, we will again increase that pool.

Thank you!

(Addendum: I've written a post summarizing the typical feedback we've sent to participants in the previous round.)

36 comments

Wow, thanks a lot guys!

I'm probably not the only one who feels this way, so I'll just make a quick PSA: For me, at least, getting comments that engage with what I write and offer a different, interesting perspective can almost be more rewarding than money. So I definitely encourage people to leave comments on entries they read--both as a way to reinforce people for writing entries, and also for the obvious reason of making intellectual progress :)

I definitely wish I had commented more on them in general, and ran into a thing where a) the length and b) the seriousness of it made me feel like I had to dedicate a solid chunk of time to sit down and read, and then come up with commentary worth making (as opposed to just perusing it on my lunch break).

I'm not sure if there's a way around that (posting things in smaller chunks in venues where it's easy for people to comment might help, but my guess is that's not the whole solution).

Just sent you some more feedback. Though you should also get comments from others, because I'm not the smartest person in the room by far :-)

This is freaking awesome - thank you so much for doing both this one and the new one.

Added: I think this is a really valuable contribution to the intellectual community - successfully incentivising research, and putting in the work on your end (assessing all the contributions and giving the money) to make sure solid ideas are rewarded - so I've curated this post.

Added2: And of course, congratulations to all the winners, I will try to read all of your submissions :-)

Thanks to you and Oliver for spreading the news about this!


This may sound very silly, but it had not occurred to me that blog posts might count as legitimate entries to this, and if I had realized that I might have tried to submit something. Writing this mostly in case it applies to others too.

It's sort of weird how "blog post" and "paper" feel like such different categories, especially when, AFAICT, papers tend to be, on average, just less convenient, more poorly written blog posts.

The funny thing is that if you look at some old papers, they read a lot more like blog posts than modern papers. One of my favorite examples is the paper where Alan Turing introduced what's now known as the Turing test, and whose opening paragraph feels pretty playful:

I propose to consider the question, "Can machines think?" This should begin with definitions of the meaning of the terms "machine" and "think." The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words "machine" and "think" are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, "Can machines think?" is to be sought in a statistical survey such as a Gallup poll. But this is absurd.

Blog posts don't have the shining light of Ra around them, of course.

If your blog posts would benefit from being lit with a Ra-coloured tint, we'd be more than happy to build this feature for you.

<adds to list of LessWrong April Fools Day ideas>

The distinction between papers and blog posts is getting weaker these days - e.g. is an ML blog with the shining light of Ra that's intended to be well-written and accessible.

The shining light of Ra may be doing useful work if the paper is peer-reviewed. Especially if it made it through the peer review process of a selective journal.

Fair, but a world where we can figure out how to bestow the shining light of Ra on selectively peer reviewed, clearly written blogposts seems even better. :P

Qiaochu, I'd love to see an entry from you in the current round.

Cool, this looks better than I'd been expecting. Thanks for doing this! Looking forward to next round.

Thank you Luke! I probably should've asked before, but if you have any ideas how to make this better organizationally, please let me know.

Datum: The existence of this prize has spurred me to put some actual effort into AI alignment, for reasons I don't fully understand--I'm confident it's not about the money, and even the offer of feedback isn't that strong an incentive, since I think anything worthwhile I posted on LW would get feedback anyway.

My guess is that it sends the message that the Serious Real Researchers actually want input from random amateur LW readers like me.

Also, the first announcement of the prize rules went in one ear and out the other for me. Reading this announcement of the winners is what made it click for me that this is something I should actually do. Possibly because I had previously argued on LW with one of the winners in a way that made my brain file them as my equal (admittedly, the topic of that was kinda bike-sheddy, but system 1 gonna system 1).

Awesome! I hadn't seen Caspar's idea, and I think it's a neat point on its own that could also lead in some new directions.

Edit: Also, I'm curious if I had any role in Alex's idea about learning the goals of a game-playing agent. I think I was talking about inferring the rules of checkers as a toy value-learning problem about a year and a half ago. It's just interesting to me to imagine what circuitous route the information could have taken, in the case that it's not independent invention.

I don't think that was where my idea came from. I remember thinking of it during AI Summer Fellows 2017, and fleshing it out a bit later. And IIRC, I thought about learning concepts that an agent has been trained to recognize before I thought of learning rules of a game an agent plays.

I'm curious if these papers / blog posts would have been written at some point anyway, or if they happened because of the call to action? And to what extent was the prize money a motivator?

Congratulations to the winners! Everyone, winners or not, submitted some great works that inspire a lot of thought.

Would it be possible for all of us submitters to get feedback on our entries that did not win so that we can improve entries for the next round?

I'll mention that one of the best ways for people to learn what sorts of submissions can get accepted, is to read the winning submissions in detail :-)

Hi Berick! I didn't send you feedback because your entry arrived pretty close to the deadline. But since you ask, I just sent you an email now.

Thanks! I didn't expect feedback prior to the first round closing since, as you said, my submission was (scarily) close to the deadline.

I have a paper whose preprint was uploaded in December 2017 but which is expected to be officially published in the beginning of 2018. Is it possible to submit the text to this round of the competition? The text in question is:

"Military AI as a Convergent Goal of Self-Improving AI"

Alexey Turchin & David Denkenberger

In Artificial Intelligence Safety and Security.

Louisville: CRC Press (2018)

Okay, this project overshot my expectations. Congratulations to the winners!

Can we get a list of links to all the submitted entries from the first round?

This is a bit contentious but I don't think we should be the kind of prize that promises to publish all entries. A large part of the prize's value comes from restricting our signal boost to only winners.

You're right, keeping it the cream of the crop is a better idea.

I wouldn't mind feedback as well if possible. Mainly because I only dabble in AGI theory and not AI, so I'm curious to see the difference in thoughts/opinions/fields, or however you wish to put it. Thanks in advance, and thanks to the contest hosts/judges. I learned a lot more about the (human) critique process than I did before.