Daniel Kokotajlo

Philosophy PhD student, worked at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Not sure what I'll do next yet. Views are my own & do not represent those of my current or former employer(s). I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Some of my favorite memes:

(by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."
(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

(h/t Scott Alexander)

Sequences

Agency: What it is and why it matters

AI Timelines

Takeoff and Takeover in the Past and Future

Posts

Sorted by New

5Daniel Kokotajlo's Shortform

360

61Self-Awareness: Taxonomy and eval suite proposal

3mo

264AI Timelines

6mo

58Linkpost for Jan Leike on Self-Exfiltration

8mo

106Paper: On measuring situational awareness in LLMs

8mo

39AGI is easier than robotaxis

9mo

61Pulling the Rope Sideways: Empirical Test Results

10mo

39What money-pumps exist, if any, for deontologists?

55The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

41My version of Simulacra Levels

13Kallipolis, USA

Wiki Contributions

Comments

elifland's Shortform

Daniel Kokotajlo17h42

I feel like this should be a top-level post.

Teaching CS During Take-Off

Daniel Kokotajlo2d44

Gotcha. A tough situation to be in.

What about "Keep studying and learning in the hopes that (a) I'm totally wrong about AGI timelines and/or (b) government steps in and prevents AGI from being built for another decade or so?"

What about "Get organized, start advocating to make b happen?"

Teaching CS During Take-Off

Daniel Kokotajlo2d92

Yep. Or if we wanna nitpick and be precise, better than the best humans at X, for all cognitive tasks/skills/abilities/jobs/etc. X.
>50%.

Teaching CS During Take-Off

Daniel Kokotajlo2d2614

I think that's a great answer -- assuming that's what you believe.

For me, I don't believe point 3 on the AI timelines -- I think AGI will probably be here by 2029, and could indeed arrive this year. And even if it goes well and humans maintain control and we don't get concentration-of-power issues... the software development skills your students are learning will be obsolete, along with almost all skills.

Applying refusal-vector ablation to a Llama 3 70B agent

Daniel Kokotajlo5d64

In general, Llama 3 70B is a competent agent with appropriate scaffolding, and Llama 3 8B also has decent performance.

I'm curious about, and skeptical of, this claim. If you set it up in an Auto-GPT-esque scaffold with connections to the internet and ability to edit docs and make forum comments and emails and so forth, and set it loose with some long-term goal like "accumulate money" or "befriend people" or whatever... does it actually chug along for hours and hours moving vaguely in the right direction, or does it e.g. get stuck pretty quickly or go into some sort of confused doom spiral?

simeon_c's Shortform

Daniel Kokotajlo6d11416

The latter. Yeah idk whether the sacrifice was worth it but thanks for the support. Basically I wanted to retain my ability to criticize the company in the future. I'm not sure what I'd want to say yet though & I'm a bit scared of media attention.

simeon_c's Shortform

Daniel Kokotajlo7d2222

To clarify: I did sign something when I joined the company, so I'm still not completely free to speak (still under confidentiality obligations). But I didn't take on any additional obligations when I left.

Unclear how to value the equity I gave up, but it probably would have been about 85% of my family's net worth at least. But we are doing fine, please don't worry about us.

Uncontrollable Super-Powerful Explosives

Daniel Kokotajlo9d20

Fair enough.

Q&A on Proposed SB 1047

Daniel Kokotajlo9d21

What do you mean "of this type?" Why not just say "laws" full stop? What type are you referring to?

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

Daniel Kokotajlo10dΩ690

Do you have any sense of whether or not the models thought they were in a simulation?