Thoughts on AI Safety Camp

Charlie Steiner


13th May 2022


Early this year I interviewed a sample of AISC participants and mentors and spent some time thinking about the problems the AI safety research community is facing. In the process, I changed my mind about some things.

AI Safety Camp is a program that brings together applicants into teams, and over about a hundred hours of work those teams do AI safety-related projects that they present at the end (one project made it into a Rob Miles video). I think it's really cool, but what exactly it's good for depends on a lot of nitty-gritty details that I'll get into later.

Who am I to do any judging? I'm an independent alignment researcher, past LW meetup organizer, physics PhD, and amateur appliance repairman. What I'm not is a big expert on how people get into alignment research - this post is a record of me becoming marginally more expert.

The fundamental problem is how to build an ecosystem of infrastructure that takes in money and people and outputs useful AI safety research. Someone who doesn't know much about AISC (like my past self) might conceive of many different jobs it could be doing within this ecosystem:

Educating relative newcomers to the field and getting them more interested in doing research on AI alignment.
Providing project opportunities that are a lot like university class projects - contributing to the education of people in the process of skilling up to do alignment research.
Providing potentially-skilled and potentially-interested people a way to "test their fit" to see if they want to commit to doing more AI alignment work.
Catalyzing the formation of groups and connections that will persist after the end of the camp.
Helping skilled and interested people send an honest signal of their alignment research skills to future employers and collaborators.
Producing object-level useful research outputs.

In addition to this breakdown, there are orthogonal dimensions of what parts of AI safety research you might specialize to support:

Conceptual or philosophical work.
Machine learning projects.
Mathematical foundations.
Policy development.
Meta-level community-building.

Different camp parameters (length, filters on attendees, etc.) are better-suited for different sorts of projects. This is why AISC does a lot of machine learning projects, and why there's a niche for AISC alum Adam Shimi to start a slightly different thing focused on conceptual work (Refine).

III

Before talking to people, I'd thought AISC was 35% about signalling to help break into the field, 25% about object-level work, and 15% about learning, plus leftovers. Now I think it's actually 35% about testing fit, 30% about signalling, and 15% about object-level work, plus different leftovers.

It's not that people didn't pick projects they were excited about; they did. But everyone I asked acknowledged that the camp wasn't that long, that they weren't maximally ambitious anyhow, and that they just wanted to produce something they were proud of. What was valuable to them was often what they learned about themselves, rather than about AI.

Or maybe that's too pat, and the "testing fit" thing is more about "testing the waters to make it easier to jump in." I stand by the signalling thing, though. I think we just need more organizations trying to snap up the hot talent that AISC uncovers.

Looking back at my list of potential jobs for AISC (e.g. education, testing fit, catalyzing groups, signalling) I ordered them roughly by the assumed skill level of the participants. I initially thought AISC was doing things catered to all sorts of participants (both educating newcomers and helping skilled researchers signal their abilities, etc.), while my revised impression is that they focus on people who are quite skilled and buy into the arguments for why this is important, but don't have much research experience (early grad school vibes). In addition to the new program Refine, another thing to compare to might be MLSS, which is clearly aimed at relative beginners.

When I talked to AISC participants, I was consistently impressed by them - they were knowledgeable about AI safety and had good ML chops (or other interesting skills). AISC doesn't need to be in the business of educating newbies, because it's full of people who've already spent a year or three considering AI alignment and want to try something more serious.

The size of this demographic is actually surprisingly large - sadly the organizers who might have a better idea didn't talk to me, but just using the number applying to AISC as the basis for a Fermi estimate (by guessing that only 10-20% of people who want to try AI alignment research had the free time and motivation to apply) gets you to >2000 people. This isn't really a fixed group of people, either - new people enter by getting interested in AI safety and learning about AI, and leave when they no longer get much benefit from the fit-testing or signalling in AISC. I would guess this population leaves room for ~1 exact copy of AISC (on an offset schedule), or ~4 more programs that slightly tweak who they're appealing to.
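The arithmetic behind that Fermi estimate can be sketched as follows. This is only an illustration: the post does not state the actual number of AISC applicants, so `HYPOTHETICAL_APPLICANTS` is a made-up placeholder, and `fermi_population` is a helper name invented for this sketch. The only inputs taken from the text are the guessed 10-20% application rate and the conclusion that the pool exceeds roughly 2000 people.

```python
def fermi_population(applicants: int, apply_fraction: float) -> float:
    """Estimate the size of the underlying pool, given that only
    `apply_fraction` of that pool actually applied."""
    if not 0 < apply_fraction <= 1:
        raise ValueError("apply_fraction must be in (0, 1]")
    return applicants / apply_fraction


# ASSUMPTION: a placeholder applicant count chosen for illustration only;
# the real number of AISC applicants is not given in the post.
HYPOTHETICAL_APPLICANTS = 400

# The post guesses that 10-20% of interested people applied.
low_estimate = fermi_population(HYPOTHETICAL_APPLICANTS, 0.20)
high_estimate = fermi_population(HYPOTHETICAL_APPLICANTS, 0.10)

print(f"Estimated pool: roughly {low_estimate:.0f} to {high_estimate:.0f} people")
```

With any applicant count in the low hundreds, dividing by a 10-20% application rate lands in the low thousands, which is the order of magnitude the estimate above relies on.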

Most participants cut their teeth on AI alignment through independent study and local LW/EA meetup groups. People are trying various things (see MLSS above) to increase the amount of tooth-cutting going on, and eventually the end game might be to have AI safety just be "in the water supply," so that people get exposed to it in the normal course of education and research, or can take a university elective on it to catch up most of the way to the AISC participants.

The people I talked to were quite positively disposed to AISC. At the core, people were glad to be working on projects that excited them, and liked working in groups and with a bit of extra support/motivational structure.

Some people attended AISC and decided that alignment research wasn't for them, which is a success in its own way. On average, I think attending made AI alignment research feel "more real," and increased people's conviction that they could contribute to it. Several people I talked to came away with ideas only tangentially related to their project that they were excited to work on - but of course it's hard to separate this from the fact that AISC participants are already selected for being on a trajectory of increasing involvement in AI safety.

In contrast, the mentorship aspect was surprisingly (to me) low-value to people. Unless the mentor really put in the hours (which most understandably did not), decisions about each project were left in the hands of the attendees, and the mentor was more like an occasional shoulder angel plus useful proofreader of their final report. Not pointless, but not crucial. This made more sense as I came to see AISC as not being in the business of supplying education from outside.

Note that in the most recent iteration that I haven't interviewed anyone from, the format of the camp has changed - projects now come from the mentors rather than the groups. I suspect this is intended to solve a problem where some people just didn't pick good projects and ran into trouble. But it's not entirely obvious whether the (probable) improvement of topics dominates the effects on mentor and group engagement etc., so if you want to chat about this in the comments or with me via video call, I have more questions I'd be interested to ask.

Another thing that people didn't care about that I'd thought they would was remote vs. in-person interaction. In fact, people tended to think they'd prefer the remote version (albeit not having tried both). Given the lower costs and easier logistics, this is a really strong point in favor of doing group projects remotely. It's possible this is peculiar to machine learning projects, and [insert other type of project here] would really benefit from face to face interaction. But realistically, it looks like all types should experiment with collaborating over Discord and Google Docs.

What are the parameters of AISC that make it good at some things and not others?

Here's a list of some possible topics to get the juices flowing:

Length and length variability.
Filtering applicants.
Non-project educational content.
Level of mentor involvement.
Expectations and evaluation.
Financial support.
Group size and formation conditions.
Setting and available tools.

Some points I think are particularly interesting:

Length and length variability: Naturally shorter time mandates easier projects, but you can have easy projects across a wide variety of sub-fields. However, a fixed length (if somewhat short) also mandates lower-variance projects, which discourages the inherent flailing around of conceptual work and is better suited to projects that look more like engineering.

Level of mentor involvement: Giving participants more supervision might reduce length variability pressure and increase the object-level output, but reduce the signalling power of doing a good job (particularly for conceptual work). On the other hand, participating in AISC at all seems like it would still be a decent sign of having interesting ideas. The more interesting arguments against increasing supervision are that it might not reduce length variability pressure by much (mentors might have ideas that are both variable between-ideas and that require an uncertain amount of time to accomplish, similar to the participants), and might not increase the total object-level output, relative to the mentor and participants working on different topics on the margin.

Evaluation: Should AISC be grading people or giving out limited awards to individuals? I think that one of its key jobs is certainly giving honest private or semi-private feedback to the participants. But should it also be helping academic institutions or employers discriminate between participants to increase its signalling power? I suspect that with current parameters there's enough variation in project quality to serve as a signal already if necessary, and trying to give public grades on other things would be shouldering a lot of trouble with perverse incentives and hurt feelings for little gain.

You can get lots of variations on AISC's theme by tweaking the parameters, including variations that fill very different niches in the AI safety ecosystem. For example, you could get the ML for Alignment Bootcamp with different settings of applicant filtering, educational content, group size, and available tools.

On the other hand, there are even more different programs that would have nontrivial values of "invisible parameters" that I never would have thought to put on the list of properties of AISC (similar to how "group size" might be an invisible parameter for MLAB). These parameters are merely an approximate local coordinate system for a small region of infrastructure-space.

What niches do I think especially need filling? For starters, things that fit into a standard academic context. We need undergrad- and graduate-level courses developed that bite off various chunks of the problems of AI alignment. AISC and its neighbors might tie into this by helping with the development of project-based courses - what project topics support a higher amount of educational content / teacher involvement, while still being interesting to do?

We also need to scale up the later links in the chain, focused on the production of object-level research. Acknowledging that this is still only searching over a small part of the space, we can ask what tweaks to the AISC formula would result in something more optimized for research output. And I think the answer is that you can basically draw a continuum between AISC and a research lab in terms of things like financial support, filtering applicants, project length, etc. Some of these variables are "softer" than others - it's a lot easier to match MIRI on project length than it is to match them on applicant filtering.

VII

Should you do AISC? Seems like a reasonable thing for me to give an opinion about, so I'll try to dredge one up.

You should plausibly do it IF:

(

You have skills that would let you pull your weight in an ML project.

You've looked at the AISC website's list of topics and see something you'd like to do.

)

AND

You know at least a bit about the alignment problem - at the very least you are aware that many obvious ways to try to get what we want from AI do not actually work.

AND

(

You potentially want to do alignment research, and want to test the waters.

You think working on AI alignment with a group would be super fun and want to do it for its own sake.

You want to do alignment research with high probability but don't have a signal of your skillz you can show other people.

)

This is actually a sneakily broad recommendation, and I think that's exactly right. It's the people on the margins, those who aren't sure of themselves, the people who could only be caught by a broad net, who benefit most from something like this. So if that's you, think about it.


8 comments

Karl von Wendt (2y)

As a participant, I probably don't fit the "typical" AISC profile: I'm a writer, not a researcher (even though I've got a Ph.D. in symbolic AI), and I'm at the end of my career, not the beginning (I'm 61). That I'm part of AISC is due to the fact that this time, there was a "non-serious" topic included in the camp's agenda: Designing an alignment tabletop role-playing game (based on an idea by Daniel Kokotajlo). Is this a good thing?

For me, it certainly was. I came to AISC mostly to learn and get connections into the AI alignment community, and this worked very well. I feel like I know a lot less about alignment than I thought I knew at the start of the camp, which is a sure sign that I learned a lot. And I made a lot of great and inspiring contacts, even friendships, some of which I think will stay long after the camp is over. So I'm extremely happy and grateful that I had the opportunity to participate.

But what use am I to AI alignment? Well, together with another participant, Jan Kirchner, I did try to contribute an idea, but I'm not sure how helpful that is. However, one thing I can do: As a writer, I can try to raise awareness for the problem. That is the reason I participated in the first place. I see a huuuuuge gap between the importance and urgency of AI alignment and the attention it gets outside the community, among people who probably could do something about it, e.g. politicians and "established" scientists. For example, in Germany, we have the "Institut für Technikfolgenabschätzung" (ITAS) which claims on its website to be the leading institute for technology assessment. I asked them whether they are working on AI alignment. Apparently, they aren't even aware that there IS a problem. The same seems to be true for the scientific establishment in the rest of Germany and the EU.

You may question how helpful it is to get people like them to work on alignment. But I think that if we hope to solve the problem in time, we need as much attention on it as possible. There are some smart people at ITAS and elsewhere, and it would be great to get them to work on the problem, even if it seems a bit late. Maybe we need just one brilliant idea, and the more people are searching for it, the more likely it is to be found, I think. It could also be that there is no solution, in which case it is even more important that as many people as possible agree on that, the more established and accepted, the better. If we need regulation, or try to implement a global ban or freeze on AGI research, we need as much support as possible.

So that's what I'm trying to do, with my limited outreach outside of the AI alignment community. My participation in AISC taught me many things and helped me get my message straight. A lot of it will probably find its way into my next novel. And maybe our tabletop RPG will also help spread the message. All in all, I think it was a good idea to broaden the scope of AISC a bit, and I recommend doing it again. Thank you very much, Remmelt, Daniel, and all the others for taking me in!

Chris_Leong (2y)

I think it's great that you're thinking about how you can use your writing skills to further alignment. If you're thinking about contacting politicians or people who are famous, I'd suggest reaching out to CEA's community health team first for advice on how to ensure this goes well.

Karl von Wendt (2y)

Thank you, I will!

Remmelt (2y)

I am the program coordinator of AI Safety Camp. Let me respond with personal impressions / thoughts:

Apologies, Charlie, that we did not get to call before you wrote this post. Busy months for me, and I had misinterpreted your request as you broadly reaching out to interview organisers of various programs.

First, respect for the thoroughness and consideration of your writing:

It is useful to get an outside perspective of how AI Safety Camp works for participants.
- In this sense, I am glad that we as organisers did not get to talk with you yet, which might have 'anchored' this post more on our notions of what the camp is about.
  - Hoping that you and I can still schedule a reverse interview, where I can listen to and learn from your ideas!
- Noting that we also welcome honest criticism of AI Safety Camp that could help us rethink or improve the format and/or the way we coordinate editions.
  - I would personally value if someone could do background research at least half as well as Charlie and play devil's advocate: come up with arguments against AISC's current design or 'set parameters' being any good for helping certain (potential) participants to contribute to AI existential safety.
  - Write a quality post and it will get my strong upvote at least!
Glad to have your ideas on parameters to tweak and what to consider focussing on doing well so we can serve new participants better (to come to contribute at the frontiers of preventing the existential risk posed by AI developments).
- For example, you made me think that maybe the virtual edition could be adapted to cater for remote ML engineering teams in particular.
- Where conceptual research in a group setting may just tend to work better through spontaneous chats and flip-chart scribbles at a physical retreat.
I find myself broadly agreeing with most of your descriptions of whom the camp is for and how we serve our participants.

On ways the camp serves participants looking to contribute to AI x-safety research:

> 35% about testing fit, 30% about signalling, and 15% about object-level work, plus different leftovers.

The relative weighting above matches my impressions, at least for past editions (AISC 1-5).
- Having said that, making connections with other aspiring researchers (fellow participants, organisers, speakers, research authors) mattered a lot for some alumni's career trajectories.
  - I am not sure how to even introduce a separate weight for 'networking' given the overlap with 'signalling' and 'testing fit' and leftovers like 'heard about a grant option; started an org'.
  - BTW your descriptions related to 'testing fit' resonated with me!
    > What was valuable to them was often what they learned about themselves, rather than about AI....
    > Some people attended AISC and decided that alignment research wasn't for them, which is a success in its own way. On average, I think attending made AI alignment research feel "more real," and increased people's conviction that they could contribute to it. Several people I talked to came away with ideas only tangentially related to their project that they were excited to work on - but of course it's hard to separate this from the fact that AISC participants are already selected for being on a trajectory of increasing involvement in AI safety.
- Also, in the 'leftovers' bucket, there is a lot of potential for tail events – where people's experiences at the camp either strongly benefit or strongly harm their future collaborations on research for preventing technology-induced existential risks.
  For example:
  - Benefits: Research into historically overlooked angles (e.g. alignment impossibility arguments, human biology-based alignment) sparks new insights and reflections that shift the paradigm within which next generation AI x-safety researchers conduct their research.
  - Harms: We serve alcohol at a fun end-of-camp party, fail at monitoring and checking in with participants, and then someone really ends up ignoring another person's needs and/or crosses their personal boundaries.
- Finally, I would make an ends-means distinction here:
  - I would agree that in past formats, the value of object-level work during the program seemed proportionally smaller than the value of testing fit and networking.
  - At the same time, I and other past organisers believe that individual participants actually trying to do well-considered rigorous research together helps a lot with them making non-superficial non-fluky traction toward actually contributing at the frontiers of the community's research efforts.
For future physical editions (in Europe, US, and Asia-Pacific):
- I would guess that...
  - signalling (however we define this and its benefits/downsides) is held about constant.
  - object-level work (incl. deconfusing meta-level questions) and testing fit (incl. for working on a new research problem, if already decided on a research career) swap weights.
- I.e. 35% about object-level work, 30% about signalling, 15% about testing fit, plus different leftovers (leaving aside 'networking' and 'camp tail events').
Note that edition formats have been changing over time (as you mentioned yourself):
- The first camp was a rather grassroots format where participants already quite knowledgeable / connected / experienced in AI safety research could submit their proposals and gather into teams around a chosen research topic.
- At later editions, we admitted participants who had spent less time considering what research problems to work on, and we managed to connect at most a few teams per camp with a fitting mentor (mostly, we provided a list of willing mentors a team could reach out to after the team already had decided on a research topic).
- At the sixth edition and current edition, we finally rearranged and refined the format to serve our mentors better. The current virtual edition involved people applying to collaborate on an open research problem picked by one of our mentors (progress of the teams so far is mixed, but based on recent feedback, about 3 mentors were a little negative, and 3 others were somewhat to very positive about having mentored a team).
- The next physical edition in Europe will be about creating ~6 research spaces for some individual past participants and reviewers – who are somewhat experienced at facilitating research – to invite over independent researchers to dig together into an arguably underexplored area of research (on the continuum you mentioned, AISC7 is nearer to the end of a research lab).

On your points re: parameters of AISC that make it good at some things and not others:

> Length and length variability: Naturally shorter time mandates easier projects, but you can have easy projects across a wide variety of sub-fields. However, a fixed length (if somewhat short) also mandates lower-variance projects, which discourages the inherent flailing around of conceptual work and is better suited to projects that look more like engineering.

Current camp durations:
- For the yearly virtual edition (early Jan to early June), the period is roughly 5 months – from initial onboarding and discussions to research presentations and planning post-camp steps.
- For the upcoming physical edition, the period is roughly 2 months (Sep to Oct).
Some/all future editions of AISC may specialise in enabling research in less-explored areas and based on less-explored paradigms (meaning higher variance in the value of the projects' outputs).
- In which case, the length and/or intensity of research (in terms of hours per week, one-on-one interactions) at editions will go up.

> Level of mentor involvement: ...The more interesting arguments against increasing supervision are that it might not reduce length variability pressure by much (mentors might have ideas that are both variable between-ideas and that require an uncertain amount of time to accomplish, similar to the participants),

This seems more likely than not to me (holding constant how 'exploratory' the area of research is).
- In part because half or more of the teams at the current virtual edition ended up exploring angles that were different than mentors had planned for.

> ... and might not increase the total object-level output, relative to the mentor and participants working on different topics on the margin.

This is definitely the case in the short run (i.e. over the 5-month period of the virtual edition).
I feel very unsure when it comes to the long run (i.e. 'total object-level output' over the decades following the participants' collaboration with a mentor at an edition).
- Overall, I guess the current average degree of mentor involvement is >10% more likely to increase 'total output' than not.
  - Where the reference of comparison is quick mentor feedback on the team's initial proposal and any later draft on their research findings.
  - Where average output on the upside ('increase in total') is higher than it is lower on the downside ('decrease in total') when working from established alignment research paradigms.
    - Along with a decrease in the likelihood that team outputs lead to any important paradigmatic changes in AI x-safety research.
- Also need to account for the fact that a mentor will occasionally meet a capable independent-minded researcher at the camp and afterwards continue to collaborate with them individually (this seems probably the case for ~2 participants at the current virtual edition).

> Evaluation: Should AISC be grading people or giving out limited awards to individuals?

In the early days of AISC, we discussed whether to try to evaluate participant performance during the camp so we could recommend and refer alumni to senior researchers in the community.
- We decided against internal evaluations because that could break apart the collaborative culture at the camp.
- Basically it could leave people feeling uncomfortable about 'getting watched', and encourage some individuals to compete with each other to display 'their work' (also who am I kidding: for organisers to manage evaluation of performance on this variety of open problems?).

Nuances on a few of your considerations:

> AISC doesn't need to be in the business of educating newbies, because it's full of people who've already spent a year or three considering AI alignment and want to try something more serious.

There are interesting trade-offs between 'get newcomers up to speed' vs. 'foster cognitive diversity':
- There are indeed more aspiring contributors now who spent multiple years considering work in AI existential safety. Also, there are now other programs specialising in bringing students and others up to speed with AI concepts and considerations, like AGI Safety Fundamentals and MLSS (and to a lesser extent, the top-university-named 'existential risk initiative' programs).
  - So agreed there that AISC does not have a comparative advantage in educating newcomers there, and also that this 'part of the pipeline' is no longer a key bottleneck.
  - We never have been in the business of educating people (despite mention of 'teach' on our website, which I've been thinking of rewriting). Rather, people self-study, apply and do work on their own initiative.
  - In this sense, AISC can offer people who self-studied or e.g. completed AGISF a way to get their hands dirty on a project at the yearly virtual camp (and from there go on to contribute to a next AISC edition, say, or apply for SERI MATS).
- On the other hand, my sense is that most of the people in the crowd you mentioned share roughly similar backgrounds (in terms of STEM disciplines and in being WEIRD: Western, Educated, from Industrialised & Democratic nations).
  - Many of the aspiring AI x-safety researchers to me appear to broadly share similar inclinations in how they analyse and perceive problems in the world around them (with corresponding blindspots – I tried to summarise some here).
  - The relatively homogenous reasoning culture of our community is concerning in the sense that where the AI x-safety community collectively shares the same blindspots (reflected in forum 'Schelling points' of topics and discussions), individuals participating will tend to overlook any crucial considerations there (in those blindspots) that are relevant for us to help prevent AI developments from destroying human society and all of biological life.
  - We as organisers look to reach out to and serve individual persons who can bring in their diverse research disciplines, skills and perspectives. We are more accommodating here in terms of how much time such diverse applicants have spent upfront reading about and engaging with AI existential safety research (given that they would have heard less about our community's work), and try where we can to assist persons individually with getting up to speed.
  - Here, your suggested 'program fit' guideline definitely applies:
    > You know at least a bit about the alignment problem - at the very least you are aware that many obvious ways to try to get what we want from AI do not actually work.

> This isn't really a fixed group of people, either - new people enter by getting interested in AI safety and learning about AI, and leave when they no longer get much benefit from the fit-testing or signalling in AISC. I would guess this population leaves room for ~1 exact copy of AISC (on an offset schedule), or ~4 more programs that slightly tweak who they're appealing to.

Potential participants need to distinguish programs and work out which would serve their needs better. We as organisers need to keep org scopes clear. So I am not excited about the 'exact copy' angle (of course, it would also get forced and unrealistic if someone tries to copy over the cultural nuances and the present organisers' ways of relating with and serving participants).
I would be curious to explore ideas for new formats with anyone who noticed a gap in what AISC and other AIS research training programs do, and who is considering trying out a pilot for a new program that takes a complementary angle on the AI Safety Camp. Do message me!

Charlie Steiner (2y)

Thanks for this mammoth comment!

Remmelt (2y)

Happy to. Glad to hear any follow-up thoughts you have!

Remmelt (2y)

Ah, adding this here:

I personally do not tend to think of AISC as converting money and interested people into useful research. For me, that conjures up the image of a scaleable machine where you can throw in more inputs to spit out more of the output.

I view AISC more as designing processes together that tend toward better outcomes (which we can guess at but do not know about beforehand!).
Or as a journey of individual and shared exploration that people – specifically aspiring researchers – go through who are committed to ensuring unsafe AI does not destroy human society and all of biological life.

Linda Linsefors (4mo)

I just found this post (yesterday) while searching the EA Forum archives for something else.

I co-organised AISC1 (2018), AISC8 (2023) and AISC9 (2024). This means that I was not involved when this was posted, which is why I missed it.

What you describe fits very well with my own view of AISC, which is reassuring.
