Ryan Kidd

Give me feedback! :)


Past

  • Ph.D. in Physics from the University of Queensland (2017-2022)
  • Group organizer at Effective Altruism UQ (2018-2021)


Comments

Do you think there are some cultural things that ought to be examined to figure out why scaling labs are so much more attractive than options that at-least-to-me seem more impactful in expectation?


As a naive guess, I would consider the main reasons to be:

  • People seeking jobs in AI safety often want to take on "heroic responsibility." Work on evals and policy, while essential, might be seen as "passing the buck" onto others, often at scaling labs, who have to "solve the wicked problem of AI alignment/control" (quotes indicate my caricature of a hypothetical person). Anecdotally, I've often heard people in-community disparage AI safety strategies that primarily "buy time" without "substantially increasing the odds AGI is aligned." Programs like MATS emphasizing the importance of AI governance and including AI strategy workshops might help shift this mindset, if it exists.
  • Roles in AI gov/policy, while impactful at reducing AI risk, likely have worse quality-of-life features (e.g., wages, benefits, work culture) than similarly impactful roles in scaling labs. People seeking jobs in AI safety might choose between two high-impact roles based on these salient features without considering how the aggregate effect of many people making the same choice shapes talent flow. Programs like MATS might contribute to this problem, but only if the labs keep hiring talent (unlikely given poor returns on scale) and the AI gov/policy orgs don't make attractive offers (unlikely given that METR and Apollo pay pretty good wages and offer high status and work cultures comparable to labs; AISIs might be limited because government roles don't typically pay well, but it seems there are substantial status benefits to working there).
  • AI risk might be particularly appealing as a cause area to people who are dispositionally and experientially suited to technical work and scaling labs might be the most impactful place to do many varieties of technical work. Programs like MATS are definitely not a detriment here, as they mostly attract individuals who were already going to work in technical careers, expose them to governance-adjacent research like evals, and recommend potential careers in AI gov/policy.

Cheers, Akash! Yep, our confirmed mentor list updated in the days after publishing this retrospective. Our website remains the best up-to-date source for our Summer/Winter plans.

Do you think this is the best thing for MATS to be focusing on, relative to governance/policy?

MATS is not currently bottlenecked on funding for our current Summer plans and hopefully won't be for Winter either. If further interested high-impact AI gov mentors appear in the next month or two (and some already seem to be appearing), we will boost this component of our Winter research portfolio. If ERA disappeared tomorrow, we would do our best to support many of their AI gov mentors. In my opinion, MATS is currently not sacrificing opportunities to significantly benefit AI governance and policy; rather, we are rate-limited by factors outside of our control and are taking substantial steps to circumvent these, including:

  • Substantial outreach to potential AI gov mentors;
  • Pursuing institutional partnerships with key AI gov/policy orgs;
  • Offering institutional support and advice to other training programs;
  • Considering alternative program forms less associated with rationality/longtermism;
  • Connecting scholars and alumni with recommended opportunities in AI gov/policy;
  • Regularly recommending scholars and alumni to AI gov/policy org hiring managers.

We appreciate further advice to this end!

Do you think there are some cultural things that ought to be examined to figure out why scaling labs are so much more attractive than options that at-least-to-me seem more impactful in expectation?

I think this is a good question, but it might be misleading in isolation. I would additionally ask:

  • "How many people are the AISIs, METR, and Apollo currently hiring and are they mainly for technical or policy roles? Do we expect this to change?"
  • "Are the available job opportunities for AI gov researchers and junior policy staffers sufficient to justify pursuing this as a primary career pathway if one is already experienced at ML and particularly well-suited (e.g., dispositionally) for empirical research?"
  • "Is there a large demand for AI gov researchers with technical experience in AI safety and familiarity with AI threat models, or will most roles go to experienced policy researchers, including those transitioning from other fields? If the former, where should researchers gain technical experience? If the latter, should we be pushing junior AI gov training programs or retraining bootcamps/workshops for experienced professionals?"
  • "Are existing talent pipelines into AI gov/policy meeting the needs of established research organizations and think tanks (e.g., RAND, GovAI, TFS, IAPS, IFP, etc.)? If not, where can programs like MATS/ERA/etc. best add value?"
  • "Is there a demand for more organizations like CAIP? If so, what experience do the founders require?"

Of the scholars ranked 5/10 and lower on value alignment, 63% worked with a mentor at a scaling lab, compared with 27% of the scholars ranked 6/10 and higher. The average scaling lab mentor rated their scholars' value alignment at 7.3/10 and rated 78% of their scholars at 6/10 and higher, compared to 8.0/10 and 90% for the average non-scaling lab mentor. This indicates that, on average, our scaling lab mentors were either more discerning about value alignment than non-scaling lab mentors or had a higher base rate of low-value-alignment scholars (probably both).

I also want to push back a bit against an implicit framing of the average scaling lab safety researcher we support as being relatively unconcerned about value alignment or the positive impact of their research; this seems manifestly false from my conversations with mentors, their scholars, and the broader community.

It seems plausible to me that at least some MATS scholars are somewhat motivated by a desire to work at scaling labs for money, status, etc. However, the value alignment of scholars towards principally reducing AI risk seems generally very high. In Winter 2023-24, our most empirical-research-dominated cohort, mentors rated the median scholar's value alignment at 8/10 and 85% of scholars were rated 6/10 or above, where 5/10 was “Motivated in part, but would potentially switch focus entirely if it became too personally inconvenient.” To me this is a very encouraging statistic, but I’m sympathetic to concerns that well-intentioned young researchers who join scaling labs might experience value drift, or find it difficult to promote safety culture internally or sound the alarm if necessary; we are consequently planning a “lab safety culture” workshop in Summer. Notably, only 3.7% of surveyed MATS alumni say they are working on AI capabilities; in one case, an alumnus joined a scaling lab capabilities team and transferred to working on safety projects as soon as they were able. As with all things, maximizing our impact is about striking the right balance between trust and caution, and I’m encouraged by the high apparent value alignment of our alumni and scholars.

We additionally believe:

  1. Helping researchers get hired by lab safety teams is generally good;
  2. We would prefer that the people on lab safety teams have more research experience and are more value-aligned, all else equal, and we think MATS improves scholars on these dimensions;
  3. We would prefer lab safety teams to be larger, and it seems likely that MATS helps create a stronger applicant pool for these jobs, resulting in more hires overall;
  4. MATS creates a pipeline for senior researchers on safety teams to hire people they have worked with for up to 6.5 months in-program, observing their competency and value alignment;
  5. Even if MATS alumni defect to work on pure capabilities, we would still prefer them to be more value-aligned than otherwise (though of course this has to be weighed against the boost MATS gave to their research abilities).

Regarding “AI control,” I suspect you might be underestimating the support that this metastrategy has garnered in the technical AI safety community, particularly among prosaic AGI safety thought leaders. I see Paul’s decision to leave ARC in favor of the US AISI as a potential endorsement of the AI control paradigm over intent alignment, rather than necessarily an endorsement of an immediate AI pause (I would update against this if he pushes more for a pause than for evals and regulations). I do not support AI control to the exclusion of other metastrategies (including intent alignment and Pause AI), but I consider it a vital and growing component of my strategy portfolio.

It’s true that many AI safety projects are pivoting towards AI governance. I think the establishment of AISIs is wonderful; I am in contact with MATS alumni Alan Cooney and Max Kauffman at the UK AISI and similarly want to help the US AISI with hiring. I would have been excited for Vivek Hebbar’s, Jeremy Gillen’s, Peter Barnett’s, James Lucassen’s, and Thomas Kwa’s research in empirical agent foundations to continue at MIRI, but I am also excited about the new technical governance focus that MATS alumni Lisa Thiergart and Peter Barnett are exploring. I additionally have supported AI safety org accelerator Catalyze Impact as an advisor and Manifund Regrantor and advised several MATS alumni founding AI safety projects; it's not easy to attract or train good founders!

MATS has been interested in supporting more AI governance research since Winter 2022-23, when we supported Richard Ngo and Daniel Kokotajlo (although both declined to accept scholars past the training program) and offered support to several more AI gov researchers. In Summer 2023, we reached out to seven handpicked governance/strategy mentors (some of whom you recommended, Akash), though only one was interested in mentoring. In Winter 2023-24 we tried again, with little success. In preparation for the upcoming Summer 2024 and Winter 2024-25 Programs, we reached out to 25 AI gov/policy/natsec researchers (who we asked to also share with their networks) and received expressions of interest from 7 further AI gov researchers. As you can see from our website, MATS is supporting four AI gov mentors in Summer 2024 (six if you count Matija Franklin and Philip Moreira Tomei, who are primarily working on value alignment). We’ve additionally reached out to RAND, IAPS, and others to provide general support. MATS is considering a larger pivot, but available mentors are clearly a limiting constraint. Please contact me if you’re an AI gov researcher and want to mentor!

Part of the reason that AI gov mentors are harder to find is that programs like the RAND TASP, GovAI, IAPS, Horizon, and ERA fellowships seem to be doing a great job collectively of leveraging the available talent. It’s also possible that AI gov researchers are discouraged from mentoring at MATS because of our obvious associations with AI alignment (it’s in the name) and the Berkeley longtermist/rationalist scene (we’re talking on LessWrong and operate in Berkeley). We are currently considering ways to support AI gov researchers who don’t want to affiliate with the alignment, x-risk, longtermist, or rationalist communities.

I’ll additionally note that MATS has historically supported much research that indirectly contributes to AI gov/policy, such as Owain Evans’, Beth Barnes’, and Francis Rhys Ward’s capabilities evals, Evan Hubinger’s alignment evals, Jeffrey Ladish’s capabilities demos, Jesse Clifton’s and Caspar Oesterheldt’s cooperation mechanisms, etc.

Yeah, that amount seems reasonable, if on the low side, for founding a small org. What makes you think $300k is reasonably easy to raise in this current ecosystem? Also, I'll note that larger orgs need significantly more.

I think the high interest in working at scaling labs relative to governance or nonprofit organizations can be explained by:

  1. Most of the scholars in this cohort were working on research agendas for which there are world-leading teams based at scaling labs (e.g., 44% interpretability, 17% oversight/control). Fewer total scholars were working on evals/demos (18%), agent foundations (8%), and formal verification (3%). Therefore, I would not be surprised if many scholars wanted to pursue interpretability or oversight/control at scaling labs.
  2. There seems to be an increasing trend in the AI safety community towards the belief that most useful alignment research will occur at scaling labs (particularly once there are automated research assistants) and external auditors with privileged frontier model access (e.g., METR, Apollo, AISIs). This view seems particularly strongly held by proponents of the "AI control" metastrategy.
  3. Anecdotally, scholars seemed generally in favor of careers at an AISI or evals org, but would prefer to continue pursuing their current research agenda (which might be overdetermined given the large selection pressure they faced to get into MATS to work on that agenda).
  4. Starting new technical AI safety orgs/projects seems quite difficult in the current funding ecosystem. I know of many alumni who have founded or are trying to found projects who express substantial difficulties with securing sufficient funding.

Note that the career fair survey might tell us little about how likely scholars are to start new projects, as it primarily asked which organizations should attend, not whether scholars would rather join existing orgs or found their own.

Can you estimate dark triad scores from the Big Five survey data?

You might be interested in this breakdown of gender differences in the research interests of the 719 applicants to the MATS Summer 2024 and Winter 2024-25 Programs who shared their gender. For each research direction, the plot shows the percentage of male applicants who indicated interest minus the percentage of female applicants who did.

The most male-dominated research interest is mech interp, possibly due to the high male representation in software engineering (~80%), physics (~80%), and mathematics (~60%). The most female-dominated research interest is AI governance, possibly due to the high female representation in the humanities (~60%). Interestingly, cooperative AI was a female-dominated research interest, which seems to match the result from your survey where female respondents were less in favor of "controlling" AIs relative to men and more in favor of "coexistence" with AIs.
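For concreteness, the plotted gap is just a difference in proportions per research direction. A minimal sketch of the computation (the applicant records below are made up for illustration; the real survey data is not reproduced here):

```python
# Hypothetical (gender, research_interest) records standing in for survey rows.
applicants = [
    ("male", "mech interp"), ("male", "mech interp"), ("male", "governance"),
    ("female", "governance"), ("female", "cooperative AI"), ("female", "mech interp"),
]

def interest_gap(records, direction):
    """Percent of male applicants interested in `direction` minus the
    percent of female applicants interested in it (positive = male-dominated)."""
    males = [d for g, d in records if g == "male"]
    females = [d for g, d in records if g == "female"]
    pct_male = 100 * males.count(direction) / len(males)
    pct_female = 100 * females.count(direction) / len(females)
    return pct_male - pct_female
```

On this toy data, `interest_gap(applicants, "mech interp")` comes out positive (male-dominated) and `interest_gap(applicants, "cooperative AI")` negative, mirroring the pattern described above.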

This is potentially exciting news! You should definitely visit the LISA office, where many MATS extension program scholars are currently located.

Last program, 44% of scholar research was on interpretability, 18% on evals/demos, 17% on oversight/control, etc. In summer, we intend for 35% of scholar research to be on interpretability, 17% on evals/demos, 27% on oversight/control, etc., based on our available mentor pool and research priorities. Interpretability will still be the largest research track and still has the greatest interest from potential mentors and applicants. The plot below shows the research interests of 1331 MATS applicants and 54 potential mentors who have applied for our Summer 2024 or Winter 2024-25 Programs.
