Slowing AI: Foundations

Zach Stein-Perlman

Variables

"Slowing AI" is not monolithic. Many interventions have relevant side effects, positive or negative. There are many considerations, desiderata, and goals adjacent to slowing AI, and different interventions realize them to different degrees. This section is a list and discussion of variables. The goal of this section is to help readers recognize the consequences of possible interventions and thus better evaluate them, and to help generate or optimize interventions by identifying variables to affect. The goal of each subsection is to be an introduction to the relevant variable: how it relates to slowing AI and how else it matters.

List

Variables related to slowing AI are:

Timeline (to risky AI) length
Crunch time length
Relative positions of particular (kinds of) actors
Multipolarity^[1]
Attitudes on AI risk (among labs and the ML research community, and also states, other actors, and the public)
Technical safety research performed
Strategy & governance research performed
How much alignment tax is paid for particular models
Operational adequacy and safety practices at leading labs
Coordination ability and inclination among leading labs
The extent to which leading labs could scale up compute
How safe the architecture of powerful AI is
The extent to which leading labs develop risky vs unthreatening kinds of systems
Deployment safety
Which (kinds of) actors exert influence over AI
West-China relation
How well powerful AI will be used
Attitudes on AI
How various actors feel about the AI safety community
AI misuse risks
Non-AI risks
Field-building, community-building, and influence-gaining

This list aims to be essentially exhaustive.^[2]

Variables overlap. Maybe this list should instead be a tree or directed acyclic graph to show relationships between variables; the first two sections are weird because they relate to almost all of the other variables, which are mostly discussed in their own subsections.

Many "all else equal"s omitted.

1. Slower AI progress would be good

Powerful AI appearing later than it will by default would reduce misalignment risk and total x-risk (on current margins, in expectation, all else equal).

I expect that powerful AI will appear substantially sooner than I would prefer. Reasons include

labs and researchers have incentives or apparent-myopic-incentives to cause faster progress, independent of what is impartially optimal;
independent of individuals' preferences, the ML community has a culture of making progress and publishing results;
by default, AI risk can occur due to unilateral action; and
alignment is harder than most AI labs/researchers believe (or their beliefs don't fully coherently translate into preferences or actions), and moreover they are likely to fail to fully update before their attitudes are catastrophic. (Crucial question: why exactly might labs go too fast by default?)

Slowing AI allows more time for alignment research, field-building, influence-gaining, affecting labs' attitudes, governance interventions, and (sign unclear) governance happening.

Subvariables not discussed elsewhere include inputs to AI, notably:

Compute accessibility
1. Cost of compute
2. Regulation
Talent
1. Talent pool
2. How talent is allocated between labs and to labs vs places-that-aren't-AI-labs
3. International migration

2. Time is more valuable near the end

"Crunch time" is ambiguous; I vaguely mean the periods of strategic clarity, open windows of opportunity, and powerful models existing.^[3]

Time is more valuable when AI capability-level is greater. In particular, more time with powerful models

Is directly useful for empirical safety research;
Informs lots of research by giving more time with better information about what critical models/training will look like and maybe general clearsightedness about AI risk; and
Gives more time for advocacy, interventions, and (sign unclear) actors' actions during a particularly high-leverage period for interventions and actors' actions.

(but the magnitude of these benefits, and of side effects of slowing, is cruxy and uncertain).

Slowing AI will be more tractable near the end insofar as greater risk awareness facilitates slowing. It will be less tractable insofar as incentives for AI progress seem greater, AI progress and commercialization have more momentum, more influence is exerted by actors outside the AI safety community, maybe multipolarity among labs and AI-service-builders is greater, and maybe actions require serial time.

Note that capabilities increase after deployment, as people figure out how to use deployed systems and build apps and prompt bureaucracies. And undeploying AI systems is hard.

Some interventions would add time near the end.

Some interventions would slow AI soon but entail less slowing or even negative slowing near the end (e.g., perhaps convincing labs to spend less on compute now makes them increase spending faster later, or more speculatively perhaps slowing the West now causes the West to be more worried about China in the future). (But some interventions would clearly slow AI now and later, e.g. causing labs to publish less.)

More time near the end also gives more time to pay the alignment tax for particular models, but that's its own variable.

More time near the end also gives more high-leverage time for normal governance, but the sign of that is unclear.

Related concepts: takeoff speed, emergency brakes.

3. Beware of differentially slowing safer labs

Some interventions would slow some actors more than others.

It would be bad to differentially slow more safety-conscious actors.

It would probably be bad to differentially slow Western relative to Chinese actors:^[4]

Western labs are better on technical safety in expectation;
Western labs are ahead (and seem likely to remain ahead) and can better coordinate with other Western labs in expectation;
If America or the West enact great regulations, it would be better if all of the leading labs were bound by them, which requires all of the leading labs to be American or Western; and
Western and Chinese labs being closer would lead to worse state involvement in AI research in expectation.

A "safer" lab can cause an existential catastrophe, but it is more likely to slow down near the end, pay the alignment tax for any critical models it develops, and deploy cautiously.

Differentially slowing leading labs is bad because it increases multipolarity (see "Beware of increasing multipolarity" below).

Differentially slowing actors that would use powerful AI well is bad (see "Using powerful AI well" below).

Related subsections: "Operational adequacy," "Focus on slowing risky AI," "Deployment safety," "More time near the end helps labs pay the alignment tax," "Who influences AI."

4. Beware of increasing multipolarity

In this context, a world is more "multipolar" if more labs are near the frontier of AI capabilities and lead times between labs are smaller.^[5] Multipolarity is bad because

It directly causes faster progress;
It makes coordination more difficult
- It makes labs less able and willing to pay a given amount of alignment tax,
- It may make race dynamics stronger but racing is not well understood;
It increases the probability that a naive actor will take unilateral action that is bad (intentionally or accidentally).

Slowing AI can affect multipolarity

If multipolarity naturally increases over time;^[6] or
By particular interventions differentially slowing leading labs or non-leading labs
- Through various regulations,
- By affecting the cost of compute,
- By affecting the diffusion of ideas, or
- By affecting private data.

Miscellaneous uncertainties:

How do various interventions/actions affect multipolarity? E.g., how much does decreasing the diffusion of ideas differentially slow non-leading labs, or how much does a temporary ceiling on training compute differentially slow leading labs?
How do AI labs proliferate? How is this affected by actions of labs and the US government?
How far behind is China; when would it build dangerous AI? And how does that depend on actions of Western labs and governments?
Racing
- What would happen if there were many more labs pushing the frontier on the path to dangerous AI?
- How much faster could labs go now?
- What coordination could happen (especially near the end), and how does multipolarity affect that? E.g. OpenAI + DeepMind + Anthropic agreeing on evals or otherwise paying the alignment tax is easier than many labs coordinating to do so.

Similar-sounding term: "multipolar scenario" or "multipolar takeoff" refers to there being multiple powerful AI systems or AI-enabled actors, while my "multipolarity" refers to the number of labs, including before powerful AI appears.

5. Risk awareness is good

Labs (and researchers) being more concerned about AI risk slows AI.

Labs (and researchers) being more concerned about AI risk improves AI safety (independent of slowing AI).

Slowing AI generally makes labs (and researchers) more concerned about AI risk, I think, by giving more time for empirical safety work, relevant demonstrations, advocacy, and researchers to realize what's true. But some factors suggest slowing AI would entail less concern about AI risk: prophecies of doom fail to be realized, and separately perhaps labs and researchers will like the AI safety community less (see "Relationship with actors" below).

On researchers' attitudes, see the 2022 Expert Survey on Progress in AI (Stein-Perlman et al. 2022) and AI Risk Discussions (Gates et al. 2023). On public opinion, see Surveys of US public opinion on AI (Stein-Perlman 2022). Note that survey responses seem to miss some aspects of respondents' attitudes.

The leaders of OpenAI, DeepMind, and Anthropic seem pretty reasonable as far as I know.^[7]

Warning signs are mostly good

A "warning sign" (or "warning sign for actor X") is an event or process that causes many people to be worried about AI accidents (or that makes AI risk legible or salient to actor X).

Warning signs would mostly help slow AI and have mostly positive side effects for AI safety. But they're very non-monolithic.

Some events could make AI power legible but not make (reasonable threat models for) AI risk legible.

Perhaps some events could make AI risk legible to some actors but not others, in particular to labs and the ML research community but not states or the public. Risk awareness is multidimensional.

Warning signs seem to mostly help slow AI by causing actors to slow down and to slow each other down. But insofar as they make AI power more salient, they could cause actors to be more excited about powerful AI.

Warning signs may also make labs more risk-aware or more incentivized to appear risk-aware.

Slowing AI makes warning signs more likely: it gives more time for

A warning sign to occur in general,
Moderately-powerful systems to be integrated into applications and for AI products/services to proliferate, and
AI safety research and other work to make AI risk more legible.

Note that trying to do something that looks harmful has huge negative consequences for your ability to influence people. Scary demos are acceptable at least in some cases; I don't know when they are effective.^[8]

Related concepts: warning shot, galvanizing event, wakeup (to powerful AI), fire alarm, meme.^[9]

6. More time (especially near the end) helps safety research

More time implies more technical safety research. Moreover:

Research quality. Much of technical safety research grows more effective as
- More powerful AI models become available and
- We gain more clarity about what very powerful AI systems will look like
- (and maybe over time as alignment research matures as a field and more serial research is completed, but on the other hand low-hanging fruit is taken over time).
Research rate. Research done per time grows over time insofar as the field grows. And (related to some interventions to slow AI) if AI safety is more legible (or prestigious, if the right concept is prestigious) it will receive more work. (And maybe legibility and prestige naturally increase over time, but we can lump that with the rest of the exogenous growth of research rate.) (Note that insofar as research is not parallelizable and serial time is necessary, research rate is less important.)

7. More time (especially near the end) helps strategy/governance research & interventions

This variable is similar to the previous one. But strategy/governance research and interventions are aided near the end mostly by strategic clarity rather than the existence of powerful models. And note that technical safety research is ~directly valuable while governance research requires intervention to create value.

New resources, affordances, or windows of opportunity will probably appear during crunch time. We can plan for them to some extent in advance.

Work on slowing AI causes slowing, and slowing gives more time for work on slowing. But this cycle is not very important because the multiplication factor seems small.

8. More time near the end helps labs pay the alignment tax

To make an AI model safe, under one frame, the developer of the model must pay the alignment tax. If there is more time near the end, the developer is able and willing to pay more and is thus more likely to successfully pay the alignment tax.

9. Operational adequacy

Operational adequacy is directly good for AI safety. If it naturally improves over time (which seems uncertain), slowing AI improves it and so it is an additional consideration in favor of slowing AI. Additionally, improving some dimensions of operational adequacy would slow AI.

10. Coordination is good

Some aspects of AI development are analogous to the prisoner's dilemma. Perhaps in some situations, a lab would not want to slow down unilaterally, but would prefer everyone slowing down to nobody doing so. Improving coordination means helping labs slow down together (mostly by helping them slow down together when they want to; also by helping them understand the situation they're in and want to slow down). It is good for coordination to be easier and more likely.

Coordination ability includes:

Commitment ability / verification ability / transparency;
Good coordination mechanisms having been discovered/proposed;
Relevant coordination methods being legal (and sufficiently clearly legal that the fear of illegality doesn't dissuade coordination);^[10] and
Perhaps the inverse of multipolarity, although that could be treated separately (as part of the difficulty of coordination).

Increasing coordination ability or inclination increases labs' ability to slow (risky) AI.

How slowing affects coordination is unclear. Perhaps slowing naturally increases coordination ability and inclination. Perhaps slowing also naturally increases multipolarity, making coordination more difficult. Additionally, slowing causes the set of leading labs and relevant actors to be resampled more from the distribution of possibilities, which is bad insofar as the current landscape of leading labs and relevant actors is relatively good.

Related: transparency, coordination mechanisms, benefit-sharing mechanisms, arms control treaties.

11. Quickly scaling up compute

If labs can quickly scale up compute of the largest training runs, they might be able to quickly make much more powerful systems; progress might be fast. Insofar as slower progress is better, it would be better if labs couldn't quickly scale up compute much. So if an intervention increases labs' ability to quickly scale up compute later, that's bad.

Something like this variable is often assumed to be a major determiner of the length of crunch time. Insofar as labs are trying to go fast near the end, it is important (and if it's sufficiently large, more labs could be relevant, increasing effective multipolarity).

It is also commonly suggested that smoother and more gradual progress is safer for reasons other than extending crunch time.

Some policy regimes aimed at slowing AI would limit training compute. If such regimes were reversed or evaded, training compute could scale up quickly.^[11]

If increasing capabilities today didn't shorten timelines, it would be (at least prima facie) good because it gives more time with more powerful models and better strategic and technical clarity. Unfortunately, increasing capabilities today generally shortens timelines. (But I don't think anyone has a great model of what drives long-term AI progress.)

Related concept: "compute overhang" or "hardware overhang" (there is no consensus definition of these terms^[12]).

Equivalent variable: how much more compute labs could use if they chose to. Similar variable: how much more compute labs would use if they believed very powerful AI was near.

12. Architecture safety

Advanced cognitive capabilities seem likely to be achieved by machine learning. Within machine learning, some paths would be safer than others. (Or they could be achieved through whole-brain emulation, neuromorphic AI, brain-computer interfaces (unlikely), or just genetic engineering.)

Some slowing interventions would differentially accelerate certain paths to advanced cognitive capabilities.

It might be good to differentially accelerate certain paths. I don't know which paths, though. It depends on their safety and relationships between paths (in particular, whole-brain emulation largely overlaps with neuromorphic AI, so accelerating whole-brain emulation would accelerate neuromorphic AI).

13. Focus on slowing risky AI

Some kinds of systems at various levels of abstraction (e.g. perhaps large language models, reinforcement learning agents, models that demonstrate scary capabilities in an evaluation, and agents) are risky (on various threat models). Most systems are much less risky.

Some interventions differentially slow progress on risky AI.

Slowing risky AI could occur due to labs' choices (for safety reasons or for prestige), or researchers' choices (especially if united or with strong affordances for collective action), or being imposed by government, or being incentivized in various ways.

Ideally, safety standards could prevent actors from doing risky things while minimally impairing their ability to do safe things.

Related concept: differential technological development.^[13]

14. Deployment safety

Some ways that powerful AI systems could be deployed would be safer. Deployment safety is affected by decisions within leading labs, their relationships, and properties-of-the-world like the number of leading labs.^[14] Slowing AI, and particular interventions related to slowing AI, could affect deployment safety in predictable ways.

15. Who influences AI

Maybe it is better for some (kinds of) actors to exert more influence/control over AI, e.g. because some labs are safer than others.

Maybe there are side effects when some (kinds of) actors exert more influence over AI, e.g. the West acting differently if China exerts more influence.

Maybe interventions (e.g., AI risk advocacy to government) affect which (kinds of) actors exert influence over AI. This matters insofar as it affects safety directly and insofar as it slows AI.

Maybe some (kinds of) actors (e.g., states) naturally exert more influence over AI over time, so slowing AI causes them to exert more influence. On the other hand, maybe some kinds of slowing cause some (kinds of) actors (e.g., states) to be slower to wake up to AI.

This variable overlaps with "attitudes on AI" and "deployment safety" and "how well powerful AI will be used" and "warning signs."

16. West-China relation

AI development is led by the Western cultural and political sphere. The other sphere reasonably likely to be able to develop powerful AI soon is China. This situation is likely to continue for at least a decade.

This variable is affected by the capabilities difference between the leading Western labs vs the leading Chinese labs, the levers that the West and China have over each other's AI development, and the attitudes of actors (especially states and labs) about the other sphere. A generalization of this variable would be the set of relations between all such spheres.

Related: it seems easier for two Western labs to coordinate than a Western lab and a Chinese lab.

West vs China lead can be affected by slowing-interventions. For example, US export controls differentially slow China and US domestic regulation differentially slows the US.

The West vs China relation (including lead but also attitudes) affects slowing by affecting incentives/racing and by affecting state action.

17. Using powerful AI well

It would be better if powerful AI is ultimately used better. This can be affected by affecting who controls powerful AI or by affecting how wise, well-informed, etc. whatever actors will control powerful AI will be.

Particular slowing-related interventions can affect these variables. Slowing also naturally affects the (kinds of) actors that will control powerful AI (e.g., maybe states tend to be more involved over time). Slowing might also naturally affect attitudes on AI in particular ways.

18. Attitudes on AI

The attitudes of relevant actors, particularly states, are important.

Government involvement in AI is multidimensional; some possible actions would clearly be good. Regardless of the overall level or sign of government involvement, presumably it can be nudged to do better things.

Advocacy to government has several downside risks.^[15] In particular, you may end up convincing people about AI capabilities more than risk; perhaps "AI is really powerful" is an easy-to-understand, necessary part of the case for "AI is really risky."

Insofar as we don't know what it would be good for states to do on AI risk, advocacy to government has limited upside (although perhaps it will later be useful to have done, or perhaps government can generate good things to do).

I feel uncertain about the effects of advocacy:

I feel uncertain about how advocacy would affect attitudes and
I feel uncertain about behavior as a function of attitudes.

This variable is related to slowing AI because

Attitudes affect slowing,
Slowing generally affects attitudes insofar as they naturally change over time (in particular, actors appreciate AI more and want to get more involved), and
Particular slowing-related interventions affect attitudes in particular ways.

Related concept: wakeup (to powerful AI).

19. Relationship with actors

Relevant actors liking the AI safety community is good insofar as it improves AI safety.

Some interventions would make labs and researchers like the safety community less, like perhaps:

Interventions that don't seem justified to labs and researchers as slowing scary AI (especially interventions that don't differentially slow scary AI),
Interventions that are adversarial or norm-violating, or
Interventions around a public campaign to slow AI that involves AI safety people and is unpopular with labs and researchers.

I don't know what affects states' attitudes on the AI safety community.

A related framing of this variable is: how events make various groups look, particularly AI safety vs other groups trying to affect the future of AI.

The AI safety community is not perceived as monolithic; an actor could have different attitudes on different parts or aspects of it.

20. Slowing AI increases AI misuse risks

Slowing AI leaves more time for AI misuse to occur before powerful AI can solve that problem.

This is important

Directly,
Insofar as AI misuse is a risk factor for other risks, and
Insofar as AI misuse (or fear of it) affects attitudes on AI.

21. Slowing AI increases non-AI risks

Non-AI risks are mostly technological risks and great power conflict. Slowing AI causes there to be more time for such risks to occur before powerful AI can solve them (and improving technology generally seems to be increasing risk-per-time, if anything). And in particular, slowing AI may increase tension between great powers.

This is important directly and insofar as non-AI risk is a risk factor for other risks.

22. Field-building, community-building, influence-gaining

AI safety field-building, community-building, and influence-gaining may help slow AI.

Slowing AI gives more time for AI safety field-building, community-building, and influence-gaining. However, the influence exerted by others over AI increases over time (see "Who influences AI" above).

Miscellanea

Many of these variables are related to various definitions of "takeoff speed." There is no consensus definition of "takeoff speed." Many takeoff-related variables are largely exogenous, but slowing-related interventions might affect

What labs want,
What systems labs are allowed to train or deploy,
Labs' ability to scale up compute (or maybe other inputs),
Coordinated slowing,
How much research happens,
How efficiently research turns into progress-acceleration, or maybe
Homogeneity in takeoff.

Takeoff-related variables affect many variables: for example, time to powerful AI, how valuable that time is, paying the alignment tax, racing, multipolarity, misuse, attitudes, risk awareness, and government actions.

Miscellaneous considerations for interventions include:

Option value, irreversibility, & lock-in;
Information value;
Attention hazard; and
Reputation hazard & backlash.

Frames

This section is about analytic frames on slowing AI.^[16] If a thing is interesting and surprising, it can be useful to focus on that thing and investigate its implications for what's true, what to think more about, and how to organize your thoughts. This section is about approaches or perspectives to try on and possible things to pay attention to, not asserting what's true. A good frame is one that is useful.

The goal of this section is to give readers a variety of sometimes-useful frames to try on.

Extend crunch time: the periods of strategic clarity, open windows of opportunity, and almost-dangerous models^[17]

Near the end, we may have

General strategic clarity, informing research and interventions;^[18]
Open windows of opportunity for interventions;
Better ability to do technical safety research.

So increasing time near the end—and more specifically, time with those properties—is quite valuable in expectation. This frame prompts consideration: what are necessary and sufficient conditions for progress stopping near the end? What determines how valuable time near the end is?

Another angle on this frame is that a big goal is to "lay the groundwork for slowing down in the future, when extra time is most needed."

Slowing seems good; side effects are mixed

Slowing AI is good. Slowing AI and lots of interventions aimed at slowing AI have super-important side effects, sometimes more important than their effect on slowing AI. And "slowing AI" is not monolithic; slowing progress now has different effects from slowing progress later. To choose and optimize interventions, we should find slowing-adjacent variables (or desiderata or considerations) and understand their interactions and upshots.

Related proposition and questions that could provoke more thoughts: a longer timeline to powerful AI is better on the margin. What are the positive and negative factors? What timeline would be optimal?

Related proposition and questions that could provoke more thoughts: labs go too fast by default. Why does going-too-fast occur? What determines how labs behave? What can we intervene on to escape the bad default state? And what are the inputs to AI progress (at various levels of abstraction)?

~Nobody wants to destroy the world

Approximately nobody wants to destroy the world. "If the Earth is destroyed, it will probably be by mistake."^[19] But someone accidentally destroying the world would also be weird in a way that is worth really noticing. The possibility suggests a very natural intervention: helping the people who might accidentally destroy the world understand their actions.

Slowing AI doesn't have to involve conflict with AI labs: it can be about helping labs promote everyone's preferences by giving them information and tools to help them slow down.

What would be possible if there was much greater risk awareness?

This frame is somewhat plan-y; it inherently suggests a particular theory of victory and a particular class of plans.

Ignorance, externalities, culture, and racing

Three simple factors make labs go too fast: labs misunderstanding or underestimating AI risk, the negative externality^[20] of AI catastrophe, and researchers' and labs' culture of making progress and publishing results. Additionally, there seems to be a "racing" phenomenon where a lab tries to make faster progress if it has competitors; this phenomenon is poorly understood.

This frame identifies causes of going-too-fast and so suggests that interventions target those causes.

Risk awareness might improve a lot

Risk awareness and attitudes on AI safety among labs and ML researchers currently seem poor. They may be much better in the future, and there will be tractable opportunities to make them much better, as AI gets more powerful, there's more safety research, and there's more research focused on making AI risk legible to labs and researchers.

Greater risk awareness seems to facilitate labs slowing down near the end and to create new opportunities for helping them do so. (And independent of slowing AI, it seems to facilitate labs paying the alignment tax.)

Respondents to the 2022 Expert Survey on Progress in AI (Stein-Perlman et al. 2022) were surprisingly pro-safety and doomy.^[21] But clearly their attitudes are insufficient to change much.

Incentives^[22]

Buy-in from labs helps slow AI. So focus on interventions and actions-for-labs with lots of safety per cost-to-labs. And look for ways to align labs' incentives with the world and to incentivize labs to increase safety. Make sure labs are left with (or given) options that are safe and that they want to take.

Consider a diagram with x-axis time, y-axis profitability, and two curves representing profitability over time in two scenarios: default and slow AI. The integral of the difference is the cost to labs of slowing. To get companies to accept slowing, it is necessary and sufficient to incentivize those companies to accept that cost.

I mostly don't endorse the previous paragraph. I think it might constrain one's thinking because roughly

Labs care about some long-term goal rather than the integral of profitability
- So maybe replace "profitability" with "preference (under default preferences)"
Some interventions look like changing labs' preferences
- Especially interventions like informing labs about AI risk
Some interventions look like preparing to slow down later
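
For concreteness, the diagram described above can be written as a formula. This is only a sketch of that (not fully endorsed) frame; the symbols are assumptions introduced here: $\pi_{\text{default}}(t)$ and $\pi_{\text{slow}}(t)$ are a lab's profitability over time in the two scenarios, and $[t_0, T]$ is whatever horizon the lab cares about.

```latex
C_{\text{slow}} \;=\; \int_{t_0}^{T} \bigl( \pi_{\text{default}}(t) - \pi_{\text{slow}}(t) \bigr)\, dt
```

Under the first objection in the list, one would swap each $\pi$ for a preference-weighted quantity rather than profitability, which changes what $C_{\text{slow}}$ measures but not the shape of the expression.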

Slow risky AI

Slowing AI is good insofar as it slows risky AI. Slowing all AI is unnecessary; the real goal is slowing risky AI. This frame raises the question: what does that observation imply for interventions?

Prevent risky AI from being trained

The condition "nobody trains risky systems" is sufficient for victory, and on some views it is ~necessary. It brings to mind different problems, goals, variables, considerations, desiderata, levers, affordances, and interventions than "slow AI" or even "slow risky AI," so it can be a useful reframing.

For example, it could let you identify big goals as "develop the capacity to identify all risky training runs" and "cause all risky training runs to be identified and stopped."

This frame is very plan-y; it inherently suggests a particular theory of victory and a particular class of plans.

Focus on preparing for crunch time

Windows of opportunity will be much more open in the future due to strategic clarity, various actors' risk awareness, more powerful models, etc. So for now you should focus on planning and gaining influence/resources/etc., not intervening. What will be different in the future? What opportunities could you have in the future, if you prepare well?

I think this frame is less valid in the context of slowing AI than for AI strategy in general, but may still be useful.

AI inputs increase fast

Effective training compute for the largest training runs increases quickly,^[23] so decreasing inputs on the margin doesn't buy much time. Buying substantial time requires more discrete changes.
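
To make the marginal-versus-discrete point concrete, here is a minimal sketch. It assumes effective compute grows by a constant factor each year and that an intervention makes a one-time cut in inputs; the 4x growth rate and the function name `delay_years` are hypothetical choices for illustration, not figures from the source.

```python
import math

def delay_years(reduction_factor: float, annual_growth: float) -> float:
    """Years of growth needed to undo a one-time cut in effective compute.

    reduction_factor: how many times smaller the inputs become (2.0 = halved).
    annual_growth: assumed constant yearly growth factor (hypothetical).
    """
    if reduction_factor < 1 or annual_growth <= 1:
        raise ValueError("need reduction_factor >= 1 and annual_growth > 1")
    return math.log(reduction_factor) / math.log(annual_growth)

# Hypothetical growth of 4x per year.
marginal = delay_years(2.0, 4.0)    # halving inputs once
discrete = delay_years(100.0, 4.0)  # a much larger, discrete cut
print(f"halving inputs buys about {marginal:.2f} years")
print(f"a 100x cut buys about {discrete:.2f} years")
```

Because the delay scales with the logarithm of the cut, under this assumption even large one-time reductions buy only a few years, and small ones buy months.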

Unilateralism

The problem is that many actors will be able to unilaterally end the world. The solution is to decrease the number of decisions that would end the world if done wrong (and influence those decisions).

^[1]
Roughly meaning the number of labs near the frontier of capabilities, not the number of powerful AI(-enabled) actors.
^[2]
Or rather, it aims to be exhaustive at a certain level of abstraction. For example, it doesn't include how good the AI regulatory regime is, but I think it can give an account of that in terms of more basic variables.
^[3]
See Michael Aird's "Rough notes on 'crunch time'" (unpublished).
^[4]
Holding China constant, is it better for the West to be faster or slower than the default? It's complicated. Ideally the West would be faster now and much slower later. But since we can't just choose to slow down near the end, we should be wary of causing speeding now. Speeding now is generally net-negative by costing time without buying time near the end, although speeding-now interventions that cause proportionate slowing-near-the-end could be good.
^[5]
For some purposes, what's relevant is an analogue of multipolarity that includes groups making LLM applications in addition to labs making frontier models.
^[6]
It seems to have grown historically, and maybe in the future small groups without much compute can use others' training-compute-intensive tools to make powerful stuff. If
1. big training runs reach more like $10B soon without tech companies getting much richer or AI being super commercially applicable and
2. models are somehow not deployed in a way that lets others build on them,
that would limit multipolarity.
^[7]
Of course seeming reasonable is instrumentally convergent, and I don't know much about them, but I'm not aware of much evidence of unreasonableness. Sam Altman is sometimes vilified but I think it's more productive to understand him as a human who probably has some incorrect beliefs and is currently making some small mistakes and might someday make large mistakes but is likely to avoid mistakes-foreseeable-by-the-AI-safety-community if that community successfully informs labs and the ML research community about AI risk. I'm also open to evidence of unreasonableness, of course.
^[8]
One somewhat scary demo that failed to change anything (as far as I know) was Dual use of artificial-intelligence-powered drug discovery, which ex ante would have seemed to be a strong candidate for a wakeup call for AI misuse risk.
Separately, note that some people trying to do scary demos may advance the frontier of AI capabilities in general or particular capabilities that cause harm if misused. They may also increase AI hype. They may also draw attention to AI risk in a way that leads to an unproductive response (e.g. focused on LAWS or misuse), distracting from more important threat models and governance challenges.
^[9]
See generally Michael Aird's "Warning shots, galvanizing events, etc." (unpublished).
^[10]
Antitrust law is often mentioned as maybe making some relevant coordination illegal; I do not have a good sense of what antitrust law prohibits regarding AI labs (or what labs believe about what is legal).
^[11]
Suppose for illustration that by default a lab would 10x training compute every year, but it is artificially capped at 2x. After some time under this policy, the lab would want to scale up faster, to 'catch up' to where it would have been by default. (But not all the way: the 10x in this illustration bakes in positive endogeneities of AI progress, which are mostly not relevant to a lab 'catching up.') If
1. scaling up fast is technically feasible and
2. the policy is reversed, or the lab is able and willing to violate the policy, or the lab leaves the policy's jurisdiction,
then the lab will scale up training compute by much more than 10x per year until it has largely 'caught up.' Otherwise, a compute overhang doesn't cause faster progress.
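
The arithmetic in this illustration can be sketched as follows, using the 10x and 2x figures from the text. The three-year duration and the function name `compute_paths` are assumptions for illustration, and the computed gap is an upper bound on how far a lab would want to 'catch up,' since, as noted above, the 10x default bakes in endogeneities that would not all apply.

```python
def compute_paths(default_growth: float, capped_growth: float, years: int):
    """Return (year, default, capped, gap) tuples, starting from 1 unit of compute.

    gap is default / capped: the most a lab could want to 'catch up' by.
    """
    rows = []
    for year in range(years + 1):
        default = default_growth ** year
        capped = capped_growth ** year
        rows.append((year, default, capped, default / capped))
    return rows

# 10x default and 2x cap are from the illustration; 3 years is an assumed duration.
for year, default, capped, gap in compute_paths(10.0, 2.0, 3):
    print(f"year {year}: default {default:g}x, capped {capped:g}x, gap {gap:g}x")
```

The gap grows by a factor of 5 for each year the cap holds, so under these assumptions three years of the cap leaves the lab at most 125x behind its default path.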
^[12]
See Zach Stein-Perlman's Taboo "compute overhang" (2023). Regardless, most definitions are not very analytically useful or decision-relevant. As of April 2023, the cost of compute for an LLM's final training run is around $40M. This is tiny relative to the value of big technology companies, around $1T. I expect compute for training models to increase dramatically in the next few years; this would decrease how much more compute labs could use if they chose to.
I notice I am confused:
- I think final training runs have only cost <$100M; why hasn't anyone done a $1B training run?
- Why hasn't training compute scaled up much since AlphaGo Zero or GPT-3? See Epoch's Compute Trends Across Three Eras of Machine Learning (2022).
^[13]
See Jonas Sandbrink et al.'s Differential Technology Development: A Responsible Innovation Principle for Navigating Technology Risks (2022) and Michael Aird's "Slower tech development" can be about ordering, gradualness, or distance from now (2021).
^[14]
See Holden Karnofsky's Nearcast-based "deployment problem" analysis (2022) and Racing through a minefield: the AI deployment problem (2022).
^[15]
See Alex Lintz's "AGI risk advocacy" (unpublished).
^[16]
"An analytic frame is a conceptual orientation that makes salient some aspects of an issue, including cues for what needs to be understood, how to approach the issue, what your goals and responsibilities are, what roles to see yourself as having, what to pay attention to, and what to ignore." Zach Stein-Perlman's Framing AI strategy (2023).
^[17]
See Michael Aird's "Rough notes on 'crunch time'" (unpublished).
^[18]
On the other hand, the world will likely be more confusing, weird, and quickly-changing.
^[19]
Eliezer Yudkowsky's Cognitive Biases Potentially Affecting Judgment of Global Risks (2008).
^[20]
Negative externalities cause an activity to happen too much since some of the costs are borne by others.
^[21]
Respondents see "Stuart Russell's problem" as hard and important. Median P(doom) is 5–10% depending on how you ask. 69% of respondents want more priority on AI safety research (up from 49% in the 2016 survey). See the survey for relevant definitions.
^[22]
Inspired but not necessarily endorsed by Alex Gray.
^[23]
Training compute for the largest training runs seems not to have increased much in the recent past; there has not been much growth since AlphaGo Zero in 2017. See Epoch's Compute Trends Across Three Eras of Machine Learning (2022). But it seems likely to increase quickly in the future.

Richard Korzekwa (1y)

"Regardless, most definitions [of compute overhang] are not very analytically useful or decision-relevant. As of April 2023, the cost of compute for an LLM's final training run is around $40M. This is tiny relative to the value of big technology companies, around $1T. I expect compute for training models to increase dramatically in the next few years; this would cause how much more compute labs could use if they chose to to decrease."

I think this is just another way of saying there is a very large compute overhang now and it is likely to get at least somewhat smaller over the next few years.

Keep in mind that "hardware overhang" first came about when we had no idea if we would figure out how to make AGI before or after we had the compute to implement it.

Zach Stein-Perlman (1y)

Agree in part.

I have the impression that for reasons I don't fully understand, scaling up training compute isn't just a matter of being willing to spend more. One does not simply spend $1B on compute.

Ideas and training compute substitute for each other well enough that I don't think it's useful to talk about "[figuring] out how to make AGI before or after we [have] the compute to implement it." (And when "'hardware overhang' first came about" it had very different usage, e.g. the AI Impacts definition.)

M. Y. Zuo (1y)

"Unilateralism
The problem is that many actors will be able to unilaterally end the world. The solution is to decrease the number of decisions that would end the world if done wrong (and influence those decisions)."

Why would the 'number of decisions that would end the world if done wrong' be necessarily above zero?

Also, wouldn't labs be incentivized to trumpet whatever is most beneficial for their goals and downplay whatever is most deleterious? How would the true state within the lab be monitored instead of what's communicated?

Zach Stein-Perlman (1y)

This frame makes the nonobvious/uncertain assertion that "many actors will be able to unilaterally end the world." I'm not interested in arguing here whether by default many actors would be able to unilaterally build an AI that ends the world. Insofar as that assertion is true, one kind of solution or way of orienting to the problem is to see it as a problem of unilateralism.

In response to your last paragraph: yeah, preventing a lab from building dangerous AI requires hard action like effectively monitoring it or controlling a necessary input to dangerous AI.

Thanks for the answer. Your frame is interesting.

I'm not quite sure if 'hard action like effectively monitoring it or controlling a necessary input to dangerous AI' is realistic to implement everywhere there's a concentration of researchers.

How do you envision this being realizable outside of extreme scenarios such as a world dictatorship?

I largely agree! Maybe we can get a stable policy regime of tracking hardware and auditing all large training runs with model evals that can identify unsafe systems. Maybe the US government can do intermediate stuff like tracking hardware and restricting training compute.

But mostly this frame is about raising questions or suggesting orientations or helping you notice if something appears.

(By the way, I roughly endorse this frame less than the others, which is why it's at the end.)

"Maybe we can get a stable policy regime of tracking hardware and auditing all large training runs with model evals that can identify unsafe systems. Maybe the US government can do intermediate stuff like tracking hardware and restricting training compute."

Wouldn't this need to be done worldwide, near simultaneously, to be effective?

I'm not sure doing it in one country will move the needle much.

To some extent, yes, it would need US + Europe (including UK) + China. A strong treaty is necessary for some goals.
I'd guess that the US alone could buy a year.
One Western government doing something often causes other Western governments to do it.

(Edit in response to reply: I don't think we have important disagreements here, so ending the conversation.)

I'm still skeptical it'll be anywhere near that easy. India, Japan, Korea, and some other countries are also coming on to the scene and will likely need to be included in any future deal.

Plus, even when they're at the negotiating table, the parties are incentivized to stall as long as possible to cut the best possible deal, because they won't all be worried to the same degree.

And there's nothing to stop them from walking away at any time. Even the folks in Europe may want to hold out if they feel like they can extract the maximum concessions elsewhere from both U.S. and China.

Christopher King (1y)

I think another important consideration is not slowing alignment research. Slowing AI is useless if it slows alignment by the same amount!

(Not exactly: slowing increases time for making sense of AI, increasing risk awareness, governance, paying the alignment tax, influence-gaining, and so forth, but also for misuse, safety-unconcerned actors exerting more influence, maybe multipolarity increasing, and so forth.)
