It is generally accepted in the local AI alignment circles that the whole field is pre-paradigmatic, in the Kuhnian sense (phase 1, as summarized here, if longer reading is not your thing). And yet, plenty of people are quite confident in their predictions of either doom or fizzle. A somewhat caricatured way of representing their logic is, I think, "there are so many disjunctive ways to die, only one chance to get it right, and we don't have a step-by-step how-to, so we are hooped" vs. "this is just one of many disruptive inventions whose real impact can only be understood way down the road, and all of them so far have resulted in net benefit; AI is just another example" (I have low confidence in the accuracy of the latter description, feel free to correct). I can see the logic in both of those; what I do not see is how one can rationally have very high or very low confidence, given how much inherent uncertainty there is in our understanding of what is going on.

My default is something more cautious, akin to Scott Alexander's https://slatestarcodex.com/2019/06/03/repost-epistemic-learned-helplessness/

where one has to recognize one's own reasoning limitations in the absence of hard empirical data: not "The Lens That Sees Its Flaws", but more like "The Lens That Knows It Has Flaws" without necessarily being able to identify them.

So, how can one be very very very sure of something that has neither empirical confirmation, nor sound science behind it? Or am I misrepresenting the whole argument?

Taboo "rationally".

I think the question you want is more like: "how can one have well-calibrated strong probabilities?". Or maybe "correct". I don't think you need the word "rationally" here, and it's almost never helpful at the object level -- it's a tool for meta-level discussions, training habits, discussing patterns, and so on.

To answer the object-level question... well, do you have well-calibrated beliefs in other domains? Did you test that? What do you think you know about your belief calibration, and how do you think you know it?

Personally, I think you mostly get there by looking at the argument structure. You can start with "well, I don't know anything about proposition P, so it gets a 50%", but as soon as you start looking at the details, that probability shifts. What paths lead there, what don't? If you keep coming up with complex conjunctive arguments against, and multiple-path disjunctive arguments for, the probability rapidly goes up, and can go up quite high. And that's true even if you don't know much about the details of those arguments, as long as you have any confidence at all that the process producing them is only somewhat biased. When you do have the ability to evaluate them in detail, you can get fairly high confidence.
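
To make that concrete, here's a toy sketch; every number in it is invented purely for illustration, not a real forecast. Several roughly independent "paths to failure" combine disjunctively, while a "path to success" that needs every step to go right combines conjunctively, and the aggregate probabilities drift apart quickly:

```python
# Toy illustration of disjunctive vs. conjunctive argument structure.
# All path/step probabilities below are made up for illustration only.

# "Bad outcome" is disjunctive: it happens if ANY of these (assumed
# roughly independent) paths goes through.
disjunctive_path_probs = [0.2, 0.15, 0.1, 0.25]
p_no_path_fires = 1.0
for p in disjunctive_path_probs:
    p_no_path_fires *= (1 - p)   # probability this particular path does NOT fire
p_bad = 1 - p_no_path_fires      # probability at least one path fires

# "Good outcome" is conjunctive: it requires ALL of these steps to succeed.
conjunctive_step_probs = [0.8, 0.7, 0.75, 0.9]
p_good = 1.0
for p in conjunctive_step_probs:
    p_good *= p                  # every step must go right

print(f"P(at least one bad path fires) = {p_bad:.2f}")   # ~0.54
print(f"P(every good step goes right)  = {p_good:.2f}")  # ~0.38
```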

That said, my current way of expressing my confidence on this topic is more like "on my main line scenarios..." or "conditional on no near-term giant surprises..." or "if we keep on with business as usual...". I like the conditional predictions a lot more, partly because I feel more confident in them and partly because conditional predictions are the correct way to provide inputs to policy decisions. Different policies have different results, even if I'm not confident in our ability to enact the good ones.

Taboo "rationally".

I used that description very much intentionally. As in, use your best Bayesian estimate, formal or intuitive.

As to the object level, "pre-paradigmatic" is essential. The field is full of unknown unknowns. Or, as you say "conditional on no near-term giant surprises..." -- and there have been "giant surprises" in both directions recently, and likely will be more soon. It seems folly to be very confident in any specific outcome at this point.

I think there's a dynamic where people who feel more uncertain are less likely to speak up than people who feel less uncertain. I would expect that if you polled people in a LW or EA census, only a minority would answer with >99% doom.

I don't think you can have particularly high confidence one way or the other without just thinking about AI in enough detail to have an understanding of the different ways that AI development could end up shaking out. There isn't a royal road.

Both the "doom is disjunctive" and "AI is just like other technologies" arguments really need a lot more elaboration to be convincing, but personally I find the argument that AI is different from other technologies pretty obvious, and I have a hard time imagining what the counterargument would be.

If you are piloting an airliner that has lost all control authority except for engine throttle, what you need is a theory that predicts how sequences of throttle positions will map to aircraft trajectories. If your understanding of throttle-guided-flight is preparadigmatic, then you won't be able to predict with any confidence how long your flight will last, or where it will end up. However, you can predict from first principles that it will eventually come to a stop, and notice that only a small fraction of possible stopping scenarios are favorable.

But how did you determine you were probably "piloting an airliner that has lost all control"?

TAG:

Whereas, if you can't steer a ship, you end up bobbing harmlessly.

I might be misunderstanding some key concepts but here's my perspective:

It takes more Bayesian evidence to promote the subjective credence assigned to a belief from negligible to non-negligible than from non-negligible to pretty likely. See the intuition on log odds and locating the hypothesis.

So, going from 0.01% to 1% requires more Bayesian evidence than going from 10% to 90%. The same thing applies for going from 99% to 99.99%.
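
A quick sanity check of those numbers, expressing each shift as bits of evidence (the change in log2 odds); this is just a sketch of the arithmetic:

```python
import math

def bits_of_evidence(p_from: float, p_to: float) -> float:
    """Bits of evidence needed to move a credence from p_from to p_to,
    measured as the change in log2 odds."""
    odds = lambda p: p / (1 - p)
    return math.log2(odds(p_to) / odds(p_from))

# 0.01% -> 1% takes slightly MORE evidence than 10% -> 90%:
print(f"0.01% -> 1%    : {bits_of_evidence(0.0001, 0.01):.2f} bits")   # ~6.66
print(f"10%   -> 90%   : {bits_of_evidence(0.10, 0.90):.2f} bits")     # ~6.34
print(f"99%   -> 99.99%: {bits_of_evidence(0.99, 0.9999):.2f} bits")   # ~6.66
```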

A person could reasonably be considered super weird for thinking something with a really low prior has even a 10% chance of being true, but it isn't much weirder to think something has a 10% chance of being true than a 90% chance of being true. This all feels wrong in some important way, but mathematically that's how it pans out if you want to use Bayes' Rule for tracking your beliefs.

I think it feels wrong because in practice reported probabilities are typically used to talk about something semantically different than actual Bayesian beliefs. That's fine and useful, but can result in miscommunication.

Especially in fuzzy situations with lots of possible outcomes, even actual Bayesian beliefs have strange properties and are highly sensitive to your priors, weighing of evidence, and choice of hypothesis space. Rigorously comparing reported credence between people is hard/ambiguous unless either everyone already roughly agrees on all that stuff or the evidence is overwhelming.

Sometimes the exact probabilities people report are more accurately interpreted as "vibe checks" than actual Bayesian beliefs. Annoying, but as you say this is all pre-paradigmatic.

I feel like I am "proving too much" here, but for me all this bottoms out in the intuition that going from 10% to 90% credence isn't all that big a shift from a mathematical perspective.

Given the fragile and logarithmic nature of subjective probabilities in fuzzy situations, choosing exact percentages will be hard and the exercise might be better treated as a multiple choice question like:

  1. Almost impossible
  2. Very unlikely
  3. Maybe
  4. Very likely
  5. Almost certain

For the specific case of AI x-risk, the massive differences in the expected value of possible outcomes mean you usually only need that level of granularity to evaluate your options/actions. Nailing down the exact numbers is more entertaining than operationally useful.
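
As a rough illustration of that level of granularity, here's what such a bucketing might look like; the cutoffs below are arbitrary illustrative choices, not anything canonical:

```python
def verbal_bucket(p: float) -> str:
    """Map a subjective probability to a coarse verbal category.
    The cutoffs are arbitrary illustrative choices."""
    if p < 0.01:
        return "almost impossible"
    if p < 0.20:
        return "very unlikely"
    if p < 0.80:
        return "maybe"
    if p < 0.99:
        return "very likely"
    return "almost certain"

for p in (0.001, 0.1, 0.5, 0.9, 0.999):
    print(f"{p:>6}: {verbal_bucket(p)}")
```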

I agree that 10-50-90% is not unreasonable in a pre-paradigmatic field. Not sure how it translates into words. Anything more confident than that seems like it would hit the limits of our understanding of the field, which is my main point.

Makes sense. From the post, I thought you'd consider 90% as too high an estimate.

My primary point was that estimates of 10% and 90% (or maybe even >95%) aren't much different from a Bayesian-evidence perspective. My secondary point was that it's really hard to meaningfully compare different people's estimates because of wildly varying implicit background assumptions.

Personally, I'm not very sure. But it seems to me that the risk of an AI-caused extinction is high enough to be worthy of a serious discussion at the presidential level.

My reasoning:

  1. GPT-4 is an AGI
    1. A personal observation: I've been using it almost daily for months and for all kinds of diverse applied tasks, and I can confirm that it indeed demonstrates a general intelligence, in the same sense as a talented jack-of-all-trades human secretary demonstrates a general intelligence. 
  2. A much smarter AGI can be realistically developed
    1. It seems that these days, the factor that limits AI smarts is the will to invest more money into it. It's not about finding the right algorithms anymore
    2. The surest way to predict the next token is to deeply understand the universe
  3. There are strong financial, scientific, political incentives to develop smarter and smarter AIs
  4. Therefore, unless there is some kind of a dramatic change in the situation, humanity will create an AGI much smarter than GPT-4, and much smarter than the average human, and much smarter than the smartest humans
  5. We have no idea how to co-exist with such an entity. 

Judging by the scaling laws and the dev speed in the field, it's a matter of years, not decades. So, the question is urgent.

GPT-4 can't even do date arithmetic correctly. It's superhuman in many ways, and dumb in many others. It is dumb in strategy, philosophy, game theory, self-awareness, mathematics, arithmetic, and reasoning from first principles. It's not clear that current scaling laws will be able to make GPTs human-level in these skills. Even if it becomes human-level, a lot of problems are in NP, meaning their solutions can be checked far more cheaply than they can be found. This allows effective utilization of an unaligned weak super-intelligence. Its path to strong super-intelligence and free replication seems far away. It took years to go from GPT-3 to GPT-4, and GPT-4 is not that much better. And these were all low-hanging fruit. My prediction is that GPT-5 will have fewer improvements. It will be similarly slow to get developed. Its improvements will be mostly in areas it is already good at, not in its inherent shortcomings. Most improvements will come from augmenting LLMs with tools. This will be significant, but it will importantly not enable strategic thinking or mathematical reasoning. Without these skills, it's not an x-risk.

I think I touched on these points, that some things are easy and others are hard for LLMs, in my other post, https://www.lesswrong.com/posts/S2opNN9WgwpGPbyBi/do-llms-dream-of-emergent-sheep

I am not as pessimistic about future capabilities, and definitely not as sure as you are (hence this post), but I see what you describe as a possibility. There is definitely a lot of overhang in terms of augmentation: https://www.oneusefulthing.org/p/it-is-starting-to-get-strange

Max H:

I think "rationally" in the title and "very very very sure" suggest you're looking at this question in slightly the wrong way.

If in fact most futures play out in ways that lead to human extinction, then a high estimate of extinction is correct or "rational"; if most futures don't lead to doom, then a low estimate of doom is correct. This is a fact independent of the public / consensus epistemic state of any relevant scientific fields.

A recent quote from Eliezer on this topic (context / source in the footnotes of this post):

My epistemology is such that it's possible in principle for me to notice that I'm doomed, in worlds which look very doomed, despite the fact that all such possible worlds no matter how doomed they actually are, always contain a chorus of people claiming we're not doomed.

Eliezer also talked a bit, towards the end of the Lunar Society podcast, about how uncertainty over the right distribution can lead to high probabilities of doom.

This is kind of a strawman / oversimplification of the idea, but: if you're maximally uncertain about the future, you expect with near certainty that the atoms in the solar system end up in a random configuration. Most possible configurations of atoms have no value to humans, so being very uncertain about something and then applying valid deductive reasoning to that uncertainty can lead to arbitrarily high estimates of doom. Of course, this uncertainty is in the map and not the territory; your original uncertainty may be unjustified or incorrect. But the point is, it doesn't really have anything to do with the epistemic state of a particular scientific field. 
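
A toy version of that argument, with numbers invented purely to show the shape of the reasoning (nothing below is an estimate of anything real):

```python
import math

# Suppose the future is described by n binary degrees of freedom, and only
# configurations in some "valuable" set of size k count as non-doom.
# Both n and k are made-up illustrative values.
n = 1000                  # 2**1000 possible configurations
k = 2 ** 900              # a generously large "valuable" set

p_valuable = k / 2 ** n   # under a uniform (maximally uncertain) distribution
print(f"P(valuable configuration) ~ 2^{math.log2(p_valuable):.0f}")  # 2^-100
print("P(doom) ~ 1 - 2^-100, i.e. indistinguishable from 1")
```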

Also, I don't know of anyone informed who is "very very very sure" of any non-conditional far future predictions. My own overall p(doom) is a "fuzzy" 90%+. There are some conditional probabilities which I would estimate as much closer to 1 or 0 given the right framing, but ~10% uncertainty in my overall model seems like the right amount of epistemic humility / off-model uncertainty / etc. (I'd say 10% seems about equally likely to be too much humility vs. too little, actually.)

 

jmh:

If in fact most futures play out in ways that lead to human extinction, then a high estimate of extinction is correct or "rational"; if most futures don't lead to doom, then a low estimate of doom is correct. This is a fact independent of the public / consensus epistemic state of any relevant scientific fields.

This seems wrong, or at least incomplete.

Give all the doom outcomes a combined probability p of 1/10^10000000000000000000000 and the bliss outcome 1-p. Even with a lot more ways for doom to occur, it seems we might not worry much about doom actually happening. It's true that you might weight the disvalue of doom much more heavily than the value of bliss, so some expected-value argument might work towards your view. But now we need to consider the timing of doom and existential risks unrelated to AI. If someone were to work through all the AI dooms and the timing of that doom and arrive at (for the sake of argument, clearly) 50 billion years, then we have much more to worry about from our Sun than from AI.
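
A minimal expected-value sketch of that point; the utilities and the doom probability below are made-up stand-ins (the comment's actual p is astronomically smaller than anything a float can hold):

```python
# Toy expected-value comparison; every number here is invented for illustration.

p_doom = 1e-22            # stand-in for an astronomically small doom probability
u_doom = -1e15            # how bad doom is, in arbitrary utility units
u_bliss = 1e9             # how good the bliss outcome is, same units

ev = p_doom * u_doom + (1 - p_doom) * u_bliss
print(f"Expected value: {ev:.3e}")   # dominated by the bliss term here

# Whether doom dominates the calculation depends on how extreme u_doom is
# relative to 1/p_doom -- which is exactly the weighting question raised above.
```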