Outside the AI company Anthropic’s offices in San Francisco, a guy is on a hunger strike. He’s been there two weeks. Another was doing something similar outside Google DeepMind in London, but never quite hit his straps: he started tweeting about it on the first day, which seems a grandiose way of saying “I skipped breakfast”, and then gave up after doctors told him it was dangerous — a surprise to those of us who rather thought that was the point.
Jokes aside, the intention is to stop the tech companies from building superintelligent AI, which they say “threatens to destroy life on Earth”. But here’s the thing: if these protesters believe that superintelligent AI has a high chance of killing everyone, then they’re not overreacting. If anything, they’re underreacting. Since I, too, think that there is a non-trivial chance of AI killing everyone, I should really be applauding them, not sneering.
A couple of other people who aren’t sneering are Eliezer Yudkowsky and Nate Soares, who have just published If Anyone Builds It, Everyone Dies. They are, respectively, the founder and president of the Machine Intelligence Research Institute, which, long ago — way back in about 2002, when OpenAI’s ChatGPT was not even a twinkle in Sam Altman’s eye, and in fact Sam Altman himself was just starting high school — became what was probably the world’s first organised group dedicated to preventing AI from destroying the world.
Yudkowsky and Soares are so worried about the deadly potential of AI that they’re after something much more dramatic than a hunger strike. They want a global agreement, backed up with the sort of teeth that global anti-nuclear-proliferation treaties have, banning all frontier AI research. And yes, that would mean that if some country was found to be breaking that treaty, its data centres should be bombed.
And if you accept the premise, that’s not an overreaction either. Elon Musk — who is, don’t forget, the head of a major AI lab — has said that AI is just as dangerous as nukes, and as recently as 2023 that it could lead to “civilisation destruction”. Musk’s former friend Donald Trump is very much still surrounded by AI leaders, though: he is accompanied on his ongoing UK state visit by several Silicon Valley CEOs, including Altman. Trump ordered the bombing of Iranian nuclear facilities earlier this year, because they might be close to making a bomb; should he consider blowing up his friends’ data centres at some point?
Yudkowsky and Soares lay the risks out pretty clearly. First, if AI companies succeed in making a superintelligent AI — which, to reiterate, they are actively trying to do — then that AI will be more powerful than us, in the way that we are more powerful than chimpanzees. Yes, an individual chimp could pummel me to death, but humans as a species have nearly driven chimps extinct without even meaning to. If a superintelligent AI wants to eradicate us, or to achieve some goal which would lead to our eradication as a side effect, then we would not be able to do a great deal about it.
Second, the pair argues, we don’t actually know what’s going on inside AIs — much as we don’t truly know how the human brain works. There are trillions of simulated connections between billions of simulated neurons in each AI, far too many for any human to check. “The relationship that biologists have with DNA,” they say, “is pretty much the relationship that AI engineers have with the numbers inside an AI.” That is, we can see that some genes correlate somewhat with some behaviour or disease, but we can’t just read out an embryo’s DNA and say “this will be a 6’2” guy who’s interested in maths and can play the guitar”. Similarly, AI engineers cannot look into an AI’s nodes and say “this AI is truthful and human-aligned”, although there are efforts to get better at doing so.
Another problem they lay out is that when we try to give AIs goals, we don’t really know what we’re doing. We train AI to want to be helpful and honest and so on, but only in a very high-level way. It’s comparable, Y&S argue, to how evolution trained us to “want” to reproduce, but accidentally created humans who “want” to have orgasms, and who often take active steps, with contraception, to avoid reproducing at all. An AI’s goals could just as easily be disconnected from the goals we gave it. And even though AIs talk like humans, they are nothing like humans — their minds are alien. The things they want will be weird, and the very basic things that humans want — like staying alive — might not be important to them, despite our best efforts.
Finally, they argue, we’ll only have one chance to get it right. A superintelligent AI could, they say, kill us if it tried. So it must never try, not even the first time we boot it up. But we can’t test the superintelligent AI itself — if we get it wrong the first time, it will kill everyone — so we have to test stupider, weaker versions, and hope those tests carry over. It is, they say, like a space rocket: you can do all the lab testing you like, but it will never be quite like the stress of launching into space at seven miles a second. About 30% of new rocket types explode on their first flight.
One objection sometimes made at this point is that AI does not want things; it just does what we ask of it. But that is a pointless semantic dispute, as far as Y&S (and I) are concerned: a chess-playing AI acts as though it wants to win at chess. You can tell, by how it protects its king and tries to checkmate yours. If you don’t want to call it “wanting”, fine. But there’s an old AI proverb: “The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” That’s true of “wanting”, too. As AIs get cleverer and more powerful, they’ll be more able to pursue the goals we give them, and will be driven to do so. It will look exactly like “wanting”. Call it what you like.
Once, you might have said that this doesn’t matter: AIs are just some programming in a box, and don’t have power in the world. But we’ve long since connected AIs to the internet and used them to run factories and weapons systems; one AI was connected to Twitter, given some money to put into crypto, and is now worth $50 million. Money can be exchanged for goods and services. The idea that we could just switch them off, or that they are powerless to affect the real world, was hard enough to sustain 15 years ago; now it is silly.
So taking the points together, Yudkowsky and Soares argue compellingly that if things continue apace, unchecked, we may have, in the coming years, a hyperintelligent machine capable of manipulating the real world, which has weird, alien goals which are unlikely to be compatible with humans being alive; a planet with an oxygen-rich atmosphere and a surface temperature below the boiling point of water is great for us, but not particularly important to an AI. “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”
Yes, Y&S are, it should be said, outliers on the “AI doom” scale. They think — as their book title explicitly says — that if anyone builds it, everyone dies. But there are others, such as Scott Alexander, who have much lower estimates of the risk: he describes himself as a “boring moderate” because he thinks there’s less than a 25% chance (!) that AI will kill everyone. Part of the disagreement here seems to be that Y&S think that there could be a sudden, staggeringly rapid takeoff, in which the AI goes from not-dangerous to dangerous within a few hours.
The AI researcher Quintin Pope, for one, thinks a key premise of Yudkowsky’s argument is false: he believes training AI is nothing like evolution, and that it’s more like human learning, so accidentally instilling weird, dangerous values into the AI is unlikely. Pope and his colleague Nora Belrose reassuringly argue that AI “is easy to control” — although even they, self-described AI optimists, think “a catastrophic AI takeover is roughly 1% likely”.
So while it might be fair to think that Yudkowsky and Soares are overconfident in their predictions of doom, we are nonetheless in a situation where the “boring moderates” believe we have a one-in-four chance of dying and the “optimists” think it’s a mere one-in-100. Would you play Russian roulette using a revolver with 100 chambers?
As I was reading If Anyone Builds It, Everyone Dies, I realised that it was not the book I wanted to be reading. The book I wanted to be reading was by someone like Dario Amodei, the CEO of Anthropic, or Demis Hassabis, the CEO of Google DeepMind, or maybe Altman himself. The book would be called something like I’m Going to Build It, And I Don’t Think We’re Going to Die. It would be a chapter-by-chapter rebuttal of this book.
I know that all those men understand the arguments Yudkowsky and Soares are making. Hassabis met his DeepMind cofounder Shane Legg at a conference organised by MIRI; Altman is on record as saying Yudkowsky was critical in his decision to start OpenAI. Anthropic was founded explicitly with AI safety goals which stemmed from Yudkowsky’s work. These guys know this stuff, most of which long predates the publication of this book, and which has been echoed by many other thinkers. They have spoken about the risks. Hassabis has said that AI could do “incredible things for humanity” but also carry existential risks if mismanaged; Amodei has likewise written long essays on AI’s enormous potential upsides, such as curing disease and ending climate change, and co-authored a seminal 2016 paper on the potential risks. But he has not, to my knowledge, ever explained why he is now confident that those risks are worth taking.
I suspect that Yudkowsky and Soares are too confident that AI means doom. But even the “moderate” position that there’s a 25% chance is at least three orders of magnitude too high for me. I think the guy doing the hunger strike outside DeepMind would say the same, if he hadn’t got peckish and gone home for a snack. The only people who might reasonably explain why we shouldn’t worry are the ones building the superintelligent AIs. But for some reason they’re not telling us.