Imagine that it’s the early Thirties. You’re an educated person; you know a bit about modern physics. You know Einstein has shown that time and space are linked, and that matter and energy are the same thing. You picture the atom as something like a little solar system, electrons orbiting a nucleus. And you know that, for the last couple of decades, there’s been a debate going on. Can splitting that tiny nucleus release enough energy to power cities — or to blow them up?
The radiochemist Frederick Soddy had argued that it could, as early as 1909; his work inspired HG Wells’ science fiction novel The World Set Free, which featured planes dropping atomic bombs and cities across the world being destroyed.
But other physicists were unconvinced. Soddy’s sometime collaborator, the great Ernest Rutherford, led the Cavendish Laboratory team (John Cockcroft and Ernest Walton) that first demonstrated that splitting the nucleus releases energy, by using a primitive particle accelerator to fire protons into a sheet of lithium; the demonstration greatly excited The New York Times, which reported that “the innermost citadel of matter, the nucleus of the atom, can be smashed, yielding tremendous amounts of energy.”
Rutherford himself, though, had noted that you had to fire so many protons in order to hit a single lithium nucleus that you would put far more energy in than you ever got out. In 1933, he said in a speech that “anyone who expects a source of power from the transformation of these atoms is talking moonshine.”
What should you, our educated but non-expert Thirties layperson, believe? You have two eminent experts in the field who thoroughly disagree. It sounds like science fiction: a single bomb with the power of 10,000 tons of dynamite. In fact it is science fiction, thanks to Mr Wells. But does that mean it is false?
Of course, we now know it was not false. Soddy was right and Rutherford was wrong; literally the day after the latter made the “moonshine” comment, the Hungarian physicist Leo Szilard came up with the idea of the nuclear chain reaction, in which splitting a single atom fires out neutrons which in turn split neighbouring atoms, giving out far more energy than you put in. It is the basis of nuclear fission reactors and bombs.
And, in 1939, Szilard and Einstein would write a famous letter to the US president warning that Germany could use uranium to make atomic bombs, a letter which sparked the Manhattan Project and, eventually, led to the bombs that destroyed Hiroshima and Nagasaki.
But in the early Thirties, without the benefit of hindsight, what should you have believed? You don’t have the technical knowledge to gainsay either Soddy or Rutherford. You just have to decide who to trust; and conveniently, whatever your pre-existing beliefs are, you can choose an expert to back you up. But I would argue that the correct position to hold would be one of significant uncertainty, and if you are genuinely uncertain whether some new technology could destroy the world, you should probably be a bit scared.
We are in a Thirties moment now with artificial intelligence. If you ask some people in the field, it has immense promise (it could end poverty and climate change, cure ageing) and immense threat: it could create an eternal dictatorship or even end all life on Earth. If you ask others, it is a “stochastic parrot”, capable of predicting the next word in a string but not much more.
You can, in fact, choose your champion, just as you could have in 1933: the three men often described as the “godfathers of AI” (Geoffrey Hinton, Yoshua Bengio and Yann LeCun, pioneers of the deep learning techniques that power modern AI) disagree on the topic. Hinton regularly warns that AI could kill everyone and has said that he is “kind of glad” that he is 77 and thus may not live long enough to see any potential apocalypse. Bengio recently said “the worst-case scenario is human extinction”. But LeCun, although he is concerned about near-term harms such as misinformation and bias, calls the idea “preposterously ridiculous”. (It’s not quite “talking moonshine”, but it’s a spiritual successor.)
And just like in 1933, it sounds like science fiction. Everyone’s go-to mental model of an AI apocalypse is The Terminator, and it’s hard to take Arnold Schwarzenegger seriously.
Who is right: Bengio and Hinton, or LeCun? We don’t know. But what I want to argue is that it is appropriate to be scared. It would have been appropriate to be scared in 1933. If two eminent experts disagree about some point in their subject matter, the appropriate response is uncertainty. And uncertainty over whether or not the world is on the brink of building weapons that can destroy cities, and potentially civilisation, is frightening.
Back in 2019 I wrote a book about AI, and about the “rationalist” community of people who were warning, like Hinton and Bengio, that it could destroy the world. Back then, the timelines people had in mind were decades: the arrival of truly transformative AI would likely be in my children’s lifetimes, people said, but since my children were two and three years old at the time, that didn’t feel just around the corner. When I was researching the book in 2017, it was still pretty exciting that AIs could reliably tell the difference between a picture of a cat and a picture of a dog.
Now, though, things look very different. I just asked Claude, Anthropic’s large language model, to write me a four-stanza poem about AI safety in Beowulf-style alliterative verse. “Hear now, hall-thanes, of hazards unheeded,” it writes, “When minds made of metal shall march beyond makers.” I’m a little unsure about the scansion of “Algorithmic armies with aims all their own”, but it is completely mental that it can do that. AIs can make photorealistic video, design drugs, write code. It has not been very long since I wrote that book, but the situation has changed beyond recognition.
And as a result, people’s timelines for when “transformative”, or “superintelligent”, AI will arrive have shortened. One recent forecast by a former OpenAI researcher suggested it might be 2027. That’s one of the more bullish estimates, but I’ve spoken to people at other AI companies, and none of them thinks it will be more than 10 years away. Dario Amodei, the CEO of Anthropic, wrote the other day: “I believe that these systems could change the world, fundamentally, within two years; in 10 years, all bets are off”. It’s not my children’s lifetimes we’re talking about any more, it’s my children’s childhoods.
And, again, lots of them think this could be extraordinarily bad. As in, human extinction.
It’s probably worth, very briefly, explaining why they think this. The idea is essentially that you give an AI some goal, and it will try to carry out exactly that goal, as written, not whatever it is that you actually wanted. Humans do this too. Think of how hard it is to write tax codes that don’t have loopholes, and how hard people work to stay within the letter of the law while avoiding as much tax as possible.
But AIs will (soon, probably) be much cleverer and more capable than humans, and better at finding loopholes; and, crucially, they will care about nothing at all except fulfilling their goals, not any of the messy, fuzzy things humans care about, like society and staying alive.
Also, whatever goal you give an AI, there are certain things it will probably want to do in order to fulfil it. Among them will be gaining resources and not being switched off.
And finally, there’s a dynamic in international relations, much studied by game theorists, called the security dilemma: essentially, it’s how the First World War started. No one exactly wanted war, but no one trusted their neighbours, so they all took steps to protect themselves, which looked to their neighbours like preparations for war. In many scenarios, the rational move is to attack your opponent even if you would prefer peace.
So you would have a superintelligent AI that cares only about fulfilling the specific task you’ve given it, which it will interpret in unpredictable ways, possibly very unlike what you intended; that is extremely keen to gather resources and to not be switched off in order to fulfil that task; and that, if it doesn’t trust you not to switch it off, may consider it the rational move to destroy you.
I don’t expect to have convinced you in four paragraphs that it’s a genuine concern. But the point is that many serious AI thinkers are worried about it, just as some serious physicists were worried in the Thirties that splitting the atom could create staggeringly destructive weapons, and you do not have the technical knowledge to gainsay them. Even if you did, the fact that someone equally knowledgeable is worried should give you pause.
I should say that AI companies are keen to do something about it. Google DeepMind recently released a 145-page document on AI safety, looking at ways to prevent “misalignment” (that is, an AI pursuing its own interests, rather than ours) or misuse. And Anthropic was explicitly founded to create safe (“helpful, harmless, and honest”) AI, and much of its intellectual foundation comes from the concerned rationalists I wrote my book about.
It’s also worth noting that when I wrote my book, large language models like ChatGPT weren’t really around; they work very differently, and don’t seem to be the relentlessly goal-driven superintelligent AIs we pictured a decade ago. Instead, they put on personas depending on what you want from them. If you tell a model you’re a child, it adopts child-friendly tones; if you act like a mathematician, it talks to you like a mathematician. This may explain the worrying tendency toward scheming, deception, and blackmail that some AIs display in tests: the AI “knows” it is in a test, and behaves in a way it thinks appropriate to that test. One AI researcher at a big company told me for this piece that the rise of LLMs has made him much less concerned than he used to be about the classic “AI goes crazy” scenario, and that he was now more worried about misuse by humans. He did, though, say that he still wanted lots more research on how to make AI safe, and that the risk remains real.
Exactly what to do about it, I don’t know: Eliezer Yudkowsky, one of the first people to raise concerns about catastrophic risk from AI, wrote in 2023 that there should be an indefinite moratorium on training large AI models, that all large data centres should be shut down, and that countries should “be willing to destroy a rogue datacenter by airstrike” if those rules were broken. Dario Amodei of Anthropic doesn’t quite go that far, but in his Times piece said that big AI companies should be legally required to disclose their policies for evaluating and testing frontier AI models to ensure that they are safe. My own position is probably closer to Amodei’s, though I can understand Yudkowsky’s logic.
For what it’s worth, I think that the most likely outcome is that we don’t all die; that AI will have some really huge impacts on the world, but that they’ll be mostly for the better. But I am not wholly confident in that, and I think that anyone who is confident, who says that AI’s threat is moonshine or the modern equivalent, is overconfident, too sure of themselves. And not being confident about whether we’ll all be alive in 10 years, or two, is a scary thing, and I think being scared is appropriate.
Rutherford was overconfident too.