Is there reassurance in a new study about the potential sneakiness of superintelligent AI?

Dec 26, 2023

Will a superintelligent artificial intelligence (AI) appear suddenly, or will scientists see it coming and have a chance to warn the world? That question has received a lot of attention recently with the rise of large language models, such as ChatGPT, which have gained vast new abilities as their size has grown.

Some findings point to “emergence”, a phenomenon in which AI models gain intelligence in a sharp and unpredictable way. But a recent study calls these cases “mirages” — artifacts arising from how the systems are tested — and suggests that innovative abilities instead build more gradually.

The work was presented last week at the NeurIPS machine-learning conference in New Orleans.

Large language models are typically trained using huge amounts of text, or other information, which they use to generate realistic answers by predicting what comes next. Even without explicit training, they manage to translate language, solve mathematical problems, and write poetry or computer code. The bigger the model is — some have more than a hundred billion tunable parameters — the better it performs. Some researchers suspect that these tools will eventually achieve artificial general intelligence (AGI), matching and even exceeding humans on most tasks.

The new research tested claims of emergence in several ways. In one approach, the scientists compared how well four sizes of OpenAI’s GPT-3 model could add up four-digit numbers. Measured by absolute accuracy, performance jumped from nearly 0% to nearly 100% between the third and fourth sizes of model. But the trend is less extreme if the number of correctly predicted digits in the answer is considered instead. The researchers also found that they could dampen the curve by giving the models many more test questions; with a larger test set, the smaller models answer correctly some of the time.
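One way to see why the choice of metric matters is a small back-of-the-envelope sketch. It is not the study's code: the per-digit accuracies are invented for illustration, and it assumes that each digit of a five-digit answer is predicted independently, so the chance of getting the whole answer right is the per-digit accuracy raised to the fifth power.

```python
def exact_match_accuracy(per_digit_accuracy: float, answer_digits: int = 5) -> float:
    """Chance that every digit is right, assuming digits are independent."""
    return per_digit_accuracy ** answer_digits


# Hypothetical per-digit accuracies for four model sizes (not from the study).
per_digit_by_size = [0.40, 0.60, 0.80, 0.99]

for size, p in enumerate(per_digit_by_size, start=1):
    exact = exact_match_accuracy(p)
    print(f"size {size}: per-digit {p:.2f}, exact-match {exact:.2f}")
```

Under these assumptions, per-digit accuracy climbs steadily from 40% to 99%, while exact-match accuracy goes from about 1% to about 95%, with most of the rise arriving at the last step.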

Next, the researchers looked at the performance of Google’s LaMDA language model on several tasks. The ones for which it showed a sudden jump in apparent intelligence, such as detecting irony or translating proverbs, were often multiple-choice tasks, with answers scored discretely as right or wrong. When, instead, the researchers examined the probabilities that the models placed on each answer — a continuous metric — signs of emergence disappeared.
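The gap between discrete and continuous scoring can be illustrated with a toy multiple-choice question. This is a sketch under stated assumptions, not the researchers' method: the function names are hypothetical, the probabilities are made up, and the leftover probability is split among three wrong options in fixed shares of 60%, 30% and 10%.

```python
def option_probabilities(p_correct: float) -> list[float]:
    """Index 0 is the right answer; the remaining mass goes to three wrong options."""
    rest = 1.0 - p_correct
    return [p_correct, 0.6 * rest, 0.3 * rest, 0.1 * rest]


def is_scored_correct(probs: list[float]) -> bool:
    """Discrete scoring: the model is right only if its top choice is option 0."""
    top_choice = max(range(len(probs)), key=lambda i: probs[i])
    return top_choice == 0


# Hypothetical probabilities on the right answer as a model grows (not from the study).
for p in [0.30, 0.35, 0.40, 0.45]:
    probs = option_probabilities(p)
    print(f"probability on right answer {p:.2f} -> scored correct: {is_scored_correct(probs)}")
```

In this toy case the probability on the right answer rises in even steps, but the right-or-wrong score flips only once, when that probability overtakes the strongest wrong option.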

Finally, the researchers turned to computer vision, a field in which there are fewer claims of emergence. They trained models to compress and then reconstruct images. By merely setting a strict threshold for correctness, they could induce apparent emergence.
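The same effect can be mimicked with a few lines of arithmetic. The sketch below uses invented reconstruction errors and a hypothetical threshold of 0.12; it only illustrates how a strict cut-off can turn a steady improvement into an apparent jump, and does not reproduce the study's experiment.

```python
def fraction_within_threshold(errors: list[float], threshold: float) -> float:
    """Share of images whose reconstruction error is at or below the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


# Hypothetical setup: each model has a mean error, and individual images
# fall within 10% of that mean. None of these numbers come from the study.
image_spread = [0.90, 0.95, 1.00, 1.05, 1.10]
mean_error_by_size = [0.40, 0.30, 0.20, 0.10]
strict_threshold = 0.12

for size, mean_error in enumerate(mean_error_by_size, start=1):
    errors = [mean_error * s for s in image_spread]
    score = fraction_within_threshold(errors, strict_threshold)
    print(f"size {size}: mean error {mean_error:.2f}, fraction 'correct' {score:.2f}")
```

Here the mean error falls by the same amount at every step, yet the thresholded score stays at zero for the first three sizes and reaches 100% only for the last.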

Study co-author Sanmi Koyejo, a computer scientist at Stanford University in Palo Alto, California, says that it wasn’t unreasonable for people to accept the idea of emergence, given that some systems exhibit abrupt “phase changes”. He also notes that the study can’t completely rule it out in large language models — let alone in future systems — but adds that “scientific study to date strongly suggests most aspects of language models are indeed predictable”.

The work also has implications for AI safety and policy. “The AGI crowd has been leveraging the emerging-capabilities claim,” Raji says. Unwarranted fear could lead to stifling regulations or divert attention from more pressing risks. “The models are making improvements, and those improvements are useful,” she says. “But they’re not approaching consciousness yet.”
