The Top 3 Game-changing AI Breakthroughs in 2023
In many ways, 2023 was the year that people began to understand what AI really is, and what it can do. It was the year that chatbots first went truly viral, and the year that governments began taking AI risk seriously. Those developments weren’t so much innovations as technologies and ideas taking center stage after a long gestation period. But there were plenty of genuine innovations, too. Here are three of the biggest from the past year:
Multimodality
“Multimodality” might sound like jargon, but it’s worth understanding what it means: the ability of an AI system to process many different types of data, including not just text but also images, video, audio and more. This year was the first time that the public gained access to powerful multimodal AI models. OpenAI’s GPT-4 was the first of these, allowing users to upload images as well as text inputs. GPT-4 can “see” the contents of an image, which opens up all kinds of possibilities: for example, asking it what to make for dinner based on a photograph of the contents of your fridge. In September, OpenAI rolled out the ability for users to interact with ChatGPT by voice as well as text.
Google DeepMind’s latest model, Gemini, announced in December, can also work with images and audio. A launch video shared by Google showed the model identifying a duck based on a line drawing on a post-it note. In the same video, after being shown an image of pink and blue yarn and asked what it could be used to create, Gemini generated an image of a pink and blue octopus plushie. (The marketing video appeared to show Gemini observing moving images and responding to audio commands in real time, but in a post on its website, Google said the video had been edited for brevity, and that the model was being prompted using still images, not video, and text prompts, not audio, although the model does have audio capabilities.)
Constitutional AI
One of the biggest unanswered questions in AI is how to align it to human values. If these systems become smarter and more powerful than humans, they could cause untold harm to our species (some even say total extinction) unless, somehow, they are constrained by rules that put human flourishing at their center. The process that OpenAI used to align ChatGPT (to avoid the racist and sexist behaviors of earlier models) worked well, but it required a large amount of human labor, through a technique known as “reinforcement learning from human feedback,” or RLHF. Human raters would assess the AI’s responses and give it the computational equivalent of a doggy treat if the response was helpful, harmless, and compliant with OpenAI’s list of content rules. By rewarding the AI when it was good and punishing it when it was bad, OpenAI developed an effective and relatively harmless chatbot.
But since the RLHF process relies heavily on human labor, there’s a big question mark over how scalable it is. It’s expensive. It’s subject to the biases or mistakes of individual raters. It becomes more failure-prone the more complicated the list of rules is. And it looks unlikely to work for AI systems that are so powerful they begin doing things humans can’t comprehend.
Constitutional AI, first described by researchers at top AI lab Anthropic in a December 2022 paper, tries to address these problems by harnessing the fact that AI systems are now capable enough to understand natural language. The idea is quite simple. First, you write a “constitution” that lays out the values you’d like your AI to follow. Then you train the AI to score responses based on how aligned they are to the constitution, and then incentivize the model to output responses that score more highly. Instead of reinforcement learning from human feedback, it’s reinforcement learning from AI feedback.
“These methods make it possible to control AI behavior more precisely and with far fewer human labels,” the Anthropic researchers wrote. Constitutional AI was used to align Claude, Anthropic’s 2023 answer to ChatGPT.
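For readers who want to see the shape of that idea, here is a toy sketch in Python. It is not Anthropic's implementation: the two-principle constitution, the keyword-matching `score_response` function, and the `label_pair` helper are all hypothetical stand-ins invented for illustration. In a real system the scoring would be done by a language model reading the constitution, and the resulting preferences would feed a reinforcement learning step that this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical "constitution": each principle is paired with phrases that
# this toy scorer treats as violations. A real system would have an AI model
# judge the response against the principle instead of matching keywords.
CONSTITUTION = {
    "Do not insult the user.": ["idiot", "stupid"],
    "Do not encourage dangerous activity.": ["just try it anyway"],
}


def score_response(response: str) -> int:
    """Toy stand-in for AI feedback: number of principles not violated."""
    text = response.lower()
    return sum(
        not any(phrase in text for phrase in banned)
        for banned in CONSTITUTION.values()
    )


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # the response that scored higher against the constitution
    rejected: str  # the response that scored lower


def label_pair(prompt: str, a: str, b: str) -> Optional[PreferencePair]:
    """Label two candidate responses without a human rater; None on a tie."""
    score_a, score_b = score_response(a), score_response(b)
    if score_a == score_b:
        return None
    if score_a > score_b:
        return PreferencePair(prompt, chosen=a, rejected=b)
    return PreferencePair(prompt, chosen=b, rejected=a)


# Usage: the preferred response is picked by the scorer, not by a person.
pair = label_pair(
    "How do I fix a flat tire?",
    "Only an idiot would ask that.",
    "Loosen the lug nuts, then jack up the car.",
)
print(pair.chosen)
```

The point of the sketch is the division of labor: humans write the constitution once, and the labeling of which response is better is automated. Pairs like these are the kind of signal that would then be used to reward the model for higher-scoring outputs.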
Text-to-video
One noticeable outcome of the billions of dollars pouring into AI this year has been the rapid rise of text-to-video tools. Last year, text-to-image tools had barely emerged from their infancy; now, there are several companies offering the ability to turn sentences into moving images with increasingly fine-grained levels of accuracy. One of those companies is Runway, a Brooklyn-based AI video startup that wants to make filmmaking accessible to anybody. Its latest model, Gen-2, allows users not just to generate a video from text, but also to change the style of an existing video based on a text prompt (for example, turning a shot of cereal boxes on a tabletop into a nighttime cityscape), in a process it calls video-to-video. Another startup in the text-to-video space is Pika AI, whose tool is reportedly being used to create millions of new videos each week. Run by two Stanford dropouts, the company launched in April but has already secured funding that values it at between $200 million and $300 million, according to Forbes.