In the dizzying race to build generative AI systems, the tech industry’s mantra has been: “bigger is better,” regardless of price.
Now tech companies are starting to adopt smaller AI technologies that aren’t as powerful but cost much less. And for many clients, that can be a good trade-off.
On Tuesday, Microsoft unveiled three smaller AI models that belong to a family of technologies the company calls Phi-3. The company said that even the smallest of the three performed almost as well as GPT-3.5, the much larger system that underpinned OpenAI's ChatGPT chatbot when its launch surprised the world in late 2022.
The smallest Phi-3 model can fit on a smartphone, so it can be used even without an Internet connection. And it can run on the kinds of chips that power ordinary computers, rather than the more expensive processors made by Nvidia.
Because smaller models require less processing, large technology providers can charge customers less to use them. They hope that means more customers can apply AI in places where larger, more advanced models have been too expensive to use. Although Microsoft said that using the new models would be “substantially cheaper” than using larger models like GPT-4, it did not offer details.
Smaller systems are less powerful, which means they may be less accurate or less fluent. But Microsoft and other technology companies are betting that customers will be willing to give up some performance if it means they can finally afford AI.
Customers imagine many ways to use AI, but with larger systems "they say, 'Oh, but you know, they can get a little expensive,'" said Eric Boyd, a Microsoft executive. Smaller models, almost by definition, are cheaper to run, he said.
Boyd said some customers, like doctors or tax preparers, could justify the costs of larger, more accurate AI systems because their time was so valuable. But many tasks may not require the same level of precision. Online advertisers, for example, believe they can target ads better with AI, but need lower costs to be able to use the systems regularly.
“I want my doctor to do things right,” Boyd said. “In other situations, where I’m summarizing opinions from online users, if it’s a little off, it’s not the end of the world.”
Chatbots run on large language models, or LLMs, mathematical systems that spend weeks analyzing digital books, Wikipedia articles, news articles, chat logs, and other text selected from the Internet. By identifying patterns throughout that text, they learn to generate text on their own.
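The pattern-finding idea behind those systems can be illustrated at toy scale. The sketch below is purely illustrative, not how any production LLM works: it counts which word tends to follow another in a tiny text, the simplest possible version of learning patterns from text in order to generate more of it.

```python
from collections import Counter, defaultdict

# A tiny "bigram" model: for each word, count which words follow it.
# Real LLMs learn vastly richer patterns, but the core idea is the same.
corpus = "the cat sat on the mat and the cat slept".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def most_likely_next(word):
    # Generate text by picking the word most often seen after `word`.
    return following[word].most_common(1)[0][0]

print(most_likely_next("the"))  # prints "cat": it follows "the" twice, "mat" once
```

A model like this stores only a handful of counts; an LLM stores billions of learned parameters, which is exactly why running one is so much more expensive.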
But LLMs store so much information that retrieving what is needed for each chat requires considerable computing power. And that is expensive.
While tech giants and startups like OpenAI and Anthropic have focused on improving larger AI systems, they are also racing to develop smaller models that offer lower prices. Meta and Google, for example, launched smaller models over the past year.
Meta's and Google's smaller models are also "open source," meaning anyone can use and modify them for free. This is a common way for companies to get outside help improving their software and to encourage the broader industry to adopt their technologies. Microsoft is also open-sourcing its new Phi-3 models.
(The New York Times sued OpenAI and Microsoft in December for copyright infringement of news content related to artificial intelligence systems.)
After OpenAI launched ChatGPT, Sam Altman, the company's CEO, said the cost of each chat was "single digit cents," a huge expense compared with what popular web services like Wikipedia deliver for tiny fractions of a cent.
Now, researchers say their smaller models can at least approach the performance of leading chatbots like ChatGPT and Google Gemini. Basically, the systems can still analyze large amounts of data but store the patterns they identify in a smaller package that can operate with less processing power.
Building these models involves a trade-off between power and size. Sébastien Bubeck, a researcher and vice president at Microsoft, said the company built its new smaller models by refining the data fed into them, working to ensure the models learned from higher-quality text.
Some of this text was generated by AI itself, what is known as "synthetic data." Human curators then worked to separate the sharpest text from the rest.
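To give a sense of what filtering training data for quality can look like, here is a deliberately simplified sketch. The heuristics and thresholds are invented for illustration; the article does not describe Microsoft's actual curation pipeline.

```python
# Hypothetical sketch of a data-quality filter: keep documents that are
# long enough to carry signal and made up mostly of real words rather
# than markup or junk. Thresholds here are arbitrary, for illustration.
def looks_high_quality(text: str) -> bool:
    words = text.split()
    if len(words) < 5:                # too short to teach the model much
        return False
    alpha = sum(w.isalpha() for w in words) / len(words)
    return alpha > 0.7                # mostly alphabetic tokens, little noise

docs = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "click here !!! 404 %%% $$$",
]
kept = [d for d in docs if looks_high_quality(d)]
# Only the first document survives the filter.
```

In practice such pipelines combine many signals, including model-based scoring and human review, as the article's mention of human curators suggests.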
Microsoft has built three different small models: Phi-3-mini, Phi-3-small, and Phi-3-medium. Phi-3-mini, which will be available Tuesday, is the smallest (and cheapest) but the least powerful. Phi-3-medium, which is not yet available, is the most powerful but also the largest and most expensive.
Making the systems small enough to go directly to a phone or personal computer “will make them much faster and much less expensive,” said Gil Luria, an analyst at investment bank DA Davidson.