OpenAI has unveiled a new series of reasoning models dubbed OpenAI o1-preview and o1-mini, previously codenamed Strawberry.
The o1 series is separate from OpenAI’s GPT line of large language models, which make up the core of the company’s releases to date. For that range, GPT-4o is the most recent update, unveiled over the summer.
The core difference is the o1 models are better at reasoning and have been designed with detail in mind. While they are large language models (LLMs) like their GPT siblings, the new lineup has been “designed to spend more time thinking before they respond,” the company explained in a blog post.
It hopes that o1-preview could be especially useful for users seeking to solve problems in fields where 4o or GPT-4 are less effective. However, with added detail comes added time – o1-preview is much slower to return an answer than ChatGPT, taking 30 seconds to solve a problem on average per The Verge.
“o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it,” wrote Sam Altman, CEO at OpenAI, in a post on X.
“But also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning.”
While OpenAI hasn’t expanded upon how o1-preview was trained, it’s drawn attention to a new algorithm it developed so that the model can break difficult problems down into easier steps and correct mistakes it makes while processing problems.
“Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process,” the company said in a technical post.
All of this gives the o1 models the ability to reason through science, coding, and math tasks that are more complex than previous models could manage. By taking its time, the model appears to return fewer hallucinations, an industry term for incorrect answers — but it’s worth noting OpenAI hasn’t made the claim that o1-preview is free of hallucinations altogether.
Who is o1-preview for?
OpenAI claims that the o1-preview model can perform on par with PhD students on physics, chemistry, and biology tasks, and does better in math and coding than GPT-4o. The company said GPT-4o solved 13% of the problems in a qualifying exam for the International Mathematics Olympiad, while the new model scored 83%, with similar leaps forward seen in Codeforces competitions.
While that’s impressive, the o1 models remain in early stages of development, meaning they lack some features of ChatGPT, including searching the web and uploading images or files.
“For many common cases GPT-4o will be more capable in the near term,” the company said. “But for complex reasoning tasks this is a significant advancement and represents a new level of AI capability.”
Alongside o1-preview, OpenAI also unveiled o1-mini, a faster and cheaper small language model, saying it excels at writing and debugging code. “As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge,” the company wrote.
That matters, as the new models are not only more capable than previous ones but more expensive too. OpenAI has priced o1-preview at $15 per 1 million input tokens and $60 per 1 million output tokens, three and four times the price of GPT-4o respectively.
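To make those figures concrete, here is a rough Python sketch of what a single request might cost. Only the o1-preview prices are quoted directly. The GPT-4o and o1-mini rates are derived from the "three and four times" and "80% cheaper" statements rather than taken from a price list, and the token counts in the example are hypothetical.

```python
# Prices in USD per 1 million tokens.
# o1-preview rates are the ones quoted in the article.
O1_PREVIEW = {"input": 15.00, "output": 60.00}

# Assumption: derived from "three and four times the price of GPT-4o".
GPT_4O = {
    "input": O1_PREVIEW["input"] / 3,
    "output": O1_PREVIEW["output"] / 4,
}

# Assumption: "80% cheaper than o1-preview" applied to both rates.
O1_MINI = {kind: price * 0.2 for kind, price in O1_PREVIEW.items()}


def estimate_cost(prices, input_tokens, output_tokens):
    """Return the estimated cost in USD of one request."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 1,000 output tokens.
    for name, prices in [
        ("o1-preview", O1_PREVIEW),
        ("o1-mini", O1_MINI),
        ("GPT-4o", GPT_4O),
    ]:
        print(f"{name}: ${estimate_cost(prices, 2_000, 1_000):.4f}")
```

Because output tokens are billed at four times the input rate for o1-preview, the length of the model's answers is likely to dominate the bill in a workload like this one.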
The costs mean the new models may need to be used sparingly by most businesses already concerned by AI’s high costs. OpenAI said the o1 models’ reasoning capabilities make them particularly useful for complex problems in science, coding or math.
“For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows,” the company said.
So far, o1-preview is available only to some API users. That said, OpenAI has already enabled access to the o1 models in ChatGPT for paying subscribers to its ChatGPT Plus and Team packages, as well as enterprise and educational users. Eventually, the models will be added to the free version too.
Initially, users can select which model they’d like to use, with weekly limits of 30 questions for o1-preview and 50 questions for o1-mini. The eventual aim is to increase those limits and let ChatGPT choose the best model to answer a prompt, and to enable the new models to access the web, handle uploaded files, and perform other tasks that GPT-4o can manage.