We need to stop caring about AI model releases

If you regularly use an AI tool in your working life, answer the following: which precise AI model does it use? Chances are, if you’re using Microsoft Copilot, Gemini for Google Workspace, or similar, you could probably tell me the rough model family – “Oh, it’s an OpenAI model,” or “I’m certain it’s Gemini”.

But which model number? And why? Does it matter?

As technologists, we can become too focused on software minutiae. Indeed, some of us in the tech space have been conditioned by the past few years of frenetic AI announcements to hang on every word from AI developers, eagerly comparing the performance of each new release from the likes of OpenAI, Google, and Anthropic against the rest.

As a journalist based in the UK, I find that the vast majority of AI model releases – both planned announcements and surprise launches – happen sometime after my working day ends. When OpenAI’s GPT-4 first launched, I remember poring over the details well into the evening, preparing my notes for coverage the next day.

But as I’ve become more experienced in the AI space and my perspective on the technology has broadened, model launches have become far less interesting. More and more, I’m focused on whether new AI launches bring tangible benefits. Models are just the engines for AI products – the hard-to-explain algorithms running under the hood – and should only be judged relative to the usefulness of their outputs.

The fervor around models has become like a cooking show that judges contestants solely on their ingredients. Yes, freshness and seasonality matter, much as model efficiencies and the data on which models were trained matter. But no one cares unless you take what you have and make a meal out of it – and no one should.

Nothing will stop AI developers from holding these grand announcements. For many, their entire business model continues to rely on regular funding rounds buoyed by exciting new announcements, even if not backed by profit or return on investment (ROI). But I firmly believe that as AI tools become more commonplace, moving from the hands of early AI evangelists to regular business users, far more focus will be placed on AI product branding, licensing, and effectiveness over the specific model it’s using.

In OpenAI’s announcement for GPT-4.5, the firm encouraged users to manually select the model and have a go at seeing whether it gives better results than its other offerings. Following the announcement, OpenAI CEO Sam Altman quickly clarified it “isn’t a reasoning model and won’t crush benchmarks” – undermining any attempt to objectively review GPT-4.5 against competing models.

I have to applaud this approach, if only for its brazenness. The firm is arguing for an entirely subjective, ‘vibes’ approach to evaluating the effectiveness of AI products – conveniently timed for the moment OpenAI begins to reckon with apparently diminishing returns on model performance and Microsoft begins to back out of the AI training boom.

GPT-4.5 is a model without a clear purpose, described by Altman as capable of indefinable “magic” but – by the firm’s own admission – worse across certain benchmarks than its earlier offering o3-mini. It should be judged on the quality of its outputs rather than its research paper, but I suspect enterprise users will end up settling on different options.

In another recent announcement, Anthropic promoted its latest model, Claude 3.7 Sonnet, as “best-in-class” for AI-generated code. Both firms are searching for a defining purpose, and once they’ve become known for one, that’s likely all they’ll be sought out for.

This is the future I see for AI models. Beyond a certain point, they’ll be forgotten about and only updated in the same way you would any other software.

The DeepSeek debacle that consumed the tech sphere in January is a great temperature check for the whole sector. Speaking to friends and family about the situation, I found it hard to explain why the underdog model shocked markets the way it did – given that DeepSeek doesn’t have killer features that other frontier models lack.

“It’s roughly performant but costs less,” doesn’t pack the same punch as some of the headlines suggesting DeepSeek had changed AI forever. And nor should it; model releases can very well rock the world of enthusiasts, but they’re never going to raise the pulse of your average user.

To bring back the engine analogy, the hubbub around AI models reminds me of conversations with auto enthusiasts. You can keep your eye on the latest releases and breakthroughs, but if the car you own is dependable, affordable, and gets you where you need to go, you’ll be loath to get rid of it. AI developers need to take the same approach and plug models into products people will want to keep for the long run.
