Anthropic’s Claude Opus 4 and Sonnet 4 Set a New Benchmark in AI Coding

On Thursday, Anthropic launched two new AI models in the Claude 4 series: Claude Opus 4 and Claude Sonnet 4. Anthropic calls Claude Opus 4 the “world’s best coding model” and says it offers sustained performance on long-horizon, agentic workflows. Claude Sonnet 4, meanwhile, brings better coding and reasoning performance than Claude 3.7 Sonnet.
First, let’s talk about Claude Opus 4. On the SWE-bench Verified benchmark, which measures performance on real software engineering tasks, Claude Opus 4 achieves 72.5%, slightly higher than the 72.1% scored by Codex-1, OpenAI’s best coding model. With parallel test-time compute, which appears similar to the Deep Think mode in Gemini 2.5 Pro, Opus 4 climbs to 79.4%.
Interestingly, Claude Sonnet 4 achieves 72.7% on SWE-bench Verified and, with parallel test-time compute, reaches 80.2%, edging out the larger Opus 4 model on this coding benchmark.
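To make the idea of parallel test-time compute more concrete, here is a minimal Python sketch of one common form of it, best-of-N selection: run several attempts at once, drop the ones that fail a basic check, and keep the highest-scoring survivor. This is an illustration only, not Anthropic's actual pipeline. The names best_of_n, generate, passes_checks, and score, along with the toy stand-in functions, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def best_of_n(
    generate: Callable[[int], str],
    passes_checks: Callable[[str], bool],
    score: Callable[[str], float],
    n: int = 4,
) -> Optional[str]:
    """Run n attempts in parallel, drop failing ones, return the top-scoring survivor."""
    # Each attempt gets its own index so the (hypothetical) generator can vary its output.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates: List[str] = list(pool.map(generate, range(n)))

    # Filter step: e.g. discard patches that break existing tests.
    survivors = [c for c in candidates if passes_checks(c)]
    if not survivors:
        return None

    # Selection step: pick the candidate the scorer likes best.
    return max(survivors, key=score)


# Toy stand-ins (assumptions for illustration; a real system would call a model
# and run a real test suite here).
def toy_generate(attempt: int) -> str:
    return "patch-" + "x" * attempt


def toy_passes_checks(patch: str) -> bool:
    return len(patch) % 2 == 0  # arbitrary rule standing in for "tests pass"


def toy_score(patch: str) -> float:
    return float(len(patch))  # arbitrary rule standing in for a scoring model


if __name__ == "__main__":
    print(best_of_n(toy_generate, toy_passes_checks, toy_score, n=4))
```

The extra accuracy comes at the cost of running the model several times per task, which is why scores with and without parallel test-time compute are reported separately.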
Anthropic says the Claude Sonnet 4 model “balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.”
Claude Opus 4 excels in complex, long-running tasks and agentic workflows, while Claude Sonnet 4 combines strong coding performance with efficiency. Both are hybrid reasoning models, meaning they can either respond near-instantly or use extended thinking for deeper reasoning.
Anthropic also notes that when given access to local files, Claude Opus 4 maintains key information in a memory file. For example, while playing Pokémon, Claude Opus 4 created a navigation guide file to improve its gameplay.
Finally, on the safety front, Anthropic has activated AI Safety Level 3 (ASL-3) protections for the first time, applying them to Claude Opus 4 in line with its Responsible Scaling Policy (RSP). The company has implemented Constitutional Classifiers and other defenses to guard against jailbreaking techniques.
Claude 4 models are rolling out to all paid users under Pro, Max, Team, and Enterprise plans. And thankfully, Claude Sonnet 4 is available to free users as well, but without extended thinking.