
Anthropic’s Claude Opus 4 and Sonnet 4 Set a New Benchmark in AI Coding


On Thursday, Anthropic launched two new AI models in the Claude 4 series: Claude Opus 4 and Claude Sonnet 4. Anthropic calls Claude Opus 4 the “world’s best coding model” and says it sustains its performance on long-horizon, agentic workflows. Claude Sonnet 4, meanwhile, delivers better coding and reasoning performance than Claude Sonnet 3.7.

First, let’s talk about Claude Opus 4. On SWE-bench Verified, a benchmark that measures performance on real-world software engineering tasks, Claude Opus 4 scores 72.5%, slightly ahead of OpenAI’s best coding model, codex-1, which scored 72.1%. With parallel test-time compute, which appears similar to the Deep Think mode in Gemini 2.5 Pro, Opus 4 reaches 79.4%.

Interestingly, Claude Sonnet 4 scores 72.7% on SWE-bench Verified and 80.2% with parallel test-time compute, edging out the larger Opus 4 model on this benchmark.

Image Credit: Anthropic

Anthropic says the Claude Sonnet 4 model “balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.”

Claude Opus 4 excels at complex, long-running tasks and agentic workflows, while Claude Sonnet 4 combines strong coding performance with efficiency. Both are hybrid reasoning models, meaning they can give either near-instant responses or use extended thinking for deeper reasoning.

Anthropic also notes that when given access to local files, Claude Opus 4 maintains key information in a memory file. For example, while playing Pokémon, Claude Opus 4 created a navigation guide file to improve its gameplay.

Finally, on the safety front, Anthropic has for the first time activated AI Safety Level 3 (ASL-3) protections, for Claude Opus 4, in line with its Responsible Scaling Policy (RSP). The company has implemented Constitutional Classifiers and other defenses against jailbreaking techniques.

Claude 4 models are rolling out to all paid users under Pro, Max, Team, and Enterprise plans. And thankfully, Claude Sonnet 4 is available to free users as well, but without extended thinking.

Arjun Sha

Passionate about Windows, ChromeOS, Android, security and privacy issues. Has a penchant for solving everyday computing problems.
