Claude Opus 4.7 Just Beat GPT-5.4 and Gemini 3.1 on SWE-bench (April 2026)

[Image: Claude Opus 4.7 dashboard showing SWE-bench benchmark results and performance metrics, April 2026]

Introduction

On April 16, 2026, Anthropic quietly rewrote the AI leaderboard. Claude Opus 4.7 launched with a benchmark sweep that vaulted it ahead of GPT-5.4 and Gemini 3.1 Pro on the most demanding coding and agentic reasoning tasks — and it did so at the same price as Opus 4.6. No markup. No waitlist. Just a stronger model across every Claude surface you already use.

If you build with AI, ship code, or lean on a model for long, context-heavy work, this release changes the calculus. Here is what Claude Opus 4.7 actually delivers, how it compares to the competition in April 2026, and why the new "xhigh" effort mode may be the most quietly important feature of the year.

What Is Claude Opus 4.7? A Quick Overview

Claude Opus 4.7 is Anthropic’s newest flagship large language model, released April 16, 2026, and immediately available across Claude.ai, the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. It is a point upgrade in name only — the internal gains over Opus 4.6 are significant enough that Anthropic recommends developers re-tune their prompts.

The highlights at a glance:

  • SWE-bench Verified: 87.6% (up from 84.2% on Opus 4.6)
  • SWE-bench Pro: 64.3% (up from 53.4%, a nearly 11-point jump)
  • Vision accuracy: 98.5% with support for images up to 2,576px on the long edge (~3.75 megapixels, roughly 3x prior)
  • GPQA Diamond: 94.2%
  • New xhigh effort mode for balanced deep reasoning
  • Task budgets (public beta) for predictable token spend on long agent runs
  • Same price as Opus 4.6: $5 per million input tokens, $25 per million output tokens

Claude Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro: The April 2026 Showdown

Here is the real story: Opus 4.7 is now the top scorer on the benchmarks developers actually care about.

Coding and Software Engineering

On SWE-bench Verified — the industry standard for measuring a model’s ability to resolve real GitHub issues — Claude Opus 4.7’s 87.6% leaves Gemini 3.1 Pro’s 80.6% seven points behind. GPT-5.4 sits between the two. The jump on SWE-bench Pro, which uses harder, longer-horizon tasks, is even more telling: Opus 4.7 posted 64.3%, beating every public frontier model currently available.

Agentic Reasoning and Long-Horizon Tasks

Anthropic also rolled out Managed Agents, a companion capability that handles sandboxing, permissions, state management, and error recovery automatically. Internal tests show task success rates up to 10 points higher than standard prompting on the same model. Combined with Opus 4.7’s improved instruction following, it is a meaningful upgrade for anyone building autonomous workflows.
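The post does not document how Managed Agents performs error recovery, but the core idea (retry a failed step from a known-good point instead of aborting the whole run) can be sketched in a few lines. Everything below is illustrative: `run_with_recovery` and `flaky_step` are hypothetical names, not part of any Anthropic API.

```python
def run_with_recovery(step_fn, max_retries: int = 3):
    """Illustrative retry loop in the spirit of automatic error recovery.
    A managed runtime would also restore sandbox state between attempts."""
    last_exc = None
    for attempt in range(max_retries):
        try:
            return step_fn()
        except RuntimeError as exc:  # treat RuntimeError as a transient failure
            last_exc = exc
    raise last_exc

calls = {"n": 0}

def flaky_step():
    """A step that fails twice, then succeeds, to exercise the retry loop."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_recovery(flaky_step))  # done
```

A real managed runtime would layer sandboxing and permission checks around each attempt; the retry skeleton is just the smallest piece of that story.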

Vision and Multimodal

The 2,576px input ceiling is quietly massive. It means Opus 4.7 can now read dense dashboards, CAD drawings, long screenshots, and technical schematics without downsampling artifacts eating the important details. Gemini 3.1 still leads on multimodal video, but for static image work Opus 4.7 is state of the art.
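Given the 2,576px long-edge ceiling cited above, it is easy to check whether an image would be sent at native resolution or scaled down first. This is a minimal sketch of that arithmetic; the function names are ours, and aspect-preserving downscaling is an assumption about client behavior, not documented API behavior.

```python
def fits_native(width: int, height: int, max_long_edge: int = 2576) -> bool:
    """True if the image's long edge is within the stated ceiling."""
    return max(width, height) <= max_long_edge

def downscale_dims(width: int, height: int, max_long_edge: int = 2576) -> tuple[int, int]:
    """Scale an oversized image so its long edge meets the ceiling,
    preserving aspect ratio. Returns the original size if it already fits."""
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    scale = max_long_edge / long_edge
    return round(width * scale), round(height * scale)

# A 4K screenshot (3840x2160) exceeds the ceiling and would be downscaled:
print(fits_native(3840, 2160))     # False
print(downscale_dims(3840, 2160))  # (2576, 1449)
```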

Pricing

This is the sleeper story. Opus 4.7 matches Opus 4.6 at $5/$25 per million tokens. GPT-5.4 sits at a comparable tier, and Gemini 3.1 Pro is cheaper but behind on coding. Frontier performance with zero price increase is rare in April 2026 — most releases have trended more expensive, not flat.
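At the $5/$25 per-million-token rates quoted here, per-request cost is simple arithmetic. A minimal sketch, using only the prices stated in this post:

```python
INPUT_PER_MTOK = 5.00    # USD per million input tokens (Opus 4.7 list price)
OUTPUT_PER_MTOK = 25.00  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single call at Opus 4.7 list prices."""
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

# A typical code-review call: 40k tokens in, 3k tokens out
print(f"${request_cost(40_000, 3_000):.3f}")  # $0.275
```

Batch discounts or prompt caching, if offered, would change the numbers; this covers list price only.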

The New "xhigh" Effort Mode: Why It Matters

Until now, Claude’s reasoning effort slider had four positions: low, medium, high, and max. Opus 4.7 inserts a new xhigh step between high and max.

Why does another tier matter? Because "max" effort is slow and expensive, and "high" often is not deep enough for complex engineering problems. The new xhigh level hits the sweet spot: noticeably deeper reasoning than high, but considerably faster and cheaper than max. Developers doing code review, architecture analysis, or multi-file refactors now have a tier tuned exactly for those workloads.
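The post does not show how xhigh is selected in an API call, so the following only sketches a plausible request body. The `effort` field name and the model ID are assumptions made for illustration; check Anthropic's API reference for the real parameter shape. The sketch builds the payload without sending it:

```python
VALID_EFFORT = {"low", "medium", "high", "xhigh", "max"}

def build_request(prompt: str, effort: str = "xhigh") -> dict:
    """Construct a hypothetical Messages-style request body selecting an
    effort tier. Field names here are assumptions, not documented API."""
    if effort not in VALID_EFFORT:
        raise ValueError(f"effort must be one of {sorted(VALID_EFFORT)}")
    return {
        "model": "claude-opus-4-7",  # hypothetical model ID
        "max_tokens": 4096,
        "effort": effort,            # the new tier between high and max
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Review this diff for concurrency bugs.")
print(payload["effort"])  # xhigh
```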

Paired with the new /ultrareview command in Claude Code — a dedicated review mode for pull requests — and public-beta task budgets that cap spend on long agent runs, Anthropic is signaling that Opus 4.7 is built for production teams, not just benchmark headlines.
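Task budgets are in public beta and the post gives no API details, so here is a client-side sketch of the underlying idea: meter cumulative token usage across an agent run and stop before it exceeds a cap. All names are hypothetical; the real feature presumably enforces this server-side.

```python
class BudgetExceeded(Exception):
    """Raised when a step would push the run past its token cap."""

class TaskBudget:
    """Illustrative client-side token cap for a long agent run."""
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.spent = 0

    def charge(self, tokens: int) -> None:
        """Record a step's token usage, refusing it if the cap would be broken."""
        if self.spent + tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.spent + tokens} > {self.max_tokens}")
        self.spent += tokens

budget = TaskBudget(max_tokens=50_000)
for step_tokens in [12_000, 18_000, 15_000]:  # per-step usage from an agent loop
    budget.charge(step_tokens)
print(budget.spent)  # 45000
```

The payoff is predictability: a runaway agent loop fails fast at the cap instead of quietly burning tokens.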

What This Means for Users

For developers, Claude Opus 4.7 is the new default for serious coding work. If you were paying for Opus 4.6 yesterday, you are already on 4.7 today at the same price — but your prompts may need light re-tuning because instruction following is tighter.

For product teams building agents, Managed Agents plus task budgets plus xhigh effort mode mean more predictable runs, fewer surprise token bills, and better completion rates on multi-step tasks. Expect a wave of agent-first startups to switch their default model this week.

For enterprises evaluating frontier AI, the benchmarks matter less than the price stability. Holding the $5/$25 line at frontier quality means total cost of ownership stops climbing. That is the first time in over a year that has been true.

For everyday Claude.ai users, the experience feels sharper and more capable — especially on image-heavy tasks and long technical documents — without any changes to the interface.

Key Takeaways

  • Claude Opus 4.7 launched April 16, 2026 across Claude.ai, API, Bedrock, Vertex AI, and Microsoft Foundry.
  • It leads SWE-bench Verified (87.6%) and SWE-bench Pro (64.3%), beating GPT-5.4 and Gemini 3.1 Pro on coding.
  • Vision input jumps to 2,576px on the long edge with 98.5% accuracy — ideal for dense screenshots and diagrams.
  • The new xhigh effort mode bridges high and max for faster, deeper reasoning.
  • Task budgets and the /ultrareview command target production developer workflows.
  • Price is unchanged from Opus 4.6: $5 input / $25 output per million tokens.
  • Managed Agents boosts agentic task success by up to 10 points in Anthropic’s internal tests.

FAQ

When was Claude Opus 4.7 released?

Claude Opus 4.7 was released on April 16, 2026, by Anthropic. It is live immediately on Claude.ai, the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

Is Claude Opus 4.7 better than GPT-5.4 and Gemini 3.1 Pro?

On coding benchmarks, yes. Claude Opus 4.7 scores 87.6% on SWE-bench Verified versus Gemini 3.1 Pro’s 80.6%, and it leads SWE-bench Pro with 64.3%. GPT-5.4 still leads on OSWorld-V computer-use tasks, and Gemini 3.1 leads on long video understanding. The right model depends on your workload.

How much does Claude Opus 4.7 cost?

Pricing is unchanged from Claude Opus 4.6: $5 per million input tokens and $25 per million output tokens. Claude.ai Pro and Team subscribers get access at their existing subscription price.

What is the xhigh effort mode in Claude Opus 4.7?

The xhigh effort mode is a new reasoning setting that sits between "high" and "max." It delivers deeper reasoning than high while remaining faster and cheaper than max — useful for complex engineering, code review, and multi-step analysis where max is overkill.

Do I need to change my prompts for Claude Opus 4.7?

Anthropic recommends re-tuning existing prompts. Opus 4.7 follows instructions more literally than Opus 4.6, so overly loose prompts may produce different behavior. Most developers report only minor edits are needed.

What is the context window for Claude Opus 4.7?

Claude Opus 4.7 supports a 1 million token context window on Enterprise and the API, the same as Opus 4.6. It can process long documents, entire codebases, and extensive multi-turn agent histories in a single session.
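For a back-of-the-envelope sense of what fits in a 1M-token window, a common rough heuristic is about four characters per token for English text and code. This sketch uses that heuristic only; use a real tokenizer for billing-accurate counts.

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English/code.
    Not a tokenizer; use one for accurate counts."""
    return max(1, len(text) // 4)

def fits_context(texts: list[str], context_window: int = 1_000_000) -> bool:
    """Check whether a set of documents would fit in the context window,
    ignoring any per-message overhead."""
    return sum(rough_token_estimate(t) for t in texts) <= context_window

# ~4 million characters of source is roughly the 1M-token ceiling:
print(rough_token_estimate("a" * 4_000_000))  # 1000000
```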

Is Claude Opus 4.7 available in Claude Code?

Yes. Claude Code users now have Opus 4.7 as the default engine, plus the new /ultrareview command for dedicated code-review sessions with deeper inspection than the standard review flow.

Conclusion

Claude Opus 4.7 is not a headline-grabbing new architecture. It is something arguably more important: a steady, measurable step up in the coding, reasoning, and agentic work that frontier models are actually used for — at a flat price, with production-grade tooling attached. In a quarter where every other lab has raised prices alongside capability, Anthropic’s discipline here will be felt in budgets and roadmaps well beyond April 2026.

If your stack touches code, agents, or long-context work, Opus 4.7 is the model to try this week.