Claude Opus 4.7
Anthropic's flagship Claude 4 model. The strongest pick in the family for long-horizon agent loops, hard reasoning, and first-attempt code edits.
1M tokens · Text / Vision / Code · Prompt cache
Claude Opus 4.7 is the model you reach for when the work has to be right the first time: code that compiles cleanly, multi-step plans that don't lose the thread across long tool chains, abstract puzzles smaller models stumble on. Vendor benchmarks (SWE-bench Verified, Terminal-Bench 2.0, ARC AGI 2, OSWorld, BrowseComp) put concrete numbers on the gains over Opus 4.6.
Vendor list price is $5 / $25 per 1M tokens (the highest in the Claude family), with cached input at $0.50 / 1M. The cost-effective pattern is to keep Sonnet 4.6 as the default and route only the hardest steps to Opus.
What is Claude Opus 4.7?
April 2026 (succeeding Opus 4.6) · Top-tier of the Claude 4 family. Anthropic's recommended upgrade for users on Opus 4.6.
Claude Opus 4.7 is the flagship of Anthropic's Claude 4 family, released in April 2026 as the recommended upgrade from Opus 4.6. Anthropic frames it as a step-change improvement on agentic coding and abstract reasoning rather than a refresh on the surface API. The 1M-token context window and adaptive-thinking effort levels introduced in 4.6 carry over unchanged, so existing agent code drops in without rewrites.
Compared to Sonnet 4.6 (the workhorse in the same family), Opus invests more compute per token. The behavioural payoff shows up in three places: fewer dropped instructions on long agent loops, materially better first-attempt code patches, and stronger recall once the conversation history grows past 100K tokens. The trade-off is the highest list price in the Claude family ($5 / $25 per 1M tokens) and slower per-token output speed, which is why Anthropic itself positions Opus as the orchestrator or escalation tier rather than the everywhere-default.
Independent leaderboards (Artificial Analysis, Vellum) corroborate the relative ordering against Opus 4.6, but absolute numbers shift weekly and OpenAI has flagged training-data contamination on SWE-bench Verified across all frontier models. Treat the public scores as directional rather than authoritative; the structured behavioural differences (long-loop coherence, first-attempt patch quality, multi-tool routing reliability) are the more durable signal.
What's notable about Claude Opus 4.7
Headline architecture and capability features.
Opus 4.7 keeps the 1M-token context window from Opus 4.6, billed at standard input pricing across the entire window. It supports adaptive thinking at four effort levels (low, medium, high, and max), a Compaction API for server-side context summarisation on long runs, and prompt caching where cached input bills at one-tenth the input rate. Multi-agent and tool-use surfaces are unchanged from 4.6, including the Mailbox Protocol for peer-to-peer agent teams and the inference_geo parameter that exposes US-only inference at a 1.1× multiplier. Inputs are multimodal across text, vision, and code.
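As a rough sketch of how these surfaces compose in one request: the payload below targets the real Messages endpoint, but the model identifier and the effort / inference_geo field shapes follow the names used on this page and are assumptions, not confirmed wire format.

```python
import os
import requests

# Sketch of a single Opus 4.7 call combining the surfaces above.
# NOTE: "effort" and "inference_geo" follow the names used on this page;
# their exact field names and placement in the body are assumptions.
payload = {
    "model": "claude-opus-4-7",   # assumed model identifier
    "max_tokens": 64000,          # up to 64K output tokens per response
    "effort": "high",             # adaptive thinking: low|medium|high|max (assumed field)
    "inference_geo": "us",        # US-only inference at the 1.1x multiplier (assumed field)
    "system": [
        {
            "type": "text",
            "text": "You are the orchestrator agent for a ten-step plan...",
            "cache_control": {"type": "ephemeral"},  # cached input bills at 1/10 the input rate
        }
    ],
    "messages": [{"role": "user", "content": "Plan the migration in ten steps."}],
}

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json=payload,
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["content"][0]["text"])
```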
Specs at a glance

| Spec | Claude Opus 4.7 |
| --- | --- |
| Context window | 1M tokens, billed at standard input rates across the window |
| Max output | 64K tokens per response |
| Modalities | Text / Vision / Code |
| Thinking | Adaptive, four effort levels (low, medium, high, max) |
| Pricing (per 1M tokens) | $5 input / $25 output / $0.50 cached input |
| Released | April 2026, succeeding Opus 4.6 |
Claude Opus 4.7 benchmarks
Vendor-reported scores from Anthropic's Opus 4.7 release materials, with deltas shown against the public Opus 4.6 numbers. Independent reviews place 4.7 ahead of GPT-5.2 on most agentic-coding tasks and within a few points of Gemini 3 Pro on abstract reasoning. Treat absolute percentages as directional; OpenAI has flagged training-data contamination on SWE-bench Verified across all frontier models.
Claude Opus 4.7 pricing
Provider list price, per 1M tokens.

| Token type | Price per 1M |
| --- | --- |
| Input | $5.00 |
| Cached input | $0.50 |
| Output | $25.00 |
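To make the list prices concrete, a small cost helper using only the numbers above (vendor list price; the VM0 credit conversion is covered further down):

```python
# Claude Opus 4.7 vendor list prices, USD per 1M tokens (from the table above).
PRICE_IN, PRICE_CACHED, PRICE_OUT = 5.00, 0.50, 25.00

def call_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """USD cost of one call at list price; cached tokens bill at the cached rate."""
    fresh = input_tokens - cached_tokens
    return (fresh * PRICE_IN + cached_tokens * PRICE_CACHED
            + output_tokens * PRICE_OUT) / 1_000_000

# A 50K-token prompt, 40K of it served from cache, producing a 2K-token answer:
print(f"${call_cost(50_000, 2_000, cached_tokens=40_000):.2f}")  # $0.12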
How Claude Opus 4.7 behaves in practice
Observed behaviour from production agent runs.
Tool routing
Lowest rate of mis-routed tool calls in the Claude family. The gap versus Sonnet 4.6 widens on hard edge cases such as conditional tool selection, deeply nested arguments, and tool calls dispatched after long stretches of reasoning.
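For illustration, a hypothetical pair of tool definitions of the shape that exposes this gap: two near-overlapping tools where the correct choice depends on order state, one with deeply nested arguments. The schema format follows Anthropic's standard tool-definition shape; the tools themselves are invented.

```python
# Hypothetical tools: correct routing depends on a condition (order status),
# and one tool takes deeply nested arguments.
tools = [
    {
        "name": "refund_order",  # correct choice only for orders already paid
        "description": "Refund a paid order. Only valid if the order status is 'paid'.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "refund": {
                    "type": "object",
                    "properties": {
                        "amount_cents": {"type": "integer"},
                        "reason": {"type": "string", "enum": ["damaged", "late", "other"]},
                    },
                    "required": ["amount_cents", "reason"],
                },
            },
            "required": ["order_id", "refund"],
        },
    },
    {
        "name": "cancel_order",  # correct choice for orders not yet paid
        "description": "Cancel an order that has not been paid yet.",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
]
```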
Long-context recall
Coherent across 200K+ token agent transcripts. The 1M-token window holds up far better than predecessors thanks to the context-rot improvements Anthropic introduced in Opus 4.6 and refined further for 4.7. Vendor-reported MRCR v2 at 1M shows a measurable lift over Opus 4.6's 76%.
First-attempt code edits
Strongest patch quality in the Claude family. The right pick when an agent has to modify code that must keep compiling and passing tests, especially when the patch spans multiple files. Anthropic's Terminal-Bench 2.0 result reflects this directly.
Speed
Slower than Sonnet 4.6 and noticeably slower than Haiku 4.5. Anthropic publishes ~41 tokens/sec at max effort for Opus 4.6, and 4.7 is in a similar range. Reserve it for the steps that actually need the extra reasoning depth and run lighter tiers in parallel.
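One way to act on that advice, sketched with the official Python SDK (the model identifiers are assumptions): fan the cheap, independent steps out to Haiku in parallel and reserve Opus for the single step that needs the depth.

```python
import asyncio
from anthropic import AsyncAnthropic  # official SDK; the model IDs below are assumed

client = AsyncAnthropic()
docs = ["first source text ...", "second source text ..."]  # placeholder inputs

async def run_step(model: str, prompt: str) -> str:
    msg = await client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

async def main() -> None:
    # Fan the cheap, independent steps out to the fast tier in parallel...
    summaries = await asyncio.gather(
        *(run_step("claude-haiku-4-5", "Summarise:\n" + d) for d in docs)
    )
    # ...and spend slower Opus tokens only on the step that needs the depth.
    print(await run_step("claude-opus-4-7",
                         "Synthesise a plan from:\n" + "\n\n".join(summaries)))

asyncio.run(main())
```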
Hallucination behaviour
Opus 4.7 retains Anthropic's conservative refusal posture and tends to admit uncertainty rather than confabulate. That reliability is why production teams keep paying the premium for high-stakes reasoning, even as cheaper open-weight alternatives like Kimi K2.6 and DeepSeek V4 Pro now match it on benchmarks.
Best agent tasks for Claude Opus 4.7
The PR review that catches what humans miss
When a pull request changes 30 files, Opus 4.7 keeps the entire change in working memory and writes a review that ties what changed in auth/middleware.ts to the test it broke in routes/admin.test.ts. Junior reviewers get the kind of cross-file findings that senior engineers usually only catch on a second pass, and the team ships fewer patches that pass CI but break in production.
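A minimal sketch of that workflow, assuming a claude-opus-4-7 model identifier: pull the full multi-file diff so the cross-file interactions land in one context, then ask for the review.

```python
import subprocess
from anthropic import Anthropic  # the model identifier below is assumed

# Pull the whole multi-file diff so cross-file interactions stay in one context.
diff = subprocess.run(
    ["git", "diff", "main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout

client = Anthropic()
review = client.messages.create(
    model="claude-opus-4-7",  # assumed identifier
    max_tokens=4096,
    system="Review this pull request. Tie every finding to the exact file and "
           "line it affects, including effects on files the diff doesn't touch.",
    messages=[{"role": "user", "content": diff}],
)
print(review.content[0].text)
```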
The research run that reads the whole pile
Drop a 200-page contract draft, three competitor proposals, and last quarter's legal opinions into the 1M-token context window, then ask Opus to flag every clause that's tighter than market and list the likely negotiation points. Smaller models start dropping earlier sections after 100K tokens; Opus keeps the whole picture in view and references the exact paragraph it's quoting.
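A sketch of the same pattern in code, assuming a claude-opus-4-7 model identifier and a local deal_docs/ folder of plain-text sources:

```python
from pathlib import Path
from anthropic import Anthropic  # model identifier and deal_docs/ folder are assumed

# Concatenate the whole pile into a single request; the 1M-token window
# is what lets every source stay in view at once.
pile = "\n\n---\n\n".join(
    f"## {p.name}\n{p.read_text()}" for p in sorted(Path("deal_docs").glob("*.txt"))
)

client = Anthropic()
msg = client.messages.create(
    model="claude-opus-4-7",  # assumed identifier
    max_tokens=8192,
    messages=[{
        "role": "user",
        "content": pile + "\n\nFlag every clause tighter than market, list the "
                          "likely negotiation points, and quote the exact "
                          "paragraph each time.",
    }],
)
print(msg.content[0].text)
```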
The orchestrator running a multi-tool plan
Use Opus 4.7 as the planner that breaks a customer's request into ten steps, dispatches each step to a Sonnet- or Haiku-tier sub-agent, and stitches the results back together. Running Opus only at the planner layer (and the cheaper tiers everywhere else) costs a fraction of running Opus end-to-end, with most of the quality preserved.
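A compact version of that split, with all model identifiers assumed: Opus plans once, Sonnet executes the steps, Opus stitches the results back together.

```python
import json
from anthropic import Anthropic  # all model identifiers below are assumed

client = Anthropic()
request_text = "Migrate our nightly billing exports from CSV to Parquet."  # example input

def ask(model: str, prompt: str, max_tokens: int = 2048) -> str:
    msg = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

# Opus plans once, at the layer where decisions cascade.
plan = json.loads(ask(
    "claude-opus-4-7",
    "Break this request into at most ten steps. "
    "Reply with a JSON array of strings only:\n" + request_text,
))

# Cheaper tiers execute each step...
results = [ask("claude-sonnet-4-6", step) for step in plan]

# ...and Opus stitches the results back together.
print(ask("claude-opus-4-7",
          "Combine these step results into one answer:\n\n" + "\n\n".join(results)))
```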
The first-try code edits that don't waste a CI run
Ask Opus 4.7 to migrate a 50-file codebase from one ORM to another, refactor a tangled module, or apply a security fix across the repo. The patch applies cleanly on the first attempt more often than any other model in the family, which is what vendor-reported Terminal-Bench 2.0 reflects, and what your CI bill will reflect too.
When to skip Claude Opus 4.7
Skip Opus 4.7 on high-volume routine work where Sonnet 4.6 hits the same quality bar at a fraction of the cost, on latency-sensitive chat replies where Haiku 4.5 is much faster, and on bulk classification or extraction jobs where DeepSeek V4 Flash is roughly 80× cheaper at the vendor level.
Claude Opus 4.7 vs other models
Claude Opus 4.7 vs Claude Sonnet 4.6
Sonnet 4.6 is the workhorse default in the Claude family and the right pick for most agents. Promote to Opus 4.7 only when Sonnet visibly fails on hard reasoning, long context, or first-attempt code edits, usually as the orchestrator that delegates downward to Sonnet- or Haiku-tier sub-agents.
Claude Opus 4.7 vs Claude Opus 4.6
Same context window (1M tokens), same vendor pricing, and the same adaptive-thinking architecture. Opus 4.7 is the newer generation with vendor-reported gains across SWE-bench Verified, Terminal-Bench 2.0, ARC AGI 2, and OSWorld. Pick 4.7 for new agents; pin 4.6 only when an existing agent has been validated against that version and you need behaviour stability.
Claude Opus 4.7 vs Kimi K2.6
Moonshot's Kimi K2.6 leads several agentic benchmarks at the open-source frontier (vendor-reported SWE-bench Pro 58.6 versus Opus 4.6's 53.4). Opus 4.7 retains the lead on tool-routing reliability for production English-language agents and has the stronger safety profile, which is why most enterprise teams still keep it as the high-stakes tier.
Claude Opus 4.7 vs DeepSeek V4 Pro
DeepSeek V4 Pro trails Opus on most reasoning benchmarks but matches it on coding (vendor-reported SWE-bench Verified within ~0.2 points). The split is straightforward: pick DeepSeek when raw cost dominates, pick Opus 4.7 when reliability, safety profile, or tool-routing accuracy matter more than per-call price.
Claude Opus 4.7 vs GPT-5.2 / Gemini 3 Pro
Anthropic's vendor materials position Opus 4.7 ahead of GPT-5.2 on most agentic-coding tasks (Terminal-Bench, τ2-bench Retail) and within a few points of Gemini 3 Pro on abstract reasoning (ARC AGI 2, GPQA Diamond). Independent leaderboards corroborate the rough ordering but shift weekly.
Bottom line: should you use Claude Opus 4.7?
Opus 4.7 is the escalation tier. Default to Sonnet 4.6; promote to Opus only on the specific steps where Sonnet visibly fails.
Frequently asked questions
What is Claude Opus 4.7's context window?
1 million tokens, with up to 64K tokens of output per response. The full window bills at standard rates. A 900K-token request is the same per-token rate as a 9K-token request.
Can Claude Opus 4.7 handle images?
Yes. Opus 4.7 is multimodal. It accepts image inputs alongside text and code, so screenshot-driven and document-vision agents work natively.
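For example, a screenshot-driven call using the standard base64 image content block (the model identifier is an assumption):

```python
import base64
from pathlib import Path
from anthropic import Anthropic  # the model identifier below is assumed

# Encode a local screenshot as a base64 image content block.
screenshot = base64.standard_b64encode(Path("dashboard.png").read_bytes()).decode()

client = Anthropic()
msg = client.messages.create(
    model="claude-opus-4-7",  # assumed identifier
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/png", "data": screenshot}},
            {"type": "text", "text": "Which metric on this dashboard looks anomalous?"},
        ],
    }],
)
print(msg.content[0].text)
```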
When should I pick Opus 4.7 over Sonnet 4.6?
When (a) the agent is the planner / orchestrator and decisions cascade, (b) the run is long enough that Sonnet starts dropping instructions, or (c) the output must apply cleanly on the first attempt (code edits, structured payloads).
Should I migrate from Opus 4.6 to Opus 4.7?
Yes. Anthropic explicitly recommends 4.7 over 4.6. Same multiplier, stronger behaviour. Migrate pinned production agents only after running them through your regression suite.
Does Opus 4.7 support prompt caching?
Yes. Cached input bills at $0.50 per 1M tokens, a 10× discount on the cached portion. Worth using whenever your system prompt or tool schema is stable across calls.
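Back-of-the-envelope arithmetic on what that discount is worth for a stable 30K-token prefix reused across 1,000 calls (this ignores any cache-write premium, which this page doesn't cover):

```python
# Stable 30K-token prefix (system prompt + tool schema) reused across 1,000 calls,
# at Opus 4.7 list prices: $5.00 fresh vs $0.50 cached per 1M input tokens.
calls, prefix_tokens = 1_000, 30_000

uncached = calls * prefix_tokens * 5.00 / 1_000_000   # $150.00
cached   = calls * prefix_tokens * 0.50 / 1_000_000   # $15.00, ignoring the first cache write

print(f"${uncached:.2f} without caching -> ${cached:.2f} with caching")
```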
Alternatives
Within the Claude family: Sonnet 4.6 (the default tier) and Haiku 4.5 (the speed tier). Outside it: Kimi K2.6, DeepSeek V4 Pro, GPT-5.2, and Gemini 3 Pro, compared above.
Using Claude Opus 4.7 on VM0
Two ways to access Claude Opus 4.7 on VM0
VM0 supports Claude Opus 4.7 as a Built-in model billed in VM0 credits, and through bring-your-own with an Anthropic API key. The Built-in path uses VM0 Managed routing and the credit multiplier explained below; the bring-your-own path bills you directly with the upstream vendor and skips the VM0 credit conversion entirely.
VM0's recommendation
VM0 positions Claude Opus 4.7 as a core agent model, recommended alongside Claude Opus 4.6 and Claude Sonnet 4.6 for the steps that drive the actual outcome of an agent run. These are the models we'd pick for the orchestrator role, for code-touching agents, and for any step where a wrong answer is expensive.
Credits and the ×1.7 multiplier
Every Built-in model on VM0 is priced as a multiple of Claude Sonnet 4.6, which sits at the ×1 credit baseline. Claude Opus 4.7 bills at ×1.7 credits. The multiplier is what shows up on your VM0 invoice; the vendor list price in the pricing table above is what the upstream provider charges before VM0 converts it into credits.
Because Claude Opus 4.7 bills at ×1.7, a step here costs 1.7 times the credits of an equivalent step on Sonnet 4.6 (the ×1 baseline). It's a premium tier on VM0, so the cost-effective pattern is to default to a cheaper model and route only the steps that genuinely need the extra reasoning depth to Claude Opus 4.7, as the sketch below illustrates.
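A toy illustration of the routing arithmetic, counting each step as one baseline credit unit for simplicity (a real invoice bills per token, not per step):

```python
# VM0 credit multipliers (Sonnet 4.6 is the x1 baseline; Opus 4.7 bills at x1.7).
MULTIPLIER = {"claude-sonnet-4-6": 1.0, "claude-opus-4-7": 1.7}

# Ten-step run: route only the planner step to Opus, keep execution on Sonnet.
steps = [("claude-opus-4-7", 1), ("claude-sonnet-4-6", 9)]
routed = sum(MULTIPLIER[model] * n for model, n in steps)

print(routed)                              # 10.7 credit-units for the routed plan
print(MULTIPLIER["claude-opus-4-7"] * 10)  # 17.0 if every step ran on Opus
```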
Available on VM0 since April 17, 2026.