Claude Haiku 4.5 on VM0: fast, cheap routing
The fast, cheap Claude. Good enough for routing, short summarisation, and simple tool calls at a fraction of Sonnet's cost.
200K tokens · Text / Vision / Code · Prompt cache
Claude Haiku 4.5 is the small, fast Claude — replies run at roughly 97 output tokens per second (four to five times faster than Sonnet 4.5) and you can run it across a lot of traffic without watching the bill spiral.
It's still a real Claude: vendor-reported SWE-bench Verified hits 73.3%, only a few points behind Sonnet 4.5 at a third of the cost, and Augment's agentic-coding evaluation reportedly puts it at 90% of Sonnet 4.5's performance.
Vendor list price is $1 / $5 per 1M tokens with cached input at $0.10 / 1M. Reach for it as the high-throughput worker or sub-agent under a Sonnet- or Opus-led system; skip it when the loop is long and multi-step.
What is Claude Haiku 4.5?
Late 2025 (Claude 4.5 generation) · Smallest tier of the Claude 4.5 family. Anthropic's high-throughput, low-latency option.
Claude Haiku 4.5 is the small, fast member of the Claude 4.5 family, built for latency-sensitive and high-volume work where Sonnet would be overkill: single-tool calls, fast classification, short summarisation, and simple Slack replies.
Haiku 4.5 is remarkably capable for its tier. Anthropic's vendor-reported SWE-bench Verified score is 73.3%, only ~4 points behind Sonnet 4.5 at one-third the cost. In Augment's agentic-coding evaluation it reportedly hits 90% of Sonnet 4.5's performance, which puts it in genuine sub-agent territory.
Despite being the small Claude, Haiku 4.5 is multimodal (vision-capable), supports prompt caching, and runs at ~97 tokens/sec, comfortably the fastest model in our Built-in lineup.
What's notable about Claude Haiku 4.5
Headline architecture and capability features.
Haiku 4.5 ships with a 200K-token context window, multimodal input across text, vision, and code, and prompt caching that bills cached input at one-tenth the input rate. Output runs at roughly 97 tokens per second, four to five times faster than Sonnet 4.5.
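As a concrete sketch of how the cached-input discount is wired up, the request below marks a long, stable system prompt for caching, following the Anthropic Messages API shape. The `claude-haiku-4-5` model id and the prompt text are placeholder assumptions; it only builds the request body, it doesn't send it.

```python
# Request-body sketch with prompt caching, assuming the Anthropic Messages
# API content format and a hypothetical "claude-haiku-4-5" model id.
# The cache_control marker on the system prompt lets repeat calls bill
# those tokens at the cached-input rate (one-tenth of the input rate).

STABLE_SYSTEM_PROMPT = (
    "You are a triage assistant. Label each incoming message with exactly "
    "one category and nothing else."
)  # stays byte-identical across calls, which is what makes caching pay off

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-haiku-4-5",   # hypothetical id; check your provider's model list
        "max_tokens": 256,
        "system": [
            {
                "type": "text",
                "text": STABLE_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # cache everything up to this block
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Checkout page 500s when I apply a coupon")
```

Only the prefix up to the `cache_control` marker is cached, so keep the volatile per-message content out of the system block.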
Specs at a glance
Claude Haiku 4.5 benchmarks
Vendor-reported numbers from Anthropic's Haiku 4.5 launch materials. Note that OpenAI flagged training-data contamination on SWE-bench Verified across all frontier models. Treat absolute numbers cautiously, but the relative ordering is robust.
Claude Haiku 4.5 pricing
Provider list price, per 1M tokens.
How Claude Haiku 4.5 behaves in practice
Observed behaviour from production agent runs.
Speed
Fastest model in the Built-in lineup at ~97 tokens/sec. Reply latency is short enough for interactive Slack agents.
Routing accuracy
Good enough for single-tool flows; multi-tool routing is meaningfully behind Sonnet 4.6 on edge cases. Keep tool schemas tight.
Reasoning
Holds up on short tasks; loses track on long multi-step loops. Use it as a worker, not a planner.
Cost
Lowest cost in the Claude family on VM0. Prompt caching makes it the cheapest practical Anthropic option for repeated prompts.
Best agent tasks for Claude Haiku 4.5
The Slack triage agent that feels instant
Reads every incoming message, classifies it ("bug report", "sales lead", "meeting request"), routes it to the right channel, and posts an acknowledgment in under two seconds. At Sonnet's speed the same flow would feel laggy; at Haiku's ~97 tokens per second it feels like the bot is actually paying attention in real time.
The sub-agent under a Sonnet or Opus planner
Sonnet (or Opus) picks the strategy and breaks the request into ten narrow steps; Haiku executes each one. "Pull this CRM field, summarise this email, format this list" — none of those steps need flagship reasoning, and routing them to Haiku instead of running the whole loop on Sonnet drops the per-conversation cost dramatically without changing the output quality.
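One way to wire that split is a per-step model picker: the planner emits narrow steps, and only steps it flags as reasoning-heavy stay on the expensive model. Model ids and the `needs_reasoning` flag are hypothetical placeholders for however your orchestrator tags steps.

```python
# Planner/worker routing sketch. The planner model decomposes the request;
# each narrow step runs on the cheap worker unless flagged as needing the
# planner's reasoning. Model ids are hypothetical placeholders.

PLANNER = "claude-sonnet-4-6"   # strategy, decomposition, hard edits
WORKER = "claude-haiku-4-5"     # narrow single-tool execution

def pick_model(step: dict) -> str:
    return PLANNER if step.get("needs_reasoning") else WORKER

plan = [
    {"task": "pull CRM field",        "needs_reasoning": False},
    {"task": "summarise email",       "needs_reasoning": False},
    {"task": "format list",           "needs_reasoning": False},
    {"task": "draft reply strategy",  "needs_reasoning": True},
]
assignments = [pick_model(s) for s in plan]
```

With three of four steps on the ×0.3 worker, most of the loop's tokens bill at the cheap rate while the one step that decides the run stays on the planner.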
The bulk classifier that runs on every record
Tag a million tickets, extract the structured fields out of last quarter's email backlog, route a stream of inbound forms. Haiku's low per-token cost plus prompt caching on the (stable) system prompt means the unit cost per record is essentially noise on the budget, which is what makes "classify everything" workflows actually viable.
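A back-of-envelope version of that unit-cost claim, using the list prices quoted above ($1 / $5 per 1M tokens, cached input at $0.10 / 1M); the per-record token counts are illustrative assumptions:

```python
# Per-record cost sketch at Haiku 4.5 list prices. Token counts per record
# (2,000-token cached system prompt, 300 fresh input tokens, 30 output
# tokens) are illustrative assumptions, not measurements.

INPUT, OUTPUT, CACHED = 1.00, 5.00, 0.10   # $ per 1M tokens

def cost_per_record(cached_prompt_tok: int, fresh_input_tok: int, output_tok: int) -> float:
    return (cached_prompt_tok * CACHED
            + fresh_input_tok * INPUT
            + output_tok * OUTPUT) / 1_000_000

per_record = cost_per_record(2_000, 300, 30)       # $0.00065 per ticket
per_million_records = per_record * 1_000_000       # ~$650 for a million tickets
```

At roughly $650 to tag a million tickets, the unit cost really is noise on most budgets, and the cached system prompt is doing a lot of the work: uncached, the same 2,000-token prompt alone would roughly quadruple the per-record input cost.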
The vision micro-task that needs to be fast
OCR a screenshot, identify what type of chart it is, pull a number out of a receipt image. Haiku 4.5 is multimodal and very fast, which means a UI agent that takes a screenshot every few seconds and asks "what just changed?" stays responsive instead of stuttering.
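For reference, an image goes into the same message format as text, following the Anthropic Messages API content-block shape; the screenshot bytes here are placeholder data and the request is only constructed, not sent:

```python
# Vision message sketch, assuming the Anthropic Messages API content-block
# format: a base64-encoded image block followed by the text question.
# The PNG bytes are placeholder data.

import base64

def build_vision_message(png_bytes: bytes, question: str) -> dict:
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                },
            },
            {"type": "text", "text": question},
        ],
    }

msg = build_vision_message(b"\x89PNG placeholder", "What changed since the last screenshot?")
```

Putting the image block before the question matches the usual ordering for vision prompts: the model reads the screenshot, then the instruction about it.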
When to skip Claude Haiku 4.5
Skip Haiku 4.5 on long multi-step agent loops where it drops instructions after several turns, and on hard reasoning or code edits where Sonnet 4.6 or Opus 4.7 is the right call.
Claude Haiku 4.5 vs other models
Claude Haiku 4.5 vs Claude Sonnet 4.6
Sonnet (×1) is the default for full agents. Haiku (×0.3) is the right pick when speed and cost matter more than long-loop coherence, typically as a worker under a Sonnet/Opus planner. Vendor benchmarks put Haiku within ~4 points of Sonnet 4.5 on SWE-bench Verified.
Claude Haiku 4.5 vs DeepSeek V4 Flash
DeepSeek V4 Flash (×0.02) is much cheaper, but its tool use is weaker and it is less reliable on multi-step loops. Use Flash for one-shot bulk work; use Haiku for short interactive Slack-style replies.
Claude Haiku 4.5 vs MiniMax M2.7
MiniMax M2.7 (×0.1) is cheaper and stronger on multilingual tasks. Haiku 4.5 leads on English-language tool-routing reliability and is multimodal.
Bottom line: should you use Claude Haiku 4.5?
The Claude you put behind heavy load. Triage, classification, sub-agent under a Sonnet/Opus orchestrator — yes. Planner role — no, that's Sonnet's job.
Frequently asked questions
Is Haiku 4.5 multimodal?
Yes. Haiku 4.5 accepts image inputs alongside text and code, so vision-driven agents work natively.
How fast is Haiku 4.5?
Anthropic reports ~97 tokens per second, four to five times faster than Sonnet 4.5 and the fastest model in our Built-in lineup.
When should I pick Haiku over Sonnet?
Pick Haiku when (a) the agent loop is short (usually under 5 turns), (b) latency matters more than reasoning depth, or (c) you need a cheap sub-agent under a Sonnet/Opus orchestrator.
Can Haiku run multi-tool agents?
It can, but accuracy drops on edge cases compared to Sonnet 4.6. Keep the tool surface narrow and the loop short, or fall back to Sonnet.
What's Haiku 4.5's SWE-bench score?
Anthropic reports 73.3% on SWE-bench Verified, within ~4 points of Sonnet 4.5 at one-third the cost. On the harder SWE-bench Pro it scores 39.5% (Scale AI).
Alternatives
Using Claude Haiku 4.5 on VM0
Two ways to access Claude Haiku 4.5 on VM0
VM0 supports Claude Haiku 4.5 as a Built-in model billed in VM0 credits, and through bring-your-own with an Anthropic API key. The Built-in path uses VM0 Managed routing and the credit multiplier explained below; the bring-your-own path bills you directly with the upstream vendor and skips the VM0 credit conversion entirely.
VM0's recommendation
VM0 positions Claude Haiku 4.5 as a cost-saving option rather than a core agent model. Use it to optimise unit cost on non-core work, such as bulk classification, pre-filters, latency-critical short replies, or pinned legacy agents, while keeping Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6 on the steps that decide the run.
Credits and the ×0.3 multiplier
Every Built-in model on VM0 is priced as a multiple of Claude Sonnet 4.6, which sits at the ×1 credit baseline. Claude Haiku 4.5 bills at ×0.3 credits. The multiplier is what shows up on your VM0 invoice; the vendor list price in the pricing table above is what the upstream provider charges before VM0 converts it into credits.
At ×0.3, a step on Haiku costs 0.3 times the credits of the equivalent step on Sonnet 4.6, which makes it the natural pick for high-volume background work where cost-per-step matters more than peak reasoning quality.
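The multiplier arithmetic is straightforward; the baseline step size below is an illustrative figure, not a real quote:

```python
# Credit arithmetic for the ×0.3 multiplier: a step that costs C credits
# on Sonnet 4.6 (the ×1 baseline) costs 0.3 * C on Haiku 4.5.
# The 10-credit baseline step is a hypothetical example.

HAIKU_MULT = 0.3

baseline_step_credits = 10.0                              # hypothetical Sonnet 4.6 step
haiku_step_credits = baseline_step_credits * HAIKU_MULT   # 3.0 credits
savings_fraction = 1 - HAIKU_MULT                         # ~70% fewer credits per step
```

Over a loop where most steps are routed to Haiku, that ~70% per-step saving compounds across the whole conversation.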
Available on VM0 since launch.