DeepSeek V4 Flash on VM0
The cheapest model in the lineup. 50× cheaper than Sonnet 4.6. Surprisingly capable for its tier. Vendor-reported SWE-bench Verified within 1.6 points of V4 Pro.
1M tokens · Text / Code · Prompt cache
DeepSeek V4 Flash is the cost-leader of the V4 generation, engineered for the absolute lowest unit cost in the lineup. It's good at single-shot work where the prompt does most of the lifting: tagging a million tickets, extracting structured fields from email backlogs, scoring reviews, pre-filtering records before the hard cases go to a stronger model. Vendor-reported SWE-bench Verified is 79.0% (within 1.6 points of V4 Pro), but Terminal-Bench 2.0 lags V4 Pro by 11 points, and that gap marks exactly where Flash trails: long multi-step tool chains.
Vendor list price is $0.14 input / $0.28 output per 1M tokens, with cache reads at $0.028 per 1M and free cache writes. Don't put Flash in a planner role; use V4 Pro or Sonnet 4.6 for that. Everywhere else, where cost dominates the decision, nothing competes.
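For a sense of scale, here is a back-of-envelope sketch at the list prices above. The per-record token counts are illustrative assumptions, not measurements:

```python
# Back-of-envelope batch cost at DeepSeek V4 Flash list prices (USD per 1M tokens).
# Per-record token counts below are illustrative assumptions.
INPUT_PER_M = 0.14        # uncached input tokens
OUTPUT_PER_M = 0.28       # output tokens
CACHE_READ_PER_M = 0.028  # cached input tokens (cache writes are free)

records = 1_000_000
prefix_tokens = 2_000   # fixed system prompt, served from cache after the first call
record_tokens = 400     # uncached per-record payload
output_tokens = 50      # short structured answer

per_record = (prefix_tokens * CACHE_READ_PER_M
              + record_tokens * INPUT_PER_M
              + output_tokens * OUTPUT_PER_M) / 1_000_000
print(f"${per_record:.6f} per record, ${per_record * records:,.2f} for the batch")
# -> $0.000126 per record, $126.00 for the batch
```

Under those assumptions a million-record batch lands around a hundred dollars at list price, which is why the rest of this page describes per-record cost as fractions of a cent.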
What is DeepSeek V4 Flash?
April 24, 2026 · Cost-leader of DeepSeek's V4 family. Paired with V4 Pro for reasoning.
DeepSeek V4 Flash is the cost-leader in DeepSeek's V4 generation, released April 24, 2026 alongside V4 Pro. Where V4 Pro is positioned for reasoning, Flash is positioned for the absolute lowest unit cost. A model you can run at very high volumes without thinking about budget.
Flash is a 284B-parameter MoE with 13B active per token (vs Pro's 1.6T / 49B). Both share the same V4 family feature set: 1M-token context, 384K maximum output, three reasoning effort modes, JSON output, and tool calls.
On VM0 it carries a ×0.02 credit multiplier. The lowest in the entire Built-in catalogue. That makes it the default for bulk classification, tagging, extraction, and pre-filter workloads where the prompt does most of the work and the model just needs to follow instructions reliably. It shares the V4 family's free cache-write economics: only cache reads bill.
What's notable about DeepSeek V4 Flash
Headline architecture and capability features.
V4 Flash is a Mixture-of-Experts model with 284B total parameters and 13B active per token, fronted by a 1M-token context window with 384K of maximum output. It exposes three reasoning effort modes (standard, think, and think-max), bills only cache reads (cache writes are free), and ships under the MIT License with open weights.
Specs at a glance
DeepSeek V4 Flash benchmarks
Vendor-reported scores from DeepSeek's V4 release. Flash matches Pro on simpler benchmarks but loses ground on multi-step tool use (Terminal-Bench) and factual recall (SimpleQA). Exactly what you'd expect from the smaller MoE.
DeepSeek V4 Flash pricing
Provider list price, per 1M tokens.
How DeepSeek V4 Flash behaves in practice
Observed behaviour from production agent runs.
Cost
By far the lowest cost in the Built-in lineup. The right pick whenever unit cost dominates the decision.
Single-shot accuracy
Good when the prompt is explicit and the task fits in one or two turns. Drops noticeably when asked to plan, branch, and remember across many steps.
Multi-step tool use
Vendor-reported Terminal-Bench 2.0 is 56.9% (vs V4 Pro's 67.9%). Meaningfully behind on complex multi-step tool flows. Don't put V4 Flash in a planner role.
Context window
1M tokens. Same as V4 Pro and far larger than Anthropic Haiku (200K).
Best agent tasks for DeepSeek V4 Flash
The classifier that runs on every record without flinching
Tag a million tickets by category, route inbound forms to the right team, score every review on the dimensions that matter. Per-record cost on Flash is fractions of a cent, which is what makes "classify everything as it arrives" workflows actually sustainable instead of getting throttled to a sample.
The pre-filter in front of a stronger model
Run V4 Flash on every record first, then route the top 5% (or the cases Flash isn't confident about) up to V4 Pro or Sonnet 4.6. Two-stage pipelines beat single-model pipelines on total cost almost every time — Flash handles the easy 95%, the stronger model only sees the hard 5%, and your bill scales with reasoning need rather than total volume.
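A minimal sketch of that two-stage shape. The classifier wrappers are hypothetical stand-ins for whatever calls V4 Flash and the stronger model in your stack; the routing logic is the point:

```python
from typing import Callable, Tuple

# A classifier takes a record and returns (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]

def route(record: str,
          cheap: Classifier,               # e.g. a wrapper around V4 Flash
          strong: Classifier,              # e.g. a wrapper around V4 Pro or Sonnet 4.6
          confidence_floor: float = 0.85,  # tune against a labelled sample
          ) -> Tuple[str, str]:
    """Classify with the cheap model first; escalate only low-confidence records."""
    label, confidence = cheap(record)
    if confidence >= confidence_floor:
        return label, "cheap"   # most records stop here and bill at the Flash rate
    label, _ = strong(record)   # the hard minority goes to the stronger model
    return label, "strong"
```

The escalation rate (how often `strong` actually runs) is the number to watch: total cost scales with it, not with raw volume.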
The bulk-extraction job that pulls structured data from anywhere
Email backlogs, PDFs, meeting transcripts, scanned invoices: anywhere there's a fixed system prompt asking for the same JSON shape. Flash bills cache reads but not cache writes, so the long fixed prefix that defines the output schema is cached for free and billed at the cheap cache-read rate on every subsequent call, driving the marginal per-document cost close to zero.
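As a sketch of the call shape, assuming the hosted DeepSeek API stays OpenAI-compatible. The base URL, model id, and schema here are illustrative assumptions, not confirmed values:

```python
from openai import OpenAI

# Assumptions: an OpenAI-compatible endpoint and a hypothetical model id.
client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

# Long fixed prefix: this is the part that gets cached across the whole batch.
SYSTEM_PROMPT = (
    "You extract invoice fields. Return JSON with keys: "
    "vendor (string), invoice_date (ISO 8601), total (number), currency (ISO 4217)."
)

def extract(document_text: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",  # hypothetical model id
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # cached prefix
            {"role": "user", "content": document_text},    # per-document payload
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return response.choices[0].message.content
```

The same shape works for email backlogs or transcripts; only the schema in the system prompt and the per-document payload change.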
The long-document one-shot Q&A
Drop a whole book, a 200-page contract, or a codebase into the 1M-token context window and ask a single targeted question. Flash answers in one shot at fractions of a cent per call, cheap enough to run "does this document mention X?" checks at scale, which is one of the workflows agentic loops genuinely don't help with.
When to skip DeepSeek V4 Flash
Skip V4 Flash on multi-step agent loops where it drifts on long tool chains, and on hard reasoning, code edits, or planner roles where V4 Pro or Sonnet 4.6 is the right call.
DeepSeek V4 Flash vs other models
DeepSeek V4 Flash vs DeepSeek V4 Pro
Same vendor; V4 Pro (×0.3) does the reasoning, V4 Flash (×0.02) does the volume. The classic split: Flash as the pre-filter, Pro as the escalator. Vendor-reported SWE-bench Verified is within 1.6 points (79.0 vs 80.6); Terminal-Bench 2.0 favours Pro by 11 points (67.9 vs 56.9).
DeepSeek V4 Flash vs Claude Haiku 4.5
Haiku 4.5 (×0.3) is more reliable on multi-tool routing and faster on interactive flows. V4 Flash (×0.02) wins on raw cost and context size. Pick Flash for batch jobs; pick Haiku for interactive Slack-style replies.
DeepSeek V4 Flash vs MiniMax M2.7
M2.7 (×0.1) is stronger on multilingual reasoning and has a 50-minute timeout for long thinking. V4 Flash (×0.02) is faster and far cheaper for single-shot work.
Bottom line: should you use DeepSeek V4 Flash?
The cheapest model in the catalogue. Right for bulk tagging, extraction, and pre-filtering; wrong for planner roles or long agent loops.
Frequently asked questions
When was DeepSeek V4 Flash released?
DeepSeek released V4 Flash and V4 Pro together on April 24, 2026 under the MIT License with open weights.
Should I run my entire agent on V4 Flash?
Probably not. Flash is great at one-shot tasks but drifts on long multi-step loops (vendor-reported Terminal-Bench 2.0 is 11 points behind V4 Pro). The standard pattern is to use it as a pre-filter and escalate the hard cases to V4 Pro or Sonnet 4.6.
Are cache writes really free?
Yes. DeepSeek doesn't bill the cache-write portion. Only cache reads bill, at $0.028 per 1M tokens.
Is V4 Flash open-source?
Yes. Weights are published under the MIT License (284B total / 13B active MoE). The hosted DeepSeek API is the production path for VM0.
What's V4 Flash's context window?
1 million tokens. Identical to V4 Pro. Useful for long-document one-shot Q&A even at the cheapest tier.
Alternatives
Using DeepSeek V4 Flash on VM0
Two ways to access DeepSeek V4 Flash on VM0
VM0 supports DeepSeek V4 Flash as a Built-in model billed in VM0 credits, or as a bring-your-own model with your DeepSeek API key. The Built-in path uses VM0 Managed routing and the credit multiplier explained below; the bring-your-own path bills you directly through the upstream vendor and skips the VM0 credit conversion entirely.
VM0's recommendation
VM0 positions DeepSeek V4 Flash as a cost-saving option rather than a core agent model. Use it to optimise unit cost on non-core work, such as bulk classification, pre-filters, latency-critical short replies, or pinned legacy agents, while keeping Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6 on the steps that decide the run.
Credits and the ×0.02 multiplier
Every Built-in model on VM0 is priced as a multiple of Claude Sonnet 4.6, which sits at the ×1 credit baseline. DeepSeek V4 Flash bills at ×0.02 credits. The multiplier is what shows up on your VM0 invoice; the vendor list price in the pricing table above is what the upstream provider charges before VM0 converts it into credits.
At ×0.02, a step on Flash costs one-fiftieth of the credits of an equivalent step on Sonnet 4.6 (the ×1 baseline). That puts it at the cheapest tier of the Built-in catalogue and makes it the obvious choice when unit cost dominates the decision and the workload is largely single-shot.
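As a worked example of what the multiplier means (the credit figures are illustrative, not real invoice lines):

```python
# Credit multipliers relative to the Claude Sonnet 4.6 baseline (×1).
SONNET_MULTIPLIER = 1.0
FLASH_MULTIPLIER = 0.02

baseline_step_credits = 500  # illustrative credit cost of one step on Sonnet 4.6
flash_step_credits = baseline_step_credits * FLASH_MULTIPLIER / SONNET_MULTIPLIER
print(flash_step_credits)    # 10.0: the same step at one-fiftieth of the credits
```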
Available on VM0 since April 24, 2026.