
Model Economics

Cost Curves for Frontier Reasoning Models

A buyer's framework for converting token prices into workflow cost, including reasoning tokens, cache hits, batch discounts, and tool calls.

17 May 2026 · 13 min read

Research lens · $/workflow · 5 pricing levers

Token price is only the first derivative

A reasoning model can look expensive per token and still be cheaper per completed workflow if it needs fewer retries, fewer tool calls, less human review, and less prompt scaffolding. The metric that matters is cost per accepted output.
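A minimal sketch of that metric, with all dollar figures and rates hypothetical:

```python
# Cost per accepted output: per-attempt cost divided by acceptance rate,
# plus human-review cost attributed to each attempt.
# All figures are hypothetical placeholders, not provider quotes.

def cost_per_accepted_output(cost_per_attempt: float,
                             acceptance_rate: float,
                             review_cost_per_attempt: float = 0.0) -> float:
    attempts_per_accept = 1.0 / acceptance_rate
    return (cost_per_attempt + review_cost_per_attempt) * attempts_per_accept

# A $0.02 frontier call at 95% acceptance beats a $0.005 cheap call at 40%:
print(cost_per_accepted_output(0.02, 0.95, 0.01))   # ~$0.032
print(cost_per_accepted_output(0.005, 0.40, 0.01))  # ~$0.038
```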

Reasoning tokens move cost into the invisible part of the trace

For agentic work, internal reasoning and tool-use overhead can dominate visible response length. Buyers need logs that separate input, cached input, reasoning/output, tool calls, retries, and human-review fallout.
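A sketch of that decomposition for a single traced call; prices and token counts are illustrative, and the assumption that reasoning tokens bill at the output rate should be checked per provider:

```python
# Split one call's cost into the buckets named above.
# Prices are $/1M tokens; all numbers are illustrative.

def call_cost(prices: dict[str, float], trace: dict[str, int]) -> dict[str, float]:
    return {
        "fresh_input":  trace["input_tokens"]     * prices["input"]  / 1e6,
        "cached_input": trace["cached_tokens"]    * prices["cached"] / 1e6,
        # Reasoning tokens typically bill at the output rate.
        "reasoning":    trace["reasoning_tokens"] * prices["output"] / 1e6,
        "visible":      trace["output_tokens"]    * prices["output"] / 1e6,
    }

costs = call_cost(
    prices={"input": 2.50, "cached": 0.25, "output": 15.00},
    trace={"input_tokens": 4_000, "cached_tokens": 20_000,
           "reasoning_tokens": 9_000, "output_tokens": 1_200},
)
print(costs)  # reasoning ($0.135) dwarfs the visible answer ($0.018)
```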

The practical curve has four regimes

Cheap models win on routing, extraction, and high-volume classification. Mid-tier models win on structured knowledge work. Frontier reasoning models win when one correct answer avoids downstream review. Batch and cache strategies bend every curve.

Visual

Output-token price does not equal workflow cost

Published API price signals checked 16 May 2026; lower output price is cheaper before quality/retry adjustment.

Chart: output price, $/1M tokens. Gemini 3.1 Pro $12 · Claude Sonnet 4.6 $15 · GPT-5.4 $15 · Claude Opus 4.7 $25 · GPT-5.5 $30

Grok 4.3: $1.25 input / $2.50 output per 1M tokens. xAI published pricing, 1M context.

Claude Sonnet 4.6: $3.00 input / $15.00 output per 1M tokens. Strong mid-tier workhorse.

GPT-5.4: $2.50 input / $15.00 output per 1M tokens. OpenAI affordable frontier track.

Claude Opus 4.7: $5.00 input / $25.00 output per 1M tokens. Anthropic top tier.

GPT-5.5: $5.00 input / $30.00 output per 1M tokens. OpenAI flagship tier.

Process

How to read the analysis

1. Token price → 2. Prompt shape → 3. Reasoning trace → 4. Retries/tools → 5. Accepted output

Buyer metric

$/accepted output

A model is cheap only when the final reviewed workflow is cheap.

Hidden lever

Cache hits

Prompt caching can bend the curve for repeated enterprise context.
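A sketch of the blended input rate as a function of cache hit rate; the cached rate and discount structure vary by provider:

```python
# Blended $/1M input price after caching. Rates are illustrative.

def blended_input_price(base: float, cached: float, hit_rate: float) -> float:
    """hit_rate = fraction of input tokens served from cache."""
    return hit_rate * cached + (1 - hit_rate) * base

# At $3.00 base and $0.30 cached, an 80% hit rate cuts input cost ~72%:
print(blended_input_price(3.00, 0.30, 0.80))  # $0.84 / 1M tokens
```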

Failure cost

Retries

Retry rate and human review can overwhelm nominal per-token savings.

Batch lever

Asynchronous discount

OpenAI, Anthropic, and Google all expose batch-style discounts for asynchronous workloads, which makes back-office workflows economically different from live chat.
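A sketch of that split, assuming a flat 50% batch discount, which matches commonly published batch pricing but should be confirmed per provider:

```python
# Price the live lane and the batch-eligible lane separately.
# batch_discount is an assumption; check each provider's batch terms.

def workload_cost(total_cost: float, batch_share: float,
                  batch_discount: float = 0.50) -> float:
    live = total_cost * (1 - batch_share)
    batch = total_cost * batch_share * (1 - batch_discount)
    return live + batch

# If 70% of a $10k/month workload can wait for a batch lane:
print(workload_cost(10_000, batch_share=0.70))  # $6,500 / month
```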

Cache lever

Prompt shape matters

Prompt caching works best when stable policy, schema, and instruction blocks are separated from user-specific context.
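A sketch of that separation; block names are hypothetical, and exact cache mechanics (prefix keying, cache markers, TTLs) differ by provider:

```python
# Keep stable blocks in a fixed prefix; volatile content goes last.
# Most providers key prompt caches on a shared prefix, so a changed
# byte early in the prompt invalidates the cache for the whole call.
POLICY_BLOCK = "<compliance rules, rarely edited>"
OUTPUT_SCHEMA = "<JSON schema for the response>"
TOOL_INSTRUCTIONS = "<how and when to call tools>"

STABLE_PREFIX = "\n\n".join([POLICY_BLOCK, OUTPUT_SCHEMA, TOOL_INSTRUCTIONS])

def build_prompt(retrieved_docs: str, user_context: str) -> str:
    # Per-call content is appended after the cacheable prefix.
    return f"{STABLE_PREFIX}\n\n{retrieved_docs}\n\n{user_context}"
```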

Buyer control

Accepted output cost

Reasoning, retries, tool calls, and human review should be logged separately so teams can see where the cost curve actually bends.

Research map

Cost curve ledger map

A token price table is only a starting point. The operator map converts provider pricing into workflow economics that finance and engineering can both inspect; a roll-up sketch follows the map.

1. Stable context (cached input): Which repeated instructions, policies, and schemas can be cached? → Cache ledger

2. Live context (fresh input): How much user, retrieval, and tool context changes every call? → Prompt shape

3. Reasoning/output (generated tokens): Where does visible and invisible reasoning cost accumulate? → Token ledger

4. Recovery (retries/tools): Which failures create extra calls, tool use, or human review? → Failure ledger

5. Decision (accepted output): What is the cost per workflow that passes review? → Budget envelope
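The roll-up referenced above, turning the five ledgers into the buyer metric; every figure is a hypothetical monthly total:

```python
# Roll the ledgers up into $/accepted output. All numbers are
# hypothetical monthly totals for illustration.

ledgers = {
    "cached_input": 120.0,    # 1) stable context billed at cache rate
    "fresh_input":  540.0,    # 2) live user/retrieval/tool context
    "generated":    1_800.0,  # 3) reasoning + visible output tokens
    "recovery":     650.0,    # 4) retries, extra tool calls, review
}
accepted_workflows = 9_000    # 5) workflows that passed review

dollars_per_accepted = sum(ledgers.values()) / accepted_workflows
print(f"${dollars_per_accepted:.4f} per accepted workflow")  # ~$0.3456
```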

Provider economics cockpit

Current pricing signals to model

Use this as a modeling input, not procurement advice. Official price pages change, and workflow cost still depends on retries, caching, batch eligibility, and review.

OpenAI · GPT-5.4

Input $2.50 / 1M (cached input priced lower); output $15.00 / 1M.

Batch processing can materially change offline workflow economics; model retries and review still dominate accepted-output cost.

Anthropic · Claude Sonnet 4.6

Input $3.00 / 1M (prompt-cache reads discounted); output $15.00 / 1M.

Prompt caching and batch processing make repeated policy/document context cheaper than naive per-call estimates.

Google · Gemini 3.1 Pro Preview

Input $2.00 / 1M and output $12.00 / 1M below 200k tokens.

Batch mode offers lower unit pricing for jobs that can wait; long-context tiers change the slope above the threshold.
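A sketch of a threshold tier, assuming the whole request bills at the higher rate once it crosses the threshold; the above-threshold rate here is a placeholder, and the exact tier rule should be verified on the provider page:

```python
# Tiered long-context input pricing. The high rate and the
# whole-request tier rule are assumptions, not published terms.

def tiered_input_cost(tokens: int, low: float = 2.00, high: float = 4.00,
                      threshold: int = 200_000) -> float:
    """$ cost of one call's input; low/high are $/1M token rates."""
    rate = low if tokens <= threshold else high
    return tokens * rate / 1e6

print(tiered_input_cost(150_000))  # $0.30 at the low tier
print(tiered_input_cost(250_000))  # $1.00 once the call crosses the tier
```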

xAI · Grok 4.3

Input $1.25 / 1M (cached input priced separately); output $2.50 / 1M.

Low nominal output price is useful for routing and throughput, but buyers still need workflow-quality and retry measurements.

Batch eligibility

20-50%+ swing

Offline jobs like invoice checks, CRM cleanup, and document extraction should be modeled separately from live chat because providers price batch lanes differently.

Cache boundary

Stable context wins

Repeated policy, schema, and instruction blocks should be separated from user-specific content so cache hit rate becomes measurable.

Quality adjustment

$/accepted output

A cheap model becomes expensive when it raises retries, tool calls, escalation rate, or human review. The scorecard should price the accepted workflow, not the generated answer.

Trace accounting

6 ledgers

Log input, cached input, output/reasoning, tools, retries, and review minutes separately so cost curves can be debugged after launch.

Official pricing refresh ledger

Source-linked price snapshot

Pricing pages change. Treat this as a source-linked snapshot for modeling, then verify official provider pages before procurement.

Generated 2026-05-16T04:30:00+05:30 / version 0.2.0

OpenAI · GPT-5.5: input $5.00, cached $0.50, output $30.00 per 1M tokens (status: source-linked-fallback).

Batch API can cut input and output token charges by 50% for asynchronous jobs.

Anthropic · Claude Opus 4.7: input $5.00, cached $0.50, output $25.00 per 1M tokens (status: parsed).

Prompt-cache reads are a different column from cache writes; model audits should log both separately.
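A sketch of the write/read distinction; the multipliers are illustrative placeholders, not any provider's published rates:

```python
# Cache writes carry a premium; reads are discounted.
# Multipliers are illustrative, not published pricing.
BASE_INPUT = 5.00   # $/1M input tokens
WRITE_MULT = 1.25   # cache write premium over base input
READ_MULT  = 0.10   # cache read rate vs base input

def breakeven_reads(tokens: int) -> float:
    """Number of cache reads needed to repay the one-time write premium."""
    write_premium   = tokens * BASE_INPUT * (WRITE_MULT - 1) / 1e6
    saving_per_read = tokens * BASE_INPUT * (1 - READ_MULT) / 1e6
    return write_premium / saving_per_read

print(breakeven_reads(50_000))  # ~0.28: the premium pays off on first reuse
```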

Google · Gemini 3.1 Pro Preview: input $2.00, cached $0.20, output $12.00 per 1M tokens (status: source-linked-fallback).

Google includes thinking tokens in output pricing; long-context thresholds and batch availability change the curve.

xAI · grok-4.3: input $1.25, cached $0.20, output $2.50 per 1M tokens (status: parsed).

xAI prices tool use separately; token costs are only one component of agentic workload cost.

Parser diagnostics

What needs source review before procurement

OpenAI GPT-5.5 (confidence: low; selector drift): refresh used source-linked fallback fields. Review the official page because the provider page, model name, or DOM shape may have changed. Review immediately; the expected model label was not found in the retrieved page text.

Anthropic Claude Opus 4.7 (confidence: high): automated extraction found required price fields. Still verify the official page before procurement. Review when the provider changes pricing table columns, token units, model name, batch policy, or cache policy.

Google Gemini 3.1 Pro Preview (confidence: low; selector drift): refresh used source-linked fallback fields. Review the official page because the provider page, model name, or DOM shape may have changed. Review when the provider changes pricing table columns, token units, model name, batch policy, or cache policy.

xAI grok-4.3 (confidence: high): automated extraction found required price fields. Still verify the official page before procurement. Review when the provider changes pricing table columns, token units, model name, batch policy, or cache policy.

Recommendation

Use this as a decision tool, not a belief system.

The right model, benchmark, or interpretability method depends on the workflow, risk tolerance, budget, latency target, data sensitivity, and the cost of a wrong answer.