Model Economics
Cost Curves for Frontier Reasoning Models
A buyer's framework for converting token prices into workflow cost, including reasoning tokens, cache hits, batch discounts, and tool calls.
Research lens · $/workflow · 5 pricing levers
A reasoning model can look expensive per token and still be cheaper per completed workflow if it needs fewer retries, fewer tool calls, less human review, and less prompt scaffolding. The metric that matters is cost per accepted output.
For agentic work, internal reasoning and tool-use overhead can dominate visible response length. Buyers need logs that separate input, cached input, reasoning/output, tool calls, retries, and human-review fallout.
Cheap models win on routing, extraction, and high-volume classification. Mid-tier models win on structured knowledge work. Frontier reasoning models win when one correct answer avoids downstream review. Batch and cache strategies bend every curve.
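The accepted-output metric above can be written down directly. As a sketch only: every number below (prices, retry rates, review costs, acceptance rates) is an illustrative assumption, not a provider quote.

```python
# Hypothetical sketch: price the accepted workflow, not the generated answer.

def cost_per_accepted_output(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,        # includes reasoning tokens if billed as output
    price_in: float,           # $ per 1M uncached input tokens
    price_cached: float,       # $ per 1M cached input tokens
    price_out: float,          # $ per 1M output tokens
    tool_cost: float = 0.0,    # $ per call of tool-use overhead
    retry_rate: float = 0.0,   # expected extra calls per workflow
    review_cost: float = 0.0,  # $ of human review per workflow
    acceptance_rate: float = 1.0,
) -> float:
    per_call = (
        input_tokens * price_in / 1e6
        + cached_tokens * price_cached / 1e6
        + output_tokens * price_out / 1e6
        + tool_cost
    )
    per_workflow = per_call * (1 + retry_rate) + review_cost
    return per_workflow / acceptance_rate

# With these assumed failure rates, the pricier model wins per accepted output:
frontier = cost_per_accepted_output(3_000, 20_000, 4_000, 5.00, 0.50, 30.00,
                                    retry_rate=0.05, review_cost=0.02,
                                    acceptance_rate=0.98)
budget = cost_per_accepted_output(3_000, 20_000, 4_000, 1.25, 0.20, 2.50,
                                  retry_rate=0.60, review_cost=0.25,
                                  acceptance_rate=0.85)
```

The crossover is driven entirely by the retry and review terms: with zero failures the cheap model wins, so the assumed rates are the numbers a buyer must actually measure.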
Published API price signals checked 16 May 2026; a lower output price is cheaper only before quality and retry adjustment.
Grok 4.3: $1.25 input / $2.50 output per 1M tokens (xAI published pricing, 1M context)
Claude Sonnet 4.6: $3.00 input / $15.00 output per 1M tokens (strong mid-tier workhorse)
GPT-5.4: $2.50 input / $15.00 output per 1M tokens (OpenAI affordable frontier track)
Claude Opus 4.7: $5.00 input / $25.00 output per 1M tokens (Anthropic top tier)
GPT-5.5: $5.00 input / $30.00 output per 1M tokens (OpenAI flagship tier)
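A quick way to read the snapshot is to price one representative call shape across all five models. The 10k-input / 2k-output shape below is an arbitrary assumption; the per-1M prices are the snapshot figures above, before any quality or retry adjustment.

```python
# Nominal cost of one call at snapshot prices (illustrative call shape).
PRICES = {  # $ per 1M tokens: (input, output), 16 May 2026 snapshot
    "Grok 4.3":          (1.25, 2.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.4":           (2.50, 15.00),
    "Claude Opus 4.7":   (5.00, 25.00),
    "GPT-5.5":           (5.00, 30.00),
}

def nominal_call_cost(model: str, in_tok: int = 10_000, out_tok: int = 2_000) -> float:
    """Uncached, unbatched cost of a single call in dollars."""
    p_in, p_out = PRICES[model]
    return in_tok * p_in / 1e6 + out_tok * p_out / 1e6

for model, _ in sorted(PRICES.items(), key=lambda kv: nominal_call_cost(kv[0])):
    print(f"{model:18s} ${nominal_call_cost(model):.4f}")
```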
Process
Buyer metric
$/accepted output
A model is cheap only when the final reviewed workflow is cheap.
Hidden lever
Cache hits
Prompt caching can bend the curve for repeated enterprise context.
Failure cost
Retries
Retry rate and human review can overwhelm nominal per-token savings.
Batch lever
Asynchronous discount
OpenAI, Anthropic, and Google all expose batch-style discounts for asynchronous workloads, which makes back-office workflows economically different from live chat.
Cache lever
Prompt shape matters
Prompt caching works best when stable policy, schema, and instruction blocks are separated from user-specific context.
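The cache lever can be sketched with assumed numbers. The 10% cached-read ratio below is illustrative and varies by provider; cache-write surcharges, which some providers bill, are ignored here.

```python
# Sketch: input cost when the stable prefix (policy, schema, instructions)
# is separated from live context and cached after the first call.

def prompt_cost(stable_tokens: int, live_tokens: int, calls: int,
                price_in: float, cache_read_ratio: float = 0.10) -> float:
    """Total input-token cost in dollars across `calls` requests."""
    first = (stable_tokens + live_tokens) * price_in / 1e6
    rest = (calls - 1) * (
        stable_tokens * price_in * cache_read_ratio / 1e6   # cache hits
        + live_tokens * price_in / 1e6                      # always fresh
    )
    return first + rest

# 30k stable tokens, 2k live tokens, 1,000 calls at an assumed $3/1M input:
naive = prompt_cost(30_000, 2_000, 1_000, price_in=3.00, cache_read_ratio=1.0)
cached = prompt_cost(30_000, 2_000, 1_000, price_in=3.00)
```

The gap between the two figures is exactly why prompt shape matters: mixing user-specific content into the prefix drives the effective cache-read ratio back toward 1.0.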
Buyer control
Accepted output cost
Reasoning, retries, tool calls, and human review should be logged separately so teams can see where the cost curve actually bends.
Research map
A token price table is only a starting point. The operator map converts provider pricing into workflow economics that finance and engineering can both inspect.
Stable context
Which repeated instructions, policies, and schemas can be cached?
Cache ledger
Live context
How much user, retrieval, and tool context changes every call?
Prompt shape
Reasoning/output
Where does visible and invisible reasoning cost accumulate?
Token ledger
Recovery
Which failures create extra calls, tool use, or human review?
Failure ledger
Decision
What is the cost per workflow that passes review?
Budget envelope
Provider economics cockpit
Use this as a modeling input, not procurement advice. Official price pages change, and workflow cost still depends on retries, caching, batch eligibility, and review.
OpenAI
$2.50 / 1M input; cached input priced lower
$15.00 / 1M output
Batch processing can materially change offline workflow economics; model retries and review still dominate accepted-output cost.
Anthropic
$3.00 / 1M input; prompt-cache reads discounted
$15.00 / 1M output
Prompt caching and batch processing make repeated policy/document context cheaper than naive per-call estimates.
Google
$2.00 / 1M input below 200k tokens
$12.00 / 1M output below 200k tokens
Batch mode offers lower unit pricing for jobs that can wait; long-context tiers change the slope above the threshold.
xAI
$1.25 / 1M input; cached input priced separately
$2.50 / 1M output
Low nominal output price is useful for routing and throughput, but buyers still need workflow-quality and retry measurements.
Batch eligibility
20-50%+ swing
Offline jobs like invoice checks, CRM cleanup, and document extraction should be modeled separately from live chat because providers price batch lanes differently.
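The batch lever is simple arithmetic once a workload's batch-eligible share is known. The flat 50% discount on both input and output below is an assumption; actual discounts and eligibility rules vary by provider.

```python
# Back-of-envelope monthly cost with a batch lane for eligible jobs.

def monthly_cost(jobs: int, in_tok: int, out_tok: int,
                 price_in: float, price_out: float,
                 batch_share: float = 0.0, batch_discount: float = 0.5) -> float:
    """Dollars per month; batch_share is the fraction of jobs that can wait."""
    per_job = in_tok * price_in / 1e6 + out_tok * price_out / 1e6
    live = jobs * (1 - batch_share) * per_job
    batch = jobs * batch_share * per_job * (1 - batch_discount)
    return live + batch

# Moving 80% of 1M monthly extraction jobs (5k in / 1k out) to a batch lane:
all_live = monthly_cost(1_000_000, 5_000, 1_000, 2.50, 15.00)
mostly_batch = monthly_cost(1_000_000, 5_000, 1_000, 2.50, 15.00, batch_share=0.8)
```

At these assumed numbers the batch lane cuts the bill by batch_share × batch_discount, i.e. 40%, which is why batch eligibility deserves its own line in the model.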
Cache boundary
Stable context wins
Repeated policy, schema, and instruction blocks should be separated from user-specific content so cache hit rate becomes measurable.
Quality adjustment
$/accepted output
A cheap model becomes expensive when it raises retries, tool calls, escalation rate, or human review. The scorecard should price the accepted workflow, not the generated answer.
Trace accounting
6 ledgers
Log input, cached input, output/reasoning, tools, retries, and review minutes separately so cost curves can be debugged after launch.
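A minimal trace record for the six ledgers might look like the following; field names are illustrative, not any provider's logging schema.

```python
# One record per workflow, with the six cost drivers kept as separate columns
# so post-launch cost curves can be attributed, not just totaled.
from dataclasses import dataclass, asdict

@dataclass
class WorkflowTrace:
    input_tokens: int          # uncached input
    cached_input_tokens: int   # cache reads
    output_tokens: int         # visible output plus billed reasoning
    tool_calls: int            # tool-use invocations
    retries: int               # extra model calls after failures
    review_minutes: float      # human-review fallout

    def ledger(self) -> dict:
        """Flatten to a dict for export to a warehouse or spreadsheet."""
        return asdict(self)

trace = WorkflowTrace(3_200, 18_000, 4_100, 2, 1, 0.5)
```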
Official pricing refresh ledger
Pricing pages change. Treat this as a source-linked snapshot for modeling, then verify official provider pages before procurement.
Generated 2026-05-16T04:30:00+05:30 / version 0.2.0
OpenAI: Input $5.00 / Cached $0.50 / Output $30.00 per 1M tokens (source-linked-fallback)
Batch API can cut input and output token charges by 50% for asynchronous jobs.
Anthropic: Input $5.00 / Cached $0.50 / Output $25.00 per 1M tokens (parsed)
Prompt-cache reads are a different column from cache writes; model audits should log both separately.
Google: Input $2.00 / Cached $0.20 / Output $12.00 per 1M tokens (source-linked-fallback)
Google includes thinking tokens in output pricing; long-context thresholds and batch availability change the curve.
xAI: Input $1.25 / Cached $0.20 / Output $2.50 per 1M tokens (parsed)
xAI prices tool use separately; token costs are only one component of agentic workload cost.
Parser diagnostics
OpenAI (low-selector-drift): Refresh used source-linked fallback fields. Review the official page because the provider page, model name, or DOM shape may have changed. Review immediately; the expected model label was not found in the retrieved page text.
Anthropic (high): Automated extraction found required price fields. Still verify the official page before procurement. Review when the provider changes pricing table columns, token units, model name, batch policy, or cache policy.
Google (low-selector-drift): Refresh used source-linked fallback fields. Review the official page because the provider page, model name, or DOM shape may have changed. Review when the provider changes pricing table columns, token units, model name, batch policy, or cache policy.
xAI (high): Automated extraction found required price fields. Still verify the official page before procurement. Review when the provider changes pricing table columns, token units, model name, batch policy, or cache policy.
Recommendation
The right model and pricing strategy depend on the workflow, risk tolerance, budget, latency target, data sensitivity, and the cost of a wrong answer.