# Inference Trace Kit

Import contract for measured latency, throughput, cache-hit, batching, cost, and acceptance traces across model/provider routes.

## Intake Steps

1. Capture exact model/provider identifiers, workload label, token counts, cache hit rate, batch eligibility, concurrency, latency, throughput, cost, and review effort.
2. Normalize each run into the CSV template.
3. Group rows by route and workload before comparing prices.
4. Report cost per accepted output at target p95 latency.
5. Keep raw provider logs, response IDs, and reviewer notes outside the public file until they are redacted.
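The normalization in steps 1–2 can be sketched as a typed row plus a CSV serializer. This is a minimal illustration, not the kit's actual template: every field and column name below is an assumption chosen to mirror the capture list above.

```typescript
// Hypothetical shape of one normalized run. Field names are illustrative
// assumptions, not the kit's real CSV schema.
interface TraceRow {
  model: string;          // exact model identifier
  provider: string;       // exact provider identifier
  workload: string;       // workload label
  inputTokens: number;
  outputTokens: number;
  cacheHitRate: number;   // 0..1
  batchEligible: boolean;
  concurrency: number;
  p95LatencyMs: number;
  throughputTps: number;  // output tokens per second
  costUsd: number;
  reviewMinutes: number;  // review effort
  accepted: boolean;
}

// Serialize runs into CSV with a fixed header order, so grouped
// comparisons by route and workload line up column-for-column.
function toCsv(rows: TraceRow[]): string {
  const header = [
    "model", "provider", "workload", "input_tokens", "output_tokens",
    "cache_hit_rate", "batch_eligible", "concurrency", "p95_latency_ms",
    "throughput_tps", "cost_usd", "review_minutes", "accepted",
  ];
  const lines = rows.map(r => [
    r.model, r.provider, r.workload, r.inputTokens, r.outputTokens,
    r.cacheHitRate, r.batchEligible, r.concurrency, r.p95LatencyMs,
    r.throughputTps, r.costUsd, r.reviewMinutes, r.accepted,
  ].join(","));
  return [header.join(","), ...lines].join("\n");
}
```

Keeping one row per run (rather than aggregating at capture time) is what lets step 3 regroup by route and workload later without re-collecting data.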

## Metrics

- **Cost per accepted output:** A cheap model is not cheap if it creates rejected outputs or review fallout.
- **p95 latency at concurrency:** Interactive agents fail on slow tails, not average latency.
- **Cache-adjusted input burden:** Repeated policy and schema prefixes should visibly bend the cost curve.
- **Batch displacement share:** Offline work should not be priced like live chat if the product can wait.
- **Throughput headroom:** Dedicated or self-hosted routes only make sense when utilization can stay high.
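The first two metrics can be computed directly from normalized runs. The sketch below is illustrative: the `Run` shape and function names are assumptions, and p95 here uses the simple nearest-rank method rather than any particular provider's definition.

```typescript
// Minimal per-run record for these two metrics (names are assumptions).
interface Run {
  costUsd: number;
  accepted: boolean;
  latencyMs: number;
}

// Cost per accepted output: total spend divided by accepted count, so
// rejected outputs raise the effective price instead of hiding in averages.
function costPerAccepted(runs: Run[]): number {
  const accepted = runs.filter(r => r.accepted).length;
  const totalCost = runs.reduce((sum, r) => sum + r.costUsd, 0);
  return accepted > 0 ? totalCost / accepted : Infinity;
}

// p95 latency via nearest-rank on the sorted sample; run this per
// concurrency level, since tails degrade as concurrency rises.
function p95(runs: Run[]): number {
  const sorted = runs.map(r => r.latencyMs).sort((a, b) => a - b);
  const idx = Math.ceil(0.95 * sorted.length) - 1;
  return sorted[idx];
}
```

Note that a route can win on raw cost per call and still lose on `costPerAccepted` once its rejection rate is priced in, which is the point of the first metric.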

## Commands

```bash
pnpm inference:generate
pnpm build
SITE_URL=http://127.0.0.1:3000 pnpm verify:site
```
