Downloadable benchmark brief
A consulting-ready brief generated from benchmark data: model rows, task traces, task mix, leaderboard controls, and the next evidence to collect before production.
Designed print template
The report is intentionally compact. It gives Saujas and Sanjay a shareable artifact for discovery calls, while keeping the public site tied to auditable benchmark data. The saved HTML print template is the source used to render the PDF.
Brief contents
Each generated Markdown and PDF file mirrors these sections, so the public report page and the downloadable artifact stay in sync.
Executive readout
Average score, leader, sample size, split, and reviewer interpretation.
Model comparison
Provider, score, pass rate, recovery, cost index, and latency columns.
Trace packet table
Representative tasks with domain, split, difficulty, top run, and score.
Controls and next step
Freshness, leakage, repeat-run policy, retirement rule, and required provenance.
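One way to keep the page and the downloadable files aligned is to render every output from a single ordered list of section titles. The sketch below is a minimal illustration of that idea in Python, not the site's actual generator: `ModelRow`, `SECTION_TITLES`, `md_table`, `render_brief`, and all sample values are hypothetical names and placeholder data introduced here, and the field units (for example latency in seconds) are assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ModelRow:
    # Hypothetical row shape based on the "Model comparison" columns.
    model: str
    provider: str
    score: float
    pass_rate: float
    recovery: float
    cost_index: float
    latency_s: float  # assumed unit: seconds


# Single source of truth for section order, shared by every output format.
SECTION_TITLES = (
    "Executive readout",
    "Model comparison",
    "Trace packet table",
    "Controls and next step",
)


def md_table(header, rows):
    """Render a header and a list of rows as a Markdown pipe table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(str(cell) for cell in row) + " |" for row in rows]
    return "\n".join(lines)


def render_brief(rows, traces, controls, split, sample_size, interpretation):
    """Build the Markdown brief, one body per entry in SECTION_TITLES."""
    leader = max(rows, key=lambda r: r.score)
    readout = (
        f"Average score: {mean(r.score for r in rows):.1f}. "
        f"Leader: {leader.model}. Sample size: {sample_size}. "
        f"Split: {split}. {interpretation}"
    )
    comparison = md_table(
        ["Model", "Provider", "Score", "Pass rate", "Recovery", "Cost index", "Latency (s)"],
        [[r.model, r.provider, r.score, r.pass_rate, r.recovery, r.cost_index, r.latency_s]
         for r in rows],
    )
    trace_table = md_table(
        ["Task", "Domain", "Split", "Difficulty", "Top run", "Score"], traces
    )
    control_list = "\n".join(f"- {name}: {value}" for name, value in controls.items())
    bodies = [readout, comparison, trace_table, control_list]
    return "\n\n".join(
        f"## {title}\n\n{body}" for title, body in zip(SECTION_TITLES, bodies)
    )


if __name__ == "__main__":
    # Placeholder data for illustration only; not real benchmark results.
    rows = [
        ModelRow("model-a", "provider-x", 80.0, 0.75, 0.5, 1.0, 2.0),
        ModelRow("model-b", "provider-y", 70.0, 0.5, 0.25, 0.5, 1.0),
    ]
    traces = [["task-001", "support", "holdout", "hard", "model-a", 80.0]]
    controls = {"Freshness": "placeholder", "Repeat-run policy": "placeholder"}
    print(render_brief(rows, traces, controls, "holdout", 1, "Placeholder interpretation."))
```

Because the section order lives in one tuple, adding or renaming a section changes every rendered format at once, which is the property the brief relies on to stay in sync with the report page.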