Website Buildout Status
A live tracker for what is shipped and what still needs real evidence.
This page turns the internal buildout plan into a public operating surface: shipped pages, generated research systems, pending real benchmark inputs, and the next research upgrades.
Coverage map
Where the shipped work is concentrated
The current website is strongest in Studio, benchmark infrastructure, research articles, consulting collateral, and company pages. The remaining work is mostly replacing synthetic data with real run artifacts.
Benchmarks
Studio
Company
Consulting
Content
Platform
Finished foundation
A visitor can now move through the full company story.
Homepage, Studio, models, leaderboards, articles, consulting, case studies, careers, terms, downloadable packets, and Command-K search are all part of the current build.
Studio
Homepage keeps the "Independent analysis of AI" positioning and now points strongly into Studio and Articles.
Studio
The Studio route exists with product and service surfaces for agent benchmarks, coding agents, browser agents, support agents, model recommendations, and cost curves.
Studio
Studio products now have detail pages at /studio/[slug] with buyer questions, workflow steps, deliverables, demo state, and links into benchmark evidence.
Studio
Studio now has a generated product catalog from pnpm studio:generate, with markdown/JSON catalog files and buyer-ready packets for all eight Studio surfaces under edxperimental-labs/public/reports/studio/.
Studio
Studio now has a generated demo-readiness board from pnpm studio:generate, with demo-ready counts, owners, tour order, readiness gates, missing live-demo evidence, a downloadable demo-readiness.md, and a Command-K-indexed /studio#studio-demo-readiness surface.
Studio
Studio now has captured visual preview assets from pnpm studio:screenshots, with one live page screenshot per Studio product and a manifest under edxperimental-labs/public/reports/studio/previews/.
Benchmarks
Agent Benchmark Explorer now includes a live trace explorer driven by generated benchmark data, with suite/task selectors, expected evidence, model trace ranking, top answer excerpt, failure reason, and tool-call chips.
Benchmarks
Browser Agent Evaluation Kit now includes a live browser-state evaluation demo driven by browser-operation traces, with scenario selection, strict state proof toggle, state/recovery/screenshot/handoff indicators, run comparison, and deployment readout.
Benchmarks
Coding Agent Arena now includes a live coding-agent console driven by maintenance traces, with task packet selection, browser-proof control, merge-readiness/regression/tool-discipline indicators, run ranking, acceptance evidence, and arena verdict.
Benchmarks
Customer Support Agent Scorecard now includes a live support scorecard console driven by support-policy traces, with scenario selection, escalation proof control, policy/tone/handoff indicators, model ranking, evidence chips, and rollout readout.
Studio
Indian Workflow Benchmark is now a Studio product page with a live benchmark console driven by the Indian Enterprise Workflow Suite, including workflow packet selection, holdout-pressure control, evidence/escalation/localization/cost indicators, task-mix bars, model comparison, and benchmark readiness guidance.
Studio
Consulting Diagnostic is now a Studio product page with a live intake console for choosing consulting tracks, adjusting deployment pressure/evidence gap/data sensitivity, routing work to Sanjay or Saujas, and generating first-sprint guidance.
Studio
Cost Curve Workbench now includes a live interactive Studio demo with sliders for input/output tokens, cache hit rate, batchable share, and human review cost, plus a modeled monthly cost curve and route comparison table.
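The workbench's cost curve can be sketched as a small pure function over its slider inputs. This is an illustrative model only: the field names, pricing structure, and the way cache hits and batch discounts are applied are assumptions, not the actual Cost Curve Workbench implementation.

```typescript
// Hypothetical monthly cost model behind the Cost Curve Workbench sliders.
// All field names and the discount structure are assumptions.
interface CostInputs {
  requestsPerMonth: number;
  inputTokens: number;               // avg input tokens per request
  outputTokens: number;              // avg output tokens per request
  cacheHitRate: number;              // 0..1, cached input billed at a discount
  batchableShare: number;            // 0..1, traffic eligible for batch pricing
  humanReviewCostPerRequest: number; // USD per reviewed request
  reviewRate: number;                // 0..1, share of requests reviewed
}

interface Pricing {
  inputPerMTok: number;       // USD per million input tokens
  cachedInputPerMTok: number; // USD per million cached input tokens
  outputPerMTok: number;      // USD per million output tokens
  batchDiscount: number;      // e.g. 0.5 = half price on batch traffic
}

function monthlyCost(inp: CostInputs, price: Pricing): number {
  // Blend cached and uncached input pricing by cache hit rate.
  const inputCost =
    (inp.inputTokens / 1e6) *
    ((1 - inp.cacheHitRate) * price.inputPerMTok +
      inp.cacheHitRate * price.cachedInputPerMTok);
  const outputCost = (inp.outputTokens / 1e6) * price.outputPerMTok;
  // Apply the batch discount only to the batchable share of traffic.
  const perRequestModel =
    (inputCost + outputCost) * (1 - inp.batchableShare * price.batchDiscount);
  const perRequestHuman = inp.reviewRate * inp.humanReviewCostPerRequest;
  return inp.requestsPerMonth * (perRequestModel + perRequestHuman);
}
```

Sweeping one slider (say, cacheHitRate from 0 to 1) while holding the rest fixed produces the kind of monthly cost curve the demo renders.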
Studio
Model Recommendation Console now includes a live interactive Studio demo with workload selection, quality/latency/cost/privacy/agentic controls, fit rankings, primary/fallback route cards, and a recommendation table.
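The fit ranking behind the console can be sketched as a weighted score over the five control axes. A minimal sketch under assumed names: the axis list matches the controls described above, but the scoring scheme and candidate data are illustrative, not the console's actual logic.

```typescript
// Hypothetical fit-ranking for the Model Recommendation Console.
// Axis names mirror the console controls; weights and scores are assumptions.
type Axis = "quality" | "latency" | "cost" | "privacy" | "agentic";

interface Candidate {
  name: string;
  scores: Record<Axis, number>; // 0..1 per axis, higher is better
}

// Weighted sum over the five control axes.
function fitScore(c: Candidate, weights: Record<Axis, number>): number {
  return (Object.keys(weights) as Axis[]).reduce(
    (sum, axis) => sum + weights[axis] * c.scores[axis],
    0,
  );
}

// Rank candidates; the first entry would feed the primary route card,
// the second the fallback route card.
function rankRoutes(
  cands: Candidate[],
  weights: Record<Axis, number>,
): string[] {
  return [...cands]
    .sort((a, b) => fitScore(b, weights) - fitScore(a, weights))
    .map((c) => c.name);
}
```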
Content
Models, Agents & Hardware now includes a generated inference economics playbook from pnpm inference:generate, covering managed APIs, hosted open-weight inference, dedicated endpoints, and self-hosted GPUs, plus batching, caching, throughput, queueing, and latency variance.
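One relationship that ties the playbook's throughput and queueing topics together is Little's law, L = λ · W: sustained arrival rate times average latency gives required concurrency. The snippet below is a minimal illustration of that law, not the playbook's actual model.

```typescript
// Little's law (L = λ · W): arrival rate × average latency = in-flight work.
// Illustrative only; the inference economics playbook's own model may differ.
function concurrencyNeeded(
  requestsPerSecond: number,
  avgLatencySeconds: number,
): number {
  return requestsPerSecond * avgLatencySeconds;
}

// Example: 20 req/s at 1.5 s average latency needs ~30 in-flight requests.
// An endpoint capped below that will queue, and tail latency grows.
```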
Benchmarks
Models, Agents & Hardware now includes a generated inference trace kit from pnpm inference:generate, with measured-trace CSV template, JSON schema, runbook, metric definitions, and /models#inference-trace-kit surface for latency, throughput, cache, batch, acceptance, and cost data.
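A consumer of the measured-trace CSV template might aggregate rows into the kit's metric definitions like this. The column names (`latency_ms`, `output_tokens`, `cache_hit`) and the percentile method are assumptions for illustration, not the kit's actual schema.

```typescript
// Hypothetical reader for the measured-trace CSV template.
// Column names are assumptions, not the trace kit's real schema.
interface TraceRow {
  latencyMs: number;
  outputTokens: number;
  cacheHit: boolean;
}

function parseTraceCsv(csv: string): TraceRow[] {
  const [header, ...lines] = csv.trim().split("\n");
  const cols = header.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    const get = (name: string) => cells[cols.indexOf(name)];
    return {
      latencyMs: Number(get("latency_ms")),
      outputTokens: Number(get("output_tokens")),
      cacheHit: get("cache_hit") === "true",
    };
  });
}

// Aggregate toward the kit's metric definitions:
// p95 latency, output tokens/second, cache hit rate.
function summarize(rows: TraceRow[]) {
  const sorted = rows.map((r) => r.latencyMs).sort((a, b) => a - b);
  const p95 =
    sorted[Math.min(sorted.length - 1, Math.floor(0.95 * sorted.length))];
  const tokensPerSec =
    rows.reduce((s, r) => s + r.outputTokens, 0) /
    (rows.reduce((s, r) => s + r.latencyMs, 0) / 1000);
  const cacheRate = rows.filter((r) => r.cacheHit).length / rows.length;
  return { p95, tokensPerSec, cacheRate };
}
```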
Content
Models, Agents & Hardware now includes a generated hardware procurement matrix from pnpm inference:generate, with decision gates, readiness scores, route choices, and /models#hardware-procurement-matrix surface for managed APIs, hosted open-weight inference, dedicated endpoints, cloud GPUs, and owned hardware.
Content
Models, Agents & Hardware route exists as one combined technical map.
Next high-impact work
The remaining blockers are real inputs, not route plumbing.
Feed real provider, notebook, browser-agent, and coding-agent outputs through /api/benchmark-run-intake or the CSV template, then replace prototype benchmark scores and task trace packets with reviewer-signed run rows.
Owner: Sanjay
Dependency: Real benchmark exports
Status: Waiting on input
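Before real rows flow through /api/benchmark-run-intake, a pre-submit validation step can keep unsigned or malformed evidence out. The route path comes from this tracker, but every field name below is an assumption about the intake schema, not its real contract.

```typescript
// Hypothetical run-row shape for /api/benchmark-run-intake.
// Field names are assumptions; only the route path is from the tracker.
interface RunRow {
  suite: string;
  task: string;
  model: string;
  provider: string;
  score: number;     // normalized 0..1
  reviewer: string;  // reviewer sign-off required before ingestion
  traceUrl?: string;
}

// Reject rows that would land as unsigned or out-of-range evidence.
function validateRunRow(row: RunRow): string[] {
  const errors: string[] = [];
  if (!row.suite || !row.task) errors.push("missing suite/task");
  if (!row.model || !row.provider) errors.push("missing model/provider");
  if (!(row.score >= 0 && row.score <= 1)) errors.push("score out of range");
  if (!row.reviewer) errors.push("missing reviewer sign-off");
  return errors;
}

// A row that passes validation could then be POSTed:
// await fetch("/api/benchmark-run-intake", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(row),
// });
```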
Connect the first-party /api/newsletter capture route to the chosen mailing-list provider or CRM once Sanjay picks the provider.
Owner: Saujas with Sanjay
Dependency: Provider/process decision
Status: Waiting on input
Connect /api/consulting-intake and the generated consulting collateral to signed proposal templates plus the final CRM once the sales process is finalized.
Owner: Saujas with Sanjay
Dependency: Provider/process decision
Status: Waiting on input
Connect /api/careers-application and generated careers collateral to the final hiring inbox, CRM, or applicant tracker once the first candidate process is ready.
Owner: Saujas with Sanjay
Dependency: Provider/process decision
Status: Waiting on input
Add social links once Sanjay shares them, replacing the current contact placeholders.
Owner: Sanjay
Dependency: Official social URLs
Status: Waiting on input
Replace captured Studio preview screenshots with product walkthrough videos and client-approved demo media once live demos mature.
Owner: Sanjay
Dependency: Client approval
Status: Waiting on input
Replace generated Studio packets with richer live-demo media, screenshots, and client-approved examples as productized demos mature.
Owner: Sanjay
Dependency: Client approval
Status: Waiting on input
Wire the case-study demo-readiness rows into real product controls, sanitized walkthrough videos, and client-approved proof once those assets are available.
Owner: Saujas with Sanjay
Dependency: Client approval
Status: Waiting on input
Replace the synthetic scripts/generate-benchmark-results.mjs seed rows with real model/provider run outputs.
Owner: Sanjay
Dependency: Internal buildout
Status: Can improve now
Point the current /api/benchmark-run-intake, CSV/JSON trace importer, and replay-scaffold artifact files at real harness/notebook exports once the first real benchmark runs are available.
Owner: Sanjay
Dependency: Real benchmark exports
Status: Waiting on input
Research backlog
What should deepen once real traces and source packets arrive.
Extend the new open-weight inference economics article with measured latency/quality traces from actual Mistral, DeepSeek, Qwen, and hosted inference runs once API keys and benchmark harness outputs are available.
Owner: Sanjay
Dependency: Real benchmark exports
Status: Waiting on input
Keep expanding data/research-evidence-library.json beyond the current 31 generated reading packets, and add carefully selected short verbatim excerpts only where publication needs exact wording.
Owner: Sanjay
Dependency: Internal buildout
Status: Can improve now
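The "short verbatim excerpts only" policy above is the kind of rule a build-time check can enforce over data/research-evidence-library.json. The packet shape and the length threshold below are assumptions for illustration, not the library's actual schema.

```typescript
// Hypothetical packet shape for data/research-evidence-library.json.
// Field names and the 300-character threshold are assumptions.
interface EvidencePacket {
  id: string;
  title: string;
  sourceUrl: string;
  summary: string;
  verbatimExcerpt?: string; // only where publication needs exact wording
}

// Return ids of packets whose verbatim excerpt exceeds the policy limit.
function excerptPolicyViolations(
  packets: EvidencePacket[],
  maxChars = 300,
): string[] {
  return packets
    .filter((p) => (p.verbatimExcerpt ?? "").length > maxChars)
    .map((p) => p.id);
}
```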
Keep monitoring pricing refresh parser drift as provider pages change, and update provider-specific selectors when diagnostics show low confidence or missing expected model labels.
Owner: Sanjay
Dependency: Provider page drift
Status: Can improve now
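The drift monitoring described above can be reduced to a simple confidence signal: the share of expected model labels still present in a scraped pricing page. The label lists and threshold here are illustrative assumptions, not the actual diagnostics.

```typescript
// Hypothetical drift diagnostic for the pricing refresh parser: confidence
// drops when expected model labels stop appearing in scraped page text.
// Labels and the 0.8 threshold are illustrative assumptions.
function pricingDriftConfidence(
  pageText: string,
  expectedLabels: string[],
): number {
  const text = pageText.toLowerCase();
  const found = expectedLabels.filter((label) =>
    text.includes(label.toLowerCase()),
  );
  return expectedLabels.length === 0 ? 0 : found.length / expectedLabels.length;
}

// Flag a provider for selector review when confidence falls below threshold.
function needsSelectorReview(
  pageText: string,
  expectedLabels: string[],
  threshold = 0.8,
): boolean {
  return pricingDriftConfidence(pageText, expectedLabels) < threshold;
}
```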
Replace the generated Indian workflow v0.1 dataset design with redacted source packets, gold answers, reviewer notes, and real model/provider run exports.
Owner: Sanjay
Dependency: Internal buildout
Status: Can improve now
Replace generated benchmark-control metadata with real harness metadata once actual runs exist: raw prompts, exact model/provider identifiers, scorer identity, trace artifacts, and run replay links.
Owner: Sanjay
Dependency: Real benchmark exports
Status: Waiting on input
Extend the generated agent benchmark literature map with measured Edxperimental benchmark traces and a combined Agentic Reliability Index formula once real model/agent runs exist.
Owner: Sanjay
Dependency: Internal buildout
Status: Can improve now
Turn the generated mechanistic interpretability playbook modules into separate long-form article pages if Sanjay wants a full explainer series.
Owner: Sanjay
Dependency: Editorial decision
Status: Waiting on input
Extend the inference economics playbook with measured latency and throughput traces once real provider/GPU benchmark runs exist.
Owner: Sanjay
Dependency: Real benchmark exports
Status: Waiting on input
Launch control
What can move now, and what needs real-world inputs.
This keeps the half-hour buildout loop honest: improve the product surface and publishing system immediately, but wait for selected providers, approved evidence, and real harness exports before replacing provisional material.
Ready now
Publish research
Use the file-backed article kit, source registry, visual sidecars, and review checklist to add new notes without changing route code.
Capture benchmark runs
Use /benchmarks#benchmark-run-intake-form or the CSV template to capture provider, notebook, browser-agent, and coding-agent rows.
Run verification
Use pnpm lint, pnpm build, and pnpm verify:site from edxperimental-labs/ after each site expansion.
Needs Sanjay input
CRM or mailing list
Pick the provider that should receive newsletter, consulting, careers, and benchmark intake records.
Social links
Share official LinkedIn, X, Discord, YouTube, or GitHub URLs so footer/header placeholders can become real links.
Client evidence
Provide approved case-study screenshots, metrics, names, and demo media before provisional packets are presented as final proof.
Automation lane
Every 30 minutes
Inspect the tracker, choose one unfinished task, implement it, verify it, and update the finished-vs-next list.
Do not fake evidence
Improve scaffolds, importers, packets, and forms while labeling synthetic data until real traces arrive.
Keep the site connected
Every new page or artifact should be reachable from navigation, Command-K, a report link, or a related article.
Keep using this as the operating checklist.
Every future article, benchmark trace, Studio demo, consulting packet, and case-study update should land here so the site stays honest about what is shipped versus what needs stronger evidence.