Website Buildout Status

A live tracker for what is shipped and what still needs real evidence.

This page turns the internal buildout plan into a public operating surface: shipped pages, generated research systems, pending real benchmark inputs, and the next research upgrades.

Coverage map

Where the shipped work is concentrated

The current website is strongest in Studio, benchmark infrastructure, research articles, consulting collateral, and company pages. The remaining work is mostly replacing synthetic data with real run artifacts.

Research: 31
Benchmarks: 30
Studio: 12
Company: 7
Consulting: 5
Content: 4
Platform: 3

Finished foundation

A visitor can now move through the full company story.

Homepage, Studio, models, leaderboards, articles, consulting, case studies, careers, terms, downloadable packets, and Command-K search are all part of the current build.

Studio

Homepage keeps the "Independent analysis of AI" positioning and now links prominently into Studio and Articles.

Studio

The Studio route exists, with product and service surfaces for agent benchmarks, coding agents, browser agents, support agents, model recommendations, and cost curves.

Studio

Studio products now have detail pages at /studio/[slug] with buyer questions, workflow steps, deliverables, demo state, and links into benchmark evidence.

Studio

Studio now has a generated product catalog from pnpm studio:generate, with markdown/JSON catalog files and buyer-ready packets for all eight Studio surfaces under edxperimental-labs/public/reports/studio/.
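As a rough illustration of what a catalog generator like pnpm studio:generate might emit, the sketch below renders one product list as both JSON and markdown. The StudioProduct shape, its fields, and the sample slug are hypothetical; the actual script and its output format are not shown here.

```typescript
// Hypothetical shape of one Studio catalog entry; the real generator's fields may differ.
interface StudioProduct {
  slug: string;
  title: string;
  deliverables: string[];
}

// Serialize the catalog as pretty-printed JSON (one of the two catalog formats).
function renderCatalogJson(products: StudioProduct[]): string {
  return JSON.stringify(products, null, 2);
}

// Render the same entries as markdown, linking each product to its /studio/[slug] page.
function renderCatalogMarkdown(products: StudioProduct[]): string {
  const lines: string[] = ["# Studio catalog", ""];
  for (const p of products) {
    lines.push(`## [${p.title}](/studio/${p.slug})`, "");
    for (const d of p.deliverables) {
      lines.push(`- ${d}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

// Hypothetical sample entry for illustration only.
const sample: StudioProduct[] = [
  {
    slug: "cost-curve-workbench",
    title: "Cost Curve Workbench",
    deliverables: ["Modeled monthly cost curve", "Route comparison table"],
  },
];
```

Generating both formats from one in-memory list keeps the JSON catalog and the markdown packet from drifting apart.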

Studio

Studio now has a generated demo-readiness board from pnpm studio:generate, with demo-ready counts, owners, tour order, readiness gates, missing live-demo evidence, a downloadable demo-readiness.md, and a Command-K-indexed /studio#studio-demo-readiness surface.
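Demo-ready counts and a missing-evidence list can be derived from per-product readiness gates. This is a minimal sketch assuming three hypothetical gates (liveControls, benchmarkEvidence, walkthroughMedia); the board's real gate names and scoring are not specified here.

```typescript
// Hypothetical gate names; the real board's gates are not specified here.
type Gate = "liveControls" | "benchmarkEvidence" | "walkthroughMedia";

interface DemoEntry {
  product: string;
  owner: string;
  gates: Record<Gate, boolean>; // true when the gate is satisfied
}

interface ReadinessSummary {
  demoReady: number;
  total: number;
  missing: Record<string, Gate[]>; // product name -> gates still failing
}

function summarizeReadiness(entries: DemoEntry[]): ReadinessSummary {
  const missing: Record<string, Gate[]> = {};
  let demoReady = 0;
  for (const entry of entries) {
    const failing = (Object.keys(entry.gates) as Gate[]).filter((g) => !entry.gates[g]);
    if (failing.length === 0) {
      demoReady += 1; // every gate passed, so the product counts as demo-ready
    } else {
      missing[entry.product] = failing;
    }
  }
  return { demoReady, total: entries.length, missing };
}
```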

Studio

Studio now has captured visual preview assets from pnpm studio:screenshots, with one live page screenshot per Studio product and a manifest under edxperimental-labs/public/reports/studio/previews/.

Benchmarks

Agent Benchmark Explorer now includes a live trace explorer driven by generated benchmark data, with suite/task selectors, expected evidence, model trace ranking, top answer excerpt, failure reason, and tool-call chips.
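One plausible way to produce the model trace ranking and top answer excerpt is sketched below, assuming each trace carries a numeric score and its list of tool calls. The ModelTrace fields and the tie-break on fewer tool calls are illustrative assumptions, not the explorer's documented behavior.

```typescript
interface ModelTrace {
  model: string;
  score: number;          // hypothetical 0..1 task score
  answer: string;
  failureReason?: string; // set when the task was not passed
  toolCalls: string[];    // rendered as chips in the explorer
}

// Rank traces by score (highest first); break ties by preferring fewer tool calls.
function rankTraces(traces: ModelTrace[]): ModelTrace[] {
  return [...traces].sort(
    (a, b) => b.score - a.score || a.toolCalls.length - b.toolCalls.length,
  );
}

// Truncate the top-ranked answer to a short excerpt for display.
function topAnswerExcerpt(traces: ModelTrace[], maxChars = 80): string {
  const [top] = rankTraces(traces);
  if (!top) return "";
  return top.answer.length <= maxChars
    ? top.answer
    : `${top.answer.slice(0, maxChars - 1)}…`;
}
```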

Benchmarks

Browser Agent Evaluation Kit now includes a live browser-state evaluation demo driven by browser-operation traces, with scenario selection, strict state proof toggle, state/recovery/screenshot/handoff indicators, run comparison, and deployment readout.
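A strict state proof toggle could change the pass rule roughly as sketched here, assuming the four indicators are booleans on each run. The field names and the lenient rule (verified state alone) are hypothetical.

```typescript
interface BrowserRun {
  stateVerified: boolean;      // final page state matched the expected state
  recovered: boolean;          // agent recovered from errors during the scenario
  screenshotCaptured: boolean; // screenshot evidence was attached to the run
  handoffClean: boolean;       // control was handed back to a human when required
}

// With strict state proof on, every indicator must hold; otherwise a run passes
// on verified state alone and the other indicators are treated as advisory.
function passes(run: BrowserRun, strictStateProof: boolean): boolean {
  if (!strictStateProof) return run.stateVerified;
  return run.stateVerified && run.recovered && run.screenshotCaptured && run.handoffClean;
}
```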

Benchmarks

Coding Agent Arena now includes a live coding-agent console driven by maintenance traces, with task packet selection, browser-proof control, merge-readiness/regression/tool-discipline indicators, run ranking, acceptance evidence, and arena verdict.

Benchmarks

Customer Support Agent Scorecard now includes a live support scorecard console driven by support-policy traces, with scenario selection, escalation proof control, policy/tone/handoff indicators, model ranking, evidence chips, and rollout readout.

Studio

Indian Workflow Benchmark is now a Studio product page with a live benchmark console driven by the Indian Enterprise Workflow Suite, including workflow packet selection, holdout-pressure control, evidence/escalation/localization/cost indicators, task-mix bars, model comparison, and benchmark readiness guidance.

Studio

Consulting Diagnostic is now a Studio product page with a live intake console for choosing consulting tracks, adjusting deployment pressure/evidence gap/data sensitivity, routing work to Sanjay or Saujas, and generating first-sprint guidance.

Studio

Cost Curve Workbench now includes a live interactive Studio demo with sliders for input/output tokens, cache hit rate, batchable share, and human review cost, plus a modeled monthly cost curve and route comparison table.
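The modeled monthly cost can be expressed as a per-request token cost, adjusted for cache hits and batch discounts, plus human review. The sketch below assumes per-million-token pricing, a separate cached-input price, and a flat batch discount on the batchable share; any prices plugged into it are placeholders, not a specific provider's rates.

```typescript
interface CostInputs {
  requestsPerMonth: number;
  inputTokens: number;               // per request
  outputTokens: number;              // per request
  cacheHitRate: number;              // 0..1 share of input tokens served from cache
  batchableShare: number;            // 0..1 share of requests that can run as batch jobs
  humanReviewCostPerRequest: number; // USD, applied to every request in this sketch
}

interface Pricing {
  inputPerMTok: number;       // USD per million uncached input tokens
  cachedInputPerMTok: number; // USD per million cached input tokens
  outputPerMTok: number;      // USD per million output tokens
  batchDiscount: number;      // 0..1, e.g. 0.5 means batch jobs cost half
}

function monthlyCost(c: CostInputs, p: Pricing): number {
  const uncachedInput = c.inputTokens * (1 - c.cacheHitRate);
  const cachedInput = c.inputTokens * c.cacheHitRate;
  const modelCostPerRequest =
    (uncachedInput * p.inputPerMTok +
      cachedInput * p.cachedInputPerMTok +
      c.outputTokens * p.outputPerMTok) /
    1_000_000;
  // Blend list price and batch price by the share of requests that can be batched.
  const batchMultiplier = 1 - c.batchableShare * p.batchDiscount;
  return c.requestsPerMonth * (modelCostPerRequest * batchMultiplier + c.humanReviewCostPerRequest);
}

// Evaluate the model at several monthly volumes to trace out a cost curve.
function costCurve(
  volumes: number[],
  c: CostInputs,
  p: Pricing,
): Array<{ requests: number; cost: number }> {
  return volumes.map((requests) => ({
    requests,
    cost: monthlyCost({ ...c, requestsPerMonth: requests }, p),
  }));
}
```

A route comparison table is the same calculation repeated with each route's Pricing at a fixed volume.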

Studio

Model Recommendation Console now includes a live interactive Studio demo with workload selection, quality/latency/cost/privacy/agentic controls, fit rankings, primary/fallback route cards, and a recommendation table.
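Fit rankings from quality, latency, cost, privacy, and agentic controls can be sketched as a weighted average of per-dimension scores. This assumes each candidate route already has normalized 0 to 1 scores where higher is better (so cost and latency are pre-inverted); the route names and weighting scheme are hypothetical.

```typescript
type Dimension = "quality" | "latency" | "cost" | "privacy" | "agentic";

const DIMENSIONS: Dimension[] = ["quality", "latency", "cost", "privacy", "agentic"];

interface Candidate {
  route: string;
  scores: Record<Dimension, number>; // 0..1, higher is better
}

// Weighted average of dimension scores; weights come from the console's sliders.
function fitScore(c: Candidate, weights: Record<Dimension, number>): number {
  let total = 0;
  let weightSum = 0;
  for (const d of DIMENSIONS) {
    total += weights[d] * c.scores[d];
    weightSum += weights[d];
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Rank candidates and pick the top two as primary and fallback routes.
function recommend(candidates: Candidate[], weights: Record<Dimension, number>) {
  const ranked = candidates
    .map((c) => ({ route: c.route, fit: fitScore(c, weights) }))
    .sort((a, b) => b.fit - a.fit);
  return { primary: ranked[0]?.route, fallback: ranked[1]?.route, ranked };
}
```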

Content

Models, Agents & Hardware now includes a generated inference economics playbook from pnpm inference:generate, covering managed APIs, hosted open-weight inference, dedicated endpoints, self-hosted GPUs, batch, cache, throughput, queueing, and latency variance.

Benchmarks

Models, Agents & Hardware now includes a generated inference trace kit from pnpm inference:generate, with a measured-trace CSV template, JSON schema, runbook, metric definitions, and a /models#inference-trace-kit surface for latency, throughput, cache, batch, acceptance, and cost data.
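To show how measured rows from a CSV template could feed throughput and cache metrics, here is a minimal parser. The column names (provider, model, latency_ms, output_tokens, cache_hit) are assumptions, since the kit's actual template and JSON schema are not reproduced here, and the parser does not handle quoted fields.

```typescript
interface TraceRow {
  provider: string;
  model: string;
  latencyMs: number; // end-to-end request latency
  outputTokens: number;
  cacheHit: boolean;
}

// Minimal CSV reader: assumes a header row and no quoted or escaped commas.
function parseTraceCsv(csv: string): TraceRow[] {
  const [header = "", ...lines] = csv.trim().split(/\r?\n/);
  const cols = header.split(",").map((h) => h.trim());
  const col = (name: string): number => {
    const i = cols.indexOf(name);
    if (i < 0) throw new Error(`missing column: ${name}`);
    return i;
  };
  const iProvider = col("provider");
  const iModel = col("model");
  const iLatency = col("latency_ms");
  const iOutput = col("output_tokens");
  const iCache = col("cache_hit");
  return lines
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const f = line.split(",").map((x) => x.trim());
      return {
        provider: f[iProvider] ?? "",
        model: f[iModel] ?? "",
        latencyMs: Number(f[iLatency]),
        outputTokens: Number(f[iOutput]),
        cacheHit: f[iCache] === "true",
      };
    });
}

// Output tokens per second, measured against end-to-end latency.
function throughput(row: TraceRow): number {
  return row.latencyMs > 0 ? row.outputTokens / (row.latencyMs / 1000) : 0;
}
```

Dividing by end-to-end latency understates decode speed when time-to-first-token is large; a real kit would likely record both.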

Content

Models, Agents & Hardware now includes a generated hardware procurement matrix from pnpm inference:generate, with decision gates, readiness scores, route choices, and a /models#hardware-procurement-matrix surface for managed APIs, hosted open-weight inference, dedicated endpoints, cloud GPUs, and owned hardware.

Content

The Models, Agents & Hardware route exists as one combined technical map.

Next high-impact work

The remaining blockers are real inputs, not route plumbing.

Benchmarks · Replace

Feed real provider, notebook, browser-agent, and coding-agent outputs through /api/benchmark-run-intake or the CSV template, then replace prototype benchmark scores and task trace packets with reviewer-signed run rows.

Owner: Sanjay

Dependency: Real benchmark exports

Status: Waiting on input
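Before a row from /api/benchmark-run-intake replaces a prototype score, it would need basic checks and a reviewer sign-off. The RunRow fields and rules below are hypothetical; they sketch the idea of reviewer-signed run rows, not the route's actual schema.

```typescript
interface RunRow {
  runId: string;
  provider: string;
  model: string;
  taskId: string;
  score: number;                  // 0..1
  source: "real" | "synthetic";
  reviewer?: string;              // who signed off on the row
}

// Return a list of problems; an empty list means the row may replace a prototype score.
function validateRunRow(row: RunRow): string[] {
  const errors: string[] = [];
  for (const key of ["runId", "provider", "model", "taskId"] as const) {
    if (row[key].trim() === "") errors.push(`${key} is required`);
  }
  if (!Number.isFinite(row.score) || row.score < 0 || row.score > 1) {
    errors.push("score must be in [0, 1]");
  }
  if (row.source !== "real") errors.push("synthetic rows cannot replace prototype scores");
  if (!row.reviewer || row.reviewer.trim() === "") errors.push("reviewer sign-off is missing");
  return errors;
}
```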

Company · Next

Connect the first-party /api/newsletter capture route to the chosen mailing-list provider or CRM once Sanjay picks the provider.

Owner: Saujas with Sanjay

Dependency: Provider/process decision

Status: Waiting on input

Consulting · Next

Connect /api/consulting-intake and the generated consulting collateral to signed proposal templates plus the final CRM once the sales process is finalized.

Owner: Saujas with Sanjay

Dependency: Provider/process decision

Status: Waiting on input

Company · Next

Connect /api/careers-application and generated careers collateral to the final hiring inbox, CRM, or applicant tracker once the first candidate process is ready.

Owner: Saujas with Sanjay

Dependency: Provider/process decision

Status: Waiting on input

Company · Extend

Add social links once Sanjay shares them, replacing the current contact placeholders.

Owner: Sanjay

Dependency: Official social URLs

Status: Waiting on input

Studio · Replace

Replace captured Studio preview screenshots with product walkthrough videos and client-approved demo media once live demos mature.

Owner: Sanjay

Dependency: Client approval

Status: Waiting on input

Studio · Replace

Replace generated Studio packets with richer live-demo media, screenshots, and client-approved examples as productized demos mature.

Owner: Sanjay

Dependency: Client approval

Status: Waiting on input

Consulting · Wire

Wire the case-study demo-readiness rows into real product controls, sanitized walkthrough videos, and client-approved proof once those assets are available.

Owner: Saujas with Sanjay

Dependency: Client approval

Status: Waiting on input

Benchmarks · Replace

Replace the synthetic scripts/generate-benchmark-results.mjs seed rows with real model/provider run outputs.

Owner: Sanjay

Dependency: Internal buildout

Status: Can improve now

Benchmarks · Wire

Point the current /api/benchmark-run-intake, CSV/JSON trace importer, and replay-scaffold artifact files at real harness/notebook exports once the first real benchmark runs are available.

Owner: Sanjay

Dependency: Real benchmark exports

Status: Waiting on input

Research backlog

What should deepen once real traces and source packets arrive.

Research · Extend

Extend the new open-weight inference economics article with measured latency/quality traces from actual Mistral, DeepSeek, Qwen, and hosted inference runs once API keys and benchmark harness outputs are available.

Owner: Sanjay

Dependency: Real benchmark exports

Status: Waiting on input

Research · Extend

Keep expanding data/research-evidence-library.json beyond the current 31 generated reading packets, and add carefully selected short verbatim excerpts only where publication needs exact wording.

Owner: Sanjay

Dependency: Internal buildout

Status: Can improve now

Research · Next

Keep monitoring pricing refresh parser drift as provider pages change, and update provider-specific selectors when diagnostics show low confidence or missing expected model labels.

Owner: Sanjay

Dependency: Provider page drift

Status: Can improve now
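Parser drift diagnostics of the kind described above could combine two signals: how many matched rows yielded a price, and which expected model labels never appeared. The row shape, the 0.9 threshold, and any label names used with it are assumptions for illustration.

```typescript
interface ParsedPriceRow {
  modelLabel: string;
  inputPrice: number | null; // null when a row matched but no price could be read
}

interface ParseDiagnostics {
  confidence: number;      // share of matched rows that yielded a price
  missingLabels: string[]; // expected model labels that never appeared
  needsSelectorUpdate: boolean;
}

function diagnoseParse(
  expectedLabels: string[],
  rows: ParsedPriceRow[],
  minConfidence = 0.9, // hypothetical threshold
): ParseDiagnostics {
  const priced = rows.filter((r) => r.inputPrice !== null).length;
  const confidence = rows.length === 0 ? 0 : priced / rows.length;
  // Compare labels case-insensitively so cosmetic page changes do not count as drift.
  const seen = new Set(rows.map((r) => r.modelLabel.toLowerCase()));
  const missingLabels = expectedLabels.filter((l) => !seen.has(l.toLowerCase()));
  return {
    confidence,
    missingLabels,
    needsSelectorUpdate: confidence < minConfidence || missingLabels.length > 0,
  };
}
```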

Benchmarks · Replace

Replace the generated Indian workflow v0.1 dataset design with redacted source packets, gold answers, reviewer notes, and real model/provider run exports.

Owner: Sanjay

Dependency: Internal buildout

Status: Can improve now

Benchmarks · Replace

Replace generated benchmark-control metadata with real harness metadata once actual runs exist: raw prompts, exact model/provider identifiers, scorer identity, trace artifacts, and run replay links.

Owner: Sanjay

Dependency: Real benchmark exports

Status: Waiting on input

Research · Extend

Extend the generated agent benchmark literature map with measured Edxperimental benchmark traces and a combined Agentic Reliability Index formula once real model/agent runs exist.

Owner: Sanjay

Dependency: Internal buildout

Status: Can improve now

Research · Next

Turn the generated mechanistic interpretability playbook modules into separate long-form article pages if Sanjay wants a full explainer series.

Owner: Sanjay

Dependency: Editorial decision

Status: Waiting on input

Research · Extend

Extend the inference economics playbook with measured latency and throughput traces once real provider/GPU benchmark runs exist.

Owner: Sanjay

Dependency: Real benchmark exports

Status: Waiting on input

Launch control

What can move now, and what needs real-world inputs.

This keeps the half-hour buildout loop honest: improve the product surface and publishing system immediately, but wait for selected providers, approved evidence, and real harness exports before replacing provisional material.

Ready now

Publish research

Use the file-backed article kit, source registry, visual sidecars, and review checklist to add new notes without changing route code.

Capture benchmark runs

Use /benchmarks#benchmark-run-intake-form or the CSV template to capture provider, notebook, browser-agent, and coding-agent rows.

Run verification

Use pnpm lint, pnpm build, and pnpm verify:site from edxperimental-labs/ after each site expansion.

Needs Sanjay input

CRM or mailing list

Pick the provider that should receive newsletter, consulting, careers, and benchmark intake records.

Social links

Share official LinkedIn, X, Discord, YouTube, or GitHub URLs so footer/header placeholders can become real links.

Client evidence

Provide approved case-study screenshots, metrics, names, and demo media before provisional packets are presented as final proof.

Automation lane

Every 30 minutes

Inspect the tracker, choose one unfinished task, implement it, verify it, and update the finished-vs-next list.

Do not fake evidence

Improve scaffolds, importers, packets, and forms, and keep synthetic data clearly labeled until real traces arrive.

Keep the site connected

Every new page or artifact should be reachable from navigation, Command-K, a report link, or a related article.

Keep using this as the operating checklist.

Every future article, benchmark trace, Studio demo, consulting packet, and case-study update should land here so the site stays honest about what is shipped versus what needs stronger evidence.

Publishing kit