Studio

Products, demos, and benchmark systems from Edxperimental Labs.

Studio is the product shelf: agent benchmarks, model recommendation tools, cost workbenches, and consulting diagnostics that turn research into usable buyer workflows.

Studio catalog

Generated product packets for demos and buyer follow-up.

The Studio catalog is generated for Edxperimental Labs demos, buyer follow-up, and service packaging. Each packet turns a Studio surface into a sales-ready brief covering audience, deliverables, buyer questions, and connected evidence.

Preview: Agent Benchmark Explorer. Can be shown as a product demo today. 4 outputs.
Research: Coding Agent Arena. Research surface ready for benchmark-backed demos. 4 outputs.
Research: Browser Agent Evaluation Kit. Research surface ready for benchmark-backed demos. 4 outputs.
Consulting: Customer Support Agent Scorecard. Best introduced inside a consulting conversation. 4 outputs.

Live preview

Preview

Agent Benchmark Explorer

AI teams comparing autonomous workflows

A structured benchmark surface for measuring whether agents can plan, use tools, recover from errors, and complete useful work rather than only answer prompts.

Task completion

Tool-call quality

Recovery rate

Cost per resolved task
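
The four metrics listed above could be aggregated from per-run records roughly as follows. This is a minimal sketch, not the Studio scoring logic: the `AgentRun` record shape, its field names, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    """One benchmark task attempt (hypothetical record shape)."""
    completed: bool        # task finished with an accepted result
    tool_calls: int        # total tool calls issued
    valid_tool_calls: int  # tool calls judged well-formed and useful
    errors_hit: int        # errors encountered during the run
    errors_recovered: int  # errors the agent recovered from
    cost_usd: float        # total spend for the run


def summarize(runs: list[AgentRun]) -> dict[str, float]:
    """Aggregate the four headline metrics over a batch of runs."""
    resolved = sum(r.completed for r in runs)
    tool_calls = sum(r.tool_calls for r in runs)
    valid_calls = sum(r.valid_tool_calls for r in runs)
    errors = sum(r.errors_hit for r in runs)
    recovered = sum(r.errors_recovered for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "task_completion": resolved / len(runs) if runs else 0.0,
        "tool_call_quality": valid_calls / tool_calls if tool_calls else 0.0,
        "recovery_rate": recovered / errors if errors else 0.0,
        # Failed runs still cost money, so total spend is divided
        # by the number of resolved tasks only.
        "cost_per_resolved_task": total_cost / resolved if resolved else float("inf"),
    }


# Illustrative data only; not real benchmark results.
runs = [
    AgentRun(True, 10, 9, 2, 2, 0.40),
    AgentRun(False, 6, 3, 3, 1, 0.25),
    AgentRun(True, 4, 4, 0, 0, 0.15),
    AgentRun(True, 5, 4, 0, 0, 0.20),
]
metrics = summarize(runs)
```

Dividing total spend by resolved tasks, rather than by all attempts, is one way to make failed runs show up in the cost figure; other definitions are possible.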

Research

Coding Agent Arena

Engineering leaders and founders

A coding-agent evaluation track for repository edits, bug fixes, browser checks, terminal usage, and regression discipline.

Browser Agent Evaluation Kit

Teams automating web operations

Browser-agent tasks for navigation, form filling, extraction, screenshot QA, and resilient recovery from UI changes.

Customer Support Agent Scorecard

Support, CX, and operations teams

A scorecard for support agents covering escalation quality, policy adherence, multilingual handling, hallucination risk, and customer outcome.

Indian Workflow Benchmark

Indian enterprises, AI buyers, and product teams

A workflow benchmark for Indian business tasks: finance, support, multilingual handoffs, document reasoning, sales ops, and evidence-grounded escalation.

Outcome correctness

Evidence citation

Escalation judgement

Cost per accepted output

Preview

Model Recommendation Console

Buyers choosing models or API providers

A decision console that maps use-case constraints to a model shortlist across quality, latency, price, context, privacy, and deployment surface.
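
One way to picture the mapping from constraints to a shortlist is a hard filter followed by a ranking. The sketch below is an assumption about how such a console might work, not a description of the product: the `ModelOption` and `Constraints` types, the placeholder model names, and every number are hypothetical, and privacy and deployment surface are collapsed into a single `self_hostable` flag for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """A candidate model or provider (hypothetical fields and values)."""
    name: str
    quality: float           # 0..1 score from some task-specific eval
    p95_latency_ms: int
    usd_per_1k_tasks: float
    context_tokens: int
    self_hostable: bool      # stand-in for privacy / deployment surface


@dataclass(frozen=True)
class Constraints:
    """Use-case limits supplied by the buyer (hypothetical shape)."""
    min_quality: float
    max_p95_latency_ms: int
    max_usd_per_1k_tasks: float
    min_context_tokens: int
    require_self_hosting: bool = False


def shortlist(options: list[ModelOption], c: Constraints, top_n: int = 3) -> list[ModelOption]:
    """Drop options that violate any constraint, then rank the rest."""
    eligible = [
        o for o in options
        if o.quality >= c.min_quality
        and o.p95_latency_ms <= c.max_p95_latency_ms
        and o.usd_per_1k_tasks <= c.max_usd_per_1k_tasks
        and o.context_tokens >= c.min_context_tokens
        and (o.self_hostable or not c.require_self_hosting)
    ]
    # Highest quality first; cheaper option wins a quality tie.
    return sorted(eligible, key=lambda o: (-o.quality, o.usd_per_1k_tasks))[:top_n]


# Placeholder catalog; names and numbers are invented for illustration.
catalog = [
    ModelOption("model-a", 0.90, 1800, 42.0, 200_000, False),
    ModelOption("model-b", 0.84, 900, 12.0, 128_000, True),
    ModelOption("model-c", 0.84, 700, 9.0, 32_000, True),
    ModelOption("model-d", 0.70, 400, 3.0, 128_000, True),
]
strict = Constraints(0.8, 1000, 20.0, 64_000, require_self_hosting=True)
relaxed = Constraints(0.8, 2000, 50.0, 32_000)

strict_names = [o.name for o in shortlist(catalog, strict)]
relaxed_names = [o.name for o in shortlist(catalog, relaxed)]
```

Treating constraints as hard filters keeps the shortlist easy to explain to a buyer; a weighted score across the same axes would be an alternative design.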

Cost Curve Workbench

Finance and platform teams

A calculator-style tool for converting token pricing into workload cost curves, batch discounts, cache effects, and per-resolution economics.

Cost per 1k tasks

Cache savings

Batch savings

Reasoning-token exposure
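
The arithmetic behind these four figures can be sketched as below. This is an illustrative calculation under stated assumptions, not the workbench's own formula: prices are per million tokens, reasoning tokens are assumed to be billed at the output rate, the batch discount is assumed to apply to the whole bill, and the `Pricing` and `Workload` types and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (hypothetical values)."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float
    batch_discount: float      # fraction taken off the bill for batch jobs


@dataclass(frozen=True)
class Workload:
    """Average token profile of one task (hypothetical shape)."""
    input_tokens: int
    visible_output_tokens: int
    reasoning_tokens: int      # assumed billed at the output rate
    cache_hit_rate: float      # share of input tokens served from cache, 0..1
    resolution_rate: float     # share of tasks that end resolved, 0..1


def cost_per_task(p: Pricing, w: Workload, *, use_cache: bool = True, batch: bool = False) -> float:
    """Cost of one average task in USD."""
    hit_rate = w.cache_hit_rate if use_cache else 0.0
    cached = w.input_tokens * hit_rate
    fresh = w.input_tokens - cached
    output = w.visible_output_tokens + w.reasoning_tokens
    cost = (
        fresh * p.input_per_mtok
        + cached * p.cached_input_per_mtok
        + output * p.output_per_mtok
    ) / 1_000_000
    return cost * (1.0 - p.batch_discount) if batch else cost


def cost_report(p: Pricing, w: Workload) -> dict[str, float]:
    """Headline figures for one pricing/workload pair."""
    base = cost_per_task(p, w)
    reasoning_cost = w.reasoning_tokens * p.output_per_mtok / 1_000_000
    return {
        "cost_per_1k_tasks": 1000 * base,
        "cache_savings_per_1k": 1000 * (cost_per_task(p, w, use_cache=False) - base),
        "batch_savings_per_1k": 1000 * (base - cost_per_task(p, w, batch=True)),
        # Share of per-task spend that goes to reasoning tokens.
        "reasoning_token_exposure": reasoning_cost / base if base else 0.0,
        "cost_per_resolution": base / w.resolution_rate if w.resolution_rate else float("inf"),
    }


# Invented prices and token counts, for illustration only.
pricing = Pricing(input_per_mtok=2.0, cached_input_per_mtok=0.5,
                  output_per_mtok=8.0, batch_discount=0.5)
workload = Workload(input_tokens=4000, visible_output_tokens=600,
                    reasoning_tokens=400, cache_hit_rate=0.5, resolution_rate=0.8)
report = cost_report(pricing, workload)
```

With these invented inputs, one task costs 0.013 USD, so 1,000 tasks cost 13 USD; sweeping `cache_hit_rate` or `resolution_rate` over a range and re-running `cost_report` yields the points of a cost curve.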

Client intake

Consulting Diagnostic

Founders, AI buyers, and operations leaders

A fast intake surface for turning an AI idea, vendor claim, or production concern into a benchmarkable consulting engagement.

Studio Demo Readiness

What can be shown now, and what still needs real media.

The public Studio demo readiness board shows which products can be shown today, which need consulting context, and which need real traces or client-approved media before stronger claims are made.

7/8 demo-ready

Product | Score | Owner | Missing for live demo
Agent Benchmark Explorer | 76 | Sanjay Prasad | Reviewer-signed trace, real screenshot/media, provider or agent export
Coding Agent Arena | 82 | Sanjay Prasad | Product walkthrough video, client-approved example, real run export
Browser Agent Evaluation Kit | 64 | Sanjay Prasad | Reviewer-signed trace, real screenshot/media, provider or agent export
Customer Support Agent Scorecard | 82 | Saujas | Product walkthrough video, client-approved example, real run export
Indian Workflow Benchmark | 82 | Sanjay Prasad | Product walkthrough video, client-approved example, real run export
Model Recommendation Console | 82 | Sanjay Prasad | Product walkthrough video, client-approved example, real run export
Cost Curve Workbench | 82 | Sanjay Prasad | Product walkthrough video, client-approved example, real run export
Consulting Diagnostic | 82 | Saujas | Product walkthrough video, client-approved example, real run export

Demo tour order

1. Consulting Diagnostic
2. Model Recommendation Console
3. Cost Curve Workbench
4. Agent Benchmark Explorer
5. Indian Workflow Benchmark
6. Coding Agent Arena
7. Browser Agent Evaluation Kit
8. Customer Support Agent Scorecard

Readiness gates

Generated demo packet exists.

Interactive or screenshot preview exists.

Connected benchmark/research evidence is linked.

Missing real traces, walkthrough video, and client-approved examples are labeled.
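
The four gates above can be read as a simple checklist per product. The sketch below assumes that a product clears the board only when every gate holds; the page does not state how the gates relate to the readiness score or the demo-ready count, so that rule, the `GateStatus` type, and its field names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GateStatus:
    """One flag per readiness gate (hypothetical field names)."""
    packet_generated: bool   # generated demo packet exists
    preview_exists: bool     # interactive or screenshot preview exists
    evidence_linked: bool    # benchmark/research evidence is linked
    gaps_labeled: bool       # missing traces, video, examples are labeled


def failing_gates(status: GateStatus) -> list[str]:
    """Names of the gates that are not yet satisfied, in declaration order."""
    return [f.name for f in fields(status) if not getattr(status, f.name)]


def passes_all_gates(status: GateStatus) -> bool:
    """Assumed rule: every gate must hold."""
    return not failing_gates(status)


# Illustrative status, not taken from the readiness board.
example = GateStatus(packet_generated=True, preview_exists=True,
                     evidence_linked=False, gaps_labeled=True)
blocked_on = failing_gates(example)
```

Returning the names of failing gates, rather than a single boolean, makes it easier to show what is still missing for a given product.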

Studio operating loop

Every product starts as a consulting question, becomes an eval protocol, then turns into a reusable public surface once the scoring logic is stable.

Diagnose

Capture the buyer workflow, constraints, sample data, and acceptance criteria.

Benchmark

Run models, agents, providers, and toolchains through reproducible tasks.

Deploy

Deliver recommendation, dashboard, fallback plan, and monitoring loop.