Studio

Products, demos, and benchmark systems from Edxperimental Labs.

Studio is the product shelf: agent benchmarks, model recommendation tools, cost workbenches, and consulting diagnostics that turn research into usable buyer workflows.

Studio catalog

Generated product packets for demos and buyer follow-up.

The Studio product catalog is generated for Edxperimental Labs demos, buyer follow-up, and service packaging. Each packet turns a Studio surface into a sales-ready brief covering audience, deliverables, buyer questions, and connected evidence.

Preview

Agent Benchmark Explorer

AI teams comparing autonomous workflows

A structured benchmark surface for measuring whether agents can plan, use tools, recover from errors, and complete useful work rather than only answer prompts.
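As a rough illustration of how measures such as task completion, recovery rate, and cost per resolved task might be derived from run records, here is a minimal Python sketch. The `Run` record shape, the field names, and the choice to charge failed runs against resolved tasks are all assumptions made for illustration, not the Explorer's actual schema or scoring logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One agent attempt at a benchmark task (hypothetical record shape)."""
    resolved: bool          # did the agent complete useful work?
    errors_hit: int         # tool or planning errors encountered
    errors_recovered: int   # errors the agent recovered from
    cost_usd: float         # total spend for the run


def summarize(runs: list[Run]) -> dict[str, float]:
    """Aggregate run records into three headline measures."""
    total = len(runs)
    resolved = sum(r.resolved for r in runs)
    errors = sum(r.errors_hit for r in runs)
    recovered = sum(r.errors_recovered for r in runs)
    spend = sum(r.cost_usd for r in runs)
    return {
        "task_completion": resolved / total if total else 0.0,
        # Assumed convention: with no errors to recover from, report 1.0.
        "recovery_rate": recovered / errors if errors else 1.0,
        # Assumed convention: failed runs still count toward spend.
        "cost_per_resolved_task": spend / resolved if resolved else float("inf"),
    }


# Placeholder data, not real benchmark results.
example = summarize([
    Run(True, 2, 2, 0.25),
    Run(False, 1, 0, 0.5),
    Run(True, 0, 0, 0.25),
    Run(True, 1, 1, 0.5),
])
```

With the placeholder runs above, three of four tasks resolve and three of four errors are recovered, so both rates are 0.75 and the cost per resolved task is 0.5 USD.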

Task completion
Tool-call quality
Recovery rate
Cost per resolved task

Research

Coding Agent Arena

Engineering leaders and founders

A coding-agent evaluation track for repository edits, bug fixes, browser checks, terminal usage, and regression discipline.

Patch correctness
Test pass rate
Review quality
Time to mergeable PR

Research

Browser Agent Evaluation Kit

Teams automating web operations

Browser-agent tasks for navigation, form filling, extraction, screenshot QA, and resilient recovery from UI changes.

Navigation success
State verification
DOM robustness
Human handoff rate

Consulting

Customer Support Agent Scorecard

Support, CX, and operations teams

A scorecard for support agents covering escalation quality, policy adherence, multilingual handling, hallucination risk, and customer outcome.

Resolution rate
Escalation precision
Policy adherence
Tone consistency

Designing v0.1

Indian Workflow Benchmark

Indian enterprises, AI buyers, and product teams

A workflow benchmark for Indian business tasks: finance, support, multilingual handoffs, document reasoning, sales ops, and evidence-grounded escalation.

Outcome correctness
Evidence citation
Escalation judgement
Cost per accepted output

Preview

Model Recommendation Console

Buyers choosing models or API providers

A decision console that maps use-case constraints to a model shortlist across quality, latency, price, context, privacy, and deployment surface.
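One way to picture mapping constraints to a shortlist is to treat latency, price, context, and deployment needs as hard filters and then rank the survivors. The Python sketch below does that. The `Candidate` and `Constraints` types, their fields, and the ranking by quality are hypothetical stand-ins and should not be read as the console's actual fit score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A model or provider option (hypothetical fields)."""
    name: str
    quality: float            # 0..1 score from an evaluation, higher is better
    p95_latency_ms: int
    usd_per_1k_tasks: float
    context_tokens: int
    self_hostable: bool


@dataclass(frozen=True)
class Constraints:
    """Buyer limits for a use case (hypothetical fields)."""
    max_p95_latency_ms: int
    max_usd_per_1k_tasks: float
    min_context_tokens: int
    require_self_hosting: bool = False


def shortlist(candidates: list[Candidate], c: Constraints, top_n: int = 3) -> list[Candidate]:
    """Drop candidates that break a hard constraint, then rank the rest."""
    eligible = [
        m for m in candidates
        if m.p95_latency_ms <= c.max_p95_latency_ms
        and m.usd_per_1k_tasks <= c.max_usd_per_1k_tasks
        and m.context_tokens >= c.min_context_tokens
        and (m.self_hostable or not c.require_self_hosting)
    ]
    # Assumed ranking: best quality first, cheaper option wins ties.
    return sorted(eligible, key=lambda m: (-m.quality, m.usd_per_1k_tasks))[:top_n]
```

Filtering before ranking keeps a high-quality model that misses the latency budget or privacy requirement from ever appearing on the shortlist.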

Fit score
Latency budget
Monthly cost
Fallback coverage

Preview

Cost Curve Workbench

Finance and platform teams

A calculator-style tool for converting token pricing into workload cost curves, batch discounts, cache effects, and per-resolution economics.
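The arithmetic behind such a calculator can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Pricing` fields, the discount model (a fractional discount on cached input tokens and a fractional discount on the whole bill when batched), and the placeholder prices, none of which are real provider rates or the Workbench's own formulas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Token prices in USD per million tokens (placeholder structure)."""
    input_usd_per_mtok: float
    output_usd_per_mtok: float
    cached_input_discount: float = 0.0   # fraction off cached input tokens
    batch_discount: float = 0.0          # fraction off the bill when batched


def cost_per_task(p: Pricing, input_tokens: int, output_tokens: int,
                  cache_hit_rate: float = 0.0, batched: bool = False) -> float:
    """Cost of one task after cache and batch effects."""
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    billable_input = fresh + cached * (1 - p.cached_input_discount)
    total = (billable_input * p.input_usd_per_mtok
             + output_tokens * p.output_usd_per_mtok) / 1e6
    return total * (1 - p.batch_discount) if batched else total


def cost_per_resolution(task_cost: float, resolution_rate: float) -> float:
    """Spread the cost of all attempts over the share that resolve."""
    if not 0 < resolution_rate <= 1:
        raise ValueError("resolution_rate must be in (0, 1]")
    return task_cost / resolution_rate


def cost_curve(p: Pricing, volumes: list[int], input_tokens: int, output_tokens: int,
               cache_hit_rate: float = 0.0, batched: bool = False) -> list[tuple[int, float]]:
    """Total cost at each workload volume (linear in this simple model)."""
    unit = cost_per_task(p, input_tokens, output_tokens, cache_hit_rate, batched)
    return [(v, v * unit) for v in volumes]


# Placeholder prices: 2 USD in, 8 USD out per million tokens, 50% discounts.
demo = Pricing(2.0, 8.0, cached_input_discount=0.5, batch_discount=0.5)
unit_cost = cost_per_task(demo, input_tokens=1000, output_tokens=500, cache_hit_rate=0.5)
```

With the placeholder prices, a task with 1,000 input tokens, 500 output tokens, and a 50% cache hit rate costs roughly 0.0055 USD, or about 5.50 USD per 1k tasks. Batching halves that, and an 80% resolution rate raises the cost per resolved task to about 0.0069 USD.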

Cost per 1k tasks
Cache savings
Batch savings
Reasoning-token exposure

Client intake

Consulting Diagnostic

Founders, AI buyers, and operations leaders

A fast intake surface for turning an AI idea, vendor claim, or production concern into a benchmarkable consulting engagement.

Workflow risk
Evidence gap
First sprint
Owner routing

Studio Demo Readiness

What can be shown now, and what still needs real media.

Public Studio demo readiness board for deciding which products can be shown today, which need consulting context, and which need real traces or client-approved media before stronger claims can be made.

Demo-ready: 7/8

Readiness gates: 4

Tour steps: 8

| Product | Score | Owner | Missing for live demo |
| --- | --- | --- | --- |
| Agent Benchmark Explorer | 76 | Sanjay Prasad | Reviewer-signed trace, Real screenshot/media, Provider or agent export |
| Coding Agent Arena | 82 | Sanjay Prasad | Product walkthrough video, Client-approved example, Real run export |
| Browser Agent Evaluation Kit | 64 | Sanjay Prasad | Reviewer-signed trace, Real screenshot/media, Provider or agent export |
| Customer Support Agent Scorecard | 82 | Saujas | Product walkthrough video, Client-approved example, Real run export |
| Indian Workflow Benchmark | 82 | Sanjay Prasad | Product walkthrough video, Client-approved example, Real run export |
| Model Recommendation Console | 82 | Sanjay Prasad | Product walkthrough video, Client-approved example, Real run export |
| Cost Curve Workbench | 82 | Sanjay Prasad | Product walkthrough video, Client-approved example, Real run export |
| Consulting Diagnostic | 82 | Saujas | Product walkthrough video, Client-approved example, Real run export |

Readiness gates

Generated demo packet exists.

Interactive or screenshot preview exists.

Connected benchmark/research evidence is linked.

Missing real traces, walkthrough video, and client-approved examples are labeled.

Studio operating loop

Every product starts as a consulting question, becomes an eval protocol, then turns into a reusable public surface once the scoring logic is stable.

Diagnose

Capture the buyer workflow, constraints, sample data, and acceptance criteria.

Benchmark

Run models, agents, providers, and toolchains through reproducible tasks.

Deploy

Deliver the recommendation, dashboard, fallback plan, and monitoring loop.