# Experimental Labs Studio Catalog

Generated product shelf for demos, buyer follow-up, and consulting handoff.

Product count: 8
Demo-ready count: 7

## Agent Benchmark Explorer

Stage: Preview

Audience: AI teams comparing autonomous workflows

Summary: A structured benchmark surface for measuring whether agents can plan, use tools, recover from errors, and complete useful work rather than only answer prompts.

Maturity: Can be shown as a product demo today

Demo readiness: 76/100

Missing for live demo:
- Reviewer-signed trace
- Real screenshot/media
- Provider or agent export

Buyer questions:
- Can the agent recover after a bad tool call?
- Does it verify state before claiming completion?
- How much does each accepted workflow actually cost?
- Which failures should trigger human handoff?

Outputs:
- Agent scorecard
- Trace review table
- Failure taxonomy
- Cost per resolved task
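
To make the "cost per resolved task" output concrete, here is a minimal Python sketch. The record fields (`tokens_in`, `tokens_out`, `resolved`, `retries`) and the per-million-token prices are illustrative assumptions, not the product's actual schema or any vendor's pricing.

```python
from dataclasses import dataclass

@dataclass
class RunRecord:
    """One benchmarked agent run (hypothetical schema)."""
    task_id: str
    tokens_in: int    # prompt tokens consumed across all steps
    tokens_out: int   # completion tokens produced across all steps
    resolved: bool    # did a reviewer accept the final state?
    retries: int = 0  # extra attempts after failed tool calls

def cost_per_resolved_task(runs, price_in, price_out):
    """Total spend (including failed and retried runs) divided by accepted tasks."""
    total = sum(
        (r.tokens_in * price_in + r.tokens_out * price_out) / 1_000_000
        for r in runs
    )
    resolved = sum(1 for r in runs if r.resolved)
    return total / resolved if resolved else float("inf")

runs = [
    RunRecord("t-001", 42_000, 6_500, resolved=True),
    RunRecord("t-002", 55_000, 9_000, resolved=False, retries=2),
    RunRecord("t-003", 38_000, 5_200, resolved=True, retries=1),
]
# Illustrative prices in USD per million tokens.
print(f"${cost_per_resolved_task(runs, price_in=3.0, price_out=15.0):.4f} per resolved task")
```

The key design point the sketch captures: failed and retried runs still count toward spend, so the denominator is accepted work, not attempts.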

Links: [Product page](/studio/agent-benchmark-explorer) / [JSON packet](/reports/studio/agent-benchmark-explorer.json)

Preview: /reports/studio/previews/agent-benchmark-explorer.png

## Coding Agent Arena

Stage: Research

Audience: Engineering leaders and founders

Summary: A coding-agent evaluation track for repository edits, bug fixes, browser checks, terminal usage, and regression discipline.

Maturity: Research surface ready for benchmark-backed demos

Demo readiness: 82/100

Missing for live demo:
- Product walkthrough video
- Client-approved example
- Real run export

Buyer questions:
- Can this agent work inside our existing codebase?
- Does it respect ownership boundaries and avoid unrelated churn?
- Can it debug failing tests without hiding the failure?
- What tasks are safe to delegate today?

Outputs:
- Patch review
- Regression report
- Tool-use transcript
- Merge-readiness score
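
As one way to read the "merge-readiness score" output, here is a small sketch. The fields, weights, and the rule that any new regression caps the score are assumptions chosen for illustration, not the arena's actual scoring rubric.

```python
from dataclasses import dataclass

@dataclass
class PatchResult:
    """One coding-agent patch under review (hypothetical fields)."""
    tests_passed: int
    tests_failed: int
    new_regressions: int           # previously green tests now failing
    unrelated_files_touched: int   # churn outside the task's ownership boundary

def merge_readiness(p: PatchResult) -> float:
    """Return a 0-100 score; any new regression caps the score at 40."""
    total = p.tests_passed + p.tests_failed
    pass_rate = p.tests_passed / total if total else 0.0
    score = 100 * pass_rate
    score -= 10 * p.unrelated_files_touched   # penalise out-of-scope churn
    if p.new_regressions:
        score = min(score, 40.0)               # regressions block merge outright
    return max(score, 0.0)

print(merge_readiness(PatchResult(tests_passed=118, tests_failed=2,
                                  new_regressions=0, unrelated_files_touched=1)))
```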

Links: [Product page](/studio/coding-agent-arena) / [JSON packet](/reports/studio/coding-agent-arena.json)

Preview: /reports/studio/previews/coding-agent-arena.png

## Browser Agent Evaluation Kit

Stage: Research

Audience: Teams automating web operations

Summary: Browser-agent tasks for navigation, form filling, extraction, screenshot QA, and resilient recovery from UI changes.

Maturity: Research surface ready for benchmark-backed demos

Demo readiness: 64/100

Missing for live demo:
- Reviewer-signed trace
- Real screenshot/media
- Provider or agent export

Buyer questions:
- Can the agent prove the page reached the right state?
- What happens when a modal or validation error appears?
- Which workflows are stable enough for automation?
- Where should a human remain in the loop?

Outputs:
- Browser task report
- Screenshot evidence
- Selector fragility map
- Handoff recommendation
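
A minimal sketch of how a handoff recommendation could be derived from browser task results, assuming hypothetical result fields and thresholds; the real kit's criteria may differ.

```python
from dataclasses import dataclass

@dataclass
class BrowserTaskResult:
    """Outcome of one browser-agent task (hypothetical fields)."""
    reached_expected_state: bool    # e.g. confirmation text or URL verified
    screenshot_path: str | None     # evidence captured at the final step
    brittle_selectors: int          # selectors that broke during the run
    recovered_from_ui_change: bool  # modal or validation error handled

def handoff_recommendation(results: list[BrowserTaskResult]) -> str:
    """Recommend automation only when end state is verified with evidence."""
    verified = [r for r in results if r.reached_expected_state and r.screenshot_path]
    fragility = sum(r.brittle_selectors for r in results)
    if len(verified) == len(results) and fragility == 0:
        return "automate"
    if len(verified) >= len(results) * 0.8:
        return "automate with human spot-checks"
    return "keep a human in the loop"

results = [
    BrowserTaskResult(True, "runs/checkout-01.png", 0, True),
    BrowserTaskResult(True, "runs/checkout-02.png", 1, False),
    BrowserTaskResult(False, None, 2, False),
]
print(handoff_recommendation(results))
```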

Links: [Product page](/studio/browser-agent-evaluation-kit) / [JSON packet](/reports/studio/browser-agent-evaluation-kit.json)

Preview: /reports/studio/previews/browser-agent-evaluation-kit.png

## Customer Support Agent Scorecard

Stage: Consulting

Audience: Support, CX, and operations teams

Summary: A scorecard for support agents covering escalation quality, policy adherence, multilingual handling, hallucination risk, and customer outcomes.

Maturity: Best introduced inside a consulting conversation

Demo readiness: 82/100

Missing for live demo:
- Product walkthrough video
- Client-approved example
- Real run export

Buyer questions:
- Will the agent respect our refund and exception policy?
- Can it handle bilingual or region-specific context?
- When should it escalate instead of improvising?
- Which support queues should be automated first?

Outputs:
- Support scenario pack
- Policy adherence matrix
- Escalation audit
- Rollout recommendation
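
To illustrate the policy adherence matrix and escalation audit outputs, here is a small sketch. The scenario names, policy keys, and pass/fail values are invented examples, not client data.

```python
# Scenario x policy results; True means the agent's reply stayed inside policy.
adherence = {
    "refund-over-limit":     {"refund_policy": False, "escalation_policy": True},
    "bilingual-order-query": {"language_policy": True, "refund_policy": True},
    "angry-churn-threat":    {"escalation_policy": False, "tone_policy": True},
}

def adherence_rate(matrix):
    """Share of (scenario, policy) checks the agent passed."""
    checks = [ok for policies in matrix.values() for ok in policies.values()]
    return sum(checks) / len(checks)

def escalation_audit(matrix):
    """Scenarios where the agent improvised instead of escalating."""
    return [s for s, p in matrix.items() if p.get("escalation_policy") is False]

print(f"adherence: {adherence_rate(adherence):.0%}")
print("escalate-first review needed:", escalation_audit(adherence))
```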

Links: [Product page](/studio/customer-support-agent-scorecard) / [JSON packet](/reports/studio/customer-support-agent-scorecard.json)

Preview: /reports/studio/previews/customer-support-agent-scorecard.png

## Indian Workflow Benchmark

Stage: Designing v0.1

Audience: Indian enterprises, AI buyers, and product teams

Summary: A workflow benchmark for Indian business tasks: finance, support, multilingual handoffs, document reasoning, sales ops, and evidence-grounded escalation.

Maturity: Research surface ready for benchmark-backed demos

Demo readiness: 82/100

Missing for live demo:
- Product walkthrough video
- Client-approved example
- Real run export

Buyer questions:
- Which models survive Indian document and support workflows?
- Where do multilingual or policy tasks fail?
- What can be safely automated versus escalated?
- How do quality, latency, and cost change by workflow type?

Outputs:
- Workflow task pack
- Model comparison memo
- Evidence audit
- Deployment readiness map
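
A minimal sketch of a workflow task pack and the per-workflow readiness rollup it supports. The field names, workflow labels, and figures are placeholders for illustration, not the benchmark's actual schema or results.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class WorkflowTask:
    """One benchmark task in the pack (hypothetical fields)."""
    workflow: str     # e.g. "finance", "support", "document-reasoning"
    language: str     # e.g. "en", "hi", or a mixed-language tag
    passed: bool      # reviewer accepted the model's output
    latency_s: float
    cost_usd: float

def readiness_by_workflow(tasks):
    """Pass rate, mean latency, and mean cost per workflow type."""
    buckets = defaultdict(list)
    for t in tasks:
        buckets[t.workflow].append(t)
    return {
        w: {
            "pass_rate": sum(t.passed for t in ts) / len(ts),
            "latency_s": sum(t.latency_s for t in ts) / len(ts),
            "cost_usd": sum(t.cost_usd for t in ts) / len(ts),
        }
        for w, ts in buckets.items()
    }

tasks = [
    WorkflowTask("support", "hi", True, 4.1, 0.012),
    WorkflowTask("support", "en", True, 3.2, 0.009),
    WorkflowTask("finance", "en", False, 6.8, 0.030),
]
print(readiness_by_workflow(tasks))
```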

Links: [Product page](/studio/indian-workflow-benchmark) / [JSON packet](/reports/studio/indian-workflow-benchmark.json)

Preview: /reports/studio/previews/indian-workflow-benchmark.png

## Model Recommendation Console

Stage: Preview

Audience: Buyers choosing models or API providers

Summary: A decision console that maps use-case constraints to a model shortlist across quality, latency, price, context, privacy, and deployment surface.

Maturity: Can be shown as a product demo today

Demo readiness: 82/100

Missing for live demo:
- Product walkthrough video
- Client-approved example
- Real run export

Buyer questions:
- Which model should handle the expensive path?
- Where can a cheaper router or fallback be used?
- What latency and privacy constraints change the answer?
- What benchmark evidence is still missing?

Outputs:
- Model shortlist
- Fallback map
- Monthly cost envelope
- Pre-production test plan
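
As an illustration of how constraints can be mapped to a shortlist, here is a minimal sketch. The model names, scores, prices, and the quality-per-dollar ranking are assumptions for demonstration only, not vendor data or the console's actual ranking logic.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    """Illustrative catalog entry; figures are placeholders, not vendor data."""
    name: str
    quality: float           # 0-1 score on the buyer's own eval set
    p95_latency_s: float
    usd_per_1m_tokens: float
    context_window: int
    self_hostable: bool

def shortlist(models, min_quality, max_latency_s, max_price, min_context,
              require_self_hosting=False):
    """Filter by hard constraints, then rank survivors by quality per dollar."""
    ok = [
        m for m in models
        if m.quality >= min_quality
        and m.p95_latency_s <= max_latency_s
        and m.usd_per_1m_tokens <= max_price
        and m.context_window >= min_context
        and (m.self_hostable or not require_self_hosting)
    ]
    return sorted(ok, key=lambda m: m.quality / m.usd_per_1m_tokens, reverse=True)

catalog = [
    ModelProfile("model-a", 0.86, 2.1, 12.0, 200_000, False),
    ModelProfile("model-b", 0.78, 1.2, 3.0, 128_000, True),
    ModelProfile("model-c", 0.70, 0.8, 0.6, 32_000, True),
]
for m in shortlist(catalog, min_quality=0.75, max_latency_s=2.0,
                   max_price=10.0, min_context=100_000):
    print(m.name)
```

Hard constraints (latency, privacy, context) prune first; only then does the cheaper-versus-better trade-off get ranked, which is why changing one constraint can change the whole shortlist.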

Links: [Product page](/studio/model-recommendation-console) / [JSON packet](/reports/studio/model-recommendation-console.json)

Preview: /reports/studio/previews/model-recommendation-console.png

## Cost Curve Workbench

Stage: Preview

Audience: Finance and platform teams

Summary: A calculator-style tool that converts token pricing into workload cost curves, modelling batch discounts, cache effects, and per-resolution economics.

Maturity: Can be shown as a product demo today

Demo readiness: 82/100

Missing for live demo:
- Product walkthrough video
- Client-approved example
- Real run export

Buyer questions:
- What will this cost at 10k, 100k, or 1M workflows?
- How much do retries and human review change the answer?
- When does prompt caching materially matter?
- Which model class is actually cheapest once quality is accounted for?

Outputs:
- Cost curve
- Scenario table
- Savings waterfall
- Budget envelope
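
A minimal sketch of the cost-curve arithmetic, assuming hypothetical per-million-token prices, a 90% discount on cached input tokens, and illustrative retry and batch rates; none of these figures are vendor quotes.

```python
def cost_per_workflow(tokens_in, tokens_out, price_in, price_out,
                      cache_hit_rate=0.0, cached_input_discount=0.9,
                      retry_rate=0.0, batch_discount=0.0):
    """USD per workflow: cached input is cheaper, retries and batching scale the total."""
    effective_in = price_in * (1 - cache_hit_rate * cached_input_discount)
    base = (tokens_in * effective_in + tokens_out * price_out) / 1_000_000
    return base * (1 + retry_rate) * (1 - batch_discount)

# Sweep volumes to trace the cost curve (all prices and rates are illustrative).
unit = cost_per_workflow(tokens_in=8_000, tokens_out=1_200,
                         price_in=3.0, price_out=15.0,
                         cache_hit_rate=0.6, retry_rate=0.15, batch_discount=0.5)
for volume in (10_000, 100_000, 1_000_000):
    print(f"{volume:>9,} workflows -> ${volume * unit:,.0f}/month")
```

The same unit-cost function answers the retry and caching questions above: changing `retry_rate` or `cache_hit_rate` shifts the whole curve rather than a single point.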

Links: [Product page](/studio/cost-curve-workbench) / [JSON packet](/reports/studio/cost-curve-workbench.json)

Preview: /reports/studio/previews/cost-curve-workbench.png

## Consulting Diagnostic

Stage: Client intake

Audience: Founders, AI buyers, and operations leaders

Summary: A fast intake surface for turning an AI idea, vendor claim, or production concern into a benchmarkable consulting engagement.

Maturity: Research surface ready for benchmark-backed demos

Demo readiness: 82/100

Missing for live demo:
- Product walkthrough video
- Client-approved example
- Real run export

Buyer questions:
- What is the smallest benchmark we should run first?
- Which evidence is missing before production?
- Who should own the technical and sales-engineering handoff?
- How quickly can we get a decision artifact?

Outputs:
- Diagnostic memo
- Evidence request list
- First-sprint plan
- Owner handoff

Links: [Product page](/studio/consulting-diagnostic) / [JSON packet](/reports/studio/consulting-diagnostic.json)

Preview: /reports/studio/previews/consulting-diagnostic.png
