# Coding Agent Arena

Stage: Research

Audience: Engineering leaders and founders

## Summary

A coding-agent evaluation track for repository edits, bug fixes, browser checks, terminal usage, and regression discipline.

## Buyer Problem

Coding agents are usually shown through curated demos, which reveal little about day-to-day engineering work. The Arena measures whether an agent can read an existing repository, make a scoped patch, run checks, inspect the UI, and produce work a senior engineer would review seriously.

## Metrics

- Patch correctness
- Test pass rate
- Review quality
- Time to mergeable PR
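
The metrics above could roll up into the merge-readiness score listed under Deliverables. The sketch below is a hypothetical illustration of one such aggregation; the metric keys, weights, and 0-1 normalization are assumptions for the example, not the Arena's published methodology.

```python
# Hypothetical roll-up of the four Arena metrics into one merge-readiness
# score. Weights are illustrative assumptions: correctness and test results
# dominate review polish and speed.
WEIGHTS = {
    "patch_correctness": 0.35,
    "test_pass_rate": 0.30,
    "review_quality": 0.20,
    "time_to_mergeable_pr": 0.15,  # assumed pre-normalized so higher = faster
}

def merge_readiness(scores: dict[str, float]) -> float:
    """Weighted average of per-metric scores in [0, 1], scaled to 0-100."""
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return round(100 * total, 1)

example = {
    "patch_correctness": 0.9,
    "test_pass_rate": 1.0,
    "review_quality": 0.7,
    "time_to_mergeable_pr": 0.6,
}
print(merge_readiness(example))  # → 84.5
```

A single weighted score keeps the leaderboard sortable while the per-metric breakdown stays available for buyers who weigh, say, regression discipline over speed differently.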

## Deliverables

- Patch review
- Regression report
- Tool-use transcript
- Merge-readiness score

## Buyer Questions

- Can this agent work inside our existing codebase?
- Does it respect ownership boundaries and avoid unrelated churn?
- Can it debug failing tests without hiding the failure?
- What tasks are safe to delegate today?

## Demo State

The live coding arena is connected to the Coding Agent Maintenance Suite; the next step is importing real agent patches, logs, and review artifacts.

Demo readiness: 82/100

Missing for live demo:
- Product walkthrough video
- Client-approved example
- Real run export

## Connected Evidence

- [Coding Agent Maintenance Suite](/benchmarks/coding-agent-maintenance-suite)
- [Agentic Reliability Index](/leaderboards#agentic-reliability-index)
- [Leaderboard methodology](/articles/building-a-useful-ai-leaderboard-without-fooling-ourselves)

## Visual Preview

![Coding Agent Arena live Studio preview screenshot](/reports/studio/previews/coding-agent-arena.png)
