# Browser Agent Evaluation Kit

Stage: Research

Audience: Teams automating web operations

## Summary

Browser-agent tasks covering navigation, form filling, data extraction, screenshot QA, and resilient recovery from UI changes.

## Buyer Problem

Browser agents fail in ways that are invisible to a text-only benchmark: stale selectors, unexpected modals, partially loaded page state, authentication friction, and confident completion claims made without proof.
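The "without proof" failure mode is typically closed with an explicit post-condition check after the agent claims completion. A minimal sketch, assuming a captured page snapshot; the `PageSnapshot` shape, function names, and URL/marker conventions here are illustrative, not part of the kit:

```python
import re
from dataclasses import dataclass


@dataclass
class PageSnapshot:
    """Minimal capture of page state after an agent claims completion."""
    url: str
    visible_text: str
    screenshot_path: str  # stored as evidence alongside the verdict


def verify_state(snapshot: PageSnapshot, url_pattern: str,
                 required_markers: list[str]) -> dict:
    """Check a completion claim against explicit post-conditions.

    Returns a verdict dict rather than a bare bool so a report can
    show *why* a claim was rejected, not just that it was.
    """
    url_ok = re.search(url_pattern, snapshot.url) is not None
    missing = [m for m in required_markers if m not in snapshot.visible_text]
    return {
        "verified": url_ok and not missing,
        "url_ok": url_ok,
        "missing_markers": missing,
        "evidence": snapshot.screenshot_path,
    }
```

Returning the screenshot path with every verdict is the point: a "done" claim without attached evidence is treated as unverified.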

## Metrics

- Navigation success: the agent reaches the intended page or state
- State verification: completion claims are confirmed against the live page, with evidence
- DOM robustness: tasks survive selector and layout churn
- Human handoff rate: how often the agent escalates instead of guessing
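These metrics roll up from per-task run records. A minimal aggregation sketch; the `TaskRun` fields are illustrative assumptions, not the kit's actual schema:

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    navigated: bool        # reached the target page
    state_verified: bool   # post-conditions confirmed with evidence
    selector_retries: int  # times a stale selector forced a retry
    handed_off: bool       # escalated to a human

def score(runs: list[TaskRun]) -> dict[str, float]:
    """Aggregate run records into the four headline metrics (0..1)."""
    n = len(runs)
    return {
        "navigation_success": sum(r.navigated for r in runs) / n,
        "state_verification": sum(r.state_verified for r in runs) / n,
        # DOM robustness here = fraction of runs needing no selector retries
        "dom_robustness": sum(r.selector_retries == 0 for r in runs) / n,
        "human_handoff_rate": sum(r.handed_off for r in runs) / n,
    }
```

Note that a high handoff rate is not automatically bad: escalating beats a confidently wrong completion claim.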

## Deliverables

- Browser task report
- Screenshot evidence
- Selector fragility map
- Handoff recommendation
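The selector fragility map can be approximated with simple heuristics over the selectors an agent actually used. A rough sketch under stated assumptions; the weights and patterns below are illustrative guesses, not the kit's scoring rules:

```python
import re


def fragility(selector: str) -> float:
    """Heuristic fragility score for a CSS selector, 0 (stable) to 1 (brittle)."""
    if "data-testid" in selector or selector.startswith("#"):
        return 0.1  # explicit test hooks and ids tend to survive redesigns
    score = 0.0
    if re.search(r"nth-child|nth-of-type", selector):
        score += 0.4  # positional selectors break when siblings change
    if re.search(r"\.[\w-]*\d", selector):
        score += 0.4  # class names with digits are often build-generated
    if selector.count(">") + selector.count(" ") >= 3:
        score += 0.2  # deep paths couple the selector to layout structure
    return min(score, 1.0)

def fragility_map(selectors: list[str]) -> dict[str, float]:
    """Map each observed selector to its heuristic fragility score."""
    return {s: fragility(s) for s in selectors}
```

Sorting the map by score surfaces the workflows worth hardening (or keeping under human review) before automation.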

## Buyer Questions

- Can the agent prove the page reached the right state?
- What happens when a modal or validation error appears?
- Which workflows are stable enough for automation?
- Where should a human remain in the loop?

## Demo State

Research kit with browser-operation scoring; real authenticated workflow packs can be built for consulting clients.

Demo readiness: 64/100

Missing for live demo:
- Reviewer-signed trace
- Real screenshot/media
- Provider or agent export

## Connected Evidence

- [Browser Operations Suite](/benchmarks/browser-operations-suite)
- [Agent benchmarks article](/articles/agent-benchmarks-that-survive-real-work)
- [Studio request form](/contact)

## Visual Preview

![Browser Agent Evaluation Kit live Studio preview screenshot](/reports/studio/previews/browser-agent-evaluation-kit.png)
