# Browser Agent Harness

Owner: Agent evaluator

## Input

Browser task packet with start URL, target state, allowed tools, and blocked shortcuts.

## Output

Trace archive with DOM checkpoints, screenshot evidence, console warnings, timing, recovered errors, and final state proof.

## Required Evidence

- screenshot proof
- DOM checkpoint
- tool trace
- console warnings
- target-state assertion

## Rejection Rule

A browser run is rejected when the agent claims completion without state proof or screenshot/DOM evidence.

## Handoff

Normalize accepted rows into either the first-party benchmark run intake form or `public/reports/benchmark-intake/run-template.csv`, then run `pnpm benchmarks:generate` after reviewer signoff.