# Circuit Tracing

Circuit tracing matters when a deployment owner needs more than a pass/fail eval result. It gives the audit team a way to collect internal evidence, test a causal hypothesis, and state the limits of that evidence before anyone changes a production control.

## Field Guide

### What the method is for

Build a graph-like hypothesis for how features and components combine to produce a specific behavior.

### The operator question

What path connects input features, intermediate features, and the final behavior the audit cares about?
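One way to make the operator question concrete is to treat the trace as a weighted directed graph and enumerate the paths from an input feature to the behavior. The sketch below is illustrative only: the node names, the edge weights, and the `trace_paths` helper are all hypothetical, and scoring a path by the product of its edge weights is a simplifying assumption, not the scoring used by any particular tracing tool.

```python
from typing import Dict, List, Tuple

# Hypothetical attribution graph: EDGES[src][dst] = attribution weight.
# Names and weights are made up for illustration, not measured from a model.
Graph = Dict[str, Dict[str, float]]

EDGES: Graph = {
    "input:trigger_phrase": {"feat:policy_topic": 0.9, "feat:formal_tone": 0.2},
    "feat:policy_topic": {"feat:refusal_intent": 0.7},
    "feat:formal_tone": {"feat:refusal_intent": 0.1},
    "feat:refusal_intent": {"output:refusal": 0.8},
}


def trace_paths(graph: Graph, source: str, target: str) -> List[Tuple[List[str], float]]:
    """Return every acyclic path from source to target, strongest first.

    Path strength is the product of edge weights along the path
    (an assumption made for this sketch).
    """
    results: List[Tuple[List[str], float]] = []

    def walk(node: str, path: List[str], strength: float) -> None:
        if node == target:
            results.append((path, strength))
            return
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:  # skip cycles
                walk(nxt, path + [nxt], strength * weight)

    walk(source, [source], 1.0)
    return sorted(results, key=lambda item: -abs(item[1]))


if __name__ == "__main__":
    for path, strength in trace_paths(EDGES, "input:trigger_phrase", "output:refusal"):
        print(" -> ".join(path), round(strength, 3))
```

The strongest path is a candidate mechanism, not a finding. It still has to survive intervention checks before it supports a causal claim.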

### What the audit should produce

A circuit audit note containing the attribution graph, the behavior it explains, the causal tests that were run, and the open limits.

### Where the method fails

Circuit traces are targeted explanations, not complete model maps. They should explain a specific behavior under a defined task distribution.

## Audit Template

| Field | Guidance |
| --- | --- |
| Behavior under review | One narrow failure, policy behavior, shortcut, or refusal pattern. |
| Candidate mechanism | The hypothesized path from input features, through intermediate features, to the final behavior the audit cares about. |
| Evidence packet | Attribution graph; Feature-to-feature edges; Intervention checks; Behavioral case study |
| Decision boundary | What can change in production if the causal claim survives review. |
| Limit memo | Circuit traces are targeted explanations, not complete model maps. They should explain a specific behavior under a defined task distribution. |
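
If the template is tracked in code, a small record type can flag incomplete notes before review. This is a minimal sketch, assuming the five template fields map directly to attributes. The `CircuitAuditNote` class and its `missing_fields` method are hypothetical names introduced here, not part of any existing tool.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class CircuitAuditNote:
    """Hypothetical record mirroring the five rows of the audit template."""

    behavior_under_review: str = ""
    candidate_mechanism: str = ""
    evidence_packet: List[str] = field(default_factory=list)
    decision_boundary: str = ""
    limit_memo: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of template fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    note = CircuitAuditNote(
        behavior_under_review="Refusal on a narrow class of benign prompts",
        candidate_mechanism="trigger phrase -> policy topic -> refusal intent",
        evidence_packet=["attribution graph", "intervention checks"],
    )
    # Lists the fields that still need to be filled in before review.
    print(note.missing_fields())
```

A note with an empty limit memo or decision boundary is not ready for review, whatever the graph shows.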

## Evidence To Collect

- Attribution graph
- Feature-to-feature edges
- Intervention checks
- Behavioral case study
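
An intervention check compares the effect the graph predicts with the effect observed when a feature is actually changed. The sketch below uses a toy scoring function as a stand-in for a real model. `toy_model`, `ablation_effect`, `intervention_check`, the feature names, and the tolerance are all assumptions made for illustration, and zero-ablation is only one of several possible interventions.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def toy_model(features: Features) -> float:
    """Hypothetical stand-in for a behavior score (one ReLU unit, then a scale)."""
    hidden = max(0.0, 1.5 * features["a"] + 0.5 * features["b"] - 0.2)
    return 2.0 * hidden


def ablation_effect(
    model: Callable[[Features], float],
    features: Features,
    name: str,
    baseline: float = 0.0,
) -> float:
    """Drop in the score when feature `name` is set to `baseline`."""
    patched = dict(features)
    patched[name] = baseline
    return model(features) - model(patched)


def intervention_check(
    model: Callable[[Features], float],
    features: Features,
    name: str,
    predicted_effect: float,
    tolerance: float = 0.1,
) -> Dict[str, object]:
    """Compare a graph-predicted effect with the observed ablation effect."""
    observed = ablation_effect(model, features, name)
    return {
        "feature": name,
        "predicted": predicted_effect,
        "observed": observed,
        "consistent": abs(observed - predicted_effect) <= tolerance,
    }


if __name__ == "__main__":
    activations = {"a": 1.0, "b": 1.0}
    print(intervention_check(toy_model, activations, "a", predicted_effect=3.0))
    print(intervention_check(toy_model, activations, "b", predicted_effect=3.0))
```

A mismatch between predicted and observed effect is evidence against the edge, and belongs in the limit memo instead of being dropped.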

## Sources

- [Circuit Tracing methods](https://transformer-circuits.pub/2025/attribution-graphs/methods.html)
- [Tracing the Thoughts of a Large Language Model](https://www.anthropic.com/research/tracing-thoughts-language-model)
- [Anthropic Responsible Scaling Policy](https://www.anthropic.com/responsible-scaling-policy)
