# Audit Limitations

Audit limitations matter when a deployment owner needs more than a pass/fail eval. Stating them explicitly gives the audit team a way to collect internal evidence, test a causal hypothesis, and record what remains unproven before anyone changes a production control.

## Field Guide

### What the method is for

Prevent interpretability findings from being oversold as proof of safety or correctness.

### The operator question

What would make this interpretation fail, and what decision is still unsupported?

### What the audit should produce

A limit memo that states what the audit proves, what it does not prove, and which controls should remain in production.

### Where the method fails

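Sparse autoencoders (SAEs) can surface interpretable-looking patterns in unexpected places, including randomly initialized or weakly related systems, so an interpretable feature is not by itself evidence of a learned mechanism. Operational claims need baselines and explicit decision boundaries.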
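
One way to make the baseline requirement concrete is to compare a feature's score on the audited model against the same score computed on random-model or random-feature baselines. The sketch below is illustrative only: the function names, the idea of a single scalar "interpretability score", and the `alpha` threshold are assumptions, not part of any cited method.

```python
from typing import Sequence


def empirical_p_value(observed: float, baseline_scores: Sequence[float]) -> float:
    """Share of baseline scores at least as large as the observed score.

    Uses the common +1 correction so the result is never exactly zero.
    `baseline_scores` is assumed to hold the same metric computed on
    random-model or random-feature baselines (hypothetical inputs).
    """
    if not baseline_scores:
        raise ValueError("need at least one baseline score")
    at_least = sum(1 for score in baseline_scores if score >= observed)
    return (at_least + 1) / (len(baseline_scores) + 1)


def exceeds_baseline(
    observed: float, baseline_scores: Sequence[float], alpha: float = 0.05
) -> bool:
    """True only if the observed score is rare among the baseline scores."""
    return empirical_p_value(observed, baseline_scores) <= alpha
```

A score that fails this check says little about the trained model, because a random system produced comparable scores. Passing it is necessary for an operational claim, not sufficient.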

## Audit Template

| Field | Guidance |
| --- | --- |
| Behavior under review | One narrow failure, policy behavior, shortcut, or refusal pattern. |
| Candidate mechanism | The internal feature or circuit proposed as the cause, stated so it can be falsified: what would make this interpretation fail, and what decision is still unsupported? |
| Evidence packet | Counterexamples; feature consistency checks; random-model or random-feature baselines; out-of-domain tests. |
| Decision boundary | What can change in production if the causal claim survives review. |
| Limit memo | What the audit proves, what it does not prove, and which controls remain in production. SAEs can surface patterns in random or weakly related systems, so operational claims need baselines and decision boundaries. |
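
The limit memo can be kept as a small structured record so that an incomplete memo is easy to detect before review. This is a minimal sketch; the `LimitMemo` class and its field names are hypothetical and simply mirror the three things the memo is meant to state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LimitMemo:
    """Hypothetical record for the limit memo described in this guide."""

    behavior: str  # the one narrow behavior under review
    proves: List[str] = field(default_factory=list)
    does_not_prove: List[str] = field(default_factory=list)
    controls_retained: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A memo with any empty section should not go to review.
        return bool(self.proves and self.does_not_prove and self.controls_retained)

    def render(self) -> str:
        """Render the memo as plain Markdown-style text."""
        sections = [
            ("What the audit proves", self.proves),
            ("What the audit does not prove", self.does_not_prove),
            ("Controls that remain in production", self.controls_retained),
        ]
        lines = [f"Behavior under review: {self.behavior}"]
        for title, items in sections:
            lines.append(f"{title}:")
            if items:
                lines.extend(f"- {item}" for item in items)
            else:
                lines.append("- (none recorded)")
        return "\n".join(lines)
```

Requiring a non-empty "does not prove" section is a deliberate choice in this sketch: it forces the audit team to write down at least one unsupported decision.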

## Evidence To Collect

- Counterexamples
- Feature consistency checks
- Random-model or random-feature baselines
- Out-of-domain tests
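
As an illustration of a feature consistency check, the sketch below asks how many features from one SAE training run have a close match in a second run, using cosine similarity between feature direction vectors. It is a simplified proxy under stated assumptions, not the metric from the cited feature consistency paper: the inputs are assumed to be plain lists of direction vectors, and the `threshold` value is arbitrary.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matched_fraction(
    run_a: Sequence[Vector], run_b: Sequence[Vector], threshold: float = 0.9
) -> float:
    """Fraction of run_a features whose best match in run_b meets the threshold."""
    if not run_a or not run_b:
        raise ValueError("both runs need at least one feature vector")
    matched = sum(
        1 for u in run_a if max(cosine(u, v) for v in run_b) >= threshold
    )
    return matched / len(run_a)
```

A low matched fraction suggests the feature under review may be an artifact of one training run, which belongs in the limit memo as something the audit does not prove.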

## Sources

- [Sparse Autoencoders Can Interpret Randomly Initialized Transformers](https://arxiv.org/abs/2501.17727)
- [Feature consistency in SAEs](https://huggingface.co/papers/2505.20254)
- [Anthropic Responsible Scaling Policy](https://www.anthropic.com/responsible-scaling-policy)
