# Feature Dashboards

Feature dashboards matter when a deployment owner needs more than a pass/fail eval. They give the audit team a way to collect evidence from the model's internals, test a causal hypothesis, and state the method's limits before changing a production control.

## Field Guide

### What the method is for

Make recurring internal features inspectable by operators, reviewers, and deployment owners.
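
The guide does not say how the features are obtained. This sketch assumes they come from a sparse autoencoder (SAE) trained on the model's activations, as in the sources listed at the end. It shows one common encoder formulation: subtract the decoder bias, project into feature space, and apply a ReLU so each token gets a sparse, non-negative feature vector. The function names are illustrative. The weights are random placeholders; a real dashboard would load them from a trained SAE.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc, b_dec):
    """Map activations x (n_tokens, d_model) to feature activations (n_tokens, n_features)."""
    # Center on the decoder bias, project into feature space, keep only positive activations.
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

def sae_decode(f, W_dec, b_dec):
    """Reconstruct activations (n_tokens, d_model) from feature activations."""
    return f @ W_dec + b_dec

# Illustrative sizes and random placeholder weights (a trained SAE would supply these).
rng = np.random.default_rng(0)
d_model, n_features, n_tokens = 8, 32, 5
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(n_features, d_model)) / np.sqrt(n_features)
b_dec = np.zeros(d_model)

x = rng.normal(size=(n_tokens, d_model))  # stand-in for residual-stream activations
feats = sae_encode(x, W_enc, b_enc, b_dec)
recon = sae_decode(feats, W_dec, b_dec)
print(feats.shape, recon.shape)  # (5, 32) (5, 8)
```

Each column of `feats` is one candidate feature. The dashboard work consists of attaching a reviewer-facing label to a column and checking that the label holds up.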

### The operator question

Can a reviewer see when policy, shortcut, refusal, sensitive-domain, or evidence-use features activate?
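
To make activations visible, a dashboard needs a list of watched features, each with a label and a threshold, plus a way to turn per-token activations into reviewable events. This sketch assumes feature activations have already been computed as an `(n_tokens, n_features)` array. The feature indices, labels, and thresholds are hypothetical; the audit itself would supply them.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class FeatureWatch:
    label: str         # reviewer-facing family, e.g. "refusal" or "evidence-use"
    feature_id: int    # hypothetical column in the feature-activation array
    threshold: float   # activation above which the event is surfaced

@dataclass(frozen=True)
class ActivationEvent:
    token_index: int
    label: str
    feature_id: int
    activation: float

def flag_events(feature_acts, watches):
    """One event per (token, watched feature) whose activation exceeds that feature's threshold."""
    events = []
    for w in watches:
        column = feature_acts[:, w.feature_id]
        for t in np.flatnonzero(column > w.threshold):
            events.append(ActivationEvent(int(t), w.label, w.feature_id, float(column[t])))
    # Sort by token position so a reviewer reads events in transcript order.
    events.sort(key=lambda e: (e.token_index, e.label))
    return events

# Toy activations for 4 tokens and 10 features.
acts = np.zeros((4, 10))
acts[1, 3] = 2.5   # strong "refusal" activation
acts[2, 7] = 0.4   # weak "evidence-use" activation, still above its threshold
acts[3, 3] = 0.9   # below the refusal threshold, so not surfaced
watches = [FeatureWatch("refusal", 3, 1.0), FeatureWatch("evidence-use", 7, 0.2)]

for event in flag_events(acts, watches):
    print(event)
```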

### What the audit should produce

A monitoring surface for high-risk workflows, paired with eval metrics and human review notes.
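
Pairing the monitoring surface with eval metrics can start as a per-workflow table. For each workflow it shows how often a watched feature fires, how often the eval passes, and how often the feature fires on transcripts the eval failed. The record keys (`workflow`, `fired`, `eval_passed`) are assumptions for illustration, not a fixed schema.

```python
from collections import defaultdict

def slice_summary(records):
    """Per-workflow feature firing rates next to eval pass rates."""
    groups = defaultdict(list)
    for r in records:
        groups[r["workflow"]].append(r)
    summary = {}
    for workflow, rows in groups.items():
        n = len(rows)
        failures = [r for r in rows if not r["eval_passed"]]
        summary[workflow] = {
            "n": n,
            "fire_rate": sum(r["fired"] for r in rows) / n,
            "eval_pass_rate": sum(r["eval_passed"] for r in rows) / n,
            # Share of eval failures on which the watched feature fired (None if no failures).
            "fire_rate_on_failures": (
                sum(r["fired"] for r in failures) / len(failures) if failures else None
            ),
        }
    return summary

# Illustrative records; "fired" means the watched feature crossed its threshold.
records = [
    {"workflow": "claims", "fired": True, "eval_passed": False},
    {"workflow": "claims", "fired": False, "eval_passed": True},
    {"workflow": "claims", "fired": True, "eval_passed": True},
    {"workflow": "triage", "fired": False, "eval_passed": True},
]
for workflow, stats in slice_summary(records).items():
    print(workflow, stats)
```

Suppose a feature fires on most failures in one workflow but rarely elsewhere. That pattern is a lead for human review, not evidence that the feature causes the failure.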

### Where the method fails

A dashboard can create false confidence if it shows labels without causal evidence or off-distribution checks.
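
One way to put causal evidence behind a label is to ablate the feature and check whether the behavior changes. This sketch is a toy. It zeroes one feature, decodes back to activation space, and measures the change along a fixed `readout` direction that stands in for the behavior score. The readout direction and the purely linear setup are assumptions. A real test would typically patch the ablated activation into the model's forward pass (often keeping the SAE's reconstruction error) and measure the model's outputs, on both in-distribution and off-distribution prompts.

```python
import numpy as np

def ablation_effect(f, W_dec, b_dec, readout, feature_id):
    """Mean change in a stand-in behavior score when one feature is zeroed before decoding.

    f: (n_tokens, n_features) feature activations
    readout: (d_model,) hypothetical direction whose dot product with the
             reconstructed activation plays the role of the behavior score
    """
    baseline = (f @ W_dec + b_dec) @ readout
    ablated_f = f.copy()
    ablated_f[:, feature_id] = 0.0  # zero-ablate the candidate feature
    ablated = (ablated_f @ W_dec + b_dec) @ readout
    return float(np.mean(ablated - baseline))

# Toy setup: feature 0 writes along the readout direction, feature 1 is orthogonal to it.
W_dec = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
b_dec = np.zeros(2)
readout = np.array([1.0, 0.0])
f = np.array([[2.0, 3.0],
              [4.0, 1.0]])

print(ablation_effect(f, W_dec, b_dec, readout, feature_id=0))  # -3.0
print(ablation_effect(f, W_dec, b_dec, readout, feature_id=1))  # 0.0
```

Feature 1 acts as a control. A label is better supported when ablating the labeled feature moves the behavior and ablating comparable controls does not.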

## Audit Template

| Field | Guidance |
| --- | --- |
| Behavior under review | One narrow failure, policy behavior, shortcut, or refusal pattern. |
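| Candidate mechanism | The internal feature or features hypothesized to drive the behavior (for example a policy, shortcut, refusal, sensitive-domain, or evidence-use feature), and whether a reviewer can see when they activate. |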
| Evidence packet | Top examples, false positives, false negatives, activation thresholds, and workflow slices. |
| Decision boundary | What can change in production if the causal claim survives review. |
| Limit memo | A dashboard can create false confidence if it shows labels without causal evidence or off-distribution checks. |
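
If audits are tracked in code or a ticketing system, the template can be stored as a record with a check for unfilled fields. This is a hypothetical sketch. The field names mirror the table and the evidence keys mirror the evidence list, but the guide does not prescribe this structure.

```python
from dataclasses import dataclass, field, fields

# Evidence items from the guide; the string keys are this sketch's own naming.
EVIDENCE_ITEMS = (
    "top examples",
    "false positives",
    "false negatives",
    "activation thresholds",
    "workflow slices",
)

@dataclass
class AuditRecord:
    behavior_under_review: str
    candidate_mechanism: str
    evidence_packet: dict = field(default_factory=dict)  # evidence item -> link or note
    decision_boundary: str = ""
    limit_memo: str = ""

    def gaps(self):
        """Text fields and evidence items that are still empty."""
        missing = [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str) and not getattr(self, f.name).strip()
        ]
        missing += [f"evidence: {item}" for item in EVIDENCE_ITEMS if not self.evidence_packet.get(item)]
        return missing

record = AuditRecord(
    behavior_under_review="Refusals on benign medical questions",
    candidate_mechanism="A sensitive-domain feature that activates before the refusal",
    evidence_packet={"top examples": "link to review notes"},
)
print(record.gaps())
```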

## Evidence To Collect

- Top examples
- False positives
- False negatives
- Activation thresholds
- Workflow slices
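
Most of this evidence list reduces to simple computations over per-example maximum activations and reviewer labels. Those computations are the top-activating examples, the false positives and false negatives at a threshold, and a threshold sweep that shows how those errors trade off. This sketch assumes each example has one maximum activation and a boolean human label saying whether the behavior was present. The function names are illustrative. Running the same functions separately for each workflow gives the slice view.

```python
import numpy as np

def top_examples(max_acts, k):
    """Indices of the k examples with the highest maximum activation, strongest first."""
    return [int(i) for i in np.argsort(max_acts)[::-1][:k]]

def confusion_at(max_acts, human_labels, threshold):
    """Compare 'feature fired' (activation > threshold) with reviewer labels."""
    fired = np.asarray(max_acts) > threshold
    present = np.asarray(human_labels, dtype=bool)
    tp = int(np.sum(fired & present))
    fp = np.flatnonzero(fired & ~present)   # fired, but reviewers saw no behavior
    fn = np.flatnonzero(~fired & present)   # behavior present, feature stayed quiet
    return {
        "false_positives": [int(i) for i in fp],
        "false_negatives": [int(i) for i in fn],
        "precision": tp / (tp + len(fp)) if tp + len(fp) else None,
        "recall": tp / (tp + len(fn)) if tp + len(fn) else None,
    }

def sweep_thresholds(max_acts, human_labels, thresholds):
    """(threshold, #false positives, #false negatives) for each candidate threshold."""
    rows = []
    for t in thresholds:
        c = confusion_at(max_acts, human_labels, t)
        rows.append((t, len(c["false_positives"]), len(c["false_negatives"])))
    return rows

# Illustrative data: one maximum activation per example plus a reviewer label.
max_acts = np.array([0.1, 2.0, 0.7, 1.5, 0.0])
labels = [False, True, True, False, False]

print(top_examples(max_acts, 2))                                  # [1, 3]
print(confusion_at(max_acts, labels, 1.0))
print(sweep_thresholds(max_acts, labels, [0.5, 1.0, 1.8]))        # [(0.5, 1, 0), (1.0, 1, 1), (1.8, 0, 1)]
```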

## Sources

- [Mapping the Mind of a Large Language Model](https://www.anthropic.com/research/mapping-mind-language-model)
- [Sparse Autoencoders Find Highly Interpretable Features](https://huggingface.co/papers/2309.08600)
- [Sparse Autoencoder portal](https://www.sparseautoencoder.com/)