# On-device AI Classification System

Client: US-based stealth startup
Canonical path: /case-studies/on-device-classification

## Summary

A hybrid classification architecture that combines lightweight on-device inference with server-side LLM routing across a taxonomy of 8,000+ categories.

## Outcome

Designed to make low-latency decisions on-device and escalate to deeper cloud classification when local confidence drops.
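
The routing idea can be sketched in a few lines. This is a minimal illustration, not the client's implementation: the `route` function, the `Decision` type, the stub classifiers, and the `0.8` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Decision:
    label: str
    confidence: float  # top local score, kept even when the server decides
    source: str        # "local" or "server"


def route(
    text: str,
    local_scores: Callable[[str], Dict[str, float]],
    server_classify: Callable[[str], str],
    threshold: float = 0.8,  # assumed value; would be tuned on held-out data
) -> Decision:
    """Accept the on-device label if it is confident enough, else escalate."""
    scores = local_scores(text)
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return Decision(label, confidence, "local")
    # Low confidence: pay for the slower, deeper server-side classification.
    return Decision(server_classify(text), confidence, "server")


# Hypothetical stand-ins for the on-device model and the server LLM.
def fake_local(text: str) -> Dict[str, float]:
    if "invoice" in text:
        return {"invoice": 0.93, "receipt": 0.07}
    return {"invoice": 0.5, "receipt": 0.5}


def fake_server(text: str) -> str:
    return "receipt"


confident = route("invoice #12", fake_local, fake_server)
ambiguous = route("paper slip", fake_local, fake_server)
```

In this sketch, `confident.source` is `"local"` and `ambiguous.source` is `"server"`. Keeping the local confidence on server-routed decisions makes it possible to audit later how often escalation was actually needed.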

## Challenge

The product needed to classify inputs across a very large taxonomy while preserving fast local behavior and avoiding unnecessary cloud calls.

## Evidence

- Taxonomy scale: 8,000+ categories
- Architecture: Hybrid local/cloud
- Optimization loop: DSPy-style
- Primary risk controlled: Low-confidence routing

## Approach

1. Split classification into local confidence checks and deeper server-side routing for ambiguous cases.
2. Designed prompt and evaluation loops around category confusion (which categories get mistaken for which), not only aggregate accuracy.
3. Used lightweight local inference where latency mattered and server LLMs where context depth mattered.
4. Prepared a measurement plan for confidence thresholds, fallback rates, and taxonomy drift.
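
To illustrate what evaluating around category confusion can look like, here is a small sketch that ranks the most frequent (gold, predicted) mistakes in an evaluation set. The `top_confusions` helper and the example labels are hypothetical; the actual evaluation harness is not described in this case study.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (gold label, predicted label)


def top_confusions(pairs: Iterable[Pair], k: int = 3) -> List[Tuple[Pair, int]]:
    """Count misclassified (gold, predicted) pairs and return the k most common."""
    counts = Counter((gold, pred) for gold, pred in pairs if gold != pred)
    return counts.most_common(k)


# Illustrative labels only, not taken from the client's taxonomy.
examples = [
    ("sneakers", "sneakers"),
    ("sneakers", "running shoes"),
    ("sneakers", "running shoes"),
    ("boots", "sneakers"),
    ("boots", "boots"),
]

worst = top_confusions(examples, k=2)
# worst == [(("sneakers", "running shoes"), 2), (("boots", "sneakers"), 1)]
```

With thousands of categories, a ranked list like this tends to be more actionable than a single accuracy number, because it points at the specific category pairs a prompt revision should target.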

## Timeline

1. Taxonomy analysis
2. Local classifier prototype
3. Server LLM fallback
4. Prompt optimization
5. Evaluation plan

## Risks

- Category drift
- False confidence
- Server fallback cost
- Ambiguous labels
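
Two of these risks, false confidence and server fallback cost, trade off against each other through the confidence threshold. The sketch below shows one way to tabulate that trade-off. It assumes labeled records of the form (local confidence, whether the local prediction was correct); the `sweep` function and the sample records are hypothetical and are not project metrics.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, bool]  # (local confidence, local prediction was correct)


def sweep(
    records: Sequence[Record], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """For each threshold, return (threshold, fallback rate, local error rate)."""
    if not records:
        raise ValueError("records must not be empty")
    rows = []
    for t in thresholds:
        # Predictions at or above the threshold are accepted on-device.
        kept = [ok for conf, ok in records if conf >= t]
        fallback_rate = 1 - len(kept) / len(records)
        # Error among accepted local decisions, i.e. "false confidence".
        local_error = kept.count(False) / len(kept) if kept else 0.0
        rows.append((t, fallback_rate, local_error))
    return rows


# Made-up records for illustration only.
records = [(0.95, True), (0.90, True), (0.85, False), (0.60, False)]
table = sweep(records, [0.5, 0.8, 0.9])
```

On these made-up records, raising the threshold from 0.5 to 0.9 moves the fallback rate from 0.0 to 0.5 while the error among locally accepted decisions falls from 0.5 to 0.0, which is the shape of trade-off a threshold choice has to balance.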

## Stack

- Small LLM inference
- Server LLM classifier
- DSPy
- Text-gradient style prompt iteration

## Next Evidence Step

Replace provisional metrics with client-approved screenshots, raw artifacts, and final numbers when available.