US-based stealth startup
On-device AI Classification System
A hybrid classification architecture combining lightweight on-device inference with server-side LLM routing for a large category taxonomy.
Designed for low-latency local decisions with deeper cloud classification when confidence drops.
Challenge
The problem that had to be made measurable.
The product needed to classify inputs across a very large taxonomy while preserving fast local behavior and avoiding unnecessary cloud calls.
Taxonomy scale: 8,000+ categories
Architecture: Hybrid local/cloud
Optimization loop: DSPy-style
Primary risk controlled: Low-confidence routing
Approach
Split classification into local confidence checks and deeper server-side routing for ambiguous cases.
Designed prompt and evaluation loops around confusion between specific categories, not only aggregate accuracy.
Used lightweight local inference where latency mattered and server LLMs where context depth mattered.
Prepared a measurement plan for confidence thresholds, fallback rates, and taxonomy drift.
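The routing and measurement steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: `Prediction`, `route`, `fallback_rate`, `top_confusions`, the two classifier callables, and the default threshold of 0.8 are all hypothetical names and values chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class Prediction:
    category: str
    confidence: float  # assumed: local model's top-1 score in [0, 1]


def route(
    text: str,
    local_classify: Callable[[str], Prediction],  # hypothetical on-device model
    server_classify: Callable[[str], str],        # hypothetical server LLM call
    threshold: float = 0.8,                       # assumed value; tune on held-out data
) -> Tuple[str, str]:
    """Return (category, source), where source is 'local' or 'server'."""
    pred = local_classify(text)
    if pred.confidence >= threshold:
        return pred.category, "local"
    # Confidence is too low: defer to the deeper server-side classifier.
    return server_classify(text), "server"


def fallback_rate(confidences: Sequence[float], threshold: float) -> float:
    """Fraction of inputs that would be sent to the server at this threshold."""
    if not confidences:
        return 0.0
    return sum(c < threshold for c in confidences) / len(confidences)


def top_confusions(
    pairs: Iterable[Tuple[str, str]], n: int = 5
) -> List[Tuple[Tuple[str, str], int]]:
    """Most frequent (expected, predicted) mismatches, ignoring correct cases."""
    counts = Counter((exp, got) for exp, got in pairs if exp != got)
    return counts.most_common(n)
```

Evaluating `fallback_rate` over a range of thresholds on logged confidences gives one way to see the trade-off between cloud calls and local decisions, while `top_confusions` surfaces the category pairs that aggregate accuracy would hide.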
Project timeline
Taxonomy analysis
Local classifier prototype
Server LLM fallback
Prompt optimization
Evaluation plan
Case study packet
Downloadable evidence brief.
Generated packet for sales follow-up and client review. Its contents are to be replaced later with approved screenshots, raw artifacts, and final metrics.
Evidence cards: 4
Risk items: 4
Canonical page: /case-studies/on-device-classification
Manifest: /reports/case-study-packets/manifest.json
Risk register
Consulting pattern
Prototype quickly, then turn the demo into an evaluation surface.
The reusable Edxperimental pattern is to make the workflow measurable by defining inputs, expected outputs, acceptable failure, and operational risk, then establishing a repeatable benchmark before production expansion.
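One way to picture this pattern is a small benchmark harness that pairs each input with its expected output and checks the observed failure rate against an agreed limit. The sketch below is illustrative only: `Case`, `run_benchmark`, and `max_failure_rate` are hypothetical names, and the classifier is passed in as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    text: str      # benchmark input
    expected: str  # expected category for that input


def run_benchmark(
    classify: Callable[[str], str],  # hypothetical classifier under test
    cases: List[Case],
    max_failure_rate: float,         # assumed: the agreed acceptable failure level
) -> Dict[str, object]:
    """Run every case and report whether the failure rate is acceptable."""
    failures = [c for c in cases if classify(c.text) != c.expected]
    rate = len(failures) / len(cases) if cases else 0.0
    return {
        "failure_rate": rate,
        "passed": rate <= max_failure_rate,
        "failures": failures,  # kept for review, e.g. as risk-register input
    }
```

Because the cases and the limit are fixed data, the same run can be repeated after each prompt or model change before expanding to production.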