US-based stealth startup

On-device AI Classification System

A hybrid classification architecture combining lightweight on-device inference with server-side LLM routing for a large category taxonomy.

Designed for low-latency local decisions with deeper cloud classification when confidence drops.

Challenge

The problem that had to be made measurable.

The product needed to classify inputs across a very large taxonomy while preserving fast local behavior and avoiding unnecessary cloud calls.

Taxonomy scale: 8000+ categories
Architecture: Hybrid local/cloud
Optimization loop: DSPy-style
Primary risk controlled: Low-confidence routing

Approach

1. Split classification into local confidence checks and deeper server-side routing for ambiguous cases.

2. Designed prompt/eval loops around category confusion, not just aggregate accuracy.

3. Used lightweight local inference where latency mattered and server LLMs where context depth mattered.

4. Prepared a measurement plan for confidence thresholds, fallback rates, and taxonomy drift.
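
The split between local confidence checks and server-side routing can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `route` function, the `Decision` type, the 0.8 threshold, and the toy classifiers are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Decision:
    label: str
    confidence: float  # top-1 score from the local model
    source: str        # "local" or "server"


def route(
    text: str,
    local_scores: Callable[[str], Dict[str, float]],
    server_classify: Callable[[str], str],
    threshold: float = 0.8,  # assumed value; would be tuned from evaluation data
) -> Decision:
    """Keep the local answer when it is confident, otherwise defer to the server."""
    scores = local_scores(text)
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return Decision(label, confidence, "local")
    # Ambiguous case: pay for the slower, deeper server-side classification.
    return Decision(server_classify(text), confidence, "server")


# Toy stand-ins for the on-device model and the server LLM (hypothetical).
def toy_local(text: str) -> Dict[str, float]:
    if "invoice" in text:
        return {"billing": 0.93, "support": 0.07}
    return {"billing": 0.55, "support": 0.45}


def toy_server(text: str) -> str:
    return "support"


confident = route("invoice overdue", toy_local, toy_server)
ambiguous = route("something odd happened", toy_local, toy_server)
print(confident.source, confident.label)
print(ambiguous.source, ambiguous.label)
```

Passing the two classifiers in as callables keeps the routing rule testable on its own, so the threshold can be changed without touching either model.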

Project timeline

1. Taxonomy analysis

2. Local classifier prototype

3. Server LLM fallback

4. Prompt optimization

5. Evaluation plan

Case study packet

Downloadable evidence brief.

A generated packet for sales follow-up and client review, to be replaced later with approved screenshots, raw artifacts, and final metrics.

Evidence cards: 4
Risk items: 4
Canonical page: /case-studies/on-device-classification
Manifest: /reports/case-study-packets/manifest.json

Risk register

Category drift
False confidence
Server fallback cost
Ambiguous labels
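
Two of these risks, false confidence and server fallback cost, pull against each other as the confidence threshold moves. The sketch below shows one way a labeled evaluation set could make that trade-off measurable. The `Record` type, the `threshold_report` function, and the sample data are hypothetical and stand in for whatever the real evaluation harness uses.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    expected: str      # gold label
    predicted: str     # local model's top-1 label
    confidence: float  # local model's top-1 score


def threshold_report(records: List[Record], threshold: float) -> Dict[str, object]:
    """Summarize what a given threshold would cost and what it would let through."""
    if not records:
        raise ValueError("records must not be empty")
    kept = [r for r in records if r.confidence >= threshold]
    wrong = [r for r in kept if r.predicted != r.expected]
    return {
        # Share of inputs sent to the server (drives fallback cost).
        "fallback_rate": (len(records) - len(kept)) / len(records),
        # Share of locally accepted answers that are wrong (false confidence).
        "false_confidence_rate": len(wrong) / len(kept) if kept else 0.0,
        # Which (expected, predicted) pairs are confused, most frequent first.
        "confused_pairs": Counter((r.expected, r.predicted) for r in wrong).most_common(),
    }


# Made-up sample data for illustration only.
records = [
    Record("billing", "billing", 0.95),
    Record("billing", "refunds", 0.90),
    Record("refunds", "refunds", 0.85),
    Record("support", "billing", 0.60),
    Record("support", "support", 0.40),
]

for t in (0.5, 0.8):
    print(t, threshold_report(records, t))
```

Reporting confused pairs alongside the rates keeps attention on which categories are being mixed up, rather than on a single aggregate accuracy number.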

Consulting pattern

Prototype quickly, then turn the demo into an evaluation surface.

The reusable Edxperimental pattern is to make the workflow measurable: define the inputs, expected output, acceptable failure, and operational risk, then build a repeatable benchmark before expanding to production.