US-based stealth startup

On-device AI Classification System

A hybrid classification architecture combining lightweight on-device inference with server-side LLM routing for a large category taxonomy.

Designed for low-latency local decisions with deeper cloud classification when confidence drops.
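The routing rule described above can be sketched as a confidence gate. This is a minimal illustration under stated assumptions, not the project's implementation. `local_classify`, `server_classify`, the 0.8 threshold, and the shortlist size of 20 are all hypothetical. Passing a shortlist of local candidates to the server is also an assumption, included to show one way of keeping the server prompt from enumerating the full taxonomy.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Prediction:
    label: str
    local_confidence: float  # top local score; assumed to be calibrated to [0, 1]
    source: str              # "local" or "server"


def route(
    text: str,
    local_classify: Callable[[str], list[tuple[str, float]]],  # hypothetical on-device model
    server_classify: Callable[[str, Sequence[str]], str],      # hypothetical server LLM call
    threshold: float = 0.8,   # illustrative value; the real threshold would come from evaluation
    shortlist_size: int = 20,
) -> Prediction:
    """Keep the local top-1 label when it clears the threshold, otherwise defer to the server."""
    # Sort the local (label, score) pairs so the best candidate comes first.
    ranked = sorted(local_classify(text), key=lambda pair: pair[1], reverse=True)
    if not ranked:
        raise ValueError("local classifier returned no candidates")
    top_label, top_score = ranked[0]
    if top_score >= threshold:
        return Prediction(top_label, top_score, "local")
    # Low confidence: send the input plus a shortlist of local candidates,
    # so the server prompt does not have to list every category.
    shortlist = [label for label, _ in ranked[:shortlist_size]]
    return Prediction(server_classify(text, shortlist), top_score, "server")
```

With this shape, each decision records whether it stayed on-device, so fallback rates can be measured directly from logged `Prediction.source` values.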

Challenge

The problem that had to be made measurable.

The product needed to classify inputs across a very large taxonomy while preserving fast local behavior and avoiding unnecessary cloud calls.

Taxonomy scale: 8,000+ categories

Architecture: hybrid local/cloud

Optimization loop: DSPy-style

Primary risk controlled: low-confidence routing

Approach

Split classification into local confidence checks and deeper server-side routing for ambiguous cases.

Designed prompt/eval loops around category confusion, not only aggregate accuracy.

Used lightweight local inference where latency mattered and server LLMs where context depth mattered.

Prepared a measurement plan for confidence thresholds, fallback rates, and taxonomy drift.
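One way to turn the measurement plan above into numbers is to sweep candidate thresholds over labeled evaluation data. The sketch below is illustrative and assumes hypothetical `EvalRecord` rows holding the gold label, the local prediction, and its confidence. For each threshold it reports the fallback rate (share of inputs sent to the server), the accuracy of inputs kept on-device, and the count of confident errors (wrong answers that never reach the server).

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class EvalRecord:
    gold: str               # reference label from the evaluation set
    local_label: str        # on-device prediction
    local_confidence: float


@dataclass(frozen=True)
class ThresholdReport:
    threshold: float
    fallback_rate: float      # share of inputs routed to the server
    accepted_accuracy: float  # accuracy on inputs kept on-device (NaN if none kept)
    confident_errors: int     # wrong local answers that never reach the server


def sweep(records: Sequence[EvalRecord], thresholds: Iterable[float]) -> list[ThresholdReport]:
    if not records:
        raise ValueError("need at least one evaluation record")
    reports = []
    for t in thresholds:
        # Inputs at or above the threshold stay local; the rest fall back.
        accepted = [r for r in records if r.local_confidence >= t]
        correct = sum(r.local_label == r.gold for r in accepted)
        reports.append(ThresholdReport(
            threshold=t,
            fallback_rate=1 - len(accepted) / len(records),
            accepted_accuracy=correct / len(accepted) if accepted else float("nan"),
            confident_errors=len(accepted) - correct,
        ))
    return reports
```

Raising the threshold usually trades higher fallback rates (and server cost) for fewer confident errors. A sweep makes that trade-off explicit rather than choosing a threshold by intuition.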

Project timeline

Taxonomy analysis

Local classifier prototype

Server LLM fallback

Prompt optimization

Evaluation plan

Case study packet

Downloadable evidence brief.

A generated packet for sales follow-up and client review, intended to be replaced later with approved screenshots, raw artifacts, and final metrics.

Formats: Markdown packet and structured JSON.

Evidence cards

Risk items

Canonical page

/case-studies/on-device-classification

Manifest

/reports/case-study-packets/manifest.json

Risk register

Category drift

False confidence

Server fallback cost

Ambiguous labels
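Category drift, the first risk listed above, could be monitored by comparing the distribution of predicted labels between a reference window and a recent window. The sketch below uses total variation distance for this comparison. It is an assumed technique, not one the project is confirmed to use, and it tracks drift in predictions, which is only a proxy for drift in the underlying inputs. The `alert_at` value of 0.1 is an arbitrary illustrative setting. The report also lists labels that were never seen in the reference window and labels that fall outside the known taxonomy, which can flag ambiguous or invented categories.

```python
from collections import Counter
from typing import Iterable


def label_distribution(labels: Iterable[str]) -> dict[str, float]:
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty window")
    return {label: n / total for label, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two label distributions (0 = identical, 1 = disjoint)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def drift_report(
    reference: Iterable[str],
    current: Iterable[str],
    taxonomy: set[str],
    alert_at: float = 0.1,  # hypothetical alert level; would need tuning on real traffic
) -> dict:
    p = label_distribution(reference)
    q = label_distribution(current)
    tvd = total_variation(p, q)
    return {
        "tvd": tvd,
        "alert": tvd >= alert_at,
        "new_labels": sorted(set(q) - set(p)),           # predicted now, unseen in reference
        "out_of_taxonomy": sorted(set(q) - taxonomy),    # labels the taxonomy does not define
    }
```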

Consulting pattern

Prototype quickly, then turn the demo into an evaluation surface.

The reusable Edxperimental pattern is to make the workflow measurable before expanding to production: define the inputs, the expected output, what counts as an acceptable failure, the operational risk, and a repeatable benchmark.
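As a minimal sketch of that pattern, each benchmark case can carry its input, its expected label, a set of near-miss labels treated as acceptable failures, and a risk tag. All names here (`BenchmarkCase`, `run_benchmark`, the `"high"` risk tag) are hypothetical and chosen for illustration. Unacceptable misses are counted as `(expected, predicted)` pairs, which also surfaces category confusion rather than only aggregate accuracy.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass(frozen=True)
class BenchmarkCase:
    text: str                                 # input
    expected: str                             # expected output label
    acceptable: frozenset = frozenset()       # near-miss labels tolerated as acceptable failures
    risk: str = "low"                         # operational risk tag, e.g. "low" or "high"


@dataclass
class BenchmarkResult:
    exact: int = 0
    tolerated: int = 0
    high_risk_failures: int = 0
    confusions: Counter = field(default_factory=Counter)  # (expected, predicted) -> count


def run_benchmark(cases: Sequence[BenchmarkCase], classify: Callable[[str], str]) -> BenchmarkResult:
    result = BenchmarkResult()
    for case in cases:
        predicted = classify(case.text)
        if predicted == case.expected:
            result.exact += 1
        elif predicted in case.acceptable:
            result.tolerated += 1
        else:
            # Record the confusion pair so the most frequent mix-ups can be reviewed.
            result.confusions[(case.expected, predicted)] += 1
            if case.risk == "high":
                result.high_risk_failures += 1
    return result
```

Because the cases are plain data, the same suite can be rerun after each prompt or threshold change, and `result.confusions.most_common()` ranks the category pairs that most need attention.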
