Leaderboards
These are prototype tracks for a public Indian AI benchmarking layer. The goal is not a single universal score; it is a set of buyer-relevant views with visible methodology.
Indian Workflow Index (Designing): Support, finance, legal, sales, document, and multilingual tasks scored by outcome, not trivia recall.
Prototype: Measures tool-use planning, recovery, browser discipline, terminal usage, and whether agents can verify their own work.
Live draft: Converts provider pricing into cost per completed workflow using cache-hit assumptions, output length, and retries; a cost sketch follows this list.
Collecting: Separates time-to-first-token, output tokens per second, queueing behavior, and provider variance.
Every public number should trace back to task provenance, model settings, sample count, scoring rubric, and the failure cases the score hides.
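As one way to make that traceability concrete, here is a sketch of a per-score provenance record; the field names are illustrative assumptions for this sketch, not a published schema.

# Illustrative traceability record for one published score.
# Field names are assumptions for the sketch, not a published schema.
from dataclasses import dataclass, field

@dataclass
class ScoreProvenance:
    task_source: str       # where the task came from and how it was sourced
    model_settings: dict   # model id, temperature, max tokens, tool config
    sample_count: int      # how many runs the published score averages over
    scoring_rubric: str    # pointer to the exact rubric version used
    hidden_failures: list[str] = field(default_factory=list)  # failure cases the headline score hides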
Agentic Reliability Formula
A trace-derived Agentic Reliability Index for comparing coding, browser, and support agents. The formula keeps completion, state proof, recovery, tool/policy correctness, and cost-latency discipline visible instead of hiding them behind a single rank.
4 Suites · 16 Traces · 5 Weights
Completion (30%): Measures whether the agent produced the accepted workflow result.
State proof (25%): Rewards proof that the browser, codebase, support policy, or tool state actually reached the target.
Recovery (20%): Separates agents that recover from validation errors, failed tools, and partial state from agents that simply stop.
Tool/policy correctness (15%): Captures tool discipline, policy adherence, escalation quality, and repository hygiene.
Cost-latency discipline (10%): Prevents slow or expensive agents from ranking well unless quality justifies the operating cost.
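Given the weights above, the index reduces to a weighted sum over the five component scores; this sketch assumes each component has already been normalized to [0, 1], which the source does not specify.

# Minimal sketch of the five-weight Agentic Reliability Index.
# Assumes each component score is already normalized to the range [0, 1].

WEIGHTS = {
    "completion": 0.30,       # accepted workflow result produced
    "state_proof": 0.25,      # target state verifiably reached
    "recovery": 0.20,         # recovers from errors instead of stopping
    "tool_discipline": 0.15,  # tool/policy correctness, escalation, hygiene
    "cost_latency": 0.10,     # operating cost and speed discipline
}

def reliability_index(scores: dict[str, float]) -> float:
    assert set(scores) == set(WEIGHTS), "one score per weighted component"
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Example: strong completion but weak recovery drags the index down.
print(reliability_index({
    "completion": 0.9, "state_proof": 0.8, "recovery": 0.4,
    "tool_discipline": 0.7, "cost_latency": 0.6,
}))  # 0.715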
Prototype run matrix