
Legal / public trace / Medium

Vendor contract renewal risk

Review a vendor renewal clause, identify the renewal deadline, surface the termination-notice risk, and draft an internal note with supporting evidence.

Expected evidence

renewal clause
notice deadline
commercial owner

Scoring focus

contract reading
deadline extraction
risk recommendation

Common failure mode

Weak models spot the renewal clause but miss the notice window that changes the recommendation.
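The notice window usually matters more than the renewal date itself, because the last day to act comes earlier. Below is a minimal sketch of that arithmetic. It assumes notice must reach the vendor at least a fixed number of calendar days before an auto-renewal date. The 90-day period and the dates are hypothetical, not values from the prompt packet, and real clauses may count days differently.

```typescript
// Hypothetical clause terms: the dates and 90-day notice period are illustrative
// assumptions, not values taken from the benchmark's prompt packet.
interface RenewalTerms {
  renewalDate: string;      // ISO date (YYYY-MM-DD) on which the contract auto-renews
  noticePeriodDays: number; // calendar days of notice required before renewalDate
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse YYYY-MM-DD as UTC midnight so local time zones cannot shift the date.
function parseIsoDate(iso: string): number {
  const [y, m, d] = iso.split("-").map(Number);
  return Date.UTC(y, m - 1, d);
}

function formatIsoDate(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// Last day notice can still be given: renewal date minus the notice period.
function noticeDeadline(terms: RenewalTerms): string {
  return formatIsoDate(parseIsoDate(terms.renewalDate) - terms.noticePeriodDays * MS_PER_DAY);
}

// Days left to act; a negative value means the notice window has already closed.
function daysUntilNoticeDeadline(terms: RenewalTerms, today: string): number {
  return Math.round((parseIsoDate(noticeDeadline(terms)) - parseIsoDate(today)) / MS_PER_DAY);
}

const terms: RenewalTerms = { renewalDate: "2026-09-30", noticePeriodDays: 90 };
console.log(noticeDeadline(terms));                        // 2026-07-02
console.log(daysUntilNoticeDeadline(terms, "2026-05-18")); // 45
```

A renewal four months away can look comfortable. In this example, though, only 45 days remain to send notice, and that gap is what changes the recommendation.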

Expected output

A concise internal note that states the renewal date, the notice deadline, and the commercial risk, with cited contract evidence.
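One way to make the expected output checkable is to treat the note as a record and list whatever it omits. The sketch below is hypothetical: the `InternalNote` shape and its field names are assumptions, not the benchmark's actual output schema. The three evidence labels come from the expected-evidence list above.

```typescript
// Hypothetical note shape; field names are illustrative assumptions.
interface EvidenceCitation {
  field: "renewal clause" | "notice deadline" | "commercial owner";
  quote: string; // verbatim contract text supporting the claim
}

interface InternalNote {
  renewalDate?: string;
  noticeDeadline?: string;
  commercialRisk?: string;
  evidence: EvidenceCitation[];
}

const EXPECTED_EVIDENCE: EvidenceCitation["field"][] = [
  "renewal clause",
  "notice deadline",
  "commercial owner",
];

// Returns the required items the note leaves out; an empty list means complete.
function missingItems(note: InternalNote): string[] {
  const missing: string[] = [];
  if (!note.renewalDate) missing.push("renewal date");
  if (!note.noticeDeadline) missing.push("notice deadline");
  if (!note.commercialRisk) missing.push("commercial risk");
  const cited = new Set(note.evidence.map((c) => c.field));
  for (const field of EXPECTED_EVIDENCE) {
    if (!cited.has(field)) missing.push(`evidence: ${field}`);
  }
  return missing;
}

// A vague note: it flags risk but gives no dates and cites nothing.
const vagueNote: InternalNote = { commercialRisk: "The vendor agreement may renew soon.", evidence: [] };
console.log(missingItems(vagueNote).join(", "));
// renewal date, notice deadline, evidence: renewal clause, evidence: notice deadline, evidence: commercial owner
```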

Score breakdown

Contract: 35
Deadline: 25
Risk: 25
Recommendation: 15

Trace provenance

Can this public trace be audited later?

Trace id: trace-indian-enterprise-workflow-suite-vendor-contract-renewal-risk
Created: 2026-05-18
Last reviewed: 2026-05-16
Source: data/benchmark-trace-runs.csv
Leakage risk: Medium (the public sample can become saturated after publication).
Retirement status: Active public sample; review after monthly refresh.

Score calculation ledger

How the top score is allocated

The score is the sum of the weighted rubric components. Each component's earned points are allocated from the aggregate run score.

Contract: 30/35
Deadline: 22/25
Risk: 22/25
Recommendation: 13/15
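The ledger arithmetic can be checked mechanically. The sketch below uses the weights and earned points listed on this page. The `RubricComponent` type and the range check are illustrative assumptions, not the harness's own code. The earned total should match the frontier reasoning model's score of 87, and the weights should sum to 100.

```typescript
// Weights and earned points as listed in the score breakdown and ledger.
interface RubricComponent {
  name: string;
  weight: number; // maximum points for this component
  earned: number; // points allocated to this run
}

const ledger: RubricComponent[] = [
  { name: "Contract", weight: 35, earned: 30 },
  { name: "Deadline", weight: 25, earned: 22 },
  { name: "Risk", weight: 25, earned: 22 },
  { name: "Recommendation", weight: 15, earned: 13 },
];

// Sums earned points and weights, rejecting any component scored outside its range.
function totalScore(components: RubricComponent[]): { earned: number; max: number } {
  let earned = 0;
  let max = 0;
  for (const c of components) {
    if (c.earned < 0 || c.earned > c.weight) {
      throw new Error(`${c.name}: earned ${c.earned} outside 0..${c.weight}`);
    }
    earned += c.earned;
    max += c.weight;
  }
  return { earned, max };
}

const t = totalScore(ledger);
console.log(`${t.earned}/${t.max}`); // 87/100
```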

Model version

frontier-reasoning-eval-public-2026-05

Run seed

2026051680

Prompt packet

vendor-contract-renewal-risk-public-packet-v0.1

Artifact bundle

Replay files for this trace

Replay scaffold generated from the current seed trace. Replace with real harness exports when model runs are available.

Replay command

pnpm benchmarks:replay --suite indian-enterprise-workflow-suite --task vendor-contract-renewal-risk

This command is intentionally documented before the real harness exists so the artifact contract is visible.
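Until the harness exists, the fields on this page already suggest what a replay manifest would need. The sketch below is hypothetical: the `ReplayManifest` shape is an assumption, and the `trace-<suite>-<task>` id format is inferred from this one trace id rather than from a documented convention. The values are copied from this page, and the command string reproduces the documented replay command.

```typescript
// Hypothetical replay manifest built from fields shown on this page.
// The real harness export format is not yet defined; these names are assumptions.
interface ReplayManifest {
  suite: string;
  task: string;
  split: "public" | "holdout";
  modelVersion: string;
  runSeed: number;
  promptPacket: string;
}

const manifest: ReplayManifest = {
  suite: "indian-enterprise-workflow-suite",
  task: "vendor-contract-renewal-risk",
  split: "public",
  modelVersion: "frontier-reasoning-eval-public-2026-05",
  runSeed: 2026051680,
  promptPacket: "vendor-contract-renewal-risk-public-packet-v0.1",
};

// Assumed convention: the trace id on this page reads "trace-<suite>-<task>".
function traceId(m: ReplayManifest): string {
  return `trace-${m.suite}-${m.task}`;
}

// Rebuilds the documented replay command from the manifest.
function replayCommand(m: ReplayManifest): string {
  return `pnpm benchmarks:replay --suite ${m.suite} --task ${m.task}`;
}

console.log(traceId(manifest));
console.log(replayCommand(manifest));
```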

Payload preview

Split

public

Difficulty

Medium

Evidence fields

3

Model runs

4

Screenshot

Pending real browser or app screenshot artifact.

Model run evidence

Trace-level comparison

This is the inspection layer that keeps benchmark scores honest: each model class gets an outcome, cost proxy, latency, and reviewer note.

Frontier API provider

Frontier reasoning model

Accepted

Score: 87
Cost units: 4.9
Latency: 6230 ms

Correctly linked renewal risk to the notice deadline and produced an actionable internal note.

Answer excerpt

The renewal auto-extends unless notice is sent before the deadline; the owner should confirm termination or renegotiation this week.

Failure reason

No major issue.

parse contract / extract deadline / draft risk note

Fast hosted API provider

Fast mid-tier model

Accepted with review

Score: 78
Cost units: 2.2
Latency: 3510 ms

Good deadline extraction; reviewer tightened the business-risk framing.

Answer excerpt

The contract is close to renewal and should be reviewed before the notice period expires.

Failure reason

Needed clearer owner handoff.

parse contract / extract deadline

Self-hosted/open-weight stack

Open-weight local model

Partial

Score: 57
Cost units: 1.5
Latency: 5340 ms

Found the right section but missed the decision-critical notice date.

Answer excerpt

The vendor agreement may renew soon and legal should check it.

Failure reason

Missed the exact notice deadline.

parse contract

Low-cost routing endpoint

Small routing model

Rejected

Score: 35
Cost units: 0.4
Latency: 1730 ms

Useful as a triage label but unsafe for contract advice.

Answer excerpt

Legal review needed.

Failure reason

Could route only; no reliable evidence extraction.

classify workflow
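Scores and costs only become comparable after the reviewer outcome is taken into account. The sketch below encodes the four runs reported on this page and picks the cheapest run that a reviewer accepted. The `ModelRun` type and the rule that "Accepted" and "Accepted with review" both count as accepted are assumptions made for illustration.

```typescript
type Outcome = "Accepted" | "Accepted with review" | "Partial" | "Rejected";

interface ModelRun {
  model: string;
  outcome: Outcome;
  score: number;
  costUnits: number;
  latencyMs: number;
}

// The four runs reported on this page.
const runs: ModelRun[] = [
  { model: "Frontier reasoning model", outcome: "Accepted", score: 87, costUnits: 4.9, latencyMs: 6230 },
  { model: "Fast mid-tier model", outcome: "Accepted with review", score: 78, costUnits: 2.2, latencyMs: 3510 },
  { model: "Open-weight local model", outcome: "Partial", score: 57, costUnits: 1.5, latencyMs: 5340 },
  { model: "Small routing model", outcome: "Rejected", score: 35, costUnits: 0.4, latencyMs: 1730 },
];

// Assumption: both accepted outcomes count as usable.
const isAccepted = (r: ModelRun): boolean =>
  r.outcome === "Accepted" || r.outcome === "Accepted with review";

// Cheapest run a reviewer accepted. Filtering first matters: ranking all runs
// by cost alone would pick the rejected routing model.
function cheapestAccepted(all: ModelRun[]): ModelRun | undefined {
  return all
    .filter(isAccepted)
    .reduce<ModelRun | undefined>(
      (best, r) => (best === undefined || r.costUnits < best.costUnits ? r : best),
      undefined,
    );
}

console.log(cheapestAccepted(runs)?.model); // Fast mid-tier model
```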

Why this trace matters

Aggregate scores are useful only when reviewers can inspect the task packet, the expected evidence, and the exact failure mode. This page shows the pattern for publishing public samples while keeping harder holdout tasks private.
