Target outcomes
- 100% of routes emit structured logs in production
- Empty-retrieval rate reviewed weekly on RAG paths
Initiative playbook
Typical delivery arc for this pattern in enterprise programs.
- 1. Discovery (2 to 4 wks)
Agree mandatory telemetry fields and retention with platform and risk.
- 2. Pilot (6 to 8 wks)
Dashboard P95 latency, token spend, and empty-retrieval rate for one route.
- 3. Scale (ongoing)
Export to OpenTelemetry; alert on safety flags and cost anomalies per tenant.
Business use case
Problem
Pilots ship without run logging. When something goes wrong, nobody can say which model ran, what retrieval returned, or what the request cost.
Who benefits
- Platform engineering: SLOs and dashboards per route
- FinOps: token spend visibility before finance asks
- Incident response: a trace ID ties each user report to its logs
Success metrics
- 100% of production AI routes emit structured telemetry
- P95 latency tracked per model and per retrieval mode
- Weekly review of empty-retrieval rate on RAG paths
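The two numeric metrics above can be computed directly from stored telemetry records. The following is a minimal sketch, not the demo's own code: the `RunRecord` shape and the function names are assumptions made for illustration, and the percentile uses the nearest-rank method, which may differ slightly from what your dashboard tool reports.

```typescript
// Hypothetical record shape; map your own telemetry fields onto it.
interface RunRecord {
  model: string;
  retrievalMode: "none" | "seed";
  totalMs: number;
  retrievalHits: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// P95 latency keyed by "model|retrievalMode".
function p95ByModelAndMode(records: RunRecord[]): Map<string, number> {
  const groups = new Map<string, number[]>();
  records.forEach((r) => {
    const key = r.model + "|" + r.retrievalMode;
    const list = groups.get(key) ?? [];
    list.push(r.totalMs);
    groups.set(key, list);
  });
  const result = new Map<string, number>();
  groups.forEach((latencies, key) => result.set(key, percentile(latencies, 95)));
  return result;
}

// Share of RAG runs that returned zero documents; null if there were no RAG runs.
function emptyRetrievalRate(records: RunRecord[]): number | null {
  const ragRuns = records.filter((r) => r.retrievalMode !== "none");
  if (ragRuns.length === 0) return null;
  const empty = ragRuns.filter((r) => r.retrievalHits === 0).length;
  return empty / ragRuns.length;
}
```

Runs without retrieval are excluded from the empty-retrieval denominator so that routes with RAG switched off do not dilute the rate.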
Solution
Wrap generateText (with optional seed RAG) and return a telemetry object alongside the answer. The same pattern feeds OpenTelemetry export, log drains, or Vercel observability.
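As a rough illustration of the wrapper described above, the sketch below times the retrieval and model slices separately and returns both the answer and a telemetry object. It is written under assumptions: the model call and the search function are injected as `generate` and `search` (hypothetical names) rather than imported, so in practice you would adapt generateText and searchSeedDocuments to these shapes, and the telemetry field names are illustrative, not the demo's actual schema.

```typescript
// What the wrapper needs back from a model call. Assumption: your SDK result
// can be mapped to this shape; the field names are illustrative.
interface ModelResult {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
}

interface Telemetry {
  traceId: string;
  model: string;
  retrievalHits: number | null; // null when retrieval was skipped
  retrievalMs: number;
  modelMs: number;
  totalMs: number;
  inputTokens: number;
  outputTokens: number;
}

// Injected dependencies (hypothetical names) keep the sketch SDK-agnostic.
interface Deps {
  generate: (prompt: string) => Promise<ModelResult>;
  search?: (query: string) => Promise<string[]>;
  now?: () => number; // injectable clock, defaults to Date.now
}

async function answerWithTelemetry(
  question: string,
  model: string,
  traceId: string,
  deps: Deps
): Promise<{ answer: string; telemetry: Telemetry }> {
  const now = deps.now ?? (() => Date.now());
  const start = now();

  // Time the retrieval slice on its own so empty results are visible.
  let docs: string[] | null = null;
  let retrievalMs = 0;
  if (deps.search) {
    const t0 = now();
    docs = await deps.search(question);
    retrievalMs = now() - t0;
  }

  // Only prepend context when retrieval actually returned something.
  const prompt =
    docs && docs.length > 0
      ? "Context:\n" + docs.join("\n") + "\n\nQuestion: " + question
      : question;

  const t1 = now();
  const result = await deps.generate(prompt);
  const modelMs = now() - t1;

  return {
    answer: result.text,
    telemetry: {
      traceId,
      model,
      retrievalHits: docs ? docs.length : null,
      retrievalMs,
      modelMs,
      totalMs: now() - start,
      inputTokens: result.usage.inputTokens,
      outputTokens: result.usage.outputTokens,
    },
  };
}
```

Passing the clock and the model call in as parameters is a design choice for testability. It also keeps the telemetry shape independent of any one SDK version.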
Technical implementation
Stack
- AI SDK: generateText with usage tokens
- searchSeedDocuments for retrieval slice timing
Architecture
Outcomes and learnings
- Log retrieval empties separately from model errors; they need different fixes
- Cost estimate is indicative; wire real pricing tables in production
- Same shape works for agents, batch jobs, and workflows with step spans
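To make the point about indicative cost concrete, here is a small sketch of a cost estimate driven by a pricing table. Everything in it is an assumption for illustration: the model names and per-million-token prices are placeholders, not real vendor rates, and `estimateCostUsd` is a hypothetical helper. In production the table would come from whatever source of truth your FinOps team maintains.

```typescript
// Price per one million tokens, in USD.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Placeholder values only; replace with your contracted rates.
const EXAMPLE_PRICING = new Map<string, ModelPrice>([
  ["example-small", { inputPerMTok: 1, outputPerMTok: 2 }],
  ["example-large", { inputPerMTok: 10, outputPerMTok: 30 }],
]);

// Returns null for unknown models instead of guessing, so dashboards can
// surface missing pricing rather than silently reporting zero cost.
function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
  pricing: Map<string, ModelPrice> = EXAMPLE_PRICING
): number | null {
  const price = pricing.get(model);
  if (!price) return null;
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1e6;
}
```

Returning null rather than 0 for an unpriced model is deliberate: a zero would look like a free request in a chargeback report.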
Where else this applies
Observability is what turns a demo into an operated service. Finance, SRE, and risk all ask different questions of the same trace.
FinOps chargeback
Token and cost estimates per team, model, and feature flag.
Incident debugging
Support ties a bad answer to retrieval hits and model version within minutes.
RAG quality ops
Alert when empty-retrieval rate spikes after index or taxonomy changes.
Vendor routing reviews
Compare latency and spend when gateway routes change between models.
Using this stack elsewhere
Emit structured JSON from every AI route; forward to your log drain, OpenTelemetry collector, or Vercel observability with consistent traceId propagation.
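One way to keep traceId propagation consistent is to reuse the trace ID from an incoming W3C `traceparent` header when one is present, and mint a new one otherwise. The sketch below assumes that approach. The helper names and log field names are hypothetical, the header check is simplified rather than a full implementation of the Trace Context specification, and the fallback ID uses `Math.random`, which is fine for correlation but not for anything security-sensitive.

```typescript
// traceparent format: version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
const TRACEPARENT_PATTERN = /^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$/;

// Extract the trace ID from a traceparent header value, or null if it is unusable.
function traceIdFromHeader(traceparent: string | null | undefined): string | null {
  if (!traceparent) return null;
  const match = TRACEPARENT_PATTERN.exec(traceparent.trim());
  if (!match) return null;
  // An all-zero trace ID is defined as invalid.
  return /^0+$/.test(match[1]) ? null : match[1];
}

// Fallback: 32 random hex characters (not cryptographically strong).
function randomTraceId(): string {
  let id = "";
  for (let i = 0; i < 32; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

function resolveTraceId(traceparent: string | null | undefined): string {
  return traceIdFromHeader(traceparent) ?? randomTraceId();
}

type LogValue = string | number | boolean | null;

// One JSON object per line; fixed keys are written last so callers cannot overwrite them.
function toLogLine(traceId: string, route: string, fields: Record<string, LogValue>): string {
  return JSON.stringify({ ...fields, ts: new Date().toISOString(), traceId, route });
}
```

A route would typically call `resolveTraceId` once per request, pass the result to every downstream step, and write `toLogLine(...)` to stdout for the log drain or collector to pick up.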
Live demo
The demo is the same code path described above, not a simplified mock UI. Add keys in .env.local when you are ready; the narrative and diagrams stand on their own without them.
Business
Ask a policy question and inspect the trace ID, latency, tokens, retrieval hits, and a rough cost. These are the things finance and SRE will ask for in week six.
Technical
generateText plus optional searchSeedDocuments; telemetry JSON returned from /api/demos/vercel-observability.