Production Observability for AI Routes

Structured telemetry on every request: trace id, latency breakdown, tokens, retrieval hits, and cost estimate.

Governance · Prompt evaluation · RAG

Target outcomes

  • 100% of routes emit structured logs in production
  • Empty-retrieval rate reviewed weekly on RAG paths

Initiative playbook

Typical delivery arc for this pattern in enterprise programs.

  1. Discovery (2 to 4 wks)

    Agree mandatory telemetry fields and retention with platform and risk.

  2. Pilot (6 to 8 wks)

    Dashboard P95 latency, token spend, and empty-retrieval rate for one route.

  3. Scale (ongoing)

    Export to OpenTelemetry; alert on safety flags and cost anomalies per tenant.
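
The per-tenant cost alert in the Scale phase could start as something as simple as the sketch below. The function name `costAnomalies`, the input shape, and the default factor of 3 are illustrative assumptions; a production alert would more likely live in the monitoring backend than in application code.

```typescript
// Hypothetical helper: flags tenants whose latest daily cost exceeds
// `factor` times the average of their earlier days.
function costAnomalies(
  history: Record<string, number[]>, // tenant -> daily cost in USD, oldest first
  factor = 3,                        // assumed threshold; tune per program
): string[] {
  const flagged: string[] = [];
  for (const tenant of Object.keys(history)) {
    const days = history[tenant];
    if (days.length < 2) continue; // not enough history for a baseline
    const latest = days[days.length - 1];
    const baseline = days.slice(0, -1);
    const mean = baseline.reduce((a, b) => a + b, 0) / baseline.length;
    if (mean > 0 && latest > factor * mean) flagged.push(tenant);
  }
  return flagged;
}
```

A fixed multiple of the trailing mean is crude but easy to explain to finance; a rolling median or a seasonal baseline may be a better fit once there is enough history.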

Business use case

Problem

Pilots ship without run logging. When something goes wrong, nobody can say which model ran, what retrieval returned, or what the request cost.

Who benefits

  • Platform engineering: SLOs and dashboards per route
  • FinOps: token spend visibility before finance asks
  • Incident response: the trace id ties a user report to logs

Success metrics

  • 100% of production AI routes emit structured telemetry
  • P95 latency tracked per model and per retrieval mode
  • Weekly review of empty-retrieval rate on RAG paths
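
As a rough illustration of how the last two metrics could be computed from collected telemetry records, the sketch below uses a nearest-rank percentile. The `Sample` shape and the function names are assumptions for this example, not part of the demo's code.

```typescript
// Illustrative record shape; field names are assumptions.
interface Sample {
  model: string;
  retrievalMode: "none" | "rag";
  totalMs: number;
  retrievalHits: number;
}

// Nearest-rank percentile: the smallest value with at least p% of
// samples at or below it. Returns NaN for an empty input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// P95 latency keyed by "model|retrievalMode".
function p95ByModelAndMode(samples: Sample[]): Record<string, number> {
  const groups: Record<string, number[]> = {};
  for (const s of samples) {
    const key = `${s.model}|${s.retrievalMode}`;
    if (!groups[key]) groups[key] = [];
    groups[key].push(s.totalMs);
  }
  const out: Record<string, number> = {};
  for (const key of Object.keys(groups)) out[key] = percentile(groups[key], 95);
  return out;
}

// Share of RAG requests that retrieved nothing; non-RAG requests are ignored.
function emptyRetrievalRate(samples: Sample[]): number {
  const rag = samples.filter((s) => s.retrievalMode === "rag");
  if (rag.length === 0) return 0;
  return rag.filter((s) => s.retrievalHits === 0).length / rag.length;
}
```

Only RAG requests count toward the empty-retrieval rate, so routes that never call retrieval do not dilute it.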

Solution

Wrap generateText (with optional seed RAG) and return a telemetry object alongside the answer. The same pattern feeds OpenTelemetry export, log drains, or Vercel observability.
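
A minimal sketch of that wrapper shape follows. To stay self-contained it does not import the AI SDK: `GenerateFn` and `SearchFn` are hypothetical stand-ins for generateText and searchSeedDocuments, and the `Telemetry` field names are illustrative assumptions. The usage field names that the real SDK returns depend on its version.

```typescript
// Hypothetical stand-ins: in a real route these would wrap the AI SDK's
// generateText and the project's searchSeedDocuments.
type Usage = { inputTokens: number; outputTokens: number };
type GenerateFn = (prompt: string) => Promise<{ text: string; usage: Usage }>;
type SearchFn = (query: string) => Promise<string[]>;

// Illustrative telemetry shape; not a fixed schema.
interface Telemetry {
  traceId: string;
  model: string;
  latencyMs: { retrieval: number; generation: number; total: number };
  tokens: Usage;
  retrievalHits: number;
}

interface Deps {
  model: string;
  generate: GenerateFn;
  search?: SearchFn;  // optional RAG step
  now?: () => number; // injectable clock, defaults to Date.now
}

async function answerWithTelemetry(
  question: string,
  traceId: string,
  deps: Deps,
): Promise<{ answer: string; telemetry: Telemetry }> {
  const now = deps.now ?? Date.now;
  const start = now();

  // Time the retrieval slice on its own so it can be charted separately.
  const docs = deps.search ? await deps.search(question) : [];
  const afterRetrieval = now();

  const prompt = docs.length > 0
    ? `Context:\n${docs.join("\n")}\n\nQuestion: ${question}`
    : question;
  const { text, usage } = await deps.generate(prompt);
  const end = now();

  return {
    answer: text,
    telemetry: {
      traceId,
      model: deps.model,
      latencyMs: {
        retrieval: afterRetrieval - start,
        generation: end - afterRetrieval,
        total: end - start,
      },
      tokens: usage,
      retrievalHits: docs.length,
    },
  };
}
```

Passing the model call and the search function in as dependencies keeps the timing logic testable without network access or API keys.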

Technical implementation

Stack

  • AI SDK generateText with usage tokens
  • searchSeedDocuments for retrieval slice timing

Architecture

[Diagram: How it runs]

Outcomes and learnings

  • Log empty retrievals separately from model errors; they need different fixes
  • Cost estimate is indicative; wire real pricing tables in production
  • Same shape works for agents, batch jobs, and workflows with step spans
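
The first two learnings can be sketched as below. The prices in `PRICES` are invented placeholders, the model names are fictional, and `classifyOutcome` is a hypothetical helper; none of these come from the demo itself.

```typescript
// Placeholder prices in USD per million tokens. These numbers are invented
// for illustration and must be replaced with real pricing tables.
type Price = { inputPerMTok: number; outputPerMTok: number };
const PRICES: Record<string, Price> = {
  "example-small": { inputPerMTok: 1, outputPerMTok: 2 },
  "example-large": { inputPerMTok: 10, outputPerMTok: 30 },
};

// Returns null for an unknown model instead of guessing a price.
function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number | null {
  const price = PRICES[model];
  if (!price) return null;
  return (
    (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) /
    1_000_000
  );
}

// Keep empty retrievals and model errors as distinct outcomes so that
// dashboards and alerts can treat them differently.
type Outcome = "ok" | "empty_retrieval" | "model_error";

function classifyOutcome(run: {
  ragEnabled: boolean;
  retrievalHits: number;
  modelError: boolean;
}): Outcome {
  if (run.modelError) return "model_error";
  if (run.ragEnabled && run.retrievalHits === 0) return "empty_retrieval";
  return "ok";
}
```

Returning null for an unknown model makes a missing price visible in the logs rather than silently reporting a cost of zero.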

Where else this applies

Observability is what turns a demo into an operated service: finance, SRE, and risk all ask different questions of the same trace.

FinOps chargeback

Token and cost estimates per team, model, and feature flag.

Incident debugging

Support ties a bad answer to retrieval hits and model version within minutes.

RAG quality ops

Alert when empty-retrieval rate spikes after index or taxonomy changes.

Vendor routing reviews

Compare latency and spend when gateway routes change between models.

Using this stack elsewhere

Emit structured JSON from every AI route; forward to your log drain, OpenTelemetry collector, or Vercel observability with consistent traceId propagation.
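
One way to keep traceId propagation consistent is to reuse the trace id from an incoming W3C `traceparent` header and only mint a new one when none is present. The sketch below assumes that approach; `resolveTraceId`, `toLogLine`, and the log field names are hypothetical, and the random id uses `Math.random`, which is adequate for a demo but not cryptographically strong.

```typescript
// W3C Trace Context: traceparent = version-traceid-parentid-flags,
// i.e. 2 hex, 32 hex, 16 hex, and 2 hex characters joined by hyphens.
const TRACEPARENT = /^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$/;

// Non-cryptographic random hex string; swap in a stronger source in production.
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Reuse the caller's trace id when a valid traceparent header is present;
// otherwise start a new trace.
function resolveTraceId(traceparent: string | null | undefined): string {
  const match = traceparent ? TRACEPARENT.exec(traceparent) : null;
  // An all-zero trace id is invalid under the spec, so it is not reused.
  if (match && !/^0+$/.test(match[1])) return match[1];
  return randomHex(32);
}

// One JSON object per line. route and traceId are written last so that
// caller-supplied fields cannot overwrite them.
function toLogLine(
  route: string,
  traceId: string,
  fields: Record<string, unknown>,
): string {
  return JSON.stringify({
    ...fields,
    ts: new Date().toISOString(),
    route,
    traceId,
  });
}
```

A route handler would call `resolveTraceId` once per request, pass the result to every downstream call, and write `toLogLine(...)` to stdout for the log drain or collector to pick up.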

Live demo

The demo is the same code path described above, not a simplified mock UI. Add keys in .env.local when you are ready; the narrative and diagrams stand on their own without them.

Business

Ask a policy question and inspect the trace id, latency, tokens, retrieval hits, and a rough cost: the things finance and SRE will ask for in week six.

Technical

generateText plus optional searchSeedDocuments; telemetry JSON returned from /api/demos/vercel-observability.

Production observability slice

Same request with structured telemetry: trace id, latency breakdown, tokens, retrieval hits, rough cost.
