
Multi-Model Routing with Vercel AI Gateway

Compare latency and responses across models through a single gateway endpoint.

Model routing, Streaming

Target outcomes

  • Platform teams consolidate 3+ provider contracts behind one surface
  • A pilot compares cost and quality on real production prompts

Initiative playbook

Typical delivery arc for this pattern in enterprise programs.

  1. Discovery (2 to 4 wks)

    Inventory model contracts, latency SLOs, and cost caps per workload (chat vs batch).

  2. Pilot (6 to 8 wks)

    Route 10% of traffic through the gateway; run a weekly side-by-side eval on production prompts.

  3. Scale (ongoing)

    Enforce routing policies, budgets, and automatic failover in platform IaC.
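The Pilot step's 10% split can be implemented in several places. One app-side option, sketched here as an assumption rather than anything this demo ships, hashes a stable key such as a user or tenant ID into a bucket from 0 to 99, so each user stays on the same path for the whole pilot. The function names and the choice of 32-bit FNV-1a as the hash are illustrative.

```typescript
// Hypothetical pilot splitter: deterministic percentage rollout.
// Not part of the demo; names and the hash choice are assumptions.

// 32-bit FNV-1a hash of a string (stable across runs and processes).
function fnv1a(key: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    // Multiply by the FNV prime, keeping the low 32 bits as an unsigned value.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// True if this key falls inside the pilot percentage (0 to 100).
function routeThroughGateway(stableKey: string, pilotPercent: number): boolean {
  const bucket = fnv1a(stableKey) % 100; // bucket in [0, 99]
  return bucket < pilotPercent;
}
```

Hashing rather than calling Math.random() per request keeps a given user's experience stable, which also keeps the weekly side-by-side evals comparable.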

Business use case

Problem

Large enterprises often hold multiple model contracts (OpenAI, Anthropic, open-weight hosts). Application teams duplicate SDK wiring, auth, and routing logic; finance lacks unified spend visibility.

Who benefits

  • AI platform engineering: one gateway, many models
  • Product owners: evidence-based model selection from real prompts
  • FinOps: routing policies tied to cost and latency SLOs
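A routing policy tied to cost and latency SLOs can be made concrete as a selection rule over bake-off measurements. A minimal sketch, assuming each candidate already has a measured p95 latency and a cost per 1,000 requests (the field names, and any figures you plug in, are hypothetical): keep candidates that meet both the latency SLO and the cost cap, then prefer the cheapest, breaking ties on latency.

```typescript
// Hypothetical shapes; populate from your own bake-off and contract data.
interface Candidate {
  modelId: string;           // gateway ID, e.g. "openai/gpt-4o-mini"
  p95LatencyMs: number;      // measured on production-shaped prompts
  costPer1kRequests: number; // from your pricing, in your currency
}

interface Policy {
  maxP95LatencyMs: number;
  maxCostPer1kRequests: number;
}

// Cheapest candidate that satisfies the policy (ties broken by lower p95), or undefined if none do.
function chooseDefault(candidates: Candidate[], policy: Policy): Candidate | undefined {
  return candidates
    .filter(
      (c) =>
        c.p95LatencyMs <= policy.maxP95LatencyMs &&
        c.costPer1kRequests <= policy.maxCostPer1kRequests,
    )
    .sort(
      (a, b) => a.costPer1kRequests - b.costPer1kRequests || a.p95LatencyMs - b.p95LatencyMs,
    )[0];
}
```

Returning undefined when nothing qualifies forces an explicit decision (relax the SLO, raise the cap, or keep the current default) instead of silently picking a model that breaks the policy.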

Success metrics

  • Reduce provider-specific SDK integrations by consolidating behind the gateway
  • Side-by-side eval on 50+ production prompts before declaring a default model
  • Document p95 latency delta between candidates for UX-sensitive flows
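The p95 delta can be computed directly from the per-call latencies a bake-off collects. A minimal sketch using the nearest-rank percentile definition (one of several conventions; whichever you choose, document it alongside the number). The function names are illustrative.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric sort, input left untouched
  const rank = Math.ceil((p * sorted.length) / 100);   // 1-based rank
  return sorted[rank - 1];
}

// Positive result: candidate B is slower than candidate A at p95.
function p95DeltaMs(aMs: number[], bMs: number[]): number {
  return percentile(bMs, 95) - percentile(aMs, 95);
}
```

With 50+ prompts per candidate, the p95 rests on only the slowest few calls, so it is worth rerunning the bake-off before treating a small delta as real.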

Solution

Send the same user prompt to two models through Vercel AI Gateway. The demo records wall-clock latency for each call and displays the responses side by side, mirroring how platform teams run structured bake-offs during transformation.

Technical implementation

Stack

  • AI SDK generateText (non-streaming compare for simplicity)
  • createOpenAI with baseURL: https://ai-gateway.vercel.sh/v1
  • Model list from AI_GATEWAY_COMPARE_MODELS (comma-separated provider/model IDs)
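Because AI_GATEWAY_COMPARE_MODELS is a comma-separated list of provider/model IDs, it is worth validating before any request goes out. A minimal parser, assuming surrounding whitespace is ignored, empty entries are dropped, and everything after the first slash is the model name; these rules are assumptions, not the demo's code. The raw string is passed in (for example from process.env on the server) so the function stays easy to test.

```typescript
interface GatewayModelId {
  provider: string; // e.g. "openai"
  model: string;    // e.g. "gpt-4o-mini"
  id: string;       // full gateway string, e.g. "openai/gpt-4o-mini"
}

// Parse a list such as "openai/gpt-4o-mini, <provider>/<model>".
function parseCompareModels(raw: string | undefined): GatewayModelId[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((id) => {
      const slash = id.indexOf("/");
      const provider = id.slice(0, slash);
      const model = id.slice(slash + 1);
      // Require a non-empty provider before the first slash and a non-empty model after it.
      if (slash <= 0 || model.length === 0) {
        throw new Error(`Invalid gateway model ID "${id}"; expected provider/model`);
      }
      return { provider, model, id };
    });
}
```

Failing fast on a malformed entry surfaces configuration mistakes at startup rather than as a confusing gateway error mid-comparison.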

Architecture

One prompt, two models, measured side by side: this is how platform teams shortlist a default model without rewiring app code.


Implementation highlights

  • Promise.all runs models in parallel for fair latency comparison
  • Gateway model strings use provider/model format (e.g. openai/gpt-4o-mini)
  • No provider keys in app code; only AI_GATEWAY_API_KEY, held on the server
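The compare step pairs Promise.all with per-call wall-clock timing. To keep this sketch runnable without network access, the model call is injected as a function; in the app it would wrap AI SDK generateText with a gateway-backed model, which appears only as a comment (the exact provider call can differ between AI SDK versions). The GenerateFn type and the compareModels and timedCall names are hypothetical.

```typescript
// Hypothetical signature for "send this prompt to this gateway model ID".
// In the app this would wrap the AI SDK, roughly:
//   const gateway = createOpenAI({ apiKey: AI_GATEWAY_API_KEY, baseURL: "https://ai-gateway.vercel.sh/v1" });
//   const { text } = await generateText({ model: gateway(modelId), prompt });
type GenerateFn = (modelId: string, prompt: string) => Promise<string>;

interface CompareResult {
  modelId: string;
  text: string;
  latencyMs: number; // wall-clock time for this call only
}

async function timedCall(generate: GenerateFn, modelId: string, prompt: string): Promise<CompareResult> {
  const start = Date.now();
  const text = await generate(modelId, prompt);
  return { modelId, text, latencyMs: Date.now() - start };
}

// map() starts every call before Promise.all awaits any of them, so the calls overlap
// in time and neither model's latency includes waiting for the other.
async function compareModels(generate: GenerateFn, modelIds: string[], prompt: string): Promise<CompareResult[]> {
  return Promise.all(modelIds.map((id) => timedCall(generate, id, prompt)));
}
```

Promise.all rejects as soon as one call fails; if a single failing model should not discard the other's result, Promise.allSettled is the usual alternative.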

Outcomes and learnings

  • Gateway abstracts auth; apps pass model IDs, not raw vendor keys
  • Latency variance often affects UX more than benchmark leaderboard rank does
  • Bake-offs should use production-shaped prompts, not generic trivia

Where else this applies

Model bake-offs show up anywhere teams argue about default models, failover, or cost, and need evidence instead of vendor slides.

Customer-facing chat default

Pick the model that balances empathy and latency for consumer support, not the one that wins on coding benchmarks.

Batch summarisation

Route long documents to a larger model and quick classifiers to a mini model through one integration surface.
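With one integration surface, per-workload routing reduces to choosing a model ID string. A minimal sketch, assuming a simple character-count threshold: openai/gpt-4o-mini is this page's example ID, while the larger model ID and the 20,000-character cutoff are illustrative assumptions to replace with your own bake-off results.

```typescript
type Workload =
  | { kind: "classify"; text: string }
  | { kind: "summarise"; text: string };

const MINI_MODEL = "openai/gpt-4o-mini"; // example ID from this page
const LARGE_MODEL = "openai/gpt-4o";     // hypothetical choice; swap in your bake-off winner
const LONG_DOC_CHARS = 20000;            // hypothetical threshold

function pickModel(w: Workload): string {
  // Quick classifiers always use the mini model.
  if (w.kind === "classify") return MINI_MODEL;
  // Long documents go to the larger model; short summaries stay on the mini model.
  return w.text.length >= LONG_DOC_CHARS ? LARGE_MODEL : MINI_MODEL;
}
```

Since the output is just a gateway model string, changing the mapping does not touch SDK wiring or credentials.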

Regulated vs internal tiers

The external channel uses approved models; an internal sandbox can trial new model IDs without app redeploys.
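Expressing tiers as data rather than code paths is what makes "trial new IDs without redeploys" possible, provided the lists come from runtime config. A sketch with hypothetical tier names and allowlist contents; the rule that the sandbox accepts any well-formed provider/model ID is an assumed policy, not a gateway feature.

```typescript
type Tier = "external" | "internal-sandbox";

// Hypothetical policy; in practice this would be loaded from config, not hard-coded.
const EXTERNAL_ALLOWLIST: ReadonlySet<string> = new Set(["openai/gpt-4o-mini"]);

function isModelAllowed(tier: Tier, modelId: string): boolean {
  // External channel: only explicitly approved IDs.
  if (tier === "external") return EXTERNAL_ALLOWLIST.has(modelId);
  // Sandbox: any ID of the form provider/model may be trialled.
  return /^[^/\s]+\/\S+$/.test(modelId);
}
```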

Failover drills

Prove that the secondary model's quality holds up before a primary region or provider has an outage.
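A drill is easier to run when failover is an explicit, testable function, because the primary can be made to fail on purpose. A minimal sketch, assuming an injected call function (the same hypothetical shape as an app-level wrapper around the gateway) and plain try-then-fall-back semantics; production policies usually add timeouts and error classification, which are omitted here.

```typescript
type CallModel = (modelId: string, prompt: string) => Promise<string>;

interface FailoverResult {
  modelId: string; // model that actually answered
  text: string;
  usedFallback: boolean;
}

// Try the primary model; on any error, make one attempt on the secondary.
async function withFailover(
  call: CallModel,
  primary: string,
  secondary: string,
  prompt: string,
): Promise<FailoverResult> {
  try {
    return { modelId: primary, text: await call(primary, prompt), usedFallback: false };
  } catch {
    return { modelId: secondary, text: await call(secondary, prompt), usedFallback: true };
  }
}
```

In a drill, a call function that always throws for the primary ID exercises the fallback path, and the secondary's answers can then be scored against the same production prompts used in the bake-off.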

Using this stack elsewhere

Vercel AI Gateway centralizes provider keys and routing; the same compare route can run in CI on golden prompts or in an internal admin UI for platform owners.

Live demo

The demo is the same code path described above, not a simplified mock UI. Add keys in .env.local when you are ready; the narrative and diagrams stand on their own without them.

Business

Same prompt, two models, side-by-side latency: useful when procurement already has three LLM contracts and someone has to pick a default.

Technical

Parallel generateText calls through the Vercel AI Gateway; only AI_GATEWAY_API_KEY on the server.

