Multi-model routing via unified gateway

Case study: Architecture, governance, and how to adapt this pattern in a pilot

Business use case

Problem

Large enterprises often hold multiple model contracts (OpenAI, Anthropic, open-weight hosts). Application teams duplicate SDK wiring, auth, and routing logic; finance lacks unified spend visibility.

Who benefits

AI platform engineering: one gateway, many models
Product owners: evidence-based model selection from real prompts
FinOps: routing policies tied to cost and latency SLOs

Success metrics

Reduce provider-specific SDK integrations by consolidating them behind the gateway
Run side-by-side evals on 50+ production prompts before declaring a default model
Document p95 latency delta between candidates for UX-sensitive flows
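The p95 delta can be computed directly from per-prompt latencies collected during a bake-off. The sketch below is a minimal version that uses the nearest-rank percentile definition. That is one of several common definitions, and dashboards that interpolate may report slightly different numbers. The function names are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are <= it. Expects a non-empty array of latencies in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

interface LatencySummary {
  p50: number;
  p95: number;
}

// A wide gap between p50 and p95 signals high latency variance.
function summarize(samples: number[]): LatencySummary {
  return { p50: percentile(samples, 50), p95: percentile(samples, 95) };
}

// p95 delta between two candidates; positive means `b` is slower at the tail.
function p95Delta(a: number[], b: number[]): number {
  return percentile(b, 95) - percentile(a, 95);
}
```

Comparing candidates at p95 rather than on averages shows the slow tail that users of UX-sensitive flows actually notice.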

Solution

Send the same user prompt to two models through a unified model gateway. The demo records the wall-clock latency of each call and displays the responses side by side. This mirrors how platform teams run structured bake-offs during a transformation program.

Technical implementation

Stack

AI SDK generateText (non-streaming compare for simplicity)
createOpenAI with baseURL: https://ai-gateway.vercel.sh/v1
Compare set configured on the server (which models to run side by side)
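Here is a minimal sketch of the server-side settings. It assumes the gateway API key and the compare set come from environment variables. The names `AI_GATEWAY_API_KEY`, `COMPARE_MODELS`, and `loadGatewayConfig` are assumptions for illustration, and only the base URL comes from the stack above. The returned `baseURL` and `apiKey` are the values one would pass to `createOpenAI` from `@ai-sdk/openai`.

```typescript
// Hypothetical server-side settings for the compare route.
// Env var names are assumptions for this sketch.
interface GatewayConfig {
  baseURL: string;
  apiKey: string;
  compareSet: string[]; // gateway model IDs in provider/model form
}

const GATEWAY_BASE_URL = "https://ai-gateway.vercel.sh/v1";

function loadGatewayConfig(env: Record<string, string | undefined>): GatewayConfig {
  const apiKey = env.AI_GATEWAY_API_KEY;
  if (!apiKey) throw new Error("AI_GATEWAY_API_KEY is not set on the server");

  // COMPARE_MODELS is a comma-separated list of exactly two gateway model IDs.
  const compareSet = (env.COMPARE_MODELS ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  if (compareSet.length !== 2) {
    throw new Error(`expected exactly two models to compare, got ${compareSet.length}`);
  }
  return { baseURL: GATEWAY_BASE_URL, apiKey, compareSet };
}
```

Because the compare set lives in server configuration, changing which models are compared does not require touching client code.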

Architecture

One prompt, two models, measured side by side: this is how platform teams shortlist a default model without rewiring app code.

How it runs

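The request path can be sketched as one server-side function that fans a prompt out to the compare set. This is a minimal sketch, not the demo's code. `Generate`, `CompareResult`, and `compareModels` are hypothetical names, and `generate` is injected so the example runs without network access. In the demo, that function would wrap the AI SDK's `generateText` with a gateway-backed model.

```typescript
// Sends one prompt to one gateway model ID and resolves to the response text.
type Generate = (modelId: string, prompt: string) => Promise<string>;

interface CompareResult {
  modelId: string;
  text: string;
  latencyMs: number; // wall-clock time for this model's call only
}

async function timed(modelId: string, prompt: string, generate: Generate): Promise<CompareResult> {
  const start = Date.now();
  const text = await generate(modelId, prompt);
  return { modelId, text, latencyMs: Date.now() - start };
}

// Start every call before awaiting any of them, so neither model's latency
// includes time spent waiting on the other. Results keep the input order.
async function compareModels(prompt: string, modelIds: string[], generate: Generate): Promise<CompareResult[]> {
  return Promise.all(modelIds.map((id) => timed(id, prompt, generate)));
}
```

`Promise.all` rejects as soon as either call fails. A bake-off UI that should still show the surviving model's answer could use `Promise.allSettled` instead.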

Implementation highlights

Promise.all runs models in parallel for fair latency comparison
Gateway model strings use provider/model format (e.g. openai/gpt-4o-mini)
No provider keys in app code; the hosted gateway credential lives on the server only
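Apps pass only `provider/model` strings, so a small guard at the route boundary can reject malformed IDs before they reach the gateway. The helper name `parseModelId` is hypothetical. The format check follows the convention described above (e.g. `openai/gpt-4o-mini`).

```typescript
interface GatewayModelId {
  provider: string;
  model: string;
}

// Splits "provider/model" at the first slash. Rejects IDs with no provider
// prefix or no model part, so a bare vendor model name is caught early.
function parseModelId(id: string): GatewayModelId {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```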

Outcomes and learnings

Gateway abstracts auth; apps pass model IDs, not raw vendor keys
Latency variance often affects UX more than benchmark leaderboard rank does
Bake-offs should use production-shaped prompts, not generic trivia

Delivery playbook: Discovery → pilot → scale

1. Discovery (2–4 wks): Inventory model contracts, latency SLOs, and cost caps per workload (chat vs batch).
2. Pilot (6–8 wks): Route 10% of traffic through the gateway; run a weekly side-by-side eval on production prompts.
3. Scale (ongoing): Enforce routing policies, budgets, and automatic failover in platform IaC.
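One common way to carve out a 10% pilot slice is a deterministic hash of a stable key, such as a user or tenant ID. That way a given user stays on the same path across requests. The sketch below assumes 32-bit FNV-1a as the hash. `fnv1a`, `inPilot`, and the key format are hypothetical choices, not part of the original setup.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units. The result is stable
// across runs and processes, so a key always maps to the same bucket.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32 bits
  }
  return hash >>> 0;
}

// True when this key falls inside the pilot percentage (0 to 100).
function inPilot(stableKey: string, percent: number): boolean {
  return fnv1a(stableKey) % 100 < percent;
}
```

Raising `percent` from 10 during the scale phase keeps every user who was already in the pilot inside it. Only new buckets are added.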

Where else this applies

Model bake-offs show up anywhere teams argue about default models, failover, or cost and need evidence instead of vendor slides.

Customer-facing chat default

Pick the model that balances empathy and latency for consumer support, not the one that wins on coding benchmarks.

Batch summarisation

Route long documents to a larger model and quick classifiers to a mini model through one integration surface.
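Here is a sketch of such a routing policy behind one integration surface. It assumes the caller labels each request with its workload. The `Workload` shape, the character threshold, and sending short documents to the mini model are all assumptions. The model IDs come from configuration rather than being fixed in code.

```typescript
// Hypothetical workload labels attached by the calling application.
type Workload =
  | { kind: "summarize"; documentChars: number }
  | { kind: "classify" };

interface RoutingPolicy {
  largeModel: string;        // gateway ID of a larger model for long documents
  smallModel: string;        // gateway ID of a mini model for quick tasks
  longDocumentChars: number; // documents at or above this size use largeModel
}

function routeModel(workload: Workload, policy: RoutingPolicy): string {
  switch (workload.kind) {
    case "classify":
      return policy.smallModel;
    case "summarize":
      return workload.documentChars >= policy.longDocumentChars
        ? policy.largeModel
        : policy.smallModel;
  }
}
```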

Regulated vs internal tiers

External channel uses approved models; internal sandbox can trial new IDs without app redeploys.
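Here is a minimal sketch of tiered access. It assumes the external channel's approved list is loaded from configuration, so the sandbox can trial new IDs without an app redeploy. `Channel`, `ChannelPolicy`, and `resolveModel` are hypothetical names. Rejecting an unapproved external request, rather than silently substituting another model, is a design choice made for this sketch.

```typescript
type Channel = "external" | "internal-sandbox";

interface ChannelPolicy {
  approvedExternal: ReadonlySet<string>; // gateway model IDs cleared for external use
}

function resolveModel(channel: Channel, requested: string, policy: ChannelPolicy): string {
  // Internal sandbox: any gateway model ID may be trialled.
  if (channel === "internal-sandbox") return requested;
  // External channel: only approved IDs are allowed through.
  if (!policy.approvedExternal.has(requested)) {
    throw new Error(`model "${requested}" is not approved for external traffic`);
  }
  return requested;
}
```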

Failover drills

Prove secondary model quality when a primary region or provider has an outage.
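A failover drill needs a code path that demonstrably switches models. This sketch tries the primary once and falls back to the secondary on any error. It uses hypothetical names and an injected `generate` function. A drill can force the primary to fail, then review the secondary's answers on the same prompts. Retries, timeouts, and any failover the gateway itself may provide are out of scope.

```typescript
type Generate = (modelId: string, prompt: string) => Promise<string>;

interface FailoverResult {
  modelId: string;     // which model actually answered
  text: string;
  failedOver: boolean; // true when the secondary produced the answer
}

async function generateWithFailover(
  prompt: string,
  primary: string,
  secondary: string,
  generate: Generate,
): Promise<FailoverResult> {
  try {
    // Awaiting inside try ensures a rejected primary call is caught here.
    return { modelId: primary, text: await generate(primary, prompt), failedOver: false };
  } catch {
    // Any primary error triggers a single attempt on the secondary.
    const text = await generate(secondary, prompt);
    return { modelId: secondary, text, failedOver: true };
  }
}
```

Logging `failedOver` alongside the response makes it easy to separate drill traffic from normal traffic when reviewing quality.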

A unified model gateway centralizes provider keys and routing. The same compare route can run in CI against golden prompts or in an internal admin UI for platform owners.