Target outcomes
- Platform teams consolidate 3+ provider contracts behind one surface
- Pilot compares cost/quality on real production prompts
Initiative playbook
Typical delivery arc for this pattern in enterprise programs.
- 1. Discovery (2 to 4 wks)
Inventory model contracts, latency SLOs, and cost caps per workload (chat vs batch).
- 2. Pilot (6 to 8 wks)
Route 10% traffic through gateway; run weekly side-by-side eval on production prompts.
- 3. Scale (ongoing)
Enforce routing policies, budgets, and automatic failover in platform IaC.
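One way to carve out the 10% pilot slice is to hash a stable caller ID into 100 buckets, so the same caller always takes the same path from week to week. The sketch below is an illustration only: the function names, the choice of an FNV-1a style hash, and the idea of bucketing by caller ID are assumptions, not something the playbook prescribes.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units. Good enough for bucketing;
// not byte-compatible with reference FNV-1a for non-ASCII input.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit value
}

// Hypothetical rollout check: true when this caller falls inside the pilot slice.
// `percent` is the share of traffic to send through the gateway (0 to 100).
function routesThroughGateway(stableId: string, percent: number): boolean {
  if (percent < 0 || percent > 100) {
    throw new Error("percent must be between 0 and 100");
  }
  const bucket = fnv1a32(stableId) % 100; // 0..99
  return bucket < percent;
}
```

Because the bucket depends only on the ID, raising the percentage later keeps every caller already in the pilot and adds new ones, which keeps week-over-week evals comparable.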
Business use case
Problem
Large enterprises often hold multiple model contracts (OpenAI, Anthropic, open-weight hosts). Application teams duplicate SDK wiring, auth, and routing logic; finance lacks unified spend visibility.
Who benefits
- AI platform engineering: one gateway, many models
- Product owners: evidence-based model selection from real prompts
- FinOps: routing policies tied to cost and latency SLOs
Success metrics
- Reduce the number of provider-specific SDK integrations by consolidating them behind the gateway
- Side-by-side eval on 50+ production prompts before declaring a default model
- Document p95 latency delta between candidates for UX-sensitive flows
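The p95 latency delta can be computed directly from the per-prompt latencies an eval run collects. The sketch below uses the nearest-rank percentile definition; that definition and the function names are assumptions, and other percentile conventions (for example, linear interpolation) give slightly different numbers on small samples.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Positive result: the candidate is slower than the baseline at p95.
function p95DeltaMs(candidateMs: number[], baselineMs: number[]): number {
  return percentile(candidateMs, 95) - percentile(baselineMs, 95);
}
```

With 50 prompts per model, the nearest-rank p95 is the 48th-fastest response, so a handful of slow outliers is enough to move the figure.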
Solution
Send the same user prompt to two models through Vercel AI Gateway. The demo records wall-clock latency and displays the responses side by side, which is how platform teams run structured bake-offs during a transformation.
Technical implementation
Stack
- AI SDK generateText (non-streaming compare for simplicity)
- createOpenAI with baseURL: https://ai-gateway.vercel.sh/v1
- Model list from AI_GATEWAY_COMPARE_MODELS (comma-separated provider/model IDs)
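The comma-separated AI_GATEWAY_COMPARE_MODELS value has to be turned into a list of model IDs before anything can be compared. A minimal sketch of that step follows; the helper name and its validation rules (trimming, rejecting IDs without a provider prefix, dropping duplicates) are assumptions for illustration, not the demo's actual parsing code.

```typescript
// Hypothetical helper: parse the raw AI_GATEWAY_COMPARE_MODELS string into
// provider/model IDs. Pass in the environment value; undefined yields [].
function parseCompareModels(raw: string | undefined): string[] {
  const ids = (raw ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);

  for (const id of ids) {
    // Require at least one slash with non-empty text on both sides.
    const slash = id.indexOf("/");
    if (slash <= 0 || slash === id.length - 1) {
      throw new Error(`Expected provider/model format, got "${id}"`);
    }
  }

  // Drop duplicates while keeping the configured order.
  return ids.filter((id, index) => ids.indexOf(id) === index);
}
```

Failing fast on a malformed ID at startup is a design choice here; it surfaces a typo in configuration before a bake-off silently runs against fewer models than intended.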
Architecture
One prompt, two models, measured side by side: this is how platform teams shortlist a default model without rewiring app code.
Implementation highlights
- Promise.all runs models in parallel for fair latency comparison
- Gateway model strings use provider/model format (e.g. openai/gpt-4o-mini)
- No provider keys in app code, only AI_GATEWAY_API_KEY on the server
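The parallel compare described in these highlights can be sketched as follows. To keep the example runnable without network access or API keys, the model call is injected as a function; the type and function names are assumptions, and the commented wiring to the AI SDK shows one plausible shape rather than the demo's verified source.

```typescript
// Stand-in for whatever actually calls the model. With the stack above this
// would plausibly wrap the AI SDK, roughly (assumed wiring):
//   const gateway = createOpenAI({
//     baseURL: "https://ai-gateway.vercel.sh/v1",
//     apiKey: process.env.AI_GATEWAY_API_KEY,
//   });
//   const generate: Generate = async (id, prompt) =>
//     (await generateText({ model: gateway(id), prompt })).text;
type Generate = (modelId: string, prompt: string) => Promise<string>;

interface CompareResult {
  modelId: string;
  text: string | null; // null when the call failed
  error: string | null; // null when the call succeeded
  latencyMs: number; // wall-clock time for this model only
}

async function timedCall(
  generate: Generate,
  modelId: string,
  prompt: string
): Promise<CompareResult> {
  const started = Date.now();
  try {
    const text = await generate(modelId, prompt);
    return { modelId, text, error: null, latencyMs: Date.now() - started };
  } catch (err) {
    // Capture the failure so one bad model does not sink the whole comparison.
    const message = err instanceof Error ? err.message : String(err);
    return { modelId, text: null, error: message, latencyMs: Date.now() - started };
  }
}

async function compareModels(
  generate: Generate,
  modelIds: string[],
  prompt: string
): Promise<CompareResult[]> {
  // Every call is started before any of them is awaited.
  return Promise.all(modelIds.map((id) => timedCall(generate, id, prompt)));
}
```

Each call carries its own timer, so the reported latency reflects that model alone rather than the slowest call in the batch. Date.now is wall-clock time and can jump if the system clock is adjusted; a monotonic timer would be the safer choice for long-running measurements.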
Outcomes and learnings
- Gateway abstracts auth; apps pass model IDs, not raw vendor keys
- Latency variance often dominates UX more than benchmark leaderboard rank
- Bake-offs should use production-shaped prompts, not generic trivia
Where else this applies
Model bake-offs show up anywhere teams argue about default models, failover, or cost, and need evidence instead of vendor slides.
Customer-facing chat default
Pick the model that balances empathy and latency for consumer support, not the one that wins on coding benchmarks.
Batch summarisation
Route long documents to a larger model and quick classifiers to a mini model through one integration surface.
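A routing rule of this kind can be a small pure function sitting in front of the single gateway call. The sketch below is hypothetical: the workload shape, the policy fields, and the character threshold are all assumptions chosen for illustration, and the actual model IDs would come from configuration.

```typescript
// Hypothetical policy: which model IDs to use and where "long" begins.
interface RoutingPolicy {
  largeModel: string; // provider/model ID for long documents
  miniModel: string; // provider/model ID for short or cheap work
  longDocumentChars: number; // assumed threshold, tune per workload
}

type Workload =
  | { kind: "classify"; input: string }
  | { kind: "summarise"; input: string };

function pickModel(workload: Workload, policy: RoutingPolicy): string {
  // Quick classifiers always go to the mini model.
  if (workload.kind === "classify") return policy.miniModel;
  // Summaries go to the larger model only once the document is long enough.
  return workload.input.length >= policy.longDocumentChars
    ? policy.largeModel
    : policy.miniModel;
}
```

Because the function returns only a model ID string, the calling code stays the same whichever model is chosen; changing the policy object is enough to reroute a workload.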
Regulated vs internal tiers
External channel uses approved models; internal sandbox can trial new IDs without app redeploys.
Failover drills
Prove secondary model quality when a primary region or provider has an outage.
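A failover drill can be rehearsed with a thin wrapper that tries models in priority order and records which one answered. This is a drill harness under stated assumptions, not a description of how the gateway itself fails over: the names are hypothetical and the model call is injected so the sketch runs offline.

```typescript
// Injected model call, so an outage can be simulated in tests.
type Generate = (modelId: string, prompt: string) => Promise<string>;

interface FailoverResult {
  modelId: string; // the model that actually answered
  text: string;
  failedOver: boolean; // true when the first choice did not answer
  errors: string[]; // one entry per model that failed before success
}

async function generateWithFailover(
  generate: Generate,
  modelIds: string[],
  prompt: string
): Promise<FailoverResult> {
  const errors: string[] = [];
  for (let i = 0; i < modelIds.length; i++) {
    try {
      const text = await generate(modelIds[i], prompt);
      return { modelId: modelIds[i], text, failedOver: i > 0, errors };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${modelIds[i]}: ${message}`);
    }
  }
  // Nothing answered: surface every failure to the caller.
  throw new Error(`All models failed: ${errors.join("; ")}`);
}
```

Running production-shaped prompts through this wrapper with the primary forced to fail produces the secondary model's answers, which can then be scored the same way as any other bake-off candidate.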
Using this stack elsewhere
Vercel AI Gateway centralizes provider keys and routing; the same compare route can run in CI on golden prompts or in an internal admin UI for platform owners.
Live demo
The demo is the same code path described above, not a simplified mock UI. Add keys in .env.local when you are ready; the narrative and diagrams stand on their own without them.
Business
Same prompt, two models, side-by-side latency: useful when procurement already has three LLM contracts and someone has to pick a default.
Technical
Parallel generateText calls through the Vercel AI Gateway; only AI_GATEWAY_API_KEY on the server.