AI Labs

Examples you can run, not just slide decks

Patterns, metrics, and runnable demos for architecture reviews and pilots, from The Ops Toolbox.

25 worked patterns and 15 decision guides for architecture reviews, steering forums, and six-week pilots.

  • 25 worked examples: Agents, RAG, HITL, observability, orchestration
  • 15 decision guides: Program, architecture, governance
  • 4 cloud stacks: Vercel, Azure, AWS, Claude

Built for governed programmes

Escalation before writes, cite-only RAG, evals in CI, and demos that show an honest configure-keys notice when API keys are missing.

Featured patterns

Runnable reference builds we use in workshops and pilot squads

All examples →

Streaming agent on Vercel AI SDK with tool calling for CRM lookup and ticket creation.

Browser streams UI messages with sessionId to the route handler

Agents, Streaming, Human escalation
View case study & demo →

Compare latency and responses across models through a single gateway endpoint.

Same prompt POSTed once to the compare route

Model routing, Streaming
View case study & demo →

Knowledge Q&A using Azure OpenAI and Azure AI Search over seeded policy documents.

Question sent to searchKnowledge (AI Search or local seed fallback)

RAG, Enterprise
View case study & demo →

Conversational demo using Bedrock Converse API with optional Knowledge Base retrieval.

Question enriched with seed context snippets

RAG, Enterprise
View case study & demo →

Extract initiative risks, stakeholders, and systems from unstructured text using schema-constrained generation.

Discovery notes POSTed as plain text

Structured extraction, Enterprise
View case study & demo →

A pay-per-invocation durable workflow that orchestrates a multi-step transformation intake pipeline.

POST starts a workflow run with start()

Orchestration, Enterprise
View case study & demo →

A governed orchestration pipeline: retrieve context, generate a grounded answer, then safety-score the output.

Retrieve top-k policy chunks (Search or seed fallback)

Orchestration, Governance, Enterprise
View case study & demo →

Uses Bedrock Agent Runtime when configured, otherwise runs a multi-step orchestration (context → intent → answer) via Converse.

If agent env is set: invoke Agent Runtime and stream completion

Orchestration, Enterprise
View case study & demo →
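
The "agent when configured, otherwise orchestration" choice can be pictured as a small routing function. This is only a sketch: the variable names `BEDROCK_AGENT_ID` and `BEDROCK_AGENT_ALIAS_ID`, the `Route` type, and the idea of returning a configure-keys notice when `AWS_REGION` is absent are illustrative assumptions, not the demo's actual configuration.

```typescript
// Environment passed in explicitly so the function stays pure and testable.
type Env = Record<string, string | undefined>;

type Route =
  | { kind: "agent-runtime"; agentId: string; agentAliasId: string }
  | { kind: "converse-orchestration"; steps: string[] }
  | { kind: "configure-keys"; missing: string[] };

// Hypothetical variable names; the real demo may use different ones.
function chooseRoute(env: Env): Route {
  // Without a region nothing can be called, so say so instead of faking output.
  if (!env["AWS_REGION"]) {
    return { kind: "configure-keys", missing: ["AWS_REGION"] };
  }

  const agentId = env["BEDROCK_AGENT_ID"];
  const agentAliasId = env["BEDROCK_AGENT_ALIAS_ID"];

  // Agent env is set: hand the request to the managed agent.
  if (agentId && agentAliasId) {
    return { kind: "agent-runtime", agentId, agentAliasId };
  }

  // Otherwise fall back to the multi-step orchestration described above.
  return { kind: "converse-orchestration", steps: ["context", "intent", "answer"] };
}
```

Keeping the decision in one pure function makes the fallback easy to unit test and easy to show in an architecture review.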

Keyword retrieval over in-app policy seeds, then grounded answers via the Vercel AI SDK.

Question triggers keyword search over SEED_DOCUMENTS

RAG, Governance
View case study & demo →
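
A minimal sketch of keyword retrieval over in-app seeds, to make the pattern concrete. The `SeedDocument` shape, the sample documents, the stop-word list, and the scoring rule (count of distinct question terms found in a document) are assumptions for illustration; only the name `SEED_DOCUMENTS` comes from the demo description, and the model call itself is left out.

```typescript
// Hypothetical seed shape; the real SEED_DOCUMENTS may differ.
interface SeedDocument {
  id: string;
  title: string;
  text: string;
}

const SEED_DOCUMENTS: SeedDocument[] = [
  { id: "leave-01", title: "Annual leave", text: "Employees accrue 25 days of annual leave per year." },
  { id: "exp-01", title: "Expenses", text: "Submit expenses within 30 days with an itemised receipt." },
  { id: "sec-01", title: "Passwords", text: "Rotate passwords when a compromise is suspected." },
];

const STOP_WORDS = new Set(["a", "an", "the", "of", "to", "is", "do", "i", "how", "what", "many", "get"]);

// Lowercase, split on non-alphanumerics, drop empty tokens and stop words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0 && !STOP_WORDS.has(token));
}

interface Hit {
  doc: SeedDocument;
  score: number;
}

// Score each document by how many distinct question terms it contains.
function searchSeeds(question: string, docs: SeedDocument[], k = 2): Hit[] {
  const terms = Array.from(new Set(tokenize(question)));
  return docs
    .map((doc) => {
      const docTerms = new Set(tokenize(doc.title + " " + doc.text));
      const score = terms.filter((term) => docTerms.has(term)).length;
      return { doc, score };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Build a prompt that restricts the model to the retrieved sources.
function buildGroundedPrompt(question: string, hits: Hit[]): string {
  const context = hits.map((hit) => "[" + hit.doc.id + "] " + hit.doc.text).join("\n");
  return "Answer using only the sources below and cite their ids.\n" + context + "\n\nQuestion: " + question;
}
```

An empty result from `searchSeeds` is the natural trigger for an unknown-answer response rather than an ungrounded guess.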

Agent proposes CRM writes and refunds; supervisors approve or reject before anything executes.

Scenario analysed with generateObject for action proposal

Agents, Governance, Human escalation
View case study & demo →
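
The approve-before-execute rule can be sketched as a small in-memory queue. Everything here is illustrative: the `ApprovalQueue` class, the action names, and the status values are assumptions, and the step where the model produces the proposal is omitted. The point is only that execution is refused until a supervisor has approved.

```typescript
type ProposalStatus = "pending" | "approved" | "rejected" | "executed";

interface Proposal {
  id: number;
  action: "crm_update" | "refund"; // hypothetical action names
  payload: Record<string, unknown>;
  status: ProposalStatus;
  decidedBy?: string;
}

class ApprovalQueue {
  private proposals = new Map<number, Proposal>();
  private nextId = 1;

  // The agent may only propose; nothing runs at this point.
  propose(action: Proposal["action"], payload: Record<string, unknown>): Proposal {
    const proposal: Proposal = { id: this.nextId++, action, payload, status: "pending" };
    this.proposals.set(proposal.id, proposal);
    return proposal;
  }

  // A supervisor records a decision on a pending proposal.
  decide(id: number, supervisor: string, approve: boolean): Proposal {
    const proposal = this.find(id);
    if (proposal.status !== "pending") {
      throw new Error("Proposal " + id + " is already " + proposal.status);
    }
    proposal.status = approve ? "approved" : "rejected";
    proposal.decidedBy = supervisor;
    return proposal;
  }

  // Execution is refused unless the proposal has been approved.
  execute(id: number, run: (proposal: Proposal) => void): Proposal {
    const proposal = this.find(id);
    if (proposal.status !== "approved") {
      throw new Error("Proposal " + id + " is " + proposal.status + ", not approved");
    }
    run(proposal);
    proposal.status = "executed";
    return proposal;
  }

  private find(id: number): Proposal {
    const proposal = this.proposals.get(id);
    if (!proposal) throw new Error("Unknown proposal " + id);
    return proposal;
  }
}
```

Because an executed proposal no longer has the `approved` status, a second `execute` call on the same id is also refused, which guards against duplicate refunds.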

Structured telemetry on every request: trace id, latency breakdown, tokens, retrieval hits, and cost estimate.

Optional seed retrieval timed separately from model call

Governance, Prompt evaluation, RAG
View case study & demo →
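
One way to picture the per-request record is sketched below. The field names, the `timed` helper, and the per-million-token prices are assumptions for illustration, not the demo's schema or any provider's real rates. Steps are timed synchronously for brevity; a real model call would be awaited.

```typescript
interface TokenUsage {
  input: number;
  output: number;
}

interface Telemetry {
  traceId: string;
  latencyMs: { retrieval: number; model: number; total: number };
  tokens: TokenUsage;
  retrievalHits: number;
  costEstimateUsd: number;
}

// Hypothetical USD prices per million tokens; replace with your provider's rates.
const PRICE_PER_MILLION_USD: TokenUsage = { input: 3, output: 15 };

function estimateCostUsd(tokens: TokenUsage): number {
  const inputCost = tokens.input * PRICE_PER_MILLION_USD.input;
  const outputCost = tokens.output * PRICE_PER_MILLION_USD.output;
  return (inputCost + outputCost) / 1000000;
}

// Time one step; the clock is injectable so tests can be deterministic.
function timed<T>(step: () => T, now: () => number = Date.now): { result: T; ms: number } {
  const start = now();
  const result = step();
  return { result, ms: now() - start };
}

// Assemble the record, keeping retrieval time separate from model time.
function buildTelemetry(args: {
  traceId: string;
  retrievalMs: number;
  modelMs: number;
  tokens: TokenUsage;
  retrievalHits: number;
}): Telemetry {
  return {
    traceId: args.traceId,
    latencyMs: {
      retrieval: args.retrievalMs,
      model: args.modelMs,
      total: args.retrievalMs + args.modelMs,
    },
    tokens: args.tokens,
    retrievalHits: args.retrievalHits,
    costEstimateUsd: estimateCostUsd(args.tokens),
  };
}
```

Serialising the record with `JSON.stringify` gives one structured log line per request that a dashboard can aggregate.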

Representative outcomes

Anonymized composites from assessment and pilot engagements, not client logos

Outcomes we repeat

Patterns that show up across engagements, with metrics sponsors and risk teams recognise

Architecture review or 6-week pilot

Regulated policy Q&A with citations

Financial services, insurance, and large HR policy estates

  • Citation rate above 90% on an agreed golden question set
  • Documented unknown-answer path when retrieval is weak
  • Quantified lift vs general chat baseline for audit
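
As a sketch of how the citation target might be checked in CI: the `GoldenResult` shape is hypothetical, and defining citation rate as the share of answered questions carrying at least one citation is an assumption. An engagement would agree its own definition alongside the golden set.

```typescript
interface GoldenResult {
  question: string;
  answered: boolean;   // false when the system took the unknown-answer path
  citations: string[]; // source ids attached to the answer
}

// Share of answered questions that carry at least one citation.
function citationRate(results: GoldenResult[]): number {
  const answered = results.filter((r) => r.answered);
  if (answered.length === 0) return 0;
  const cited = answered.filter((r) => r.citations.length > 0);
  return cited.length / answered.length;
}

// Share of all questions routed to the unknown-answer path.
function unknownRate(results: GoldenResult[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => !r.answered).length / results.length;
}

// Gate: the rate must be strictly above the threshold (90% by default).
function meetsCitationGate(results: GoldenResult[], threshold = 0.9): boolean {
  return citationRate(results) > threshold;
}
```

Tracking `unknownRate` next to the citation rate keeps a system from passing the gate simply by declining to answer most questions.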

Pilot squad with a weekly steering forum

Operations copilot with human approval

Support, ITSM, and CRM-adjacent workflows

  • Read-only tools in pilot; writes only after supervisor approval
  • Handle time improvement on covered intents with quality sampling
  • Incident and override metrics in the same dashboard as cost

1 to 2 week assessment

Portfolio prioritisation and council operating model

Medium and large programmes with many AI ideas

  • Single intake scorecard and capped active pilots
  • Named champions with protected time and enablement kit
  • Scale / pivot / stop decisions documented per pilot

Architecture review then pilot on one squad

Platform standards and gateway routing

CTO office standardising models, logging, and cost

  • Default model per task type with documented fallback
  • Structured logs and eval CI on prompt or index change
  • Cost per successful task visible to finance monthly
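
The first and last outcomes above can be sketched in a few lines. The task types, the placeholder model names, and the choice to count spend on failed attempts in the numerator are all assumptions for illustration, not a prescribed accounting method.

```typescript
interface ModelRoute {
  defaultModel: string;
  fallbackModel: string;
}

// Hypothetical task types and placeholder model names.
const MODEL_ROUTES: Record<string, ModelRoute> = {
  summarise: { defaultModel: "small-model", fallbackModel: "large-model" },
  extract: { defaultModel: "large-model", fallbackModel: "small-model" },
};

// Use the documented default unless it is currently unavailable.
function pickModel(taskType: string, unavailable: string[] = []): string {
  const route = MODEL_ROUTES[taskType];
  if (!route) throw new Error("No route documented for task type " + taskType);
  return unavailable.indexOf(route.defaultModel) === -1 ? route.defaultModel : route.fallbackModel;
}

interface TaskRecord {
  taskType: string;
  costUsd: number;
  succeeded: boolean;
}

// Total spend, including failed attempts, divided by the number of successes.
// Returns null when nothing succeeded, so the report shows a gap rather than infinity.
function costPerSuccessfulTask(records: TaskRecord[]): number | null {
  const successes = records.filter((r) => r.succeeded).length;
  if (successes === 0) return null;
  const total = records.reduce((sum, r) => sum + r.costUsd, 0);
  return total / successes;
}
```

Including failed attempts in the total means retries and fallbacks show up in the figure finance sees, instead of being hidden by a per-call average.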

2 to 3 week review alongside build team

Security and privacy gate before production

InfoSec review of agent or RAG going live

  • Control-to-artefact pack accepted by risk (diagrams, logs, evals)
  • Pen test on tool endpoints, not chat UI only
  • Runbook drill for disable-tools and human fallback

Workshop plus architecture review

Copilot coexistence and custom systems of record

Microsoft-centric enterprises with M365 Copilot licensed

  • Channel matrix: Copilot vs custom app vs human queue
  • Custom work scoped to CRM/ITSM writes and cite-only corpora
  • Aligned retention and safety rules across channels

Advisory at a glance

Services and engagement shapes backed by the catalogue on this site

Next step

Talk about your next pilot


Prefer the web form? The Ops Toolbox.

  • One workflow, clear metrics
  • Your cloud, your keys
  • Written handoff, not dependency