AI Labs

Advisory evidence base

Examples you can run, not slide decks

An evidence base from The Ops Toolbox: 25 worked examples and 15 decision guides you can run in your next review or pilot.

Built for governed programmes: human approval before changes, answers tied to sources, and designs you can defend in review.

Transformation intake (live)

Try it: you're the intake PM for a transformation brief. Watch it move from summary to risks to a recommendation your steering group can review.

Open demo

Worked examples

Assistants, document Q&A, human approval, monitoring, workflows

Decision guides

Program, architecture, governance

Cloud stacks

AWS, Azure, AI SDK, and Claude

Popular decision guides

Plain-language summaries and technical depth for steering forums and architecture reviews

All guides →

program

AI councils & champion programs

How to stand up a cross-functional AI council, a distributed champion network, and an operating rhythm that scales pilots without losing control.

Read guide

program

Scoping a six-week pilot

How to pick one workflow, define stop rules, and leave with metrics executives will fund, pivot, or kill.

Read guide

program

Measuring AI success

Business KPIs and technical signals that belong in the same dashboard, so finance and engineering do not talk past each other.

Read guide

architecture

RAG vs fine-tuning

When retrieval is enough, when custom weights earn their cost, and how to compare both with the same golden questions.

Read guide

Featured patterns

Reference builds you can run in workshops and pilot squads

All examples →

AI SDK

OpenAI

Live demo

Operational Copilot with Streaming Tools

Operations staff get live help with customer lookup and ticket creation — with escalation when needed.

Assistants · Live responses · Human escalation

View case study & demo

Operational copilot (live)

User: Summarise the change request and flag policy gaps.

Agent: Draft ready. Escalation required before write.

Live assistant with supervisor handoff

Open demo

AI SDK

OpenAI

Live demo

Multi-model routing via unified gateway

Send the same request to different models and compare speed and quality through one endpoint.

Same question sent to two models for a side-by-side comparison

Model routing · Live responses

View case study & demo

Azure AI Foundry

OpenAI

Live demo

Enterprise RAG on Azure AI Foundry

Staff ask policy questions and get answers backed by approved documents.

Question searches approved documents before the model drafts an answer

Document Q&A · Enterprise

View case study & demo

AWS

Live demo

AWS-Native Q&A with Amazon Bedrock

In-account Q&A using your knowledge base when configured, or seed content for early pilots.

Question answered with sample documents or your knowledge base when configured

Document Q&A · Enterprise

View case study & demo

Claude

Live demo

Transformation Discovery with Claude Structured Output

Turn workshop notes into structured initiative risks, stakeholders, and systems.

Workshop notes turned into a structured charter for human review

Structured extraction · Enterprise

View case study & demo

AI SDK

Live demo

Durable multi-step orchestration

A reliable multi-step workflow for transformation intake — from brief to recommendation.

Paste a brief and start a multi-step workflow on demand

Multi-step workflows · Enterprise

View case study & demo

Azure AI Foundry

OpenAI

Live demo

Agent Orchestration on Azure AI Foundry

Answer questions in three governed steps: find context, draft answer, check safety.

Find relevant policy excerpts, then draft and safety-check the answer

Multi-step workflows · Governance · Enterprise

View case study & demo

AWS

Live demo

Orchestrated Agents on Amazon Bedrock

Multi-step Q&A on AWS — full agent mode when configured, honest fallback pipeline when not.

Full agent when configured; otherwise a visible retrieve → classify → answer pipeline

Multi-step workflows · Enterprise

View case study & demo

AI SDK

OpenAI

Live demo

Policy Q&A with seed RAG

Ask policy questions and get grounded answers from in-app documents — with citations.

Policy question answered from in-app documents with citations

Document Q&A · Governance

View case study & demo

AI SDK

OpenAI

Live demo

Human-in-the-Loop Approval Gates

Supervisors approve or reject CRM updates and refunds before anything runs.

Assistant proposes an action; supervisor approves or rejects first

Assistants · Governance · Human escalation

View case study & demo
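
As a rough illustration of the approval-gate idea described in this card, the sketch below queues a proposed action and runs it only after a supervisor approves. It is a minimal sketch using the Python standard library only; `ApprovalGate`, `ProposedAction`, and the handler names are hypothetical and are not taken from the case study or from any SDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ProposedAction:
    """An action the assistant wants to take; nothing has run yet."""
    name: str
    payload: dict
    status: Status = Status.PENDING


class ApprovalGate:
    """Holds proposed actions until a supervisor approves or rejects them.

    Hypothetical illustration: handlers stand in for real CRM or refund calls.
    """

    def __init__(self, handlers: Dict[str, Callable[[dict], str]]):
        self._handlers = handlers
        self.queue: List[ProposedAction] = []

    def propose(self, name: str, payload: dict) -> ProposedAction:
        # Only known actions can be proposed; the handler is NOT called here.
        if name not in self._handlers:
            raise KeyError(f"unknown action: {name}")
        action = ProposedAction(name, payload)
        self.queue.append(action)
        return action

    def decide(self, action: ProposedAction, approved: bool) -> Optional[str]:
        # Each action can be decided exactly once.
        if action.status is not Status.PENDING:
            raise ValueError("action already decided")
        if not approved:
            action.status = Status.REJECTED
            return None
        action.status = Status.APPROVED
        # The side effect happens only on the approved path.
        return self._handlers[action.name](action.payload)
```

In use, the assistant would call `propose("issue_refund", {...})` and a supervisor-facing screen would later call `decide(action, approved=True)` or `decide(action, approved=False)`; the handler never runs on the rejected path.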

AI SDK

OpenAI

Live demo

Production Observability for AI Routes

See latency, tokens, matched documents, and rough cost on every request — ready for SRE and finance.

Every request returns latency, tokens, matches, and rough cost

Governance · Quality checks · Document Q&A

View case study & demo
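
To make the per-request metrics in this card concrete, here is a minimal sketch of a wrapper that times a call and returns latency, token counts, matched documents, and a rough cost alongside the answer. Everything here is an assumption for illustration: `ModelResult`, `RequestMetrics`, and `observe` are hypothetical names, and the per-1,000-token prices are placeholders supplied by the caller, not real provider rates.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ModelResult:
    """What the (hypothetical) model route returns for one request."""
    answer: str
    input_tokens: int
    output_tokens: int
    matched_documents: int


@dataclass
class RequestMetrics:
    latency_ms: float
    input_tokens: int
    output_tokens: int
    matched_documents: int
    rough_cost_usd: float


def rough_cost_usd(input_tokens: int, output_tokens: int,
                   usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    # Rough estimate only: tokens / 1000 * price for each direction.
    return (input_tokens / 1000 * usd_per_1k_input
            + output_tokens / 1000 * usd_per_1k_output)


def observe(call: Callable[[], ModelResult],
            usd_per_1k_input: float,
            usd_per_1k_output: float) -> Tuple[ModelResult, RequestMetrics]:
    """Run one request and attach latency, token, match, and cost figures."""
    start = time.perf_counter()
    result = call()
    latency_ms = (time.perf_counter() - start) * 1000
    metrics = RequestMetrics(
        latency_ms=latency_ms,
        input_tokens=result.input_tokens,
        output_tokens=result.output_tokens,
        matched_documents=result.matched_documents,
        rough_cost_usd=rough_cost_usd(result.input_tokens, result.output_tokens,
                                      usd_per_1k_input, usd_per_1k_output),
    )
    return result, metrics
```

Returning the metrics next to the answer, rather than only logging them, is one way to let the same figures feed both an SRE dashboard and a finance report.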

Representative outcomes

Anonymised composites from assessments and pilots — representative stories, not client logos

Software · Medium

B2B SaaS · ~400 employees

Support leaders wanted a copilot without granting write access to billing on day one.

Outcomes

Pilot handle-time down ~18% on covered intents (internal baseline)
Zero autonomous billing writes in production path
Board one-pager tied to metrics, not model names

Regulated · Large

Financial services · ~2,500 employees

Policy Q&A had to cite approved documents and fail closed when retrieval was weak.

Outcomes

Citation rate >92% on golden question set
Documented when to route users to human policy reviewers
Document Q&A lift quantified vs general chat baseline
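
The "fail closed" behaviour and the citation-rate metric in this example can be sketched roughly as follows. This is an illustration under stated assumptions, not the method used in the engagement: the `Passage` and `Answer` types, the `min_score` threshold of 0.75, and the refusal wording are all hypothetical, and a real system would call a model where this sketch simply quotes the top passage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    doc_id: str
    text: str
    score: float  # retrieval similarity; assumed to lie in [0, 1]


@dataclass
class Answer:
    text: str
    citations: List[str]
    routed_to_human: bool


def answer_fail_closed(passages: List[Passage], min_score: float = 0.75) -> Answer:
    """Answer only from sufficiently strong matches; otherwise refuse and route.

    min_score is a hypothetical threshold that would be tuned on a golden set.
    """
    strong = [p for p in passages if p.score >= min_score]
    if not strong:
        # Fail closed: no draft answer when retrieval is weak.
        return Answer(
            text="No approved source found. Routing to a policy reviewer.",
            citations=[],
            routed_to_human=True,
        )
    top = max(strong, key=lambda p: p.score)
    # Placeholder for a model call: quote the best passage and cite all strong ones.
    return Answer(text=top.text,
                  citations=[p.doc_id for p in strong],
                  routed_to_human=False)


def citation_rate(answers: List[Answer]) -> float:
    """Share of answers that carry at least one citation."""
    if not answers:
        return 0.0
    return sum(1 for a in answers if a.citations) / len(answers)
```

Running `answer_fail_closed` over each golden question and passing the results to `citation_rate` gives one simple way to track a citation-rate figure of the kind quoted above.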

Operations / logistics tech · Small

Scale-up · ~90 employees

Founder-led AI ideas with no platform team; needed one shippable slice and a stop rule.

Outcomes

First demo to steering forum in under three weeks
Clear go/no-go on vector index spend
Internal champion trained on runbook and eval rubric

Outcomes we repeat

Results we see repeatedly — with metrics sponsors and risk teams already track

Architecture review or 6-week pilot

Regulated policy Q&A with source citations

Financial services, insurance, and large HR policy estates

Citation rate above 90% on an agreed golden question set
Documented unknown-answer path when retrieval is weak
Quantified lift vs general chat baseline for audit

Pilot squad with steering forum every week

Operations assistant with human approval

Support, ITSM, and CRM-adjacent workflows

Read-only access in pilot; updates only after supervisor approval
Handle time improvement on covered intents with quality sampling
Incident and override metrics in the same dashboard as cost

1 to 2 week assessment

Portfolio prioritisation and council operating model

Medium and large programmes with many AI ideas

Single intake scorecard and capped active pilots
Named champions with protected time and enablement kit
Scale / pivot / stop decisions documented per pilot

Architecture review then pilot on one squad

Platform standards and model routing

CTO office standardising models, logging, and cost

Default model per task type with documented fallback
Structured logs and eval CI on prompt or index change
Cost per successful task visible to finance monthly

2 to 3 week review alongside build team

Security and privacy gate before production

InfoSec review before an AI assistant or document Q&A goes live

Evidence pack accepted by risk (diagrams, logs, quality checks)
Pen test on tool endpoints, not chat UI only
Runbook drill for disable-tools and human fallback

Workshop plus architecture review

Copilot coexistence and custom systems of record

Microsoft-centric enterprises with M365 Copilot licensed

Channel matrix: Copilot vs custom app vs human queue
Custom work scoped to CRM/ITSM updates and cite-only document sets
Aligned retention and safety rules across channels

Advisory at a glance

How we work — backed by the examples and guides on this site

Plan your next pilot

Worked examples and decision guides you can run in reviews and pilots — an evidence base from The Ops Toolbox.

Email Ned, or use the contact form on The Ops Toolbox.

One workflow, clear metrics
Your cloud, your keys
Written handoff, not dependency
