AI Labs by The Ops Toolbox
Anti-patterns
What we see fail in the wild, and what to do instead. Use in steering forums when someone proposes one of the symptoms below.
Big-bang platform before one workflow
Symptom
Six-month “AI platform” RFP before any pilot metrics
No shared definition of success; risk and engineering block each other because neither has a reference path to point to.
Better move
One six-week pilot on a single workflow with stop rules.
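A minimal sketch of what written-down stop rules could look like, assuming two hypothetical metrics (answer_accuracy and cost_per_case_usd) and illustrative thresholds. The names and numbers are placeholders, not recommendations; the point is that the rules exist before the pilot starts and that an unmeasured metric counts as a breach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopRule:
    metric: str              # hypothetical metric name
    threshold: float         # illustrative value, agree on the real one up front
    higher_is_better: bool

    def breached(self, value: float) -> bool:
        if self.higher_is_better:
            return value < self.threshold
        return value > self.threshold


# Illustrative rules only; replace with what the steering forum signs off.
STOP_RULES = [
    StopRule("answer_accuracy", 0.85, higher_is_better=True),
    StopRule("cost_per_case_usd", 0.50, higher_is_better=False),
]


def breached_rules(rules, observed):
    """Return the metrics that breached; a metric nobody measured counts as breached."""
    breached = []
    for rule in rules:
        value = observed.get(rule.metric)
        if value is None or rule.breached(value):
            breached.append(rule.metric)
    return breached
```

Run the check at each weekly review; a non-empty result is the trigger to pause or stop, not a prompt to renegotiate the threshold.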
Write tools on day one
Symptom
Agent can create tickets, issue refunds, or send emails immediately
One bad tool call becomes a customer incident; trust collapses faster than demos impress.
Better move
Read-only tools, then human approval gates, then narrow writes.
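One way to sketch the three stages as an authorization check. The tool names, the Stage enum, and the allowlist are all hypothetical; a real agent framework will have its own hook for this, so treat the function as an illustration of the policy rather than an API.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    READ_ONLY = 1        # no write tool may run
    APPROVAL_GATED = 2   # every write needs a named human approver
    NARROW_WRITE = 3     # allowlisted writes run unattended, the rest stay gated


# Hypothetical registry: tool name -> whether the tool writes to a system.
TOOLS = {"lookup_order": False, "create_ticket": True, "issue_refund": True}

# Hypothetical allowlist of writes trusted enough to run without approval.
NARROW_WRITE_ALLOWLIST = {"create_ticket"}


def authorize(tool: str, stage: Stage, approved_by: Optional[str] = None) -> bool:
    """Return True if the tool call may run at this rollout stage."""
    if tool not in TOOLS:
        return False                      # unknown tools are denied by default
    if not TOOLS[tool]:
        return True                       # read-only tools are always allowed
    if stage is Stage.READ_ONLY:
        return False
    if stage is Stage.APPROVAL_GATED:
        return approved_by is not None
    return tool in NARROW_WRITE_ALLOWLIST or approved_by is not None
```

For example, authorize("issue_refund", Stage.NARROW_WRITE) is still False without an approver, because only create_ticket is on the allowlist in this sketch.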
RAG without citations
Symptom
Chatbot answers policy questions fluently with no sources
Legal and compliance cannot defend answers in audit; hallucinations look authoritative.
Better move
Cite-only prompts, source metadata in UI, unknown-answer when retrieval is empty.
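A rough sketch of the cite-only pattern with an explicit unknown answer. It assumes retrieval has already happened and that generate is whatever function calls your model; Passage, UNKNOWN, and the prompt wording are hypothetical and would need tuning for a real deployment.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Passage:
    doc_id: str
    title: str
    text: str


UNKNOWN = "I don't know: no matching policy source was found."


def build_prompt(question: str, passages: list) -> Optional[str]:
    """Build a cite-only prompt, or return None when retrieval came back empty."""
    if not passages:
        return None
    sources = "\n".join(
        f"[{i}] {p.title} ({p.doc_id}): {p.text}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite a source number after every claim. "
        "If the sources do not answer the question, say you do not know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def answer(question: str, passages: list, generate: Callable[[str], str]) -> dict:
    """Return answer text plus source metadata for the UI to display."""
    prompt = build_prompt(question, passages)
    if prompt is None:
        # Never call the model with nothing to cite.
        return {"text": UNKNOWN, "sources": []}
    return {
        "text": generate(prompt),
        "sources": [{"doc_id": p.doc_id, "title": p.title} for p in passages],
    }
```

Skipping the model call when retrieval is empty is the design choice that matters here: the model never gets the chance to answer from memory.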
Prompt folklore instead of evals
Symptom
“We changed the system prompt in prod on Friday”
No regression detection; quality drifts silently across teams.
Better move
Golden-set rubric evals before merge; version prompts like code.
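A minimal sketch of a golden-set gate, assuming a very simple rubric (phrases that must and must not appear) and a model_fn callable that wraps the prompt version under test. The cases and function names are hypothetical; real rubrics are usually richer, but the merge rule is the same shape.

```python
# Hypothetical golden set; in practice this lives in version control next to the prompt.
GOLDEN_SET = [
    {
        "input": "Can I get a refund after 60 days?",
        "must_include": ["30 days"],
        "must_not_include": ["guarantee"],
    },
    {
        "input": "Who approves travel?",
        "must_include": ["manager"],
        "must_not_include": [],
    },
]


def passes(output: str, case: dict) -> bool:
    """Apply the rubric: every required phrase present, no forbidden phrase present."""
    text = output.lower()
    has_required = all(s.lower() in text for s in case["must_include"])
    has_forbidden = any(s.lower() in text for s in case["must_not_include"])
    return has_required and not has_forbidden


def pass_rate(model_fn, cases) -> float:
    """Fraction of golden cases the candidate prompt passes."""
    return sum(passes(model_fn(c["input"]), c) for c in cases) / len(cases)


def may_merge(candidate: float, baseline: float, max_drop: float = 0.0) -> bool:
    """Block the merge if the candidate regresses beyond the allowed drop."""
    return candidate >= baseline - max_drop
```

Wired into CI, may_merge turns a Friday prompt change into a reviewed change with a recorded pass rate.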
Ignoring Copilot and M365
Symptom
Custom app rebuilds summarise-my-email, which Copilot already covers
Duplicated spend and user confusion; procurement asks why Copilot exists.
Better move
Document coexistence; custom apps own system-of-record workflows.
No observability
Symptom
Only metric is “users tried the chat”
Cannot debug incidents, tune cost, or prove value to finance.
Better move
Log model, latency, token counts, retrieval results, and safety outcomes per session.
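As a sketch, the per-session log could be one structured record per model call, rolled up by session. The field names below are assumptions chosen to match the list above, not a standard schema; map them onto whatever your logging stack already uses.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CallRecord:
    session_id: str
    model: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    retrieved_doc_ids: list   # which sources retrieval returned
    safety_flags: list        # names of any safety checks that fired


def to_log_line(record: CallRecord) -> str:
    """Serialize one call as a JSON line for the log pipeline."""
    return json.dumps(asdict(record), sort_keys=True)


def summarize_session(records: list) -> dict:
    """Roll up the calls of one session for cost and incident review."""
    return {
        "calls": len(records),
        "total_tokens": sum(r.prompt_tokens + r.completion_tokens for r in records),
        "max_latency_ms": max((r.latency_ms for r in records), default=0.0),
        "safety_flagged_calls": sum(1 for r in records if r.safety_flags),
    }
```

Token totals feed the finance conversation, retrieved document IDs and safety flags feed incident debugging.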
Fine-tune first
Symptom
Week one proposal: fine-tune on last year’s tickets
Expensive refresh cycle; knowledge stale on next policy change.
Better move
RAG + schema extraction; fine-tune only with clear evidence that retrieval and prompting fall short.
Next step
Talk about your next pilot
Patterns, metrics, and runnable demos for architecture reviews and pilots, from The Ops Toolbox.
- One workflow, clear metrics
- Your cloud, your keys
- Written handoff, not dependency