AI Labs

AI Labs by The Ops Toolbox

Anti-patterns

What we see fail in the wild, and what to do instead. Use this page in steering forums when someone proposes one of these anti-patterns.

Big-bang platform before one workflow

Symptom

Six-month “AI platform” RFP before any pilot metrics

No shared definition of success; risk and engineering block each other without a reference path.

Better move

One six-week pilot on a single workflow with stop rules.
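One way to make stop rules concrete is to write them down as data that the steering forum checks every week. This is a minimal sketch, assuming the pilot reports weekly metrics as a dictionary. The metric names (`accuracy`, `escalation_rate`, `cost_per_case`) and thresholds are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopRule:
    """A threshold that pauses or ends the pilot when breached."""
    metric: str
    threshold: float
    higher_is_better: bool

    def breached(self, value: float) -> bool:
        # "Higher is better" metrics breach when they fall below the threshold;
        # "lower is better" metrics breach when they rise above it.
        if self.higher_is_better:
            return value < self.threshold
        return value > self.threshold


# Hypothetical metrics and placeholder thresholds; agree real values before week one.
RULES = (
    StopRule("accuracy", 0.85, higher_is_better=True),
    StopRule("escalation_rate", 0.20, higher_is_better=False),
    StopRule("cost_per_case", 1.50, higher_is_better=False),
)


def check_week(metrics: dict[str, float], rules: tuple[StopRule, ...] = RULES) -> list[str]:
    """Return the names of breached rules; a metric nobody measured counts as breached."""
    breached = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None or rule.breached(value):
            breached.append(rule.metric)
    return breached


print(check_week({"accuracy": 0.91, "escalation_rate": 0.25, "cost_per_case": 1.10}))
# ['escalation_rate']
```

Counting an unmeasured metric as a breach is a deliberate choice. If the instrumentation is missing, the pilot cannot keep running on impressions alone.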

Write tools on day one

Symptom

Agent can create tickets, issue refunds, or send emails from day one

One bad tool call becomes a customer incident; trust collapses faster than demos impress.

Better move

Read-only tools, then human approval gates, then narrow writes.
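This progression can be enforced in code rather than left to the prompt. The following is a minimal sketch, assuming every agent tool call passes through a single authorization check. The tool names, the allow-list, and the `approve` callback (which stands in for whatever human-review step you use) are hypothetical and do not belong to any agent framework's API.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    READ_ONLY = 1       # agent may only call read tools
    APPROVAL_GATED = 2  # write tools run only after a human approves
    NARROW_WRITE = 3    # allow-listed writes run without approval


# Hypothetical tool registry: name -> True if the tool writes to a system.
TOOLS = {
    "search_kb": False,
    "get_order": False,
    "create_ticket": True,
    "issue_refund": True,
}

# Hypothetical writes allowed without approval once the pilot reaches NARROW_WRITE.
NARROW_ALLOWLIST = {"create_ticket"}


def authorize(tool: str, stage: Stage, approve: Callable[[str], bool]) -> bool:
    """Decide whether the agent may execute `tool` at the current rollout stage."""
    if tool not in TOOLS:
        return False  # unknown tools are denied at every stage
    if not TOOLS[tool]:
        return True   # read tools are allowed at every stage
    if stage is Stage.READ_ONLY:
        return False  # no writes at all in the first stage
    if stage is Stage.NARROW_WRITE and tool in NARROW_ALLOWLIST:
        return True
    return approve(tool)  # every other write needs an explicit human yes
```

A write tool that is not on the narrow allow-list still falls back to human approval, even in the final stage.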

RAG without citations

Symptom

Chatbot answers policy questions fluently with no sources

Legal and compliance cannot defend answers in audit; hallucinations look authoritative.

Better move

Cite-only prompts, source metadata shown in the UI, and an explicit “I don’t know” answer when retrieval returns nothing.
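The three parts of this move can be sketched together. The code refuses before calling the model when retrieval is empty, asks for inline citations, and rejects answers whose citations do not match what was retrieved. It assumes a retriever that returns scored passages and a `generate` callable that wraps your model. `Passage`, `min_score`, and the `[source_id]` citation format are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable

UNKNOWN = "I don't know. No approved source covers this question."


@dataclass(frozen=True)
class Passage:
    source_id: str  # document ID, also shown in the UI as source metadata
    title: str
    text: str
    score: float    # retrieval relevance score


def build_prompt(question: str, passages: list[Passage]) -> str:
    context = "\n\n".join(f"[{p.source_id}] {p.title}\n{p.text}" for p in passages)
    return (
        "Answer ONLY from the sources below. Cite every claim as [source_id]. "
        "If the sources do not answer the question, reply exactly: UNKNOWN.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


def answer(
    question: str,
    passages: list[Passage],
    generate: Callable[[str], str],
    min_score: float = 0.5,
) -> tuple[str, list[str]]:
    """Return (answer text, cited source IDs), or the fixed unknown answer."""
    relevant = [p for p in passages if p.score >= min_score]
    if not relevant:
        return UNKNOWN, []  # empty retrieval: never let the model guess
    reply = generate(build_prompt(question, relevant))
    cited = set(re.findall(r"\[([^\]]+)\]", reply))
    allowed = {p.source_id for p in relevant}
    if reply.strip() == "UNKNOWN" or not cited or not cited <= allowed:
        return UNKNOWN, []  # uncited or mis-cited answers are rejected
    return reply, sorted(cited)
```

Returning the cited IDs separately gives the UI the source metadata it needs to display next to the answer.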

Prompt folklore instead of evals

Symptom

“We changed the system prompt in prod on Friday”

No regression detection; quality drifts silently across teams.

Better move

Golden-set rubric evals before merge; version prompts like code.
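A golden-set gate can be small. This minimal sketch assumes a hand-curated set of questions, each listing facts a correct answer must contain and phrases it must not contain. The substring rubric is a deliberately crude stand-in for whatever rubric (human or model-graded) you adopt. `generate`, `baseline`, and `tolerance` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    question: str
    must_include: tuple[str, ...]           # facts a correct answer has to mention
    must_not_include: tuple[str, ...] = ()  # e.g. promises the business never makes


def prompt_version(system_prompt: str) -> str:
    """Short content hash so every eval result is tied to an exact prompt text."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]


def score(answer: str, case: GoldenCase) -> float:
    """Crude rubric: share of required facts present; 0.0 if a forbidden phrase appears."""
    text = answer.lower()
    if any(bad.lower() in text for bad in case.must_not_include):
        return 0.0
    if not case.must_include:
        return 1.0
    hits = sum(fact.lower() in text for fact in case.must_include)
    return hits / len(case.must_include)


def eval_gate(
    system_prompt: str,
    cases: tuple[GoldenCase, ...],
    generate: Callable[[str, str], str],
    baseline: float,
    tolerance: float = 0.02,
) -> tuple[bool, float, str]:
    """Return (passes, mean score, prompt version); fail the merge when passes is False."""
    if not cases:
        raise ValueError("golden set is empty")
    scores = [score(generate(system_prompt, c.question), c) for c in cases]
    mean = sum(scores) / len(scores)
    return mean >= baseline - tolerance, mean, prompt_version(system_prompt)
```

In CI, a `False` first value fails the merge. The content hash covers the “version prompts like code” half of the move, because a silent Friday edit produces a new version string.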

Ignoring Copilot and M365

Symptom

Custom app duplicates Copilot’s “summarise my email” feature

Duplicated spend and user confusion; procurement asks why Copilot exists.

Better move

Document how Copilot and custom apps coexist; custom apps own system-of-record workflows.

No observability

Symptom

Only metric is “users tried the chat”

Cannot debug incidents, tune cost, or prove value to finance.

Better move

Log model, latency, token counts, retrieval results, and safety events per session.
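Here is a sketch of one structured record per model call, rolled up per session, using only the Python standard library. The field names and the `session_summary` roll-up are assumptions. Map them onto whatever tracing or logging stack you already run.

```python
import json
import logging
import time
from dataclasses import asdict, dataclass, field

logger = logging.getLogger("ai_pilot")


@dataclass
class TurnRecord:
    session_id: str
    model: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    retrieved_ids: list[str] = field(default_factory=list)  # empty list = retrieval found nothing
    safety_flags: list[str] = field(default_factory=list)   # e.g. filters that fired on this turn
    ts: float = field(default_factory=time.time)


def log_turn(record: TurnRecord) -> str:
    """Emit one JSON line per model call and return it."""
    line = json.dumps(asdict(record), sort_keys=True)
    logger.info(line)
    return line


def session_summary(records: list[TurnRecord]) -> dict[str, float]:
    """Roll up one session for debugging, cost tuning, and value reporting."""
    return {
        "turns": len(records),
        "total_tokens": sum(r.prompt_tokens + r.completion_tokens for r in records),
        "max_latency_ms": max((r.latency_ms for r in records), default=0.0),
        "empty_retrievals": sum(not r.retrieved_ids for r in records),
        "flagged_turns": sum(bool(r.safety_flags) for r in records),
    }
```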

Fine-tune first

Symptom

Week one proposal: fine-tune on last year’s tickets

Expensive refresh cycle; the model’s knowledge goes stale at the next policy change.

Better move

RAG + schema extraction; fine-tune only with clear evidence.
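The schema-extraction half of this move can be sketched as a fixed schema the model must fill, validated in code before anything downstream trusts it. Policy knowledge stays in the retrieved documents, so a policy change means updating the index rather than retraining. The ticket fields, categories, and `generate` callable are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical schema for a support ticket: field name -> expected Python type.
TICKET_SCHEMA = {"category": str, "order_id": str, "refund_requested": bool}
CATEGORIES = {"billing", "delivery", "account", "other"}


@dataclass(frozen=True)
class Ticket:
    category: str
    order_id: str
    refund_requested: bool


def parse_ticket(raw_json: str) -> Ticket:
    """Validate model output against the schema; raise ValueError instead of guessing."""
    data = json.loads(raw_json)  # malformed JSON raises json.JSONDecodeError
    if not isinstance(data, dict) or set(data) != set(TICKET_SCHEMA):
        raise ValueError(f"expected exactly the keys {sorted(TICKET_SCHEMA)}")
    for key, typ in TICKET_SCHEMA.items():
        if not isinstance(data[key], typ):
            raise ValueError(f"{key} must be {typ.__name__}")
    if data["category"] not in CATEGORIES:
        raise ValueError(f"unknown category {data['category']!r}")
    return Ticket(**data)


def extract(ticket_text: str, generate: Callable[[str], str]) -> Ticket:
    """Ask the model to fill the schema, then validate before returning."""
    prompt = (
        "Return only JSON with keys category, order_id, refund_requested. "
        f"category must be one of {sorted(CATEGORIES)}.\n\nTicket:\n{ticket_text}"
    )
    return parse_ticket(generate(prompt))
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, callers handle every malformed output through one exception type instead of silently filling defaults.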


Next step

Talk about your next pilot

Patterns, metrics, and runnable demos for architecture reviews and pilots, from The Ops Toolbox.


  • One workflow, clear metrics
  • Your cloud, your keys
  • Written handoff, not dependency