Target outcomes
- Block severity ≥ 4 pending human review
- False positive review sampled weekly in pilot
Initiative playbook
Typical delivery arc for this pattern in enterprise programs.
- 1. Discovery (2 to 4 wks)
Align with legal on category thresholds for customer-facing vs internal assistants.
- 2. Pilot (6 to 8 wks)
Block-and-review queue for severity ≥ 4; sample 200 conversations/week for false positives.
- 3. Scale (ongoing)
Integrate safety scores into CI/CD prompt tests and production observability dashboards.
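The weekly false-positive review in the pilot step can be sketched as a uniform random draw followed by a simple rate calculation. This is a minimal illustration, not the demo's code: `sampleForReview` and `falsePositiveRate` are hypothetical helper names, and the random source is injectable only so the draw can be reproduced in tests.

```typescript
// Draw up to `size` items uniformly at random using a partial Fisher-Yates shuffle.
// `random` defaults to Math.random; pass a seeded generator for reproducible samples.
function sampleForReview<T>(
  items: readonly T[],
  size: number,
  random: () => number = Math.random
): T[] {
  const pool = [...items];
  const count = Math.min(size, pool.length);
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i] as T;
    pool[i] = pool[j] as T;
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}

// Share of reviewed conversations that a human reviewer marked as wrongly blocked.
function falsePositiveRate(wronglyBlocked: readonly boolean[]): number {
  if (wronglyBlocked.length === 0) return 0;
  return wronglyBlocked.filter((flag) => flag).length / wronglyBlocked.length;
}
```

For example, `sampleForReview(blockedThisWeek, 200)` would produce the weekly review batch, and the reviewers' verdicts feed `falsePositiveRate` to track whether thresholds need recalibrating.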
Business use case
Problem
Customer-facing assistants must pass risk review before launch. Legal expects category-level scores (hate, violence, self-harm, sexual), not a single opaque “blocked” flag.
Who benefits
- Risk & compliance: measurable thresholds per channel
- Customer experience: professional drafts within a safety envelope
- Engineering: repeatable pre/post checks in the API path
Success metrics
- Block or queue severity ≥ 4 for human review
- Weekly sample review of false positives during pilot
- No production launch without dual input/output analysis
Solution
Run Azure AI Content Safety on the user prompt, generate with Azure OpenAI, then analyse the model output again. The UI shows per-category severity so reviewers can calibrate thresholds.
Technical implementation
Stack
- Azure OpenAI for generation
- Content Safety REST API (text:analyze) when endpoint/key configured
- Heuristic fallback for local demo without Safety resources
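A rough sketch of how a text:analyze call with a local fallback might be wired is shown below. It is not the contents of lib/demos/content-safety.ts. The `api-version` value, the `SafetyConfig` shape, the helper names, and the keyword list are all assumptions made for illustration; the keyword heuristic in particular is a placeholder, not a real classifier.

```typescript
type Category = "Hate" | "SelfHarm" | "Sexual" | "Violence";
type Scores = Record<Category, number>;

// Hypothetical config shape; both values would come from environment variables.
interface SafetyConfig {
  endpoint?: string;
  key?: string;
}

interface AnalyzeResponse {
  categoriesAnalysis?: { category: string; severity?: number }[];
}

const CATEGORIES: Category[] = ["Hate", "SelfHarm", "Sexual", "Violence"];

// Build the REST request. The api-version is an assumption; use the one your resource supports.
function buildAnalyzeRequest(endpoint: string, key: string, text: string) {
  const base = endpoint.replace(/\/+$/, "");
  return {
    url: `${base}/contentsafety/text:analyze?api-version=2023-10-01`,
    init: {
      method: "POST",
      headers: { "Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json" },
      body: JSON.stringify({ text, categories: CATEGORIES, outputType: "FourSeverityLevels" }),
    },
  };
}

// Map the categoriesAnalysis array onto a fixed record, defaulting missing categories to 0.
function parseScores(json: AnalyzeResponse): Scores {
  const scores: Scores = { Hate: 0, SelfHarm: 0, Sexual: 0, Violence: 0 };
  for (const item of json.categoriesAnalysis ?? []) {
    if ((CATEGORIES as string[]).indexOf(item.category) >= 0) {
      scores[item.category as Category] = item.severity ?? 0;
    }
  }
  return scores;
}

// Placeholder keyword lists for local demos only (assumed, deliberately tiny).
const DEMO_KEYWORDS: Record<Category, string[]> = {
  Hate: ["hate you"],
  SelfHarm: ["hurt myself"],
  Sexual: ["explicit"],
  Violence: ["attack", "kill"],
};

// Heuristic fallback: any keyword hit scores that category at 4.
function heuristicScores(text: string): Scores {
  const lower = text.toLowerCase();
  const scores: Scores = { Hate: 0, SelfHarm: 0, Sexual: 0, Violence: 0 };
  for (const category of CATEGORIES) {
    if (DEMO_KEYWORDS[category].some((word) => lower.indexOf(word) >= 0)) {
      scores[category] = 4;
    }
  }
  return scores;
}

// Use the REST API when endpoint and key are configured, otherwise the heuristic.
async function analyzeText(text: string, config: SafetyConfig): Promise<Scores> {
  if (!config.endpoint || !config.key) return heuristicScores(text);
  const { url, init } = buildAnalyzeRequest(config.endpoint, config.key, text);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`Content Safety returned HTTP ${response.status}`);
  return parseScores((await response.json()) as AnalyzeResponse);
}
```

Keeping request building and response parsing as pure functions means they can be unit-tested without a Safety resource, while `analyzeText` stays the single entry point for both the real and fallback paths.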
Architecture
Safety gates sit on both sides of the model call: bad prompts never reach the LLM, and risky outputs never reach the user raw.
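The two-sided gate can be sketched as a small wrapper around the model call. This is a minimal illustration under stated assumptions: `guardedCompletion`, `GateResult`, and the injected `analyze` and `generate` functions are hypothetical names standing in for the Content Safety and Azure OpenAI calls, and the default block level of 4 mirrors the severity threshold used elsewhere on this page.

```typescript
type Scores = Record<string, number>;
type Analyze = (text: string) => Promise<Scores>;
type Generate = (prompt: string) => Promise<string>;

interface GateResult {
  status: "ok" | "blocked_input" | "blocked_output";
  inputScores: Scores;
  outputScores?: Scores;
  completion?: string;
}

// Highest severity across all categories (0 when there are no scores).
function maxSeverity(scores: Scores): number {
  let max = 0;
  for (const key of Object.keys(scores)) {
    max = Math.max(max, scores[key] ?? 0);
  }
  return max;
}

// Score the prompt, call the model only if the prompt passes, then score the output.
async function guardedCompletion(
  prompt: string,
  analyze: Analyze,
  generate: Generate,
  blockAt = 4
): Promise<GateResult> {
  const inputScores = await analyze(prompt);
  if (maxSeverity(inputScores) >= blockAt) {
    // The model is never called for a blocked prompt.
    return { status: "blocked_input", inputScores };
  }

  const completion = await generate(prompt);
  const outputScores = await analyze(completion);
  if (maxSeverity(outputScores) >= blockAt) {
    // The raw completion is withheld; only the scores are returned.
    return { status: "blocked_output", inputScores, outputScores };
  }

  return { status: "ok", inputScores, outputScores, completion };
}
```

Returning both score sets in every outcome lets the UI show per-category severity for the prompt and the completion, even when one of the gates blocks.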
Implementation highlights
- lib/demos/content-safety.ts centralizes REST calls and fallback
- Severity badges in UI use colour + label (not colour alone) for accessibility
- Thresholds should differ between internal copilot vs customer-facing channels
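One way to express per-channel thresholds and colour-plus-label badges is a lookup table and a small mapping function, sketched below. The threshold numbers, colour names, and the `severityBadge` helper are illustrative assumptions; real values would come out of the legal and risk review, not from this example.

```typescript
type Channel = "internal" | "customer";
type Category = "Hate" | "SelfHarm" | "Sexual" | "Violence";

// Illustrative thresholds only: the customer-facing channel is stricter than the internal copilot.
const BLOCK_AT: Record<Channel, Record<Category, number>> = {
  internal: { Hate: 4, SelfHarm: 4, Sexual: 4, Violence: 4 },
  customer: { Hate: 2, SelfHarm: 2, Sexual: 2, Violence: 4 },
};

interface Badge {
  label: string;   // always rendered as text, so meaning never depends on colour alone
  colour: string;  // hypothetical colour token for the UI
  blocked: boolean;
}

// Translate one category score into a badge for the given channel.
function severityBadge(channel: Channel, category: Category, severity: number): Badge {
  let level = "Safe";
  let colour = "green";
  if (severity >= 6) {
    level = "High";
    colour = "red";
  } else if (severity >= 4) {
    level = "Medium";
    colour = "orange";
  } else if (severity >= 2) {
    level = "Low";
    colour = "yellow";
  }
  return {
    label: `${category}: ${level} (${severity})`,
    colour,
    blocked: severity >= BLOCK_AT[channel][category],
  };
}
```

With these sample values, a Sexual score of 2 would be blocked on the customer channel but allowed on the internal one, which is the kind of difference reviewers can calibrate from the per-category display.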
Outcomes and learnings
- Score input and output; output-only misses injection-style prompts
- Pair automation with a human review queue for edge cases
- Document threshold changes per environment (dev/stage/prod)
Where else this applies
Input and output safety gates belong in any customer-facing or brand-sensitive channel, not only public chatbots.
Marketing copy assistants
Scan drafts for violence, hate, or sexual content before publishing workflows continue.
Community forums
Score posts and model replies before they appear to other users.
HR employee chat
Detect harassment or self-harm indicators with escalation to trained responders.
Partner-facing portals
Enforce brand-safe responses when partners ask open-ended product questions.
Using this stack elsewhere
Azure Content Safety integrates with Foundry deployments; scores can drive block, rewrite, or human-review queues in the same region as your OpenAI resource.
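As a minimal sketch of how scores might drive those outcomes, the function below maps the highest category severity to an action. The `Action` names and the cut-offs (2, 4, 6) are assumptions for illustration; each deployment would set its own.

```typescript
type Action = "allow" | "rewrite" | "human_review" | "block";

// Route on the worst category score; cut-offs here are illustrative, not prescribed.
function routeBySeverity(scores: Record<string, number>): Action {
  let max = 0;
  for (const key of Object.keys(scores)) {
    max = Math.max(max, scores[key] ?? 0);
  }
  if (max >= 6) return "block";
  if (max >= 4) return "human_review";
  if (max >= 2) return "rewrite";
  return "allow";
}
```

For example, `routeBySeverity({ Hate: 0, Violence: 4 })` returns `"human_review"` under these sample cut-offs, which matches the block-or-queue rule for severity 4 and above.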
Live demo
The demo is the same code path described above, not a simplified mock UI. Add keys in .env.local when you are ready; the narrative and diagrams stand on their own without them.
Business
See scores on the way in and on the way out. This helps when legal asks “how do we know the model didn’t just freestyle?”
Technical
Content Safety REST on prompt and completion; Azure OpenAI sits in the middle.