Responsible AI Gate with Azure Content Safety

Case study: architecture, governance, and how to adapt this pattern in a pilot

Business use case

Problem

Customer-facing assistants must pass risk review before launch. Legal expects category-level scores (hate, violence, self-harm, sexual), not a single opaque “blocked” flag.

Who benefits

Risk & compliance: measurable thresholds per channel
Customer experience: professional drafts within a safety envelope
Engineering: repeatable pre/post checks in the API path

Success metrics

Block or queue for human review any category with severity ≥ 4
Weekly sample review of false positives during the pilot
No production launch without both input and output analysis

Solution

Run Azure AI Content Safety on the user prompt, generate with Azure OpenAI, then analyse the model output again. The UI shows per-category severity so reviewers can calibrate thresholds.
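The ordering described above can be sketched as a small wrapper around the model call. This is a minimal illustration, not the demo's actual code: gatedReply, Analyze, Generate, and GateResult are hypothetical names, the scorer and generator are injected so the flow can be exercised without Azure resources, and the default threshold of 4 is assumed from the pilot rule.

```typescript
// Sketch only: every name below is hypothetical, not taken from the demo's source.
type Category = "Hate" | "SelfHarm" | "Sexual" | "Violence";
type Scores = Record<Category, number>;

type Analyze = (text: string) => Promise<Scores>;   // e.g. a Content Safety call
type Generate = (prompt: string) => Promise<string>; // e.g. an Azure OpenAI call

interface GateResult {
  status: "ok" | "input_flagged" | "output_flagged";
  reply: string | null;        // null when the gate withholds the text
  inputScores: Scores;         // always returned so reviewers can calibrate
  outputScores: Scores | null; // null when generation was skipped
}

const CATEGORIES: Category[] = ["Hate", "SelfHarm", "Sexual", "Violence"];

// Highest severity across all four categories.
function maxSeverity(scores: Scores): number {
  return Math.max(...CATEGORIES.map((c) => scores[c]));
}

async function gatedReply(
  prompt: string,
  analyze: Analyze,
  generate: Generate,
  threshold = 4, // assumed to mirror the "severity ≥ 4" pilot rule
): Promise<GateResult> {
  // 1. Score the user prompt before spending a model call on it.
  const inputScores = await analyze(prompt);
  if (maxSeverity(inputScores) >= threshold) {
    return { status: "input_flagged", reply: null, inputScores, outputScores: null };
  }

  // 2. Generate a draft, then 3. score the draft as well.
  const draft = await generate(prompt);
  const outputScores = await analyze(draft);
  if (maxSeverity(outputScores) >= threshold) {
    return { status: "output_flagged", reply: null, inputScores, outputScores };
  }

  return { status: "ok", reply: draft, inputScores, outputScores };
}
```

Returning both score sets on every path, including the flagged ones, is what lets a review UI show per-category severity rather than a single blocked flag.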

Technical implementation

Stack

Azure OpenAI for generation
Content Safety REST API (text:analyze) when an endpoint and key are configured
Heuristic fallback for local demo without Safety resources
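A rough sketch of how the REST call and the local fallback might sit behind one function. The request path, api-version 2023-10-01, the Ocp-Apim-Subscription-Key header, and the categoriesAnalysis response field reflect the public text:analyze API as generally documented and should be checked against the current reference. SafetyConfig, FetchLike, analyzeText, and the keyword list are assumptions made for this illustration, not the contents of the demo's own module.

```typescript
// Sketch only: config shape, function names, and keywords are illustrative assumptions.
type Category = "Hate" | "SelfHarm" | "Sexual" | "Violence";
type Scores = Record<Category, number>;

interface SafetyConfig { endpoint?: string; key?: string }

// Minimal fetch-like shape so the sketch can run without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const CATEGORIES: Category[] = ["Hate", "SelfHarm", "Sexual", "Violence"];

// Deliberately crude keyword stand-in for local demos; not a real classifier.
function heuristicScores(text: string): Scores {
  const lower = text.toLowerCase();
  const hit = (words: string[]): number =>
    words.some((w) => lower.indexOf(w) !== -1) ? 4 : 0;
  return {
    Hate: 0,
    SelfHarm: hit(["hurt myself"]),
    Sexual: 0,
    Violence: hit(["kill", "attack"]),
  };
}

async function analyzeText(
  text: string,
  cfg: SafetyConfig,
  fetchImpl: FetchLike,
): Promise<Scores> {
  // No Safety resource configured: take the local demo path.
  if (!cfg.endpoint || !cfg.key) return heuristicScores(text);

  const base = cfg.endpoint.replace(/\/+$/, ""); // drop trailing slashes
  const url = `${base}/contentsafety/text:analyze?api-version=2023-10-01`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": cfg.key,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, categories: CATEGORIES }),
  });
  if (!res.ok) throw new Error(`Content Safety request failed: ${res.status}`);

  const data = (await res.json()) as {
    categoriesAnalysis?: { category: Category; severity: number }[];
  };
  // Start from zero so a category missing from the response reads as 0.
  const scores: Scores = { Hate: 0, SelfHarm: 0, Sexual: 0, Violence: 0 };
  for (const item of data.categoriesAnalysis ?? []) {
    scores[item.category] = item.severity;
  }
  return scores;
}
```

On Node 18 or later, the global fetch can be passed as fetchImpl. Failing loudly on a non-2xx response, rather than silently falling back to the heuristic, keeps a misconfigured production deployment from quietly running on keyword matching.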

Architecture

Score both the input and the output around the model call; the hosted demo returns all scores for review.

Implementation highlights

lib/demos/content-safety.ts centralizes REST calls and fallback
Severity badges in UI use colour + label (not colour alone) for accessibility
Thresholds should differ between internal copilot vs customer-facing channels
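The last two points can be sketched together: a per-channel threshold table and a badge helper whose text label carries the meaning without relying on colour. The channel names, threshold values, and colour names are assumptions for illustration; real thresholds would come out of the legal review. The safe/low/medium/high wording assumes the service's default four-level output, where severities are 0, 2, 4, or 6.

```typescript
// Sketch only: channels, threshold values, and colours are illustrative assumptions.
type Category = "Hate" | "SelfHarm" | "Sexual" | "Violence";
type Scores = Record<Category, number>;
type Channel = "internal-copilot" | "customer-facing";

const CATEGORIES: Category[] = ["Hate", "SelfHarm", "Sexual", "Violence"];

// A lower threshold is stricter. The customer-facing channel is the tighter one here.
const THRESHOLDS: Record<Channel, Scores> = {
  "internal-copilot": { Hate: 6, SelfHarm: 4, Sexual: 6, Violence: 6 },
  "customer-facing": { Hate: 4, SelfHarm: 4, Sexual: 4, Violence: 4 },
};

// Categories whose severity meets or exceeds the channel's threshold.
function flaggedCategories(scores: Scores, channel: Channel): Category[] {
  return CATEGORIES.filter((c) => scores[c] >= THRESHOLDS[channel][c]);
}

// "review" means block the text and queue it for a human; "allow" lets it through.
function verdict(scores: Scores, channel: Channel): "allow" | "review" {
  return flaggedCategories(scores, channel).length > 0 ? "review" : "allow";
}

// The label alone is enough to read the badge; colour is a secondary cue.
function badge(category: Category, severity: number): { label: string; colour: string } {
  if (severity >= 6) return { label: `${category}: high (${severity})`, colour: "red" };
  if (severity >= 4) return { label: `${category}: medium (${severity})`, colour: "orange" };
  if (severity >= 2) return { label: `${category}: low (${severity})`, colour: "yellow" };
  return { label: `${category}: safe (${severity})`, colour: "green" };
}
```

With these example values, a draft scoring Violence 4 is queued for review on the customer-facing channel but allowed in the internal copilot, which is the kind of difference the threshold table is meant to make explicit.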

Outcomes and learnings

Score both input and output; output-only checks miss injection-style prompts
Pair automation with a human review queue for edge cases
Document threshold changes per environment (dev/stage/prod)

Delivery playbook: Discovery → pilot → scale

1. Discovery (2–4 wks)
Align with legal on category thresholds for customer-facing vs internal assistants.
2. Pilot (6–8 wks)
Block-and-review queue for severity ≥ 4; sample 200 conversations/week for false positives.
3. Scale (ongoing)
Integrate safety scores into CI/CD prompt tests and production observability dashboards.
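For the scale step, one way a CI prompt test could look is a fixture suite that runs known prompts through the scorer and reports any whose flagged/not-flagged outcome differs from what was expected. The fixture shape and the names PromptCase, Scorer, and runPromptSuite are hypothetical, and the threshold of 4 is assumed from the pilot rule.

```typescript
// Sketch only: fixture format and function names are assumptions.
type Scores = Record<string, number>;
type Scorer = (text: string) => Promise<Scores>;

interface PromptCase {
  name: string;          // identifier shown in the CI log
  prompt: string;        // text sent to the scorer
  expectFlagged: boolean; // whether any category should reach the threshold
}

// Runs every fixture through the scorer and returns the names of mismatching cases.
async function runPromptSuite(
  cases: PromptCase[],
  score: Scorer,
  threshold = 4,
): Promise<string[]> {
  const failures: string[] = [];
  for (const c of cases) {
    const scores = await score(c.prompt);
    const flagged = Object.keys(scores).some((k) => scores[k] >= threshold);
    if (flagged !== c.expectFlagged) failures.push(c.name);
  }
  return failures;
}
```

A CI job would fail the build when the returned list is non-empty. The same per-case scores could also be emitted as metrics for the observability dashboards.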

Where else this applies

Input and output safety gates belong in any customer-facing or brand-sensitive channel, not only public chatbots.

Marketing copy assistants

Scan drafts for violence, hate, or sexual content before publishing workflows continue.

Community forums

Score posts and model replies before they appear to other users.

HR employee chat

Detect harassment or self-harm indicators with escalation to trained responders.

Partner-facing portals

Enforce brand-safe responses when partners ask open-ended product questions.

Azure Content Safety integrates with Foundry deployments; scores can drive block, rewrite, or human-review queues in the same region as your OpenAI resource.