Responsible AI Gate with Azure Content Safety

Generate responses with Azure OpenAI while scoring prompts and outputs for harmful content categories.

Governance · Enterprise

Target outcomes

  • Block severity ≥ 4 pending human review
  • False positives sampled and reviewed weekly during the pilot

Initiative playbook

Typical delivery arc for this pattern in enterprise programs.

  1. Discovery (2 to 4 wks)

    Align with legal on category thresholds for customer-facing vs internal assistants.

  2. Pilot (6 to 8 wks)

    Block-and-review queue for severity ≥ 4; sample 200 conversations/week for false positives.

  3. Scale (ongoing)

    Integrate safety scores into CI/CD prompt tests and production observability dashboards.
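
One way to wire safety scores into CI prompt tests is a fixture table that records, for each test prompt, the scores from the latest run and the highest severity that prompt is allowed to reach. The sketch below is an assumption, not the project's test harness: `ScoredCase`, `findRegressions`, and the fixture names are hypothetical.

```typescript
type Category = "Hate" | "Violence" | "SelfHarm" | "Sexual";
type Scores = Record<Category, number>;

// Hypothetical fixture shape: scores captured for one test prompt,
// plus the highest severity that prompt is allowed to reach.
interface ScoredCase {
  name: string;
  scores: Scores;
  maxAllowed: number;
}

// Returns one message per case whose worst category exceeds its limit.
// An empty array means the prompt suite passed.
function findRegressions(cases: ScoredCase[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const s = c.scores;
    const worst = Math.max(s.Hate, s.Violence, s.SelfHarm, s.Sexual);
    if (worst > c.maxAllowed) {
      failures.push(`${c.name}: severity ${worst} exceeds limit ${c.maxAllowed}`);
    }
  }
  return failures;
}
```

A CI step could fail the build whenever the returned array is non-empty, and the same messages could feed an observability dashboard.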

Business use case

Problem

Customer-facing assistants must pass risk review before launch. Legal expects category-level scores (hate, violence, self-harm, sexual), not a single opaque “blocked” flag.

Who benefits

  • Risk & compliance: measurable thresholds per channel
  • Customer experience: professional drafts within a safety envelope
  • Engineering: repeatable pre/post checks in the API path

Success metrics

  • Block or queue severity ≥ 4 for human review
  • Weekly sample review of false positives during pilot
  • No production launch without analysis of both input and output

Solution

Run Azure AI Content Safety on the user prompt, generate with Azure OpenAI, then analyse the model output again. The UI shows per-category severity so reviewers can calibrate thresholds.
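
The three steps above can be sketched as a single function with the analyser and generator injected. This is a minimal sketch under stated assumptions, not the demo's code: `gatedReply`, `Analyzer`, `Generator`, and `GateResult` are hypothetical names, and the default threshold of 4 is taken from the target outcomes.

```typescript
type Category = "Hate" | "Violence" | "SelfHarm" | "Sexual";
type Scores = Record<Category, number>;

// Hypothetical function types: any Content Safety client and any
// Azure OpenAI client can be adapted to these shapes.
type Analyzer = (text: string) => Promise<Scores>;
type Generator = (prompt: string) => Promise<string>;

type GateResult =
  | { status: "blocked_input"; inputScores: Scores }
  | { status: "held_output"; inputScores: Scores; outputScores: Scores }
  | { status: "ok"; reply: string; inputScores: Scores; outputScores: Scores };

const worst = (s: Scores): number =>
  Math.max(s.Hate, s.Violence, s.SelfHarm, s.Sexual);

async function gatedReply(
  prompt: string,
  analyze: Analyzer,
  generate: Generator,
  blockAt = 4,
): Promise<GateResult> {
  // Gate 1: score the prompt; a flagged prompt never reaches the model.
  const inputScores = await analyze(prompt);
  if (worst(inputScores) >= blockAt) {
    return { status: "blocked_input", inputScores };
  }

  // Generation happens only after the input gate passes.
  const reply = await generate(prompt);

  // Gate 2: score the completion; a flagged reply is held for review.
  const outputScores = await analyze(reply);
  if (worst(outputScores) >= blockAt) {
    return { status: "held_output", inputScores, outputScores };
  }
  return { status: "ok", reply, inputScores, outputScores };
}
```

Returning the per-category scores in every branch keeps them available for the UI, so reviewers see why a message was blocked rather than only that it was.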

Technical implementation

Stack

  • Azure OpenAI for generation
  • Content Safety REST API (text:analyze) when an endpoint and key are configured
  • Heuristic fallback for local demo without Safety resources
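
The REST call and its fallback could look roughly like the sketch below. The request path, the `Ocp-Apim-Subscription-Key` header, and the `categoriesAnalysis` response field follow the public text:analyze API. The `api-version` value, the function names, and above all the keyword list in `heuristicScores` are assumptions for illustration, since the source does not describe how its fallback scores text.

```typescript
type Category = "Hate" | "Violence" | "SelfHarm" | "Sexual";
type Scores = Record<Category, number>;
const CATEGORIES: Category[] = ["Hate", "Violence", "SelfHarm", "Sexual"];

interface AnalyzeResponse {
  categoriesAnalysis?: { category: string; severity?: number }[];
}

// Builds the text:analyze request. The api-version is an assumption;
// pin whichever version your resource supports.
function buildAnalyzeRequest(endpoint: string, key: string, text: string) {
  const base = endpoint.replace(/\/+$/, "");
  return {
    url: `${base}/contentsafety/text:analyze?api-version=2023-10-01`,
    init: {
      method: "POST",
      headers: {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text,
        categories: CATEGORIES,
        outputType: "FourSeverityLevels", // severities 0, 2, 4, 6
      }),
    },
  };
}

// Maps the response to one severity per category, defaulting to 0.
function parseScores(body: AnalyzeResponse): Scores {
  const scores: Scores = { Hate: 0, Violence: 0, SelfHarm: 0, Sexual: 0 };
  for (const item of body.categoriesAnalysis ?? []) {
    const cat = CATEGORIES.find((c) => c === item.category);
    if (cat) scores[cat] = item.severity ?? 0;
  }
  return scores;
}

// Illustrative fallback only: a keyword match scores 4, anything else 0.
function heuristicScores(text: string): Scores {
  const keywords: Record<Category, string[]> = {
    Hate: ["hate"],
    Violence: ["attack", "kill"],
    SelfHarm: ["hurt myself"],
    Sexual: ["explicit"],
  };
  const lower = text.toLowerCase();
  const scores: Scores = { Hate: 0, Violence: 0, SelfHarm: 0, Sexual: 0 };
  for (const cat of CATEGORIES) {
    if (keywords[cat].some((k) => lower.includes(k))) scores[cat] = 4;
  }
  return scores;
}

// Uses the REST API when configured, otherwise the local heuristic.
async function analyzeText(text: string, endpoint?: string, key?: string): Promise<Scores> {
  if (!endpoint || !key) return heuristicScores(text);
  const { url, init } = buildAnalyzeRequest(endpoint, key, text);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Content Safety returned HTTP ${res.status}`);
  return parseScores((await res.json()) as AnalyzeResponse);
}
```

Keeping request building and response parsing as pure functions means both can be unit-tested without a Safety resource.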

Architecture

Safety gates sit on both sides of the model call: bad prompts never reach the LLM, and risky outputs never reach the user unfiltered.

Implementation highlights

  • lib/demos/content-safety.ts centralizes REST calls and fallback
  • Severity badges in UI use colour + label (not colour alone) for accessibility
  • Thresholds should differ between internal copilot and customer-facing channels
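
Per-channel thresholds and accessible badges might be expressed as below. This is a sketch, not the demo's code: `REVIEW_AT`, `needsReview`, and `severityBadge` are hypothetical names, the customer threshold of 4 comes from the target outcomes, and the more lenient internal value of 6 and the colour names are assumptions.

```typescript
type Channel = "internal" | "customer";

// Assumed values: 4 for customer-facing matches the stated target;
// 6 for the internal copilot is illustrative only.
const REVIEW_AT: Record<Channel, number> = { internal: 6, customer: 4 };

// True when a severity must be blocked or queued for human review.
function needsReview(severity: number, channel: Channel): boolean {
  return severity >= REVIEW_AT[channel];
}

interface Badge {
  label: string;  // text shown next to the colour, so colour is never the only cue
  colour: string; // hypothetical colour token for the UI
}

// Maps the four severity levels (0, 2, 4, 6) to a label and a colour.
function severityBadge(severity: number): Badge {
  if (severity >= 6) return { label: "High", colour: "red" };
  if (severity >= 4) return { label: "Medium", colour: "orange" };
  if (severity >= 2) return { label: "Low", colour: "yellow" };
  return { label: "Safe", colour: "green" };
}
```

With these values, a severity of 4 is queued on the customer channel but allowed on the internal one, which is the kind of difference the threshold review with legal would need to confirm.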

Outcomes and learnings

  • Score both input and output; output-only scoring misses injection-style prompts
  • Pair automation with a human review queue for edge cases
  • Document threshold changes per environment (dev/stage/prod)

Where else this applies

Input and output safety gates belong in any customer-facing or brand-sensitive channel, not only public chatbots.

Marketing copy assistants

Scan drafts for violence, hate, or sexual content before publishing workflows continue.

Community forums

Score posts and model replies before they appear to other users.

HR employee chat

Detect harassment or self-harm indicators with escalation to trained responders.

Partner-facing portals

Enforce brand-safe responses when partners ask open-ended product questions.

Using this stack elsewhere

Azure Content Safety integrates with Foundry deployments; scores can drive block, rewrite, or human-review queues in the same region as your OpenAI resource.

Live demo

The demo is the same code path described above, not a simplified mock UI. Add keys in .env.local when you are ready; the narrative and diagrams stand on their own without them.

Business

See scores on the way in and on the way out. This helps when legal asks “how do we know the model didn’t just freestyle?”

Technical

Content Safety REST on prompt and completion; Azure OpenAI sits in the middle.

Content safety gate

Analyse prompt and model output with Azure Content Safety before display.