Azure OpenAI Chat Baseline

Case studyArchitecture, governance, and how to adapt this pattern in a pilot

Business use case

Problem

Teams jump straight to RAG. Without a baseline, you cannot tell whether retrieval helped, or whether the model already knew the answer (or hallucinated convincingly).

Who benefits

Architecture reviewers, compare baseline vs RAG vs orchestration on the same questions
Compliance, separate “general chat” from “citation-required” channels
Foundry admins, prove OpenAI connectivity before indexing corpora

Success metrics

Baseline hallucination rate measured on policy questions (expect non-zero)
Clear product rule: which channels may use baseline vs governed RAG
Deployment and region documented for audit

Solution

A minimal Azure OpenAI chat route with a system prompt that steers policy questions toward the governed RAG example, showing how to position baseline chat without blending modes.

Technical implementation

Architecture

Simple chat completion, use as a control when measuring RAG lift.

How it runs

Drawing the flow…

Outcomes and learnings

If baseline answers “good enough,” you still may need RAG for citations, not for fluency.

Document which workloads are allowed on baseline vs governed paths
Use the same deployment name in RAG and chat routes for operational simplicity
Add content safety and retrieval before external-facing channels

Delivery playbookDiscovery → pilot → scale

1
Discovery2–4 wks
List channels allowed on baseline vs governed RAG; define hallucination sampling process.
2
Pilot6–8 wks
Run baseline vs RAG on the same 50 questions; quantify citation lift.
3
Scaleongoing
Restrict baseline to low-risk internal tasks; enforce Foundry project boundaries.

Where else this appliesA deliberate non-RAG baseline helps you measure when retrieval is worth the operational cost, and keeps general Q&A channels honest.

Brainstorming workshops

Facilitators use general chat for ideation where citations are not required.

Drafting starting points

Comms teams outline emails or talking points before legal moves them to governed RAG for fact checks.

Developer explainers

Platform teams answer “how does X work?” questions that do not need document grounding.

RAG lift measurement

Run identical policy questions through baseline vs RAG to quantify hallucination reduction.

Use the same Azure OpenAI deployment and monitoring as RAG paths so comparisons are fair; Foundry projects can host both experiences side by side.