For business sponsors
Executive summary
Skim this before the full guide. Technical detail follows in the sections below.
- Decision: Whether InfoSec has enough artefacts to approve production.
- Primary metric: All controls in the matrix have a linked diagram, config, or log sample.
- Stop rule: Do not launch until every open high-severity finding has an owner and a target date.
Related worked example
Responsible AI Gate with Azure Content Safety
Controls vs evidence
InfoSec (OWASP LLM Top 10) reviews ask for proof that controls exist, not assertions that the team is responsible.
The AI security controls guide (controls guide) defines what to implement. This guide packages what to submit so reviewers can trace control to artefact quickly.
Treat the pack as a living folder that is updated with each release, not a one-off PowerPoint prepared before go-live.
Sponsors should know which artefacts gate production promotion in the steering committee (AI council guide).
- Control statement in one sentence
- Owner role and escalation contact
- Artefact file name and last updated date
- Link to demo or log sample
- Exception register if control is partial
Questions you should expect
Reviewers will ask about data flows, subprocessors, authentication, logging, prompt injection (OpenAI mitigations), tool abuse, model change management, and incident response.
Prepare short written answers with diagrams attached. Verbal assurances do not survive turnover in security teams.
Procurement (production readiness conversation) may join with contract questions on training use and retention. Align answers with the executed DPA (Microsoft Copilot data protection).
Schedule a dry run with your internal security partner before external review to catch missing artefacts early. Use the evidence pack checklist as a script.
- Where do prompts and outputs persist?
- Who can access logs and indexes?
- How are API keys (OWASP LLM Top 10) stored and rotated?
- What happens if retrieval returns nothing?
- How are models and prompts versioned?
- What is the disable-tools procedure?
Architecture artefacts
Provide a data-flow diagram (Microsoft Copilot data protection), network diagram where applicable, IAM matrix (OWASP LLM Top 10), and list of third-party APIs.
Link each control on the diagram to an OWASP LLM control or redacted log sample from your pilot environment.
Show trust boundaries between user browser, application, model API, search index, and approval queue (OpenAI safety best practices).
Version diagrams when you add tools or change regions. Reviewers compare submissions across releases.
- Context diagram with actors and systems
- Sequence diagram for propose-approve-execute
- IAM matrix (OWASP LLM Top 10): role, scope, data access
- Subprocessor and region table
- Environment separation: dev, test, prod
Identity and access evidence
Demonstrate SSO with Entra ID (conditional access) or your corporate IdP, least-privilege app roles, and no shared admin keys in application code.
Export a redacted sample of authentication logs showing user identity on each model call where applicable.
Document break-glass access for engineers and how it is time-boxed and reviewed.
Search indexes must respect the same entitlements as source systems per RAG ACL guidance. Include a test case in the pack.
- SSO (Entra conditional access) configuration screenshot or runbook excerpt
- Role matrix mapped to business functions
- Sample JWT claims in logs, redacted
- Key rotation calendar and last run date
- Access review ticket closed this quarter
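The entitlement test case mentioned above could look something like the following. This is a minimal sketch, not a real search client: the in-memory index, the `IndexedDoc` type, the `search` function, and the group names are all hypothetical stand-ins for your own index and IdP groups. The point it illustrates is that the ACL filter is applied at query time and that the pack contains a test proving it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    # Hypothetical field: groups copied from the source system's ACL at index time.
    allowed_groups: frozenset


def search(index: list, query: str, user_groups: set) -> list:
    """Return IDs of documents that match the query AND that the user may read."""
    q = query.lower()
    return [
        d.doc_id
        for d in index
        if q in d.text.lower() and d.allowed_groups & user_groups
    ]


# Illustrative data only.
INDEX = [
    IndexedDoc("hr-001", "Salary bands for 2024", frozenset({"hr"})),
    IndexedDoc("kb-007", "Salary payment dates", frozenset({"hr", "all-staff"})),
]


def test_index_respects_source_entitlements() -> None:
    # A general staff member must not see the HR-only document.
    assert search(INDEX, "salary", {"all-staff"}) == ["kb-007"]
    # An HR user sees both.
    assert search(INDEX, "salary", {"hr"}) == ["hr-001", "kb-007"]
    # A user with no groups sees nothing.
    assert search(INDEX, "salary", set()) == []
```

A test like this can run under pytest in CI, and its latest passing run is a reasonable artefact to link from the identity row of the matrix.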
Data classification and retention
State which data classes may enter prompts and which are prohibited, with examples HR and legal agree on.
Attach retention schedules for prompts, outputs, logs, embeddings (OpenAI embeddings guide), and approval records.
Describe deletion and export procedures for subject rights requests.
If you redact before the model call, show before-and-after samples in the pack with fields removed; reference Azure Content Safety.
- Data classification (Microsoft Copilot data protection) policy excerpt
- Retention table per store
- Redaction rules for PII and secrets
- Index rebuild process when sources delete content
- Backup and restore test date
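To produce the before-and-after redaction samples, a small script along these lines may help. It is a sketch under stated assumptions: the two regular expressions (an email pattern and an `sk-` style secret pattern) and the `redact` helper are illustrative only and are not a complete PII or secret detector; a production setup would more likely rely on a dedicated service or library.

```python
import re

# Hypothetical rules: label -> pattern. Real rule sets will be broader.
RULES = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SECRET", re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")),
]


def redact(text: str) -> tuple:
    """Replace each match with its label and count replacements per rule."""
    counts = {}
    for label, pattern in RULES:
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


before = "Contact jane.doe@example.com, key sk-abcdefghijklmnop1234"
after, counts = redact(before)
print("before:", before)
print("after: ", after)
print("counts:", counts)
```

Keeping the per-rule counts alongside the sample lets reviewers see which rule fired without needing access to the unredacted text.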
Safety and abuse
Document input and output filtering, rate limits, and tool allow lists.
Run a lightweight penetration test (OWASP LLM Top 10) on tool endpoints, not only the chat user interface.
Include prompt injection (OpenAI mitigations) test results and what changed after remediation.
Show human approval samples for write actions, with audit trail exports, in line with OpenAI safety best practices.
- Content filter configuration export
- Rate limit and abuse alert thresholds
- Tool allow list file in version control
- Red-team summary with severity counts
- HITL (OpenAI safety best practices) approval screenshot and log extract
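One way to make the tool allow list and the human approval step reviewable is to express both as a small, testable function. The sketch below assumes a hypothetical allow list format (tool name mapped to a `writes` flag) and hypothetical tool names; it is not tied to any particular agent framework.

```python
import json
from typing import Optional

# Hypothetical allow list, as it might be stored in version control.
ALLOW_LIST_JSON = """
{
  "search_kb": {"writes": false},
  "create_ticket": {"writes": true}
}
"""
ALLOW_LIST = json.loads(ALLOW_LIST_JSON)


def authorise_tool_call(tool: str, approved_by: Optional[str] = None) -> str:
    """Return 'execute', 'needs_approval', or 'deny' for a proposed tool call."""
    entry = ALLOW_LIST.get(tool)
    if entry is None:
        # Anything not on the allow list is rejected outright.
        return "deny"
    if entry["writes"] and not approved_by:
        # Write actions wait for a named human approver.
        return "needs_approval"
    return "execute"


print(authorise_tool_call("search_kb"))
print(authorise_tool_call("create_ticket"))
print(authorise_tool_call("create_ticket", approved_by="j.smith"))
print(authorise_tool_call("delete_mailbox"))
```

Logging the returned decision together with the approver identity gives you the audit extract that the last bullet asks for.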
Control-to-artefact matrix
Attach one primary artefact per control so reviewers do not chase scattered Confluence pages.
Use a spreadsheet or table with columns for control ID, owner, artefact link, and test date.
Mark partial controls honestly with compensating measures and target dates.
Steering committees promote pilots when critical controls are green or the residual risk is formally accepted with a named owner, per the NIST AI RMF Govern function.
- Identity: SSO (Entra conditional access) config, role matrix, sample auth logs
- Data: classification policy, index ACL (Azure RAG concepts) diagram, retention schedule (Microsoft Copilot data protection)
- Injection and tools: red-team summary, allow list export, HITL (OpenAI safety best practices) audit sample
- Runtime: dashboard link, rate-limit config, incident runbook drill date
- Change: CI eval report, model version register, rollback record
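If the matrix lives in a spreadsheet, a short script can flag gaps before reviewers do. The sketch below mirrors the columns described above (control ID, owner, artefact link, test date) and adds a status and target date for partial controls. The `ControlRow` type, the sample rows, and the 90-day freshness window are assumptions for illustration, not requirements from any standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ControlRow:
    control_id: str
    owner: str
    artefact_link: str
    test_date: Optional[date]
    status: str = "green"  # assumed values: "green", "partial", "risk-accepted"
    target_date: Optional[date] = None


def find_gaps(rows: list, today: date, max_age_days: int = 90) -> list:
    """Return (control_id, problem) pairs for rows that are not review-ready."""
    gaps = []
    for r in rows:
        if not r.owner.strip():
            gaps.append((r.control_id, "no owner"))
        if not r.artefact_link.strip():
            gaps.append((r.control_id, "no artefact link"))
        if r.test_date is None:
            gaps.append((r.control_id, "never tested"))
        elif (today - r.test_date).days > max_age_days:
            gaps.append((r.control_id, f"test older than {max_age_days} days"))
        if r.status == "partial" and r.target_date is None:
            gaps.append((r.control_id, "partial control without target date"))
    return gaps


# Illustrative rows only.
rows = [
    ControlRow("ID-01", "IAM lead", "identity/sso-config.pdf", date(2025, 1, 10)),
    ControlRow("DATA-02", "Data steward", "", date(2024, 9, 1), status="partial"),
]
for control_id, problem in find_gaps(rows, today=date(2025, 1, 31)):
    print(control_id, "-", problem)
```

An empty result corresponds to the primary metric in the executive summary: every control has an owner, a linked artefact, and a recent test.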
Logging, monitoring, and incident response
Show which events are logged: prompt hash, model version, retrieval IDs, tool proposals, approvals, and errors.
Connect logs to your SIEM or observability (AI SDK telemetry) platform with retention matching policy.
Include an incident runbook for model outage, data leak suspicion, and unsafe tool execution.
Record the date of the last tabletop exercise in the pack cover sheet.
- Log field dictionary with sensitivity labels
- Sample dashboard for latency, errors, spend
- Alert routes to on-call and security
- Incident severity definitions for AI
- Post-incident template with root cause fields
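A log field dictionary is easier to review next to a concrete record. The following sketch builds one structured event containing the fields listed above; the field names and the `build_log_event` helper are hypothetical, and the only technique relied on is a standard SHA-256 digest so that the raw prompt never enters the log.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def build_log_event(
    prompt: str,
    model_version: str,
    retrieval_ids: list,
    tool_proposals: list,
    approved_by: Optional[str] = None,
    error: Optional[str] = None,
) -> dict:
    """Build one log record; the prompt is stored as a hash, not as text."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model_version": model_version,
        "retrieval_ids": retrieval_ids,
        "tool_proposals": tool_proposals,
        "approved_by": approved_by,
        "error": error,
    }


event = build_log_event(
    prompt="Summarise ticket 123",      # illustrative values only
    model_version="example-model-v1",
    retrieval_ids=["kb-007"],
    tool_proposals=["create_ticket"],
)
print(json.dumps(event, indent=2))
```

Note that an unsalted hash of a short or predictable prompt can be guessed by brute force, so whether a plain digest is sufficient depends on your data classification.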
Change management and evals
Prompts and indexes are code. Show version control, peer review, and CI eval gates before promotion.
Attach the latest golden set (OpenAI evals) eval report with pass/fail thresholds defined upfront.
Document how you roll back prompt or index changes within one business day.
Model version changes should trigger regression evals (OpenAI evals) even when prompts are unchanged.
- Git tag or release note per production deploy
- Eval threshold document signed by sponsor
- Rollback runbook tested this quarter
- Model register with approval dates
- Diff of prompt changes in last release
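A CI eval gate with thresholds defined upfront can be as small as the sketch below. The metric names, threshold values, and the `eval_gate` and `main` helpers are hypothetical; in practice the thresholds would come from the document signed by the sponsor and the scores from your own eval report.

```python
# Hypothetical thresholds, agreed before the eval run rather than after it.
THRESHOLDS = {"groundedness": 0.90, "refusal_accuracy": 0.95}


def eval_gate(results: dict, thresholds: dict) -> list:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    for metric, minimum in thresholds.items():
        score = results.get(metric)
        if score is None:
            # A metric missing from the report is treated as a failure.
            failures.append(f"{metric}: missing from report")
        elif score < minimum:
            failures.append(f"{metric}: {score:.2f} < {minimum:.2f}")
    return failures


def main(results: dict) -> int:
    """Return a process exit code; a CI job would pass it to sys.exit()."""
    failures = eval_gate(results, THRESHOLDS)
    for line in failures:
        print("FAIL", line)
    return 1 if failures else 0


# Illustrative report for a candidate release.
exit_code = main({"groundedness": 0.93, "refusal_accuracy": 0.96})
print("exit code:", exit_code)
```

Running the same gate on a model version change, even with unchanged prompts, is one way to make the regression requirement enforceable rather than advisory.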
Procurement and subprocessors
Include executed contracts or order forms with data processing terms.
List subprocessors, regions, and training use flags in a table aligned to your architecture diagram.
Note marketplace purchases versus direct enterprise agreements to avoid wrong support contacts.
Update the table when you add gateway routes or new model providers.
- DPA (Microsoft Copilot data protection) and acceptable use attachments
- Subprocessor list with purpose per vendor
- Region commitments and exceptions
- Insurance or indemnity clauses referenced
- Renewal dates and contract owners
Using this site in reviews
Examples in vendor documentation (Azure AI Foundry documentation) include implementation paths, environment variable lists, and architecture diagrams.
Use them as reference patterns to accelerate workshops. Your production configuration and IAM must still be reviewed on their own merits.
Cite example slugs in the pack index so reviewers can reproduce demos in a sandbox. Start with OWASP LLM Top 10.
Clearly label which controls are demonstrated versus planned for phase two.
- Index of related example slugs per control
- Sandbox tenant separate from production
- Demo script with expected safe outcomes
- Gap list for phase two with dates
- No copy-paste of sample keys into prod
What good looks like
Good looks like a reviewer opening one folder and finding diagrams, matrices, logs, and evals without a chase thread.
Good looks like sponsors signing promotion when critical controls are evidenced, not when demos felt impressive.
Good looks like the pack updating within five business days of each production release.
Good looks like alignment between legal answers and engineering configuration per Microsoft Copilot data protection.
- Cover sheet with version, owner, date
- All critical controls green or risk accepted
- Dry run completed with internal security
- Steering committee (AI council guide) packet includes the pack link
- Post-go-live review scheduled at 90 days