AI Labs by The Ops Toolbox

Decision guide

Security review evidence pack

What InfoSec and procurement typically ask for before production, and how to answer with demos plus artefacts. Pair with the AI security controls guide for ongoing practices.

Business sponsors · Technical leaders

Controls vs evidence

InfoSec (OWASP LLM Top 10) reviews ask for proof that controls exist, not assertions that the team is responsible.

The AI security controls guide defines what to implement. This guide packages what to submit so reviewers can trace each control to its artefact quickly.

Treat the pack as a living folder updated each release, not a one-off PowerPoint prepared before go-live.

Sponsors should know which artefacts gate production promotion in the steering committee (AI council guide).

  • Control statement in one sentence
  • Owner role and escalation contact
  • Artefact file name and last updated date
  • Link to demo or log sample
  • Exception register if control is partial
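The fields above can be captured as one record per control so that gaps are found before a reviewer finds them. This is a minimal sketch: the `PackEntry` class, its field names, and the sample values are all hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PackEntry:
    """One evidence-pack entry. All field names are illustrative assumptions."""
    control_statement: str       # the control in one sentence
    owner_role: str
    escalation_contact: str
    artefact_file: str
    last_updated: str            # ISO date string
    evidence_link: str           # link to demo or log sample
    exception_note: Optional[str] = None  # only needed when the control is partial


def missing_fields(entry: PackEntry) -> list:
    """Return the names of required fields that are empty or whitespace."""
    return [
        f.name
        for f in fields(entry)
        if f.name != "exception_note" and not getattr(entry, f.name).strip()
    ]


# Hypothetical entry with the escalation contact left blank.
entry = PackEntry(
    control_statement="API keys are stored in a managed vault and rotated on schedule.",
    owner_role="Platform lead",
    escalation_contact="",
    artefact_file="key-rotation-calendar.xlsx",
    last_updated="2025-01-31",
    evidence_link="logs/rotation-sample.txt",
)
print(missing_fields(entry))  # ['escalation_contact']
```

Running a check like this over every entry before the dry run keeps the review focused on the controls rather than on missing paperwork.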

Questions you should expect

Reviewers will ask about data flows, subprocessors, authentication, logging, prompt injection (OpenAI mitigations), tool abuse, model change management, and incident response.

Prepare short written answers with diagrams attached. Verbal assurances do not survive turnover in security teams.

Procurement may join the production readiness conversation with contract questions on training use and retention. Align answers with the executed DPA (Microsoft Copilot data protection).

Schedule a dry run with your internal security partner before external review to catch missing artefacts early. Use the evidence pack checklist as a script.

  • Where do prompts and outputs persist?
  • Who can access logs and indexes?
  • How are API keys (OWASP LLM Top 10) stored and rotated?
  • What happens if retrieval returns nothing?
  • How are models and prompts versioned?
  • What is the disable-tools procedure?

Architecture artefacts

Provide a data-flow diagram (Microsoft Copilot data protection), network diagram where applicable, IAM matrix (OWASP LLM Top 10), and list of third-party APIs.

Link each control on the diagram to an OWASP LLM control or redacted log sample from your pilot environment.

Show trust boundaries between user browser, application, model API, search index, and approval queue (OpenAI safety best practices).

Version diagrams when you add tools or change regions. Reviewers compare submissions across releases.

  • Context diagram with actors and systems
  • Sequence diagram for propose-approve-execute
  • IAM matrix (OWASP LLM Top 10): role, scope, data access
  • Subprocessor and region table
  • Environment separation: dev, test, prod

Identity and access evidence

Demonstrate SSO with Entra ID (conditional access) or your corporate IdP, least-privilege app roles, and no shared admin keys in application code.

Export a redacted sample of authentication logs showing user identity on each model call where applicable.

Document break-glass access for engineers and how it is time-boxed and reviewed.

Search indexes must respect the same entitlements as source systems per RAG ACL guidance. Include a test case in the pack.

  • SSO (Entra conditional access) configuration screenshot or runbook excerpt
  • Role matrix mapped to business functions
  • Sample JWT claims in logs, redacted
  • Key rotation calendar and last run date
  • Access review ticket closed this quarter
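The entitlement test case mentioned above could look like the following. It is a sketch only: it assumes each indexed document carries the groups allowed to read it in the source system, and `Doc`, `filter_by_entitlement`, and the group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """A retrieved document with the groups entitled to it (assumed metadata)."""
    doc_id: str
    allowed_groups: frozenset


def filter_by_entitlement(results, user_groups):
    """Keep only documents the user could also open in the source system."""
    groups = set(user_groups)
    return [d for d in results if d.allowed_groups & groups]


def test_hr_doc_hidden_from_sales_user():
    results = [
        Doc("policy-1", frozenset({"all-staff"})),
        Doc("hr-case-7", frozenset({"hr"})),
    ]
    visible = filter_by_entitlement(results, {"all-staff", "sales"})
    # The HR case file must not reach a user who is outside the HR group.
    assert [d.doc_id for d in visible] == ["policy-1"]


test_hr_doc_hidden_from_sales_user()
```

Include the test source and a passing run in the pack so reviewers see the negative case, not only the happy path.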

Data classification and retention

State which data classes may enter prompts and which are prohibited, with examples HR and legal agree on.

Attach retention schedules for prompts, outputs, logs, embeddings (OpenAI embeddings guide), and approval records.

Describe deletion and export procedures for subject rights requests.

If you redact before the model call, show before-and-after samples in the pack with the removed fields marked (see Azure Content Safety).

  • Data classification (Microsoft Copilot data protection) policy excerpt
  • Retention table per store
  • Redaction rules for PII and secrets
  • Index rebuild process when sources delete content
  • Backup and restore test date
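A before-and-after redaction sample for the pack can be produced with a few lines of code. The sketch below uses two illustrative regular expressions; the `sk-` key format, the rule labels, and the sample text are assumptions, and a production rule set would need to match your own data classification policy.

```python
import re

# Illustrative rules only: (label, pattern). The "sk-" key shape is an assumption.
REDACTION_RULES = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("API_KEY", re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")),
]


def redact(text):
    """Replace matches with [LABEL] and return the text plus a count per rule."""
    counts = {}
    for label, pattern in REDACTION_RULES:
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


before = "Contact jane.doe@example.com, key sk-abcdefghijklmnop1234"
after, counts = redact(before)
print(after)   # Contact [EMAIL], key [API_KEY]
print(counts)  # {'EMAIL': 1, 'API_KEY': 1}
```

The per-rule counts are useful evidence on their own: they show reviewers that each rule fired without exposing the removed values.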

Safety and abuse

Document input and output filtering, rate limits, and tool allow lists.

Run a lightweight penetration test (OWASP LLM Top 10) on tool endpoints, not only the chat user interface.

Include prompt injection (OpenAI mitigations) test results and what changed after remediation.

Show human approval samples for write actions, with audit trail exports, in line with OpenAI safety best practices.

  • Content filter configuration export
  • Rate limit and abuse alert thresholds
  • Tool allow list file in version control
  • Red-team summary with severity counts
  • HITL (OpenAI safety best practices) approval screenshot and log extract
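A tool allow list kept in version control can be enforced with a small check in front of every tool call. The following is a sketch under stated assumptions: the JSON layout, the tool names, and the `authorize_tool_call` function are hypothetical, and write actions are assumed to require a named human approver.

```python
import json

# Hypothetical allow-list file content; in practice this would be read from version control.
ALLOW_LIST_JSON = """
{
  "search_tickets": {"write": false},
  "update_ticket":  {"write": true}
}
"""


def authorize_tool_call(tool_name, allow_list, approved_by=None):
    """Return 'deny', 'needs_approval', or 'allow' for a proposed tool call."""
    rule = allow_list.get(tool_name)
    if rule is None:
        return "deny"                  # not on the list: never executed
    if rule["write"] and not approved_by:
        return "needs_approval"        # write actions wait for a human
    return "allow"


allow_list = json.loads(ALLOW_LIST_JSON)
print(authorize_tool_call("delete_user", allow_list))                     # deny
print(authorize_tool_call("search_tickets", allow_list))                  # allow
print(authorize_tool_call("update_ticket", allow_list))                   # needs_approval
print(authorize_tool_call("update_ticket", allow_list, approved_by="a.reviewer"))  # allow
```

Defaulting to "deny" for unknown tools means a newly added tool is unusable until someone edits the reviewed file, which is the behaviour reviewers usually want to see demonstrated.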

Control-to-artefact matrix

Attach one primary artefact per control so reviewers do not chase scattered Confluence pages.

Use a spreadsheet or table with columns for control ID, owner, artefact link, and test date.

Mark partial controls honestly with compensating measures and target dates.

Steering committees promote pilots when critical controls are green or accepted as a risk with a named owner, per the NIST AI RMF Govern function.
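The promotion rule can be expressed directly against the matrix. This sketch assumes the matrix columns named above plus three hypothetical extras (`critical`, `status`, `risk_accepted_by`); the rows are invented sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlRow:
    """One matrix row. The last three fields are assumptions added for the gate."""
    control_id: str
    owner: str
    artefact_link: str
    test_date: str
    critical: bool
    status: str                       # "green", "partial", or "red"
    risk_accepted_by: Optional[str] = None


def blocking_controls(rows):
    """Critical controls that are neither green nor risk-accepted by a named owner."""
    return [
        r.control_id
        for r in rows
        if r.critical and r.status != "green" and not r.risk_accepted_by
    ]


rows = [
    ControlRow("C-01", "Platform lead", "diagrams/data-flow-v3.pdf", "2025-02-10", True, "green"),
    ControlRow("C-02", "App owner", "tests/acl-test.md", "2025-02-12", True, "partial",
               risk_accepted_by="CISO delegate"),
    ControlRow("C-03", "SRE lead", "runbooks/rollback.md", "2025-01-20", True, "partial"),
    ControlRow("C-04", "Data steward", "policies/retention.xlsx", "2025-02-01", False, "red"),
]
print(blocking_controls(rows))  # ['C-03']
```

An empty result means the pilot meets the stated promotion rule; anything else is the agenda for the steering committee.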

Logging, monitoring, and incident response

Show which events are logged: prompt hash, model version, retrieval IDs, tool proposals, approvals, and errors.

Connect logs to your SIEM or observability (AI SDK telemetry) platform with retention matching policy.

Include an incident runbook for model outage, data leak suspicion, and unsafe tool execution.

Record the date of the last tabletop exercise in the pack cover sheet.

  • Log field dictionary with sensitivity labels
  • Sample dashboard for latency, errors, spend
  • Alert routes to on-call and security
  • Incident severity definitions for AI
  • Post-incident template with root cause fields
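A log event covering the fields listed in this section might be built as follows. This is a sketch, not a prescribed schema: the function name, key names, and sample values are assumptions, and the prompt is stored only as a SHA-256 digest so events can be correlated without keeping raw text in the log store.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_log_event(prompt, model_version, retrieval_ids,
                    tool_proposal=None, approval=None, error=None):
    """Build one structured log event; the raw prompt is hashed, never stored."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model_version": model_version,
        "retrieval_ids": list(retrieval_ids),
        "tool_proposal": tool_proposal,
        "approval": approval,
        "error": error,
    }


event = build_log_event(
    prompt="Summarise ticket 123",
    model_version="example-model-2025-01",      # hypothetical version label
    retrieval_ids=["doc-17", "doc-42"],
    tool_proposal={"tool": "update_ticket"},
    approval={"approved_by": "a.reviewer"},
)
print(json.dumps(event, indent=2))
```

A redacted sample of this output, paired with the log field dictionary, gives reviewers both the schema and proof that it is actually emitted.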

Change management and evals

Prompts and indexes are code. Show version control, peer review, and CI eval gates before promotion.

Attach the latest golden set (OpenAI evals) eval report with pass/fail thresholds defined upfront.

Document how you roll back prompt or index changes within one business day.

Model version changes should trigger regression evals (OpenAI evals) even when prompts are unchanged.

  • Git tag or release note per production deploy
  • Eval threshold document signed by sponsor
  • Rollback runbook tested this quarter
  • Model register with approval dates
  • Diff of prompt changes in last release
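A CI eval gate with thresholds defined upfront can be very small. The sketch below is illustrative: the metric names and threshold values are hypothetical, and the results dictionary stands in for whatever your eval runner produces.

```python
# Hypothetical thresholds, agreed and signed off before the eval run.
THRESHOLDS = {
    "answer_accuracy": 0.90,
    "refusal_on_prohibited": 1.00,
}


def eval_gate(results, thresholds):
    """Return the metrics that are missing or below their minimum score."""
    failures = []
    for metric, minimum in thresholds.items():
        score = results.get(metric)
        if score is None or score < minimum:
            failures.append(metric)
    return failures


# Invented golden-set results for illustration.
results = {"answer_accuracy": 0.93, "refusal_on_prohibited": 0.98}
failures = eval_gate(results, THRESHOLDS)
print(failures)  # ['refusal_on_prohibited']
# In CI, a non-empty list would fail the job, for example with sys.exit(1).
```

Treating a missing metric as a failure matters: a gate that passes when an eval silently did not run offers no evidence at all.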

Procurement and subprocessors

Include executed contracts or order forms with data processing terms.

List subprocessors, regions, and training use flags in a table aligned to your architecture diagram.

Note marketplace purchases versus direct enterprise agreements to avoid wrong support contacts.

Update the table when you add gateway routes or new model providers.

Using this site in reviews

Examples in vendor documentation (Azure AI Foundry documentation) include implementation paths, environment variable lists, and architecture diagrams.

Use them as reference patterns to accelerate workshops. Your production configuration and IAM must still be reviewed on their own merits.

Cite example slugs in the pack index so reviewers can reproduce demos in a sandbox. Start with OWASP LLM Top 10.

Clearly label which controls are demonstrated versus planned for phase two.

  • Index of related example slugs per control
  • Sandbox tenant separate from production
  • Demo script with expected safe outcomes
  • Gap list for phase two with dates
  • No copy-paste of sample keys into prod

What good looks like

Good looks like a reviewer opening one folder and finding diagrams, matrices, logs, and evals without a chase thread.

Good looks like sponsors signing promotion when critical controls are evidenced, not when demos felt impressive.

Good looks like the pack updating within five business days of each production release.

Good looks like alignment between legal answers and engineering configuration per Microsoft Copilot data protection.

  • Cover sheet with version, owner, date
  • All critical controls green or risk accepted
  • Dry run completed with internal security
  • Steering committee (AI council guide) packet includes pack link
  • Post-go-live review scheduled at 90 days
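The five-business-day target can be checked automatically from the cover sheet date. This sketch counts Monday to Friday only and ignores public holidays, which is an assumption; the function names and sample dates are hypothetical.

```python
from datetime import date, timedelta


def business_days_between(start, end):
    """Count weekdays after `start` up to and including `end` (no holiday calendar)."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:   # Monday=0 ... Friday=4
            days += 1
    return days


def pack_is_fresh(release_date, pack_updated, limit=5):
    """True if the pack was updated on or after the release and within the limit."""
    return (
        pack_updated >= release_date
        and business_days_between(release_date, pack_updated) <= limit
    )


release = date(2025, 3, 3)                        # a Monday
print(pack_is_fresh(release, date(2025, 3, 10)))  # True: five business days later
print(pack_is_fresh(release, date(2025, 3, 11)))  # False: six business days later
```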
