LLM Red Teaming Checklist

50-point interactive security audit for LLM applications. Check off controls across 8 categories, watch your security score update in real time, and export a PDF-ready audit report for your team.

0 / 50
Security Score
F
Complete the checklist to assess your LLM application's security posture.

Category Progress

How to Use This LLM Red Teaming Checklist

Red teaming an LLM application is fundamentally different from red teaming traditional software. The attack surface includes not just the infrastructure and API layer, but the model's behavior under adversarial inputs, the interaction between user-supplied content and developer-supplied instructions, and the emergent capabilities that arise from the model's training that developers may not have accounted for. This checklist structures the audit into eight distinct domains to ensure comprehensive coverage.

Start with the Critical priority items in each category — these represent the controls most commonly absent in production LLM deployments and the ones most likely to be exploited first. Work through High priority items next, then Medium and Low. The export function generates a formatted report you can share with your security team, include in compliance documentation, or use as a remediation tracker.

Category 1: Input Validation

Input validation for LLM applications goes beyond traditional input sanitization. It includes detecting injection patterns, enforcing length limits to prevent context window flooding, validating that inputs conform to expected formats for the use case, and filtering known jailbreak patterns. Unlike SQL injection or XSS, there is no syntactic boundary between valid user input and malicious instructions in natural language — all filtering must be semantic rather than purely syntactic.

Category 2: Output Filtering

LLM outputs must be validated before they are displayed to users, acted upon by downstream systems, or stored. Output filtering catches cases where injection succeeded internally (the model followed injected instructions) but the payload can be intercepted before delivery. It also prevents content policy violations, PII leakage from the context window, and outputs that trigger security alerts in downstream systems. Output filtering is a complementary layer — not a substitute for preventing injection in the first place.

Category 3: Role Enforcement

Role enforcement tests whether the LLM reliably stays in its designated function under adversarial pressure. A customer service bot should not become a general-purpose AI, a document summarizer should not become a code executor, and an assistant with tool access should not use tools beyond its defined scope. Role enforcement includes testing how the model responds to explicit override attempts, roleplay-based escapes, and gradual boundary erosion through multi-turn conversation.

Categories 4-8

Prompt injection resistance covers the specific attack vectors described in the Prompt Injection Defense Guide. Data handling audits verify that sensitive data in the context window cannot be leaked through indirect means. Model behavior testing evaluates the model's responses to harmful content requests, biased outputs, and hallucination in safety-critical contexts. Infrastructure security covers the API layer, rate limiting, authentication, and monitoring. Compliance checks align the deployment with OWASP LLM Top 10, NIST AI RMF, and applicable regulatory requirements.

Frequently Asked Questions

What is LLM red teaming and why is it important?

LLM red teaming is a structured security assessment of an LLM-based application from an adversarial perspective. Red teamers attempt to find ways to make the LLM behave in unintended ways — bypassing safety guidelines, leaking sensitive information, ignoring developer instructions, or taking unauthorized actions. It is important because LLMs have a fundamentally different and less well-understood attack surface than traditional software, and many teams deploy LLM applications without systematic security testing.

How long does a thorough LLM security audit take?

A thorough LLM security audit of a production application typically takes 1-3 weeks for a dedicated red team. A focused review of the highest-priority controls in this checklist can be completed in 2-3 days for a single application with limited scope. The time scales with the number of LLM entry points, the complexity of the agentic pipeline, and the number of external data sources the LLM can access. Use the Critical and High priority items in this checklist for a time-boxed rapid assessment.

Should I use automated or manual red teaming for LLM applications?

Both. Automated tools (like Garak, PyRIT, or custom fuzzing) are effective for testing known injection patterns at scale, verifying that specific keywords are blocked, and regression testing after changes. Manual red teaming is required for novel attack paths, multi-turn conversation attacks, business logic bypasses, and attacks that require understanding of the specific application context. The most effective programs combine automated scanning with focused manual testing of the highest-risk areas identified by automation.

What frameworks should guide LLM security assessments?

The OWASP LLM Top 10 (2025 edition) provides the most widely adopted framework for LLM vulnerability categories. NIST AI RMF provides a governance framework for AI risk management. MITRE ATLAS catalogs adversarial ML tactics and techniques. For compliance contexts, ISO/IEC 42001 (AI Management Systems) is increasingly referenced. This checklist aligns with OWASP LLM Top 10 categories while adding operational controls that the standards sometimes underspecify.

What is the most commonly missed LLM security control in production?

In our assessment of production LLM deployments, the most commonly missed controls are: (1) Indirect prompt injection testing — most teams test direct injection but do not test for injection via retrieved external content; (2) Least-privilege tool design for agentic systems — LLMs given broad tool access when narrow access would suffice; (3) Output validation before acting on LLM decisions in automated pipelines; (4) Rate limiting designed specifically for extraction attacks (which require many queries at low rate); (5) Documented incident response procedures for LLM-specific attacks.

ML

Michael Lip

Builder of Zovo Tools — free developer utilities with no tracking. LockML helps ML engineers compare models, audit security, and build safer AI systems.