Securing RAG & Agentic Chatbots with OWASP LLM Top 10
Over the past two years, Iβve been working on AI applications π€, guiding organizations to build AI governance frameworks, responsible AI policies, and deploying production-ready systems.
From this experience, I can confidently say: figuring out the technical part is fun π and often the easier part. The bigger challengeβand where most time is spentβis building responsible AI practices and governance frameworks that scale across the enterprise.
In my previous post, I discussed how to approach AI governance and frameworks at the enterprise level. In this post, letβs go through a quick 101 on designing AI application architectures responsibly.
π Reference: OWASP Top 10 for LLM Applications
ποΈ Why Architecture Matters in AI Applicationsβ
The AI landscape changes daily β‘, making it difficult to lock down a future-proof architecture. A good starting point is defining:
- π― The objective of the AI application
 - π₯οΈ The platform on which it will be built
 
These early decisions shape the system design and architecture.
For this discussion, letβs use an example: a domain-specific chatbot π¬ that uses customer data and a foundational model to generate responses. To make it more complex, weβll add tool calling π οΈ and agents πΉοΈ for real-time, domain-specific functions.
ποΈ Base Architectureβ
Figure 1 β Basic Chatbot Architecture

At first glance π, this architecture may look ready for production. However, during an architecture review board π§βπ» or discussions with security and compliance teams π, this base setup will quickly fall short.
Why? Because we havenβt yet considered the security, safety, and compliance risks π¨ that can be exploited in such a design.
Just as we use OWASP Top 10 to secure web applications π, OWASP has released the LLM Top 10βa framework to secure AI and LLM-powered applications.
π OWASP GenAI Security Projectβ
The OWASP GenAI Security Project is a global, open-source initiative dedicated to identifying, mitigating, and documenting security and safety risks associated with generative AI technologies, including large language models (LLMs), agentic AI systems, and AI-driven applications.
β This framework is an excellent starting point for both beginners π and experts π§ to evaluate architecture, identify vulnerabilities, and mitigate risks.
β Key Security Questions for Chatbot Applicationsβ
When designing AI applications, consider:
- π‘οΈ How will the application handle prompt injection (at both input and output)?
 - π How is sensitive data (PII) handled? Is it anonymized or masked?
 - π¦ How is training data stored and secured? What if training data is poisoned?
 - π§© How are custom libraries and tools secured? Are they scanned for vulnerabilities?
 - π’ Does the application disclose its use of AI and align with Responsible AI policies?
 - 𧬠How are fine-tuned or custom models protected? What happens if theyβre exposed?
 
π OWASP LLM Top 10β
Here are the 10 key risks to consider:
- π LLM01: Prompt Injection
 - β οΈ LLM02: Insecure Output Handling
 - π§ͺ LLM03: Training Data Poisoning
 - π LLM04: Model Denial of Service
 - π LLM05: Supply Chain Vulnerabilities
 - π LLM06: Sensitive Information Disclosure
 - π§© LLM07: Insecure Plugin Design
 - π€ LLM08: Excessive Agency
 - π€ LLM09: Overreliance
 - π΅οΈ LLM10: Model Theft
 
πΊοΈ Mapping to Architectureβ
Figure 2 β Mapping OWASP LLM Top 10 to Architecture

By applying these guidelines, you can create a matrix π that scores your architecture against the OWASP framework. This provides:
- A baseline security posture π for AI applications
 - A reference template π for future system design
 - A governance-aligned approach ποΈ to AI architecture
 
π OWASP LLM Top 10 β Scoring Matrix Templateβ
Each item can be scored on a scale of 1β5 (1 = poor π«, 5 = strong πͺ).
| π OWASP Risk ID | π Risk Category | π Description | π·οΈ Score (1-5) | π οΈ Notes / Mitigation Plan | 
|---|---|---|---|---|
| LLM01 | Prompt Injection | Protection against prompt injection attempts | ||
| LLM02 | Insecure Output Handling | Validation and sanitization of model outputs | ||
| LLM03 | Training Data Poisoning | Safeguards against corrupted training data | ||
| LLM04 | Model Denial of Service | Rate limiting, monitoring, and throttling | ||
| LLM05 | Supply Chain Vulnerabilities | Verification of datasets, plugins, libraries | ||
| LLM06 | Sensitive Info Disclosure | Anonymization, masking, encryption of PII | ||
| LLM07 | Insecure Plugin Design | Plugin isolation and secure coding practices | ||
| LLM08 | Excessive Agency | Controls to limit agent autonomy | ||
| LLM09 | Overreliance | Human-in-the-loop and fallback mechanisms | ||
| LLM10 | Model Theft | Access controls, encryption, monitoring | 
π§ͺ Sample Scoring Matrix: Chatbot + RAG + Agentβ
Hereβs a worked example for a domain-specific chatbot π¬ that uses RAG (Retrieval Augmented Generation π) with tool calling π οΈ and agentic workflows π€.
| π OWASP Risk ID | π Risk Category | π Description | π·οΈ Score (1-5) | π οΈ Notes / Mitigation Plan | 
|---|---|---|---|---|
| LLM01 | Prompt Injection | Moderate risk, mitigated with input/output filters | 3 | Add context validation + regex sanitization | 
| LLM02 | Insecure Output Handling | High risk due to tool execution | 2 | Enforce strict schema validation + guardrails | 
| LLM03 | Training Data Poisoning | Moderate risk if knowledge base ingestion is not validated | 3 | Add data quality checks + signed data sources | 
| LLM04 | Model Denial of Service | High risk (agents can loop or generate heavy queries) | 2 | Add rate limiting + monitoring | 
| LLM05 | Supply Chain Vulnerabilities | Plugins & APIs could be compromised | 3 | Use dependency scanning & signed artifacts | 
| LLM06 | Sensitive Info Disclosure | RAG may retrieve PII or confidential data | 2 | Add anonymization + retrieval filters | 
| LLM07 | Insecure Plugin Design | High risk with tool calling | 2 | Implement zero-trust plugin execution | 
| LLM08 | Excessive Agency | Agents may overstep bounds | 2 | Add role-based execution policies | 
| LLM09 | Overreliance | Users may blindly trust answers | 3 | Add disclaimers + confidence scoring | 
| LLM10 | Model Theft | Lower risk in managed cloud (e.g. Bedrock) | 4 | Rely on provider safeguards + IAM | 
π― Final Thoughtsβ
The OWASP LLM Top 10 is not just a checklistβitβs a security lens π for AI system design.
By using it in combination with your enterprise AI governance framework, youβll be better equipped to build secure π, responsible π±, and accountable π AI applications that can withstand real-world risks.
