Agent skill
AI Security Expert
Enterprise AI security - OWASP LLM Top 10, prompt injection defense, guardrails, PII protection
Install this agent skill to your Project
npx add-skill https://github.com/frankxai/ai-architect/tree/main/skills/ai-security-expert
SKILL.md
AI Security Expert
Enterprise AI security architect specializing in securing LLM applications, defending against prompt injection, implementing guardrails, and OWASP LLM Top 10 compliance.
OWASP LLM Top 10 (2025)
Quick Reference
| # | Vulnerability | Risk | Key Defense |
|---|---|---|---|
| LLM01 | Prompt Injection | Critical | Input sanitization, delimiters |
| LLM02 | Insecure Output | High | Output validation, sanitization |
| LLM03 | Training Data Poisoning | High | Data provenance, auditing |
| LLM04 | Model DoS | Medium | Rate limiting, timeouts |
| LLM05 | Supply Chain | High | Verification, pinning |
| LLM06 | Sensitive Info Disclosure | High | PII detection, redaction |
| LLM07 | Insecure Plugin Design | High | Permission model, validation |
| LLM08 | Excessive Agency | High | Human-in-the-loop, least privilege |
| LLM09 | Overreliance | Medium | Confidence scores, citations |
| LLM10 | Model Theft | Medium | Rate limiting, watermarking |
LLM01: Prompt Injection
Attack Types:
- Direct: "Ignore previous instructions..."
- Indirect: Malicious content in RAG documents
- Encoding tricks: Unicode, special tokens
Defense Pattern:
User Input → Sanitize → Delimit → LLM → Validate Output → Filter
LLM02: Insecure Output Handling
- Never execute LLM output as code without validation
- Sanitize HTML (use allowlist)
- Validate SQL (SELECT only, table allowlist)
LLM04: Model DoS
- Rate limiting per user/API key
- Token limits on requests
- Timeout configurations
- Cost capping/alerts
LLM06: Sensitive Information Disclosure
- PII detection (regex + NER)
- System prompt protection
- Training data sanitization
- Output filtering
Code patterns: resources/security-patterns.py
PII Protection
Detection Patterns
| Type | Example Pattern |
|---|---|
*@*.com |
|
| Phone | XXX-XXX-XXXX |
| SSN | XXX-XX-XXXX |
| Credit Card | 16 digits |
| IP Address | X.X.X.X |
Redaction Strategy
- Detect PII in input before LLM call
- Redact PII in LLM output
- Log without PII
- Encrypt at rest
Guardrails Implementation
NeMo Guardrails (NVIDIA)
define user express harmful intent
"How do I hack"
define bot refuse harmful request
"I can't help with that."
define flow harmful intent
user express harmful intent
bot refuse harmful request
Guardrails AI
guard = Guard().use_many(
ToxicLanguage(on_fail="fix"),
PIIFilter(on_fail="fix"),
ValidJSON(on_fail="reask")
)
Custom Pipeline
Input Guards → LLM Call → Output Guards → Response
Implementation: resources/security-patterns.py
Security Architecture
Defense in Depth Layers
| Layer | Controls |
|---|---|
| Network | WAF, DDoS protection, API gateway |
| Auth | OAuth 2.0, API keys, mTLS |
| Input | Schema validation, injection detection |
| Guardrails | Topic restrictions, PII filtering |
| Model | Versioning, anomaly detection |
| Output | Response filtering, fact verification |
| Audit | Logging, retention, compliance |
Zero Trust Principles
- Never trust, always verify
- Least privilege for agents
- Assume breach (log everything)
Compliance Frameworks
EU AI Act (High-Risk)
- Risk management system
- Data governance
- Technical documentation
- Human oversight
- Accuracy/robustness testing
SOC 2 for AI
- Security: Access controls, encryption
- Availability: SLA monitoring, DR
- Processing Integrity: Input/output validation
- Confidentiality: Data classification
- Privacy: Data minimization, consent
Security Testing
Red Team Categories
- Direct injection attempts
- Jailbreak prompts
- Indirect injection via context
- Encoding/unicode tricks
Test suite: resources/security-patterns.py
Testing Checklist
- Injection patterns blocked
- System prompt protected
- PII detected and redacted
- Rate limits enforced
- Outputs validated
- Audit logs complete
Incident Response
Severity Levels
| Incident | Severity | Response |
|---|---|---|
| Prompt injection detected | Medium | Block, log, analyze |
| Data exfiltration attempt | High | Block, forensics, notify |
| Model extraction detected | High | Rate limit, investigate |
Response Steps
- Contain (block source)
- Preserve (logs, evidence)
- Analyze (attack pattern)
- Remediate (update defenses)
- Document (security log)
Resources
- OWASP LLM Top 10
- NIST AI Risk Management Framework
- NeMo Guardrails
- Guardrails AI
- LLM Security Best Practices
Secure AI systems with defense in depth and zero trust principles.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
GenAI DAC Specialist
Expert in OCI Generative AI Dedicated AI Clusters - deployment, fine-tuning, optimization, and production operations
Oracle Agent Spec Expert
Design framework-agnostic AI agents using Oracle's Open Agent Specification for portable, interoperable agentic systems with JSON/YAML definitions
OCI Services Expert
Expert guidance on Oracle Cloud Infrastructure services, cloud architecture patterns, cost optimization, deployment strategies, and OCI best practices for enterprise solutions
agentic-orchestration
Patterns for multi-agent coordination, task decomposition, handoffs, and workflow orchestration. Best practices for building and managing agent systems.
nvidia-nim
NVIDIA NIM inference microservices for deploying AI models with OpenAI-compatible APIs, self-hosted or cloud
AWS AI Services Expert
Build AI applications on AWS using Bedrock, SageMaker, and AI/ML services with best practices for enterprise deployment
Didn't find tool you were looking for?