ring:dev-chaos-testing
Gate 7 of the development cycle. Ensures chaos tests exist that use Toxiproxy to verify graceful degradation under connection loss, latency, and network partitions.
Install this agent skill to your Project
npx add-skill https://github.com/LerianStudio/ring/tree/main/dev-team/skills/dev-chaos-testing
Dev Chaos Testing (Gate 7)
Overview
Ensure code handles failure conditions gracefully by injecting faults with Toxiproxy. Verify that connection loss, latency, and network partitions don't cause crashes and that the system recovers once the fault is removed.
Core principle: All infrastructure fails. Chaos testing ensures your code handles it gracefully.
<block_condition>
- No chaos tests = FAIL
- Any dependency without failure test = FAIL
- Recovery not verified = FAIL
- System crashes on failure = FAIL
</block_condition>
CRITICAL: Role Clarification
This skill ORCHESTRATES. QA Analyst Agent (chaos mode) EXECUTES.
| Who | Responsibility |
|---|---|
| This Skill | Gather requirements, dispatch agent, track iterations |
| QA Analyst Agent | Write chaos tests, setup Toxiproxy, verify recovery |
Standards Reference
MANDATORY: Load testing-chaos.md standards via WebFetch.
<fetch_required> https://raw.githubusercontent.com/LerianStudio/ring/main/dev-team/docs/standards/golang/testing-chaos.md </fetch_required>
Step 0: Detect External Dependencies (Auto-Detection)
MANDATORY: When external_dependencies is empty or not provided, scan the codebase to detect them automatically before validation.
if external_dependencies is empty or not provided:
    detected_dependencies = []
    1. Scan docker-compose.yml / docker-compose.yaml for service images:
        - Grep tool: pattern "postgres" in docker-compose* files → add "postgres"
        - Grep tool: pattern "mongo" in docker-compose* files → add "mongodb"
        - Grep tool: pattern "valkey" in docker-compose* files → add "valkey"
        - Grep tool: pattern "redis" in docker-compose* files → add "redis"
        - Grep tool: pattern "rabbitmq" in docker-compose* files → add "rabbitmq"
    2. Scan dependency manifests:
        if language == "go":
            - Grep tool: pattern "github.com/lib/pq" in go.mod → add "postgres"
            - Grep tool: pattern "github.com/jackc/pgx" in go.mod → add "postgres"
            - Grep tool: pattern "go.mongodb.org/mongo-driver" in go.mod → add "mongodb"
            - Grep tool: pattern "github.com/redis/go-redis" in go.mod → add "redis"
            - Grep tool: pattern "github.com/valkey-io/valkey-go" in go.mod → add "valkey"
            - Grep tool: pattern "github.com/rabbitmq/amqp091-go" in go.mod → add "rabbitmq"
        if language == "typescript":
            - Grep tool: pattern "\"pg\"" in package.json → add "postgres"
            - Grep tool: pattern "@prisma/client" in package.json → add "postgres"
            - Grep tool: pattern "\"mongodb\"" in package.json → add "mongodb"
            - Grep tool: pattern "\"mongoose\"" in package.json → add "mongodb"
            - Grep tool: pattern "\"redis\"" in package.json → add "redis"
            - Grep tool: pattern "\"ioredis\"" in package.json → add "redis"
            - Grep tool: pattern "@valkey" in package.json → add "valkey"
            - Grep tool: pattern "\"amqplib\"" in package.json → add "rabbitmq"
            - Grep tool: pattern "amqp-connection-manager" in package.json → add "rabbitmq"
    3. Deduplicate detected_dependencies
    4. Set external_dependencies = detected_dependencies
    Log: "Auto-detected external dependencies: [detected_dependencies]"
<auto_detect_reason> PM team task files often omit external_dependencies. If the codebase uses postgres, mongodb, valkey, redis, or rabbitmq, these are external dependencies that MUST have chaos tests. Auto-detection prevents silent skips. </auto_detect_reason>
Step 1: Validate Input
REQUIRED INPUT:
    - unit_id: [task/subtask being tested]
    - external_dependencies: [postgres, mongodb, valkey, redis, rabbitmq, etc.] (from input OR auto-detected in Step 0)
    - language: [go|typescript]
OPTIONAL INPUT:
    - gate6_handoff: [full Gate 6 output]
if any REQUIRED input is missing:
    → STOP and report: "Missing required input: [field]"
if external_dependencies is empty (AFTER auto-detection in Step 0):
    → STOP and report: "No external dependencies found after codebase scan - chaos testing requires dependencies"
Step 2: Dispatch QA Analyst Agent (Chaos Mode)
Task tool:
    subagent_type: "ring:qa-analyst"
    prompt: |
        **MODE:** CHAOS TESTING (Gate 7)
        **Standards:** Load testing-chaos.md
        **Input:**
        - Unit ID: {unit_id}
        - External Dependencies: {external_dependencies}
        - Language: {language}
        **Requirements:**
        1. Setup Toxiproxy infrastructure in tests/utils/chaos/
        2. Create chaos tests (TestIntegration_Chaos_{Component}_{Scenario} naming)
        3. Use dual-gate pattern (CHAOS=1 env + testing.Short())
        4. Test failure scenarios: Connection Loss, High Latency, Network Partition
        5. Verify 5-phase structure: Normal → Inject → Verify → Restore → Recovery
        **Output Sections Required:**
        - ## Chaos Testing Summary
        - ## Failure Scenarios
        - ## Handoff to Next Gate
Step 3: Evaluate Results
Parse agent output:
if "Status: PASS" in output:
    → Gate 7 PASSED
    → Return success with metrics
if "Status: FAIL" in output:
    → Dispatch fix to implementation agent
    → Re-run chaos tests (max 3 iterations)
    → If still failing: ESCALATE to user
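The evaluation loop can be sketched as follows. Two assumptions are baked in and should be adjusted to the real workflow: "max 3 iterations" is read as three test runs in total, and any output that lacks the literal string `Status: PASS` is treated as a failure. `EvaluateGate`, `Outcome`, and the two callbacks are hypothetical names.

```go
package main

import "strings"

// maxIterations is the cap from Step 3, read here as total test runs.
const maxIterations = 3

// Outcome (hypothetical) summarizes how the gate ended.
type Outcome struct {
	Passed     bool
	Escalate   bool // true when the cap was hit without a pass
	Iterations int  // number of test runs performed
}

// EvaluateGate runs the chaos tests, dispatches a fix after each failed
// run, and escalates once maxIterations runs have failed.
//   - runTests returns the QA agent's raw output.
//   - dispatchFix hands the failing output to the implementation agent.
func EvaluateGate(runTests func() string, dispatchFix func(output string)) Outcome {
	for i := 1; i <= maxIterations; i++ {
		output := runTests()
		if strings.Contains(output, "Status: PASS") {
			return Outcome{Passed: true, Iterations: i}
		}
		// No point dispatching a fix that will never be re-tested.
		if i < maxIterations {
			dispatchFix(output)
		}
	}
	return Outcome{Escalate: true, Iterations: maxIterations}
}
```

One detail worth checking against the agent's actual output: a Markdown-formatted line such as `**Status:** PASS` does not contain the substring `Status: PASS`, so the match string may need to follow whatever format the agent really emits.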
Step 4: Generate Output
## Chaos Testing Summary
**Status:** {PASS|FAIL}
**Dependencies Tested:** {count}
**Scenarios Tested:** {count}
**Recovery Verified:** {Yes|No}
## Failure Scenarios
| Component | Scenario | Status | Recovery |
|-----------|----------|--------|----------|
| {component} | {scenario} | {PASS|FAIL} | {Yes|No} |
## Handoff to Next Gate
- Ready for Gate 8 (Code Review): {YES|NO}
- Iterations: {count}
Failure Scenarios by Dependency
| Dependency | Required Scenarios |
|---|---|
| PostgreSQL | Connection Loss, High Latency, Network Partition |
| MongoDB | Connection Loss, High Latency, Network Partition |
| Valkey | Connection Loss, High Latency, Timeout |
| Redis | Connection Loss, High Latency, Timeout |
| RabbitMQ | Connection Loss, Network Partition, Slow Consumer |
| HTTP APIs | Timeout, 5xx Errors, Connection Refused |
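The table above lends itself to a lookup that reports which required scenarios are still untested for each dependency. The following is a sketch: `MissingScenarios` is a hypothetical helper, the lowercase map keys (including `"http"` for HTTP APIs) are assumed identifiers chosen to match the Step 0 names, and a dependency that is not in the table yields no requirements, which a stricter implementation might prefer to flag.

```go
package main

// requiredScenarios encodes the "Failure Scenarios by Dependency" table.
// Keys are assumed identifiers; scenario names are taken from the table.
var requiredScenarios = map[string][]string{
	"postgres": {"Connection Loss", "High Latency", "Network Partition"},
	"mongodb":  {"Connection Loss", "High Latency", "Network Partition"},
	"valkey":   {"Connection Loss", "High Latency", "Timeout"},
	"redis":    {"Connection Loss", "High Latency", "Timeout"},
	"rabbitmq": {"Connection Loss", "Network Partition", "Slow Consumer"},
	"http":     {"Timeout", "5xx Errors", "Connection Refused"},
}

// MissingScenarios returns, for each dependency in deps, the required
// scenarios that do not appear in tested[dep]. Dependencies with full
// coverage are omitted from the result.
func MissingScenarios(deps []string, tested map[string][]string) map[string][]string {
	missing := map[string][]string{}
	for _, dep := range deps {
		done := map[string]bool{}
		for _, s := range tested[dep] {
			done[s] = true
		}
		// Iterating the required list keeps the table's ordering.
		for _, s := range requiredScenarios[dep] {
			if !done[s] {
				missing[dep] = append(missing[dep], s)
			}
		}
	}
	return missing
}
```

A non-empty result corresponds to the "Any dependency without failure test = FAIL" block condition when a dependency has no tested scenarios at all, and to the MEDIUM "missing scenarios" severity when coverage is only partial.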
Severity Calibration
| Severity | Criteria | Examples |
|---|---|---|
| CRITICAL | System crashes on failure, data loss | Panic on connection loss, corrupted state on partition |
| HIGH | No recovery, missing dependency tests | System doesn't recover after failure, untested dependency |
| MEDIUM | Partial recovery, missing scenarios | Recovery takes too long, missing latency test |
| LOW | Cleanup issues, documentation | Test artifacts not cleaned, missing chaos docs |
Report all severities. CRITICAL = immediate fix (production risk). HIGH = fix before gate pass. MEDIUM = fix in iteration. LOW = document.
Anti-Rationalization Table
| Rationalization | Why It's WRONG | Required Action |
|---|---|---|
| "Infrastructure is reliable" | AWS, GCP, Azure all have outages. Your code must handle them. | Write chaos tests |
| "Integration tests cover failures" | Integration tests verify the happy path. Chaos tests verify failure handling. | Write chaos tests |
| "Toxiproxy is complex" | One container. 20-minute setup. Prevents production incidents. | Write chaos tests |
| "We have monitoring" | Monitoring detects problems. Chaos testing prevents them. | Write chaos tests |
| "Circuit breakers handle it" | Circuit breakers need testing too. Chaos tests verify they work. | Write chaos tests |