Agent skill
sanitize-git-repo
Guidance for sanitizing git repositories by identifying and removing sensitive credentials, API keys, tokens, and other secrets. This skill should be used when tasks involve cleaning repositories of secrets, preparing code for public release, auditing for credential exposure, or removing sensitive data from version control history.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/sanitize-git-repo
SKILL.md
Repository Sanitization Skill
This skill provides systematic approaches for identifying and removing sensitive information from git repositories, including credentials, API keys, tokens, and other secrets.
When to Use This Skill
- Preparing a private repository for public release
- Auditing repositories for accidentally committed secrets
- Cleaning up after a credential leak
- Removing sensitive data from git history
- Compliance and security reviews
Critical Understanding
Repository sanitization involves two distinct scopes:
- Working Directory Sanitization: Replacing secrets in current file contents
- Git History Sanitization: Removing secrets from all historical commits
Simply editing files in the working directory does NOT remove secrets from git history. The .git/objects directory retains all historical versions of files.
Approach
Phase 1: Comprehensive Secret Detection
Before making any changes, perform exhaustive detection using multiple strategies:
1.1 Pattern-Based Detection
Search for common secret patterns. Refer to references/secret_patterns.md for comprehensive regex patterns covering:
- API keys (AWS, GCP, Azure, GitHub, GitLab, Hugging Face, OpenAI, etc.)
- Authentication tokens (JWT, OAuth, Bearer tokens)
- Database credentials (connection strings, passwords)
- Private keys (RSA, SSH, PGP)
- Environment variable assignments containing secrets
- Base64-encoded secrets
- Webhook URLs with embedded tokens
1.2 File-Based Detection
Check files commonly containing secrets:
.env,.env.*files- Configuration files:
*.yaml,*.yml,*.json,*.toml,*.ini,*.cfg - Docker files:
Dockerfile,docker-compose.yml - CI/CD configs:
.github/workflows/*,.gitlab-ci.yml,Jenkinsfile - Cloud configs:
terraform.tfvars,*.tfstate - Credential files:
credentials,secrets,*.pem,*.key
1.3 Entropy-Based Detection
High-entropy strings often indicate secrets. Look for:
- Strings with mixed case, numbers, and special characters
- Strings longer than 20 characters without dictionary words
- Base64-encoded blobs in unexpected locations
Phase 2: Systematic Verification
After initial detection:
- Document all findings before making changes
- Categorize secrets by type (API key, password, token, etc.)
- Identify false positives (environment variable references vs actual values)
- Check all files in directories where secrets are found (not just the first match)
- Examine binary files that might contain embedded secrets
Phase 3: Sanitization Strategy
3.1 Placeholder Format
Use consistent placeholder formats:
<your-aws-access-key>for AWS keys<your-api-key>for generic API keys<your-database-password>for passwords${ENV_VAR_NAME}for values that should come from environment
3.2 Working Directory Sanitization
- Create a backup branch before modifications
- Replace secrets systematically, one type at a time
- Verify each replacement maintains file validity (especially JSON/YAML)
- Run syntax validation on modified configuration files
3.3 Git History Sanitization
For complete sanitization, the git history must also be cleaned:
Option A: BFG Repo-Cleaner (Recommended)
# Remove specific strings from history
bfg --replace-text secrets.txt repo.git
Option B: git filter-repo
# Remove file containing secrets from all history
git filter-repo --path sensitive-file.txt --invert-paths
Option C: git filter-branch (Legacy)
# Use only if other tools unavailable
git filter-branch --tree-filter 'command' HEAD
After history rewriting:
- Force push to remote (coordinate with team)
- All collaborators must re-clone
- Invalidate/rotate all exposed credentials
Phase 4: Verification
4.1 Post-Sanitization Checks
- Re-run all detection patterns to confirm no secrets remain
- Verify file syntax (JSON, YAML, etc.) is still valid
- Check that placeholder format is consistent throughout
- Confirm application can start (may fail due to missing secrets - expected)
4.2 Git History Verification
# Search git history for secret patterns
git log -p --all -S 'secret_pattern' --source
Common Pitfalls
Incomplete Detection
- Pitfall: Searching for only specific patterns (e.g.,
ghp_for GitHub tokens) - Solution: Use comprehensive pattern list; account for older token formats and variations
Ignoring File Types
- Pitfall: Only checking text files, missing secrets in JSON, notebooks, or binary files
- Solution: Check all file types; use specialized tools for binary inspection
Forgetting Git History
- Pitfall: Only sanitizing working directory, leaving secrets in git history
- Solution: Always warn about git history; use BFG or git-filter-repo for complete removal
Inconsistent Placeholders
- Pitfall: Using different placeholder formats (
<token>,YOUR_TOKEN,xxx) - Solution: Define placeholder convention upfront; use search-replace consistently
Missing Related Secrets
- Pitfall: Finding one secret in a file but not checking for related secrets
- Solution: When a secret is found, thoroughly examine the entire file and directory
Partial Directory Scanning
- Pitfall: Checking only the first matching file in a directory
- Solution: Systematically check all files matching patterns in directories where secrets are found
Encoded Secrets
- Pitfall: Missing Base64-encoded or otherwise obfuscated secrets
- Solution: Decode suspicious Base64 strings; check for common encoding patterns
Verification Checklist
Before declaring sanitization complete:
- All secret patterns from
references/secret_patterns.mdsearched - All common secret-containing file types checked
- All files in directories with findings examined
- Placeholder format is consistent
- Configuration file syntax validated
- Git history addressed (cleaned or documented as still containing secrets)
- Comprehensive re-scan confirms no remaining secrets
- Credential rotation recommended to user for any exposed secrets
Didn't find tool you were looking for?