bigconfig-generator
Use this skill when creating or updating Bigeye monitoring configurations (bigconfig.yml files) for BigQuery tables. Works with metadata-manager skill.
Install this agent skill in your project:
npx add-skill https://github.com/mozilla/bigquery-etl-skills/tree/main/skills/bigconfig-generator
Bigconfig Generator
Composable: Works with metadata-manager (for schema/metadata generation) and bigquery-etl-core (for conventions)
When to use: Creating/updating Bigeye configurations, data quality monitoring
Overview
Generate and manage Bigeye monitoring configurations for BigQuery tables in the Mozilla bigquery-etl repository. Bigeye is the data quality monitoring platform Mozilla uses to check for freshness, volume anomalies, null values, uniqueness violations, and custom business logic.
This skill helps configure monitoring through:
- metadata.yaml - High-level monitoring settings (freshness, volume, collections)
- bigconfig.yml - Detailed metric definitions (auto-generated via bqetl CLI)
- bigeye_custom_rules.sql - Custom SQL validation rules (optional, for complex business logic)
Official Documentation:
- bigConfig Reference: https://mozilla.github.io/bigquery-etl/reference/bigconfig/ (docs/reference/bigconfig.md)
- Bigeye Intro: https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html
- Bigeye Official Docs: https://docs.bigeye.com/docs/bigconfig
🚨 REQUIRED READING - Start Here
BEFORE creating monitoring configurations, READ these resources:
- Existing Collections: READ references/existing_collections.md
  - Collections already in use across the repository
  - Notification channels by dataset/team
  - Helps maintain consistency and avoid creating duplicate collections
- Monitoring Patterns: READ references/monitoring_patterns.md
  - Common monitoring scenarios
  - Freshness vs volume monitoring
  - When to use custom rules
  - Configuration workflow
📋 Templates - Copy These Structures
When adding monitoring to metadata.yaml, READ and COPY from these templates:
- Basic monitoring (most tables)? → READ assets/metadata_monitoring_basic.yaml
  - Standard freshness and volume checks
  - Collection assignment
- Critical table (high priority)? → READ assets/metadata_monitoring_critical.yaml
  - More aggressive monitoring settings
  - Faster alerting
- View (non-partitioned)? → READ assets/metadata_monitoring_view.yaml
  - Monitoring for views without partitions

For custom validation rules:
- Custom SQL checks? → READ assets/custom_rules_template.sql
  - Template for bigeye_custom_rules.sql
  - Shows how to write validation queries
When to Use This Skill
Use this skill when:
- Creating new tables and user wants to enable monitoring
- User explicitly requests "create a bigeye config for..."
- User asks about adding data quality monitoring
- Setting up freshness or volume checks
- Creating custom validation rules
- Troubleshooting monitoring configurations
Integration with metadata-manager: When metadata-manager creates new tables, it should ask the user: "Would you like to enable Bigeye monitoring for this table?" If yes, invoke this skill.
🚨 IMPORTANT: Deployment Safety
Manual deployment is BLOCKED for safety reasons.
If a user asks to run ./bqetl monitoring deploy, warn them:
⚠️ Manual deployment can accidentally delete existing metrics. The recommended workflow is to commit your changes and let the bqetl_artifact_deployment DAG deploy automatically. Manual deployment is disabled in this environment.
If you need to manually deploy for testing purposes, you'll need to:
- Ensure you have BIGEYE_API_KEY set
- Understand that deploying only specific tables can remove metrics from other tables
- Use --dry-run first to review changes
- Contact Data Engineering if you're unsure
Proceed with caution - this can affect production monitoring.
The standard workflow (update → validate → commit → push) is safe and recommended.
Prerequisites
- Table must have metadata.yaml file
- Table must be deployed to BigQuery
- Understanding of table's update schedule (daily, hourly, etc.)
- For manual deployment (discouraged): the BIGEYE_API_KEY environment variable must be set
Staying Current with Documentation
Always prefer official documentation over this skill's references:
- For bigConfig syntax and structure: Read docs/reference/bigconfig.md or use WebFetch on https://mozilla.github.io/bigquery-etl/reference/bigconfig/
- For available saved metrics: Check sql/bigconfig.yml in the repository (source of truth)
- For Bigeye concepts: Use WebFetch on https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html
- For bqetl CLI commands: Check ./bqetl monitoring --help or the monitoring.py source code
When to use WebFetch:
- User asks about specific bigConfig features not covered in this skill
- Need to verify current syntax or available options
- References in this skill seem outdated or incomplete
- Troubleshooting issues not covered in common patterns
This skill focuses on workflow and decision-making rather than being a comprehensive bigConfig reference.
Workflow
Step 1: Determine Monitoring Requirements
Ask the user what type of monitoring they need:
For new tables created by metadata-manager: "Would you like to enable Bigeye monitoring for this table? This can check for:
- Freshness (when data was last updated)
- Volume (row count anomalies)
- Column-level validation (nulls, uniqueness, formats)
- Custom business logic validation"
For existing tables: "What type of monitoring would you like to configure?
- Basic (freshness + volume)
- Critical (freshness + volume with blocking)
- Column-level validation
- Custom SQL rules
- All of the above"
After determining monitoring type, check existing collections:
Before configuring metadata.yaml, READ references/existing_collections.md to:
- Find the dataset in "Collections by Dataset" section
- Check if there's an existing collection for this dataset/team
- Note the notification channels used by similar tables
Ask the user: "Based on existing configurations, would you like to use the [Collection Name] collection with [notification channels]? Or create a new collection?"
Step 2: Configure metadata.yaml
Add a monitoring section to metadata.yaml based on table type:
- Basic (most tables): assets/metadata_monitoring_basic.yaml - Freshness + volume, non-blocking
- Critical (production): assets/metadata_monitoring_critical.yaml - Blocking failures, collection assignment
- Views: assets/metadata_monitoring_view.yaml - Requires explicit partition_column

Key settings:
- blocking: true - Failures block deployments (use for critical tables)
- collection - Groups related tables, configures alerts
- partition_column - Required for views (or null if non-partitioned)
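To catch the view requirement before running the CLI, here is a minimal Python sketch of a pre-flight check over an already-parsed monitoring section (for example, one loaded with a YAML parser). It is not bqetl's validator; the function name check_view_monitoring and the top-level enabled flag are assumptions for illustration.

```python
from __future__ import annotations

from typing import Any


def check_view_monitoring(monitoring: dict[str, Any], is_view: bool) -> list[str]:
    """Return problems with a parsed metadata.yaml monitoring section (empty = OK).

    Hypothetical pre-flight check, not bqetl's validator. Assumes an
    "enabled" flag at the top of the monitoring section.
    """
    problems: list[str] = []
    if not monitoring.get("enabled", False) or not is_view:
        return problems  # only enabled monitoring on views needs these keys
    # The key must be present, but its value may be None (null) for
    # non-partitioned views.
    if "partition_column" not in monitoring:
        problems.append("views must set partition_column (null if non-partitioned)")
    if monitoring.get("partition_column_set") is not True:
        problems.append("views must set partition_column_set: true")
    return problems


# Usage: a view that names its partition column but forgot the flag.
view_monitoring = {"enabled": True, "partition_column": "submission_date"}
print(check_view_monitoring(view_monitoring, is_view=True))
# → ['views must set partition_column_set: true']
```

Checking for the key's presence rather than a truthy value lets a non-partitioned view pass with partition_column: null.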
Step 3: Generate bigconfig.yml
Use the bqetl CLI to auto-generate bigconfig.yml from metadata.yaml:
./bqetl monitoring update <dataset>.<table>
This command:
- Reads monitoring settings from metadata.yaml
- Generates appropriate metric definitions in bigconfig.yml
- Adds freshness/volume checks based on configuration
- Uses saved metrics from sql/bigconfig.yml
What gets generated:
- If freshness.enabled: true → Adds freshness metric
- If volume.enabled: true → Adds volume metric
- If blocking: true → Uses freshness_fail/volume_fail variants
- If collection specified → Groups under that collection
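The mapping above can be modeled as a small Python function. This is a hypothetical sketch, not the code behind ./bqetl monitoring update. It assumes blocking is set per check (nested under freshness and volume) and that the non-blocking saved metrics are named freshness and volume. Confirm the real saved_metric_id values in sql/bigconfig.yml.

```python
from __future__ import annotations

from typing import Any


def saved_metric_ids(monitoring: dict[str, Any]) -> list[str]:
    """List the saved metrics implied by a parsed monitoring section.

    Hypothetical model of the rules above, not bqetl's generator. Assumes
    each check is a nested mapping such as {"enabled": True, "blocking": True}
    and that non-blocking checks use saved metrics named "freshness"/"volume".
    """
    ids: list[str] = []
    for check in ("freshness", "volume"):
        settings = monitoring.get(check) or {}
        if settings.get("enabled", False):
            # Blocking checks switch to the *_fail variant of the saved metric.
            ids.append(f"{check}_fail" if settings.get("blocking", False) else check)
    return ids


print(saved_metric_ids({
    "freshness": {"enabled": True, "blocking": True},
    "volume": {"enabled": True},
}))
# → ['freshness_fail', 'volume']
```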
Step 4: Customize bigconfig.yml (Optional)
Manually edit the generated bigconfig.yml for advanced use cases:
Column-level validation: Add tag_deployments section with column_selectors and metrics (is_not_null, is_unique, is_valid_client_id, etc.). See sql/bigconfig.yml for all available saved metrics.
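As a hypothetical illustration of column-level validation, the Python sketch below drafts a tag_deployments entry from a mapping of columns to saved metric IDs. The key names come from this skill. The selector format <fq_table>.<column>, the nesting, and the table name are assumptions, so compare the output against docs/reference/bigconfig.md and existing bigconfig.yml files before using it.

```python
from __future__ import annotations

import json


def column_deployments(fq_table: str, checks: dict[str, list[str]]) -> dict:
    """Draft a tag_deployments entry from {column: [saved_metric_id, ...]}.

    Hypothetical helper; the selector format "<fq_table>.<column>" and the
    nesting are assumptions to compare against docs/reference/bigconfig.md.
    """
    # Group columns that need the same metrics, so each column appears in
    # exactly one deployment.
    groups: dict[tuple[str, ...], list[str]] = {}
    for column, metric_ids in checks.items():
        groups.setdefault(tuple(sorted(metric_ids)), []).append(column)
    deployments = [
        {
            "column_selectors": [{"name": f"{fq_table}.{col}"} for col in columns],
            "metrics": [{"saved_metric_id": m} for m in metric_ids],
        }
        for metric_ids, columns in groups.items()
    ]
    return {"tag_deployments": [{"deployments": deployments}]}


draft = column_deployments(
    "my-project.my_dataset.my_table_v1",  # hypothetical fully qualified table
    {"client_id": ["is_not_null", "is_valid_client_id"], "document_id": ["is_unique"]},
)
# JSON is valid YAML, so this prints a draft you can adapt for bigconfig.yml.
print(json.dumps(draft, indent=2))
```

Grouping columns by identical metric lists keeps each column selector in a single deployment, which also avoids the "Duplicate deployments" validation error.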
Lookback windows: Adjust how far back Bigeye scans data (0=latest partition, 7=last 7 days, 28=last 28 days). Use longer lookback for tables with sporadic updates.
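To see why sporadically updated tables need a longer lookback, consider this simplified Python model. It assumes 0 means "latest partition only" and N means "partitions dated within the N days ending at the check date". Bigeye's actual window semantics may differ, so treat it as intuition only.

```python
from __future__ import annotations

from datetime import date, timedelta


def partitions_scanned(partitions: list[date], as_of: date, lookback_days: int) -> list[date]:
    """Return the partitions a check would look at under a simplified model.

    Assumptions (not Bigeye's documented semantics): 0 means "latest
    partition only"; N > 0 means partitions dated within the N days ending
    at as_of.
    """
    if not partitions:
        return []
    if lookback_days == 0:
        return [max(partitions)]
    start = as_of - timedelta(days=lookback_days)
    return sorted(p for p in partitions if start < p <= as_of)


# A table that only loads weekly (hypothetical dates).
weekly = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15)]
as_of = date(2024, 1, 20)
print(partitions_scanned(weekly, as_of, 3))   # [] - the window misses the last load
print(partitions_scanned(weekly, as_of, 28))  # all three weekly partitions
```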
When to customize: Column-specific validation, custom thresholds, infrequent updates, different notification channels per metric.
See references/monitoring_patterns.md for examples.
Step 5: Add Custom SQL Rules (Optional)
For complex business logic validation (cross-column checks, format validation, business rules), create bigeye_custom_rules.sql in the table directory.
Use template: assets/custom_rules_template.sql contains structure, JSON configuration block, and examples.
Key points:
- Query returns percentage (0-100) or count
- JSON comment block configures name, range, collections, owner, schedule
- Supports Jinja variables: {{ project_id }}, {{ dataset_id }}, {{ table_name }}
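For illustration, here is a hypothetical custom rule query and a Python stand-in that shows what the Jinja placeholders resolve to. bqetl performs the real rendering. The regex-based render helper, the query, and the table names are assumptions made so the example runs with only the standard library.

```python
from __future__ import annotations

import re

# Hypothetical custom rule: percentage (0-100) of yesterday's rows with a NULL
# client_id. The JSON configuration comment block is omitted here; copy it
# from assets/custom_rules_template.sql.
RULE_SQL = """
SELECT
  ROUND(SAFE_DIVIDE(COUNTIF(client_id IS NULL), COUNT(*)) * 100, 2) AS null_client_id_pct
FROM
  `{{ project_id }}.{{ dataset_id }}.{{ table_name }}`
WHERE
  submission_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
"""


def render(sql: str, **params: str) -> str:
    """Substitute {{ name }} placeholders; raises KeyError for unknown names."""

    def substitute(match: re.Match) -> str:
        return params[match.group(1)]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, sql)


print(render(RULE_SQL, project_id="my-project", dataset_id="my_dataset", table_name="my_table_v1"))
```

Because the query returns a percentage, it fits a "value" alert condition with a min/max range in the JSON block.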
Step 6: Validate Configuration
Validate bigconfig.yml syntax and configuration:
./bqetl monitoring validate <dataset>.<table>
What it checks:
- Valid YAML syntax
- No duplicate metric deployments
- Saved metric IDs exist
- For views: partition_column is explicitly set in metadata.yaml
Common validation errors:
- "Duplicate deployments" → Consolidate metrics under single deployment
- "Invalid metric" → Check saved_metric_id exists in sql/bigconfig.yml
- "Partition column needs to be configured" → Set partition_column and partition_column_set: true for views
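The "Duplicate deployments" fix can be sketched in Python as follows. This is a hypothetical model, not the logic inside ./bqetl monitoring validate. Deployments are plain dicts using the column_selectors and metrics keys, and consolidate assumes each deployment has exactly one column selector.

```python
from __future__ import annotations

from collections import Counter


def duplicate_selectors(deployments: list[dict]) -> list[str]:
    """Return column selector names that appear in more than one place."""
    counts = Counter(
        selector["name"]
        for deployment in deployments
        for selector in deployment.get("column_selectors", [])
    )
    return sorted(name for name, n in counts.items() if n > 1)


def consolidate(deployments: list[dict]) -> list[dict]:
    """Merge deployments that target the same (single) column selector."""
    merged: dict[str, list[dict]] = {}
    for deployment in deployments:
        (selector,) = deployment["column_selectors"]  # ValueError if not exactly one
        metrics = merged.setdefault(selector["name"], [])
        for metric in deployment["metrics"]:
            if metric not in metrics:  # drop repeated metrics
                metrics.append(metric)
    return [
        {"column_selectors": [{"name": name}], "metrics": metrics}
        for name, metrics in merged.items()
    ]


# Hypothetical config with client_id listed twice.
deployments = [
    {"column_selectors": [{"name": "t.client_id"}], "metrics": [{"saved_metric_id": "is_not_null"}]},
    {"column_selectors": [{"name": "t.document_id"}], "metrics": [{"saved_metric_id": "is_unique"}]},
    {"column_selectors": [{"name": "t.client_id"}], "metrics": [{"saved_metric_id": "is_valid_client_id"}]},
]
print(duplicate_selectors(deployments))               # ['t.client_id']
print(duplicate_selectors(consolidate(deployments)))  # []
```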
Step 7: Deploy to Bigeye
Recommended approach: Automatic deployment via Airflow DAG
After validation passes, commit and push your changes to the main branch:
git add sql/<project>/<dataset>/<table>/
git commit -m "Add Bigeye monitoring for <dataset>.<table>"
git push origin main
What happens automatically:
- The bqetl_artifact_deployment DAG detects bigconfig.yml changes
- The publish_bigeye_monitors task deploys all bigConfig files
- Bigeye metrics are created/updated based on your configuration
- Custom SQL rules are deployed (if bigeye_custom_rules.sql exists)
This approach is recommended because:
- Ensures all bigconfig.yml files are deployed together (prevents accidental deletions)
- No need to manage BIGEYE_API_KEY locally
- Consistent with Mozilla's deployment practices
- Deployment history tracked in git
Alternative: Manual deployment (discouraged)
⚠️ CAUTION: Avoid running ./bqetl monitoring deploy locally unless absolutely necessary. Local deployment can accidentally delete metrics if config files are not included. See docs/reference/bigconfig.md for details.
If you must deploy manually (e.g., for testing in non-production):
./bqetl monitoring deploy <dataset>.<table> --dry-run # Review changes first
./bqetl monitoring deploy <dataset>.<table> # Requires BIGEYE_API_KEY
Step 8: Test Monitoring (Optional)
After deployment, you can manually trigger monitoring checks to verify configuration:
./bqetl monitoring run <dataset>.<table> # Requires BIGEYE_API_KEY
What it does:
- Triggers all metric checks for the table
- Runs custom SQL rules
- Returns success/failure status
- Provides links to Bigeye UI for details
When to test:
- After automatic deployment via DAG completes
- After modifying monitoring configuration
- Debugging false positives/negatives
Alternative: Wait for Bigeye's scheduled runs or check results in the Bigeye UI
Common Monitoring Patterns
Standard workflow for all patterns:
- Add/update the monitoring section in metadata.yaml
- Run: ./bqetl monitoring update <dataset>.<table>
- Run: ./bqetl monitoring validate <dataset>.<table>
- Commit and push to main branch (automatic deployment)
Pattern 1: Basic Daily Table
Use assets/metadata_monitoring_basic.yaml template. Enables freshness and volume checks, non-blocking.
Pattern 2: Critical Production Table
Use assets/metadata_monitoring_critical.yaml template. Sets blocking: true and assigns to "Operational Checks" collection.
Pattern 3: View with Monitoring
Use assets/metadata_monitoring_view.yaml template. Must set partition_column and partition_column_set: true.
Pattern 4: Column-Level Validation
After generating basic bigconfig.yml, manually edit to add column-specific metrics. See sql/bigconfig.yml for available saved metrics (is_not_null, is_unique, is_valid_client_id, etc.).
Pattern 5: Custom Business Logic
Create bigeye_custom_rules.sql using assets/custom_rules_template.sql. Query must return percentage (0-100) or count. Configure via JSON comment block.
Integration with Other Skills
Works with metadata-manager
When metadata-manager creates new tables:
- metadata-manager should ask: "Would you like to enable Bigeye monitoring?"
- If yes, metadata-manager invokes bigconfig-generator skill
- bigconfig-generator adds monitoring configuration to metadata.yaml
- Generates bigconfig.yml via bqetl CLI
Workflow:
- metadata-manager creates schema.yaml, metadata.yaml
- metadata-manager asks about monitoring
- If yes → invoke bigconfig-generator
- bigconfig-generator adds monitoring section to metadata.yaml
- bigconfig-generator runs ./bqetl monitoring update
- User validates, commits, and pushes to main (automatic deployment via DAG)
Works with bigquery-etl-core
- Uses project structure conventions
- Follows naming patterns (dataset.table)
- References common partitioning strategies (submission_date)
Troubleshooting
Deployment Errors
Deployment delays:
- Deployment happens automatically after merge to main via the bqetl_artifact_deployment DAG
- Check DAG status in Airflow UI if deployment seems delayed
- Typical deployment time: within 1 hour of merge
"Table does not exist in Bigeye"
- Table not yet ingested by Bigeye
- Wait for next schema sync or manually sync in Bigeye UI
- Check with Data Engineering if table is not appearing
"Partition column does not exist"
- Verify partition_column matches actual column in schema.yaml
- Check for typos in column name
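A quick way to check for typos is to compare partition_column against the field names in schema.yaml. This Python sketch is hypothetical: it takes the already-parsed schema (a dict with a top-level fields list, as in bigquery-etl schema.yaml files), checks top-level fields only, and uses difflib to suggest the closest match.

```python
from __future__ import annotations

import difflib


def check_partition_column(schema: dict, partition_column: str) -> str:
    """Return "ok", or a message with a close-match suggestion if there is one."""
    names = [field["name"] for field in schema.get("fields", [])]  # top-level only
    if partition_column in names:
        return "ok"
    suggestion = difflib.get_close_matches(partition_column, names, n=1)
    hint = f" (did you mean {suggestion[0]!r}?)" if suggestion else ""
    return f"partition_column {partition_column!r} not found in schema.yaml{hint}"


# Hypothetical parsed schema.yaml
schema = {"fields": [{"name": "submission_date", "type": "DATE"}, {"name": "client_id", "type": "STRING"}]}
print(check_partition_column(schema, "submision_date"))
```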
Manual deployment errors (if using ./bqetl monitoring deploy):
"Bigeye API token needs to be set"
- Set the BIGEYE_API_KEY environment variable
- Note: Manual deployment is discouraged; prefer automatic DAG deployment
Validation Errors
"Duplicate deployments"
- Same column selector appears multiple times
- Consolidate metrics under single deployment
"Invalid metric"
- Referencing non-existent saved_metric_id
- Check sql/bigconfig.yml for available metrics
"Partition column needs to be configured"
- For views with monitoring enabled
- Add partition_column and partition_column_set: true to metadata.yaml
False Positives
Freshness checks failing:
- Verify table actually updated (query BigQuery)
- Check partition_column is correct
- Verify Bigeye's schedule aligns with table update schedule
- Consider longer lookback window
Volume checks failing:
- Normal for tables with varying row counts
- Consider disabling volume checks
- Use longer lookback window
- Adjust thresholds in bigconfig.yml
Best Practices
When to Enable Monitoring
Always enable:
- Production tables in dashboards/reports
- Tables with SLAs or freshness requirements
- Critical pipeline outputs
Consider enabling:
- Development/staging tables (for testing configs)
- Tables with known data quality issues
Skip monitoring:
- Temporary/scratch tables
- One-time analysis tables
- Tables with no consumers
Blocking vs Non-Blocking
Use blocking: true when:
- Failures must halt deployments
- Table is production-critical
- False positives are rare and quickly resolved
Use blocking: false when:
- Failures should alert but not block
- Table is still stabilizing
- False positives are expected
Collections
Use consistent naming:
- Group related tables by team/product
- Configure notification channels once per collection
- Makes alert management easier
Common collections:
- Team: "Subscription Platform", "Ads Team", "Growth Team"
- Function: "Operational Checks", "Data Quality"
- Environment: "Test", "Staging"
Custom Rules
Best practices:
- Return percentage (0-100) for "value" alert_conditions
- Return count for "count" alert_conditions
- Use descriptive rule names
- Set appropriate min/max ranges
- Document rule purpose in comments
- Test rules manually before deploying
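To test a rule's output locally before deploying, a hypothetical Python helper like the one below can apply the same conventions: "value" rules must return a 0-100 percentage, "count" rules a non-negative integer, and either is compared with an optional min/max range. The function and parameter names are assumptions; it does not reproduce Bigeye's alerting logic.

```python
from __future__ import annotations


def rule_passes(result: float, alert_condition: str,
                min_value: float | None = None, max_value: float | None = None) -> bool:
    """Check a custom rule's query result against an optional min/max range.

    Hypothetical local helper, not Bigeye's evaluation logic.
    """
    if alert_condition == "value":
        if not 0 <= result <= 100:
            raise ValueError(f"'value' rules should return a 0-100 percentage, got {result}")
    elif alert_condition == "count":
        if result < 0 or result != int(result):
            raise ValueError(f"'count' rules should return a non-negative integer, got {result}")
    else:
        raise ValueError(f"unknown alert condition: {alert_condition!r}")
    if min_value is not None and result < min_value:
        return False
    if max_value is not None and result > max_value:
        return False
    return True


# A null-rate rule that should stay at or below 1%, and a count rule expecting zero bad rows.
print(rule_passes(0.4, "value", max_value=1.0))  # True
print(rule_passes(12, "count", max_value=0))     # False
```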
Reference Documentation
Official Documentation (Always Preferred):
- docs/reference/bigconfig.md - Canonical reference for bigConfig in this repository
- sql/bigconfig.yml - Source of truth for available saved metrics
- https://mozilla.github.io/bigquery-etl/reference/bigconfig/ - Published docs
- https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html - Bigeye intro
- https://docs.bigeye.com/docs/bigconfig - Bigeye official documentation
Quick Reference (This Skill):
- references/monitoring_patterns.md - Workflow guidance and common patterns (may be outdated)
- assets/metadata_monitoring_basic.yaml - Basic monitoring config template
- assets/metadata_monitoring_critical.yaml - Critical table config template
- assets/metadata_monitoring_view.yaml - View monitoring config template
- assets/custom_rules_template.sql - Custom SQL rule template
Priority: When in doubt, read docs/reference/bigconfig.md or use WebFetch on the online docs.
Quick Reference: bqetl Monitoring Commands
# Refresh the collections reference file (run periodically to stay current)
python3 .claude/skills/bigconfig-generator/scripts/extract_collections.py
# Generate/update bigconfig.yml from metadata.yaml
./bqetl monitoring update <dataset>.<table>
# Validate bigconfig.yml syntax and configuration
./bqetl monitoring validate <dataset>.<table>
# ⚠️ DISCOURAGED: Manual deployment (prefer automatic DAG deployment)
./bqetl monitoring deploy <dataset>.<table> --dry-run # Requires BIGEYE_API_KEY
./bqetl monitoring deploy <dataset>.<table> # Requires BIGEYE_API_KEY
# Manually trigger monitoring checks (requires BIGEYE_API_KEY)
./bqetl monitoring run <dataset>.<table>
# Delete deployed monitoring (requires BIGEYE_API_KEY)
./bqetl monitoring delete <dataset>.<table> --metrics --custom-sql
Recommended workflow:
- Check references/existing_collections.md for appropriate collection/channels
- Update/create bigconfig.yml using monitoring update
- Validate using monitoring validate
- Commit and push to main branch
- The bqetl_artifact_deployment DAG automatically deploys changes