Agent skill
spark-basics
PySpark fundamentals for distributed data processing.
Install this agent skill to your Project
npx add-skill https://github.com/timequity/vibe-coder/tree/main/skills/data/spark-basics
Spark Basics
SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("ETL Job") \
    .config("spark.sql.adaptive.enabled", "true") \
    .getOrCreate()
Reading Data
# CSV
df = spark.read.csv("s3://bucket/data.csv", header=True, inferSchema=True)
# Parquet
df = spark.read.parquet("s3://bucket/data/")
# JSON
df = spark.read.json("s3://bucket/data.json")
# Delta Lake
df = spark.read.format("delta").load("s3://bucket/delta/")
Transformations
from pyspark.sql import functions as F
# Select and rename
df = df.select(
    F.col("id").alias("user_id"),
    F.col("name"),
    F.col("created_at").cast("timestamp")
)
# Filter
df = df.filter(F.col("status") == "active")
# Aggregate
summary = df.groupBy("category").agg(
    F.count("*").alias("count"),
    F.sum("amount").alias("total"),
    F.avg("amount").alias("average")
)
# Join
result = orders.join(customers, "customer_id", "left")
# Window functions
from pyspark.sql.window import Window
window = Window.partitionBy("user_id").orderBy("created_at")
df = df.withColumn("row_num", F.row_number().over(window))
Writing Data
# Parquet with partitions
df.write \
    .partitionBy("year", "month") \
    .mode("overwrite") \
    .parquet("s3://bucket/output/")
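# Delta Lake
# Valid save modes are "append", "overwrite", "error"/"errorifexists", and "ignore".
# "merge" is not a save mode; upserts go through the DeltaTable merge API instead.
df.write \
    .format("delta") \
    .mode("append") \
    .save("s3://bucket/delta/")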
Optimization
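- Use cache() for DataFrames that are reused across multiple actions
- Avoid collect() on large data; it pulls every row to the driver
- Broadcast small tables in joins
- Repartition on the join key before large joins
- Filter early so predicates can be pushed down to the data source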