Agent skill

spark-basics

PySpark fundamentals for distributed data processing.

Install this agent skill into your project:

npx add-skill https://github.com/timequity/vibe-coder/tree/main/skills/data/spark-basics

SKILL.md

Spark Basics

SparkSession

python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ETL Job") \
    .config("spark.sql.adaptive.enabled", "true") \
    .getOrCreate()

Reading Data

python
# CSV (inferSchema=True costs an extra pass over the data; prefer an explicit schema for large files)
df = spark.read.csv("s3://bucket/data.csv", header=True, inferSchema=True)

# Parquet
df = spark.read.parquet("s3://bucket/data/")

# JSON
df = spark.read.json("s3://bucket/data.json")

# Delta Lake (requires the delta-spark package and a Delta-enabled session)
df = spark.read.format("delta").load("s3://bucket/delta/")

Transformations

python
from pyspark.sql import functions as F

# Select, rename, and cast (keep the columns the later steps use)
df = df.select(
    F.col("id").alias("user_id"),
    F.col("name"),
    F.col("status"),
    F.col("category"),
    F.col("amount"),
    F.col("created_at").cast("timestamp")
)

# Filter
df = df.filter(F.col("status") == "active")

# Aggregate
summary = df.groupBy("category").agg(
    F.count("*").alias("count"),
    F.sum("amount").alias("total"),
    F.avg("amount").alias("average")
)

# Join (orders and customers are DataFrames that share a customer_id column)
result = orders.join(customers, "customer_id", "left")

# Window functions
from pyspark.sql.window import Window

window = Window.partitionBy("user_id").orderBy("created_at")
df = df.withColumn("row_num", F.row_number().over(window))

Writing Data

python
# Parquet with partitions
df.write \
    .partitionBy("year", "month") \
    .mode("overwrite") \
    .parquet("s3://bucket/output/")

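# Delta Lake (requires delta-spark). DataFrameWriter has no "merge" save mode;
# use append/overwrite here and DeltaTable.merge() from delta-spark for upserts.
df.write \
    .format("delta") \
    .mode("append") \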
    .save("s3://bucket/delta/")

Optimization

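  • Use cache() (or persist()) only for DataFrames reused by several actions, and unpersist() them when done
  • Avoid collect() on large data: it pulls every row to the driver; use take(n), limit(), or write the result out instead
  • Broadcast small tables with F.broadcast(); Spark also broadcasts automatically below spark.sql.autoBroadcastJoinThreshold (10 MB by default)
  • Repartition on the join key only when it saves repeated shuffles (e.g. several joins on the same key); with AQE enabled, Spark already coalesces shuffle partitions
  • Filter and select early so predicates and column pruning are pushed down to columnar sources such as Parquet and Delta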
