Agent skill

cudf-analytics

Use for GPU-accelerated data analysis on datasets, CSVs, or tabular data using NVIDIA cuDF. Triggers when tasks involve groupby aggregations, statistical summaries, anomaly detection, or large-scale data profiling.


Install this agent skill in your project

npx add-skill https://github.com/langchain-ai/deepagents/tree/main/examples/nvidia_deep_agent/skills/cudf-analytics

SKILL.md

cuDF Analytics Skill

GPU-accelerated data analysis using NVIDIA RAPIDS cuDF. cuDF provides a pandas-like API that runs on NVIDIA GPUs, which can substantially speed up operations on large datasets.

When to Use This Skill

Use this skill when:

  • Analyzing CSV files, datasets, or tabular data
  • Computing statistical summaries (mean, median, std, quartiles)
  • Performing groupby aggregations
  • Detecting anomalies or outliers in data
  • Profiling datasets with millions of rows
  • Computing correlation matrices

Initialization (REQUIRED)

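Start every script with this boilerplate. It verifies that GPU compute and device-to-host transfer actually work, rather than only checking that cudf imports, and falls back to pandas if either check fails.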

python
import pandas as pd

try:
    import cudf
    # Smoke-test: verify GPU compute AND host transfer both work
    _test = cudf.Series([1, 2, 3])
    assert _test.sum() == 6
    assert _test.to_pandas().tolist() == [1, 2, 3]
    GPU = True
except Exception as e:
    print(f"[GPU] cudf unavailable, falling back to pandas: {e}")
    GPU = False

def read_csv(path):
    return cudf.read_csv(path) if GPU else pd.read_csv(path)

def to_pd(df):
    """Convert cuDF DataFrame/Series to pandas. Use this instead of .to_pandas() directly."""
    if not GPU:
        return df
    try:
        return df.to_pandas()
    except Exception as e:
        print(f"[GPU] .to_pandas() failed, using Arrow fallback: {e}")
        return df.to_arrow().to_pandas()

Quick Reference

cuDF mirrors the pandas API. Common operations:

Read Data

python
df = read_csv("data.csv")

Statistical Summary

python
# Use to_pd() when you need pandas output
summary = to_pd(df[["value", "score"]].describe())

# Scalar values work directly with float()
mean_val = float(df["value"].mean())
q1 = float(df["value"].quantile(0.25))

# Correlation
corr = float(df["value"].corr(df["score"]))
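
The use cases above mention correlation matrices, while the snippet only shows a single pairwise correlation. A minimal sketch of the matrix form follows, assuming `DataFrame.corr()` (Pearson) on a selection of numeric columns. The sample data, the column names, and the simplified `backend` fallback are assumptions made for illustration; in a real script the initialization boilerplate would select the backend.

```python
import pandas as pd

try:
    import cudf
    backend = cudf
except Exception:
    backend = pd  # simplified CPU fallback for this sketch

# Hypothetical sample data; the column names are assumptions.
df = backend.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    "score": [2.0, 4.0, 6.0, 8.0, 10.0],
    "noise": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# DataFrame.corr() computes the pairwise Pearson correlation matrix.
corr_matrix = df[["value", "score", "noise"]].corr()

# The result is small, so move it to the host for printing.
corr_pd = corr_matrix.to_pandas() if hasattr(corr_matrix, "to_pandas") else corr_matrix
print(corr_pd.round(3))
```

In this sample, `score` is exactly twice `value`, so their correlation is 1.0, and `value` versus `noise` comes out at -0.8.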

Groupby Aggregation

python
result = df.groupby("category").agg({
    "revenue": ["sum", "mean", "count"],
    "quantity": ["sum", "mean"],
})
result_pd = to_pd(result)
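
An aggregation with a list of functions per column produces two-level column headers, which can be awkward in a formatted report. The sketch below shows one way to flatten them after converting to pandas. The sample data and the simplified `backend` fallback are assumptions for illustration.

```python
import pandas as pd

try:
    import cudf
    backend = cudf
except Exception:
    backend = pd  # simplified CPU fallback for this sketch

# Hypothetical sample data.
df = backend.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "revenue": [10.0, 20.0, 5.0, 15.0, 10.0],
    "quantity": [1, 2, 3, 4, 5],
})

result = df.groupby("category").agg({
    "revenue": ["sum", "mean", "count"],
    "quantity": ["sum", "mean"],
})
result_pd = result.to_pandas() if hasattr(result, "to_pandas") else result

# Join the two header levels: ("revenue", "sum") -> "revenue_sum".
result_pd.columns = ["_".join(levels) for levels in result_pd.columns]

# Sort explicitly so the group order does not depend on the backend,
# then turn the group key back into a regular column.
result_pd = result_pd.sort_index().reset_index()
print(result_pd)
```

Sorting on the host keeps the report order stable regardless of how the backend orders group keys.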

Anomaly Detection (IQR Method)

python
col = "value"
Q1 = float(df[col].quantile(0.25))
Q3 = float(df[col].quantile(0.75))
IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
outliers = to_pd(df[(df[col] < lower) | (df[col] > upper)])
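
To make the IQR arithmetic concrete, here is a small worked example on seven hypothetical values, assuming the default linear interpolation for `quantile`. The data and the simplified `backend` fallback are illustrative assumptions.

```python
import pandas as pd

try:
    import cudf
    backend = cudf
except Exception:
    backend = pd  # simplified CPU fallback for this sketch

# Hypothetical readings: six values near 11 and one extreme value.
df = backend.DataFrame({"value": [10, 12, 11, 13, 12, 11, 100]})

q1 = float(df["value"].quantile(0.25))  # 11.0
q3 = float(df["value"].quantile(0.75))  # 12.5
iqr = q3 - q1                           # 1.5
lower = q1 - 1.5 * iqr                  # 8.75
upper = q3 + 1.5 * iqr                  # 14.75

# Keep rows outside the [lower, upper] fence.
outliers = df[(df["value"] < lower) | (df["value"] > upper)]
outliers_pd = outliers.to_pandas() if hasattr(outliers, "to_pandas") else outliers
print(outliers_pd["value"].tolist())    # [100]
```

Because the fences are built from quartiles, the single extreme value barely shifts them, which is why this method stays usable on small or skewed samples.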

Anomaly Detection (Z-Score Method)

python
col = "value"  # column to test; defined here so the block runs on its own
mean = float(df[col].mean())
std = float(df[col].std())
df["z_score"] = (df[col] - mean) / std
anomalies = to_pd(df[df["z_score"].abs() > 3])
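
The z-score snippet divides by the standard deviation and adds a column to `df`. The sketch below is one possible variant that guards against a zero or undefined standard deviation and leaves the input frame unchanged. The helper name `zscore_outliers`, the sample data, and the simplified `backend` fallback are hypothetical.

```python
import math

import pandas as pd

try:
    import cudf
    backend = cudf
except Exception:
    backend = pd  # simplified CPU fallback for this sketch


def zscore_outliers(df, col, threshold=3.0):
    """Return rows whose |z-score| in `col` exceeds `threshold` (hypothetical helper)."""
    mean = float(df[col].mean())
    std = float(df[col].std())
    # A constant column has std == 0; std can also be NaN
    # (for example with a single row in pandas).
    if std == 0 or math.isnan(std):
        return df.iloc[0:0]  # empty frame with the same columns
    z = (df[col] - mean) / std  # temporary Series; df is not modified
    return df[z.abs() > threshold]


# Hypothetical data: 20 values near 11 plus one extreme reading.
df = backend.DataFrame({"value": [10, 11, 12, 11] * 5 + [100]})
anomalies = zscore_outliers(df, "value")
anomalies_pd = anomalies.to_pandas() if hasattr(anomalies, "to_pandas") else anomalies
print(anomalies_pd["value"].tolist())  # [100]

# A constant column yields no anomalies instead of a division by zero.
flat = backend.DataFrame({"value": [5, 5, 5, 5]})
print(len(zscore_outliers(flat, "value")))  # 0
```

One caveat on sample size: with the sample standard deviation, the largest possible |z| among n rows is (n - 1) / sqrt(n), so a threshold of 3 cannot flag anything when there are 10 or fewer rows. The IQR method is the safer choice for very small samples.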

Filtering and Selection

python
# Filter rows
filtered = df[df["status"] == "active"]

# Select columns
subset = df[["name", "revenue", "date"]]

# Sort
sorted_df = df.sort_values("revenue", ascending=False)

# Convert to pandas for final output / iteration
result_pd = to_pd(sorted_df)

Data Type Requirements

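cuDF infers column types when reading a CSV, so explicit type specification is optional, but fixing dtypes up front keeps memory use and performance predictable:

  • Use float32 or float64 for floating-point data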
  • Use int32 or int64 for integer data
  • String columns use cuDF's string dtype automatically
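
The dtype guidance above can be applied at read time or afterwards. The following sketch assumes the `dtype` mapping accepted by `read_csv` and the `astype` method, both of which follow the pandas API. The file contents, the column names, and the simplified `backend` fallback are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

try:
    import cudf
    backend = cudf
except Exception:
    backend = pd  # simplified CPU fallback for this sketch

# Write a tiny hypothetical CSV so the example is self-contained.
csv_text = "id,value,label\n1,0.5,a\n2,1.5,b\n3,2.5,c\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(csv_text)
    path = f.name

try:
    # Fix the numeric dtypes at read time; "label" is left to inference.
    df = backend.read_csv(path, dtype={"id": "int32", "value": "float32"})
finally:
    os.remove(path)

# Widen a column later if more precision is needed.
df["value64"] = df["value"].astype("float64")

dtypes = {name: str(dtype) for name, dtype in df.dtypes.items()}
print(dtypes)
```

Choosing the 32-bit types halves the memory used by those columns compared with the 64-bit defaults, at the cost of precision and range.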

Output Guidelines

When reporting analysis results:

  • Include dataset dimensions (rows x columns)
  • Show key statistics in formatted tables
  • Highlight notable patterns, trends, or anomalies
  • Provide both summary statistics and specific examples
  • Note any data quality issues (missing values, outliers)
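
As a rough illustration of these guidelines, the sketch below gathers dimensions, missing-value counts, and summary statistics for a small frame. The sample data, the column names, and the simplified `backend` fallback are hypothetical, and the plain-text layout is only one possible way to present the results.

```python
import pandas as pd

try:
    import cudf
    backend = cudf
except Exception:
    backend = pd  # simplified CPU fallback for this sketch


def to_host(obj):
    """Move a cuDF object to pandas; pass pandas objects through (hypothetical helper)."""
    return obj.to_pandas() if hasattr(obj, "to_pandas") else obj


# Hypothetical data with one missing value per column.
df = backend.DataFrame({
    "category": ["a", "b", None, "a"],
    "revenue": [10.0, None, 5.0, 20.0],
})

n_rows, n_cols = df.shape
missing_pd = to_host(df.isna().sum())          # missing values per column
stats_pd = to_host(df[["revenue"]].describe())  # summary statistics

print(f"Dimensions: {n_rows} rows x {n_cols} columns")
print("Missing values per column:")
print(missing_pd.to_string())
print("Summary statistics:")
print(stats_pd.round(2).to_string())
```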
