gsplat-optimizer
Optimize 3D Gaussian Splat scenes for real-time rendering on iOS, macOS, and visionOS. Use when working with .ply or .splat files, targeting mobile/Apple GPU performance, or needing LOD, pruning, or compression strategies for 3DGS scenes.
Install this agent skill to your Project
npx add-skill https://github.com/ckorhonen/claude-skills/tree/main/skills/gsplat-optimizer
Gaussian Splat Optimizer
Optimize 3D Gaussian Splatting scenes for real-time rendering on Apple platforms (iOS, macOS, visionOS) using Metal.
When to Use
- Optimizing .ply or .splat files for mobile/Apple GPU targets
- Reducing gaussian count for performance (pruning strategies)
- Implementing Level-of-Detail (LOD) for large scenes
- Compressing splat data for bandwidth/storage constraints
- Profiling and optimizing Metal rendering performance
- Targeting specific FPS goals on Apple hardware
Quick Start
Input: Provide a .ply/.splat file path, target device class, and FPS target.
# Analyze a splat file
python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --device iphone --fps 60
Output: The skill provides:
- Point/gaussian pruning plan (opacity, size, error thresholds)
- LOD scheme suggestion (distance bins, gaussian subsets)
- Compression recommendation (if bandwidth/storage bound)
- Metal profiling checklist with shader/compute tips
Optimization Workflow
Step 1: Analyze the Scene
First, understand your scene characteristics:
- Gaussian count: Total number of splats
- Opacity distribution: Histogram of opacity values
- Size distribution: Gaussian scale statistics
- Memory footprint: Estimated GPU memory usage
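The four statistics above can be computed in a few lines. This is a minimal sketch, not the skill's `analyze_splat.py`: `summarize_scene` is a hypothetical helper. It assumes opacities are stored as pre-sigmoid logits and scales as log values (a common 3DGS export convention, so check your exporter), and it uses the 56-byte-per-gaussian layout from the memory estimate later in this document.

```python
import math
from statistics import mean

# pos 12 + scale 12 + rotation 16 + opacity 4 + SH DC 12 bytes
BYTES_PER_GAUSSIAN = 56


def sigmoid(x: float) -> float:
    """Map a stored opacity logit to an opacity in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def summarize_scene(opacity_logits, log_scales, bins=5):
    """Return count, opacity histogram, size statistics, and a memory estimate.

    opacity_logits: one float per gaussian (assumed pre-sigmoid).
    log_scales: one (sx, sy, sz) tuple per gaussian (assumed log-space).
    """
    opacities = [sigmoid(o) for o in opacity_logits]
    histogram = [0] * bins
    for a in opacities:
        # Clamp so an opacity that rounds to 1.0 lands in the last bin.
        histogram[min(int(a * bins), bins - 1)] += 1

    # Use the largest axis as a simple per-gaussian size measure.
    sizes = [math.exp(max(s)) for s in log_scales]
    count = len(opacities)
    return {
        "count": count,
        "opacity_histogram": histogram,
        "size_min": min(sizes),
        "size_mean": mean(sizes),
        "size_max": max(sizes),
        "memory_mb": count * BYTES_PER_GAUSSIAN / (1024 ** 2),
    }


# Four made-up gaussians, for illustration only.
stats = summarize_scene(
    opacity_logits=[-5.0, 0.0, 3.0, 6.0],
    log_scales=[(-4.0, -4.0, -5.0), (-2.0, -3.0, -2.5),
                (0.0, -1.0, -1.0), (-1.0, -1.0, -1.0)],
)
print(stats)
```

A histogram with a large first bin suggests that opacity pruning will remove many gaussians at little visual cost.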
Step 2: Determine Target Device
| Device Class | GPU Budget | Max Gaussians (60fps) | Storage Mode |
|---|---|---|---|
| iPhone (A15+) | 4-6GB unified | ~2-4M | Shared |
| iPad Pro (M1+) | 8-16GB unified | ~6-8M | Shared |
| Mac (M1-M3) | 8-24GB unified | ~8-12M | Shared/Managed |
| Vision Pro | 16GB unified | ~4-6M (stereo) | Shared |
| Mac (discrete GPU) | 8-24GB VRAM | ~10-15M | Private |
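To turn the table into a go/no-go check, compare the scene's gaussian count against the device limit. The sketch below is illustrative: the dictionary keys and `budget_report` are hypothetical names, and the limits are simply the upper ends of the "Max Gaussians (60fps)" column.

```python
# Upper ends of the "Max Gaussians (60fps)" column, in gaussians.
DEVICE_MAX_GAUSSIANS = {
    "iphone": 4_000_000,
    "ipad_pro": 8_000_000,
    "mac_apple_silicon": 12_000_000,
    "vision_pro": 6_000_000,
    "mac_discrete": 15_000_000,
}


def budget_report(num_gaussians: int, device: str) -> dict:
    """Report whether a scene fits the device and how much must be pruned."""
    limit = DEVICE_MAX_GAUSSIANS[device]
    overshoot = max(0.0, (num_gaussians - limit) / limit)
    to_prune = max(0.0, 1.0 - limit / num_gaussians)
    return {
        "limit": limit,
        "over_budget": num_gaussians > limit,
        "overshoot_pct": round(overshoot * 100, 1),
        "fraction_to_prune": round(to_prune, 3),
    }


report = budget_report(5_000_000, "iphone")
print(report)
```

For a 5M-gaussian scene on iPhone this reports a 25% overshoot, which means removing 20% of the gaussians to reach the 4M limit.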
Step 3: Apply Pruning
If gaussian count exceeds device budget:
- Opacity threshold: Remove gaussians whose opacity falls below a cutoff in the 0.01-0.05 range
- Size culling: Remove sub-pixel gaussians (< 1px at target resolution)
- Importance pruning: Use LODGE algorithm for error-proxy selection
- Foveated rendering: For Vision Pro, reduce density in peripheral view
See references/pruning-strategies.md for details.
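The first two rules can be sketched as a single filter. This is an illustration under stated assumptions, not the LODGE algorithm or the skill's own pruning code: the dictionary fields and `focal_px` are hypothetical, and projected size uses a simple pinhole approximation at one representative viewing distance (size culling is view-dependent in practice).

```python
def projected_size_px(world_diameter: float, depth: float, focal_px: float) -> float:
    """Approximate on-screen diameter in pixels under a pinhole camera model."""
    return focal_px * world_diameter / depth


def prune(gaussians, focal_px, min_opacity=0.01, min_pixels=1.0):
    """Keep gaussians that are both visible enough and at least one pixel wide."""
    kept = []
    for g in gaussians:
        if g["opacity"] < min_opacity:
            continue  # opacity threshold
        if projected_size_px(g["diameter"], g["depth"], focal_px) < min_pixels:
            continue  # sub-pixel size culling
        kept.append(g)
    return kept


# Made-up gaussians: diameter and depth are in the same world units.
scene = [
    {"id": 1, "opacity": 0.005, "diameter": 0.1, "depth": 2.0},    # too transparent
    {"id": 2, "opacity": 0.8, "diameter": 0.0005, "depth": 5.0},   # about 0.1 px wide
    {"id": 3, "opacity": 0.6, "diameter": 0.05, "depth": 5.0},     # about 10 px wide
]
kept = prune(scene, focal_px=1000.0)
print([g["id"] for g in kept])
```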
Step 4: Implement LOD (Large Scenes)
For scenes exceeding single-frame budget:
- Distance bins: Near (0-10m), Mid (10-50m), Far (50m+)
- Hierarchical structure: Octree or LoD tree for spatial queries
- Chunk streaming: Load/unload based on camera position
- Smooth transitions: Opacity blending at chunk boundaries
See references/lod-schemes.md for details.
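The distance bins above map directly to a lookup. A minimal sketch, assuming chunks are identified by a center point and the camera position is in the same world units (meters); `lod_for_distance` and `assign_chunks` are hypothetical names.

```python
import math

# (name, inclusive lower bound, exclusive upper bound) in meters.
LOD_BINS = [("near", 0.0, 10.0), ("mid", 10.0, 50.0), ("far", 50.0, math.inf)]


def lod_for_distance(distance: float) -> str:
    """Return the LOD bin name for a camera-to-chunk distance."""
    for name, lower, upper in LOD_BINS:
        if lower <= distance < upper:
            return name
    raise ValueError(f"distance must be non-negative, got {distance}")


def assign_chunks(chunk_centers: dict, camera: tuple) -> dict:
    """Map each chunk id to the LOD bin implied by its distance to the camera."""
    return {
        chunk_id: lod_for_distance(math.dist(center, camera))
        for chunk_id, center in chunk_centers.items()
    }


levels = assign_chunks(
    {"a": (3.0, 4.0, 0.0), "b": (0.0, 30.0, 0.0), "c": (60.0, 0.0, 80.0)},
    camera=(0.0, 0.0, 0.0),
)
print(levels)
```

Re-running the assignment only when the camera moves by more than some threshold is one way to avoid flicker at bin boundaries.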
Step 5: Apply Compression (If Needed)
For bandwidth/storage constraints:
| Method | Compression | Use Case |
|---|---|---|
| SOGS | 20x | Web delivery, moderate quality |
| SOG | 24x | Web delivery, better quality |
| CodecGS | 30x+ | Maximum compression |
| C3DGS | 31x | Fast rendering priority |
See references/compression.md for details.
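As a rough sizing aid, the ratios in the table can be applied to an uncompressed size estimate. This sketch assumes the 56-byte-per-gaussian layout used in the memory estimate later in this document and treats "30x+" as exactly 30x; real ratios depend on the scene, so treat the output as a ballpark only.

```python
BYTES_PER_GAUSSIAN = 56  # pos + scale + rotation + opacity + SH DC

# Ratios from the table above; "30x+" is treated as 30 (a lower bound).
COMPRESSION_RATIOS = {"SOGS": 20, "SOG": 24, "CodecGS": 30, "C3DGS": 31}


def compressed_sizes_mb(num_gaussians: int) -> dict:
    """Estimate the compressed size in decimal megabytes for each method."""
    raw_bytes = num_gaussians * BYTES_PER_GAUSSIAN
    return {
        method: round(raw_bytes / ratio / 1e6, 1)
        for method, ratio in COMPRESSION_RATIOS.items()
    }


sizes = compressed_sizes_mb(5_000_000)  # 280 MB uncompressed
print(sizes)
```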
Step 6: Profile and Optimize Metal
- Choose storage mode: Private for static data, Shared for dynamic
- Optimize shaders: Function constants, thread occupancy
- Profile with Xcode: GPU Frame Capture, Metal System Trace
- Iterate: Measure, optimize, repeat
See references/metal-profiling.md for details.
Common Pitfalls
1. Point Cloud Density Mismatch
Problem: Gaussian count doesn't match your scene complexity, causing either visual artifacts or wasted GPU resources.
- Too sparse (undersampling): Visible gaps, blockiness, loss of fine details
- Too dense (oversampling): Exceeds device budget, causes frame drops, GPU thrashing
Debugging:
# Analyze gaussian distribution
python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --histogram
# Check against device budget
# Compare total_gaussians vs. device_max in the output table
Strategy:
- Start with device budget from Step 2 (e.g., 4M for iPhone)
- If scene exceeds budget by >20%, prune before any further fine-tuning or deployment
- If visual quality drops too much after pruning, consider LOD or chunking
- Use importance-weighted sampling (LODGE) to remove low-contribution gaussians, not just low-opacity ones
2. Training Instability (Gradient Explosions, Divergence)
Problem: During optimization (if fine-tuning on device), gaussian parameters diverge, causing:
- Loss suddenly jumps to NaN
- Gaussians disappear or explode in scale
- Model becomes unrecoverable mid-session
Debugging:
# Monitor loss during training
tail -f training.log | grep -E "loss|nan|inf"
# Inspect stored scale values for NaN/Inf or extreme ranges
python -c "
import numpy as np
from plyfile import PlyData
ply_data = PlyData.read('scene.ply')
# Many 3DGS exporters store log-scales in scale_0..scale_2; check yours
scales = np.asarray(ply_data['vertex']['scale_0'])
print(f'Scale range: {scales.min():.6f} to {scales.max():.6f}')
print(f'Any NaN/Inf: {not np.isfinite(scales).all()}')
"
Strategy:
- Gradient clipping: Clamp the per-step change to each gaussian's scale to ±0.1
- Learning rate decay: Start at 1e-4, decay by 0.95 every epoch
- Loss regularization: Add L2 penalty on scale magnitudes to prevent explosions
- Checkpoint early: Save state every 10 iterations; rollback if loss spikes
- Freeze covariance: If converged, stop updating scale/rotation after 80% of training
- For device training: Reduce batch size or resolution if instability persists
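The first three strategies combine into one update rule. The sketch below is framework-free and only illustrative: `clipped_step`, `l2_weight`, and the single-scalar scale are simplifications, not the API of any particular training library.

```python
import math


def learning_rate(epoch: int, base_lr: float = 1e-4, decay: float = 0.95) -> float:
    """Exponential decay: start at base_lr and multiply by decay each epoch."""
    return base_lr * decay ** epoch


def clipped_step(scale: float, grad: float, epoch: int,
                 max_step: float = 0.1, l2_weight: float = 1e-3) -> float:
    """Apply one gradient step to a scale value with L2 penalty and clipping."""
    # Gradient of the penalty l2_weight * scale**2 is 2 * l2_weight * scale.
    grad = grad + 2.0 * l2_weight * scale
    step = learning_rate(epoch) * grad
    # Clamp the update so one bad gradient cannot blow up the scale.
    step = max(-max_step, min(max_step, step))
    return scale - step


def should_roll_back(loss: float) -> bool:
    """Signal a rollback to the last checkpoint when the loss is NaN or Inf."""
    return not math.isfinite(loss)


print(learning_rate(2))              # two epochs of decay
print(clipped_step(1.0, 1e6, 0))     # huge gradient, update limited to 0.1
print(should_roll_back(float("nan")))
```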
3. Memory Limitations (OOM Errors on Large Scenes)
Problem: Scene exceeds available unified memory, causing allocation failures or GPU stalls.
- iPhone: 4–6GB shared between app + GPU
- iPad Pro: 8–16GB shared
- Vision Pro: 16GB (stereo rendering draws every gaussian twice per frame, once per eye)
Debugging:
# Estimate memory footprint
python << 'EOF'
num_gaussians = 5_000_000 # Your count
bytes_per_gaussian = 56 # pos (12) + scale (12) + rot quaternion (16) + opacity (4) + SH DC (12)
total_mb = (num_gaussians * bytes_per_gaussian) / (1024 ** 2)
print(f"Est. memory: {total_mb:.1f} MB")
print(f"Safe for iPhone A15: {total_mb < 2000}") # Leave headroom for app
EOF
# Monitor live memory in Xcode
# Memory graph + Allocations instrument during scene load
Strategy:
- Chunking for large scenes: Break into 1–4M gaussian chunks, stream based on camera distance
- Quantization: Store gaussians in FP16 instead of FP32 (2x memory reduction)
- Pruning first: Remove <0.01 opacity or sub-pixel gaussians before transfer to device
- Lazy loading: Keep only active LOD level in memory; unload far chunks
- Vision Pro consideration: Dual-eye rendering draws each gaussian twice per frame, so per-frame work is roughly 2x the scene's gaussian count; cap at 4M gaussians rendered per eye
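The 2x figure for FP16 can be checked with Python's `struct` module, which supports half-precision floats through the `e` format code. This is a sketch of the storage arithmetic only, using a made-up gaussian in the 14-float layout from the estimate above. It is not a recommendation to quantize every attribute: positions far from the origin lose precision in FP16, so a real pipeline might keep them in FP32 or store them relative to a chunk center.

```python
import struct

# One made-up gaussian in the 14-float layout used above.
gaussian = [
    0.25, -1.5, 3.0,        # position
    0.01, 0.02, 0.015,      # scale
    1.0, 0.0, 0.0, 0.0,     # rotation quaternion
    0.9,                    # opacity
    0.5, 0.4, 0.3,          # SH DC color
]


def pack_fp32(values):
    """Pack floats as little-endian 32-bit values."""
    return struct.pack(f"<{len(values)}f", *values)


def pack_fp16(values):
    """Pack floats as little-endian 16-bit (half precision) values."""
    return struct.pack(f"<{len(values)}e", *values)


def unpack_fp16(blob):
    """Decode a buffer of little-endian half-precision values."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


fp32_blob = pack_fp32(gaussian)
fp16_blob = pack_fp16(gaussian)
restored = unpack_fp16(fp16_blob)
max_error = max(abs(a - b) for a, b in zip(gaussian, restored))

print(len(fp32_blob), len(fp16_blob))  # prints: 56 28
print(f"max round-trip error: {max_error:.6f}")
```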
4. Quality/Speed Trade-Offs (Over-Optimization for One Metric)
Problem: Optimizing heavily for one metric breaks another:
- Maximize FPS → visual artifacts: Over-pruning removes important geometry
- Maximize quality → frame drops: Too many gaussians for target device
- Minimize memory → banding/posterization: Excessive quantization or LOD culling
Debugging:
# Profile before/after each change
python << 'EOF'
metrics = {
    "original": {"fps": 60, "gaussians": 5_000_000, "artifacts": "none"},
    "after_pruning": {"fps": 58, "gaussians": 3_500_000, "artifacts": "block edges visible"},
}
for label, m in metrics.items():
    print(f"{label}: {m['fps']}fps, {m['gaussians']/1e6:.1f}M, {m['artifacts']}")
EOF
Strategy:
- Define priority: Is this device speed-critical (AR, real-time) or quality-focused (preview)?
- Measure baseline: Profile original unoptimized scene first
- Iterate incrementally: Apply one optimization (pruning OR compression OR LOD), measure, decide
- Preserve quality metrics: Keep PSNR/SSIM scores; stop pruning if quality drops >1dB
- Target range: Aim for a sustained 50–60fps and leave GPU headroom; a scene that only just reaches 60fps will drop frames once the device throttles
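The 1 dB stopping rule can be made concrete with a small PSNR helper. A minimal sketch, assuming images are flattened into lists of pixel values in [0, 1]; a real pipeline would compare full rendered frames against a reference render, and `should_stop_pruning` is a hypothetical name.

```python
import math


def psnr(reference, test, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def should_stop_pruning(baseline_db: float, current_db: float,
                        max_drop_db: float = 1.0) -> bool:
    """Stop once quality has fallen more than max_drop_db below the baseline."""
    return baseline_db - current_db > max_drop_db


reference = [0.0, 0.0, 0.0, 0.0]
degraded = [0.1, 0.1, 0.1, 0.1]
print(f"{psnr(reference, degraded):.1f} dB")
print(should_stop_pruning(32.0, 30.5))  # drop of 1.5 dB
print(should_stop_pruning(32.0, 31.5))  # drop of 0.5 dB
```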
5. Real-Time Rendering Failures (Frame Drops, Shader Compilation)
Problem: Rendering pipeline stalls despite low gaussian count:
- First frame (cold start): 2–5s delay while shaders compile
- Mid-scene: Frame drops spike when new LOD levels load
- Smooth playback → stuttering after 30–60s
Debugging:
# Capture Metal frame statistics
# In Xcode: Product > Scheme > Edit Scheme... > Run
# Diagnostics tab: enable Metal API Validation
# Options tab: set GPU Frame Capture to Metal (tab names may differ by Xcode version)
# Check shader compilation time
python ~/.claude/skills/gsplat-optimizer/scripts/metal_profile.py \
--capture-shader-compile \
--target iphone14
# Monitor frame time distribution
tail -f xcode.log | grep -E "frame_time|stutter"
Strategy:
- Pre-warm shader cache: Compile all function variants on first load (avoid runtime jank)
- Limit LOD transitions: If using multiple LOD levels, cap transitions to 2 per frame
- Asynchronous streaming: Load new geometry chunks on background thread, upload in-between frames
- Device-specific tuning:
  - iPhone: Keep draw calls < 50, geometry per call < 500K gaussians
  - Mac: More generous; aim for < 2M gaussians per draw call
  - Vision Pro: Account for stereo; effective capacity is half the budget
- Profile regimen: Run Metal System Trace before and after each optimization; track:
  - GPU utilization (target 70–85%)
  - Shader time (target <10ms)
  - Memory bandwidth (target <50GB/s)
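The cap of two LOD transitions per frame can be sketched as a small queue that the render loop drains once per frame. This is an illustration in Python rather than the Swift/Metal code a renderer would use, and `LodTransitionQueue` is a hypothetical name.

```python
from collections import deque


class LodTransitionQueue:
    """Defer LOD switches so that at most max_per_frame happen in one frame."""

    def __init__(self, max_per_frame: int = 2):
        self.pending = deque()
        self.max_per_frame = max_per_frame

    def request(self, chunk_id: str, target_lod: str) -> None:
        """Queue a transition; it is applied in a later call to next_frame."""
        self.pending.append((chunk_id, target_lod))

    def next_frame(self) -> list:
        """Return the transitions to apply this frame, oldest first."""
        batch = []
        while self.pending and len(batch) < self.max_per_frame:
            batch.append(self.pending.popleft())
        return batch


queue = LodTransitionQueue(max_per_frame=2)
for i in range(5):
    queue.request(f"chunk_{i}", "mid")

# Five requests drain over three frames; the fourth frame has nothing to do.
frames = [queue.next_frame() for _ in range(4)]
print([len(batch) for batch in frames])
```

First-in, first-out ordering keeps the sketch simple; prioritizing chunks nearest the camera would be a natural refinement.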
Key Metrics
| Metric | Target | How to Measure |
|---|---|---|
| Frame time | ≤ 16.7ms (60fps) | Metal System Trace |
| GPU memory | < device budget | Xcode Memory Graph |
| Bandwidth | < 50GB/s | GPU Counters |
| Shader time | < 10ms | GPU Frame Capture |
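The frame-time target follows from the frame rate: the budget is 1000 ms divided by the target fps. The sketch below checks a list of measured frame times against that budget; the sample values are made up, and exporting frame times from Metal System Trace is left to your own tooling.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / target_fps


def summarize_frame_times(frame_times_ms, target_fps: float = 60) -> dict:
    """Compare measured frame times with the per-frame budget."""
    budget = frame_budget_ms(target_fps)
    missed = [t for t in frame_times_ms if t > budget]
    return {
        "budget_ms": round(budget, 2),
        "mean_ms": round(sum(frame_times_ms) / len(frame_times_ms), 2),
        "worst_ms": max(frame_times_ms),
        "missed_frames": len(missed),
    }


# Made-up measurements, in milliseconds.
summary = summarize_frame_times([12.0, 15.5, 16.0, 22.4, 14.1])
print(summary)
```

Looking at the worst frame and the missed-frame count, not just the mean, is what exposes stutter: here the mean is within budget even though one frame misses it.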
Reference Implementation
MetalSplatter is the primary reference for Swift/Metal gaussian splatting:
- Repository: https://github.com/scier/MetalSplatter
- Supports iOS, macOS, visionOS
- ~8M splat capacity with v1.1 optimizations
- Stereo rendering for Vision Pro
Getting Started with MetalSplatter
git clone https://github.com/scier/MetalSplatter.git
cd MetalSplatter
open SampleApp/MetalSplatter_SampleApp.xcodeproj
# Set to Release scheme for best performance
Resources
Reference Documentation
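- Pruning Strategies (references/pruning-strategies.md) - Gaussian reduction techniques
- LOD Schemes (references/lod-schemes.md) - Level-of-detail approaches
- Compression (references/compression.md) - Bandwidth/storage optimization
- Metal Profiling (references/metal-profiling.md) - Apple GPU optimization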
Research Papers
- LODGE - LOD for large-scale scenes
- FLoD - Flexible LOD for variable hardware
- Voyager - City-scale mobile rendering
- 3DGS Compression Survey