Agent skill

senior-computer-vision

Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.

Install this agent skill to your Project

npx add-skill https://github.com/alirezarezvani/claude-skills/tree/main/engineering-team/senior-computer-vision

SKILL.md

Senior Computer Vision Engineer

Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.

Quick Start
Core Expertise
Tech Stack
Workflow 1: Object Detection Pipeline
Workflow 2: Model Optimization and Deployment
Workflow 3: Custom Dataset Preparation
Architecture Selection Guide
Reference Documentation
Common Commands

Quick Start

bash

# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment

Core Expertise

This skill provides guidance on:

Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
Semantic and Promptable Segmentation: DeepLabV3+, SegFormer, SAM (Segment Anything)
Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
Video Analysis: Object tracking (ByteTrack, SORT), action recognition
3D Vision: Depth estimation, point cloud processing, NeRF
Production Deployment: ONNX, TensorRT, OpenVINO, CoreML

Tech Stack

Category	Technologies
Frameworks	PyTorch, torchvision, timm
Detection	Ultralytics (YOLO), Detectron2, MMDetection
Segmentation	segment-anything, mmsegmentation
Optimization	ONNX, TensorRT, OpenVINO, torch.compile
Image Processing	OpenCV, Pillow, albumentations
Annotation	CVAT, Label Studio, Roboflow
Experiment Tracking	MLflow, Weights & Biases
Serving	Triton Inference Server, TorchServe

Workflow 1: Object Detection Pipeline

Use this workflow when building an object detection system from scratch.

Step 1: Define Detection Requirements

Analyze the detection task requirements:

Detection Requirements Analysis:
- Target objects: [list specific classes to detect]
- Real-time requirement: [yes/no, target FPS]
- Accuracy priority: [speed vs accuracy trade-off]
- Deployment target: [cloud GPU, edge device, mobile]
- Dataset size: [number of images, annotations per class]

Step 2: Select Detection Architecture

Choose architecture based on requirements:

Requirement	Recommended Architecture	Why
Real-time (>30 FPS)	YOLOv8/v11, RT-DETR	Single-stage, optimized for speed
High accuracy	Faster R-CNN, DINO	Two-stage (Faster R-CNN) or DETR-style transformer (DINO), better localization
Small objects	YOLO + SAHI, Faster R-CNN + FPN	Multi-scale detection
Edge deployment	YOLOv8n, MobileNetV3-SSD	Lightweight architectures
Transformer-based	DETR, DINO, RT-DETR	End-to-end, no NMS required
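
The table above can be read as a small decision procedure. The sketch below is one hypothetical encoding of it: the `DetectionRequirements` fields, the order in which rows are checked, and the default branch are assumptions made for illustration, not part of the skill's scripts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DetectionRequirements:
    """Hypothetical container mirroring the Step 1 checklist."""
    target_fps: float = 0.0        # 0 means no real-time requirement
    edge_deployment: bool = False
    small_objects: bool = False
    prefer_accuracy: bool = False


def recommend_architectures(req: DetectionRequirements) -> list[str]:
    """Return the candidate architectures from the matching table row.

    The row priority (edge, small objects, real-time, accuracy) is an assumption.
    """
    if req.edge_deployment:
        return ["YOLOv8n", "MobileNetV3-SSD"]
    if req.small_objects:
        return ["YOLO + SAHI", "Faster R-CNN + FPN"]
    if req.target_fps > 30:
        return ["YOLOv8/v11", "RT-DETR"]
    if req.prefer_accuracy:
        return ["Faster R-CNN", "DINO"]
    # Assumed default when nothing else is specified.
    return ["YOLOv8/v11", "RT-DETR"]


print(recommend_architectures(DetectionRequirements(target_fps=60)))
```

Treat the result as a shortlist to benchmark on the actual deployment target, not a final choice.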

Step 3: Prepare Dataset

Convert annotations to required format:

bash

# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
    --annotations data/labels/ \
    --format coco \
    --split 0.8 0.1 0.1 \
    --output data/coco/

# Verify dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"
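
Counting images and categories confirms that the file loads, but not that the annotations are consistent. The following is a minimal sketch of a deeper check using only the standard COCO keys (`images`, `categories`, `annotations`, `bbox` as `[x, y, width, height]`); the function name `check_coco` and the inline sample are illustrative assumptions.

```python
from collections import Counter


def check_coco(coco: dict) -> dict:
    """Report dangling references and degenerate boxes in a COCO-style dict."""
    image_ids = {img["id"] for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    per_category = Counter()
    problems = []
    for ann in coco.get("annotations", []):
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        _, _, width, height = ann["bbox"]
        if width <= 0 or height <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size")
        per_category[ann["category_id"]] += 1
    return {
        "images": len(image_ids),
        "categories": len(category_ids),
        "per_category": dict(per_category),
        "problems": problems,
    }


# Tiny inline sample; the second annotation points at a missing image.
sample = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "car"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 200]},
        {"id": 11, "image_id": 2, "category_id": 1, "bbox": [0, 0, 10, 10]},
    ],
}
report = check_coco(sample)
print(report["problems"])
```

For a real dataset, load `data/coco/train.json` with `json.load` and pass the resulting dict to the same function.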

Step 4: Configure Training

Generate training configuration:

bash

# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch yolov8m \
    --epochs 100 \
    --batch 16 \
    --imgsz 640 \
    --output configs/

# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch faster_rcnn_R_50_FPN \
    --framework detectron2 \
    --output configs/

Step 5: Train and Validate

bash

# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640

# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1

# Validate on test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml

Step 6: Evaluate Results

Key metrics to analyze:

Metric	Target	Description
mAP@50	>0.7	Mean Average Precision at IoU 0.5
mAP@50:95	>0.5	COCO primary metric: AP averaged over IoU thresholds 0.50 to 0.95 in steps of 0.05
Precision	>0.8	Low false positives
Recall	>0.8	Low missed detections
Inference time	<33ms	For 30 FPS real-time
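
The metrics above all build on box overlap and on counts of true and false detections. As a minimal sketch of those building blocks (helper names are illustrative, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples):

```python
def iou(box_a, box_b):
    """Intersection over Union for two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


def frame_budget_ms(target_fps):
    """Per-frame time budget implied by a target frame rate."""
    return 1000.0 / target_fps


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))   # overlap 50, union 150
print(precision_recall(80, 20, 20))
print(frame_budget_ms(30))                   # roughly 33.3 ms, hence the <33ms target
```

A detection counts as a true positive at mAP@50 when its IoU with a ground-truth box of the same class is at least 0.5; mAP then averages precision over recall levels and classes, which this sketch does not reproduce.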

Workflow 2: Model Optimization and Deployment

Use this workflow when preparing a trained model for production deployment.

Step 1: Benchmark Baseline Performance

bash

# Measure current model performance
python scripts/inference_optimizer.py model.pt \
    --benchmark \
    --input-size 640 640 \
    --batch-sizes 1 4 8 16 \
    --warmup 10 \
    --iterations 100

Example output (FPS is throughput: batch size divided by batch latency):

Baseline Performance (PyTorch FP32):
- Batch 1: 45.2ms (22.1 FPS)
- Batch 4: 89.4ms (44.7 FPS)
- Batch 8: 165.3ms (48.4 FPS)
- Memory: 2.1 GB
- Parameters: 25.9M

Step 2: Select Optimization Strategy

Deployment Target	Optimization Path
NVIDIA GPU (cloud)	PyTorch → ONNX → TensorRT FP16
NVIDIA GPU (edge)	PyTorch → ONNX → TensorRT INT8
Intel CPU	PyTorch → ONNX → OpenVINO
Apple Silicon	PyTorch → CoreML
Generic CPU	PyTorch → ONNX Runtime
Mobile	PyTorch → TFLite or ONNX Runtime Mobile

Step 3: Export to ONNX

bash

# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
    --export onnx \
    --input-size 640 640 \
    --dynamic-batch \
    --simplify \
    --output model.onnx

# Verify ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"

Step 4: Apply Quantization (Optional)

For INT8 quantization with calibration:

bash

# Quantize to INT8 using a calibration dataset
python scripts/inference_optimizer.py model.onnx \
    --quantize int8 \
    --calibration-data data/calibration/ \
    --calibration-samples 500 \
    --output model_int8.onnx

Quantization impact analysis:

Precision	Size	Speed	Accuracy Drop
FP32	100%	1x	0%
FP16	50%	1.5-2x	<0.5%
INT8	25%	2-4x	1-3%
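
The size and accuracy columns follow from how INT8 stores values: one byte instead of four, at the cost of rounding error. The sketch below shows symmetric per-tensor quantization with a max-based scale. This is a simplified stand-in chosen for illustration; production toolchains typically pick the scale from calibration data with more elaborate methods.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single symmetric scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.4, 0.0, 0.25, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)
print(max_error <= scale / 2 + 1e-12)  # rounding error is at most half a step
```

Each value is stored in 1 byte rather than 4, which is where the 25% size figure comes from. The per-value error is bounded by half the scale, so tensors with a few large outliers lose the most precision.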

Step 5: Convert to Target Runtime

bash

# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16

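# OpenVINO (Intel). `mo` is the legacy Model Optimizer; OpenVINO 2023.1+ provides `ovc` as its replacement
mo --input_model model.onnx --output_dir openvino/

# CoreML (Apple). Recent coremltools releases no longer accept ONNX input, so convert
# from a TorchScript export instead (model.torchscript is an assumed filename)
python -c "import torch, coremltools as ct; ts = torch.jit.load('model.torchscript'); model = ct.convert(ts, inputs=[ct.TensorType(shape=(1, 3, 640, 640))], convert_to='mlprogram'); model.save('model.mlpackage')"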

Step 6: Benchmark Optimized Model

bash

python scripts/inference_optimizer.py model.engine \
    --benchmark \
    --runtime tensorrt \
    --compare model.pt

Expected speedup:

Optimization Results:
- Original (PyTorch FP32): 45.2ms
- Optimized (TensorRT FP16): 12.8ms
- Speedup: 3.5x
- Accuracy change: -0.3% mAP

Workflow 3: Custom Dataset Preparation

Use this workflow when preparing a computer vision dataset for training.

Step 1: Audit Raw Data

bash

# Analyze image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
    --analyze \
    --output analysis/

Analysis report includes:

Dataset Analysis:
- Total images: 5,234
- Image sizes: 640x480 to 4096x3072 (variable)
- Formats: JPEG (4,891), PNG (343)
- Corrupted: 12 files
- Duplicates: 45 pairs

Annotation Analysis:
- Format detected: Pascal VOC XML
- Total annotations: 28,456
- Classes: 5 (car, person, bicycle, dog, cat)
- Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
- Empty images: 234

Step 2: Clean and Validate

bash

# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
    --clean \
    --remove-corrupted \
    --remove-duplicates \
    --output data/cleaned/
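
As a rough illustration of what duplicate removal involves, the sketch below groups files by SHA-256 digest. It only finds byte-identical files and is an assumption about one possible approach, not a description of how `dataset_pipeline_builder.py` works; near-duplicates (resized or re-encoded copies) need perceptual hashing instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root: Path) -> list:
    """Return groups of files under root whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_exact_duplicates(Path("data/raw")):
        keep, *drop = group
        print(f"keep {keep}, duplicates: {[str(p) for p in drop]}")
```

Remember to drop the annotation entries of any removed image so labels and images stay in sync.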

Step 3: Convert Annotation Format

bash

# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
    --annotations data/annotations/ \
    --input-format voc \
    --output-format coco \
    --output data/coco/

Supported format conversions:

From	To
Pascal VOC XML	COCO JSON
YOLO TXT	COCO JSON
COCO JSON	YOLO TXT
LabelMe JSON	COCO JSON
CVAT XML	COCO JSON
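
Most of these conversions come down to re-expressing the same box in a different coordinate convention: Pascal VOC uses corner pixels, COCO uses top-left corner plus size, and YOLO uses a center plus size normalized by image dimensions. A minimal sketch of the arithmetic, with illustrative function names:

```python
def voc_to_coco(xmin, ymin, xmax, ymax):
    """VOC corners -> COCO [x, y, width, height] in pixels."""
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def coco_to_yolo(x, y, w, h, img_w, img_h):
    """COCO pixels -> YOLO (cx, cy, w, h), each normalized to [0, 1]."""
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_to_coco(cx, cy, w, h, img_w, img_h):
    """YOLO normalized -> COCO [x, y, width, height] in pixels."""
    box_w, box_h = w * img_w, h * img_h
    return (cx * img_w - box_w / 2, cy * img_h - box_h / 2, box_w, box_h)


coco_box = voc_to_coco(100, 50, 300, 250)
yolo_box = coco_to_yolo(*coco_box, img_w=640, img_h=480)
print(coco_box)
print(yolo_box)
```

This sketch ignores the one-pixel offset that some VOC tooling applies, so check a few converted boxes visually before training.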

Step 4: Apply Augmentations

bash

# Generate augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
    --augment \
    --aug-config configs/augmentation.yaml \
    --output data/augmented/

Recommended augmentations for detection:

yaml

# configs/augmentation.yaml
augmentations:
  geometric:
    - horizontal_flip: { p: 0.5 }
    - vertical_flip: { p: 0.1 }  # Only if orientation invariant
    - rotate: { limit: 15, p: 0.3 }
    - scale: { scale_limit: 0.2, p: 0.5 }

  color:
    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
    - blur: { blur_limit: 3, p: 0.1 }

  advanced:
    - mosaic: { p: 0.5 }  # YOLO-style mosaic
    - mixup: { p: 0.1 }   # Image mixing
    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
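
Geometric augmentations must transform the boxes together with the pixels, otherwise labels drift off their objects. As a small sketch of that coupling for `horizontal_flip`, assuming COCO-style `[x, y, width, height]` boxes (the helper names and the `rng` parameter are illustrative):

```python
import random


def hflip_coco_bbox(bbox, img_w):
    """Mirror a COCO [x, y, w, h] box across the vertical image axis."""
    x, y, w, h = bbox
    return (img_w - x - w, y, w, h)


def maybe_hflip(bboxes, img_w, p=0.5, rng=None):
    """Flip every box with probability p; returns (boxes, flipped).

    The caller is expected to flip the image pixels whenever flipped is True.
    """
    rng = rng or random.Random()
    if rng.random() < p:
        return [hflip_coco_bbox(b, img_w) for b in bboxes], True
    return list(bboxes), False


print(hflip_coco_bbox((100, 50, 200, 200), img_w=640))
```

Libraries such as albumentations handle this bookkeeping when bounding-box parameters are configured, which is usually preferable to hand-written transforms.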

Step 5: Create Train/Val/Test Splits

bash

python scripts/dataset_pipeline_builder.py data/augmented/ \
    --split 0.8 0.1 0.1 \
    --stratify \
    --seed 42 \
    --output data/final/

Split strategy guidelines:

Dataset Size	Train	Val	Test
<1,000 images	70%	15%	15%
1,000-10,000	80%	10%	10%
>10,000	90%	5%	5%
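
The guidelines above can be combined with a seeded shuffle to get a reproducible split. The sketch below is a plain random split with hypothetical helper names; it does not reproduce the class balancing implied by `--stratify`.

```python
import random


def split_ratios(n_images):
    """Pick (train, val, test) ratios from the dataset-size guidelines."""
    if n_images < 1000:
        return (0.70, 0.15, 0.15)
    if n_images <= 10000:
        return (0.80, 0.10, 0.10)
    return (0.90, 0.05, 0.05)


def split_dataset(items, seed=42):
    """Shuffle deterministically, then cut into train/val/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    train_ratio, val_ratio, _ = split_ratios(len(items))
    n_train = int(len(items) * train_ratio)
    n_val = int(len(items) * val_ratio)
    # The test set takes the remainder so no item is dropped by rounding.
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


train, val, test = split_dataset(range(5234))
print(len(train), len(val), len(test))
```

Split before augmenting where possible, so augmented copies of one source image cannot land in both the training and evaluation sets.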

Step 6: Generate Dataset Configuration

bash

# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config yolo \
    --output data.yaml

# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config detectron2 \
    --output detectron2_config.py
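
For orientation, an Ultralytics dataset file lists a root path, the image subdirectories for each split, and an index-to-name class map. The sketch below renders such a file as text. The `images/train` style directory layout and the `render_data_yaml` helper are assumptions for illustration; the actual output of `--generate-config yolo` may differ.

```python
def render_data_yaml(root, class_names):
    """Build the text of a minimal Ultralytics-style data.yaml."""
    lines = [
        f"path: {root}",
        "train: images/train",   # assumed layout, relative to path
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


text = render_data_yaml("data/final", ["car", "person", "bicycle", "dog", "cat"])
print(text)
```

Class indices must match the IDs used in the label files, so keep the name order identical to the one used during annotation conversion.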

Architecture Selection Guide

Object Detection Architectures

Architecture	Latency	COCO mAP@50:95	Best For
YOLOv8n	1.2ms	37.3 mAP	Edge, mobile, real-time
YOLOv8s	2.1ms	44.9 mAP	Balanced speed/accuracy
YOLOv8m	4.2ms	50.2 mAP	General purpose
YOLOv8l	6.8ms	52.9 mAP	High accuracy
YOLOv8x	10.1ms	53.9 mAP	Maximum accuracy
RT-DETR-L	5.3ms	53.0 mAP	Transformer, no NMS
Faster R-CNN R50	46ms	40.2 mAP	Two-stage, high quality
DINO-4scale	85ms	49.0 mAP	SOTA transformer

Segmentation Architectures

Architecture	Type	Speed	Best For
YOLOv8-seg	Instance	4.5ms	Real-time instance seg
Mask R-CNN	Instance	67ms	High-quality masks
SAM	Promptable	50ms	Zero-shot segmentation
DeepLabV3+	Semantic	25ms	Scene parsing
SegFormer	Semantic	15ms	Efficient semantic seg

CNN vs Vision Transformer Trade-offs

Aspect	CNN (YOLO, R-CNN)	Transformer (DETR, DINO)
Training data needed	1K-10K images	10K-100K+ images
Training time	Fast	Slow (needs more epochs)
Inference speed	Faster	Slower
Small objects	Good with FPN	Needs multi-scale features
Global context	Limited	Excellent
Positional encoding	Implicit	Explicit

Reference Documentation

→ See references/reference-docs-and-commands.md for details

Performance Targets

Metric	Real-time	High Accuracy	Edge
FPS	>30	>10	>15
mAP@50	>0.6	>0.8	>0.5
Latency P99	<50ms	<150ms	<100ms
GPU Memory	<4GB	<8GB	<2GB
Model Size	<50MB	<200MB	<20MB
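
The Latency P99 row refers to the value below which 99% of measured request latencies fall. A minimal sketch using the nearest-rank percentile definition follows; the `P99_BUDGET_MS` mapping simply restates the table row, and the helper names are illustrative.

```python
import math

# P99 latency budgets in milliseconds, copied from the table above.
P99_BUDGET_MS = {"real-time": 50.0, "high-accuracy": 150.0, "edge": 100.0}


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_p99_target(latencies_ms, profile):
    """Check measured latencies against the budget for a deployment profile."""
    return percentile(latencies_ms, 99) < P99_BUDGET_MS[profile]


latencies = [12.0] * 98 + [40.0, 120.0]   # two slow outliers in 100 requests
print(percentile(latencies, 99))
print(meets_p99_target(latencies, "real-time"))
```

Averages hide tail behavior: here the mean is far below 50 ms, yet a single extra slow request per hundred would push P99 past the real-time budget.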

Resources

Architecture Guide: references/computer_vision_architectures.md
Optimization Guide: references/object_detection_optimization.md
Deployment Guide: references/production_vision_systems.md
Scripts: scripts/ directory for automation tools

Maintainer

alirezarezvani Core maintainer

Source details

Full Name: alirezarezvani/claude-skills
Branch: main
Path in repo: engineering-team/senior-computer-vision
License: MIT License
Topics: claude-code anthropic-claude agent-skills claude-code-skills codex-skills cursor-skills developer-tools prompt-engineering openclaw claude-skills claude-ai agentic-ai claude-code-plugins ai-coding-agent openclaw-skills openai-codex agent-plugins coding-agent-plugins gemini-cli-skills openclaw-plugins
