Agent skill
pair-programmer
AI pair programming with real-time screen and audio context. Use when the user wants to record their screen, start/stop recording, or get context from what they're doing.
Install this agent skill to your Project
npx add-skill https://github.com/video-db/pair-programmer/tree/main/skills/pair-programmer
SKILL.md
VideoDB Pair Programmer
AI pair programming with real-time screen and audio context. Record your screen and audio, with AI-powered indexing that logs visual and audio events in real-time.
Commands
When the user asks for a command, read the corresponding file for instructions:
| Command | Description | Reference |
|---|---|---|
| `/pair-programmer record` | Start screen/audio recording | See commands/record.md |
| `/pair-programmer stop` | Stop the running recording | See commands/stop.md |
| `/pair-programmer search` | Search recording context (screen, mic, audio) | See commands/search.md |
| `/pair-programmer act` | Act on a spoken instruction from the mic | See commands/act.md |
| `/pair-programmer what-happened` | Summarize recent activity | See commands/what-happened.md |
| `/pair-programmer setup` | Install deps and configure API key | See commands/setup.md |
| `/pair-programmer config` | Change indexing and other settings | See commands/config.md |
How It Works
- User runs `/pair-programmer setup` to install dependencies and set the `VIDEO_DB_API_KEY` environment variable
- User runs `/pair-programmer record` to start recording
- A picker UI appears to select screen and audio sources
- Recording starts and events are logged to `/tmp/videodb_pp_events.jsonl`
- User can stop recording from the tray icon (🔴 PP → Stop Recording)
Output Files
| Path | Content |
|---|---|
| `/tmp/videodb_pp_pid` | Process ID of the recorder |
| `/tmp/videodb_pp_events.jsonl` | All WebSocket events (JSONL format) |
| `/tmp/videodb_pp_info.json` | Current session info (session_id, rtstream_ids) |
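One use of the PID file is checking whether a recorder is still alive before starting or stopping one. The sketch below assumes the file holds a single decimal process ID and that the host is Unix-like (as the `/tmp` paths suggest); `recorder_running` is a hypothetical helper, not part of the skill.

```python
import os
from pathlib import Path

PID_FILE = Path("/tmp/videodb_pp_pid")  # path from the table above


def recorder_running(pid_file: Path = PID_FILE) -> bool:
    """Return True if the PID file names a live process (Unix-like hosts).

    Assumption: the file contains one decimal integer, the recorder's PID.
    """
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no file, or contents are not an integer
    if pid <= 0:
        return False  # 0 and negatives address process groups, not a PID
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is sent
    except ProcessLookupError:
        return False  # stale PID file: the process has exited
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

A `False` result with the file still present suggests a stale PID left behind by a recorder that exited without cleaning up.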
Event File Format
Events are written as JSONL (one JSON object per line):
{"ts": "2026-03-05T10:15:30.123Z", "unix_ts": 1772705730.12, "channel": "visual_index", "data": {"text": "User is viewing VS Code with auth.ts open"}}
{"ts": "2026-03-05T10:15:31.456Z", "unix_ts": 1772705731.45, "channel": "transcript", "data": {"text": "Let me check the login flow", "is_final": true}}
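A minimal sketch of reading one such line in Python, assuming every event carries the four top-level fields shown in the samples (`ts`, `unix_ts`, `channel`, `data`). The `Event` class and `parse_event` helper are hypothetical names used for illustration only.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Event:
    ts: str                # ISO 8601 timestamp string
    unix_ts: float         # same instant as seconds since the epoch
    channel: str           # e.g. "visual_index", "transcript", "audio_index"
    data: Dict[str, Any]   # channel-specific payload


def parse_event(line: str) -> Event:
    """Parse one JSONL line into an Event (hypothetical helper)."""
    obj = json.loads(line)
    return Event(
        ts=obj["ts"],
        unix_ts=float(obj["unix_ts"]),
        channel=obj["channel"],
        data=obj.get("data", {}),
    )


sample = (
    '{"ts": "2026-03-05T10:15:31.456Z", "unix_ts": 1772705731.45, '
    '"channel": "transcript", '
    '"data": {"text": "Let me check the login flow", "is_final": true}}'
)
event = parse_event(sample)
print(event.channel, event.data["text"])
```

Because each line is a complete JSON object, a consumer can process the file one line at a time without loading it whole.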
Environment Variables
The recorder reads these from environment variables:
| Variable | Required | Description |
|---|---|---|
| `VIDEO_DB_API_KEY` | Yes | VideoDB API key |
| `VIDEO_DB_BASE_URL` | No | API endpoint (default: `https://api.videodb.io`) |
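The required/optional split in the table can be sketched as follows. This is an illustration of the lookup logic, not the recorder's actual code; `RecorderConfig` and `load_config` are hypothetical names.

```python
import os
from typing import Mapping, NamedTuple

DEFAULT_BASE_URL = "https://api.videodb.io"  # default from the table above


class RecorderConfig(NamedTuple):
    api_key: str
    base_url: str


def load_config(env: Mapping[str, str] = os.environ) -> RecorderConfig:
    """Read both variables; fail early if the required key is missing."""
    api_key = env.get("VIDEO_DB_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "VIDEO_DB_API_KEY is not set; run /pair-programmer setup first"
        )
    # An unset or empty VIDEO_DB_BASE_URL falls back to the default endpoint.
    base_url = env.get("VIDEO_DB_BASE_URL", "").strip() or DEFAULT_BASE_URL
    return RecorderConfig(api_key=api_key, base_url=base_url)
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary, for example `load_config({"VIDEO_DB_API_KEY": "sk-example"})`.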
Reading Context
Events are in /tmp/videodb_pp_events.jsonl. Use CLI tools to filter — never read the whole file.
| Channel | Content | Density |
|---|---|---|
| `visual_index` | Screen descriptions | Dense (~1 every 2s) |
| `transcript` | Mic speech | Sparse (sentences) |
| `audio_index` | System audio summaries | Sparse (sentences) |
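Channel filter: use grep to select one channel and pipe to tail for the most recent events. The `-E` pattern allows an optional space after the colon, so it matches both compact JSON and the spaced form shown in the sample events:
grep -E '"channel": ?"visual_index"' /tmp/videodb_pp_events.jsonl | tail -10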
Keyword search — grep across all channels:
grep -i 'keyword' /tmp/videodb_pp_events.jsonl
Time-window filter — filter events from the last N minutes:
- Get current epoch: `$(date +%s)`
- Compute cutoff: `epoch - N*60`
- Filter lines where the `unix_ts` JSON field exceeds the cutoff
- Pipe through `grep` to narrow by channel
Generate the appropriate filtering command (grep, awk, python3, jq) based on complexity.
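For cases where a one-line grep is not enough, the time-window steps can be sketched in Python. This assumes each line is a JSON object with `unix_ts` and `channel` fields as in the sample events; `recent_events` is a hypothetical helper, and the `now` parameter exists only to make the cutoff reproducible.

```python
import json
import time
from typing import Iterable, Iterator, Optional


def recent_events(
    lines: Iterable[str],
    minutes: float,
    channel: Optional[str] = None,
    now: Optional[float] = None,
) -> Iterator[dict]:
    """Yield events newer than `minutes` ago, optionally for one channel."""
    # Cutoff = current epoch minus N*60 seconds.
    cutoff = (time.time() if now is None else now) - minutes * 60
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written trailing line
        if not isinstance(event, dict):
            continue
        if event.get("unix_ts", 0) <= cutoff:
            continue  # older than the window
        if channel is not None and event.get("channel") != channel:
            continue
        yield event
```

Passing an open file handle for `/tmp/videodb_pp_events.jsonl` as `lines`, for example `recent_events(fh, minutes=5, channel="transcript")`, streams the file line by line and yields only the matching events, so the full log never needs to be held in memory or printed.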
For semantic search across indexed content, use search-rtstream.js:
node search-rtstream.js --query="your query" --cwd=<PROJECT_ROOT>
`<PROJECT_ROOT>` is the absolute path to the user's project directory. This is NOT the skill directory; resolve it before running the command.
See commands/search.md for the full search strategy and CLI patterns.