Nano Banana Video Generation
Generate videos using Google Veo models via the nano-banana CLI. Use this skill when the user asks to create, generate, animate, or produce videos with AI. Supports text-to-video, image-to-video animation, dialogue with lip-sync, and scene extensions. Trigger on requests like "create a video", "animate this image", "make a video clip", "generate footage", "produce a short film", "add motion to this".
Install this agent skill to your Project
npx add-skill https://github.com/The-Focus-AI/nano-banana-cli/tree/main/skills/nano-banana-videogen
Generate videos using Google Veo 3.1 models via the nano-banana CLI.
Prerequisites
- The GEMINI_API_KEY environment variable must be set
- The CLI is installed via npx @the-focus-ai/nano-banana
Quick Reference
# Generate a video from text
nano-banana --video "A sunset over mountains, slow dolly-in, cinematic lighting"
# Animate an existing image
nano-banana --video "The character slowly turns and smiles" --file portrait.png
# Cost-optimized development mode
nano-banana --video "Quick test scene" --video-fast --no-audio --resolution 720p
# Specify output path
nano-banana --video "A cat playing" --output cat-video.mp4
# Full control over settings
nano-banana --video "Dramatic reveal scene" \
--duration 8 --aspect 16:9 --resolution 1080p --seed 42
Understanding Video Requests
Before generating, clarify these video-specific aspects:
- Core Scene: What's the main action or subject?
- Camera Movement: Static, dolly, pan, tracking, crane?
- Style: Cinematic, documentary, commercial, casual?
- Audio: Dialogue? Sound effects? Ambient sounds? Music?
- Duration: 4, 6, or 8 seconds?
- Orientation: Landscape (16:9) or portrait (9:16)?
The Five-Part Video Prompt Formula
Structure prompts with these elements:
[Camera Movement] + [Subject] + [Action] + [Environment] + [Audio/Style]
Example - Weak prompt:
"a person walking"
Example - Strong prompt:
"Slow dolly-in shot. A woman in her 30s, shoulder-length wavy black hair,
green jacket, walks confidently through a sunlit park. Golden hour lighting,
warm color grading. Ambient sounds: birds chirping, distant traffic.
Cinematic, aspirational mood. No subtitles, no text overlay."
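The formula can be sketched as a small shell helper that joins the five parts in order. This is a minimal illustration, not part of the nano-banana CLI: `build_prompt` and its argument order are hypothetical names chosen for this example.

```shell
# Hypothetical helper (not part of nano-banana): joins the five parts of the
# formula into one prompt string, in the order the formula lists them.
build_prompt() {
  local camera="$1" subject="$2" action="$3" environment="$4" audio_style="$5"
  printf '%s. %s %s. %s. %s.\n' \
    "$camera" "$subject" "$action" "$environment" "$audio_style"
}

prompt="$(build_prompt \
  "Slow dolly-in shot" \
  "A woman in her 30s in a green jacket" \
  "walks confidently through a sunlit park" \
  "Golden hour lighting, warm color grading" \
  "Ambient sounds: birds chirping, distant traffic")"

echo "$prompt"
# Once the prompt reads well, pass it to the CLI:
# nano-banana --video "$prompt"
```

Keeping the parts as separate arguments makes it easy to change one element, such as the camera movement, while the rest of the prompt stays fixed between iterations.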
Workflow
Step 1: Craft the Prompt
See prompting-guide.md for comprehensive guidance.
Key principles:
- Start with camera movement (dolly, pan, static, tracking)
- Describe subject in detail (appearance, wardrobe, expression)
- Specify action with timing cues
- Include lighting and environment
- Add audio design (dialogue, SFX, ambient)
- Always end with: "No subtitles, no text overlay, no captions"
Step 2: Consider Cost
Video generation is significantly more expensive than images:
| Model | Cost per Second | 8-Second Video |
|---|---|---|
| veo-3.1-generate-001 | $0.40 | $3.20 |
| veo-3.1-fast-generate-001 | $0.15 | $1.20 |
Development workflow:
- Iterate with --video-fast --no-audio (cheapest)
- Test with --video-fast (add audio when needed)
- Final render with default model (premium quality)
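To budget a session before running anything, the per-second rates from the cost table can be multiplied out in the shell. This is a rough sketch: `estimate_cost` is a hypothetical helper, and the rates are copied from the table above, so they may not match current pricing.

```shell
# Hypothetical helper: estimates cost from the per-second rates in the cost
# table (40 cents premium, 15 cents fast). Integer math in cents avoids
# floating point, which plain shell arithmetic does not support.
estimate_cost() {
  local seconds="$1" mode="${2:-premium}" rate_cents total
  case "$mode" in
    fast) rate_cents=15 ;;
    *)    rate_cents=40 ;;
  esac
  total=$((seconds * rate_cents))
  printf '$%d.%02d\n' $((total / 100)) $((total % 100))
}

estimate_cost 8        # premium model, 8 seconds
estimate_cost 8 fast   # fast model, 8 seconds
estimate_cost 4 fast   # fast model, 4 seconds
```

For example, ten 4-second iterations on the fast model come to 10 x 4 x $0.15 = $6.00, less than two 8-second premium renders.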
Step 3: Generate
nano-banana --video "your detailed prompt here"
Generation takes 2-4 minutes. Progress is shown in the terminal.
Step 4: Iterate
If the result isn't right:
- Refine camera movement - Be more explicit (e.g., "slow dolly-in over 8 seconds")
- Add negative guidance - Describe what to avoid
- Simplify - Focus on one main action per clip
- Try different duration - 4s or 6s may work better for quick actions
Commands
Text-to-Video
nano-banana --video "<prompt>"
Image-to-Video (Animation)
nano-banana --video "<motion description>" --file <input-image>
The motion description should describe how the image should animate:
- "The character slowly turns their head and smiles"
- "The scene comes alive with subtle wind movement"
- "Zoom out to reveal the full landscape"
Options
| Option | Description | Default |
|---|---|---|
| --video | Enable video mode | (required) |
| --video-model <name> | Veo model to use | veo-3.1-generate-001 |
| --video-fast | Use fast/cheap model | (premium model) |
| --duration <sec> | 4, 6, or 8 seconds | 8 |
| --aspect <ratio> | 16:9 or 9:16 | 16:9 |
| --resolution <res> | 720p, 1080p, or 4K | 1080p |
| --audio | Generate audio | (enabled) |
| --no-audio | Disable audio | - |
| --seed <number> | Reproducibility seed | (random) |
| --output <file> | Output path | output/video-.mp4 |
| --file <image> | Input image to animate | - |
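Because each generation is billed, it can help to check option values before calling the CLI. The sketch below is a hypothetical pre-flight check, not something nano-banana provides. It only encodes the allowed values listed in the options table, and the CLI may accept or reject values differently.

```shell
# Hypothetical pre-flight check: rejects values the options table does not list.
validate_video_options() {
  local duration="$1" aspect="$2" resolution="$3"
  case "$duration" in
    4|6|8) ;;
    *) echo "invalid duration: $duration (expected 4, 6, or 8)" >&2; return 1 ;;
  esac
  case "$aspect" in
    16:9|9:16) ;;
    *) echo "invalid aspect: $aspect (expected 16:9 or 9:16)" >&2; return 1 ;;
  esac
  case "$resolution" in
    720p|1080p|4K) ;;
    *) echo "invalid resolution: $resolution (expected 720p, 1080p, or 4K)" >&2; return 1 ;;
  esac
}

if validate_video_options 8 16:9 1080p; then
  echo "options ok"
  # nano-banana --video "Dramatic reveal scene" --duration 8 --aspect 16:9 --resolution 1080p
fi
```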
Camera Movement Reference
Use these terms for precise camera control:
| Movement | Description | Example Prompt |
|---|---|---|
| Static | No movement | "Static shot on tripod. A coffee cup steaming..." |
| Pan | Horizontal rotation | "Slow pan left across the city skyline..." |
| Tilt | Vertical rotation | "Tilt down from face to hands..." |
| Dolly In | Camera moves closer | "Slow dolly-in from medium to close-up..." |
| Dolly Out | Camera moves away | "Dolly-out revealing the vast landscape..." |
| Tracking | Parallel to subject | "Tracking shot following character walking..." |
| Crane | Sweeping vertical | "Crane shot ascending from ground level..." |
| Handheld | Realistic shake | "Handheld camera, documentary style..." |
Important: Use ONE primary movement per shot. Don't combine multiple movements.
Dialogue Formatting
For spoken dialogue, use the colon format:
Character description says: "Exact dialogue here."
Example:
"A friendly young woman, excited and cheerful, says: 'Welcome to our store!'
Standing in bright retail environment. Natural lip-sync. No subtitles."
Guidelines:
- Keep dialogue to 6-12 words for 8 seconds
- Describe the speaker's tone and emotion
- Always add "No subtitles, no text overlay"
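The word-count guideline is easy to check mechanically before spending a generation on it. A minimal sketch, assuming the dialogue is available as a plain string. `check_dialogue_length` is a hypothetical helper, and the 6-12 range is the guideline for 8-second clips stated above.

```shell
# Hypothetical check: counts the words in a line of dialogue and reports
# whether it falls in the 6-12 word range suggested for an 8-second clip.
check_dialogue_length() {
  local words
  words=$(printf '%s' "$1" | wc -w | tr -d '[:space:]')
  if [ "$words" -ge 6 ] && [ "$words" -le 12 ]; then
    echo "ok ($words words)"
  else
    echo "out of range ($words words)"
    return 1
  fi
}

check_dialogue_length "Welcome to our store, we are so glad you came!"
```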
Audio Design
Structure audio in layers:
- Dialogue (highest priority) - Always clear
- Sound Effects - Specific, timed actions
- Ambient - 3-5 background elements max
- Music - Lowest priority, "ducks under dialogue"
Example:
"Sound effects: Door closing at 2-second mark, footsteps on wood.
Ambient sounds: Quiet office hum, distant typing.
Background music: Soft jazz, low volume, ducks under dialogue."
Best Practices
For Better Results
- Front-load important info - Camera, subject, action first
- Use cinematic terms - "35mm lens", "shallow depth of field", "golden hour"
- Be specific about lighting - "Soft window light from left", not just "good lighting"
- Describe the mood - "Intimate", "epic", "suspenseful", "uplifting"
- Include negative guidance - What to avoid
For Image-to-Video
- Match the image - Describe motion that fits what's in the image
- Start subtle - Small movements work better than dramatic changes
- Keep lighting consistent - Don't describe lighting changes that differ from the image
For Consistency Across Shots
When creating multiple related videos:
- Create a character description and reuse it exactly
- Keep lighting style consistent
- Use the same camera movement style family
- Use --seed for more reproducible results
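One way to apply these points is to keep the character description and seed in variables and build every shot's command from them. The sketch below only prints the commands, so nothing is generated or billed. The shot list, the `print_shot_commands` name, and the output filenames are hypothetical.

```shell
# Hypothetical shot list: every prompt reuses the same character description,
# camera style, and --seed value. Commands are printed, not executed.
print_shot_commands() {
  local character="A woman in her 30s, shoulder-length wavy black hair, green jacket"
  local seed=42
  local shot=1
  local action
  for action in \
    "Static shot. ${character} waits at a bus stop." \
    "Static shot. ${character} steps onto the bus."
  do
    printf 'nano-banana --video "%s No subtitles, no text overlay." --seed %s --output shot-%s.mp4\n' \
      "$action" "$seed" "$shot"
    shot=$((shot + 1))
  done
}

print_shot_commands
```

Review the printed commands, then run them one at a time so a weak shot can be refined before the next one is paid for.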
Troubleshooting
"Video generation timeout"
- Generation can take 2-4 minutes
- If persistent, try simpler prompts
- Use --video-fast for faster generation
Poor quality or wrong content
- Add more specific descriptions
- Include negative guidance
- Try the premium model instead of fast
Subtitles appearing in video
- Always include "No subtitles, no text overlay, no captions" in prompt
- Veo was trained on videos with subtitles and tends to add them
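Since the guidance has to appear in every prompt, a wrapper can append it automatically. This is a minimal sketch: `with_no_subtitles` is a hypothetical helper, and it assumes a case-sensitive match on "No subtitles" is enough to detect prompts that already carry the guidance.

```shell
# Hypothetical helper: appends the no-subtitles guidance unless the prompt
# already contains the phrase "No subtitles" (case-sensitive match).
with_no_subtitles() {
  local suffix="No subtitles, no text overlay, no captions."
  case "$1" in
    *"No subtitles"*) printf '%s\n' "$1" ;;
    *)                printf '%s %s\n' "$1" "$suffix" ;;
  esac
}

with_no_subtitles "A cat playing in a sunlit kitchen."
# nano-banana --video "$(with_no_subtitles "A cat playing in a sunlit kitchen.")"
```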
Audio doesn't match video
- Be more specific about when sounds occur
- Use "Sound effect: X at Y-second mark"
- Simplify audio layers (fewer elements)
Safety filter rejection
- Avoid violence, weapons, explicit content
- Rephrase ambiguous terms
- Try more generic descriptions
Cost Optimization
# Development (cheapest): ~$1.20 per video
nano-banana --video "test prompt" --video-fast --no-audio --resolution 720p
# Testing with audio: ~$1.20 per video
nano-banana --video "test prompt" --video-fast
# Production quality: ~$3.20 per video
nano-banana --video "final prompt" --resolution 1080p
Example Prompts
See the examples/ directory for complete prompt examples:
- cinematic-shots.md - Camera movements
- dialogue-and-audio.md - Speech and sound
- image-to-video.md - Animating images
Environment Setup
Ensure GEMINI_API_KEY is set:
export GEMINI_API_KEY="your-api-key-here"
Or create a .env file in your project:
GEMINI_API_KEY=your-api-key-here
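A quick check before a long session can catch a missing key early. The sketch below is hypothetical and not part of the CLI: `has_api_key` looks for the variable in the environment first, then for a GEMINI_API_KEY= line in a .env file in the current directory. How nano-banana itself reads the .env file is not covered here.

```shell
# Hypothetical pre-flight check: succeeds if GEMINI_API_KEY is exported, or if
# a .env file in the current directory has a non-empty GEMINI_API_KEY= line.
has_api_key() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    return 0
  fi
  [ -f .env ] && grep -q '^GEMINI_API_KEY=.' .env
}

if has_api_key; then
  echo "API key found"
else
  echo "GEMINI_API_KEY is missing: export it or add it to .env"
fi
```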