Nano Banana Video Generation

Generate videos using Google Veo models via the nano-banana CLI. Use this skill when the user asks to create, generate, animate, or produce videos with AI. Supports text-to-video, image-to-video animation, dialogue with lip-sync, and scene extensions. Trigger on requests like "create a video", "animate this image", "make a video clip", "generate footage", "produce a short film", "add motion to this".

Install this agent skill in your project:

npx add-skill https://github.com/The-Focus-AI/nano-banana-cli/tree/main/skills/nano-banana-videogen

SKILL.md

Generate videos using Google Veo 3.1 models via the nano-banana CLI.

Prerequisites

GEMINI_API_KEY environment variable must be set
The CLI is installed via npx @the-focus-ai/nano-banana

Quick Reference

bash

# Generate a video from text
nano-banana --video "A sunset over mountains, slow dolly-in, cinematic lighting"

# Animate an existing image
nano-banana --video "The character slowly turns and smiles" --file portrait.png

# Cost-optimized development mode
nano-banana --video "Quick test scene" --video-fast --no-audio --resolution 720p

# Specify output path
nano-banana --video "A cat playing" --output cat-video.mp4

# Full control over settings
nano-banana --video "Dramatic reveal scene" \
  --duration 8 --aspect 16:9 --resolution 1080p --seed 42

Understanding Video Requests

Before generating, clarify these video-specific aspects:

Core Scene: What's the main action or subject?
Camera Movement: Static, dolly, pan, tracking, crane?
Style: Cinematic, documentary, commercial, casual?
Audio: Dialogue? Sound effects? Ambient sounds? Music?
Duration: 4, 6, or 8 seconds?
Orientation: Landscape (16:9) or portrait (9:16)?

The Five-Part Video Prompt Formula

Structure prompts with these elements:

[Camera Movement] + [Subject] + [Action] + [Environment] + [Audio/Style]

Example - Weak prompt:

"a person walking"

Example - Strong prompt:

"Slow dolly-in shot. A woman in her 30s, shoulder-length wavy black hair,
green jacket, walks confidently through a sunlit park. Golden hour lighting,
warm color grading. Ambient sounds: birds chirping, distant traffic.
Cinematic, aspirational mood. No subtitles, no text overlay."
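The five-part formula can be sketched as a small shell helper that joins the parts in order. `build_video_prompt` is a hypothetical name used only for illustration and is not part of the nano-banana CLI; the punctuation between parts is likewise an assumption.

```shell
# Hypothetical helper (not part of nano-banana): joins the five formula
# parts into one prompt string, in the order the formula lists them.
build_video_prompt() {
  camera="$1"; subject="$2"; action="$3"; environment="$4"; audio_style="$5"
  printf '%s. %s %s. %s. %s' \
    "$camera" "$subject" "$action" "$environment" "$audio_style"
}

prompt=$(build_video_prompt \
  "Slow dolly-in shot" \
  "A woman in her 30s, green jacket," \
  "walks confidently through a sunlit park" \
  "Golden hour lighting, warm color grading" \
  "Ambient sounds: birds chirping. No subtitles, no text overlay.")

# Review the assembled prompt before spending money on a render.
echo "$prompt"

# Once it reads well, pass it to the CLI:
# nano-banana --video "$prompt"
```

Keeping the parts as separate arguments makes it easy to change one element, such as the camera movement, while holding the rest fixed between iterations.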

Workflow

Step 1: Craft the Prompt

See prompting-guide.md for comprehensive guidance.

Key principles:

Start with camera movement (dolly, pan, static, tracking)
Describe subject in detail (appearance, wardrobe, expression)
Specify action with timing cues
Include lighting and environment
Add audio design (dialogue, SFX, ambient)
Always end with: "No subtitles, no text overlay, no captions"

Step 2: Consider Cost

Video generation is significantly more expensive than images:

Model	Cost per Second	8-Second Video
`veo-3.1-generate-001`	$0.40	$3.20
`veo-3.1-fast-generate-001`	$0.15	$1.20
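The per-second rates in the table can be turned into a quick estimate for any supported duration. The sketch below assumes cost scales linearly with duration, which is how the table's 8-second figures are derived; `estimate_cost` is a hypothetical helper, not a CLI feature.

```shell
# Hypothetical helper: estimates cost from the per-second rates listed
# in the table. Rates are kept in cents so integer arithmetic suffices.
estimate_cost() {
  model="$1"; seconds="$2"
  case "$model" in
    veo-3.1-generate-001)      rate_cents=40 ;;
    veo-3.1-fast-generate-001) rate_cents=15 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  total=$((rate_cents * seconds))
  printf '$%d.%02d\n' $((total / 100)) $((total % 100))
}

estimate_cost veo-3.1-generate-001 8        # prints $3.20
estimate_cost veo-3.1-fast-generate-001 4   # prints $0.60
```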

Development workflow:

Iterate with --video-fast --no-audio (cheapest)
Test with --video-fast (add audio when needed)
Final render with default model (premium quality)

Step 3: Generate

bash

nano-banana --video "your detailed prompt here"

Generation takes 2-4 minutes. Progress is shown in the terminal.

Step 4: Iterate

If the result isn't right:

Refine camera movement - Be more explicit (e.g., "slow dolly-in over 8 seconds")
Add negative guidance - Describe what to avoid
Simplify - Focus on one main action per clip
Try different duration - 4s or 6s may work better for quick actions

Commands

Text-to-Video

bash

nano-banana --video "<prompt>"

Image-to-Video (Animation)

bash

nano-banana --video "<motion description>" --file <input-image>

The motion description should describe how the image should animate:

"The character slowly turns their head and smiles"
"The scene comes alive with subtle wind movement"
"Zoom out to reveal the full landscape"

Options

Option	Description	Default
`--video`	Enable video mode	(required)
`--video-model <name>`	Veo model to use	veo-3.1-generate-001
`--video-fast`	Use fast/cheap model	(premium model)
`--duration <sec>`	4, 6, or 8 seconds	8
`--aspect <ratio>`	16:9 or 9:16	16:9
`--resolution <res>`	720p, 1080p, or 4K	1080p
`--audio`	Generate audio	(enabled)
`--no-audio`	Disable audio	-
`--seed <number>`	Reproducibility seed	(random)
`--output <file>`	Output path	output/video-.mp4
`--file <image>`	Input image to animate	-
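Because each render costs money, it can help to check option values against the table before invoking the CLI. The following is a minimal sketch; `validate_video_options` is a hypothetical wrapper-side check, and whether the CLI performs its own validation is not stated here.

```shell
# Hypothetical pre-flight check: accepts only the values listed in the
# options table for --duration, --aspect, and --resolution.
validate_video_options() {
  duration="$1"; aspect="$2"; resolution="$3"
  case "$duration" in
    4|6|8) ;;
    *) echo "invalid duration: $duration (use 4, 6, or 8)" >&2; return 1 ;;
  esac
  case "$aspect" in
    16:9|9:16) ;;
    *) echo "invalid aspect: $aspect (use 16:9 or 9:16)" >&2; return 1 ;;
  esac
  case "$resolution" in
    720p|1080p|4K) ;;
    *) echo "invalid resolution: $resolution (use 720p, 1080p, or 4K)" >&2; return 1 ;;
  esac
}

validate_video_options 8 16:9 1080p && echo "options ok"

# Typical use: only call the CLI when the check passes.
# validate_video_options 6 9:16 720p &&
#   nano-banana --video "A cat playing" --duration 6 --aspect 9:16 --resolution 720p
```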

Camera Movement Reference

Use these terms for precise camera control:

Movement	Description	Example Prompt
Static	No movement	"Static shot on tripod. A coffee cup steaming..."
Pan	Horizontal rotation	"Slow pan left across the city skyline..."
Tilt	Vertical rotation	"Tilt down from face to hands..."
Dolly In	Camera moves closer	"Slow dolly-in from medium to close-up..."
Dolly Out	Camera moves away	"Dolly-out revealing the vast landscape..."
Tracking	Parallel to subject	"Tracking shot following character walking..."
Crane	Sweeping vertical	"Crane shot ascending from ground level..."
Handheld	Realistic shake	"Handheld camera, documentary style..."

Important: Use ONE primary movement per shot. Don't combine multiple movements.

Dialogue Formatting

For spoken dialogue, use the colon format:

Character description says: "Exact dialogue here."

Example:

"A friendly young woman, excited and cheerful, says: 'Welcome to our store!'
Standing in bright retail environment. Natural lip-sync. No subtitles."

Guidelines:

Keep dialogue to 6-12 words for 8 seconds
Describe the speaker's tone and emotion
Always add "No subtitles, no text overlay, no captions"
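The 6-12 word guideline for an 8-second clip is easy to check mechanically before generating. The sketch below counts whitespace-separated words with `wc -w`; `check_dialogue_length` is a hypothetical helper, and the sample line is illustrative.

```shell
# Hypothetical helper: checks that a line of dialogue falls within the
# 6-12 word guideline for an 8-second clip.
check_dialogue_length() {
  # tr strips the padding some wc implementations add around the count.
  words=$(printf '%s' "$1" | wc -w | tr -d '[:space:]')
  if [ "$words" -ge 6 ] && [ "$words" -le 12 ]; then
    echo "ok: $words words"
  else
    echo "outside the 6-12 word range: $words words"
    return 1
  fi
}

check_dialogue_length "Welcome to our store, come on in!"   # prints ok: 7 words
```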

Audio Design

Structure audio in layers:

Dialogue (highest priority) - Always clear
Sound Effects - Specific, timed actions
Ambient - 3-5 background elements max
Music - Lowest priority, "ducks under dialogue"

Example:

"Sound effects: Door closing at 2-second mark, footsteps on wood.
Ambient sounds: Quiet office hum, distant typing.
Background music: Soft jazz, low volume, ducks under dialogue."
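The limit of five ambient elements can be checked before the layer goes into a prompt. This sketch assumes ambient elements are written as a comma-separated list, as in the example above; `check_ambient_count` is a hypothetical helper.

```shell
# Hypothetical helper: counts comma-separated ambient elements and flags
# lists with more than five entries.
check_ambient_count() {
  # Split on commas, then count the pieces that contain any non-space text.
  count=$(printf '%s\n' "$1" | tr ',' '\n' | grep -c '[^[:space:]]')
  if [ "$count" -le 5 ]; then
    echo "ok: $count ambient elements"
  else
    echo "too many ambient elements: $count (max 5)"
    return 1
  fi
}

check_ambient_count "quiet office hum, distant typing"   # prints ok: 2 ambient elements
```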

Best Practices

For Better Results

Front-load important info - Camera, subject, action first
Use cinematic terms - "35mm lens", "shallow depth of field", "golden hour"
Be specific about lighting - "Soft window light from left", not just "good lighting"
Describe the mood - "Intimate", "epic", "suspenseful", "uplifting"
Include negative guidance - What to avoid

For Image-to-Video

Match the image - Describe motion that fits what's in the image
Start subtle - Small movements work better than dramatic changes
Keep lighting consistent - Don't describe lighting changes that differ from the image

For Consistency Across Shots

When creating multiple related videos:

Create a character description and reuse it exactly
Keep lighting style consistent
Use the same camera movement style family
Use --seed for more reproducible results
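One way to apply these points is to keep the character description and seed in variables and build every shot's prompt from them. The sketch below is a dry run that only prints the commands; `shot_prompt` is a hypothetical helper, and the character text and seed value are illustrative.

```shell
# Reuse one character description and one seed across related shots.
character="A woman in her 30s, shoulder-length wavy black hair, green jacket"
seed=42

# Hypothetical helper: $1 = camera movement, $2 = action.
shot_prompt() {
  printf '%s. %s, %s. No subtitles, no text overlay, no captions.' \
    "$1" "$character" "$2"
}

for action in "walks through a sunlit park" "sits on a bench and smiles"; do
  prompt=$(shot_prompt "Slow dolly-in shot" "$action")
  # echo makes this a dry run; remove it to actually generate (and incur cost).
  echo nano-banana --video "\"$prompt\"" --seed "$seed"
done
```

Because the description lives in one variable, a wording change propagates to every shot instead of drifting between prompts.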

Troubleshooting

"Video generation timeout"

Generation can take 2-4 minutes
If persistent, try simpler prompts
Use --video-fast for faster generation

Poor quality or wrong content

Add more specific descriptions
Include negative guidance
Try the premium model instead of fast

Subtitles appearing in video

Always include "No subtitles, no text overlay, no captions" in prompt
Veo was trained on videos with subtitles and tends to add them
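Since the suffix is easy to forget, a wrapper can append it whenever it is missing. This is a minimal sketch; `ensure_no_subtitles` is a hypothetical helper, and the match is a plain, case-sensitive substring test.

```shell
# Hypothetical helper: appends the no-subtitles suffix unless the prompt
# already contains it (case-sensitive substring match).
ensure_no_subtitles() {
  suffix="No subtitles, no text overlay, no captions"
  case "$1" in
    *"$suffix"*) printf '%s\n' "$1" ;;
    *)           printf '%s %s.\n' "$1" "$suffix" ;;
  esac
}

ensure_no_subtitles "A cat playing in a garden."
# prints: A cat playing in a garden. No subtitles, no text overlay, no captions.
```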

Audio doesn't match video

Be more specific about when sounds occur
Use "Sound effect: X at Y-second mark"
Simplify audio layers (fewer elements)

Safety filter rejection

Avoid violence, weapons, explicit content
Rephrase ambiguous terms
Try more generic descriptions

Cost Optimization

bash

# Development (cheapest): ~$1.20 per 8-second video
nano-banana --video "test prompt" --video-fast --no-audio --resolution 720p

# Testing with audio: ~$1.20 per 8-second video
nano-banana --video "test prompt" --video-fast

# Production quality: ~$3.20 per 8-second video
nano-banana --video "final prompt" --resolution 1080p

Example Prompts

See the examples/ directory for complete prompt examples:

cinematic-shots.md - Camera movements
dialogue-and-audio.md - Speech and sound
image-to-video.md - Animating images

Environment Setup

Ensure GEMINI_API_KEY is set:

bash

export GEMINI_API_KEY="your-api-key-here"

Or create a .env file in your project:

GEMINI_API_KEY=your-api-key-here
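A script that wraps the CLI can fail early when the key is missing from the environment. The sketch below checks only the exported variable and does not read a .env file; `require_gemini_key` is a hypothetical helper.

```shell
# Hypothetical guard: returns non-zero when GEMINI_API_KEY is unset or
# empty in the current environment. It does not look at any .env file.
require_gemini_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set" >&2
    return 1
  fi
}

# Typical use: only call the CLI when the key is present.
# require_gemini_key && nano-banana --video "A sunset over mountains"
```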

Maintainer

The-Focus-AI Core maintainer

Source details

Full Name: The-Focus-AI/nano-banana-cli
Branch: main
Path in repo: skills/nano-banana-videogen
Topics: ai cli nodejs gemini-api batch-processing exif image-processing
