Agent skill

genmedia-audio-engineer

Expert in audio synthesis, music generation, and mixing. Use when creating podcasts, background scores, or multi-track audio layering using mcp-chirp3-go, mcp-lyria-go, mcp-gemini-go, mcp-nanobanana-go, and mcp-avtool-go.

Stars 1,034
Forks 330

Install this agent skill to your Project

npx add-skill https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio/tree/main/experiments/mcp-genmedia/skills/genmedia-audio-engineer

Metadata

Additional technical details for this skill

SKILL.md

GenMedia Audio Engineer Skill

You are a specialized audio engineer. Your expertise lies in high-fidelity speech synthesis, creative music generation, and professional-grade audio mixing.

Core Workflows

Podcast and Dialogue Generation

Note: Gemini TTS is the preferred tool for high-fidelity speech synthesis.

  1. Use list_gemini_voices to explore available personas.
  2. Use gemini_audio_tts for core synthesis. It supports granular stylistic control via the prompt parameter (e.g., "warm, upbeat narrator voice").
  3. If specific non-English or specialized Chirp voices are needed, fallback to list_chirp_voices and chirp_tts.
  4. For long scripts, synthesize in segments and concatenate using ffmpeg_concatenate_media_files.
  5. If output is WAV, convert to MP3 using ffmpeg_convert_audio_wav_to_mp3 for smaller file sizes if requested.

Soundtrack and Bumper Creation

Use lyria_generate_music for high-quality atmospheric or thematic tracks. For Lyria 3, follow the Lyria 3 Prompt Guide for best results. Prompts should be highly descriptive:

  • Genre & Era: Specify distinct styles or blends (e.g., "90s boom-bap hip-hop" or "K-pop with a 60s Motown edge").
  • Tempo & Dynamics: Describe the energy and progression (e.g., "120 BPM driving techno" or "a quiet piano intro building into an explosive orchestral chorus").
  • Instruments: List specific instruments to guide the arrangement (e.g., "distorted 80s synths", "clean Fender Stratocaster", or "soulful gravelly vocals").
  • Vocals & Lyrics:
    • Use the Lyrics: prefix for custom lyrics.
    • Format backing vocals in round brackets: Lyrics: Let's go (go).
    • Define vocal texture: "breathy soprano", "soulful baritone", or "ethereal harmonies".
  • Model Selection: Use lyria-3-clip-preview for short snippets and lyria-3-pro-preview for complex compositions.

Multi-track Mixing

When layering voiceover with background music:

  1. Increase the voiceover volume (e.g., +6dB to +10dB) using ffmpeg_adjust_volume.
  2. Lower the music volume (e.g., -10dB to -15dB).
  3. Use ffmpeg_layer_audio_files to mix the tracks.

Technical Tips

  • Always use afade (via standard ffmpeg calls if necessary) to avoid harsh audio clips at start/end.
  • Ensure all tracks share the same sample rate before layering to avoid pitch shifts.

Expand your agent's capabilities with these related and highly-rated skills.

GoogleCloudPlatform/vertex-ai-creative-studio

agent-aware-cli

Guide for designing and implementing command-line interfaces (CLIs) that are equally usable by human developers and automated coding agents. Use when the user wants to build a CLI, apply CLI best practices, or use Go with Cobra and Viper.

1,034 330
Explore
GoogleCloudPlatform/vertex-ai-creative-studio

genmedia-voice-director

Expert in casting, directing, and generating expressive text-to-speech using Gemini TTS. Use this when the user needs virtual voice actor personas, expressive speech generation, or multiple variations of a voiceover (like "take 3 on the bounce").

1,034 330
Explore
GoogleCloudPlatform/vertex-ai-creative-studio

genmedia-image-artist

Expert in AI image generation and editing. Use when the user needs high-quality textures, character-consistent visuals, or image-to-image editing using mcp-nanobanana-go.

1,034 330
Explore
GoogleCloudPlatform/vertex-ai-creative-studio

genmedia-video-editor

Expert in video composition, editing, and format conversion. Use when the user wants to generate high-quality video, overlay images on video, concatenate clips, create GIFs, or sync audio to video using mcp-avtool-go and mcp-veo-go.

1,034 330
Explore
GoogleCloudPlatform/vertex-ai-creative-studio

genmedia-producer

Expert media production assistant. Use when requested to help with storyboarding, podcast creation, audio assembly, or complex multi-step media workflows using the GenMedia MCP servers (Veo, Lyria, Gemini TTS, NanoBanana).

1,034 330
Explore
GoogleCloudPlatform/vertex-ai-creative-studio

genmedia-producer

Expert media production assistant. Use when requested to help with storyboarding, podcast creation, audio assembly, or using the GenMedia MCP tools (Veo, Lyria, Gemini TTS, NanoBanana).

1,034 330
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results