Agent skill

image-generation

Guide to image generation and editing in MassGen. Use when creating images, editing existing images, iterating on image designs, or choosing between image backends (OpenAI, Google Gemini/Imagen, Grok, OpenRouter).


Install this agent skill to your Project

npx add-skill https://github.com/massgen/MassGen/tree/main/massgen/skills/image-generation

Image Generation

Generate images using `generate_media` with `mode="image"`. The system auto-selects the best backend based on available API keys.

Quick Start

python

# Simple text-to-image (auto-selects backend)
generate_media(prompt="A cat in space", mode="image")

# Specify backend and quality
generate_media(prompt="A logo for a coffee shop", mode="image",
               backend_type="openai", quality="high")

# Batch generation (parallel)
generate_media(prompts=["sunset over ocean", "mountain landscape", "city at night"],
               mode="image", max_concurrent=3)

Backend Comparison

| Backend | Default Model | Strengths | API Key |
|---|---|---|---|
| Google (priority 1) | `gemini-3.1-flash-image-preview` (Nano Banana 2) | Fast, flexible sizes, image editing, multi-turn | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
| OpenAI (priority 2) | `gpt-5.4` | High quality, transparent backgrounds, continuation via response ID | `OPENAI_API_KEY` |
| Grok (priority 3) | `grok-imagine-image` | 1k resolution, continuation via stored data URI | `XAI_API_KEY` |
| OpenRouter (priority 4) | `google/gemini-3.1-flash-image-preview` | Access to multiple models via single API | `OPENROUTER_API_KEY` |
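The priority column can be read as a first-match lookup over whichever API keys are set. The following is a minimal sketch of that idea, not MassGen's actual selection code: `BACKEND_PRIORITY` and `pick_backend` are hypothetical names, and only the ordering and the environment variable names come from the table above.

```python
import os

# Priority order and key names are taken from the backend table.
# BACKEND_PRIORITY and pick_backend are hypothetical, for illustration only.
BACKEND_PRIORITY = [
    ("google", ("GOOGLE_API_KEY", "GEMINI_API_KEY")),
    ("openai", ("OPENAI_API_KEY",)),
    ("grok", ("XAI_API_KEY",)),
    ("openrouter", ("OPENROUTER_API_KEY",)),
]


def pick_backend(env=None):
    """Return the first backend whose API key is set, or None if none is."""
    env = os.environ if env is None else env
    for backend, key_names in BACKEND_PRIORITY:
        # A key counts as available only if it is present and non-empty.
        if any(env.get(name) for name in key_names):
            return backend
    return None
```

Under this reading, a machine with only `OPENAI_API_KEY` and `XAI_API_KEY` set would land on OpenAI unless `backend_type` forces something else.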

Key Parameters

| Parameter | Description | Example |
|---|---|---|
| `prompt` | Text description of the image | `"A watercolor painting of mountains"` |
| `backend_type` | Force a specific backend | `"google"`, `"openai"`, `"grok"`, `"openrouter"` |
| `model` | Override default model | `"gemini-3-pro-image-preview"` for studio quality |
| `quality` | Image quality (OpenAI) | `"low"`, `"medium"`, `"high"`, `"auto"` |
| `size` | Image dimensions | See backends reference |
| `aspect_ratio` | Aspect ratio | `"16:9"`, `"1:1"`, `"4:5"` |
| `input_images` | Source images for image-to-image editing | `["photo.jpg"]` |
| `continue_from` | Continuation ID for multi-turn editing | `result["continuation_id"]` |

Image-to-Image Editing

Transform existing images by providing `input_images`:

python

generate_media(
    prompt="Make it look like a watercolor painting",
    mode="image",
    input_images=["photo.jpg"]
)

Supported backends for image-to-image: Google (Gemini), OpenAI, Grok. If your current backend doesn't support image-to-image, the system auto-selects one that does.

Multi-Turn Editing (Continuation)

Iteratively refine images using `continue_from`:

python

# First generation
result = generate_media(prompt="A logo for a coffee shop", mode="image")

# Refine using the continuation ID
result2 = generate_media(
    prompt="Make the text larger and add a cup icon",
    mode="image",
    continue_from=result["continuation_id"]
)
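For more than one round of refinement, the same pattern can be wrapped in a loop. This is a sketch under two assumptions: that every result, not just the first, carries a `continuation_id`, and that a hypothetical helper named `refine` receives the generator as a callable so it can be tried without a real backend.

```python
def refine(generate, first_prompt, follow_ups):
    """Generate an image, then apply each follow-up prompt as a continuation.

    `generate` is any callable that accepts generate_media-style keyword
    arguments and returns a dict with a "continuation_id" key (assumed).
    """
    result = generate(prompt=first_prompt, mode="image")
    for prompt in follow_ups:
        # Each step continues from the result of the previous step.
        result = generate(
            prompt=prompt,
            mode="image",
            continue_from=result["continuation_id"],
        )
    return result
```

In an agent session this would be called as `refine(generate_media, "A logo for a coffee shop", ["Make the text larger", "Add a cup icon"])`.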

Each backend uses a different continuation mechanism:

- OpenAI: Passes `previous_response_id` (stateless)
- Google Gemini: In-memory chat store (LRU, 50 items)
- Grok: In-memory data URI store (LRU, 50 items)

Continuation works only for single-image generation, not for batch generation.
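To make the "LRU, 50 items" note concrete, here is a minimal sketch of a conventional LRU store. It is an illustration only and not MassGen's implementation; `ContinuationStore` and its methods are hypothetical names. If the Gemini and Grok stores behave like this, a continuation ID could stop resolving once 50 newer entries have been stored, or when the process restarts, since the data lives in memory.

```python
from collections import OrderedDict


class ContinuationStore:
    """Hypothetical LRU store holding at most `capacity` continuation entries."""

    def __init__(self, capacity=50):
        self.capacity = capacity
        self._items = OrderedDict()

    def put(self, continuation_id, state):
        self._items[continuation_id] = state
        self._items.move_to_end(continuation_id)  # mark as most recently used
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, continuation_id):
        if continuation_id not in self._items:
            return None  # evicted or never stored
        self._items.move_to_end(continuation_id)  # a read also counts as use
        return self._items[continuation_id]
```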

Google: Gemini vs Imagen

Google supports two API paths. Gemini (Nano Banana 2) is the default and recommended for most use cases. Imagen is only needed for advanced reference-image editing features.

- Gemini models (`gemini-*`) use `generate_content()`: text-to-image, image editing via `input_images`, multi-turn continuation
- Imagen models (`imagen-*`) use `generate_images()` / `edit_image()`: text-to-image with `negative_prompt`/`seed`/`guidance_scale`, plus style transfer, control editing, and subject consistency via reference images

For studio-quality precision and text rendering, use `model="gemini-3-pro-image-preview"` (Pro-tier).
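The split between the two Google paths follows the model-name prefix. A small sketch of that routing rule, assuming the prefix alone decides the path; `google_api_path` is a hypothetical helper and not part of MassGen:

```python
def google_api_path(model):
    """Map a Google model name to the API path its prefix implies."""
    if model.startswith("gemini-"):
        return "generate_content"
    if model.startswith("imagen-"):
        # Text-to-image path; reference-image editing would use edit_image().
        return "generate_images"
    raise ValueError(f"Unrecognized Google image model: {model!r}")
```

Both `gemini-3.1-flash-image-preview` and `gemini-3-pro-image-preview` take the `generate_content` path under this rule.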

Need More Control?

- Per-backend sizes, quality options, and quirks: See `references/backends.md`
- Complete `extra_params` reference: See `references/extra_params.md`
- Advanced editing (inpainting, style transfer, control, subject): See `references/editing.md`

Maintainer

massgen Core maintainer

Source details

Full Name: massgen/MassGen
Branch: main
Path in repo: massgen/skills/image-generation
License: Other
Topics: agent cli llm model-context-protocol python agentic-ai autonomous-agents multi-agent llm-orchestration genai generative-ai collaborative-ai conversational-ai terminal-ui test-time-scaling tool-calling


