Nano Banana API (Gemini): Complete Developer Guide
Nano Banana is the nickname for Gemini’s native image generation capability. If you’ve seen people talk about “Nano Banana API,” they usually mean: “How do I generate or edit images using Gemini models that return actual images in the response?” This guide explains the two Nano Banana models, how to call them via the Gemini API, how to handle mixed TEXT + IMAGE outputs, and how to ship this reliably in production with safety, cost control, and good UX.
What is Nano Banana API?
“Nano Banana API” is an informal name used by developers and creators to describe Gemini’s built-in image generation and editing functionality exposed through the Gemini API (Google AI for Developers). Unlike older “text-only” chat endpoints, Nano Banana calls can return actual image bytes (often Base64-encoded) inside the model’s output. This enables a conversational workflow where you can:
- Generate images from text (text-to-image).
- Edit an existing image using natural language instructions.
- Blend or compose multiple images to create consistent scenes or product shots (depending on model).
- Iterate in a single conversation: generate → tweak → regenerate → refine.
If you’re building a product, Nano Banana is most valuable when you need both: (1) a developer API that can be integrated into workflows, and (2) a model that follows instructions well enough to behave like a creative “tool” rather than a random image slot machine.
Models: Nano Banana vs Nano Banana Pro
In the Gemini API, “Nano Banana” refers to two image-capable models. The easiest way to think about them is: speed vs fidelity and instruction-following.
| Brand name | Gemini model ID | Best for | Typical output |
|---|---|---|---|
| Nano Banana | gemini-2.5-flash-image | High-volume, low-latency image generation, fast iterations, UI previews, “good enough” creative outputs at speed. | Images at 1024px resolution (common default), plus optional text parts. |
| Nano Banana Pro (Preview) | gemini-3-pro-image-preview | Professional asset production, complex instruction following, better text rendering and composition planning. | Up to higher resolutions (including 2K/4K settings depending on config), plus optional text parts. |
How to choose the right model
Choose gemini-2.5-flash-image when you need speed, lots of images, and predictable latency. It’s especially good for apps where users generate multiple drafts and then pick a favorite.
Choose gemini-3-pro-image-preview when you need stricter instruction following: “Put the product on a shelf, include a readable label, render a barcode, keep the background minimal, match the lighting,” and so on. This model is typically better for e-commerce assets, marketing mockups, studio-quality compositions, and anything where details matter.
What you can build with Nano Banana API
Nano Banana is more than “generate a pretty picture.” When you integrate it into a real app, it becomes a reusable visual tool: a function that turns structured intent into usable assets. Here are high-value product use cases.
1) Creator tools (social, marketing, and design)
- Thumbnail & cover generation: You provide title + topic + style and generate a set of variations.
- Brand kits: Prompt templates that keep consistent colors, tone, lighting, and framing across assets.
- Sticker/icon packs: Generate small UI assets with consistent 3D style, outlines, or flat iconography.
- Ad creatives: Generate multiple compositions for A/B testing (within policy and licensing constraints).
2) Product photography and e-commerce assets
- Product shots: A clean studio shot with controlled lighting and consistent angles.
- Background replacements: Keep the product the same, change the setting (e.g., “kitchen counter,” “outdoor lifestyle”).
- Bundle compositions: Place multiple items together in a visually coherent arrangement.
- Localized packaging mockups: When allowed, generate language-specific layouts and variants.
3) Editorial and storytelling workflows
- Storyboards: Generate a sequence of scene frames based on a script or outline.
- Visual explainers: Generate diagram-like images (non-technical), such as "an isometric office" or "a concept illustration."
- Character consistency: Create a character reference and iterate on scenes without constantly redesigning from scratch.
4) Developer tooling and internal automation
- Auto-generated UI mockups: “A dashboard in a minimal style…” for internal brainstorming.
- Documentation visuals: Generate icons and illustrations for docs or help centers.
- Content pipelines: Generate images in bulk based on a CMS schedule and review queue.
The most successful products wrap Nano Banana behind a structured UI and policy layer. Users rarely want to “prompt engineer.” They want reliable outcomes: “make this icon,” “edit this photo,” “generate a product background,” and “keep the subject consistent.”
When to choose Nano Banana vs Imagen
Gemini also offers Imagen (a specialized image generation model family) through the Gemini API. A practical rule of thumb:
- Nano Banana (Gemini image models): Best when you want a conversational, instruction-following model that can reason through complex edits and return images as part of a multimodal response.
- Imagen: Often preferred when you want a dedicated image model optimized for certain image-generation use cases, especially when you don’t need the “conversational” multimodal tool behavior.
If your product is “image generation as a tool inside a larger assistant,” Nano Banana is usually the better fit. If your product is “high-throughput image generation pipeline,” you may test both and pick the one that matches quality/cost/latency goals.
Authentication (Gemini API key)
Nano Banana calls are made through the Gemini API. For REST requests, you typically send your API key using: x-goog-api-key: YOUR_KEY. If you use official SDKs, authentication can be handled by the client library based on your environment.
Recommended environment variables
```shell
GEMINI_API_KEY="..."
NANO_BANANA_MODEL_FAST="gemini-2.5-flash-image"
NANO_BANANA_MODEL_PRO="gemini-3-pro-image-preview"
APP_PUBLIC_BASE_URL="https://yourapp.com"
DEFAULT_ASPECT_RATIO="1:1"
DEFAULT_IMAGE_SIZE="2K"  # for the pro model, if supported by your workflow
```
Endpoints & request shape
A common REST pattern for image generation is calling :generateContent on a model endpoint. Example base endpoint:
```
POST https://generativelanguage.googleapis.com/v1beta/models/{MODEL_ID}:generateContent
```
Headers:
```
x-goog-api-key: $GEMINI_API_KEY
Content-Type: application/json
```
In a minimal text-to-image request, you send a contents array with a text part. The model can respond with both text parts and image parts. In many SDK examples, you loop through response parts and detect whether a part is text or inline image bytes.
Understanding the response (TEXT + IMAGE)
Gemini responses are structured as candidates containing a content object with parts. A “part” can be:
- Text (helpful for instructions, captions, or explanations), and/or
- Inline image data (often base64 in JSON, or raw bytes in some SDK helpers).
In production, treat every response as potentially mixed: a model might return a short text note plus an image, or multiple images, or in rare cases text only (for example, if an image request is blocked by policy and the model instead explains why).
Robust response-handling checklist
- Always iterate through parts; do not assume the image is always “the first part.”
- Handle multiple images (store each with an index).
- Store the text output (useful for logs, captions, or debugging), but do not expose internal reasoning.
- If an image is missing, surface a user-friendly error and allow retry or prompt adjustment.
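The checklist above can be sketched as a small parser over the REST response shape (assuming the `candidates → content → parts` JSON structure, with inline images carried as base64 `inlineData`):

```python
import base64

def extract_parts(response: dict) -> tuple[list[str], list[bytes]]:
    """Walk every part of every candidate; never assume the image is first."""
    texts: list[str] = []
    images: list[bytes] = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                texts.append(part["text"])
            elif "inlineData" in part:
                # Inline image bytes arrive base64-encoded in JSON.
                images.append(base64.b64decode(part["inlineData"]["data"]))
    return texts, images
```

If `images` comes back empty, treat it as a safe failure: log the text (which may explain a policy block) and offer the user a retry.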
Quickstart (Python / JavaScript / REST)
Below are practical quickstarts you can adapt into your app. They follow the same pattern: choose a model ID, send a prompt, and save the returned image.
Python (SDK-style)
```python
from google import genai

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[prompt],
)

# Response can include both text and image parts.
for part in response.parts:
    if getattr(part, "text", None) is not None:
        print(part.text)
    elif getattr(part, "inline_data", None) is not None:
        image = part.as_image()  # SDK helper; returns a PIL image
        image.save("nano_banana.png")
```
JavaScript (Node.js)
```javascript
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme";

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: prompt,
});

for (const part of response.candidates[0].content.parts) {
  if (part.text) console.log(part.text);
  if (part.inlineData?.data) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("nano_banana.png", buffer);
    console.log("Saved nano_banana.png");
  }
}
```
REST (curl)
```shell
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"}
      ]
    }]
  }'
```
Text-to-image (how to get consistently good results)
Text-to-image is the “hello world” of Nano Banana. But shipping a reliable product requires more than a prompt box. The best approach is to provide structured controls and generate a high-quality prompt behind the scenes.
What a strong prompt includes
- Subject: the main object/person/scene (e.g., “a minimalist smartwatch on a table”).
- Setting: environment details (studio, outdoors, cafe, office, nature).
- Lighting: softbox, golden hour, neon, dramatic rim light.
- Camera framing: macro close-up, wide shot, isometric, top-down.
- Style constraints: “no text,” “clean background,” “photorealistic,” or “flat vector style.”
Prompt templates (copy/paste)
1) Studio product shot:
"Professional studio photo of [PRODUCT], centered on a clean white background, softbox lighting, crisp reflections, premium commercial look, high detail, no text, no watermark."
2) App icon / sticker:
"A cute [SUBJECT] icon on a white background, colorful tactile 3D style, clean outline, high contrast, no text."
3) Cinematic scene:
"Cinematic photo of [SUBJECT] in [LOCATION] at [TIME OF DAY], shallow depth of field, subtle film grain, realistic lighting, natural composition, no text overlays."
In a SaaS product, you can keep a consistent “brand prompt prefix” and append user intent. This yields more predictable output and fewer support tickets.
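A minimal sketch of that pattern; the brand prefix and constraint strings below are hypothetical placeholders for what would come from a workspace's brand-kit settings:

```python
# Hypothetical brand prefix; in a real product this comes from stored settings.
BRAND_PREFIX = (
    "Consistent brand look: warm studio lighting, muted pastel palette, "
    "clean minimal composition."
)

DEFAULT_CONSTRAINTS = "No text, no watermark, no logos in-scene."

def build_prompt(user_intent: str, allow_text: bool = False) -> str:
    """Wrap raw user intent with a stable brand prefix and default constraints."""
    parts = [BRAND_PREFIX, user_intent.strip()]
    if not allow_text:
        parts.append(DEFAULT_CONSTRAINTS)
    return " ".join(parts)
```

Every generation then goes through `build_prompt`, so users type only their intent while the brand look stays constant.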
Image editing (instructed edits)
A huge reason developers choose Nano Banana is that it can edit images using natural language instructions. The common workflow looks like this:
- User uploads an image (you store it securely).
- User types an instruction: “Change the background to a cozy cafe,” or “Make it sunset lighting,” or “Remove the logo.”
- Your backend sends the original image plus the instruction to Gemini.
- The response returns an edited image (plus optional text notes).
Editing guidance that improves success rate
- Be specific: “Replace the wall with a white brick wall” beats “make it nicer.”
- One change at a time: do multiple edits in steps instead of one mega-instruction.
- Protect the subject: include constraints like “keep the product unchanged” or “preserve face identity.”
- Avoid conflicting instructions: “dark night scene” + “bright daylight lighting” creates randomness.
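The edit workflow can be sketched with the google-genai Python SDK, which accepts a PIL image directly inside `contents`; the protective-constraint helper here is illustrative, not an official API:

```python
def build_edit_contents(image, instruction: str,
                        constraints=("keep the subject unchanged",)) -> list:
    """One instruction per call, plus explicit protective constraints."""
    suffix = " ".join(f"Constraint: {c}." for c in constraints)
    return [image, f"{instruction.rstrip('.')}. {suffix}"]

def edit_image(client, image_path: str, instruction: str,
               model: str = "gemini-2.5-flash-image"):
    from PIL import Image  # lazy import: only needed for a real API call
    response = client.models.generate_content(
        model=model,
        contents=build_edit_contents(Image.open(image_path), instruction),
    )
    for part in response.parts:
        if getattr(part, "inline_data", None) is not None:
            return part.inline_data.data  # edited image bytes
    return None  # no image returned: surface a friendly error and allow a retry
```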
Multi-image & composition workflows
For advanced use cases—like combining multiple photos into a single composition—Nano Banana Pro (Preview) is often the better choice. Multi-image workflows are powerful in marketing and design: you can ask the model to create a scene that includes multiple objects, maintain consistency across people, or blend content in a realistic way.
In real products, you usually need “guardrails” around multi-image inputs:
- Limit the number of images per request for latency and cost.
- Resize inputs consistently and strip metadata when appropriate.
- Apply policy checks to user uploads (especially photos of people).
- Provide a review step before publishing or sharing the output.
Practical examples
- Product bundle: Combine product A + product B into a clean studio shot with matching lighting.
- Before/after: Keep the same scene but change style, lighting, and “mood.”
- Campaign set: Generate 10 images with consistent character identity and style for a marketing carousel.
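The input-side guardrails above can be enforced before the request ever reaches the model; a minimal sketch (the image cap is an assumed product policy, not an API limit):

```python
MAX_INPUT_IMAGES = 3  # guardrail: bound latency and cost per request

def build_compose_contents(images: list, prompt: str) -> list:
    """Reference images first, then the composition instruction."""
    if not images:
        raise ValueError("at least one input image is required")
    if len(images) > MAX_INPUT_IMAGES:
        raise ValueError(f"at most {MAX_INPUT_IMAGES} input images per request")
    return [*images, prompt]
```

The returned list is passed as `contents=` to a `generateContent` call, typically against the pro preview model for multi-image composition.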
imageConfig: aspectRatio & imageSize
Gemini’s image generation supports an imageConfig object inside your generation config. Two of the most useful controls are:
- aspectRatio — e.g., "1:1", "16:9", "9:16".
- imageSize — e.g., "2K" (often used with the pro preview model), plus higher options depending on model support.
JavaScript example with imageConfig
```javascript
const response = await ai.models.generateContent({
  model: "gemini-3-pro-image-preview",
  contents: "A premium product photo of a modern black smartwatch on a shelf in a designer store. No other text.",
  config: {
    imageConfig: {
      aspectRatio: "16:9",
      imageSize: "2K",
    },
  },
});
```
REST example with imageConfig
```shell
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Create a clean isometric photo of a modern office interior, perfectly aligned, no text."}
      ]
    }],
    "generationConfig": {
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }'
```
Tokens & resolution table (practical budgeting)
A key part of integrating Nano Banana in production is predicting cost and performance. The official docs include a helpful table showing how token usage can map to aspect ratio and resolution. Even if your billing plan varies, this is useful for internal budgeting and UI estimation.
| Aspect ratio | 1K resolution | 1K tokens | 2K resolution | 2K tokens | 4K resolution | 4K tokens |
|---|---|---|---|---|---|---|
| 1:1 | 1024×1024 | 1120 | 2048×2048 | 1120 | 4096×4096 | 2000 |
| 2:3 | 848×1264 | 1120 | 1696×2528 | 1120 | 3392×5056 | 2000 |
| 3:2 | 1264×848 | 1120 | 2528×1696 | 1120 | 5056×3392 | 2000 |
| 3:4 | 896×1200 | 1120 | 1792×2400 | 1120 | 3584×4800 | 2000 |
How do you use this table in a product? You translate it into a “cost estimate” UI: if a user picks 4K, show that it consumes more tokens (and therefore more cost) than 1K/2K in many setups. If you sell credits, you can convert tokens into credits and show a friendly “Estimated credits” label.
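A sketch of that estimate, using the token counts from the table above; the tokens-per-credit conversion is a hypothetical product choice, used only for UI display:

```python
import math

# Output-token counts from the table above: 1K and 2K cost the same, 4K costs more.
TOKENS_BY_SIZE = {"1K": 1120, "2K": 1120, "4K": 2000}

# Hypothetical conversion: one credit per 1K image.
TOKENS_PER_CREDIT = 1120

def estimated_credits(image_size: str, count: int = 1) -> int:
    """Round up so the displayed estimate never undercharges."""
    tokens = TOKENS_BY_SIZE[image_size] * count
    return math.ceil(tokens / TOKENS_PER_CREDIT)
```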
SynthID watermark & policy implications
All images generated by Nano Banana models include a SynthID watermark. SynthID is designed to help identify AI-generated content. For developers, this matters in three ways:
- User expectations: If your users want “no watermark,” you should not promise that for Nano Banana outputs.
- Compliance: Watermarking can be important for transparency, elections policies, and media provenance.
- Pipeline design: If your downstream workflow includes re-encoding or heavy processing, you should understand how it affects provenance signals and labels.
Prompting for quality & control (the “ship it” playbook)
Great results in demos often come from a human iterating prompts. Great results in products come from a prompt system: templates, constraints, defaults, and automated checks. Here’s a framework that consistently improves output quality and reduces randomness.
1) Use a structured prompt template
Give your model a predictable format. Instead of a single sentence, build prompts like:
```
[GOAL]
Create a single image that matches the following.

[SUBJECT]
- Main subject: ...
- Keep subject consistent: ...

[SCENE]
- Location: ...
- Background: ...
- Props: ...

[CAMERA & LIGHTING]
- Shot type: ...
- Lighting: ...
- Depth of field: ...

[STYLE]
- Style: ...
- Color palette: ...
- Texture: ...

[CONSTRAINTS]
- No text unless requested
- Clean composition
- No logos/watermarks in-scene
```
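The template above can be rendered from structured UI fields rather than typed by hand; a minimal sketch:

```python
def render_prompt(sections: dict) -> str:
    """Render ordered sections into the bracketed [SECTION] format above."""
    lines = []
    for name, items in sections.items():  # dicts preserve insertion order
        lines.append(f"[{name}]")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)
```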
2) Keep “no text” as a default unless the user wants text
Text rendering can be great with the pro preview model, but it’s still a specialized request. For most product shots and icons, “no text” reduces failure modes and improves consistency. When the user does want text (like a label or magazine cover), switch to Nano Banana Pro and ask for explicit wording, font style, and placement.
3) Use negative prompts sparingly
Negative prompts can help (e.g., “no watermark, no blurry, no distorted face”), but overly long negative prompts sometimes make results worse. Start with a short default set and let power users customize.
4) Provide “variation” controls without forcing users to retype
Variation is a normal part of creative workflows. Implement a “Regenerate variations” button that keeps the same structured settings (model, ratio, style) and only changes the seed or phrasing slightly. This creates a clean UX and keeps results on-brand.
Safety, moderation & compliance
If your product lets users generate or edit images, you need safety controls—both for policy compliance and for product trust. A solid system uses layered safeguards:
Layer 1: Input validation
- Reject extremely long prompts or suspicious repetitive prompts.
- Block disallowed content requests (sexual content involving minors, self-harm imagery, explicit violence, hate symbols, etc.).
- Enforce consent rules for user-uploaded photos of people (especially private individuals).
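A minimal sketch of the validation layer; the length cap, blocklist, and repetition threshold are illustrative defaults only, and a keyword list is no substitute for a real moderation service:

```python
MAX_PROMPT_CHARS = 2000

# Illustrative only; real systems need a proper moderation service.
BLOCKED_TERMS = {"hate symbol", "self-harm"}

def validate_prompt(prompt: str) -> tuple[bool, str]:
    stripped = prompt.strip()
    if not stripped:
        return False, "empty prompt"
    if len(stripped) > MAX_PROMPT_CHARS:
        return False, "prompt too long"
    lowered = stripped.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return False, "disallowed content"
    # Crude repetition check: many repeats of one token is a spam signal.
    words = lowered.split()
    if words and words.count(max(set(words), key=words.count)) > 50:
        return False, "suspicious repetition"
    return True, "ok"
```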
Layer 2: Provider safety filters
Gemini models have built-in safety behavior. In practice, this means some prompts will return explanations instead of images. Your app should treat “no image returned” as a safe failure: show a friendly message and allow the user to adjust the request.
Layer 3: Output review & reporting
- Add a “Report” button for outputs.
- Keep an internal review queue if users can publish publicly through your platform.
- Log prompts and outputs (securely, with access control) for abuse investigation.
Pricing patterns & cost control
Pricing for Nano Banana usage depends on your Gemini API plan and the model used. Regardless of exact pricing, you can build a robust cost-control system with a few proven principles:
Cost-control levers that work
- Resolution gating: Draft at 1K/2K, export at 4K only for paid tiers.
- Model gating: Use Flash Image for drafts, Pro Preview for finals or text-heavy compositions.
- Quota limits: Per-user daily image caps; per-workspace monthly caps.
- Batching: For internal automation, batch similar requests and review results together.
- Template reuse: For campaigns, reuse a “base prompt” and only vary the product name or minor scene details.
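Resolution and model gating can be combined into one routing function; the plan names are hypothetical, and the model IDs are the two from this guide:

```python
FAST_MODEL = "gemini-2.5-flash-image"
PRO_MODEL = "gemini-3-pro-image-preview"

def pick_generation_settings(plan: str, final: bool) -> dict:
    """Draft cheaply on the fast model; reserve the pro model and 4K for paid finals."""
    if not final:
        return {"model": FAST_MODEL}  # fast drafts at the default resolution
    size = "4K" if plan == "paid" else "2K"
    return {"model": PRO_MODEL, "imageSize": size}
```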
How to show pricing honestly in your UI
Users hate surprise costs. The simplest UI is:
- Show: Model + Resolution + Aspect Ratio + Estimated usage.
- Offer: Draft vs Final buttons.
- For teams: add usage dashboards (images generated, estimated tokens, top users).
Rate limits, retries & reliability
Image generation APIs must handle spikes. Even if you don’t hit explicit rate limits, you will see occasional transient errors: timeouts, 429s, or short-lived service disruptions. Design for reliability from day one.
Recommended production defaults
- Backoff retries: Retry 429, 500, 503, and network timeouts using exponential backoff with jitter.
- No retries on validation errors: If the prompt is too long or payload invalid, fix the request.
- Per-user throttles: Prevent one user from spamming thousands of requests and blowing your budget.
- Queue-based generation: For high volume, enqueue jobs and process with controlled concurrency.
A sketch of the retry loop, made runnable (Python):

```python
import random, time

RETRYABLE = {429, 500, 503}

def backoff(attempt):  # exponential backoff with jitter, capped at 30s
    return min(2 ** attempt, 30) + random.random()

def call_with_retries(call_gemini, max_attempts=4):
    for attempt in range(1, max_attempts + 1):
        try:
            resp = call_gemini()
        except TimeoutError:
            time.sleep(backoff(attempt)); continue
        if resp.ok:
            return resp
        if resp.status in RETRYABLE:
            time.sleep(backoff(attempt)); continue
        raise RuntimeError(resp.error)  # 4xx validation error: do not retry
    raise RuntimeError("retries exhausted")
```
Production architecture (what scales safely)
The biggest mistake is calling Gemini directly from a public client with a real API key. The correct architecture is:
Client → Your Backend → Gemini API → Your Backend → Client
Why your backend is essential
- Key security: Keeps your Gemini API key private.
- Abuse prevention: Rate limits, quotas, bot detection, and plan enforcement.
- Cost control: Hard caps and alerts when users spike usage.
- Policy enforcement: Pre-check prompts and uploaded images, block disallowed content.
- Durable storage: Save generated images to your own storage/CDN for consistent access.
Suggested system components
- API gateway (auth, quotas, input validation)
- Generation service (calls Gemini, normalizes responses)
- Queue + workers (controls concurrency for high volume)
- Object storage + CDN (stores output images and serves them fast)
- Database (prompts, settings, costs, user plans, audit logs)
- Moderation tools (review queue, user reporting, enforcement)
Minimal job state machine
| State | Meaning | UI display |
|---|---|---|
| CREATED | Job stored, not processed yet | Preparing… |
| RUNNING | Backend calling Gemini and waiting for response | Generating… |
| STORING | Saving outputs to storage / CDN | Finalizing… |
| SUCCEEDED | Image stored and ready | Ready |
| FAILED | Error occurred | Failed (retry) |
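The table above maps directly onto a transition table; a minimal sketch where any active state may fail and "retry" re-enqueues a failed job:

```python
from enum import Enum

class JobState(str, Enum):
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    STORING = "STORING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"

TRANSITIONS = {
    JobState.CREATED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.STORING, JobState.FAILED},
    JobState.STORING: {JobState.SUCCEEDED, JobState.FAILED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: {JobState.CREATED},  # "retry" re-enqueues the job
}

def advance(state: JobState, nxt: JobState) -> JobState:
    if nxt not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state.value} -> {nxt.value}")
    return nxt
```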
Logging, QA & monitoring
To operate Nano Banana in production, you need observability. The goal is to answer: “Is the system healthy?” and “Why did this request fail?” without guessing.
Metrics to track
- Latency: time from request → response.
- Success rate: percent of requests returning images.
- Safety blocks: how often prompts are refused or filtered.
- Usage: images generated per user/day, token estimates per job, 4K exports per workspace.
- Cost anomalies: spikes by user, IP, or automation scripts.
Quality assurance that actually helps
Create a “golden set” of prompts (and optionally reference images) that represent your product’s core use cases: icons, product shots, backgrounds, character scenes, and any text-rendering scenarios you support. Run this set on a schedule and compare outputs. This catches regressions when models, settings, or templates change.
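A minimal sketch of a golden-set runner; `generate` is whatever callable your pipeline exposes (returning image bytes on success, `None` or an exception on failure):

```python
def run_golden_set(generate, golden_prompts: list) -> list:
    """Run each golden prompt and report which ones regressed (no image)."""
    failures = []
    for prompt in golden_prompts:
        try:
            image = generate(prompt)
        except Exception:
            image = None
        if not image:
            failures.append(prompt)
    return failures
```

Run it on a schedule and alert when the failure list is non-empty, or grows compared to the previous run.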
Official links & resources
- Gemini API image generation docs (Nano Banana): ai.google.dev/gemini-api/docs/image-generation
- Google AI Studio (try models): aistudio.google.com
- Gemini image generation overview (Nano Banana Pro): gemini.google/overview/image-generation
- Gemini API base endpoint (REST): https://generativelanguage.googleapis.com/v1beta/
Changelog
- Initial publication of Nano Banana API guide (models, endpoints, imageConfig, tokens table, SynthID, production patterns).