
Nano Banana API (Gemini): Complete Developer Guide

Nano Banana is the nickname for Gemini’s native image generation capability. If you’ve seen people talk about “Nano Banana API,” they usually mean: “How do I generate or edit images using Gemini models that return actual images in the response?” This guide explains the two Nano Banana models, how to call them via the Gemini API, how to handle mixed TEXT + IMAGE outputs, and how to ship this reliably in production with safety, cost control, and good UX.

Nano Banana refers to two distinct Gemini API models: gemini-2.5-flash-image (fast, efficient) and gemini-3-pro-image-preview (pro preview, more complex instruction following). All generated images include a SynthID watermark.

What is Nano Banana API?

“Nano Banana API” is an informal name used by developers and creators to describe Gemini’s built-in image generation and editing functionality exposed through the Gemini API (Google AI for Developers). Unlike older “text-only” chat endpoints, Nano Banana calls can return actual image bytes (often Base64-encoded) inside the model’s output. This enables a conversational workflow where you can:

  • Generate images from text (text-to-image).
  • Edit an existing image using natural language instructions.
  • Blend or compose multiple images to create consistent scenes or product shots (depending on model).
  • Iterate in a single conversation: generate → tweak → regenerate → refine.

If you’re building a product, Nano Banana is most valuable when you need both: (1) a developer API that can be integrated into workflows, and (2) a model that follows instructions well enough to behave like a creative “tool” rather than a random image slot machine.

Terminology note: The name “Nano Banana” can appear in many third-party sites. In official Gemini docs, it specifically maps to the Gemini image models and their model IDs. For production, treat model IDs and capabilities from official docs as the source of truth.

Models: Nano Banana vs Nano Banana Pro

In the Gemini API, “Nano Banana” refers to two image-capable models. The easiest way to think about them is: speed vs fidelity and instruction-following.

Nano Banana
  Model ID: gemini-2.5-flash-image
  Best for: High-volume, low-latency image generation, fast iterations, UI previews, and "good enough" creative output at speed.
  Typical output: Images at 1024px resolution (a common default), plus optional text parts.

Nano Banana Pro (Preview)
  Model ID: gemini-3-pro-image-preview
  Best for: Professional asset production, complex instruction following, better text rendering, and composition planning.
  Typical output: Higher resolutions (including 2K/4K settings, depending on config), plus optional text parts.

How to choose the right model

Choose gemini-2.5-flash-image when you need speed, lots of images, and predictable latency. It’s especially good for apps where users generate multiple drafts and then pick a favorite.

Choose gemini-3-pro-image-preview when you need stricter instruction following: “Put the product on a shelf, include a readable label, render a barcode, keep the background minimal, match the lighting,” and so on. This model is typically better for e-commerce assets, marketing mockups, studio-quality compositions, and anything where details matter.

Pro tip: Many products use a two-step pipeline: Draft with Nano Banana (Flash Image) → Finalize with Nano Banana Pro (3 Pro Image Preview). This keeps costs down while still delivering premium “final exports.”

What you can build with Nano Banana API

Nano Banana is more than “generate a pretty picture.” When you integrate it into a real app, it becomes a reusable visual tool: a function that turns structured intent into usable assets. Here are high-value product use cases.

1) Creator tools (social, marketing, and design)

  • Thumbnail & cover generation: You provide title + topic + style and generate a set of variations.
  • Brand kits: Prompt templates that keep consistent colors, tone, lighting, and framing across assets.
  • Sticker/icon packs: Generate small UI assets with consistent 3D style, outlines, or flat iconography.
  • Ad creatives: Generate multiple compositions for A/B testing (within policy and licensing constraints).

2) Product photography and e-commerce assets

  • Product shots: A clean studio shot with controlled lighting and consistent angles.
  • Background replacements: Keep the product the same, change the setting (e.g., “kitchen counter,” “outdoor lifestyle”).
  • Bundle compositions: Place multiple items together in a visually coherent arrangement.
  • Localized packaging mockups: When allowed, generate language-specific layouts and variants.

3) Editorial and storytelling workflows

  • Storyboards: Generate a sequence of scene frames based on a script or outline.
  • Visual explainers: Generate diagram-like illustrations (non-technical), such as “an isometric office” or “a concept illustration.”
  • Character consistency: Create a character reference and iterate on scenes without constantly redesigning from scratch.

4) Developer tooling and internal automation

  • Auto-generated UI mockups: “A dashboard in a minimal style…” for internal brainstorming.
  • Documentation visuals: Generate icons and illustrations for docs or help centers.
  • Content pipelines: Generate images in bulk based on a CMS schedule and review queue.

The most successful products wrap Nano Banana behind a structured UI and policy layer. Users rarely want to “prompt engineer.” They want reliable outcomes: “make this icon,” “edit this photo,” “generate a product background,” and “keep the subject consistent.”


When to choose Nano Banana vs Imagen

Gemini also offers Imagen (a specialized image generation model family) through the Gemini API. A practical rule of thumb:

  • Nano Banana (Gemini image models): Best when you want a conversational, instruction-following model that can reason through complex edits and return images as part of a multimodal response.
  • Imagen: Often preferred when you want a dedicated image model optimized for certain image-generation use cases, especially when you don’t need the “conversational” multimodal tool behavior.

If your product is “image generation as a tool inside a larger assistant,” Nano Banana is usually the better fit. If your product is “high-throughput image generation pipeline,” you may test both and pick the one that matches quality/cost/latency goals.


Authentication (Gemini API key)

Nano Banana calls are made through the Gemini API. For REST requests, you typically send your API key using: x-goog-api-key: YOUR_KEY. If you use official SDKs, authentication can be handled by the client library based on your environment.

Never expose your Gemini API key in a public frontend. If you’re building a web or mobile app, put the API calls behind your own backend. Your backend can enforce quotas, prevent abuse, and keep billing safe.

Recommended environment variables

GEMINI_API_KEY="..."
NANO_BANANA_MODEL_FAST="gemini-2.5-flash-image"
NANO_BANANA_MODEL_PRO="gemini-3-pro-image-preview"

APP_PUBLIC_BASE_URL="https://yourapp.com"
DEFAULT_ASPECT_RATIO="1:1"
DEFAULT_IMAGE_SIZE="2K"   # for pro model, if supported by your workflow

Endpoints & request shape

A common REST pattern for image generation is calling :generateContent on a model endpoint. Example base endpoint:

POST https://generativelanguage.googleapis.com/v1beta/models/{MODEL_ID}:generateContent
Headers:
  x-goog-api-key: $GEMINI_API_KEY
  Content-Type: application/json

In a minimal text-to-image request, you send a contents array with a text part. The model can respond with both text parts and image parts. In many SDK examples, you loop through response parts and detect whether a part is text or inline image bytes.

Key idea: Nano Banana is “native image generation” inside Gemini. You’re not calling a separate “image endpoint.” You’re calling Gemini’s content generation endpoint, and the response may include images.

Understanding the response (TEXT + IMAGE)

Gemini responses are structured as candidates containing a content object with parts. A “part” can be:

  • Text (helpful for instructions, captions, or explanations), and/or
  • Inline image data (often base64 in JSON, or raw bytes in some SDK helpers).

In production, treat every response as potentially mixed: a model might return a short text note plus an image, or multiple images, or in rare cases text only (for example, if an image request is blocked by policy and the model instead explains why).

Robust response-handling checklist

  1. Always iterate through parts; do not assume the image is always “the first part.”
  2. Handle multiple images (store each with an index).
  3. Store the text output (useful for logs, captions, or debugging), but do not expose internal reasoning.
  4. If an image is missing, surface a user-friendly error and allow retry or prompt adjustment.
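The checklist above can be sketched as a small helper. This assumes the REST JSON part shape (a part is either `{"text": ...}` or `{"inlineData": {"mimeType": ..., "data": "<base64>"}}`); the function name is illustrative, not part of any SDK:

```python
import base64

def extract_outputs(parts):
    """Split a candidate's parts into text notes and decoded image bytes.

    Assumes the REST JSON shape: {"text": ...} or
    {"inlineData": {"mimeType": ..., "data": "<base64>"}}.
    """
    texts, images = [], []
    for part in parts:  # never assume the image is the first part
        if "text" in part:
            texts.append(part["text"])
        elif "inlineData" in part:
            images.append(base64.b64decode(part["inlineData"]["data"]))
    return texts, images
```

If `images` comes back empty, treat it as a safe failure: log the text parts and offer the user a retry or prompt adjustment instead of crashing.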

Quickstart (Python / JavaScript / REST)

Below are practical quickstarts you can adapt into your app. They follow the same pattern: choose a model ID, send a prompt, and save the returned image.

Python (SDK-style)

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[prompt],
)

# Response can include both text and image parts.
for part in response.candidates[0].content.parts:
    if getattr(part, "text", None) is not None:
        print(part.text)
    elif getattr(part, "inline_data", None) is not None:
        image = part.as_image()  # helper
        image.save("nano_banana.png")

JavaScript (Node.js)

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme";

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: prompt,
});

for (const part of response.candidates[0].content.parts) {
  if (part.text) console.log(part.text);
  if (part.inlineData?.data) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("nano_banana.png", buffer);
    console.log("Saved nano_banana.png");
  }
}

REST (curl)

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"}
      ]
    }]
  }'

Production tip: Don’t call Gemini directly from the browser with your real key. Use your backend to proxy requests and enforce limits.

Text-to-image (how to get consistently good results)

Text-to-image is the “hello world” of Nano Banana. But shipping a reliable product requires more than a prompt box. The best approach is to provide structured controls and generate a high-quality prompt behind the scenes.

What a strong prompt includes

  • Subject: the main object/person/scene (e.g., “a minimalist smartwatch on a table”).
  • Setting: environment details (studio, outdoors, cafe, office, nature).
  • Lighting: softbox, golden hour, neon, dramatic rim light.
  • Camera framing: macro close-up, wide shot, isometric, top-down.
  • Style constraints: “no text,” “clean background,” “photorealistic,” or “flat vector style.”

Prompt templates (copy/paste)

1) Studio product shot:
"Professional studio photo of [PRODUCT], centered on a clean white background, softbox lighting, crisp reflections, premium commercial look, high detail, no text, no watermark."

2) App icon / sticker:
"A cute [SUBJECT] icon on a white background, colorful tactile 3D style, clean outline, high contrast, no text."

3) Cinematic scene:
"Cinematic photo of [SUBJECT] in [LOCATION] at [TIME OF DAY], shallow depth of field, subtle film grain, realistic lighting, natural composition, no text overlays."

In a SaaS product, you can keep a consistent “brand prompt prefix” and append user intent. This yields more predictable output and fewer support tickets.


Image editing (instructed edits)

A huge reason developers choose Nano Banana is that it can edit images using natural language instructions. The common workflow looks like this:

  1. User uploads an image (you store it securely).
  2. User types an instruction: “Change the background to a cozy cafe,” or “Make it sunset lighting,” or “Remove the logo.”
  3. Your backend sends the original image plus the instruction to Gemini.
  4. The response returns an edited image (plus optional text notes).
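A minimal backend sketch of steps 2–4, using only the REST endpoint and Python's standard library (the function names here are illustrative, and the payload mirrors the REST part shape shown earlier):

```python
import base64
import json
import urllib.request

API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"

def build_edit_payload(image_bytes, instruction, mime_type="image/png"):
    """Package the original image plus the edit instruction as one user turn."""
    return {
        "contents": [{
            "parts": [
                {"inlineData": {"mimeType": mime_type,
                                "data": base64.b64encode(image_bytes).decode()}},
                {"text": instruction},
            ]
        }]
    }

def edit_image(api_key, image_bytes, instruction, model="gemini-2.5-flash-image"):
    """POST the edit request and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(build_edit_payload(image_bytes, instruction)).encode(),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response JSON has the same candidates/parts structure as text-to-image, so the same part-iteration logic applies for extracting the edited image.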

Editing guidance that improves success rate

  • Be specific: “Replace the wall with a white brick wall” beats “make it nicer.”
  • One change at a time: do multiple edits in steps instead of one mega-instruction.
  • Protect the subject: include constraints like “keep the product unchanged” or “preserve face identity.”
  • Avoid conflicting instructions: “dark night scene” + “bright daylight lighting” creates randomness.

Best practice: Provide an “Edit scope” control in your UI: “Only background” vs “Whole image” vs “Subject only.” Even if your API payload doesn’t include a formal mask, you can translate this into prompt constraints and reduce unwanted changes.

Multi-image & composition workflows

For advanced use cases—like combining multiple photos into a single composition—Nano Banana Pro (Preview) is often the better choice. Multi-image workflows are powerful in marketing and design: you can ask the model to create a scene that includes multiple objects, maintain consistency across people, or blend content in a realistic way.

In real products, you usually need “guardrails” around multi-image inputs:

  • Limit the number of images per request for latency and cost.
  • Resize inputs consistently and strip metadata when appropriate.
  • Apply policy checks to user uploads (especially photos of people).
  • Provide a review step before publishing or sharing the output.
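The first two guardrails can live in a small validation step before any API call. A minimal sketch, with arbitrary placeholder limits you should tune for your product:

```python
MAX_IMAGES_PER_REQUEST = 3        # placeholder cap to keep latency and cost predictable
MAX_BYTES_PER_IMAGE = 4_000_000   # placeholder ~4 MB cap per upload

def validate_upload_batch(images):
    """Return a list of human-readable problems; an empty list means accept.

    `images` is a list of raw byte strings from user uploads.
    """
    problems = []
    if len(images) > MAX_IMAGES_PER_REQUEST:
        problems.append(
            f"Too many images: {len(images)} (max {MAX_IMAGES_PER_REQUEST})."
        )
    for i, data in enumerate(images):
        if len(data) > MAX_BYTES_PER_IMAGE:
            problems.append(f"Image {i} is too large.")
    return problems
```

Metadata stripping and people/policy checks would slot into the same function, typically backed by an image library and a moderation service rather than pure byte checks.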

Practical examples

  • Product bundle: Combine product A + product B into a clean studio shot with matching lighting.
  • Before/after: Keep the same scene but change style, lighting, and “mood.”
  • Campaign set: Generate 10 images with consistent character identity and style for a marketing carousel.

imageConfig: aspectRatio & imageSize

Gemini’s image generation supports an imageConfig object inside your generation config. Two of the most useful controls are:

  • aspectRatio — e.g., "1:1", "16:9", "9:16".
  • imageSize — e.g., "2K" (often used with the pro preview model), plus higher options depending on model support.

JavaScript example with imageConfig

const response = await ai.models.generateContent({
  model: "gemini-3-pro-image-preview",
  contents: "A premium product photo of a modern black smartwatch on a shelf in a designer store. No other text.",
  config: {
    imageConfig: {
      aspectRatio: "16:9",
      imageSize: "2K"
    }
  }
});

REST example with imageConfig

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Create a clean isometric photo of a modern office interior, perfectly aligned, no text."}
      ]
    }],
    "generationConfig": {
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }'

Compatibility note: Not all options apply to all models. If you set an unsupported size/ratio, you may get an error or a fallback behavior. Implement validation (and sensible defaults) in your backend.

Tokens & resolution table (practical budgeting)

A key part of integrating Nano Banana in production is predicting cost and performance. The official docs include a helpful table showing how token usage can map to aspect ratio and resolution. Even if your billing plan varies, this is useful for internal budgeting and UI estimation.

Aspect ratio | 1K resolution | 1K tokens | 2K resolution | 2K tokens | 4K resolution | 4K tokens
1:1          | 1024×1024     | 1120      | 2048×2048     | 1120      | 4096×4096     | 2000
2:3          | 848×1264      | 1120      | 1696×2528     | 1120      | 3392×5056     | 2000
3:2          | 1264×848      | 1120      | 2528×1696     | 1120      | 5056×3392     | 2000
3:4          | 896×1200      | 1120      | 1792×2400     | 1120      | 3584×4800     | 2000

How do you use this table in a product? You translate it into a “cost estimate” UI: if a user picks 4K, show that it consumes more tokens (and therefore more cost) than 1K/2K in many setups. If you sell credits, you can convert tokens into credits and show a friendly “Estimated credits” label.
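Turning the table into an "Estimated credits" label can be as simple as the sketch below. The token figures come from the table above; the tokens-per-credit rate is a made-up, product-specific number, not anything defined by the API:

```python
import math

# Output-token estimates per image, taken from the resolution table above.
OUTPUT_TOKENS = {"1K": 1120, "2K": 1120, "4K": 2000}

def estimate_credits(image_size, count=1, tokens_per_credit=500):
    """Rough credit estimate for `count` images at a given size (rounded up)."""
    tokens = OUTPUT_TOKENS[image_size] * count
    return math.ceil(tokens / tokens_per_credit)
```

Showing the result before generation ("Estimated: 4 credits") is usually enough; exact billing should still come from your live usage data.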

Two-tier strategy: Default users to 1K/2K for drafts. Reserve 4K for paid tiers or explicit final exports. This is the simplest and most effective cost-control lever for image generation products.

SynthID watermark & policy implications

All images generated by Nano Banana models include a SynthID watermark. SynthID is designed to help identify AI-generated content. For developers, this matters in three ways:

  1. User expectations: If your users want “no watermark,” you should not promise that for Nano Banana outputs.
  2. Compliance: Watermarking can be important for transparency, elections policies, and media provenance.
  3. Pipeline design: If your downstream workflow includes re-encoding or heavy processing, you should understand how it affects provenance signals and labels.

Don’t market around removing provenance. Build your product around disclosure and responsible use: “AI-generated” labels, transparent policies, and moderation.

Prompting for quality & control (the “ship it” playbook)

Great results in demos often come from a human iterating prompts. Great results in products come from a prompt system: templates, constraints, defaults, and automated checks. Here’s a framework that consistently improves output quality and reduces randomness.

1) Use a structured prompt template

Give your model a predictable format. Instead of a single sentence, build prompts like:

[GOAL]
Create a single image that matches the following.

[SUBJECT]
- Main subject: ...
- Keep subject consistent: ...

[SCENE]
- Location: ...
- Background: ...
- Props: ...

[CAMERA & LIGHTING]
- Shot type: ...
- Lighting: ...
- Depth of field: ...

[STYLE]
- Style: ...
- Color palette: ...
- Texture: ...

[CONSTRAINTS]
- No text unless requested
- Clean composition
- No logos/watermarks in-scene
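A prompt like this is easy to generate programmatically. A minimal sketch whose section names mirror the template above (the function and parameter names are illustrative):

```python
def build_prompt(subject, scene, camera, style,
                 constraints=("No text unless requested",
                              "Clean composition",
                              "No logos/watermarks in-scene")):
    """Render a structured image prompt from dicts of section fields."""
    lines = ["[GOAL]", "Create a single image that matches the following.", ""]
    sections = [("SUBJECT", subject), ("SCENE", scene),
                ("CAMERA & LIGHTING", camera), ("STYLE", style)]
    for header, fields in sections:
        lines.append(f"[{header}]")
        lines.extend(f"- {key}: {value}" for key, value in fields.items())
        lines.append("")
    lines.append("[CONSTRAINTS]")
    lines.extend(f"- {c}" for c in constraints)
    return "\n".join(lines)
```

Because the sections are plain dicts, a UI form (subject field, lighting dropdown, style picker) can feed this directly, so users never see raw prompt text.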

2) Keep “no text” as a default unless the user wants text

Text rendering can be great with the pro preview model, but it’s still a specialized request. For most product shots and icons, “no text” reduces failure modes and improves consistency. When the user does want text (like a label or magazine cover), switch to Nano Banana Pro and ask for explicit wording, font style, and placement.

3) Use negative prompts sparingly

Negative prompts can help (e.g., “no watermark, no blurry, no distorted face”), but overly long negative prompts sometimes make results worse. Start with a short default set and let power users customize.

4) Provide “variation” controls without forcing users to retype

Variation is a normal part of creative workflows. Implement a “Regenerate variations” button that keeps the same structured settings (model, ratio, style) and only changes the seed or phrasing slightly. This creates a clean UX and keeps results on-brand.
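One lightweight way to implement "Regenerate variations" without an explicit seed parameter is to keep the structured prompt fixed and append a small randomized phrasing hint. A sketch (the hint list is an arbitrary example set):

```python
import random

# Illustrative phrasing tweaks; keep these subtle so results stay on-brand.
VARIATION_HINTS = [
    "slightly different camera angle",
    "alternative color grading",
    "different prop arrangement",
    "subtle change in lighting direction",
]

def variation_prompt(base_prompt, rng=random):
    """Append one randomized hint so 'Regenerate' differs without retyping."""
    return f"{base_prompt}\nVariation: {rng.choice(VARIATION_HINTS)}."
```

Passing `rng` explicitly also makes the behavior deterministic in tests (e.g., with `random.Random(42)`).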


Safety, moderation & compliance

If your product lets users generate or edit images, you need safety controls—both for policy compliance and for product trust. A solid system uses layered safeguards:

Layer 1: Input validation

  • Reject extremely long prompts or suspicious repetitive prompts.
  • Block disallowed content requests (sexual content involving minors, self-harm imagery, explicit violence, hate symbols, etc.).
  • Enforce consent rules for user-uploaded photos of people (especially private individuals).
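A first-pass input validator for the checks above might look like this sketch. The limits and the blocklist entry are placeholders; real disallowed-content detection belongs in a dedicated moderation service, not a substring check:

```python
import re

MAX_PROMPT_CHARS = 2000           # placeholder limit
BLOCKLIST = ["example-banned-term"]  # placeholder; use a real policy engine

def validate_prompt(prompt):
    """Return a list of human-readable problems; an empty list means accept."""
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append("Prompt is too long.")
    # Crude repetition check: the same word repeated 10+ times in a row.
    if re.search(r"\b(\w+)(?:\s+\1\b){9,}", prompt, re.IGNORECASE):
        problems.append("Prompt looks like spam (repeated tokens).")
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKLIST):
        problems.append("Prompt contains disallowed content.")
    return problems
```

Run this before spending any tokens: rejecting locally is free, while rejecting at the model costs latency and quota.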

Layer 2: Provider safety filters

Gemini models have built-in safety behavior. In practice, this means some prompts will return explanations instead of images. Your app should treat “no image returned” as a safe failure: show a friendly message and allow the user to adjust the request.

Layer 3: Output review & reporting

  • Add a “Report” button for outputs.
  • Keep an internal review queue if users can publish publicly through your platform.
  • Log prompts and outputs (securely, with access control) for abuse investigation.

Practical policy UI: Provide short guidance near the prompt box: “Don’t upload photos you don’t have rights to use,” “No harmful or deceptive content,” and “AI-generated images will include provenance.” This reduces policy violations and support burden.

Pricing patterns & cost control

Pricing for Nano Banana usage depends on your Gemini API plan and the model used. Regardless of exact pricing, you can build a robust cost-control system with a few proven principles:

Cost-control levers that work

  • Resolution gating: Draft at 1K/2K, export at 4K only for paid tiers.
  • Model gating: Use Flash Image for drafts, Pro Preview for finals or text-heavy compositions.
  • Quota limits: Per-user daily image caps; per-workspace monthly caps.
  • Batching: For internal automation, batch similar requests and review results together.
  • Template reuse: For campaigns, reuse a “base prompt” and only vary the product name or minor scene details.

How to show pricing honestly in your UI

Users hate surprise costs. The simplest UI is:

  • Show: Model + Resolution + Aspect Ratio + Estimated usage.
  • Offer: Draft vs Final buttons.
  • For teams: add usage dashboards (images generated, estimated tokens, top users).

Don’t guess exact prices on-page unless you’re pulling them from your live billing system. Pricing can change. The safest approach is: “See current pricing in your account dashboard,” and show “estimated usage” locally.

Rate limits, retries & reliability

Image generation APIs must handle spikes. Even if you don’t hit explicit rate limits, you will see occasional transient errors: timeouts, 429s, or short-lived service disruptions. Design for reliability from day one.

Recommended production defaults

  • Backoff retries: Retry 429, 500, 503, and network timeouts using exponential backoff with jitter.
  • No retries on validation errors: If the prompt is too long or payload invalid, fix the request.
  • Per-user throttles: Prevent one user from spamming thousands of requests and blowing your budget.
  • Queue-based generation: For high volume, enqueue jobs and process with controlled concurrency.
# Robust retry sketch (Python): retry transient failures (429/500/503 and
# timeouts) with exponential backoff plus jitter; fail fast on validation errors.
import random
import time

RETRYABLE_STATUSES = {429, 500, 503}

def generate_with_retries(call_gemini, max_attempts=4):
    for attempt in range(1, max_attempts + 1):
        try:
            resp = call_gemini()
        except TimeoutError:
            resp = None  # treat timeouts as retryable
        if resp is not None and resp.ok:
            return resp
        status = getattr(resp, "status", None)
        if resp is None or status in RETRYABLE_STATUSES:
            time.sleep((2 ** (attempt - 1)) + random.random())  # backoff + jitter
            continue
        raise RuntimeError(f"non-retryable error: {status}")  # 4xx: fix the request
    raise RuntimeError("retries exhausted")

User experience tip: If an image request is queued or slow, show progress states: “Starting,” “Generating,” “Finalizing.” People are more patient when they see a clear status.

Production architecture (what scales safely)

The biggest mistake is calling Gemini directly from a public client with a real API key. The correct architecture is:

Client → Your Backend → Gemini API → Your Backend → Client

Why your backend is essential

  • Key security: Keeps your Gemini API key private.
  • Abuse prevention: Rate limits, quotas, bot detection, and plan enforcement.
  • Cost control: Hard caps and alerts when users spike usage.
  • Policy enforcement: Pre-check prompts and uploaded images, block disallowed content.
  • Durable storage: Save generated images to your own storage/CDN for consistent access.

Suggested system components

  • API gateway (auth, quotas, input validation)
  • Generation service (calls Gemini, normalizes responses)
  • Queue + workers (controls concurrency for high volume)
  • Object storage + CDN (stores output images and serves them fast)
  • Database (prompts, settings, costs, user plans, audit logs)
  • Moderation tools (review queue, user reporting, enforcement)

Minimal job state machine

  • CREATED: Job stored, not processed yet. UI: “Preparing…”
  • RUNNING: Backend is calling Gemini and waiting for the response. UI: “Generating…”
  • STORING: Saving outputs to storage/CDN. UI: “Finalizing…”
  • SUCCEEDED: Image stored and ready. UI: “Ready”
  • FAILED: An error occurred. UI: “Failed (retry)”
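A minimal Python sketch of this state machine. The state names match the table; the transition rules are an assumption about a typical pipeline (e.g., any state can fail, but finished jobs never restart):

```python
from enum import Enum

class JobState(str, Enum):
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    STORING = "STORING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"

# Legal transitions; anything else indicates a bug or a lost update.
TRANSITIONS = {
    JobState.CREATED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.STORING, JobState.FAILED},
    JobState.STORING: {JobState.SUCCEEDED, JobState.FAILED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
}

def advance(current, nxt):
    """Validate and apply a state transition; raise on illegal moves."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {nxt.value}")
    return nxt
```

Enforcing transitions in one place (rather than setting a status column ad hoc) makes stuck-job detection and retry logic much easier to reason about.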

Logging, QA & monitoring

To operate Nano Banana in production, you need observability. The goal is to answer: “Is the system healthy?” and “Why did this request fail?” without guessing.

Metrics to track

  • Latency: time from request → response.
  • Success rate: percent of requests returning images.
  • Safety blocks: how often prompts are refused or filtered.
  • Usage: images generated per user/day, token estimates per job, 4K exports per workspace.
  • Cost anomalies: spikes by user, IP, or automation scripts.

Quality assurance that actually helps

Create a “golden set” of prompts (and optionally reference images) that represent your product’s core use cases: icons, product shots, backgrounds, character scenes, and any text-rendering scenarios you support. Run this set on a schedule and compare outputs. This catches regressions when models, settings, or templates change.

Be careful with logs: Prompts and user-uploaded images can contain sensitive information. Restrict access, redact where possible, and set retention policies.

FAQ

What does “Nano Banana” refer to?
“Nano Banana” is a nickname used in Gemini documentation and developer communities to describe Gemini’s native image generation capability. In the Gemini API, it maps to specific model IDs like gemini-2.5-flash-image and gemini-3-pro-image-preview.

What’s the difference between Nano Banana and Nano Banana Pro?
Nano Banana (Flash Image) is optimized for speed and high-volume usage. Nano Banana Pro (3 Pro Image Preview) is designed for professional asset production and complex instruction following, including stronger composition planning and better text rendering in many cases.

Do generated images include a watermark?
Yes. Nano Banana outputs include a SynthID watermark for AI provenance. Plan your product messaging accordingly and avoid promising “no watermark.”

Can Nano Banana edit existing images?
Yes. A common workflow is to send an existing image along with an instruction (for example, “change the background to a studio setup”), and the model returns an edited image. For best results, make edits step-by-step and include constraints like “keep the subject unchanged.”

Which aspect ratio should I choose?
Choose based on destination: 1:1 for feeds, 16:9 for web/video thumbnails, 9:16 for stories and short-form vertical content. Provide a ratio selector and keep defaults simple.

How do I keep costs under control?
Use a draft→final pipeline: generate drafts with Flash Image at 1K/2K, and reserve Pro Preview and 4K exports for paid tiers. Add per-user quotas, rate limits, and a clear “estimated usage” label before generating.


Changelog

  • Initial publication of Nano Banana API guide (models, endpoints, imageConfig, tokens table, SynthID, production patterns).