Stability AI API - Complete Developer Guide

The Stability AI Platform API is a REST API that lets developers build applications that generate and transform media using Stability’s models and services. Whether you are building a creator tool, a marketing automation pipeline, a design assistant, a game studio workflow, a batch renderer, or an internal content system, this API gives you the core primitives: authenticate with an API key, pick an engine/model, send a prompt (and optionally an input image), and receive either image bytes (PNG) or a JSON payload with results.

Last updated: February 2026. Exact engines available to your organization depend on your plan, region, and feature flags. Always verify on your instance/account.

Accuracy note: This guide focuses on the official REST API surfaces and operational behaviors documented by Stability’s API docs and KB. Where model names, endpoints, or credit costs vary across versions, treat this page as a “how to integrate correctly” guide and confirm exact parameters in the official API reference.

1) Overview: what the Stability AI API is, and what you can build

What “Stability AI API” usually means

When developers say “Stability AI API,” they are typically referring to the official Platform API that’s used to:

  • Generate images from text (text-to-image)
  • Transform images (image-to-image)
  • Upscale images (higher resolution / clarity)
  • Mask or inpaint (edit parts of an image using a mask)
  • List available engines and manage account/billing via API

Common real-world use cases

Creator apps & editor plugins

Prompt-to-image inside a web app, Photoshop plugin, Figma-like canvas, or mobile editor with fast previews and final renders.

Marketing & ecommerce automation

Generate product hero images, seasonal variations, ad creative, backgrounds, and A/B test assets automatically.

Game & media pipelines

Concept art generation, batch rendering, asset variations, style exploration, and rapid ideation workflows.

Internal tooling

Teams generate mockups, UI illustrations, storyboards, or internal documentation images using repeatable prompts and templates.

Why teams choose a dedicated image API

An image generation API becomes valuable when you need reliability, guardrails, and operational control: consistent auth, clear error modes, measurable costs, and integrations that can be audited. Stability’s API provides standardized endpoints and rate limiting behavior, making it suitable for production systems where you want to track usage, plan budgets, and build predictable user experiences.

Key operational facts from the official docs
The official REST docs show the base host pattern https://api.stability.ai and endpoints like /v1/user/account, /v1/user/balance, /v1/engines/list, and generation endpoints under /v1/generation. Requests authenticate using an API key passed in the Authorization header as a Bearer token, and the documented rate limit is 150 requests per 10 seconds.

2) Getting started: keys, base URL, and your first successful call

Base URL and API host

The REST API is hosted under a Stability domain and, in the official docs, examples default to a base host of:

https://api.stability.ai

Most examples also allow an API_HOST override so you can point at a different host (for example, a staging host, regional host, or an enterprise endpoint). In production systems, make the host configurable by environment (dev/staging/prod) so you can upgrade safely.

Store your API key safely

Your API key is the credential that authorizes requests and charges usage to your account/organization. Treat it like a password:

  • Store it in a secrets manager or CI secret store
  • Never commit it to Git
  • Never embed it in client-side JavaScript (browser apps)
  • Rotate it on a schedule and immediately if exposed
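A minimal sketch of the "treat it like a password" rule in practice: load the key from the environment at startup and fail fast if it is missing, rather than discovering the problem on the first request. The function name is illustrative.

```javascript
// Sketch: fail fast at boot if the API key is not configured.
// The env variable name STABILITY_API_KEY matches the cURL examples below.
function loadApiKey(env = process.env) {
  const key = env.STABILITY_API_KEY;
  if (!key || key.trim() === "") {
    throw new Error(
      "STABILITY_API_KEY is not set. Load it from your secrets manager, not from source control."
    );
  }
  return key;
}
```

Calling this once during application startup keeps a misconfigured deployment from silently sending unauthenticated requests.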

First call: “Who am I?” (Account) and “How many credits do I have?” (Balance)

The best first two calls are account and balance because they confirm authentication and help you validate billing scope before you generate anything. In the official REST docs, these endpoints exist under /v1/user.

Get account

curl -sS https://api.stability.ai/v1/user/account \
  -H "Authorization: Bearer $STABILITY_API_KEY" \
  -H "Accept: application/json"

Get credit balance

curl -sS https://api.stability.ai/v1/user/balance \
  -H "Authorization: Bearer $STABILITY_API_KEY" \
  -H "Accept: application/json"

If you receive 401, your key is missing or invalid. If you receive 429, you are sending too many requests too quickly. If you receive 5xx, the service may be experiencing an incident—check the provider’s status page and retry with backoff.

3) Authentication: Bearer tokens, organizations, and client identification

Bearer token auth (the standard pattern)

In the official REST docs, requests authenticate by including your Stability API key in the Authorization header as a Bearer token. This means your header should look like:

Authorization: Bearer YOUR_STABILITY_API_KEY

Organization scoping

Some endpoints accept an Organization header that lets you scope requests to an organization other than your default. This matters if your user belongs to multiple orgs, you run a multi-tenant integration, or you want strict cost allocation across teams.

Organization: org-123456

Client ID / Client Version headers (recommended for clarity)

The REST docs show optional headers like Stability-Client-ID and Stability-Client-Version. These are useful for:

  • Debugging: quickly identify which app is causing errors
  • Billing clarity: segment usage by product or internal service
  • Support requests: provide clear client metadata when reporting issues
Stability-Client-ID: my-app
Stability-Client-Version: 1.2.1

Do not expose your API key to browsers
If you’re building a web app, create a backend endpoint that signs requests using the key. Your frontend should call your backend, not Stability directly. This prevents key leakage, allows throttling, and lets you enforce safety filters and budgets.
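On the server side, the headers described in this section can be assembled in one place. A sketch, assuming the header names shown above (the `org-123456` and `my-app` style values are illustrative):

```javascript
// Sketch: build the full header set for a server-side Stability request.
// Organization and client headers are optional and only added when provided.
function buildHeaders({ apiKey, organization, clientId, clientVersion } = {}) {
  if (!apiKey) throw new Error("apiKey is required");
  const headers = {
    Authorization: `Bearer ${apiKey}`,
    Accept: "application/json",
  };
  if (organization) headers["Organization"] = organization;
  if (clientId) headers["Stability-Client-ID"] = clientId;
  if (clientVersion) headers["Stability-Client-Version"] = clientVersion;
  return headers;
}
```

Centralizing header construction makes it easy to add client identification everywhere at once, which pays off the first time you need to debug a noisy integration.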

Recommended auth architecture (frontend vs backend)

Frontend (browser/mobile)

Uses your app’s auth (sessions/JWT). Calls your backend for generation jobs. Never holds Stability keys.

Backend (server)

Holds Stability keys, enforces rate limits, validates prompts, stores job state, and calls Stability endpoints.

4) Endpoint map: account, engines, and generation

The official REST API docs (v1) include these core groups:

| Group | Typical endpoints | Use case | Notes |
|---|---|---|---|
| User | /v1/user/account, /v1/user/balance | Identity verification and credit checks | Best first calls; helps you enforce budget gates. |
| Engines | /v1/engines/list | Discover which engines/models you can use | Engine availability depends on your org/plan. |
| Generation | /v1/generation/{engine_id}/text-to-image, /v1/generation/{engine_id}/image-to-image, /v1/generation/{engine_id}/image-to-image/upscale, /v1/generation/{engine_id}/image-to-image/masking | Create images, transform images, upscale, and mask/inpaint | Most product experiences are built here. |

Engine discovery: why you should list engines dynamically

Many developers hardcode an “engine_id” and ship it. That works until:

  • The engine name changes or becomes deprecated
  • A new default engine becomes available to your org
  • Your plan changes and certain engines are no longer enabled
  • You want to A/B test engines by user tier or use case

Instead, build a startup “capabilities discovery” step:

  1. Call /v1/engines/list once per environment boot or daily cache refresh
  2. Store engines in your DB/cache with metadata and a safe allowlist
  3. Select engine dynamically depending on prompt, resolution, or user plan
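The discovery steps above can be sketched as a small cache with an injected fetcher, so the TTL logic is testable without the network. The response shape (an array of objects with an `id` field) is an assumption to confirm against the API reference.

```javascript
// Sketch: cache the engine list and expose an allowlist, refreshed after a TTL.
// fetchEngines is injected (e.g. a function that GETs /v1/engines/list).
function createEngineCache(fetchEngines, ttlMs = 24 * 60 * 60 * 1000) {
  let cached = null;
  let fetchedAt = 0;
  return {
    async getAllowedEngines(now = Date.now()) {
      if (!cached || now - fetchedAt > ttlMs) {
        const engines = await fetchEngines();
        cached = new Set(engines.map((e) => e.id)); // assumed `id` field
        fetchedAt = now;
      }
      return cached;
    },
  };
}
```

Your request path then checks the allowlist before building a generation call, instead of trusting a hardcoded engine ID.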

List engines (cURL)

curl -sS https://api.stability.ai/v1/engines/list \
  -H "Authorization: Bearer $STABILITY_API_KEY" \
  -H "Accept: application/json"

Response formats: JSON vs image/png

For some endpoints, the docs show an Accept header where you can choose JSON or a PNG response:

Accept: application/json
# or
Accept: image/png

In production, JSON responses are often preferable because they can include additional metadata and multiple artifacts. Returning raw PNG is convenient for simple services, but you’ll usually still store metadata (engine, seed, params, cost) in your system.

5) Image generation fundamentals (text-to-image)

Text-to-image request anatomy

A typical text-to-image request includes:

  • Dimensions (width, height) — usually multiples of 64
  • Prompts — an array of prompt objects (often with weights)
  • Guidance (e.g., CFG scale) — how strongly the output adheres to the prompt
  • Sampler/steps — affects quality, compute cost, and runtime
  • Seed — enables reproducibility (when supported)
  • Output format — JSON payload or image bytes

Design principle: “interactive preview” vs “final render”
Most successful products provide a fast preview mode (lower steps, smaller size, cheaper engine) and a final mode (higher steps, higher resolution). That reduces cost and improves perceived speed.
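The preview/final split can live in one function that returns a known-good parameter set per mode. The specific values here are illustrative defaults, not documented requirements; tune them for your engine.

```javascript
// Sketch: two render modes with known-good defaults (values are illustrative).
// Dimensions stay multiples of 64, as noted in the request anatomy above.
function renderParams(mode) {
  if (mode === "preview") {
    return { width: 512, height: 512, steps: 20 };
  }
  if (mode === "final") {
    return { width: 1024, height: 1024, steps: 40 };
  }
  throw new Error(`Unknown render mode: ${mode}`);
}
```

Keeping the mapping in one place means a "Quality slider" in your UI only has to pick a mode, never raw parameters.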

Example: text-to-image (cURL skeleton)

Replace {engine_id} with an engine from /v1/engines/list. Parameters vary by engine/version—confirm in the docs for the engine you use.

curl -sS "https://api.stability.ai/v1/generation/{engine_id}/text-to-image" \
  -H "Authorization: Bearer $STABILITY_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "width": 1024,
    "height": 1024,
    "text_prompts": [
      { "text": "A clean product photo of a ceramic mug on a white table, soft studio lighting", "weight": 1 }
    ],
    "cfg_scale": 7,
    "steps": 30
  }'

Prompting: the “structured prompt” method

While you can write a single sentence, production results improve when you use a consistent structure:

  1. Subject: what is in the scene (object/person/landscape)
  2. Composition: camera angle, framing, perspective, depth of field
  3. Style: realism, illustration, cinematic, etc.
  4. Lighting: studio, natural light, rim light, soft box, sunset
  5. Constraints: “no text,” “no watermark,” “clean background,” etc.

Store prompt templates as versioned assets in your system. You can then improve outputs over time without changing application code: update the template, and the entire product improves.

Negative prompts and “what not to generate”

Many diffusion systems support negative prompts or negative weights to reduce unwanted features (e.g., extra limbs, artifacts, text). If your chosen engine supports it, you can add a second prompt object with a negative weight, or use the engine’s explicit negative prompt fields. Because behavior varies by engine, build your request builder to support both patterns and enable them via feature flags.

6) Advanced workflows: image-to-image, upscale, and masking/inpainting

Image-to-image (img2img): controlled variation and style transfer

Image-to-image starts from an existing image and applies your prompt to transform it. Product teams use this for:

  • Generating variations while keeping composition consistent
  • Style transfer (e.g., turn a sketch into a rendered scene)
  • Background changes while preserving subject identity
  • “Re-render” improvements (lighting, realism) without redesigning from scratch

The most important parameter conceptually is “how much to change the original.” Some APIs expose this as strength or noise: lower values preserve more of the original, higher values allow more transformation.

Upscale: when and how to use it

Upscaling is not just “make it bigger.” In production pipelines, upscale is used when:

  • You generated a preview at low resolution and want a final high-res output
  • You need better texture detail for print or large-format assets
  • Your product must produce consistent output sizes (e.g., 2048 square)

Best practice: upscale only the selected “final” images, not every preview. This cuts cost dramatically.

Masking / inpainting: edit a region precisely

Masking endpoints allow you to specify a mask image that defines which pixels can change. This is how you:

  • Remove objects (and fill the background plausibly)
  • Add new objects into a scene
  • Fix hands, faces, or artifacts without regenerating the entire image
  • Swap logos, labels, or UI elements in mockups (careful: follow licensing and trademark rules)

For best results, generate masks with soft edges (feathering) so the model blends content naturally. Hard edges can cause visible seams.

Production tip: build an “edit stack”
Store the original image and a sequence of edits (prompt + mask + params). That allows redo/undo, reproducibility, and auditability—especially important in enterprise workflows.

7) Pricing and cost control (credits-based billing)

The credit model

Stability’s Platform API pricing is commonly expressed in credits. The official pricing page explains that API usage is credit-based, and that 1 credit = $0.01. Different models/actions consume different credits per generation. Pricing can change over time as models and infrastructure improve.

Why cost control is a product feature

If you are building a consumer app, your users will explore. They will refresh, tweak prompts, and iterate. If you are building an enterprise tool, teams will run batch jobs. In both cases, cost can spike unless you design explicit controls:

  • Preview mode: smaller images, fewer steps, cheaper engine
  • Final mode: higher steps / resolution, only for selected images
  • Result limits: cap number of variants generated per action
  • Daily budgets: per user/team budgets enforced by your backend
  • Queue & approvals: require confirmation for expensive operations (upscale, large batches)

Cost estimator formula (simple and useful)

If your model costs C credits per generation and you generate N images:

Total credits = C * N
Total USD ≈ (Total credits) * $0.01
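The formula above is small enough to sketch directly (using the 1 credit = $0.01 conversion from the pricing page; the per-image credit cost depends on the model you choose):

```javascript
// Sketch: estimate job cost before running it, for display in the UI.
// creditsPerImage varies by model/action; confirm against the pricing page.
function estimateCost(creditsPerImage, imageCount) {
  const credits = creditsPerImage * imageCount;
  return { credits, usd: credits * 0.01 };
}
```

For example, a 10-image batch on a model costing 3 credits per image is 30 credits, roughly $0.30, which you can show as “this will cost ~30 credits” before the user confirms.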

Put this estimate directly in your UI before the user runs the job. Users love knowing “this will cost ~X credits.” For internal tools, log the estimate and the actual usage so you can tune defaults.

Practical cost reduction playbook

  1. Do fewer, better generations: use templates, guided inputs, and constraints so users don’t spam refresh.
  2. Use “small or fast” models for exploration: reserve premium models for final assets.
  3. Stop early: limit steps for previews; show a “quality slider.”
  4. Cache identical requests: if a user reruns the exact same request, return the cached result when appropriate.
  5. Deduplicate retries: ensure your retry logic doesn’t accidentally create duplicates (use idempotency keys if you build a proxy).

8) Rate limits, 429s, and reliable retries

Documented request limit

The official docs and KB describe a rate limit of 150 requests per 10 seconds. The KB article also notes that exceeding the limit results in a 429 response and is followed by a timeout lasting 60 seconds.

Interpretation for production
Treat 429 as “slow down and spread requests.” Build a queue, reduce concurrency, and implement backoff with jitter. If you keep spamming after 429 you risk spending most of your time in enforced timeouts.

Client-side throttling strategy

A simple strategy that works well:

  • Limit concurrency per API key (e.g., 5–15 in-flight requests depending on your workload).
  • Use a token bucket: allow bursts but enforce a steady rate under 150/10s.
  • On 429: pause the key for a cooldown (start with 60s or read provider guidance), then resume gradually.
  • Rotate multiple keys only if your account and policies allow it (and you can justify the complexity).
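A minimal token bucket along these lines, sized under the documented 150 requests per 10 seconds. The time source is injected so the logic is deterministic and testable; capacity and refill rate are illustrative.

```javascript
// Sketch: token bucket that allows bursts but enforces a steady rate.
// Defaults stay well under the documented 150 requests / 10 s limit.
function createTokenBucket({ capacity = 100, refillPer10s = 100, now = Date.now } = {}) {
  let tokens = capacity;
  let last = now();
  return {
    tryAcquire() {
      const t = now();
      // Refill proportionally to elapsed time, capped at capacity.
      tokens = Math.min(capacity, tokens + ((t - last) / 10000) * refillPer10s);
      last = t;
      if (tokens >= 1) {
        tokens -= 1;
        return true;
      }
      return false;
    },
  };
}
```

Workers call `tryAcquire()` before each request and requeue the job (or sleep briefly) when it returns false, instead of letting the provider return 429.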

Retry policy (recommended)

| Status | Meaning | Retry? | What to do |
|---|---|---|---|
| 200 | Success | No | Store result + metadata; return to user. |
| 400 | Bad request | No | Fix params; validate dimensions, prompt schema, engine_id. |
| 401 | Unauthorized | No | Rotate key, verify env secrets; do not loop retries. |
| 403 | Forbidden | No | Engine not allowed / policy restriction; show a clear message. |
| 429 | Rate limited | Yes | Backoff with jitter; reduce concurrency; cooldown the key. |
| 5xx | Server error | Yes | Retry a few times with exponential backoff; check status page if persistent. |

Practical backoff algorithm

Use exponential backoff with jitter. Example delays: 1s, 2s, 4s, 8s, then stop (or cap at 10–20s). For 429 specifically, consider a bigger initial wait because the KB mentions a 60-second timeout period after exceeding the limit.
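The schedule above can be computed in one small function: exponential growth from a base delay, a cap, and random jitter, with a larger starting wait for 429 because of the documented 60-second timeout. The exact constants are illustrative.

```javascript
// Sketch: backoff delay for a given retry attempt (attempt starts at 1).
// 429 starts from a larger base; jitter avoids synchronized retries.
function backoffDelayMs(attempt, { status = 500, capMs = 60000, rand = Math.random } = {}) {
  const baseMs = status === 429 ? 5000 : 1000;
  const exp = Math.min(baseMs * 2 ** (attempt - 1), capMs);
  return exp + Math.floor(rand() * 400);
}
```

This pairs naturally with the retry table above: compute the delay only for 429 and 5xx, and give up after a fixed number of attempts.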

9) Quality, prompting, and debugging “why does my image look wrong?”

Quality is a system, not a single knob

In most products, output quality depends on:

  • Prompt clarity (explicit subject + composition + style + constraints)
  • Resolution (bigger images can hold more detail, but can cost more)
  • Steps (more steps can improve fidelity, but increase latency/cost)
  • CFG/guidance (too low: vague; too high: artifacts or over-constrained)
  • Sampler (behavior varies; choose a safe default and test)
  • Seed control (reproducibility for workflows and A/B testing)

Prompt templates that ship well

Here are examples of templates you can store and reuse:

// Product photo template
"Photorealistic studio product photo of {subject}, placed on {surface}, {lighting}, shallow depth of field, 85mm lens, ultra sharp, clean background, no text, no watermark"

// Illustration template
"High-quality illustration of {subject}, {style}, crisp lines, balanced composition, soft shading, high detail, no text, no watermark"

Debugging checklist

  1. Confirm engine_id: call /v1/engines/list and ensure the engine exists and is enabled.
  2. Validate dimensions: ensure width/height are permitted and multiples of 64.
  3. Reduce complexity: start with a simple prompt; remove extra clauses; then add constraints one by one.
  4. Stabilize with a seed: if supported, fix a seed to compare parameter changes fairly.
  5. Lower CFG: if artifacts appear, reduce guidance slightly.
  6. Increase steps: if images look undercooked, try more steps (especially for final render mode).

Product tip: show “good defaults,” hide the rest
Most users want a simple UI. Provide a “Quality slider” that changes a known-good set of params (steps, size) and keep advanced controls behind an “Advanced” panel.

10) Safety, compliance, and responsible deployment

Safety is part of your API integration

Building on an image generation API means you are shipping a content system. Even if the API enforces policies, your product still needs:

  • Clear Terms: what users can and cannot generate
  • Abuse prevention: rate limits per user, anti-spam protections, and logging of suspicious patterns
  • Moderation workflow: review flags for public galleries or user-shared content
  • Privacy controls: do not store more than you need; protect images and prompts

Recommended logging (privacy-first)

Log:

  • timestamp, user id (or hashed), job id
  • engine_id, resolution, steps, cfg, and any cost estimate
  • status code + error class (400, 401, 429, 5xx)
  • latency and retry count

Avoid logging raw prompts and full images unless you have a clear privacy policy and a legitimate reason (debugging, safety, enterprise audit), and even then use retention limits and access controls.

Enterprise considerations

  • Data boundaries: determine whether prompts/images can be used for training (check provider terms and your contract).
  • Access controls: who can generate, who can upscale, who can run batches.
  • Budgeting: per-team credit budgets and billing attribution.
  • Auditability: reproducibility (seed), versioned prompt templates, and change logs.

11) Production architecture: queues, caching, retries, and UX that scales

Reference architecture

  1. Frontend (web/mobile) — collects prompt + options; never touches API keys
  2. Backend API — validates input, enforces budgets, signs Stability requests
  3. Job queue — holds generation jobs, supports retries without duplication
  4. Workers — execute calls to Stability with controlled concurrency
  5. Storage — stores outputs (S3/GCS) and metadata (DB)
  6. Observability — metrics, logs, alerts for 429 spikes and latency

Why queues are non-negotiable

If you call an image API synchronously from a user request, you risk:

  • Slow UI and timeouts when jobs take longer
  • Thundering herds (many users generate at once)
  • Hard-to-control rate limits
  • Duplicate retries that waste credits

With a queue, you can shape traffic: accept jobs quickly, process steadily, and give users live progress updates.

Cache strategy

Caching can reduce costs and load:

  • Cache /v1/engines/list for hours (it rarely changes minute-to-minute).
  • Cache account/balance for short TTL (e.g., 10–60 seconds) if you call it frequently.
  • Cache identical preview generations where appropriate (careful: many products choose not to cache final outputs because users expect uniqueness).

Idempotency: avoid double-charging on retries

If your request fails due to a network hiccup, you might retry. But retries can create duplicate generations. The safest approach is to implement idempotency in your backend:

  • Compute a hash of the request payload (engine + params + prompt + input image hash).
  • If a job with that hash is already “running,” return its job id instead of starting a new request.
  • If it completed recently, return the stored result when appropriate (especially for previews).

Minimal Node.js worker skeleton

// Node 18+ example: a safe fetch wrapper (server-side)
const API_HOST = process.env.API_HOST || "https://api.stability.ai";
const KEY = process.env.STABILITY_API_KEY;

function sleep(ms){ return new Promise(r => setTimeout(r, ms)); }

async function callStability(path, { method="GET", headers={}, body } = {}) {
  if (!KEY) throw new Error("Missing STABILITY_API_KEY");
  const url = `${API_HOST}${path}`;

  const h = {
    "Authorization": `Bearer ${KEY}`,
    "Accept": "application/json",
    ...headers,
  };

  const maxAttempts = 5;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const ctrl = new AbortController();
    const t = setTimeout(() => ctrl.abort(), 20000);

    try {
      const res = await fetch(url, {
        method,
        headers: body ? { "Content-Type":"application/json", ...h } : h,
        body: body ? JSON.stringify(body) : undefined,
        signal: ctrl.signal
      });

      if (res.status === 429 || (res.status >= 500 && res.status <= 599)) {
        if (attempt === maxAttempts) throw new Error(`Retry limit hit: ${res.status}`);
        // Bigger waits for 429 because provider docs mention timeouts after exceeding limits
        const base = res.status === 429 ? 5000 : 800;
        const backoff = Math.min(base * Math.pow(2, attempt - 1), 60000);
        const jitter = Math.floor(Math.random() * 400);
        await sleep(backoff + jitter);
        continue;
      }

      const text = await res.text();
      if (!res.ok) throw new Error(`HTTP ${res.status}: ${text.slice(0, 600)}`);

      return text ? JSON.parse(text) : {};
    } finally {
      clearTimeout(t);
    }
  }
}

async function getBalance() {
  return callStability("/v1/user/balance");
}

Observability checklist

✅ Latency p50/p95
✅ 429 rate + cooldown events
✅ Credits spent per user/team
✅ Retry count + duplicate prevention
✅ Engine usage distribution
✅ Error taxonomy (400/401/403/5xx)

12) FAQ

What is the base URL for the Stability REST API?

The official REST docs show examples using a base host of https://api.stability.ai (with an optional API_HOST override), and endpoints like /v1/user/account, /v1/user/balance, and /v1/engines/list.

How do I authenticate?

Use your Stability API key in the Authorization header as a Bearer token: Authorization: Bearer YOUR_KEY. Keep the key server-side (never in browser code).

How do I check my available credits before generating?

Call /v1/user/balance and enforce budget gates in your backend (e.g., “must have ≥ X credits”). This is also useful for showing a real-time “credits remaining” indicator.

How do I find which models/engines I can use?

Call /v1/engines/list and build a safe allowlist from what your account returns. Don’t hardcode engine IDs without a fallback plan.

What is the rate limit and what happens if I exceed it?

The documented limit is 150 requests per 10 seconds. Exceeding it triggers 429 responses, and the KB notes a timeout period (60 seconds) after the limit is exceeded. Design with throttling, queues, and backoff.

How does pricing work?

Platform API usage is priced in credits. The pricing page explains 1 credit equals $0.01, and different models/actions consume different credits. Build a cost estimator and expose it to users before they run expensive jobs.

Should I call the API directly from my frontend?

No. Put the API key in your backend only. Your frontend should call your backend, which then calls Stability. This prevents key leaks and allows budgets and safety checks.

What is the best “cheap preview → expensive final” approach?

Use smaller sizes and fewer steps for preview, then only upscale or render at high quality once the user selects the best candidate. This can cut costs dramatically while improving UX.

How do I prevent duplicate generations when retries happen?

Use a queue and an idempotency strategy: hash the request payload and re-use an existing job if the same request is already running or recently completed.

Where can I verify the latest endpoints and parameters?

Use the official API documentation, including the API reference and the REST docs (OpenAPI/Redoc). Also check the status page if you see persistent 5xx errors.

13) Official resources (bookmark these)

  • Platform API (getting started / reference): platform.stability.ai/docs
  • REST API docs (v1 OpenAPI/Redoc): staging-api.stability.ai/docs
  • KB article on rate limits: kb.stability.ai
  • Pricing (credits): platform.stability.ai/pricing
  • Status page: (linked from the REST docs)

If you publish this page on your site
Add a “Last updated” date and a small changelog section. API platforms evolve and readers trust pages that clearly track updates.

Changelog template

| Date | Change | Impact | Action |
|---|---|---|---|
| YYYY-MM-DD | Endpoint/model/pricing/rate limit change | Low / Medium / High | Update client, revise docs, adjust budgets |

Developer checklist

✅ Store API key server-side
✅ Call /v1/user/balance before big jobs
✅ Discover engines via /v1/engines/list
✅ Queue & throttle to avoid 429
✅ Backoff with jitter
✅ Preview then upscale/final
✅ Log safely (no secrets)
✅ Budget per user/team