Stability AI API - Complete Developer Guide
The Stability AI Platform API is a REST API that allows developers to build applications that generate and transform media using Stability’s models and services. Whether you are building a creator tool, a marketing automation pipeline, a design assistant, a game studio workflow, a batch renderer, or an internal content system, the API gives you the core primitives: authenticate with an API key, pick an engine/model, send a prompt (and optionally an input image), and receive either raw image bytes (PNG) or a JSON payload with the results.
Last updated: February 2026. Exact engines available to your organization depend on your plan, region, and feature flags. Always verify on your instance/account.
1) Overview: what the Stability AI API is, and what you can build
What “Stability AI API” usually means
When developers say “Stability AI API,” they are typically referring to the official Platform API that’s used to:
- Generate images from text (text-to-image)
- Transform images (image-to-image)
- Upscale images (higher resolution / clarity)
- Mask or inpaint (edit parts of an image using a mask)
- List available engines and manage account/billing via API
Common real-world use cases
Creator apps & editor plugins
Prompt-to-image inside a web app, Photoshop plugin, Figma-like canvas, or mobile editor with fast previews and final renders.
Marketing & ecommerce automation
Generate product hero images, seasonal variations, ad creative, backgrounds, and A/B test assets automatically.
Game & media pipelines
Concept art generation, batch rendering, asset variations, style exploration, and rapid ideation workflows.
Internal tooling
Teams generate mockups, UI illustrations, storyboards, or internal documentation images using repeatable prompts and templates.
Why teams choose a dedicated image API
An image generation API becomes valuable when you need reliability, guardrails, and operational control: consistent auth, clear error modes, measurable costs, and integrations that can be audited. Stability’s API provides standardized endpoints and rate limiting behavior, making it suitable for production systems where you want to track usage, plan budgets, and build predictable user experiences.
The API is served from https://api.stability.ai, with endpoints such as /v1/user/account, /v1/user/balance, /v1/engines/list, and generation endpoints under /v1/generation. Requests authenticate using an API key passed in the Authorization header as a Bearer token, and the documented rate limit is 150 requests per 10 seconds.
2) Getting started: keys, base URL, and your first successful call
Base URL and API host
The REST API is hosted under a Stability domain and, in the official docs, examples default to a base host of:
https://api.stability.ai
Most examples also allow an API_HOST override so you can point at a different host (for example, a staging host, regional host,
or an enterprise endpoint). In production systems, make the host configurable by environment (dev/staging/prod) so you can upgrade safely.
Store your API key safely
Your API key is the credential that authorizes requests and charges usage to your account/organization. Treat it like a password:
- Store it in a secrets manager or CI secret store
- Never commit it to Git
- Never embed it in client-side JavaScript (browser apps)
- Rotate it on a schedule and immediately if exposed
First call: “Who am I?” (Account) and “How many credits do I have?” (Balance)
The best first two calls are account and balance because they confirm authentication and help you validate billing scope before you generate anything.
In the official REST docs, these endpoints exist under /v1/user.
Get account
curl -sS https://api.stability.ai/v1/user/account \
-H "Authorization: Bearer $STABILITY_API_KEY" \
-H "Accept: application/json"
Get credit balance
curl -sS https://api.stability.ai/v1/user/balance \
-H "Authorization: Bearer $STABILITY_API_KEY" \
-H "Accept: application/json"
If you receive 401, your key is missing or invalid. If you receive 429, you are sending too many requests too quickly.
If you receive 5xx, the service may be experiencing an incident—check the provider’s status page and retry with backoff.
3) Authentication: Bearer tokens, organizations, and client identification
Bearer token auth (the standard pattern)
In the official REST docs, requests authenticate by including your Stability API key in the Authorization header as a Bearer token.
This means your header should look like:
Authorization: Bearer YOUR_STABILITY_API_KEY
Organization scoping
Some endpoints accept an Organization header that lets you scope requests to an organization other than your default.
This matters if your user belongs to multiple orgs, you run a multi-tenant integration, or you want strict cost allocation across teams.
Organization: org-123456
Client ID / Client Version headers (recommended for clarity)
The REST docs show optional headers like Stability-Client-ID and Stability-Client-Version. These are useful for:
- Debugging: quickly identify which app is causing errors
- Billing clarity: segment usage by product or internal service
- Support requests: provide clear client metadata when reporting issues
Stability-Client-ID: my-app
Stability-Client-Version: 1.2.1
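As a sketch, a server-side request builder can assemble these headers conditionally (the function name and option names here are illustrative, not part of the API):

```javascript
// Build the standard header set for a server-side Stability call.
// Organization and client headers are optional; include them only when set.
function buildHeaders({ apiKey, orgId, clientId, clientVersion } = {}) {
  if (!apiKey) throw new Error("API key is required");
  const headers = {
    Authorization: `Bearer ${apiKey}`,
    Accept: "application/json",
  };
  if (orgId) headers["Organization"] = orgId;
  if (clientId) headers["Stability-Client-ID"] = clientId;
  if (clientVersion) headers["Stability-Client-Version"] = clientVersion;
  return headers;
}
```

Centralizing header construction like this also gives you one place to add future headers without touching every call site.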
Recommended auth architecture (frontend vs backend)
Frontend (browser/mobile)
Uses your app’s auth (sessions/JWT). Calls your backend for generation jobs. Never holds Stability keys.
Backend (server)
Holds Stability keys, enforces rate limits, validates prompts, stores job state, and calls Stability endpoints.
4) Endpoint map: account, engines, and generation
The official REST API docs (v1) include these core groups:
| Group | Typical endpoints | Use case | Notes |
|---|---|---|---|
| User | /v1/user/account, /v1/user/balance | Identity verification and credit checks | Best first calls; helps you enforce budget gates. |
| Engines | /v1/engines/list | Discover which engines/models you can use | Engine availability depends on your org/plan. |
| Generation | /v1/generation/{engine_id}/text-to-image, /v1/generation/{engine_id}/image-to-image, /v1/generation/{engine_id}/image-to-image/upscale, /v1/generation/{engine_id}/image-to-image/masking | Create images, transform images, upscale, and mask/inpaint | Most product experiences are built here. |
Engine discovery: why you should list engines dynamically
Many developers hardcode an “engine_id” and ship it. That works until:
- The engine name changes or becomes deprecated
- A new default engine becomes available to your org
- Your plan changes and certain engines are no longer enabled
- You want to A/B test engines by user tier or use case
Instead, build a startup “capabilities discovery” step:
- Call /v1/engines/list once per environment boot or daily cache refresh
- Store engines in your DB/cache with metadata and a safe allowlist
- Select engine dynamically depending on prompt, resolution, or user plan
List engines (cURL)
curl -sS https://api.stability.ai/v1/engines/list \
-H "Authorization: Bearer $STABILITY_API_KEY" \
-H "Accept: application/json"
Response formats: JSON vs image/png
For some endpoints, the docs show an Accept header where you can choose JSON or a PNG response:
Accept: application/json
# or
Accept: image/png
In production, JSON responses are often preferable because they can include additional metadata and multiple artifacts. Returning raw PNG is convenient for simple services, but you’ll usually still store metadata (engine, seed, params, cost) in your system.
5) Image generation fundamentals (text-to-image)
Text-to-image request anatomy
A typical text-to-image request includes:
- Dimensions (width, height) — usually multiples of 64
- Prompts — an array of prompt objects (often with weights)
- Guidance (e.g., CFG scale) — how strongly the output adheres to the prompt
- Sampler/steps — affects quality, compute cost, and runtime
- Seed — enables reproducibility (when supported)
- Output format — JSON payload or image bytes
Example: text-to-image (cURL skeleton)
Replace {engine_id} with an engine from /v1/engines/list. Parameters vary by engine/version—confirm in the docs for the engine you use.
curl -sS "https://api.stability.ai/v1/generation/{engine_id}/text-to-image" \
-H "Authorization: Bearer $STABILITY_API_KEY" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"width": 1024,
"height": 1024,
"text_prompts": [
{ "text": "A clean product photo of a ceramic mug on a white table, soft studio lighting", "weight": 1 }
],
"cfg_scale": 7,
"steps": 30
}'
Prompting: the “structured prompt” method
While you can write a single sentence, production results improve when you use a consistent structure:
- Subject: what is in the scene (object/person/landscape)
- Composition: camera angle, framing, perspective, depth of field
- Style: realism, illustration, cinematic, etc.
- Lighting: studio, natural light, rim light, soft box, sunset
- Constraints: “no text,” “no watermark,” “clean background,” etc.
Store prompt templates as versioned assets in your system. You can then improve outputs over time without changing application code: update the template, and the entire product improves.
Negative prompts and “what not to generate”
Many diffusion systems support negative prompts or negative weights to reduce unwanted features (e.g., extra limbs, artifacts, text). If your chosen engine supports it, you can add a second prompt object with a negative weight, or use the engine’s explicit negative prompt fields. Because behavior varies by engine, build your request builder to support both patterns and enable them via feature flags.
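A request builder can support the weighted-prompt pattern behind a flag. This sketch uses the `text_prompts` shape from the example above; whether a negative weight is honored is engine-specific, so verify before enabling the flag.

```javascript
// Build the text_prompts array, optionally attaching a negative prompt
// as a negatively weighted entry. Gate this behind a feature flag per engine.
function buildTextPrompts(positive, negative, { useNegativeWeight = true } = {}) {
  const prompts = [{ text: positive, weight: 1 }];
  if (negative && useNegativeWeight) {
    prompts.push({ text: negative, weight: -1 });
  }
  return prompts;
}
```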
6) Advanced workflows: image-to-image, upscale, and masking/inpainting
Image-to-image (img2img): controlled variation and style transfer
Image-to-image starts from an existing image and applies your prompt to transform it. Product teams use this for:
- Generating variations while keeping composition consistent
- Style transfer (e.g., turn a sketch into a rendered scene)
- Background changes while preserving subject identity
- “Re-render” improvements (lighting, realism) without redesigning from scratch
The most important parameter conceptually is “how much to change the original.” Some APIs expose this as strength or noise: lower values preserve more of the original, higher values allow more transformation.
Upscale: when and how to use it
Upscaling is not just “make it bigger.” In production pipelines, upscale is used when:
- You generated a preview at low resolution and want a final high-res output
- You need better texture detail for print or large-format assets
- Your product must produce consistent output sizes (e.g., 2048 square)
Best practice: upscale only the selected “final” images, not every preview. This cuts cost dramatically.
Masking / inpainting: edit a region precisely
Masking endpoints allow you to specify a mask image that defines which pixels can change. This is how you:
- Remove objects (and fill the background plausibly)
- Add new objects into a scene
- Fix hands, faces, or artifacts without regenerating the entire image
- Swap logos, labels, or UI elements in mockups (careful: follow licensing and trademark rules)
For best results, generate masks with soft edges (feathering) so the model blends content naturally. Hard edges can cause visible seams.
7) Pricing and cost control (credits-based billing)
The credit model
Stability’s Platform API pricing is commonly expressed in credits. The official pricing page explains that API usage is credit-based, and that 1 credit = $0.01. Different models/actions consume different credits per generation. Pricing can change over time as models and infrastructure improve.
Why cost control is a product feature
If you are building a consumer app, your users will explore. They will refresh, tweak prompts, and iterate. If you are building an enterprise tool, teams will run batch jobs. In both cases, cost can spike unless you design explicit controls:
- Preview mode: smaller images, fewer steps, cheaper engine
- Final mode: higher steps / resolution, only for selected images
- Result limits: cap number of variants generated per action
- Daily budgets: per user/team budgets enforced by your backend
- Queue & approvals: require confirmation for expensive operations (upscale, large batches)
Cost estimator formula (simple and useful)
If your model costs C credits per generation and you generate N images:
Total credits = C * N
Total USD ≈ (Total credits) * $0.01
Put this estimate directly in your UI before the user runs the job. Users love knowing “this will cost ~X credits.” For internal tools, log the estimate and the actual usage so you can tune defaults.
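The formula translates directly into a helper you can call from your UI layer (names are illustrative):

```javascript
// Cost estimator: C credits per generation, N images, 1 credit = $0.01.
function estimateCost(creditsPerGeneration, imageCount) {
  const totalCredits = creditsPerGeneration * imageCount;
  return { totalCredits, totalUsd: totalCredits * 0.01 };
}
```

Surface `totalCredits` in the confirmation dialog before the job runs, then log both the estimate and the actual usage so you can tune defaults over time.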
Practical cost reduction playbook
- Do fewer, better generations: use templates, guided inputs, and constraints so users don’t spam refresh.
- Use “small or fast” models for exploration: reserve premium models for final assets.
- Stop early: limit steps for previews; show a “quality slider.”
- Cache identical requests: if a user reruns the exact same request, return the cached result when appropriate.
- Deduplicate retries: ensure your retry logic doesn’t accidentally create duplicates (use idempotency keys if you build a proxy).
8) Rate limits, 429s, and reliable retries
Documented request limit
The official docs and KB describe a rate limit of 150 requests per 10 seconds.
The KB article also notes that exceeding the limit results in a 429 response, followed by a 60-second timeout.
Client-side throttling strategy
A simple strategy that works well:
- Limit concurrency per API key (e.g., 5–15 in-flight requests depending on your workload).
- Use a token bucket: allow bursts but enforce a steady rate under 150/10s.
- On 429: pause the key for a cooldown (start with 60s or read provider guidance), then resume gradually.
- Rotate multiple keys only if your account and policies allow it (and you can justify the complexity).
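A token bucket for the steady-rate part of this strategy might look like the following sketch. The capacity and refill rate (140 per 10 seconds) are illustrative values chosen to stay under the documented limit; injecting the clock keeps the bucket deterministic and testable.

```javascript
// Token-bucket throttle sized under the documented 150 requests / 10 s limit.
class TokenBucket {
  constructor({ capacity = 140, refillPerSecond = 14, now = () => Date.now() } = {}) {
    this.capacity = capacity;
    this.tokens = capacity;       // allow an initial burst up to capacity
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.last = now();
  }
  // Returns true if a request may be sent now, false if the caller should wait.
  tryTake() {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.last = t;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

When `tryTake()` returns false, park the job back on the queue rather than spinning, so bursty traffic smooths out instead of tripping 429s.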
Retry policy (recommended)
| Status | Meaning | Retry? | What to do |
|---|---|---|---|
| 200 | Success | No | Store result + metadata; return to user. |
| 400 | Bad request | No | Fix params; validate dimensions, prompt schema, engine_id. |
| 401 | Unauthorized | No | Rotate key, verify env secrets; do not loop retries. |
| 403 | Forbidden | No | Engine not allowed / policy restriction; show a clear message. |
| 429 | Rate limited | Yes | Backoff with jitter; reduce concurrency; cooldown the key. |
| 5xx | Server error | Yes | Retry a few times with exponential backoff; check the status page if persistent. |
Practical backoff algorithm
Use exponential backoff with jitter. Example delays: 1s, 2s, 4s, 8s, then stop (or cap at 10–20s). For 429 specifically, consider a bigger initial wait because the KB mentions a 60-second timeout period after exceeding the limit.
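That schedule can be computed with a small helper. The base delays and cap below are the illustrative values from this section, not provider-mandated numbers:

```javascript
// Exponential backoff with jitter. 429s start from a larger base because the
// KB describes a 60-second timeout after the limit is exceeded.
function backoffDelayMs(attempt, status, rng = Math.random) {
  const base = status === 429 ? 5000 : 1000; // milliseconds
  const capped = Math.min(base * 2 ** (attempt - 1), 20000);
  const jitter = Math.floor(rng() * 400); // spread retries across clients
  return capped + jitter;
}
```

Passing the random source as a parameter makes the jitter testable and lets you disable it in local development.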
9) Quality, prompting, and debugging “why does my image look wrong?”
Quality is a system, not a single knob
In most products, output quality depends on:
- Prompt clarity (explicit subject + composition + style + constraints)
- Resolution (bigger images can hold more detail, but can cost more)
- Steps (more steps can improve fidelity, but increase latency/cost)
- CFG/guidance (too low: vague; too high: artifacts or over-constrained)
- Sampler (behavior varies; choose a safe default and test)
- Seed control (reproducibility for workflows and A/B testing)
Prompt templates that ship well
Here are examples of templates you can store and reuse:
// Product photo template
"Photorealistic studio product photo of {subject}, placed on {surface}, {lighting}, shallow depth of field, 85mm lens, ultra sharp, clean background, no text, no watermark"
// Illustration template
"High-quality illustration of {subject}, {style}, crisp lines, balanced composition, soft shading, high detail, no text, no watermark"
Debugging checklist
- Confirm engine_id: call /v1/engines/list and ensure the engine exists and is enabled.
- Validate dimensions: ensure width/height are permitted and multiples of 64.
- Reduce complexity: start with a simple prompt; remove extra clauses; then add constraints one by one.
- Stabilize with a seed: if supported, fix a seed to compare parameter changes fairly.
- Lower CFG: if artifacts appear, reduce guidance slightly.
- Increase steps: if images look undercooked, try more steps (especially for final render mode).
10) Safety, compliance, and responsible deployment
Safety is part of your API integration
Building on an image generation API means you are shipping a content system. Even if the API enforces policies, your product still needs:
- Clear Terms: what users can and cannot generate
- Abuse prevention: rate limits per user, anti-spam protections, and logging of suspicious patterns
- Moderation workflow: review flags for public galleries or user-shared content
- Privacy controls: do not store more than you need; protect images and prompts
Recommended logging (privacy-first)
Log:
- timestamp, user id (or hashed), job id
- engine_id, resolution, steps, cfg, and any cost estimate
- status code + error class (400, 401, 429, 5xx)
- latency and retry count
Avoid logging raw prompts and full images unless you have a clear privacy policy and a legitimate reason (debugging, safety, enterprise audit), and even then use retention limits and access controls.
Enterprise considerations
- Data boundaries: determine whether prompts/images can be used for training (check provider terms and your contract).
- Access controls: who can generate, who can upscale, who can run batches.
- Budgeting: per-team credit budgets and billing attribution.
- Auditability: reproducibility (seed), versioned prompt templates, and change logs.
11) Production architecture: queues, caching, retries, and UX that scales
Reference architecture
- Frontend (web/mobile) — collects prompt + options; never touches API keys
- Backend API — validates input, enforces budgets, signs Stability requests
- Job queue — holds generation jobs, supports retries without duplication
- Workers — execute calls to Stability with controlled concurrency
- Storage — stores outputs (S3/GCS) and metadata (DB)
- Observability — metrics, logs, alerts for 429 spikes and latency
Why queues are non-negotiable
If you call an image API synchronously from a user request, you risk:
- Slow UI and timeouts when jobs take longer
- Thundering herds (many users generate at once)
- Hard-to-control rate limits
- Duplicate retries that waste credits
With a queue, you can shape traffic: accept jobs quickly, process steadily, and give users live progress updates.
Cache strategy
Caching can reduce costs and load:
- Cache /v1/engines/list for hours (it rarely changes minute-to-minute).
- Cache account/balance for a short TTL (e.g., 10–60 seconds) if you call it frequently.
- Cache identical preview generations where appropriate (careful: many products choose not to cache final outputs because users expect uniqueness).
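All three cases fit a minimal TTL cache. This is a sketch, not a production cache (no size limit, no eviction sweep); injecting the clock keeps it testable.

```javascript
// Minimal TTL cache for engine lists, balance lookups, and preview results.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.map = new Map();
  }
  set(key, value, ttlMs) {
    this.map.set(key, { value, expires: this.now() + ttlMs });
  }
  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expires) {
      this.map.delete(key); // lazy expiry on read
      return undefined;
    }
    return entry.value;
  }
}
```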
Idempotency: avoid double-charging on retries
If your request fails due to a network hiccup, you might retry. But retries can create duplicate generations. The safest approach is to implement idempotency in your backend:
- Compute a hash of the request payload (engine + params + prompt + input image hash).
- If a job with that hash is already “running,” return its job id instead of starting a new request.
- If it completed recently, return the stored result when appropriate (especially for previews).
Minimal Node.js worker skeleton
// Node 18+ example: a safe fetch wrapper (server-side)
const API_HOST = process.env.API_HOST || "https://api.stability.ai";
const KEY = process.env.STABILITY_API_KEY;
function sleep(ms){ return new Promise(r => setTimeout(r, ms)); }
async function callStability(path, { method="GET", headers={}, body } = {}) {
if (!KEY) throw new Error("Missing STABILITY_API_KEY");
const url = `${API_HOST}${path}`;
const h = {
"Authorization": `Bearer ${KEY}`,
"Accept": "application/json",
...headers,
};
const maxAttempts = 5;
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
const ctrl = new AbortController();
const t = setTimeout(() => ctrl.abort(), 20000);
try {
const res = await fetch(url, {
method,
headers: body ? { "Content-Type":"application/json", ...h } : h,
body: body ? JSON.stringify(body) : undefined,
signal: ctrl.signal
});
if (res.status === 429 || (res.status >= 500 && res.status <= 599)) {
if (attempt === maxAttempts) throw new Error(`Retry limit hit: ${res.status}`);
// Bigger waits for 429 because provider docs mention timeouts after exceeding limits
const base = res.status === 429 ? 5000 : 800;
const backoff = Math.min(base * Math.pow(2, attempt - 1), 60000);
const jitter = Math.floor(Math.random() * 400);
await sleep(backoff + jitter);
continue;
}
const text = await res.text();
if (!res.ok) throw new Error(`HTTP ${res.status}: ${text.slice(0, 600)}`);
return text ? JSON.parse(text) : {};
} finally {
clearTimeout(t);
}
}
}
async function getBalance() {
return callStability("/v1/user/balance");
}
Observability checklist
- Request volume and 429 rate per API key (alert on spikes)
- Latency and retry counts per endpoint and engine
- Error classes over time (400, 401, 429, 5xx)
- Credit burn per user/team versus budget
12) FAQ
What is the base URL for the Stability REST API?
The official REST docs show examples using a base host of https://api.stability.ai (with an optional API_HOST override),
and endpoints like /v1/user/account, /v1/user/balance, and /v1/engines/list.
How do I authenticate?
Use your Stability API key in the Authorization header as a Bearer token:
Authorization: Bearer YOUR_KEY. Keep the key server-side (never in browser code).
How do I check my available credits before generating?
Call /v1/user/balance and enforce budget gates in your backend (e.g., “must have ≥ X credits”).
This is also useful for showing a real-time “credits remaining” indicator.
How do I find which models/engines I can use?
Call /v1/engines/list and build a safe allowlist from what your account returns.
Don’t hardcode engine IDs without a fallback plan.
What is the rate limit and what happens if I exceed it?
The documented limit is 150 requests per 10 seconds. Exceeding it triggers 429 responses, and the KB notes a timeout period (60 seconds)
after the limit is exceeded. Design with throttling, queues, and backoff.
How does pricing work?
Platform API usage is priced in credits. The pricing page explains 1 credit equals $0.01, and different models/actions consume different credits. Build a cost estimator and expose it to users before they run expensive jobs.
Should I call the API directly from my frontend?
No. Put the API key in your backend only. Your frontend should call your backend, which then calls Stability. This prevents key leaks and allows budgets and safety checks.
What is the best “cheap preview → expensive final” approach?
Use smaller sizes and fewer steps for preview, then only upscale or render at high quality once the user selects the best candidate. This can cut costs dramatically while improving UX.
How do I prevent duplicate generations when retries happen?
Use a queue and an idempotency strategy: hash the request payload and re-use an existing job if the same request is already running or recently completed.
Where can I verify the latest endpoints and parameters?
Use the official API documentation, including the API reference and the REST docs (OpenAPI/Redoc). Also check the status page if you see persistent 5xx errors.
13) Official resources (bookmark these)
- Platform API (getting started / reference): platform.stability.ai/docs
- REST API docs (v1 OpenAPI/Redoc): staging-api.stability.ai/docs
- KB article on rate limits: kb.stability.ai
- Pricing (credits): platform.stability.ai/pricing
- Status page: (linked from the REST docs)
Changelog template
| Date | Change | Impact | Action |
|---|---|---|---|
| YYYY-MM-DD | Endpoint/model/pricing/rate limit change | Low / Medium / High | Update client, revise docs, adjust budgets |