Stability AI API - Complete Developer Guide
The Stability AI Platform API is a REST API that allows developers to build applications that generate and transform media using Stability’s models and services. Whether you are building a creator tool, a marketing automation pipeline, a design assistant, a game studio workflow, a batch renderer, or an internal content system, the API gives you the core primitives: authenticate with an API key, pick an engine/model, send a prompt (and optionally an input image), and receive either raw image bytes (PNG) or a JSON payload with the results.
Last updated: February 2026. Exact engines available to your organization depend on your plan, region, and feature flags. Always verify on your instance/account.
1) Overview: what the Stability AI API is, and what you can build
What “Stability AI API” usually means
When developers say “Stability AI API,” they are typically referring to the official Platform API that’s used to:
- Generate images from text (text-to-image)
- Transform images (image-to-image)
- Upscale images (higher resolution / clarity)
- Mask or inpaint (edit parts of an image using a mask)
- List available engines and manage account/billing via API
Common real-world use cases
Creator apps & editor plugins
Prompt-to-image inside a web app, Photoshop plugin, Figma-like canvas, or mobile editor with fast previews and final renders.
Marketing & ecommerce automation
Generate product hero images, seasonal variations, ad creative, backgrounds, and A/B test assets automatically.
Game & media pipelines
Concept art generation, batch rendering, asset variations, style exploration, and rapid ideation workflows.
Internal tooling
Teams generate mockups, UI illustrations, storyboards, or internal documentation images using repeatable prompts and templates.
Why teams choose a dedicated image API
An image generation API becomes valuable when you need reliability, guardrails, and operational control: consistent auth, clear error modes, measurable costs, and integrations that can be audited. Stability’s API provides standardized endpoints and rate limiting behavior, making it suitable for production systems where you want to track usage, plan budgets, and build predictable user experiences.
The API is served from https://api.stability.ai, with endpoints such as /v1/user/account, /v1/user/balance, /v1/engines/list, and generation endpoints under /v1/generation. Requests authenticate using an API key passed in the Authorization header as a Bearer token, and the documented rate limit is 150 requests per 10 seconds.
2) Getting started: keys, base URL, and your first successful call
Base URL and API host
The REST API is hosted under a Stability domain and, in the official docs, examples default to a base host of:
https://api.stability.ai
Most examples also allow an API_HOST override so you can point at a different host (for example, a staging host, regional host,
or an enterprise endpoint). In production systems, make the host configurable by environment (dev/staging/prod) so you can upgrade safely.
Store your API key safely
Your API key is the credential that authorizes requests and charges usage to your account/organization. Treat it like a password:
- Store it in a secrets manager or CI secret store
- Never commit it to Git
- Never embed it in client-side JavaScript (browser apps)
- Rotate it on a schedule and immediately if exposed
First call: “Who am I?” (Account) and “How many credits do I have?” (Balance)
The best first two calls are account and balance because they confirm authentication and help you validate billing scope before you generate anything.
In the official REST docs, these endpoints exist under /v1/user.
Get account
curl -sS https://api.stability.ai/v1/user/account \
-H "Authorization: Bearer $STABILITY_API_KEY" \
-H "Accept: application/json"
Get credit balance
curl -sS https://api.stability.ai/v1/user/balance \
-H "Authorization: Bearer $STABILITY_API_KEY" \
-H "Accept: application/json"
If you receive 401, your key is missing or invalid. If you receive 429, you are sending too many requests too quickly.
If you receive 5xx, the service may be experiencing an incident—check the provider’s status page and retry with backoff.
3) Authentication: Bearer tokens, organizations, and client identification
Bearer token auth (the standard pattern)
In the official REST docs, requests authenticate by including your Stability API key in the Authorization header as a Bearer token.
This means your header should look like:
Authorization: Bearer YOUR_STABILITY_API_KEY
Organization scoping
Some endpoints accept an Organization header that lets you scope requests to an organization other than your default.
This matters if your user belongs to multiple orgs, you run a multi-tenant integration, or you want strict cost allocation across teams.
Organization: org-123456
Client ID / Client Version headers (recommended for clarity)
The REST docs show optional headers like Stability-Client-ID and Stability-Client-Version. These are useful for:
- Debugging: quickly identify which app is causing errors
- Billing clarity: segment usage by product or internal service
- Support requests: provide clear client metadata when reporting issues
Stability-Client-ID: my-app
Stability-Client-Version: 1.2.1
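As a sketch, a server-side request builder can assemble these headers conditionally (the function name and option names here are illustrative, not part of the API):

```javascript
// Build the standard header set for a server-side Stability call.
// Organization and client headers are optional; include them only when set.
function buildHeaders({ apiKey, orgId, clientId, clientVersion } = {}) {
  if (!apiKey) throw new Error("API key is required");
  const headers = {
    Authorization: `Bearer ${apiKey}`,
    Accept: "application/json",
  };
  if (orgId) headers["Organization"] = orgId;
  if (clientId) headers["Stability-Client-ID"] = clientId;
  if (clientVersion) headers["Stability-Client-Version"] = clientVersion;
  return headers;
}
```

Centralizing header construction like this also gives you one place to add future headers without touching every call site.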
Recommended auth architecture (frontend vs backend)
Frontend (browser/mobile)
Uses your app’s auth (sessions/JWT). Calls your backend for generation jobs. Never holds Stability keys.
Backend (server)
Holds Stability keys, enforces rate limits, validates prompts, stores job state, and calls Stability endpoints.
4) Endpoint map: account, engines, and generation
The official REST API docs (v1) include these core groups:
| Group | Typical endpoints | Use case | Notes |
|---|---|---|---|
| User | /v1/user/account, /v1/user/balance | Identity verification and credit checks | Best first calls; helps you enforce budget gates. |
| Engines | /v1/engines/list | Discover which engines/models you can use | Engine availability depends on your org/plan. |
| Generation | /v1/generation/{engine_id}/text-to-image, /v1/generation/{engine_id}/image-to-image, /v1/generation/{engine_id}/image-to-image/upscale, /v1/generation/{engine_id}/image-to-image/masking | Create images, transform images, upscale, and mask/inpaint | Most product experiences are built here. |
Engine discovery: why you should list engines dynamically
Many developers hardcode an “engine_id” and ship it. That works until:
- The engine name changes or becomes deprecated
- A new default engine becomes available to your org
- Your plan changes and certain engines are no longer enabled
- You want to A/B test engines by user tier or use case
Instead, build a startup “capabilities discovery” step:
- Call /v1/engines/list once per environment boot or daily cache refresh
- Store engines in your DB/cache with metadata and a safe allowlist
- Select engine dynamically depending on prompt, resolution, or user plan
List engines (cURL)
curl -sS https://api.stability.ai/v1/engines/list \
-H "Authorization: Bearer $STABILITY_API_KEY" \
-H "Accept: application/json"
Response formats: JSON vs image/png
For some endpoints, the docs show an Accept header where you can choose JSON or a PNG response:
Accept: application/json
# or
Accept: image/png
In production, JSON responses are often preferable because they can include additional metadata and multiple artifacts. Returning raw PNG is convenient for simple services, but you’ll usually still store metadata (engine, seed, params, cost) in your system.
5) Image generation fundamentals (text-to-image)
Text-to-image request anatomy
A typical text-to-image request includes:
- Dimensions (width, height) — usually multiples of 64
- Prompts — an array of prompt objects (often with weights)
- Guidance (e.g., CFG scale) — how strongly the output adheres to the prompt
- Sampler/steps — affects quality, compute cost, and runtime
- Seed — enables reproducibility (when supported)
- Output format — JSON payload or image bytes
Example: text-to-image (cURL skeleton)
Replace {engine_id} with an engine from /v1/engines/list. Parameters vary by engine/version—confirm in the docs for the engine you use.
curl -sS "https://api.stability.ai/v1/generation/{engine_id}/text-to-image" \
-H "Authorization: Bearer $STABILITY_API_KEY" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"width": 1024,
"height": 1024,
"text_prompts": [
{ "text": "A clean product photo of a ceramic mug on a white table, soft studio lighting", "weight": 1 }
],
"cfg_scale": 7,
"steps": 30
}'
Prompting: the “structured prompt” method
While you can write a single sentence, production results improve when you use a consistent structure:
- Subject: what is in the scene (object/person/landscape)
- Composition: camera angle, framing, perspective, depth of field
- Style: realism, illustration, cinematic, etc.
- Lighting: studio, natural light, rim light, soft box, sunset
- Constraints: “no text,” “no watermark,” “clean background,” etc.
Store prompt templates as versioned assets in your system. You can then improve outputs over time without changing application code: update the template, and the entire product improves.
Negative prompts and “what not to generate”
Many diffusion systems support negative prompts or negative weights to reduce unwanted features (e.g., extra limbs, artifacts, text). If your chosen engine supports it, you can add a second prompt object with a negative weight, or use the engine’s explicit negative prompt fields. Because behavior varies by engine, build your request builder to support both patterns and enable them via feature flags.
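A request builder can support the weighted-prompt pattern behind a flag. This sketch uses the `text_prompts` shape from the example above; whether a negative weight is honored is engine-specific, so verify before enabling the flag.

```javascript
// Build the text_prompts array, optionally attaching a negative prompt
// as a negatively weighted entry. Gate this behind a feature flag per engine.
function buildTextPrompts(positive, negative, { useNegativeWeight = true } = {}) {
  const prompts = [{ text: positive, weight: 1 }];
  if (negative && useNegativeWeight) {
    prompts.push({ text: negative, weight: -1 });
  }
  return prompts;
}
```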
6) Advanced workflows: image-to-image, upscale, and masking/inpainting
Image-to-image (img2img): controlled variation and style transfer
Image-to-image starts from an existing image and applies your prompt to transform it. Product teams use this for:
- Generating variations while keeping composition consistent
- Style transfer (e.g., turn a sketch into a rendered scene)
- Background changes while preserving subject identity
- “Re-render” improvements (lighting, realism) without redesigning from scratch
The most important parameter conceptually is “how much to change the original.” Some APIs expose this as strength or noise: lower values preserve more of the original, higher values allow more transformation.
Upscale: when and how to use it
Upscaling is not just “make it bigger.” In production pipelines, upscale is used when:
- You generated a preview at low resolution and want a final high-res output
- You need better texture detail for print or large-format assets
- Your product must produce consistent output sizes (e.g., 2048 square)
Best practice: upscale only the selected “final” images, not every preview. This cuts cost dramatically.
Masking / inpainting: edit a region precisely
Masking endpoints allow you to specify a mask image that defines which pixels can change. This is how you:
- Remove objects (and fill the background plausibly)
- Add new objects into a scene
- Fix hands, faces, or artifacts without regenerating the entire image
- Swap logos, labels, or UI elements in mockups (careful: follow licensing and trademark rules)
For best results, generate masks with soft edges (feathering) so the model blends content naturally. Hard edges can cause visible seams.
7) Pricing and cost control (credits-based billing)
The credit model
Stability’s Platform API pricing is commonly expressed in credits. The official pricing page explains that API usage is credit-based, and that 1 credit = $0.01. Different models/actions consume different credits per generation. Pricing can change over time as models and infrastructure improve.
Why cost control is a product feature
If you are building a consumer app, your users will explore. They will refresh, tweak prompts, and iterate. If you are building an enterprise tool, teams will run batch jobs. In both cases, cost can spike unless you design explicit controls:
- Preview mode: smaller images, fewer steps, cheaper engine
- Final mode: higher steps / resolution, only for selected images
- Result limits: cap number of variants generated per action
- Daily budgets: per user/team budgets enforced by your backend
- Queue & approvals: require confirmation for expensive operations (upscale, large batches)
Cost estimator formula (simple and useful)
If your model costs C credits per generation and you generate N images:
Total credits = C * N
Total USD ≈ (Total credits) * $0.01
Put this estimate directly in your UI before the user runs the job. Users love knowing “this will cost ~X credits.” For internal tools, log the estimate and the actual usage so you can tune defaults.
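The formula translates directly into a helper you can call from your UI layer (names are illustrative):

```javascript
// Cost estimator: C credits per generation, N images, 1 credit = $0.01.
function estimateCost(creditsPerGeneration, imageCount) {
  const totalCredits = creditsPerGeneration * imageCount;
  return { totalCredits, totalUsd: totalCredits * 0.01 };
}
```

Surface `totalCredits` in the confirmation dialog before the job runs, then log both the estimate and the actual usage so you can tune defaults over time.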
Practical cost reduction playbook
- Do fewer, better generations: use templates, guided inputs, and constraints so users don’t spam refresh.
- Use “small or fast” models for exploration: reserve premium models for final assets.
- Stop early: limit steps for previews; show a “quality slider.”
- Cache identical requests: if a user reruns the exact same request, return the cached result when appropriate.
- Deduplicate retries: ensure your retry logic doesn’t accidentally create duplicates (use idempotency keys if you build a proxy).
8) Rate limits, 429s, and reliable retries
Documented request limit
The official docs and KB describe a rate limit of 150 requests per 10 seconds.
The KB article also notes that exceeding the limit results in a 429 response, followed by a 60-second timeout.
Client-side throttling strategy
A simple strategy that works well:
- Limit concurrency per API key (e.g., 5–15 in-flight requests depending on your workload).
- Use a token bucket: allow bursts but enforce a steady rate under 150/10s.
- On 429: pause the key for a cooldown (start with 60s or read provider guidance), then resume gradually.
- Rotate multiple keys only if your account and policies allow it (and you can justify the complexity).
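A token bucket for the steady-rate part of this strategy might look like the following sketch. The capacity and refill rate (140 per 10 seconds) are illustrative values chosen to stay under the documented limit; injecting the clock keeps the bucket deterministic and testable.

```javascript
// Token-bucket throttle sized under the documented 150 requests / 10 s limit.
class TokenBucket {
  constructor({ capacity = 140, refillPerSecond = 14, now = () => Date.now() } = {}) {
    this.capacity = capacity;
    this.tokens = capacity;       // allow an initial burst up to capacity
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.last = now();
  }
  // Returns true if a request may be sent now, false if the caller should wait.
  tryTake() {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.last = t;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

When `tryTake()` returns false, park the job back on the queue rather than spinning, so bursty traffic smooths out instead of tripping 429s.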
Retry policy (recommended)
| Status | Meaning | Retry? | What to do |
|---|---|---|---|
| 200 | Success | No | Store result + metadata; return to user. |
| 400 | Bad request | No | Fix params; validate dimensions, prompt schema, engine_id. |
| 401 | Unauthorized | No | Rotate key, verify env secrets; do not loop retries. |
| 403 | Forbidden | No | Engine not allowed / policy restriction; show a clear message. |
| 429 | Rate limited | Yes | Backoff with jitter; reduce concurrency; cooldown the key. |
| 5xx | Server error | Yes | Retry a few times with exponential backoff; check the status page if persistent. |
Practical backoff algorithm
Use exponential backoff with jitter. Example delays: 1s, 2s, 4s, 8s, then stop (or cap at 10–20s). For 429 specifically, consider a bigger initial wait because the KB mentions a 60-second timeout period after exceeding the limit.
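That schedule can be computed with a small helper. The base delays and cap below are the illustrative values from this section, not provider-mandated numbers:

```javascript
// Exponential backoff with jitter. 429s start from a larger base because the
// KB describes a 60-second timeout after the limit is exceeded.
function backoffDelayMs(attempt, status, rng = Math.random) {
  const base = status === 429 ? 5000 : 1000; // milliseconds
  const capped = Math.min(base * 2 ** (attempt - 1), 20000);
  const jitter = Math.floor(rng() * 400); // spread retries across clients
  return capped + jitter;
}
```

Passing the random source as a parameter makes the jitter testable and lets you disable it in local development.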
9) Quality, prompting, and debugging “why does my image look wrong?”
Quality is a system, not a single knob
In most products, output quality depends on:
- Prompt clarity (explicit subject + composition + style + constraints)
- Resolution (bigger images can hold more detail, but can cost more)
- Steps (more steps can improve fidelity, but increase latency/cost)
- CFG/guidance (too low: vague; too high: artifacts or over-constrained)
- Sampler (behavior varies; choose a safe default and test)
- Seed control (reproducibility for workflows and A/B testing)
Prompt templates that ship well
Here are examples of templates you can store and reuse:
// Product photo template
"Photorealistic studio product photo of {subject}, placed on {surface}, {lighting}, shallow depth of field, 85mm lens, ultra sharp, clean background, no text, no watermark"
// Illustration template
"High-quality illustration of {subject}, {style}, crisp lines, balanced composition, soft shading, high detail, no text, no watermark"
Debugging checklist
- Confirm engine_id: call /v1/engines/list and ensure the engine exists and is enabled.
- Validate dimensions: ensure width/height are permitted and multiples of 64.
- Reduce complexity: start with a simple prompt; remove extra clauses; then add constraints one by one.
- Stabilize with a seed: if supported, fix a seed to compare parameter changes fairly.
- Lower CFG: if artifacts appear, reduce guidance slightly.
- Increase steps: if images look undercooked, try more steps (especially for final render mode).
10) Safety, compliance, and responsible deployment
Safety is part of your API integration
Building on an image generation API means you are shipping a content system. Even if the API enforces policies, your product still needs:
- Clear Terms: what users can and cannot generate
- Abuse prevention: rate limits per user, anti-spam protections, and logging of suspicious patterns
- Moderation workflow: review flags for public galleries or user-shared content
- Privacy controls: do not store more than you need; protect images and prompts
Recommended logging (privacy-first)
Log:
- timestamp, user id (or hashed), job id
- engine_id, resolution, steps, cfg, and any cost estimate
- status code + error class (400, 401, 429, 5xx)
- latency and retry count
Avoid logging raw prompts and full images unless you have a clear privacy policy and a legitimate reason (debugging, safety, enterprise audit), and even then use retention limits and access controls.
Enterprise considerations
- Data boundaries: determine whether prompts/images can be used for training (check provider terms and your contract).
- Access controls: who can generate, who can upscale, who can run batches.
- Budgeting: per-team credit budgets and billing attribution.
- Auditability: reproducibility (seed), versioned prompt templates, and change logs.
11) Production architecture: queues, caching, retries, and UX that scales
Reference architecture
- Frontend (web/mobile) — collects prompt + options; never touches API keys
- Backend API — validates input, enforces budgets, signs Stability requests
- Job queue — holds generation jobs, supports retries without duplication
- Workers — execute calls to Stability with controlled concurrency
- Storage — stores outputs (S3/GCS) and metadata (DB)
- Observability — metrics, logs, alerts for 429 spikes and latency
Why queues are non-negotiable
If you call an image API synchronously from a user request, you risk:
- Slow UI and timeouts when jobs take longer
- Thundering herds (many users generate at once)
- Hard-to-control rate limits
- Duplicate retries that waste credits
With a queue, you can shape traffic: accept jobs quickly, process steadily, and give users live progress updates.
Cache strategy
Caching can reduce costs and load:
- Cache /v1/engines/list for hours (it rarely changes minute-to-minute).
- Cache account/balance for a short TTL (e.g., 10–60 seconds) if you call it frequently.
- Cache identical preview generations where appropriate (careful: many products choose not to cache final outputs because users expect uniqueness).
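All three cases fit a minimal TTL cache. This is a sketch, not a production cache (no size limit, no eviction sweep); injecting the clock keeps it testable.

```javascript
// Minimal TTL cache for engine lists, balance lookups, and preview results.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.map = new Map();
  }
  set(key, value, ttlMs) {
    this.map.set(key, { value, expires: this.now() + ttlMs });
  }
  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expires) {
      this.map.delete(key); // lazy expiry on read
      return undefined;
    }
    return entry.value;
  }
}
```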
Idempotency: avoid double-charging on retries
If your request fails due to a network hiccup, you might retry. But retries can create duplicate generations. The safest approach is to implement idempotency in your backend:
- Compute a hash of the request payload (engine + params + prompt + input image hash).
- If a job with that hash is already “running,” return its job id instead of starting a new request.
- If it completed recently, return the stored result when appropriate (especially for previews).
Minimal Node.js worker skeleton
// Node 18+ example: a safe fetch wrapper (server-side)
const API_HOST = process.env.API_HOST || "https://api.stability.ai";
const KEY = process.env.STABILITY_API_KEY;
function sleep(ms){ return new Promise(r => setTimeout(r, ms)); }
async function callStability(path, { method="GET", headers={}, body } = {}) {
if (!KEY) throw new Error("Missing STABILITY_API_KEY");
const url = `${API_HOST}${path}`;
const h = {
"Authorization": `Bearer ${KEY}`,
"Accept": "application/json",
...headers,
};
const maxAttempts = 5;
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
const ctrl = new AbortController();
const t = setTimeout(() => ctrl.abort(), 20000);
try {
const res = await fetch(url, {
method,
headers: body ? { "Content-Type":"application/json", ...h } : h,
body: body ? JSON.stringify(body) : undefined,
signal: ctrl.signal
});
if (res.status === 429 || (res.status >= 500 && res.status <= 599)) {
if (attempt === maxAttempts) throw new Error(`Retry limit hit: ${res.status}`);
// Bigger waits for 429 because provider docs mention timeouts after exceeding limits
const base = res.status === 429 ? 5000 : 800;
const backoff = Math.min(base * Math.pow(2, attempt - 1), 60000);
const jitter = Math.floor(Math.random() * 400);
await sleep(backoff + jitter);
continue;
}
const text = await res.text();
if (!res.ok) throw new Error(`HTTP ${res.status}: ${text.slice(0, 600)}`);
return text ? JSON.parse(text) : {};
} finally {
clearTimeout(t);
}
}
}
async function getBalance() {
return callStability("/v1/user/balance");
}
Observability checklist
- Request volume and 429 rate per API key (alert on spikes)
- Latency and retry counts per endpoint and engine
- Error classes over time (400, 401, 429, 5xx)
- Credit burn per user/team versus budget
12) FAQ
What is the base URL for the Stability REST API?
The official REST docs show examples using a base host of https://api.stability.ai (with an optional API_HOST override),
and endpoints like /v1/user/account, /v1/user/balance, and /v1/engines/list.
How do I authenticate?
Use your Stability API key in the Authorization header as a Bearer token:
Authorization: Bearer YOUR_KEY. Keep the key server-side (never in browser code).
How do I check my available credits before generating?
Call /v1/user/balance and enforce budget gates in your backend (e.g., “must have ≥ X credits”).
This is also useful for showing a real-time “credits remaining” indicator.
How do I find which models/engines I can use?
Call /v1/engines/list and build a safe allowlist from what your account returns.
Don’t hardcode engine IDs without a fallback plan.
What is the rate limit and what happens if I exceed it?
The documented limit is 150 requests per 10 seconds. Exceeding it triggers 429 responses, and the KB notes a timeout period (60 seconds)
after the limit is exceeded. Design with throttling, queues, and backoff.
How does pricing work?
Platform API usage is priced in credits. The pricing page explains 1 credit equals $0.01, and different models/actions consume different credits. Build a cost estimator and expose it to users before they run expensive jobs.
Should I call the API directly from my frontend?
No. Put the API key in your backend only. Your frontend should call your backend, which then calls Stability. This prevents key leaks and allows budgets and safety checks.
What is the best “cheap preview → expensive final” approach?
Use smaller sizes and fewer steps for preview, then only upscale or render at high quality once the user selects the best candidate. This can cut costs dramatically while improving UX.
How do I prevent duplicate generations when retries happen?
Use a queue and an idempotency strategy: hash the request payload and re-use an existing job if the same request is already running or recently completed.
Where can I verify the latest endpoints and parameters?
Use the official API documentation, including the API reference and the REST docs (OpenAPI/Redoc). Also check the status page if you see persistent 5xx errors.
13) Official resources (bookmark these)
- Platform API (getting started / reference): platform.stability.ai/docs
- REST API docs (v1 OpenAPI/Redoc): staging-api.stability.ai/docs
- KB article on rate limits: kb.stability.ai
- Pricing (credits): platform.stability.ai/pricing
- Status page: (linked from the REST docs)
Changelog template
| Date | Change | Impact | Action |
|---|---|---|---|
| YYYY-MM-DD | Endpoint/model/pricing/rate limit change | Low / Medium / High | Update client, revise docs, adjust budgets |