Leonardo API - build image and video generation into your product reliably
The Leonardo API (Leonardo.Ai Production API) gives developers a clean, scalable way to generate media from prompts and images: text-to-image, image-to-image, inpainting, upscaling, realtime canvas (LCM), and text-to-video workflows. It is designed for founders and teams who want to prototype visually in the Leonardo web app, then ship the same configuration via API.
This page is a developer-focused deep dive into “how it really works”: authentication, core endpoints, uploads with presigned URLs, polling vs webhook callbacks, rate limits and concurrency, model discovery, custom model training with datasets, and production architecture patterns that stay stable under real traffic.
1) What the Leonardo API is
Leonardo is a creative generation platform with a visual-first web app and a Production API. Many teams iterate in the web UI (prompt, style, aspect ratio, upscales, canvas edits), then export the same settings into code and use them at scale. That “design visually → export code” workflow is explicitly supported by Leonardo’s developer experience and docs.
What you can build with Leonardo API
Marketing creatives
Generate ad images, social graphics, product lifestyle shots, hero images, and campaign variants. The API makes it possible to run A/B style tests (prompt variants) and produce consistent assets at volume.
Productized “image generator” features
Embed generation inside your own app: an “AI cover image” button, brand kit visuals, avatar generator, or templated content that maps user inputs to prompt scaffolds with guardrails.
Realtime creative tools
Use Realtime Canvas (LCM) endpoints for interactive creation where latency matters: quick iterations, refinements, and edits that feel “live” in a UI.
Game / app asset pipelines
Generate concept art, textures, icons, item art, and environment variants. Pair with datasets and custom model training to maintain consistent style for a game or brand universe.
Terminology you’ll see in docs
You’ll encounter terms like generation (a job that produces outputs), init image (an uploaded image used for image-to-image or editing workflows), mask (defines the region to inpaint), platform model (a Leonardo-provided model you can select), and custom model (trained on a dataset you upload). You’ll also see “LCM” in Realtime Canvas recipes: Latent Consistency Models optimized for faster generation.
2) Quickstart: from API key to your first generation
The fastest path to a working integration is: (1) create an API key in the Leonardo web app (API Access), (2) call a generation endpoint with your prompt and settings, (3) retrieve the result by polling or receiving a webhook callback.
Step A: Get an API key
In the Leonardo web app, go to API Access, then create a new key. You can name keys by environment (e.g., myapp-dev, myapp-prod) so you can rotate safely.
Optional: configure a webhook callback URL to receive generation results automatically.
Step B: Know the base URL
Leonardo’s Production API uses the REST base path:
https://cloud.leonardo.ai/api/rest/v1
Most endpoints are under this prefix (generations, init-image, prompt tools, canvas tools, models, datasets, etc.).
Step C: Create an image generation (text-to-image)
The exact parameters depend on your chosen model and feature set, but the basic idea is consistent: send a prompt plus a small set of controls (size, number of images, optional negative prompt, and a model ID if needed). The endpoint shown in the official reference for “Create a Generation of Images” is: POST /generations.
curl -X POST "https://cloud.leonardo.ai/api/rest/v1/generations" \
-H "accept: application/json" \
-H "content-type: application/json" \
-H "authorization: Bearer YOUR_LEONARDO_API_KEY" \
-d '{
"prompt": "A clean product hero shot of a smart watch on a white desk, soft natural light",
"num_images": 4
}'
“Get API Code” workflow (UI → exact API config)
Leonardo supports an in-app “Get API Code” feature so you can generate an asset visually, then export the exact request settings as code. This is useful when you want your production requests to match a “known-good” configuration from the UI, rather than manually translating sliders and toggles to JSON.
3) Authentication and key safety
Leonardo’s Production API uses a standard Bearer token pattern: set Authorization: Bearer <YOUR_API_KEY> on requests. Most endpoints in the reference show “Credentials: Bearer” and a base path like https://cloud.leonardo.ai/api/rest/v1/....
Do not call the API directly from browsers
Treat your Leonardo API key like a password. If you expose it in client-side JavaScript, anyone can extract it and spend your credits. Instead, route requests through your backend, where you can enforce authentication, quotas, and guardrails.
Use separate keys per environment
Create at least two keys: one for development/testing and one for production. If something goes wrong (leak, integration bug, unexpected load), you can revoke or rotate one without impacting the other.
Recommended security baseline
- Store keys in a secrets manager (or encrypted env vars) rather than source control.
- Redact Authorization headers from logs and error tracking.
- Apply request validation in your backend so users can’t request unlimited images or extreme resolutions.
- Implement per-user quotas if your product exposes generation to end users.
- Add abuse protection (rate limiting, bot checks) to any public endpoints that trigger generation.
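The quota item above can be sketched as a minimal in-memory per-user daily budget check. This is illustrative only: a production version would back the counter with Redis or a database, and the credit numbers here are made up, not Leonardo pricing.

```python
from datetime import date

class DailyQuota:
    """Minimal in-memory per-user daily credit budget (illustrative).

    A production version should live in Redis or a database so the budget
    survives restarts and is shared across app servers.
    """
    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self._spent: dict[tuple[str, date], int] = {}

    def try_spend(self, user_id: str, credits: int) -> bool:
        """Reserve `credits` against today's budget; False if over limit."""
        key = (user_id, date.today())
        used = self._spent.get(key, 0)
        if used + credits > self.daily_limit:
            return False
        self._spent[key] = used + credits
        return True

# Usage: check the budget *before* starting a Leonardo generation.
quota = DailyQuota(daily_limit=100)
assert quota.try_spend("user-1", 60)      # first request fits
assert not quota.try_spend("user-1", 60)  # second would exceed 100
```

Checking the budget before calling the API (rather than after) is what keeps costs predictable: a rejected request costs you nothing.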
Operational tip: log request IDs and generation IDs
For production support, you usually don’t need full payloads. Instead, log: timestamp, endpoint, HTTP status, generation ID, and your internal user/account ID. That gives you enough to debug failures and reconcile retries without storing sensitive prompts or user content by default.
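A log line with exactly those fields can be built in a few lines; the field names below are internal conventions, not anything the Leonardo API requires:

```python
import json
import time

def generation_log_record(endpoint: str, status: int,
                          generation_id: str, internal_user_id: str) -> str:
    """Build a JSON log line with only the fields needed for support:
    timestamp, endpoint, HTTP status, generation ID, internal user ID.
    Prompts and image content are deliberately excluded by default.
    """
    return json.dumps({
        "ts": int(time.time()),
        "endpoint": endpoint,
        "status": status,
        "generation_id": generation_id,
        "user_id": internal_user_id,
    })

line = generation_log_record("/generations", 200, "gen_123", "acct_42")
```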
4) Core endpoints: generations, retrieval, and user lists
Leonardo’s Production API is built around the concept of a generation. A generation is a job that produces one or more outputs (images or video). The typical workflow is: create a generation → wait → retrieve results and metadata.
| Capability | Endpoint (typical) | Purpose | When you use it |
|---|---|---|---|
| Create image generation | POST /generations | Start a text-to-image or config-driven generation job. | Most image generation flows (prompt → outputs). |
| Get a single generation | GET /generations/{id} | Fetch status, metadata, and outputs of a specific generation. | Polling, UI “status” pages, debugging. |
| Get generations by user | GET /generations/user/{userId} | List generations for a user. | History pages, export, auditing. |
| Prompt helpers | POST /prompt/improve (and others) | Improve prompts or generate random prompt ideas. | UX features: “Enhance prompt” button. |
| Model discovery | GET /platformModels | List platform models available for generation. | Let users choose a model dynamically. |
Polling workflow example (recommended baseline)
Even if you plan to use webhook callbacks, implement polling as a fallback. Polling gives you a simple “source of truth” path when your webhook endpoint is down or when you need to re-check status.
// Pseudo-code: poll generation until complete (conceptual)
createGeneration() -> { generationId }
repeat every 2-5 seconds with backoff:
    gen = GET /generations/{generationId}
    if gen.status in ("COMPLETE", "FAILED"):
        break
if COMPLETE:
    store image URLs + metadata
else:
    log error + show message
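That pseudo-code can be fleshed out as a small Python loop. The `fetch_status` callable stands in for `GET /generations/{id}`, and the status strings and backoff numbers are illustrative defaults, so confirm the exact response shape against the API reference:

```python
import time

def poll_generation(fetch_status, generation_id: str,
                    initial_delay: float = 2.0, max_delay: float = 10.0,
                    timeout: float = 300.0, sleep=time.sleep) -> dict:
    """Poll until the job reaches a terminal state, with capped backoff.

    fetch_status(generation_id) -> {"status": ..., ...} stands in for
    GET /generations/{id}. Raises TimeoutError if the job never finishes,
    so callers can surface a "still running" state instead of hanging.
    """
    delay, waited = initial_delay, 0.0
    while waited < timeout:
        gen = fetch_status(generation_id)
        if gen["status"] in ("COMPLETE", "FAILED"):
            return gen
        sleep(delay)
        waited += delay
        delay = min(delay * 1.5, max_delay)  # back off between checks

    raise TimeoutError(f"generation {generation_id} did not finish in time")

# Usage with a fake fetcher that completes on the third check:
responses = iter([{"status": "PENDING"}, {"status": "PENDING"},
                  {"status": "COMPLETE", "images": ["https://..."]}])
result = poll_generation(lambda _id: next(responses), "gen_1",
                         sleep=lambda s: None)
assert result["status"] == "COMPLETE"
```

Injecting `fetch_status` and `sleep` keeps the loop testable without real API calls, and makes it easy to swap in your SDK client later.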
Understanding init_image_id vs init_generation_image_id
Leonardo distinguishes between images you uploaded via the Upload Init Image endpoint and images that were generated within Leonardo. In docs, init_image_id is typically the ID you get from Upload Init Image, while init_generation_image_id refers to an image ID from a prior generation result. This distinction matters when you build “edit this generated image” flows vs “edit a user-uploaded image” flows.
5) Uploads with presigned URLs: init images, masks, and dataset images
Many Leonardo workflows start from an existing image: image-to-image generation, inpainting (edit a region), upscaling, canvas editing, motion from an uploaded image, or custom model training datasets. Instead of uploading raw bytes directly to Leonardo’s API, the platform commonly returns presigned S3 upload details.
Upload an init image (for image-to-image and edits)
The official reference includes an “Upload init image” endpoint at: POST /init-image. It returns presigned details for uploading an init image to S3.
curl -X POST "https://cloud.leonardo.ai/api/rest/v1/init-image" \
-H "accept: application/json" \
-H "content-type: application/json" \
-H "authorization: Bearer YOUR_LEONARDO_API_KEY" \
-d '{
"extension": "png"
}'
After you receive presigned details, you upload the file to the provided S3 URL. Then you use the returned image ID as init_image_id in a generation request.
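The three-step flow (authenticated request → presigned upload without auth → use the returned ID) can be sketched with an injected HTTP client. The presigned-response field names used here (`url`, `fields`, `id`) are assumptions to illustrate the shape; confirm the exact schema in the Upload Init Image reference:

```python
def upload_init_image(http, api_key: str, file_bytes: bytes,
                      extension: str = "png") -> str:
    """Three-step presigned upload flow (sketch; field names assumed).

    `http` is an injected client exposing post(url, headers=..., json=...,
    data=..., files=...) -> dict, so the flow can be tested offline.
    """
    # Step 1: ask Leonardo for presigned details -- WITH Bearer auth.
    presigned = http.post(
        "https://cloud.leonardo.ai/api/rest/v1/init-image",
        headers={"authorization": f"Bearer {api_key}"},
        json={"extension": extension},
    )
    # Step 2: upload the bytes to S3 -- WITHOUT Leonardo auth headers.
    # The presigned URL already encodes permission; extra headers can 403.
    http.post(presigned["url"], data=presigned["fields"],
              files={"file": file_bytes}, headers={})
    # Step 3: the returned ID becomes init_image_id in generation requests.
    return presigned["id"]
```

Note that the S3 call passes `headers={}` on purpose, matching the gotcha described later in this section: presigned URLs must not carry your Leonardo Bearer token.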
Canvas editor: upload init + mask
For inpainting and canvas edits, you often need both an init image and a mask. Leonardo provides a canvas upload endpoint (e.g., POST /canvas-init-image) that returns presigned details to upload both files.
Inpainting concept
Inpainting means “replace or modify the masked area while keeping the rest of the image consistent.” Your mask indicates which pixels can change. This is ideal for product photo fixes, background swaps, removing objects, or changing a logo placement without regenerating everything.
Mask hygiene tips
Use clean masks with soft edges when you want smooth blending. Use hard masks when you want sharp edits (like replacing a sign). If you see artifacts, adjust the mask boundary and prompt specificity.
Dataset uploads (for training custom models or elements)
Custom model training typically uses dataset creation and dataset image upload endpoints. Dataset image upload endpoints return presigned URLs and may expire quickly, so your client should upload immediately.
Common presigned upload gotcha: remove auth headers on the S3 upload
When uploading the image bytes to the presigned S3 URL, you generally should not include Leonardo auth headers. Presigned URLs already encode permission, and adding unnecessary headers can cause errors (including 403 in some flows). Your process should be: call Leonardo endpoint with Bearer auth → receive presigned upload details → upload to S3 without Leonardo auth → use returned image ID in the next Leonardo call.
6) Realtime Canvas (LCM): fast generation, refine, inpaint, upscale
Leonardo includes a Realtime Canvas capability built around faster generation workflows, referenced in the API docs and recipes as LCM (Latent Consistency Models). The purpose is to make “creative iteration” feel interactive: generate quickly, refine, then inpaint or upscale.
Why LCM / realtime matters
In many products, user experience depends on latency. If an image takes 20–40 seconds, users may abandon. Realtime workflows help you keep the UI “alive”: show a quick preview, then offer a refine/upscale path for quality.
Recommended UX flow
1) fast preview → 2) select best → 3) refine with stronger prompt → 4) inpaint corrections → 5) upscale for final. This matches how creative teams work and reduces wasted compute.
Typical Realtime Canvas operations
- Create LCM generation: produce an initial image quickly.
- Instant refine: improve quality or steer details without starting from scratch.
- LCM inpainting: edit regions while keeping the rest consistent.
- Alchemy Upscale: upscale and enhance details.
Prompting for canvas edits: be explicit about what stays vs changes
For inpainting/edit workflows, prompts should describe the desired change and the context. Tell the model what to keep (“keep the product shape and lighting consistent”) and what to modify (“replace the background with a soft gradient”). If your mask is small, the prompt should focus on the masked area; if the mask is large, include broader composition guidance.
7) Video generation: text-to-video and motion from images
Leonardo’s API reference includes endpoints for creating video generation from a text prompt (text-to-video) and documentation recipes for generating motion using uploaded images. This enables workflows such as: “turn a product still into a subtle motion clip,” “animate a scene from text,” or “create short promo clips for ads.”
Text-to-video: what to expect
Text-to-video is typically an asynchronous job like image generation. You submit a prompt and settings, then wait for completion. The returned assets may be a video file URL plus metadata. In production, always implement timeouts, polling, and webhook callbacks so you can handle longer jobs.
curl -X POST "https://cloud.leonardo.ai/api/rest/v1/generations-text-to-video" \
-H "accept: application/json" \
-H "content-type: application/json" \
-H "authorization: Bearer YOUR_LEONARDO_API_KEY" \
-d '{
"prompt": "A smooth camera pan across a minimalist workspace, soft daylight, cinematic",
"duration": 4
}'
Motion from an uploaded image
If you want to animate a user’s image (for example, a product photo or hero illustration), the recommended pattern is: upload an init image via a presigned URL endpoint → receive an image ID → reference that ID in the motion/video request.
Production tip: store both image and video lineage
When you generate video from an image, store the lineage (video generation ID → source init_image_id → original file in your storage). This makes it easy to debug and to reproduce results when a customer asks “how did we get this clip?”
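One lightweight way to record that lineage is a single row per asset. The field names below are internal conventions for illustration, not Leonardo API fields:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AssetLineage:
    """One row per generated video, linking it back to its sources.
    Field names are our own schema, not anything the API returns.
    """
    video_generation_id: str
    source_init_image_id: str   # ID returned by the init-image upload
    original_file_url: str      # where *you* store the customer's original

row = AssetLineage("gen_vid_9", "img_3", "s3://my-bucket/uploads/hero.png")
record = asdict(row)  # ready to insert into your database
```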
8) Models: platform models and custom models
Leonardo supports using platform models (built-in, hosted) and custom models trained from your own datasets. In API docs, you can list platform models and use a model ID in generation requests. Once a custom model is trained, you can generate images by specifying that model ID as well.
Platform models
Platform models let you start immediately. You typically list them (to display in your UI), then pass the chosen model ID in your generation request. Platform models are best when you want broad capability and quick iteration without training overhead.
Custom models
Custom models are trained on your dataset and are best for “style consistency” or brand-specific output. For example, a game studio might train a model for their art style; a company might train a model for a product style.
How to choose a model (practical guidance)
- If you need fast experimentation: start with platform models, validate product-market fit.
- If you need consistent brand style: plan a dataset + custom model training workflow.
- If you need interactive UX: use Realtime Canvas/LCM for previews, then refine/upscale.
- If you need “templated” results: use prompt templates, negative prompts, and constrained settings.
Model IDs in practice
Model IDs are typically opaque. Your app should treat them as strings and never assume structure. Store the user’s selected model ID with the generation request so you can reproduce the exact output later. In many products, it helps to store both the model ID and a friendly display name (from the platform model list).
9) Datasets + training: how custom models become production features
Training custom models is the “step up” from a basic API integration to a real creative platform. It adds operational complexity—dataset preparation, uploads, training time, monitoring—but it can unlock an experience that feels proprietary: “your brand, your style, consistently.”
What a training pipeline typically includes
Dataset creation
Create a dataset container, then upload dataset images using a presigned upload endpoint. Keep your dataset organized by concept (one dataset per product line or per style).
Image upload
Upload each image quickly after receiving the presigned details. Presigned URLs can expire, so avoid long pauses in the client. Validate file size/type before you request presigned URLs.
Training and validation
Start training, monitor progress, and test outputs with a standard prompt suite (the same prompts every time) to compare versions and detect regressions.
Dataset best practices (what actually improves results)
- Consistency beats quantity: a smaller set of high-quality, consistent images can outperform a noisy dataset.
- Cover variation deliberately: include the variations you want the model to learn (angles, lighting, backgrounds).
- Avoid mixed concepts: don’t blend unrelated subjects into one dataset unless you want the model to merge them.
- Use a prompt suite: keep a short list of test prompts so you can evaluate changes objectively.
- Version everything: dataset version, training settings, resulting model ID, and evaluation notes.
Training custom elements and generating images
Leonardo’s docs include recipes for training custom elements and then generating images. The common pattern is: create dataset → upload dataset images via presigned URLs → train a custom model/element → generate using the returned model ID in your production generation requests.
10) Webhook callbacks: receiving results without polling
Polling is simple, but webhook callbacks are usually better at scale. Leonardo supports a webhook callback feature you can configure when creating an API key (or within settings) so that generation results can be delivered to your server. This reduces latency, reduces polling traffic, and lets you run long generation jobs without keeping a client waiting.
Webhook callback design (recommended)
1) Fast acknowledge
Your webhook endpoint should respond quickly with a 2xx status to avoid retries. Do not perform heavy processing inside the HTTP request. Instead, push the callback payload into a queue/job system for asynchronous processing.
2) Idempotency
Webhooks can be delivered more than once (network issues, retries). Store a dedupe key (generation ID + event type) and ignore duplicates. Your downstream systems should use upserts and stable IDs to prevent double-writes.
// Pseudo-code: webhook callback handler (conceptual)
raw = readRawBody(req)
auth = req.headers["authorization"]
if auth != "Bearer YOUR_WEBHOOK_CALLBACK_API_KEY":
    return 401
payload = JSON.parse(raw)
// Dedupe by generationId (and/or event id)
if seen(payload.generationId):
    return 200
enqueue("leonardo_generation_completed", payload)
markSeen(payload.generationId)
return 200
When polling still makes sense
Polling is a good fallback and is useful for “reconciliation jobs” (for example, a nightly job that checks for any generations that never received a callback due to downtime). Many mature systems do both: webhooks for real-time, polling for completeness.
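A reconciliation pass like that reduces to: find local records still marked pending past a cutoff, re-check each via the polling path, and update the terminal ones. In this sketch, `fetch_status` again stands in for `GET /generations/{id}`, and the row shape is our own:

```python
from datetime import datetime, timedelta, timezone

def reconcile_pending(pending_rows, fetch_status, now=None,
                      stale_after=timedelta(hours=1)):
    """Re-check generations that never received a webhook callback.

    pending_rows: iterable of dicts with "generation_id" and "created_at"
    (a tz-aware datetime). fetch_status(id) -> {"status": ...} stands in
    for GET /generations/{id}. Returns rows updated to a terminal state.
    """
    now = now or datetime.now(timezone.utc)
    updated = []
    for row in pending_rows:
        if now - row["created_at"] < stale_after:
            continue  # too recent; the webhook may still arrive
        gen = fetch_status(row["generation_id"])
        if gen["status"] in ("COMPLETE", "FAILED"):
            updated.append({**row, "status": gen["status"]})
    return updated
```

The `stale_after` window matters: reconciling too eagerly races with webhooks that are merely slow, and wastes rate-limit budget.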
11) Limits, concurrency, and queue: staying reliable under load
Production media generation has three separate “capacity constraints” that you should design for: rate limits (requests per time window), concurrency (how many jobs run at once), and a queue (what happens when you exceed concurrency and jobs must wait). Leonardo documents these concepts explicitly and provides a dedicated limits reference and guide.
Rate limit
How many API calls you can start in a given time window. When you exceed it, you’ll get rate-limit errors and must slow down. This protects the platform and keeps service predictable.
Concurrency
How many generations you can run simultaneously. If your app starts too many jobs, new ones may queue. Concurrency is a key lever for “how fast can we process a batch?”
Queue
The waiting line for jobs when concurrency is maxed out. Queue behavior impacts latency. Your UI should show “queued” states and avoid user confusion.
Best practices for limits
- Implement backoff on rate limit errors (exponential + jitter).
- Cap concurrency in your own worker pool rather than letting the API queue grow unbounded.
- Batch thoughtfully: spread generation requests across time; don’t spike thousands at once.
- Use webhooks to avoid tight polling loops that waste rate limit budget.
- Show user states: “Generating”, “Queued”, “Finalizing”, “Failed” with helpful actions.
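The first item, exponential backoff with jitter, can be sketched as a delay schedule. The base and cap values below are illustrative defaults, not Leonardo-recommended numbers:

```python
import random

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                   rng=random.random):
    """Yield one delay per retry: exponential growth, capped, with full
    jitter (each delay drawn uniformly from [0, min(cap, base * 2**n)]).
    Jitter prevents many workers from retrying in lockstep.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling

# Usage: sleep through these delays after each rate-limit (429) response.
delays = list(backoff_delays(5, rng=lambda: 1.0))  # deterministic for demo
assert delays == [1.0, 2.0, 4.0, 8.0, 16.0]
```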
Scaling limits (when your product grows)
If your usage grows beyond default limits, you typically have two levers: (1) optimize your architecture (queue + batch + caching + fewer retries), and (2) move to a plan or agreement that supports higher throughput. A strong product often does both: efficient design first, then a higher-capacity plan when justified.
12) Pricing: pay-as-you-go, credits, and cost planning
Leonardo positions API access as pay-as-you-go and encourages developers to start quickly, pay only for usage, and scale when ready. In practical terms, this means you should design your app with cost visibility and cost controls built in: per-user quotas, predictable default settings, and cost-aware UX.
Cost drivers you can control
- Number of images per request (e.g., generate 1–4 vs 8+).
- Resolution / size (bigger often costs more and takes longer).
- Upscale usage (only upscale the chosen winner).
- Retries (avoid duplicate requests; dedupe aggressively).
- Prompt experimentation (support prompt improvement tools, but cap loops).
Cost visibility patterns
- Show an estimated “credit cost” before generating (if your pricing model allows).
- Provide “draft vs final” toggles (draft uses faster settings; final uses refine/upscale).
- Offer user budgets (daily/monthly) and safe defaults.
- Log cost-related metadata per generation for reporting and billing.
How to reduce costs without hurting quality
Use a two-stage workflow: realtime preview (LCM) → refine/upscale only on selection. Cache results for repeated prompts (especially templates). Avoid re-generating the same request by hashing parameters and returning the last successful output when users click “generate” repeatedly. Finally, use prompt improvement features to reduce “trial and error” generations.
13) Official SDKs: TypeScript and Python
Leonardo provides official SDKs for TypeScript and Python. SDKs typically wrap the REST endpoints, help with request typing, and standardize auth and errors. If you’re building a Node.js or Python backend, using an official SDK can speed up integration and reduce mistakes.
TypeScript SDK (Node / web servers)
A TypeScript SDK is useful for Next.js backends, serverless functions, or standard Node services. It can also make it easier to keep request shapes aligned with the API reference as it evolves.
Python SDK (pipelines / batch)
A Python SDK is ideal for batch generation pipelines, data prep, dataset upload automation, and training workflows. It pairs nicely with worker queues and data processing libraries.
SDK usage pattern (recommended)
- Initialize client with API key from your secrets manager.
- Wrap calls with your own retry/backoff policy for transient errors.
- Normalize responses into your own internal schema (generationId, status, asset URLs, metadata).
- Centralize logging and error handling (one place to redact secrets).
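The normalization step is a one-function job. The raw response keys used below (`sdGenerationJob`, `generationId`, `generated_images`) are assumptions to illustrate the pattern; map whatever the SDK or REST response actually contains:

```python
def normalize_generation(raw: dict) -> dict:
    """Map a raw create-generation response into an internal shape.

    The input keys ("sdGenerationJob", "generationId", "generated_images")
    are illustrative assumptions; adjust to the real response schema.
    """
    job = raw.get("sdGenerationJob", {})
    return {
        "generation_id": job.get("generationId"),
        "status": job.get("status", "PENDING"),
        "asset_urls": [img["url"] for img in job.get("generated_images", [])],
        "raw": raw,  # keep the original payload for debugging
    }

normalized = normalize_generation(
    {"sdGenerationJob": {"generationId": "abc", "status": "PENDING"}})
assert normalized["generation_id"] == "abc"
```

Normalizing at the boundary means the rest of your codebase never touches vendor-specific field names, which makes API-version changes a one-file fix.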
When not to use an SDK
If you only need 1–2 endpoints and want minimal dependencies, direct HTTP calls are fine. Just be disciplined: keep a single request wrapper, type responses (even loosely), and implement retries. For many teams, the SDK is mainly a productivity tool rather than a strict requirement.
14) Production architecture: how to ship Leonardo API features that don’t break
The “hard part” of AI media generation isn’t calling an endpoint—it’s delivering a reliable product experience: controlling concurrency, handling queue states, managing costs, and supporting retries and user expectations. Here are architecture patterns that work well for Leonardo-style asynchronous generation APIs.
Reference architecture (battle-tested)
| Component | What it does | Why it matters |
|---|---|---|
| API Gateway (your backend) | Receives user requests, validates inputs, enforces quotas, starts Leonardo generations. | Protects your key, prevents abuse, keeps costs predictable. |
| Job Queue / Worker | Runs generation requests, polls status, downloads results, writes to storage/DB. | Decouples user requests from long-running jobs; improves reliability. |
| Webhook Receiver | Receives callbacks and triggers worker processing without polling. | Lower latency and fewer API calls; robust real-time updates. |
| Object Storage | Stores final images/videos for durable delivery (CDN-ready). | Stable URLs, caching, and retention control for your customers. |
| Database | Stores generations, status, user mappings, costs, and metadata. | Enables history, billing, and support/debugging. |
| Observability | Logs, metrics, alerts, tracing for failures and latency spikes. | Quick debugging and reliable SLAs. |
Idempotency: preventing duplicate generations
Duplicate generations are a major hidden cost driver. They happen when users click “Generate” multiple times or when your frontend retries on network timeouts. Solve this by generating a request hash from your parameters and storing a record: if the same user submits the same request within a time window, return the existing generation instead of creating a new one.
// Example: deterministic request hash concept
// Join fields with a delimiter (or canonical JSON) so "ab"+"c" and "a"+"bc" cannot collide
hash = sha256(join("|", userId, prompt, modelId, width, height, numImages, seed, options))
if existingGenerationByHash(hash) and status not FAILED:
    return existingGeneration
else:
    create new generation and store hash
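A runnable Python version of that hash is below; canonical JSON with sorted keys gives a stable encoding and avoids the collision risk of plain string concatenation:

```python
import hashlib
import json

def request_hash(user_id: str, params: dict) -> str:
    """Deterministic hash of a generation request for idempotency.

    json.dumps with sort_keys=True is a stable, delimiter-safe encoding:
    {"a": 1, "b": 2} and {"b": 2, "a": 1} hash identically, while
    "ab"+"c" vs "a"+"bc" style collisions are impossible.
    """
    canonical = json.dumps({"user": user_id, **params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

h1 = request_hash("u1", {"prompt": "cat", "num_images": 4})
h2 = request_hash("u1", {"num_images": 4, "prompt": "cat"})
assert h1 == h2  # key order doesn't matter
assert h1 != request_hash("u2", {"prompt": "cat", "num_images": 4})
```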
Guardrails that keep apps safe and affordable
- Parameter caps: max images, max resolution, max video duration.
- Per-user quotas: daily credit budget; plan-based limits.
- Content policy: block obviously disallowed prompts; handle unsafe outputs per your product policy.
- Queue-aware UX: show status; do not encourage repeated clicks.
- Backoff policies: on rate limit errors, slow down rather than hammering the API.
Batch generation patterns (e.g., 10,000 images)
For batch workloads, run a worker pool with strict concurrency limits and checkpointing. Store generation IDs as you create them, and process completion via webhooks when possible. If you must poll, poll at an adaptive interval: faster early, slower later, and jitter requests across workers. Download and store outputs to your own storage as they complete, and record failures for retry with capped attempts.
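A thread-pool sketch of that batch pattern, with concurrency capped on our side and failures collected for a retry pass; `run_generation` stands in for create-generation plus waiting for completion:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_batch(jobs, run_generation, max_concurrency: int = 5):
    """Run many generation jobs with a hard concurrency cap.

    jobs: list of parameter dicts. run_generation(job) -> result dict,
    standing in for create + wait. Failures are recorded for a later
    capped-retry pass instead of aborting the whole batch.
    """
    completed, failed = [], []
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(run_generation, job): job for job in jobs}
        for fut in as_completed(futures):
            job = futures[fut]
            try:
                completed.append({"job": job, "result": fut.result()})
            except Exception as exc:
                failed.append({"job": job, "error": str(exc)})
    return completed, failed
```

Capping `max_workers` below your account's concurrency limit keeps the Leonardo-side queue short, which makes job latency predictable and rate-limit errors rare.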
How to build a “creator-friendly” UI on top of Leonardo API
Creator-friendly UX usually includes: prompt templates, “improve prompt” button, preset styles, aspect ratio controls, a “draft mode” (LCM) toggle, a refine/upscale pipeline, and an edit step with inpainting (init image + mask). The best UIs also make it easy to compare variants side-by-side and to keep a history of parameters for reproducibility.
15) FAQ: Leonardo API
What is the base URL for Leonardo Production API?
Leonardo’s API reference commonly uses the base path https://cloud.leonardo.ai/api/rest/v1. Endpoints under this include generations, uploads, models, prompt utilities, canvas endpoints, and more.
How do I authenticate?
Use a Bearer token header: Authorization: Bearer YOUR_API_KEY. Create your API key in the Leonardo web app under API Access, and store it securely on your backend.
How do I generate images with a custom model?
Once your custom model is trained, you can generate images by specifying the custom model ID in your generation request. You can also list platform models for default options.
How do uploads work (init images, dataset images)?
Upload endpoints typically return presigned S3 upload details. You call Leonardo with your API key, receive a presigned URL, upload the file to S3 using that URL, then use the returned image ID (like init_image_id) in subsequent requests (generation, edits, motion, training).
Should I poll or use webhook callbacks?
Use webhook callbacks for real-time results and lower API load, but keep polling as a fallback. Many production systems do both: callbacks for speed, polling for reconciliation and error recovery.
What are rate limits and concurrency limits?
Leonardo documents “Concurrency, Rate Limits, Queue” as separate concepts. Rate limits control request throughput; concurrency controls how many generations run at once; and queue behavior describes what happens when you exceed concurrency. Your app should implement backoff and show queue-aware UX states.
Is there an official SDK?
Yes—Leonardo supports official SDKs for Python and TypeScript. SDKs help standardize auth, requests, and response typing, but you can also call the REST endpoints directly if you prefer.
References (official Leonardo docs)
For the most accurate and current parameter lists, request/response schemas, and feature availability, always confirm directly in the official docs below.
| Topic | Official link | Why it matters |
|---|---|---|
| Developer API overview | https://leonardo.ai/api/ | High-level API positioning, production notes, entry points |
| API reference (limits) | https://docs.leonardo.ai/reference/limits | Concurrency, rate limits, queue behavior |
| Quick start | https://docs.leonardo.ai/docs/getting-started | Get API key, first calls, recommended setup |
| Create image generation | https://docs.leonardo.ai/reference/creategeneration | Start image generations |
| Get generation by ID | https://docs.leonardo.ai/reference/getgenerationbyid | Poll and retrieve a generation |
| Upload init image | https://docs.leonardo.ai/reference/uploadinitimage | Presigned uploads for image-to-image & edits |
| Webhook callback guide | https://docs.leonardo.ai/docs/guide-to-the-webhook-callback-feature | Receive async results; bearer auth for callback |
| Pricing FAQ | https://docs.leonardo.ai/docs/pricing-and-plans-faq | Pay-as-you-go model explanation |
| Official SDKs | https://docs.leonardo.ai/docs/leonardoai-official-sdks | TypeScript + Python SDK resources |
| Realtime canvas recipe | https://docs.leonardo.ai/docs/generate-images-with-realtime-canvas | LCM generation and fast workflows |
| Text-to-video endpoint | https://docs.leonardo.ai/reference/createtexttovideogeneration | Start text-to-video jobs |