Text-to-video • Image reference • Synced audio • Production endpoints

Sora API - the practical developer guide to OpenAI video generation

The Sora API is OpenAI’s video generation API surface: it lets developers create short video clips from natural language prompts, optionally guided by an image reference, and retrieve the finished MP4 outputs for use in apps, marketing pipelines, creative tools, and internal content workflows.

In OpenAI’s API documentation, Sora video generation is exposed through the Videos API: you can create a render job, check status, download the final MP4, list past videos, and delete a stored video ID. This guide explains the endpoints, parameters, and production architecture patterns that keep your integration reliable.

Video models: sora-2, sora-2-pro
Workflow: Create → Status → Download
Quick mental model
Think of Sora as an asynchronous render system. You submit a prompt to start a job, then either poll for completion (status checks) or build a worker that periodically reconciles job states. When the job completes, you download the MP4 and, optionally, store it in your own object storage for durable hosting.

1) What the Sora API is (and what it isn’t)

In OpenAI’s ecosystem, “Sora” can refer to the product experience (the Sora app/site) and also to the underlying video generation models. When developers say “Sora API,” they usually mean OpenAI’s Video API endpoints that accept a prompt (and optional reference image) and produce a rendered video clip as output.

Where Sora fits in the OpenAI API

The OpenAI platform offers multiple “media generation” capabilities. Text models generate text and structured outputs. Image models generate still images. Sora models generate short videos and (for certain models) synced audio. The documentation for Sora models and video endpoints lives on OpenAI’s API docs and model pages, including: Videos API reference and the model pages for sora-2 and sora-2-pro.

What it is

A production API for video generation that supports: creating a render job from text, optionally guided by a reference image, retrieving job status, downloading the MP4, listing videos for dashboards/history, and deleting stored video IDs for housekeeping.

Create · Status · Download · List · Delete

What it isn’t

Not a “single synchronous endpoint” that instantly returns an MP4. Video rendering is compute-heavy and usually async. Also, it is not a replacement for your own hosting/CDN if you need long-term storage and stable links; you’ll typically download and store finished assets in your own storage.

Not instant MP4 · Not permanent CDN · Not unlimited usage
Developer reality check
Shipping “AI video generation” is as much about product engineering as it is about models: queue states, retries, rate limits, cost guardrails, and a user experience that makes rendering delays feel normal and safe. The model is only one part of a reliable video feature.

2) Capabilities: what you can do with Sora via API

According to the OpenAI API guide for video generation, the Videos API supports a set of endpoints designed for a render-job workflow: create a video, check status, download the MP4, list your videos, and delete a video ID from OpenAI’s storage. See: Video generation guide.

Core capabilities (from OpenAI docs)

Text-to-video

Generate short clips from natural language prompts. Prompts can describe scenes, actions, camera motion, pacing, and style. You can treat the prompt like a mini storyboard.

Reference image guidance

Provide an image reference to guide the generation. This can help maintain subject identity, composition, or a particular look. Use it for “animate this image” or “match this style/character.”

Synced audio (model-dependent)

The Sora 2 model pages describe “videos with synced audio,” meaning outputs may include audio tracks. Treat audio as content too: apply policy checks and consider transcription for moderation workflows.

Typical real-world use cases

  • Marketing & ads: generate short product promos, social clips, background loops, brand animations.
  • Creative tools: “prompt-to-video” features in a design app, video storyboard generators, concept previews.
  • E-commerce: animate product imagery into short clips for listings (with strong guardrails and branding prompts).
  • Internal content: quick pre-visualization for a pitch, a short clip to explain an idea, rapid prototyping.
  • Education: short illustrative clips for lessons (with careful policy compliance).
How long are Sora clips via API?

In the Videos API reference, the seconds parameter is limited to specific values (e.g., 4, 8, 12 seconds) depending on the endpoint and model configuration. Always confirm current allowed durations in the Videos API reference.

3) Quickstart: API key → create video → download MP4

Step A: Create and secure your OpenAI API key

OpenAI API authentication uses API keys with standard Bearer authentication: an Authorization: Bearer <OPENAI_API_KEY> header. Never embed API keys in client-side code. Use a backend proxy and store keys in a secrets manager. See the general API reference intro: API reference introduction.

Step B: Create a video render job

The Videos API includes a create endpoint that accepts: a required text prompt, optional input_reference (image file), an optional model (allowed values include sora-2 and sora-2-pro), and optional parameters like seconds and size (see: Videos API reference).

curl https://api.openai.com/v1/videos \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sora-2",
    "prompt": "A minimalist workspace in soft daylight. Slow camera pan across a white desk. A cup of coffee steams gently. Clean, modern, calm mood.",
    "seconds": "4",
    "size": "1280x720"
  }'
What you get back
Video generation is typically asynchronous. The create request returns an object representing the render job (a video ID and metadata), and you then call a status endpoint until the job completes.
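As a sketch, the curl call above can be wrapped in a small backend helper. This Python version (standard library only) validates parameters before spending money and shows the request shape; the allowed-value sets mirror the docs cited in this guide, but confirm them in the live Videos API reference. The network call is left commented out.

```python
import json
import os
import urllib.request

ALLOWED_SECONDS = {"4", "8", "12"}          # allowed durations per the Videos API reference
ALLOWED_MODELS = {"sora-2", "sora-2-pro"}   # allowed model values

def build_create_request(prompt: str, model: str = "sora-2",
                         seconds: str = "4", size: str = "1280x720") -> dict:
    """Validate parameters and build the POST /v1/videos request body."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    if seconds not in ALLOWED_SECONDS:
        raise ValueError(f"unsupported duration: {seconds}")
    return {"model": model, "prompt": prompt, "seconds": seconds, "size": size}

def create_video(payload: dict) -> dict:
    """Submit the render job; returns the async job object (id, status, ...)."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/videos",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_create_request(
    "A minimalist workspace in soft daylight. Slow camera pan across a white desk."
)
# job = create_video(payload)  # returns a job object with an id to poll
```

Keeping validation in `build_create_request` means a bad duration or model name fails before the request leaves your backend.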

Step C: Poll status and download

OpenAI’s video guide describes “Get video status” and “Download video” endpoints as part of the workflow, plus list/delete endpoints for history and housekeeping. See: Video generation guide.

// Conceptual polling loop (pseudo-code)
video = POST /v1/videos {prompt,...} -> returns {id}

repeat with backoff:
  status = GET /v1/videos/{id}
  if status.state in ("completed","failed"): break

if completed:
  mp4 = GET /v1/videos/{id}/content  // download
  save to your storage (S3/R2/GCS) and return CDN URL to user
Do I need to store videos myself?

For many production apps, yes. Use OpenAI storage as a retrieval step, then copy finished MP4s to your own object storage. That gives you stable URLs, CDN caching, retention controls, and consistent billing/analytics across providers.

4) Endpoint map: Create, Status, Download, List, Delete

The OpenAI video generation guide describes five endpoints with distinct roles: create a video render job, check status, download the finished MP4, list videos, and delete video IDs. Reference: Video generation with Sora.

Action | Endpoint (conceptual) | Purpose | Production notes
Create video | POST /v1/videos | Starts a new render job from a prompt (optional reference image). | Wrap in a backend; add idempotency; log the video ID.
Get status | GET /v1/videos/{id} | Retrieves job state and metadata. | Poll with backoff; show “queued / generating / finalizing” UI states.
Download MP4 | GET /v1/videos/{id}/content | Fetches the finished MP4 bytes (or a download response). | Stream straight to storage; avoid holding large files in memory.
List videos | GET /v1/videos | Enumerates videos (with pagination). | Use for dashboards, history views, and clean-up scripts.
Delete video | DELETE /v1/videos/{id} | Removes a video ID from OpenAI storage. | Use for privacy and retention policies; keep your own audit trail.
Why OpenAI exposes list/delete for video
Video generation creates storage objects. Listing gives you visibility and helps you build “history.” Deletion supports housekeeping and privacy controls—important for compliance and for minimizing stored content you don’t need.

5) Request schema: model, seconds, size, and reference image

The Videos API reference documents request body fields including: prompt (required), input_reference (optional image file), model (optional; allowed values include sora-2, sora-2-pro), seconds (optional; allowed values include 4, 8, 12), and size (optional). See: Videos API reference.

Choosing a model: sora-2 vs sora-2-pro

OpenAI lists both models in its Models documentation: Models. Generally, you can treat them as two tiers: a flagship model and a more advanced “pro” model. In product terms, you might map them to “Standard” and “Pro” generation modes.

sora-2

A flagship video generation model with synced audio described on its model page. Good default for most video features where you need quality and predictable cost.

Video · Synced audio · Text + image inputs

sora-2-pro

The most advanced Sora tier described on its model page. Use it when you need maximum fidelity, better adherence, or stronger motion/scene complexity—then charge accordingly.

Highest quality tier · Synced audio · Premium plan mapping

Duration: seconds (4 / 8 / 12)

The allowed values for seconds (e.g., 4, 8, 12) make it easier to predict render cost and performance. Your product UX should expose these as simple options like “4s”, “8s”, “12s” rather than a freeform input. That reduces user confusion and keeps spending predictable.

Size: portrait vs landscape

The Sora model pages list portrait and landscape resolutions (for example, 720×1280 and 1280×720 on the Sora 2 page), and the pro model page mentions higher portrait/landscape options as well. Always confirm currently supported sizes and how they map to pricing in the relevant model docs. See: sora-2 and sora-2-pro.

Reference image guidance: when to use input_reference

Use an image reference when you need consistency: a brand character, a product shape, a logo-like subject, or a fixed composition. In many apps, “reference image” is the difference between a “fun demo” and a usable production feature. It also helps reduce prompt complexity because the model can see what you mean rather than relying only on text.

Validation: what to check before calling the API

Validate the basics on your backend before spending money: prompt length and disallowed content checks, allowed duration, allowed sizes, file type/size for reference images, and user quota (credits). Reject invalid inputs early with friendly error messages so users don’t burn requests.
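The checklist above can be sketched as a single backend validation function. Note the assumptions: only the 4/8/12 durations come from the docs cited here; the size list, prompt length, image limits, and credit model are illustrative product-level choices, not API-documented values.

```python
ALLOWED_SECONDS = {"4", "8", "12"}
ALLOWED_SIZES = {"1280x720", "720x1280"}            # confirm current sizes in the model docs
ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png"}   # assumed accepted reference types
MAX_PROMPT_CHARS = 2000                             # assumed product-level limit
MAX_IMAGE_BYTES = 10 * 1024 * 1024                  # assumed product-level limit

def validate_render_request(prompt, seconds, size, image_type=None,
                            image_bytes=0, credits_left=1):
    """Return a list of user-facing errors; an empty list means OK to submit."""
    errors = []
    if not prompt or not prompt.strip():
        errors.append("Prompt is required.")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"Prompt must be under {MAX_PROMPT_CHARS} characters.")
    if seconds not in ALLOWED_SECONDS:
        errors.append("Duration must be 4, 8, or 12 seconds.")
    if size not in ALLOWED_SIZES:
        errors.append("Unsupported resolution.")
    if image_type is not None and image_type not in ALLOWED_IMAGE_TYPES:
        errors.append("Reference image must be JPEG or PNG.")
    if image_bytes > MAX_IMAGE_BYTES:
        errors.append("Reference image is too large.")
    if credits_left <= 0:
        errors.append("You're out of render credits.")
    return errors
```

Returning a list (rather than raising on the first problem) lets the UI show every issue at once, which saves users a round trip per mistake.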

6) Prompting for video: how to get consistent results

Video prompting is different from image prompting because you’re asking for change over time: motion, pacing, and continuity. The best Sora prompts read like short directions to a camera crew: what’s in the scene, what moves, how the camera moves, what the lighting is, and what mood/style you want. You’ll get better results (and fewer costly retries) when you structure prompts intentionally.

A practical prompt template

Scene: Describe the setting and subjects.
Action: What happens across the duration?
Camera: Static / pan / dolly / close-up / wide shot / handheld feel.
Lighting: Soft daylight, studio, neon, golden hour, etc.
Style: Cinematic, documentary, animation, minimal, stylized.
Constraints: No text overlays, no logos, no fast cuts (if needed).
Duration & format: 4s/8s/12s; portrait or landscape.
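The template above can be turned into a small prompt builder, so UI presets map onto structured fragments instead of freeform text. This is a minimal sketch; the field names simply mirror the template.

```python
def build_prompt(scene, action, camera, lighting, style, constraints=()):
    """Assemble a structured video prompt from the template fields:
    scene, action, camera, lighting, style, plus optional constraints."""
    parts = [scene, action, camera, lighting, style]
    parts.extend(constraints)
    # Normalize each fragment into a short sentence ending with a period.
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p and p.strip())

prompt = build_prompt(
    scene="A modern smart watch on a white desk",
    action="Subtle reflections shift on the watch face",
    camera="Slow camera dolly-in",
    lighting="Soft natural daylight from a window",
    style="Calm, premium, minimalist",
    constraints=("No text", "no logos", "no fast cuts"),
)
```

Because every preset contributes one fragment, changing a single dropdown changes exactly one variable in the prompt, which matches the one-variable iteration strategy described below.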

Example prompts that tend to work well

Product promo (landscape, gentle motion)

“A modern smart watch on a white desk. Soft natural daylight from a window. Slow camera dolly-in. Subtle reflections on the watch face. Calm, premium, minimalist. No text, no logos, no fast cuts.”

Travel vibe (portrait, social clip)

“Portrait video of a coastal road at sunset. Smooth forward motion like a stabilized handheld walk. Warm golden light, gentle lens flare. Cinematic color grading. Peaceful, dreamy mood.”

Common prompt mistakes (and fixes)

  • Too many ideas: If the prompt contains multiple scenes, the model may blend them. Fix: pick one scene per clip.
  • Unclear motion: If you don’t say what moves, results may be static. Fix: add one clear motion or camera move.
  • Overconstrained: Too many constraints can conflict. Fix: keep constraints focused on what matters.
  • Text overlays: Asking for readable text can be unreliable and may introduce policy concerns. Fix: add text in post.
  • Style drift: If you need identity consistency, use a reference image and keep prompt variations small.
Prompt iteration strategy that saves money

Don’t brute force dozens of prompts. Instead: start with 4 seconds and a single clean prompt. When you like the direction, keep the prompt mostly fixed and only change one variable at a time (lighting, camera motion, background). Move to 8s/12s only after you’ve found a stable formula. This “one-variable” approach dramatically reduces wasted renders.

Building a prompt UI: presets beat freeform

End users generate better clips when your UI provides presets: camera moves, moods, and style packs. Let users choose “Slow pan,” “Dolly in,” “Wide shot,” and “Cinematic,” and map those to prompt fragments. This increases consistency and reduces support load.

7) Audio & speech: synced audio and safety considerations

OpenAI’s Sora 2 model pages describe “videos with synced audio.” That’s powerful—but it raises additional product and policy work: audio can contain speech, sound effects, or music-like content, and it can also create new risks (impersonation, copyrighted music mimicry, etc.). OpenAI has published safety and responsible-launch notes for Sora that include audio safeguards and restrictions. See: Launching Sora responsibly.

Practical guidance for audio in production apps

Moderation and review

Treat audio like text: you may need a moderation layer. If your app allows public sharing, consider additional checks: transcription of generated speech (when present) and scanning for policy-violating content. The “Launching Sora responsibly” post notes transcript scanning for generated speech as part of safeguards.

Music and IP risks

If you enable audio outputs, avoid workflows that encourage imitation of specific artists or copyrighted works. OpenAI’s Sora safety notes mention blocks for attempts to generate music that imitates living artists or existing works. Provide safe UX copy (“no celebrity imitation,” “no copyrighted song imitation”) and enforce it.

UX tip: separate “silent clip” and “audio clip” modes

Many products benefit from two modes: a default “silent video” option for simple social clips and product loops, and an “include audio” option for advanced use cases. This gives you a safer baseline and limits risk exposure for new users.

Should you add your own watermarking?

If your app outputs AI-generated videos, many teams add visible watermarks or metadata-based provenance indicators, depending on their product requirements and distribution channels. This is more of a product decision than an API requirement, but it can reduce misuse and support user trust.

8) Reliability: polling, retries, idempotency, and failure handling

Video rendering is asynchronous. That means the most important engineering choices you make are around job orchestration: how you track job state, how you retry safely, and how you prevent accidental duplicate renders. If you get these wrong, your costs spike and your UX becomes unpredictable.

Polling done right (with backoff)

Polling is the simplest approach: after creating a video job, your backend checks status until it’s completed. The key is to use backoff (e.g., 2s → 3s → 5s → 8s…) and stop polling after a maximum time. Avoid tight loops that hammer the API and burn rate limit budget.

// Pseudo-code: robust polling schedule (conceptual)
attempt = 0
delay = 2s
maxWait = 3 minutes
start = now()

while now() - start < maxWait:
  status = GET /v1/videos/{id}
  if status.state == "completed": return status
  if status.state == "failed": throw error
  sleep(delay + randomJitter(0..500ms))
  attempt += 1
  delay = min(delay * 1.4, 12s)

throw timeout
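The schedule above can be made concrete in Python. This sketch takes the status fetcher as a callable so it can be unit-tested without network access; the `status` field name and its terminal values ("completed", "failed") are assumptions here, so check the live API reference for the exact job states.

```python
import random
import time

def poll_until_done(get_status, max_wait=180.0, first_delay=2.0,
                    max_delay=12.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() with capped exponential backoff plus jitter.

    get_status: callable returning the job dict (e.g. a GET /v1/videos/{id} wrapper).
    Raises RuntimeError on failure and TimeoutError after max_wait seconds.
    """
    start = clock()
    delay = first_delay
    while clock() - start < max_wait:
        status = get_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(f"render failed: {status.get('error')}")
        sleep(delay + random.uniform(0, 0.5))   # jitter avoids synchronized polling
        delay = min(delay * 1.4, max_delay)      # 2s -> 2.8s -> ... capped at 12s
    raise TimeoutError("render did not finish within max_wait")
```

Injecting `sleep` and `clock` keeps the function deterministic in tests and lets a worker framework substitute its own scheduling.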

Idempotency: the #1 cost-saver

Duplicate jobs happen when users click “Generate” multiple times, or when the frontend retries after a network timeout. Solve this by creating a deterministic request hash: hash = sha256(user + model + prompt + seconds + size + referenceImageId + options). Store the hash in your DB. If the same hash is requested again within a short window, return the existing job ID.

// Conceptual idempotency guard
hash = sha256(params)
job = db.findByHash(hash)

if job and job.state in ("queued","running","completed"):
  return job

job = createNewVideo(params)
db.save({hash, jobId: job.id, ...})
return job
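Here is a runnable sketch of that hash-and-reuse guard. It uses an in-memory dict where a real app would use a database table with a unique index on the hash; the canonical-JSON step matters because key order must not change the fingerprint.

```python
import hashlib
import json

def render_request_hash(user_id, model, prompt, seconds, size,
                        reference_image_id=None, **options):
    """Deterministic fingerprint of a render request for dedupe lookups."""
    canonical = json.dumps(
        {"user": user_id, "model": model, "prompt": prompt,
         "seconds": seconds, "size": size,
         "ref": reference_image_id, "options": options},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

_jobs = {}  # hash -> job record; a real app uses a DB with a unique index

def get_or_create_job(params, create):
    """Return the existing job for identical params, or start a new one."""
    h = render_request_hash(**params)
    job = _jobs.get(h)
    if job and job["state"] in ("queued", "running", "completed"):
        return job
    job = {"hash": h, **create(params)}
    _jobs[h] = job
    return job
```

With this in place, a double-clicked “Generate” button or a retried network request resolves to the same job ID instead of a second billable render.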

Retries: what’s safe to retry?

  • Safe to retry: status checks, downloads (with resume/streaming), list requests.
  • Careful: create requests. Only retry create if you have idempotency in place or if you can prove it didn’t start.
  • Always log: when retries happen, track count and outcome to detect provider issues.
How to present failures to users (without panic)

Video rendering can fail occasionally due to transient infrastructure issues or content constraints. In UI, explain calmly: “This render didn’t complete. Please try again or adjust the prompt.” Provide a one-click retry that reuses the same parameters (and uses idempotency to avoid duplicates). For power users, show an error code and a link to your help page.

9) Storage & delivery: your CDN strategy matters

In production, the typical pattern is: download the finished MP4 from OpenAI, store it in your own object storage, and serve it through a CDN. This approach gives you: stable URLs, predictable caching, custom retention, user-level permissions, analytics, and control over when content is deleted.

Recommended storage flow

  1. Create video job; store job ID in DB.
  2. Poll status until completed.
  3. Stream download to object storage (don’t buffer large files in memory).
  4. Generate a signed CDN URL or public URL (depending on your product).
  5. Return the link to the user; update history dashboard.
  6. Optionally delete video ID from OpenAI storage once stored (housekeeping/privacy).
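Step 3’s “stream, don’t buffer” rule can be sketched as a helper that accepts any iterable of byte chunks (for example, a streamed HTTP response body), so the same code works against a fake source in tests. The size guard is an assumed product-level safety check, not an API feature.

```python
def stream_to_file(chunks, path, limit_bytes=None):
    """Write an iterable of byte chunks to disk without buffering the whole file.

    chunks: any iterable of bytes (e.g. a streamed download response).
    limit_bytes: optional cap to abort runaway downloads early.
    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
            if limit_bytes is not None and written > limit_bytes:
                raise ValueError("download exceeded expected size")
    return written
```

The same pattern applies when the destination is object storage instead of local disk: pass chunks through to a multipart upload rather than accumulating them in memory.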

Retention & privacy

Many apps implement “delete by user” and “auto-expire” rules. For example: free-tier videos expire after 7 days; paid-tier stored for 90 days; enterprise stored based on contract. If you expose deletion in UI, ensure you also remove cached CDN links and any derived assets (thumbnails).
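The example tiers above translate into a small retention helper. The tier names and day counts here are illustrative product choices from the text, not API values.

```python
from datetime import datetime, timedelta, timezone

# Example retention policy: free expires in 7 days, paid in 90;
# enterprise has no auto-expiry (contract-based retention).
RETENTION_DAYS = {"free": 7, "paid": 90}

def expires_at(created_at: datetime, tier: str):
    """Return the auto-expiry timestamp for a stored video, or None."""
    days = RETENTION_DAYS.get(tier)
    if days is None:
        return None
    return created_at + timedelta(days=days)
```

A nightly job can then query for videos past their `expires_at` and delete the MP4, cached CDN links, and derived thumbnails together, matching the clean-up rule above.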

Thumbnails, previews, and playback UX

Video generation features feel better when users see something quickly: a placeholder “rendering” card, then a thumbnail and play button after completion. Consider generating a thumbnail server-side from the MP4 and storing it alongside the video. This makes your feed/history pages fast and mobile-friendly.

When should you call delete on OpenAI video IDs?

If you copy assets to your own storage and don’t need OpenAI to retain them, delete to reduce stored content exposure and simplify privacy. But keep a short buffer period for support/debugging if you need to re-download. Also make sure your deletion is idempotent: attempting to delete an already-deleted ID should not break your workflow.
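That idempotent-delete rule is a one-function sketch. Here `delete` is whatever wrapper you have around DELETE /v1/videos/{id}, and KeyError is a stand-in for the API’s not-found response; map the real error type your client library raises.

```python
def delete_video_idempotent(video_id, delete):
    """Delete a stored video, treating 'already gone' as success.

    delete: callable wrapping DELETE /v1/videos/{id}.
    Returns "deleted" or "already_gone"; never fails on a double-delete.
    """
    try:
        delete(video_id)
        return "deleted"
    except KeyError:  # stand-in for the API's not-found error
        return "already_gone"
```

This lets retention cron jobs and user-triggered deletes run concurrently without either path breaking when the other got there first.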

10) Pricing & cost planning: build guardrails from day one

Video generation can become expensive fast if your UX encourages “spam generate.” OpenAI’s Sora 2 model page frames pricing per second of generated video, with an example of $0.10 per second at certain resolutions. Always confirm current prices and tiers in the official docs: sora-2 model page and your pricing dashboard.

Cost math that keeps you sane
Treat cost as: model tier × duration × resolution × retries. Your job is to reduce retries and only generate “final” outputs when users truly want them. A two-step flow (4s preview → 8/12s final) is a strong baseline.
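That formula is easy to encode. In this sketch, the sora-2 rate matches the $0.10/s example cited above, but the sora-2-pro rate is a placeholder assumption; substitute current published pricing before relying on the numbers.

```python
# Illustrative per-second rates; confirm current pricing in the official docs.
RATE_PER_SECOND = {"sora-2": 0.10, "sora-2-pro": 0.30}

def estimated_cost(model: str, seconds: int, renders_per_clip: float = 1.3) -> float:
    """Expected spend per finished clip.

    renders_per_clip captures retry/iteration overhead: 1.3 means users need
    ~1.3 renders on average to get a clip they keep.
    """
    return RATE_PER_SECOND[model] * seconds * renders_per_clip
```

Tracking the real `renders_per_clip` from your logs and feeding it back into this estimate is exactly the “renders per successful clip” KPI recommended later in this guide.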

Cost controls to implement

Quotas

Per-user daily/monthly budgets (credits). Require upgrades for higher volumes. This protects you from unexpected bills and prevents abuse.

Plan-based feature gating

Make sora-2-pro a premium feature. Make 12s videos premium. Keep free tier on 4s standard model with lower resolution.

Idempotency + dedupe

Hash requests and reuse jobs when users repeat clicks. This single technique prevents accidental double renders, which is one of the biggest hidden costs.

UX patterns that reduce spending without feeling restrictive

  • Preview-first: default to 4s, let users “Extend to 8s/12s” after preview.
  • One-click rerun: allow rerun but warn about cost and keep prompt history.
  • Preset packs: curated styles reduce trial-and-error prompting.
  • Show progress: clear “queued / rendering / finalizing” reduces repeated clicks.
  • Explain limits: “Free plan: 10 renders/day” is better than silent throttling.
How to estimate a monthly budget

Start from your expected number of completed clips per user. Multiply by average duration and your model mix. Then add 20–40% overhead for retries and experimentation (if you’re early). As you improve prompt presets and UX, that overhead can drop significantly. Track “renders per successful clip” as a KPI; you want that number low.

11) Safety, policy, and responsible use

Any app that generates media must follow OpenAI’s policies and enforce appropriate safeguards. OpenAI’s Usage policies apply broadly. OpenAI also published guidance specifically about creating Sora content responsibly: Creating Sora videos in line with our policies.

What “responsible Sora integration” looks like

Policy-aware prompt handling

Validate prompts before submitting. Block clearly disallowed requests. Provide user-friendly rewrites (“Try describing a fictional character instead of a real person.”) and log policy rejections for abuse detection.

Human review where needed

If your app distributes content publicly (feeds, social posting, marketplace assets), add human review for edge cases or for high-risk categories. Combine automated checks with clear reporting and takedown processes.

High-level guardrails to consider

  • Age-appropriate UX: ensure your app doesn’t encourage unsafe or explicit content generation.
  • Impersonation protections: discourage deepfake-like prompts and enforce restrictions where required.
  • IP awareness: avoid instructing users to imitate living artists or copyrighted works (especially with audio).
  • Report & takedown: give users a way to report harmful content; act quickly.
  • Transparency: label AI-generated outputs where appropriate.
Policy updates happen—how do you stay current?

Add a small “Policy & Safety” section in your internal runbook with links to the live OpenAI policy pages. When you onboard new team members or ship a new feature (public sharing, audio, templates), review that section. Also add monitoring for spikes in policy rejections or user reports—those spikes often signal prompt abuse.

12) Production architecture blueprint (that scales)

Here’s a simple architecture that works well for Sora-style asynchronous video generation: a backend API that validates requests, a job queue that performs create/poll/download, storage for MP4s, and a UI that subscribes to job status.

Component | Responsibility | Why it matters
Your backend API | Receives prompt requests, enforces quotas, starts jobs, returns job IDs. | Protects API keys and costs; applies guardrails.
Queue + workers | Executes create calls, polls status, downloads MP4s, stores them in object storage. | Prevents request timeouts; controls concurrency.
Database | Stores job state, params hash, user ownership, costs, and asset URLs. | Enables idempotency and history dashboards.
Object storage + CDN | Durably stores MP4 outputs and serves them fast worldwide. | Better UX and stable links; reduces vendor lock-in.
Observability | Logs, metrics, alerts, trace IDs; tracks error rates and latency. | Faster debugging and higher reliability.

Concurrency control: your worker pool is the throttle

Even if the provider allows high throughput, your own system should throttle. Use a worker pool of N workers per region, where N is tuned to your cost budget and expected demand. If you hit rate limits, back off and reduce worker concurrency.
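A minimal version of that throttle is a fixed-size thread pool: the pool size is the concurrency cap, regardless of how many jobs are queued. This is a sketch; a production system would add per-region pools and dynamic resizing on rate-limit responses.

```python
from concurrent.futures import ThreadPoolExecutor

def run_render_jobs(jobs, handler, max_workers=4):
    """Process render jobs with at most max_workers running concurrently.

    handler: callable that runs one job end-to-end (create, poll, download).
    Results are returned in the same order as the input jobs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handler, jobs))
```

Because `max_workers` is the only knob, lowering it is a one-line response to rate-limit pressure or a cost spike.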

Observability KPIs you should track

  • Render success rate: % completed vs failed.
  • Time to first playable: create → completed → available on CDN.
  • Renders per successful clip: how many attempts users need (lower is better).
  • Policy rejection rate: spikes indicate abuse or confusing UX.
  • Cost per active user: your COGS baseline.
Enterprise note: privacy controls and retention

If you sell to enterprise customers, retention, deletion, and access controls become core requirements. Document where prompts and videos are stored, who can access them, how long you retain them, and how you honor deletion requests. Keep this aligned with OpenAI’s own privacy and policy pages and your contracts.

13) FAQ: Sora API

Which models can I use for video generation?

The Videos API reference lists allowed values for the model field including sora-2 and sora-2-pro. See: Videos API reference and Models.

How do I control duration?

Use the seconds field. The reference shows allowed values like 4, 8, and 12 seconds. Provide these as preset options in your UI to prevent invalid requests.

Can I guide a video with an image?

Yes. The Videos API reference includes an input_reference field for an optional image reference that guides generation. This is often essential for consistency.

Is Sora generation synchronous?

Usually no. The official guide presents a multi-step workflow: create job, check status, then download once complete. That pattern is common for compute-heavy video generation.

Where do I find the most accurate parameter list?

Use the live documentation: Videos API reference and the video guide: Video generation guide. Model pages (sora-2 and sora-2-pro) also document capabilities and output formats.

What policies do I need to follow?

Follow OpenAI’s Usage Policies: Usage policies, plus Sora-specific guidance: Sora policy guidance. If your app enables sharing, implement reporting and takedown workflows.

References (official OpenAI docs)

Use the following official pages as the source of truth for exact schemas, allowed values, and any changes over time.

Topic | Official link | Use it for
Videos API reference | https://platform.openai.com/docs/api-reference/videos | Request fields, allowed values, endpoints
Video generation guide | https://platform.openai.com/docs/guides/video-generation | Workflow: create → status → download + list/delete
sora-2 model page | https://platform.openai.com/docs/models/sora-2 | Capabilities, formats, example pricing cues
sora-2-pro model page | https://platform.openai.com/docs/models/sora-2-pro | Pro tier details, formats, size options
All models | https://platform.openai.com/docs/models | Model availability overview
Usage policies | https://openai.com/policies/usage-policies/ | Safety rules and compliance baseline
Sora policy guidance | https://openai.com/policies/creating-sora-videos-in-line-with-our-policies/ | Practical tips for compliant Sora content
Launching Sora responsibly | https://openai.com/index/launching-sora-responsibly/ | Safety posture and risk mitigations