Enterprise LLM platform • Chat • Tools • Graph-based RAG

Writer API: build agents with chat completions, Knowledge Graph (graph-based RAG), files, and tools

The Writer API is the programmatic interface for Writer’s AI Studio platform. You can use it to generate and structure text with Writer’s Palmyra models, run tool-using workflows (like web search or translation inside a chat), upload and manage files, and connect your organization’s content via Knowledge Graph for retrieval-augmented generation (RAG).

This page is written to be practical: you’ll learn how auth works, which endpoints matter most, how pricing is typically calculated, how to design reliable production calls (timeouts, retries, idempotency, streaming), and how to reduce cost without losing quality.

At a glance:

  • Auth: Bearer API key
  • Core primitive: Chat completion
  • RAG layer: Knowledge Graph
  • Limits (typical): RPM + TPM

Quick expectation-setting:

Writer API keys are created inside Writer AI Studio and are attached to “API agents.” Permissions are managed at the agent level, and “capabilities” map to specific endpoints. Plan your key strategy like you would plan service accounts: least privilege, separate keys per environment, and rotate quickly if leaked.

1) What the Writer API is (and when it’s the right choice)

Writer AI Studio is a full-stack platform for building and managing enterprise AI agents. The Writer API is how developers integrate those capabilities into external applications, backend services, internal tools, and product features.

What you can build

The Writer API supports a wide range of “LLM app” workloads, from simple prompt-to-text generation to multi-step, tool-using agents that depend on your company’s knowledge. Common builds include:

  • Chat assistants grounded in company documents via Knowledge Graph (RAG).
  • Tool-using chat features such as web search and translation inside a conversation.
  • Structured generation: extraction, classification, and JSON outputs for downstream systems.
  • Content workflows: summaries, drafts, and copy variants built from uploaded files.

How Writer’s approach tends to differ

Writer positions Knowledge Graph as graph-based retrieval-augmented generation (RAG), where your content is ingested into a graph representation that can be queried during a chat. In many organizations, this complements or replaces traditional vector retrieval. If your app’s success depends on “use my organization’s data safely and accurately,” this RAG layer becomes central.

Use Writer API when…

  • You want enterprise controls (permissions, scoped keys, observability, governance).
  • You need RAG that’s integrated into the platform (Knowledge Graph).
  • Your app needs tool calling (web search, translation, etc.) within chat.
  • You care about cost predictability with clear per-token pricing for specific models.

Use a simpler approach when…

  • You just need a basic one-shot completion (no tools, no RAG, no governance).
  • You’re prototyping and want minimal setup.
  • You can handle retrieval and observability yourself (custom vector DB + tracing).

Many teams start simple, then adopt platform features like Knowledge Graph or agent-level permissions once they reach production scale.

2) Authentication, API agents, and key management

Writer API uses token authentication via an API key passed in the Authorization header as a Bearer token. Keys are created inside Writer AI Studio and attached to API agents.

Bearer header format

Authorization: Bearer <your-writer-api-key>

API agents and “capabilities”

In Writer AI Studio, you create an API agent and then generate one or more keys for that agent. The important detail: permissions are set at the agent level, and “capabilities” map to specific endpoints. That means you can create separate API agents for different environments (dev/staging/prod) or different product components (chat service vs. ingestion worker), each with minimum required capabilities.

Recommended key strategy (simple + safe)

  • One API agent per environment (Dev, Staging, Prod), and rotate keys independently.
  • One API agent per service boundary if your system is split (e.g., “Chat API” vs “Ingestion Worker”).
  • Never ship API keys to browsers. Always call Writer from a server you control.
  • Store keys in a secrets manager or environment variables (Writer docs commonly reference WRITER_API_KEY).
  • Rotate on incident: if a key leaks, revoke it and mint a replacement quickly.

Minimal cURL example (chat completion)

curl --location 'https://api.writer.com/v1/chat' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <your-api-key>' \
  --data '{
    "model": "palmyra-x5",
    "messages": [
      { "role": "user", "content": "Write a one-sentence product description for a cozy sweater." }
    ]
  }'

SDKs (Python + Node.js)

Writer provides SDKs for Python and Node.js that simplify authentication, requests, and (often) pagination and error handling. A common pattern is: set WRITER_API_KEY in your environment and initialize a client without explicitly passing the key.

Python (conceptual example)

from writerai import Writer

# If WRITER_API_KEY is set, you can often do:
client = Writer()

resp = client.chat.completions.create(
  model="palmyra-x5",
  messages=[{"role":"user","content":"Summarize this text in 3 bullets: ..."}]
)

print(resp)

Exact method names can vary by SDK version. Use the official SDK docs and API reference to confirm the current shape.

Node.js (conceptual example)

import { Writer } from "writer-sdk";

const client = new Writer({ apiKey: process.env.WRITER_API_KEY });

const resp = await client.chat.completions.create({
  model: "palmyra-x5",
  messages: [{ role: "user", content: "Draft a polite refund policy reply." }]
});

console.log(resp);

Treat SDKs as convenience layers; your “source of truth” is the API reference endpoints and request/response formats.

3) Models, context windows, and pricing (Palmyra)

Writer’s models (commonly listed as Palmyra variants) are identified by model IDs such as palmyra-x5 and palmyra-x4. Your choice should be driven by: quality needs, latency expectations, tool calling requirements, context length, and cost.

Commonly listed model IDs include: palmyra-x5, palmyra-x4, palmyra-x-003-instruct, palmyra-med, palmyra-fin, palmyra-creative, and palmyra-vision.

Pricing: how it’s usually calculated

Writer pricing is commonly published as cost per 1M tokens for input and output (and in the case of vision, a per-image fee plus text output tokens). Most teams estimate monthly cost by tracking:

  • Input tokens per request (prompt + history + retrieved context).
  • Output tokens per request (capped where possible).
  • Per-model prices, since mixing models changes the blended rate.
  • Any per-image or tool fees (for example, vision pricing).

Writer’s docs note that usage info is returned in API responses, and prompt token counts can include tokens used for system prompts. In production, you should store per-request usage for chargeback, cost alerts, and performance optimization.
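
As a sketch of per-request cost accounting, you can convert reported token counts into dollars using the per-1M-token prices. The price values below mirror the example table in this guide; the usage field names your API response actually uses should be confirmed in the API reference.

```python
# Sketch: convert token usage into estimated cost, using per-1M-token prices.
# Prices mirror the example pricing table in this guide.

PRICES_PER_1M = {
    "palmyra-x5": {"input": 0.60, "output": 6.00},
    "palmyra-x4": {"input": 2.50, "output": 10.00},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return estimated USD cost for one request."""
    p = PRICES_PER_1M[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# Example: a palmyra-x5 call with 10k prompt tokens and 1k output tokens
# costs (10_000 * 0.60 + 1_000 * 6.00) / 1e6 = 0.012 USD.
cost = estimate_cost("palmyra-x5", 10_000, 1_000)
```

Logging this per request (keyed by user or feature) gives you the data for chargeback and cost alerts.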

Example pricing table (from Writer AI Studio docs)

Model | Model ID | Input (per 1M tokens) | Output (per 1M tokens) | Typical fit
Palmyra X5 | palmyra-x5 | $0.60 | $6.00 | Long-context agents, strong general performance, tool workflows
Palmyra X4 | palmyra-x4 | $2.50 | $10.00 | Fast general purpose, tool calling, reliability-focused tasks
Palmyra X 003 Instruct | palmyra-x-003-instruct | $7.50 | $22.50 | Instruction-following completions where you want strict structure
Palmyra Med | palmyra-med | $5.00 | $12.00 | Healthcare-oriented language tasks (use with domain governance)
Palmyra Fin | palmyra-fin | $5.00 | $12.00 | Finance-oriented language tasks (policies, summaries, analysis)
Palmyra Creative | palmyra-creative | $5.00 | $12.00 | Creative ideation, copy variants, brand storytelling
Palmyra Vision | palmyra-vision | $7.50 (text) | $7.50 (text) + $0.005 / image | Image + text tasks (check availability by product surface)

Context windows: why they matter

Context window is the amount of text (and tool context) the model can consider at once. Writer’s model docs list different context sizes, including a very large context window for Palmyra X5. In practical terms, large context helps when:

  • Conversations run long and you want to keep more history verbatim.
  • You pass large documents or many retrieved passages into a single request.
  • Agent workflows accumulate tool results across multiple steps.

Even with large context, the best production systems still control prompt growth using summarization, state compaction, and retrieval that is “small but relevant.”

Model selection cheat sheet (pragmatic)

If you’re building a chat agent that must read lots of company docs, start with Palmyra X5 for long context and cost efficiency. If you need a smaller/faster general model and your context is moderate, try Palmyra X4. If you’re doing rigid instruction completion (like “generate this exact JSON schema”), consider an instruct-style model if supported in your workflow. For domain-specific tone and terminology, experiment with Med/Fin/Creative depending on your use case.

4) Chat completions: the core endpoint you’ll use most

The chat completion endpoint is the backbone for most Writer API integrations. It creates responses based on a series of messages (roles + content), and it can support multi-turn conversation where you pass previous messages so the model keeps context.

Endpoint

POST https://api.writer.com/v1/chat

Request shape (high-level)

A typical request includes:

  • model: the model ID (for example, palmyra-x5).
  • messages: an ordered array of role/content objects (system, user, assistant).
  • Optional parameters such as streaming, output limits, and tool definitions (see the API reference for the exact fields).

Simple single-turn example (cURL)

curl --location --request POST https://api.writer.com/v1/chat \
  --header "Authorization: Bearer $WRITER_API_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "model": "palmyra-x5",
    "messages": [
      { "role": "user", "content": "Write a memo summarizing this earnings report: <paste text>" }
    ]
  }'

Multi-turn conversations

For multi-turn chat, include prior messages in order (system/developer instructions if you use them, then user/assistant turns). The model uses that history to keep context. In production, keep history under control:

  • Cap the number of turns you replay verbatim.
  • Summarize older turns into a short rolling summary.
  • Track token counts so history never crowds out retrieved context.
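
A minimal history-trimming sketch (pure Python, no Writer-specific API; the character budget is a stand-in for real token counting):

```python
# Sketch: keep system messages plus the most recent turns that fit a budget.
# Uses a rough character budget; swap in a real tokenizer for production.

def trim_history(messages: list[dict], max_chars: int = 8000) -> list[dict]:
    """Return system messages plus the newest turns within max_chars."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    kept, used = [], 0
    for m in reversed(turns):  # walk newest-first
        size = len(m["content"])
        if used + size > max_chars:
            break
        kept.append(m)
        used += size
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "old question " * 50},
    {"role": "assistant", "content": "old answer " * 50},
    {"role": "user", "content": "latest question"},
]
# With a tight budget, the oldest turn is dropped but the system message
# and the latest user turn always survive.
trimmed = trim_history(history, max_chars=600)
```

The key design choice: trim from the oldest turns, never from the system instructions or the newest user message.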

Tool calling inside chat

Tool calling is where chat completions become “agentic.” You provide tools (like translation or web search) and allow the model to choose when to use them. Conceptually:

  1. You include tool definitions and settings in your chat request.
  2. The model decides to call a tool (sometimes with arguments).
  3. Your app runs the tool (or calls Writer’s tool endpoint) and returns results.
  4. The model uses those results to produce a final answer.

Writer supports both prebuilt tools (such as web search) and tool workflows that integrate Knowledge Graph queries. This is one reason “chat completion” is often the only endpoint you need for many features: you can route translation, web search, and graph queries through the same chat interface.
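
The loop in steps 1–4 can be sketched as a dispatcher on your side. The tool_call shape below (a function name plus JSON-encoded arguments, and a "tool" result message) is an assumption modeled on common chat-completion APIs; confirm the exact field names against Writer’s chat completion reference.

```python
import json

# Sketch: execute a model-requested tool call and build the result message
# the model reads on the next turn. Field names are assumptions.

def run_web_search(query: str) -> str:
    # Placeholder: a real app would call Writer's web search tool endpoint
    # (or your own search backend) and return result text here.
    return f"results for: {query}"

TOOLS = {"web_search": run_web_search}

def handle_tool_call(tool_call: dict) -> dict:
    """Run the requested tool and return a tool-result message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    output = TOOLS[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": output}

# Example of a call the model might emit:
call = {
    "id": "call_1",
    "function": {"name": "web_search", "arguments": '{"query": "writer api"}'},
}
msg = handle_tool_call(call)
```

You append `msg` to the message history and send the chat request again so the model can produce its final answer from the tool result.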

Streaming vs non-streaming

Non-streaming is simplest: you wait for the full response, then return it. Streaming sends partial tokens so your UI can show “typing” and reduce perceived latency. Use streaming for chat UIs, long answers, or agent workflows that take time. Keep in mind that streaming complicates retries (you may receive partial output). In critical workflows, you may prefer non-streaming with robust timeouts and retry logic.
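
If you stream over raw HTTP rather than through an SDK, responses typically arrive as server-sent events. A parsing sketch, assuming an OpenAI-style `data: {json}` framing with content deltas (verify the actual wire format against Writer’s chat completion reference):

```python
import json

# Sketch: extract text deltas from SSE lines. The "data: {...}" framing and
# the choices/delta/content path are assumptions about the wire format.

def iter_deltas(lines):
    """Yield content fragments from a stream of SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Synthetic stream standing in for a real HTTP response body:
fake_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_deltas(fake_stream))
```

Because a dropped connection leaves you with a partial `text`, decide up front whether to show partial output, discard it, or restart the request.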

Structured JSON outputs

Many production features need structured output (for example: a list of tasks, an extraction schema, or a normalized classification object). The general approach is: (1) provide a strict JSON schema or template inside the prompt, (2) tell the model “output JSON only,” and (3) validate. If you’re using an SDK integration that supports schema-based parsing, keep a fallback: retry with stricter instructions if parsing fails.

5) Knowledge Graph (graph-based RAG): connect your data to the model

Knowledge Graph is Writer’s platform feature for retrieval-augmented generation (RAG). Instead of relying only on a model’s training data, you ingest your organization’s content into a graph and then query it during chat completions. The goal is to improve factual accuracy, reduce hallucinations, and enable “answer from our docs” experiences.

Conceptual workflow

  1. Create a Knowledge Graph (an empty container with a name and description).
  2. Upload files (PDF, DOCX, PPTX, CSV, HTML, images, etc.) using the File API.
  3. Attach files to the graph (or associate on upload using a query parameter).
  4. Query the graph during chat using a Knowledge Graph tool/workflow so the model can reference the retrieved content.

Create a Knowledge Graph

The docs show a direct endpoint for graph creation. You send a name and description; the API returns a graph ID.

curl --location --request POST https://api.writer.com/v1/graphs \
  --header "Authorization: Bearer $WRITER_API_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "name": "Financial Reports",
    "description": "Knowledge Graph of 2024 financial reports"
  }'

Add a file to the Knowledge Graph

After uploading a file (to get a file ID), you associate it with a graph. The API reference commonly documents endpoints like:

POST /v1/graphs/{graph_id}/file
DELETE /v1/graphs/{graph_id}/file/{file_id}
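
A sketch of the upload-then-attach flow over plain HTTP. The raw-body upload with a Content-Disposition filename reflects common patterns in Writer’s File API docs, but treat the exact header names and response fields (like the returned file `id`) as assumptions to verify against the API reference.

```python
import os

API_BASE = "https://api.writer.com/v1"

def upload_headers(api_key: str, filename: str, content_type: str) -> dict:
    """Headers for a raw-body file upload (exact names: verify in docs)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": content_type,
        "Content-Disposition": f'attachment; filename="{filename}"',
    }

def upload_and_attach(api_key: str, path: str, graph_id: str) -> str:
    """Upload a local file, then attach it to a Knowledge Graph."""
    import requests  # pip install requests

    with open(path, "rb") as f:
        resp = requests.post(
            f"{API_BASE}/files",
            headers=upload_headers(api_key, os.path.basename(path), "application/pdf"),
            data=f,
        )
    resp.raise_for_status()
    file_id = resp.json()["id"]  # response field name is an assumption

    attach = requests.post(
        f"{API_BASE}/graphs/{graph_id}/file",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"file_id": file_id},
    )
    attach.raise_for_status()
    return file_id

# Network calls only run when a key is configured:
if os.environ.get("WRITER_API_KEY_FOR_DEMO"):
    upload_and_attach(os.environ["WRITER_API_KEY_FOR_DEMO"], "report.pdf", "<graph-id>")
```

Keeping upload and attach as one function makes the “file exists but isn’t in any graph” state easy to detect and retry.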

Knowledge Graph during chat (RAG in practice)

Most teams don’t want “a separate RAG pipeline” and “a separate chat pipeline.” They want one chat call that can: retrieve relevant content, cite it internally, then answer. Writer’s Knowledge Graph chat tool is designed for that. In a typical flow:

  1. The user asks a question in chat.
  2. The model calls the Knowledge Graph tool with a query.
  3. Relevant passages are retrieved from the graph.
  4. The model answers using the retrieved content (optionally with citations).

Production RAG tips (that actually move metrics)

  • Write a good graph description. Treat it like an instruction: what’s in the graph, and when to use it.
  • Chunking matters. If you control document formatting, add headings, short paragraphs, and consistent structure.
  • Keep retrieved context small. Large dumps reduce answer quality and raise cost. Prefer “few best” passages.
  • Ask for citations. Even if you don’t show them to users, log them for QA and trust evaluation.
  • Measure accuracy. Run evaluation sets: 50–500 real questions, compare answers, and iterate on ingestion.

URLs and web connectors

Writer’s Knowledge Graph docs describe adding URLs as connectors to enhance a graph with website content. In practice, teams often ingest: public documentation pages, help center articles, or internal wiki pages. If you ingest web content, set governance rules: who can add URLs, how often you refresh them, and how you handle removed or changed pages.

6) File API: upload, list, and manage files

Writer’s File API lets you upload, download, list, and delete files in your account. Files can then be used as inputs to Knowledge Graph ingestion or no-code agents, depending on your setup.

Endpoint (upload)

POST https://api.writer.com/v1/files

Supported file formats (examples)

The API reference lists a variety of common formats such as PDF, DOC/DOCX, PPT/PPTX, JPG/PNG, EML, HTML, SRT, CSV, XLS/XLSX. In production, standardize your ingestion formats. If your organization uses many formats, build a preprocessing layer that:

  • Converts incoming documents to a small set of supported formats.
  • Normalizes structure: headings, short paragraphs, consistent metadata.
  • Strips boilerplate (footers, navigation, legal disclaimers) before ingestion.

Why “file lifecycle” matters

Files persist in your account until you delete them. That makes file lifecycle a governance and cost topic:

  • Delete stale files so outdated content stops influencing answers.
  • Track ownership: who uploaded each file and which graphs reference it.
  • Re-ingest updated documents on a schedule instead of letting copies drift.

Filtering and listing files

Writer’s changelog notes support for filtering files by type using a file_types query parameter (for example, txt,pdf,docx). This is useful when you want to build a management UI that only shows documents relevant to RAG ingestion.

7) Tools: web search, translation, vision, and deprecations

Tools are what turn a “chat model” into an “agent.” With tools, the model can request external information or transformations, and then use results to produce better final answers.

Web search tool

Writer documents a web search tool endpoint. It accepts a query and can return results with source URLs. This is valuable for:

  • Questions about current events or anything newer than the model’s training data.
  • Grounding answers with citable source URLs.
  • Fact-checking claims before presenting them to users.

POST https://api.writer.com/v1/tools/web-search

Translation tool (recommended modern approach)

Writer’s docs describe migrating from a standalone translation endpoint to a translation tool used inside chat completions. The benefit is that translation becomes part of a broader workflow: translate, summarize, extract, and respond in one coherent chat interaction.

Vision / images in chat

Writer documents image support in chat with specific models (for example, image analysis in chat completions being supported with Palmyra X5). In practical terms, you can:

  • Send an image alongside a text prompt in a chat request.
  • Ask for descriptions, extracted text, or analysis of the image.
  • Check the model docs for supported formats, sizes, and per-image pricing.

Important: deprecations you should account for

Writer’s API reference includes deprecation notices. For example, some endpoints were marked as deprecated and scheduled for removal on December 22, 2025. Since this guide is labeled “2026,” you should assume deprecated endpoints may no longer be available and migrate to the recommended alternatives (often: prebuilt tools in chat completions).

  • AI detection endpoint /v1/tools/ai-detect was documented with a deprecation notice and removal date.
  • Parse PDF endpoint /v1/tools/pdf-parser/{file_id} was documented with a deprecation notice and removal date.
  • Standalone tool APIs (like web search or translation) may have migration guides that move functionality into chat tool calling.

How to handle tool migrations safely

Treat tool migrations as backwards-incompatible changes. Add a feature flag in your app that switches between old and new paths, run both in parallel for a small percentage of traffic, compare outputs, then flip fully. Keep logs of tool results so you can debug “why did this answer change?” after migration.

8) Rate limits, errors, and reliability

Every production integration should start with reliability. Writer documents rate limits as both requests per minute (RPM) and tokens per minute (TPM). A common published baseline is:

Typical limits (example)

  • RPM: 400 requests/min
  • TPM: 25,000 tokens/min

If you need higher limits, enterprise plans may allow custom quotas via sales.

Why TPM matters more than RPM

If you send huge prompts (long chat history + large retrieved context), you can hit token limits long before request limits. TPM is the real ceiling for “how much work per minute” your app can do.
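
To stay under a tokens-per-minute ceiling on the client side, a simple sliding-window budget works. This is a sketch: it tracks tokens you send and refuses requests that would exceed the window, using the example 25,000 TPM baseline from this guide.

```python
import time
from collections import deque

# Sketch: client-side sliding-window token budget for a TPM limit.

class TokenBudget:
    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock
        self.events: deque = deque()  # (timestamp, tokens) pairs
        self.used = 0

    def try_spend(self, tokens: int) -> bool:
        """Reserve tokens if they fit in the current 60-second window."""
        now = self.clock()
        while self.events and now - self.events[0][0] > 60:
            _, n = self.events.popleft()
            self.used -= n
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True

budget = TokenBudget(25_000)          # example TPM baseline
ok_first = budget.try_spend(20_000)   # fits in the window
ok_second = budget.try_spend(10_000)  # would exceed 25k, refused
```

When `try_spend` returns False, queue the request or shed load instead of sending it and eating a 429.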

Retry strategy (safe defaults)

  • Retry on 429 and 5xx responses; don’t retry 4xx validation errors.
  • Use exponential backoff with jitter, and honor a Retry-After header if present.
  • Cap total attempts (3–5) and total wall-clock time so requests fail fast.
  • Set explicit client timeouts; never wait indefinitely on a hung connection.
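
A backoff sketch in Python. The schedule is generic; `send` stands in for your actual HTTP call and should return a status code and body.

```python
import random
import time

# Sketch: exponential backoff with jitter for retryable responses.

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before retry `attempt` (0-indexed): capped exponential + jitter."""
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)

def call_with_retries(send, max_attempts: int = 4):
    """Call send() until it succeeds or retries are exhausted.

    `send` returns (status_code, body); 429 and 5xx are retried.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if (status == 429 or status >= 500) and attempt < max_attempts - 1:
            time.sleep(backoff_delay(attempt))
            continue
        return status, body

# Example with a fake transport that fails twice, then succeeds:
responses = iter([(429, None), (500, None), (200, "ok")])
status, body = call_with_retries(lambda: next(responses))
```

In production, also check for a Retry-After header on 429 responses and prefer its value over the computed delay.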

Idempotency and duplicate prevention

If you retry POST requests, you must assume duplicates can happen (especially if the server succeeded but your network dropped). If the API supports idempotency keys, use them. If not, implement your own deduplication:

  • Fingerprint each logical request (user ID + endpoint + payload hash).
  • Store fingerprints with a short TTL; skip or return a cached result on a match.
  • Check the fingerprint before any side effect (writes, notifications, billing).
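
A fingerprinting sketch (pure Python; the in-memory dict stands in for Redis or another shared TTL store):

```python
import hashlib
import json
import time

# Sketch: deduplicate logical requests by fingerprint with a TTL.

def fingerprint(user_id: str, endpoint: str, payload: dict) -> str:
    """Stable hash of a logical request (keys sorted for determinism)."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(f"{user_id}:{endpoint}:{body}".encode()).hexdigest()

class DedupStore:
    def __init__(self, ttl_seconds: float = 300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.seen = {}  # fingerprint -> last-seen timestamp

    def is_duplicate(self, key: str) -> bool:
        """Record the key; report True if it was seen within the TTL."""
        now = self.clock()
        last = self.seen.get(key)
        self.seen[key] = now
        return last is not None and now - last < self.ttl

store = DedupStore()
key = fingerprint("user-1", "/v1/chat", {"model": "palmyra-x5", "q": "hi"})
first = store.is_duplicate(key)   # never seen before
second = store.is_duplicate(key)  # repeat within the TTL
```

Sorting payload keys matters: two JSON-equal payloads with different key order must hash to the same fingerprint.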

Error handling

Writer’s API reference includes an “error codes” section with examples (including SDK-based patterns). In your app, log enough detail to debug without storing sensitive user content:

  • HTTP status plus any error code and message from the response body.
  • A request ID (yours, and any ID the API returns) for correlation.
  • Model ID, token counts, and latency per call.
  • Redacted prompt metadata (lengths, feature flags), not raw user text.

9) Production architecture: patterns that scale

A good Writer API integration is less about “one perfect prompt” and more about an end-to-end system: retrieval + prompt assembly + tool calling + validation + caching + observability. Below are architecture patterns that tend to produce stable, affordable results.

Pattern A: Chat service (thin) + Retrieval service (smart)

Separate responsibilities:

  • Chat service (thin): assembles messages, calls the chat endpoint, streams to the UI.
  • Retrieval service (smart): owns file ingestion, graph management, and retrieval policy.

This reduces coupling: if you change ingestion logic (new doc formats, new policies), your chat UI doesn’t break.

Pattern B: “Two-model” pipeline for cost control

Use a cheaper/faster model for routine steps, and a stronger model only when needed:

  • Cheap/fast model: classify intent, route requests, summarize history.
  • Strong model: produce the final, user-facing answer for hard requests.

Even if you only use one model family, you can reduce cost by controlling output length, limiting retrieval size, and summarizing context before handing it to the “final answer” step.

Pattern C: Validation loop for structured outputs

If your app depends on JSON, add a loop:

  1. Request JSON output with a strict schema and an example.
  2. Validate JSON (schema validator).
  3. If invalid: retry once with a “repair” prompt: “Fix the JSON; do not change meanings.”
  4. If still invalid: return a safe fallback or route to a human review queue.
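
Steps 1–4 can be sketched as a small loop. Here `generate` stands in for your chat-completion call, and the schema check is a minimal required-keys validator; swap in a real JSON Schema validator for production.

```python
import json

# Sketch: request JSON, validate, retry once with a repair prompt.
# `generate(prompt)` stands in for a real chat-completion call.

REQUIRED_KEYS = {"title", "tasks"}

def parse_valid_json(text: str):
    """Return the parsed object if valid JSON with required keys, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(obj, dict) and REQUIRED_KEYS <= obj.keys():
        return obj
    return None

def structured_call(generate, prompt: str):
    first = generate(prompt)
    obj = parse_valid_json(first)
    if obj is not None:
        return obj
    repaired = generate(
        "Fix the JSON below so it parses and keeps the same meaning. "
        f"Output JSON only.\n\n{first}"
    )
    obj = parse_valid_json(repaired)
    if obj is not None:
        return obj
    return None  # safe fallback: route to a human review queue

# Fake generator: malformed JSON first, valid JSON on the repair pass.
replies = iter(['{"title": "Plan", "tasks": [', '{"title": "Plan", "tasks": []}'])
result = structured_call(lambda p: next(replies), "Extract tasks as JSON.")
```

Returning None (rather than raising) keeps the fallback decision in the caller, where the human-review routing lives.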

Pattern D: Caching (what to cache, what not to)

Caching is the #1 lever for cost and speed, but you must cache the right things:

  • Cache: deterministic transforms, answers to identical FAQ-style queries, retrieval results.
  • Don’t cache: personalized answers, time-sensitive content, anything derived from per-user private context.
  • Key caches on normalized input (model + prompt template version + parameters).

Security and prompt-injection basics

Any system that uses user input + tool calling + private docs must assume adversarial behavior. Practical defenses:

  • Treat retrieved documents and tool outputs as untrusted data, not instructions.
  • Constrain tools: allowlists, argument validation, and per-user scopes.
  • Keep system instructions separate from user content; never echo secrets into prompts.
  • Validate and filter outputs before they trigger side effects.

Deployment options (enterprise note)

Writer documents managed deployment options (for example, Standard Cloud vs Private Cloud). If you’re operating in a regulated environment, confirm data residency, access controls, and compliance requirements before putting private documents into any RAG system.

10) Writer API FAQs (developer-focused)

Is the Writer API the same as Writer AI Studio?

Writer AI Studio is the platform and UI for building agents, managing keys, observability, and governance. The Writer API is the programmatic interface that lets your application call the same platform capabilities. In practice, you’ll use AI Studio to create API agents/keys and manage permissions, then use the API from your backend services.

How do I get a Writer API key?

You typically create it in Writer AI Studio under Admin Settings → API Keys. Writer’s docs note you can’t view the key after creation, so copy it immediately and store it securely. If you lose it, you generate a new key.

What is the main endpoint I should start with?

Start with POST /v1/chat. It covers standard chat completions and is the foundation for multi-turn conversation and tool calling. Once that works, add files and Knowledge Graph if you need RAG.

What’s the difference between files and Knowledge Graph?

Files are raw uploaded assets stored in your account. Knowledge Graph is a retrieval layer built from files (and potentially URLs) that can be queried during chat completion so the model answers using your documents. Uploading a file alone doesn’t guarantee it will be used for question answering; attaching it to a graph and querying the graph is the typical RAG workflow.

How do I reduce Writer API costs?

The highest-impact moves are: (1) shrink prompts (summarize history, reduce retrieved context), (2) cap output length, (3) cache repeated answers, (4) use a two-step pipeline (cheap classification + targeted strong model only when needed), and (5) measure token usage per endpoint so you optimize based on reality rather than guesses.

What rate limits should I plan for?

Writer documents limits as RPM and TPM. A published baseline example is 400 requests/min and 25,000 tokens/min. Design your system to back off on 429 responses and to avoid token spikes by controlling prompt size.

Can I do web search and translation inside chat?

Yes. Writer documents a web search tool endpoint and also describes translation as a tool used within chat completions (with migration guides from standalone endpoints). Tool calling lets the model decide when to use these capabilities as part of one coherent workflow.

Are there deprecated endpoints I should avoid in 2026?

The API reference includes deprecation notices for some tools (for example, AI detection and a parse-PDF endpoint) with a removal date of December 22, 2025. In a 2026 build, you should assume those endpoints may be unavailable and follow the migration guidance toward tool-based workflows inside chat completions.

Should I call Writer API directly from the browser?

No. Your API key must remain secret. Put Writer calls behind your own backend (or serverless function), implement your own authentication for users, and enforce per-user and per-org quotas to prevent abuse.

How do I keep answers grounded in my docs?

Use Knowledge Graph for retrieval, keep the retrieved context tight, instruct the model to only answer using retrieved content when required, and log which files/graph nodes were used for each response. Then run evaluation sets with real questions to measure accuracy.

References (official docs)

These are the most useful official pages to keep open while building. (Links are included as plain URLs so you can copy/paste.)

Topic | URL | Why it matters
AI Studio Introduction | https://dev.writer.com/home/introduction | Platform overview and navigation
Quickstart | https://dev.writer.com/home/quickstart | First API call, how to create keys
API Keys guide | https://dev.writer.com/api-reference/api-keys | Bearer auth, API agents, permissions
Chat completion API reference | https://dev.writer.com/api-reference/completion-api/chat-completion | Request/response format for /v1/chat
Models | https://dev.writer.com/home/models | Model IDs, context windows, capabilities
Pricing | https://dev.writer.com/home/pricing | Per-token costs and tool pricing
Knowledge Graph guide | https://dev.writer.com/home/knowledge-graph | Create graphs and manage ingestion
Files guide | https://dev.writer.com/home/files | Upload/list/delete files
Web search tool API reference | https://dev.writer.com/api-reference/tool-api/web-search | Ground answers with web results
Rate limits | https://dev.writer.com/api-reference/rate-limits | RPM/TPM quotas and best practices
Changelog | https://dev.writer.com/home/changelog | Keep up with endpoint/SDK changes
Usage policy | https://dev.writer.com/home/usage-policy | Rules, key security expectations