Enterprise LLM platform • Chat • Tools • Graph-based RAG

Writer API: build agents with chat completions, Knowledge Graph (graph-based RAG), files, and tools

The Writer API is the programmatic interface for Writer’s AI Studio platform. You can use it to generate and structure text with Writer’s Palmyra models, run tool-using workflows (like web search or translation inside a chat), upload and manage files, and connect your organization’s content via Knowledge Graph for retrieval-augmented generation (RAG).

This page is written to be practical: you’ll learn how auth works, which endpoints matter most, how pricing is typically calculated, how to design reliable production calls (timeouts, retries, idempotency, streaming), and how to reduce cost without losing quality.

At a glance:

  • Auth: Bearer API key
  • Core primitive: Chat completion
  • RAG layer: Knowledge Graph
  • Limits (typical): RPM + TPM

Quick expectation-setting:

Writer API keys are created inside Writer AI Studio and are attached to “API agents.” Permissions are managed at the agent level, and “capabilities” map to specific endpoints. Plan your key strategy like you would plan service accounts: least privilege, separate keys per environment, and rotate quickly if leaked.

1) What the Writer API is (and when it’s the right choice)

Writer AI Studio is a full-stack platform for building and managing enterprise AI agents. The Writer API is how developers integrate those capabilities into external applications, backend services, internal tools, and product features.

What you can build

The Writer API supports a wide range of “LLM app” workloads, from simple prompt-to-text generation to multi-step, tool-using agents that depend on your company’s knowledge. Common builds include:

  • Chat assistants grounded in company documents via Knowledge Graph (RAG).
  • Tool-using chat features such as web search and translation inside a conversation.
  • Structured generation: extraction, classification, and JSON outputs for downstream systems.
  • Content workflows: summaries, drafts, and copy variants built from uploaded files.

How Writer’s approach tends to differ

Writer positions Knowledge Graph as graph-based retrieval-augmented generation (RAG), where your content is ingested into a graph representation that can be queried during a chat. In many organizations, this complements or replaces traditional vector retrieval. If your app’s success depends on “use my organization’s data safely and accurately,” this RAG layer becomes central.

Use Writer API when…

  • You want enterprise controls (permissions, scoped keys, observability, governance).
  • You need RAG that’s integrated into the platform (Knowledge Graph).
  • Your app needs tool calling (web search, translation, etc.) within chat.
  • You care about cost predictability with clear per-token pricing for specific models.

Use a simpler approach when…

  • You just need a basic one-shot completion (no tools, no RAG, no governance).
  • You’re prototyping and want minimal setup.
  • You can handle retrieval and observability yourself (custom vector DB + tracing).

Many teams start simple, then adopt platform features like Knowledge Graph or agent-level permissions once they reach production scale.

2) Authentication, API agents, and key management

Writer API uses token authentication via an API key passed in the Authorization header as a Bearer token. Keys are created inside Writer AI Studio and attached to API agents.

Bearer header format

Authorization: Bearer <your-writer-api-key>

API agents and “capabilities”

In Writer AI Studio, you create an API agent and then generate one or more keys for that agent. The important detail: permissions are set at the agent level, and “capabilities” map to specific endpoints. That means you can create separate API agents for different environments (dev/staging/prod) or different product components (chat service vs. ingestion worker), each with minimum required capabilities.

Recommended key strategy (simple + safe)

  • One API agent per environment (Dev, Staging, Prod), and rotate keys independently.
  • One API agent per service boundary if your system is split (e.g., “Chat API” vs “Ingestion Worker”).
  • Never ship API keys to browsers. Always call Writer from a server you control.
  • Store keys in a secrets manager or environment variables (Writer docs commonly reference WRITER_API_KEY).
  • Rotate on incident: if a key leaks, revoke it and mint a replacement quickly.

Minimal cURL example (chat completion)

curl --location 'https://api.writer.com/v1/chat' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <your-api-key>' \
  --data '{
    "model": "palmyra-x5",
    "messages": [
      { "role": "user", "content": "Write a one-sentence product description for a cozy sweater." }
    ]
  }'

SDKs (Python + Node.js)

Writer provides SDKs for Python and Node.js that simplify authentication, requests, and (often) pagination and error handling. A common pattern is: set WRITER_API_KEY in your environment and initialize a client without explicitly passing the key.

Python (conceptual example)

from writerai import Writer

# If WRITER_API_KEY is set, you can often do:
client = Writer()

resp = client.chat.completions.create(
  model="palmyra-x5",
  messages=[{"role":"user","content":"Summarize this text in 3 bullets: ..."}]
)

print(resp)

Exact method names can vary by SDK version. Use the official SDK docs and API reference to confirm the current shape.

Node.js (conceptual example)

import { Writer } from "writer-sdk";

const client = new Writer({ apiKey: process.env.WRITER_API_KEY });

const resp = await client.chat.completions.create({
  model: "palmyra-x5",
  messages: [{ role: "user", content: "Draft a polite refund policy reply." }]
});

console.log(resp);

Treat SDKs as convenience layers; your “source of truth” is the API reference endpoints and request/response formats.

3) Models, context windows, and pricing (Palmyra)

Writer’s models (commonly listed as Palmyra variants) are identified by model IDs such as palmyra-x5 and palmyra-x4. Your choice should be driven by: quality needs, latency expectations, tool calling requirements, context length, and cost.

Commonly listed model IDs include: palmyra-x5, palmyra-x4, palmyra-x-003-instruct, palmyra-med, palmyra-fin, palmyra-creative, and palmyra-vision.

Pricing: how it’s usually calculated

Writer pricing is commonly published as cost per 1M tokens for input and output (and in the case of vision, a per-image fee plus text output tokens). Most teams estimate monthly cost by tracking:

  • Input tokens per request (prompt + history + retrieved context).
  • Output tokens per request (capped where possible).
  • Per-model prices, since mixing models changes the blended rate.
  • Any per-image or tool fees (for example, vision pricing).

Writer’s docs note that usage info is returned in API responses, and prompt token counts can include tokens used for system prompts. In production, you should store per-request usage for chargeback, cost alerts, and performance optimization.
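
As a sketch of per-request cost accounting, you can convert reported token counts into dollars using the per-1M-token prices. The price values below mirror the example table in this guide; the usage field names your API response actually uses should be confirmed in the API reference.

```python
# Sketch: convert token usage into estimated cost, using per-1M-token prices.
# Prices mirror the example pricing table in this guide.

PRICES_PER_1M = {
    "palmyra-x5": {"input": 0.60, "output": 6.00},
    "palmyra-x4": {"input": 2.50, "output": 10.00},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return estimated USD cost for one request."""
    p = PRICES_PER_1M[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# Example: a palmyra-x5 call with 10k prompt tokens and 1k output tokens
# costs (10_000 * 0.60 + 1_000 * 6.00) / 1e6 = 0.012 USD.
cost = estimate_cost("palmyra-x5", 10_000, 1_000)
```

Logging this per request (keyed by user or feature) gives you the data for chargeback and cost alerts.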

Example pricing table (from Writer AI Studio docs)

Model | Model ID | Input (per 1M tokens) | Output (per 1M tokens) | Typical fit
Palmyra X5 | palmyra-x5 | $0.60 | $6.00 | Long-context agents, strong general performance, tool workflows
Palmyra X4 | palmyra-x4 | $2.50 | $10.00 | Fast general purpose, tool calling, reliability-focused tasks
Palmyra X 003 Instruct | palmyra-x-003-instruct | $7.50 | $22.50 | Instruction-following completions where you want strict structure
Palmyra Med | palmyra-med | $5.00 | $12.00 | Healthcare-oriented language tasks (use with domain governance)
Palmyra Fin | palmyra-fin | $5.00 | $12.00 | Finance-oriented language tasks (policies, summaries, analysis)
Palmyra Creative | palmyra-creative | $5.00 | $12.00 | Creative ideation, copy variants, brand storytelling
Palmyra Vision | palmyra-vision | $7.50 (text) | $7.50 (text) + $0.005 / image | Image + text tasks (check availability by product surface)

Context windows: why they matter

Context window is the amount of text (and tool context) the model can consider at once. Writer’s model docs list different context sizes, including a very large context window for Palmyra X5. In practical terms, large context helps when:

  • Conversations run long and you want to keep more history verbatim.
  • You pass large documents or many retrieved passages into a single request.
  • Agent workflows accumulate tool results across multiple steps.

Even with large context, the best production systems still control prompt growth using summarization, state compaction, and retrieval that is “small but relevant.”

Model selection cheat sheet (pragmatic)

If you’re building a chat agent that must read lots of company docs, start with Palmyra X5 for long context and cost efficiency. If you need a smaller/faster general model and your context is moderate, try Palmyra X4. If you’re doing rigid instruction completion (like “generate this exact JSON schema”), consider an instruct-style model if supported in your workflow. For domain-specific tone and terminology, experiment with Med/Fin/Creative depending on your use case.

4) Chat completions: the core endpoint you’ll use most

The chat completion endpoint is the backbone for most Writer API integrations. It creates responses based on a series of messages (roles + content), and it can support multi-turn conversation where you pass previous messages so the model keeps context.

Endpoint

POST https://api.writer.com/v1/chat

Request shape (high-level)

A typical request includes:

  • model: the model ID (for example, palmyra-x5).
  • messages: an ordered array of role/content objects (system, user, assistant).
  • Optional parameters such as streaming, output limits, and tool definitions (see the API reference for the exact fields).

Simple single-turn example (cURL)

curl --location --request POST https://api.writer.com/v1/chat \
  --header "Authorization: Bearer $WRITER_API_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "model": "palmyra-x5",
    "messages": [
      { "role": "user", "content": "Write a memo summarizing this earnings report: <paste text>" }
    ]
  }'

Multi-turn conversations

For multi-turn chat, include prior messages in order (system/developer instructions if you use them, then user/assistant turns). The model uses that history to keep context. In production, keep history under control:

  • Cap the number of turns you replay verbatim.
  • Summarize older turns into a short rolling summary.
  • Track token counts so history never crowds out retrieved context.
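
A minimal history-trimming sketch (pure Python, no Writer-specific API; the character budget is a stand-in for real token counting):

```python
# Sketch: keep system messages plus the most recent turns that fit a budget.
# Uses a rough character budget; swap in a real tokenizer for production.

def trim_history(messages: list[dict], max_chars: int = 8000) -> list[dict]:
    """Return system messages plus the newest turns within max_chars."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    kept, used = [], 0
    for m in reversed(turns):  # walk newest-first
        size = len(m["content"])
        if used + size > max_chars:
            break
        kept.append(m)
        used += size
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "old question " * 50},
    {"role": "assistant", "content": "old answer " * 50},
    {"role": "user", "content": "latest question"},
]
# With a tight budget, the oldest turn is dropped but the system message
# and the latest user turn always survive.
trimmed = trim_history(history, max_chars=600)
```

The key design choice: trim from the oldest turns, never from the system instructions or the newest user message.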

Tool calling inside chat

Tool calling is where chat completions become “agentic.” You provide tools (like translation or web search) and allow the model to choose when to use them. Conceptually:

  1. You include tool definitions and settings in your chat request.
  2. The model decides to call a tool (sometimes with arguments).
  3. Your app runs the tool (or calls Writer’s tool endpoint) and returns results.
  4. The model uses those results to produce a final answer.

Writer supports both prebuilt tools (such as web search) and tool workflows that integrate Knowledge Graph queries. This is one reason “chat completion” is often the only endpoint you need for many features: you can route translation, web search, and graph queries through the same chat interface.
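
The loop in steps 1–4 can be sketched as a dispatcher on your side. The tool_call shape below (a function name plus JSON-encoded arguments, and a "tool" result message) is an assumption modeled on common chat-completion APIs; confirm the exact field names against Writer’s chat completion reference.

```python
import json

# Sketch: execute a model-requested tool call and build the result message
# the model reads on the next turn. Field names are assumptions.

def run_web_search(query: str) -> str:
    # Placeholder: a real app would call Writer's web search tool endpoint
    # (or your own search backend) and return result text here.
    return f"results for: {query}"

TOOLS = {"web_search": run_web_search}

def handle_tool_call(tool_call: dict) -> dict:
    """Run the requested tool and return a tool-result message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    output = TOOLS[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": output}

# Example of a call the model might emit:
call = {
    "id": "call_1",
    "function": {"name": "web_search", "arguments": '{"query": "writer api"}'},
}
msg = handle_tool_call(call)
```

You append `msg` to the message history and send the chat request again so the model can produce its final answer from the tool result.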

Streaming vs non-streaming

Non-streaming is simplest: you wait for the full response, then return it. Streaming sends partial tokens so your UI can show “typing” and reduce perceived latency. Use streaming for chat UIs, long answers, or agent workflows that take time. Keep in mind that streaming complicates retries (you may receive partial output). In critical workflows, you may prefer non-streaming with robust timeouts and retry logic.
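
If you stream over raw HTTP rather than through an SDK, responses typically arrive as server-sent events. A parsing sketch, assuming an OpenAI-style `data: {json}` framing with content deltas (verify the actual wire format against Writer’s chat completion reference):

```python
import json

# Sketch: extract text deltas from SSE lines. The "data: {...}" framing and
# the choices/delta/content path are assumptions about the wire format.

def iter_deltas(lines):
    """Yield content fragments from a stream of SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Synthetic stream standing in for a real HTTP response body:
fake_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_deltas(fake_stream))
```

Because a dropped connection leaves you with a partial `text`, decide up front whether to show partial output, discard it, or restart the request.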

Structured JSON outputs

Many production features need structured output (for example: a list of tasks, an extraction schema, or a normalized classification object). The general approach is: (1) provide a strict JSON schema or template inside the prompt, (2) tell the model “output JSON only,” and (3) validate. If you’re using an SDK integration that supports schema-based parsing, keep a fallback: retry with stricter instructions if parsing fails.

5) Knowledge Graph (graph-based RAG): connect your data to the model

Knowledge Graph is Writer’s platform feature for retrieval-augmented generation (RAG). Instead of relying only on a model’s training data, you ingest your organization’s content into a graph and then query it during chat completions. The goal is to improve factual accuracy, reduce hallucinations, and enable “answer from our docs” experiences.

Conceptual workflow

  1. Create a Knowledge Graph (an empty container with a name and description).
  2. Upload files (PDF, DOCX, PPTX, CSV, HTML, images, etc.) using the File API.
  3. Attach files to the graph (or associate on upload using a query parameter).
  4. Query the graph during chat using a Knowledge Graph tool/workflow so the model can reference the retrieved content.

Create a Knowledge Graph

The docs show a direct endpoint for graph creation. You send a name and description; the API returns a graph ID.

curl --location --request POST https://api.writer.com/v1/graphs \
  --header "Authorization: Bearer $WRITER_API_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "name": "Financial Reports",
    "description": "Knowledge Graph of 2024 financial reports"
  }'

Add a file to the Knowledge Graph

After uploading a file (to get a file ID), you associate it with a graph. The API reference commonly documents endpoints like:

POST /v1/graphs/{graph_id}/file
DELETE /v1/graphs/{graph_id}/file/{file_id}
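
A sketch of the upload-then-attach flow over plain HTTP. The raw-body upload with a Content-Disposition filename reflects common patterns in Writer’s File API docs, but treat the exact header names and response fields (like the returned file `id`) as assumptions to verify against the API reference.

```python
import os

API_BASE = "https://api.writer.com/v1"

def upload_headers(api_key: str, filename: str, content_type: str) -> dict:
    """Headers for a raw-body file upload (exact names: verify in docs)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": content_type,
        "Content-Disposition": f'attachment; filename="{filename}"',
    }

def upload_and_attach(api_key: str, path: str, graph_id: str) -> str:
    """Upload a local file, then attach it to a Knowledge Graph."""
    import requests  # pip install requests

    with open(path, "rb") as f:
        resp = requests.post(
            f"{API_BASE}/files",
            headers=upload_headers(api_key, os.path.basename(path), "application/pdf"),
            data=f,
        )
    resp.raise_for_status()
    file_id = resp.json()["id"]  # response field name is an assumption

    attach = requests.post(
        f"{API_BASE}/graphs/{graph_id}/file",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"file_id": file_id},
    )
    attach.raise_for_status()
    return file_id

# Network calls only run when a key is configured:
if os.environ.get("WRITER_API_KEY_FOR_DEMO"):
    upload_and_attach(os.environ["WRITER_API_KEY_FOR_DEMO"], "report.pdf", "<graph-id>")
```

Keeping upload and attach as one function makes the “file exists but isn’t in any graph” state easy to detect and retry.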

Knowledge Graph during chat (RAG in practice)

Most teams don’t want “a separate RAG pipeline” and “a separate chat pipeline.” They want one chat call that can: retrieve relevant content, cite it internally, then answer. Writer’s Knowledge Graph chat tool is designed for that. In a typical flow:

  1. The user asks a question in chat.
  2. The model calls the Knowledge Graph tool with a query.
  3. Relevant passages are retrieved from the graph.
  4. The model answers using the retrieved content (optionally with citations).

Production RAG tips (that actually move metrics)

  • Write a good graph description. Treat it like an instruction: what’s in the graph, and when to use it.
  • Chunking matters. If you control document formatting, add headings, short paragraphs, and consistent structure.
  • Keep retrieved context small. Large dumps reduce answer quality and raise cost. Prefer “few best” passages.
  • Ask for citations. Even if you don’t show them to users, log them for QA and trust evaluation.
  • Measure accuracy. Run evaluation sets: 50–500 real questions, compare answers, and iterate on ingestion.

URLs and web connectors

Writer’s Knowledge Graph docs describe adding URLs as connectors to enhance a graph with website content. In practice, teams often ingest: public documentation pages, help center articles, or internal wiki pages. If you ingest web content, set governance rules: who can add URLs, how often you refresh them, and how you handle removed or changed pages.

6) File API: upload, list, and manage files

Writer’s File API lets you upload, download, list, and delete files in your account. Files can then be used as inputs to Knowledge Graph ingestion or no-code agents, depending on your setup.

Endpoint (upload)

POST https://api.writer.com/v1/files

Supported file formats (examples)

The API reference lists a variety of common formats such as PDF, DOC/DOCX, PPT/PPTX, JPG/PNG, EML, HTML, SRT, CSV, XLS/XLSX. In production, standardize your ingestion formats. If your organization uses many formats, build a preprocessing layer that:

  • Converts incoming documents to a small set of supported formats.
  • Normalizes structure: headings, short paragraphs, consistent metadata.
  • Strips boilerplate (footers, navigation, legal disclaimers) before ingestion.

Why “file lifecycle” matters

Files persist in your account until you delete them. That makes file lifecycle a governance and cost topic:

  • Delete stale files so outdated content stops influencing answers.
  • Track ownership: who uploaded each file and which graphs reference it.
  • Re-ingest updated documents on a schedule instead of letting copies drift.

Filtering and listing files

Writer’s changelog notes support for filtering files by type using a file_types query parameter (for example, txt,pdf,docx). This is useful when you want to build a management UI that only shows documents relevant to RAG ingestion.

7) Tools: web search, translation, vision, and deprecations

Tools are what turn a “chat model” into an “agent.” With tools, the model can request external information or transformations, and then use results to produce better final answers.

Web search tool

Writer documents a web search tool endpoint. It accepts a query and can return results with source URLs. This is valuable for:

  • Questions about current events or anything newer than the model’s training data.
  • Grounding answers with citable source URLs.
  • Fact-checking claims before presenting them to users.

POST https://api.writer.com/v1/tools/web-search

Translation tool (recommended modern approach)

Writer’s docs describe migrating from a standalone translation endpoint to a translation tool used inside chat completions. The benefit is that translation becomes part of a broader workflow: translate, summarize, extract, and respond in one coherent chat interaction.

Vision / images in chat

Writer documents image support in chat with specific models (for example, image analysis in chat completions being supported with Palmyra X5). In practical terms, you can:

  • Send an image alongside a text prompt in a chat request.
  • Ask for descriptions, extracted text, or analysis of the image.
  • Check the model docs for supported formats, sizes, and per-image pricing.

Important: deprecations you should account for

Writer’s API reference includes deprecation notices. For example, some endpoints were marked as deprecated and scheduled for removal on December 22, 2025. Since this guide is labeled “2026,” you should assume deprecated endpoints may no longer be available and migrate to the recommended alternatives (often: prebuilt tools in chat completions).

  • AI detection endpoint /v1/tools/ai-detect was documented with a deprecation notice and removal date.
  • Parse PDF endpoint /v1/tools/pdf-parser/{file_id} was documented with a deprecation notice and removal date.
  • Standalone tool APIs (like web search or translation) may have migration guides that move functionality into chat tool calling.

How to handle tool migrations safely

Treat tool migrations as backwards-incompatible changes. Add a feature flag in your app that switches between old and new paths, run both in parallel for a small percentage of traffic, compare outputs, then flip fully. Keep logs of tool results so you can debug “why did this answer change?” after migration.

8) Rate limits, errors, and reliability

Every production integration should start with reliability. Writer documents rate limits as both requests per minute (RPM) and tokens per minute (TPM). A common published baseline is:

Typical limits (example)

  • RPM: 400 requests/min
  • TPM: 25,000 tokens/min

If you need higher limits, enterprise plans may allow custom quotas via sales.

Why TPM matters more than RPM

If you send huge prompts (long chat history + large retrieved context), you can hit token limits long before request limits. TPM is the real ceiling for “how much work per minute” your app can do.
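
To stay under a tokens-per-minute ceiling on the client side, a simple sliding-window budget works. This is a sketch: it tracks tokens you send and refuses requests that would exceed the window, using the example 25,000 TPM baseline from this guide.

```python
import time
from collections import deque

# Sketch: client-side sliding-window token budget for a TPM limit.

class TokenBudget:
    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock
        self.events: deque = deque()  # (timestamp, tokens) pairs
        self.used = 0

    def try_spend(self, tokens: int) -> bool:
        """Reserve tokens if they fit in the current 60-second window."""
        now = self.clock()
        while self.events and now - self.events[0][0] > 60:
            _, n = self.events.popleft()
            self.used -= n
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True

budget = TokenBudget(25_000)          # example TPM baseline
ok_first = budget.try_spend(20_000)   # fits in the window
ok_second = budget.try_spend(10_000)  # would exceed 25k, refused
```

When `try_spend` returns False, queue the request or shed load instead of sending it and eating a 429.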

Retry strategy (safe defaults)

  • Retry on 429 and 5xx responses; don’t retry 4xx validation errors.
  • Use exponential backoff with jitter, and honor a Retry-After header if present.
  • Cap total attempts (3–5) and total wall-clock time so requests fail fast.
  • Set explicit client timeouts; never wait indefinitely on a hung connection.
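
A backoff sketch in Python. The schedule is generic; `send` stands in for your actual HTTP call and should return a status code and body.

```python
import random
import time

# Sketch: exponential backoff with jitter for retryable responses.

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before retry `attempt` (0-indexed): capped exponential + jitter."""
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)

def call_with_retries(send, max_attempts: int = 4):
    """Call send() until it succeeds or retries are exhausted.

    `send` returns (status_code, body); 429 and 5xx are retried.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if (status == 429 or status >= 500) and attempt < max_attempts - 1:
            time.sleep(backoff_delay(attempt))
            continue
        return status, body

# Example with a fake transport that fails twice, then succeeds:
responses = iter([(429, None), (500, None), (200, "ok")])
status, body = call_with_retries(lambda: next(responses))
```

In production, also check for a Retry-After header on 429 responses and prefer its value over the computed delay.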

Idempotency and duplicate prevention

If you retry POST requests, you must assume duplicates can happen (especially if the server succeeded but your network dropped). If the API supports idempotency keys, use them. If not, implement your own deduplication:

  • Fingerprint each logical request (user ID + endpoint + payload hash).
  • Store fingerprints with a short TTL; skip or return a cached result on a match.
  • Check the fingerprint before any side effect (writes, notifications, billing).
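
A fingerprinting sketch (pure Python; the in-memory dict stands in for Redis or another shared TTL store):

```python
import hashlib
import json
import time

# Sketch: deduplicate logical requests by fingerprint with a TTL.

def fingerprint(user_id: str, endpoint: str, payload: dict) -> str:
    """Stable hash of a logical request (keys sorted for determinism)."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(f"{user_id}:{endpoint}:{body}".encode()).hexdigest()

class DedupStore:
    def __init__(self, ttl_seconds: float = 300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.seen = {}  # fingerprint -> last-seen timestamp

    def is_duplicate(self, key: str) -> bool:
        """Record the key; report True if it was seen within the TTL."""
        now = self.clock()
        last = self.seen.get(key)
        self.seen[key] = now
        return last is not None and now - last < self.ttl

store = DedupStore()
key = fingerprint("user-1", "/v1/chat", {"model": "palmyra-x5", "q": "hi"})
first = store.is_duplicate(key)   # never seen before
second = store.is_duplicate(key)  # repeat within the TTL
```

Sorting payload keys matters: two JSON-equal payloads with different key order must hash to the same fingerprint.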

Error handling

Writer’s API reference includes an “error codes” section with examples (including SDK-based patterns). In your app, log enough detail to debug without storing sensitive user content:

  • HTTP status plus any error code and message from the response body.
  • A request ID (yours, and any ID the API returns) for correlation.
  • Model ID, token counts, and latency per call.
  • Redacted prompt metadata (lengths, feature flags), not raw user text.

9) Production architecture: patterns that scale

A good Writer API integration is less about “one perfect prompt” and more about an end-to-end system: retrieval + prompt assembly + tool calling + validation + caching + observability. Below are architecture patterns that tend to produce stable, affordable results.

Pattern A: Chat service (thin) + Retrieval service (smart)

Separate responsibilities:

  • Chat service (thin): assembles messages, calls the chat endpoint, streams to the UI.
  • Retrieval service (smart): owns file ingestion, graph management, and retrieval policy.

This reduces coupling: if you change ingestion logic (new doc formats, new policies), your chat UI doesn’t break.

Pattern B: “Two-model” pipeline for cost control

Use a cheaper/faster model for routine steps, and a stronger model only when needed:

  • Cheap/fast model: classify intent, route requests, summarize history.
  • Strong model: produce the final, user-facing answer for hard requests.

Even if you only use one model family, you can reduce cost by controlling output length, limiting retrieval size, and summarizing context before handing it to the “final answer” step.

Pattern C: Validation loop for structured outputs

If your app depends on JSON, add a loop:

  1. Request JSON output with a strict schema and an example.
  2. Validate JSON (schema validator).
  3. If invalid: retry once with a “repair” prompt: “Fix the JSON; do not change meanings.”
  4. If still invalid: return a safe fallback or route to a human review queue.
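
Steps 1–4 can be sketched as a small loop. Here `generate` stands in for your chat-completion call, and the schema check is a minimal required-keys validator; swap in a real JSON Schema validator for production.

```python
import json

# Sketch: request JSON, validate, retry once with a repair prompt.
# `generate(prompt)` stands in for a real chat-completion call.

REQUIRED_KEYS = {"title", "tasks"}

def parse_valid_json(text: str):
    """Return the parsed object if valid JSON with required keys, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(obj, dict) and REQUIRED_KEYS <= obj.keys():
        return obj
    return None

def structured_call(generate, prompt: str):
    first = generate(prompt)
    obj = parse_valid_json(first)
    if obj is not None:
        return obj
    repaired = generate(
        "Fix the JSON below so it parses and keeps the same meaning. "
        f"Output JSON only.\n\n{first}"
    )
    obj = parse_valid_json(repaired)
    if obj is not None:
        return obj
    return None  # safe fallback: route to a human review queue

# Fake generator: malformed JSON first, valid JSON on the repair pass.
replies = iter(['{"title": "Plan", "tasks": [', '{"title": "Plan", "tasks": []}'])
result = structured_call(lambda p: next(replies), "Extract tasks as JSON.")
```

Returning None (rather than raising) keeps the fallback decision in the caller, where the human-review routing lives.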

Pattern D: Caching (what to cache, what not to)

Caching is the #1 lever for cost and speed, but you must cache the right things:

  • Cache: deterministic transforms, answers to identical FAQ-style queries, retrieval results.
  • Don’t cache: personalized answers, time-sensitive content, anything derived from per-user private context.
  • Key caches on normalized input (model + prompt template version + parameters).

Security and prompt-injection basics

Any system that uses user input + tool calling + private docs must assume adversarial behavior. Practical defenses:

  • Treat retrieved documents and tool outputs as untrusted data, not instructions.
  • Constrain tools: allowlists, argument validation, and per-user scopes.
  • Keep system instructions separate from user content; never echo secrets into prompts.
  • Validate and filter outputs before they trigger side effects.

Deployment options (enterprise note)

Writer documents managed deployment options (for example, Standard Cloud vs Private Cloud). If you’re operating in a regulated environment, confirm data residency, access controls, and compliance requirements before putting private documents into any RAG system.

10) Writer API FAQs (developer-focused)

Is the Writer API the same as Writer AI Studio?

Writer AI Studio is the platform and UI for building agents, managing keys, observability, and governance. The Writer API is the programmatic interface that lets your application call the same platform capabilities. In practice, you’ll use AI Studio to create API agents/keys and manage permissions, then use the API from your backend services.

How do I get a Writer API key?

You typically create it in Writer AI Studio under Admin Settings → API Keys. Writer’s docs note you can’t view the key after creation, so copy it immediately and store it securely. If you lose it, you generate a new key.

What is the main endpoint I should start with?

Start with POST /v1/chat. It covers standard chat completions and is the foundation for multi-turn conversation and tool calling. Once that works, add files and Knowledge Graph if you need RAG.

What’s the difference between files and Knowledge Graph?

Files are raw uploaded assets stored in your account. Knowledge Graph is a retrieval layer built from files (and potentially URLs) that can be queried during chat completion so the model answers using your documents. Uploading a file alone doesn’t guarantee it will be used for question answering; attaching it to a graph and querying the graph is the typical RAG workflow.

How do I reduce Writer API costs?

The highest-impact moves are: (1) shrink prompts (summarize history, reduce retrieved context), (2) cap output length, (3) cache repeated answers, (4) use a two-step pipeline (cheap classification + targeted strong model only when needed), and (5) measure token usage per endpoint so you optimize based on reality rather than guesses.

What rate limits should I plan for?

Writer documents limits as RPM and TPM. A published baseline example is 400 requests/min and 25,000 tokens/min. Design your system to back off on 429 responses and to avoid token spikes by controlling prompt size.

Can I do web search and translation inside chat?

Yes. Writer documents a web search tool endpoint and also describes translation as a tool used within chat completions (with migration guides from standalone endpoints). Tool calling lets the model decide when to use these capabilities as part of one coherent workflow.

Are there deprecated endpoints I should avoid in 2026?

The API reference includes deprecation notices for some tools (for example, AI detection and a parse-PDF endpoint) with a removal date of December 22, 2025. In a 2026 build, you should assume those endpoints may be unavailable and follow the migration guidance toward tool-based workflows inside chat completions.

Should I call Writer API directly from the browser?

No. Your API key must remain secret. Put Writer calls behind your own backend (or serverless function), implement your own authentication for users, and enforce per-user and per-org quotas to prevent abuse.

How do I keep answers grounded in my docs?

Use Knowledge Graph for retrieval, keep the retrieved context tight, instruct the model to only answer using retrieved content when required, and log which files/graph nodes were used for each response. Then run evaluation sets with real questions to measure accuracy.

References (official docs)

These are the most useful official pages to keep open while building. (Links are included as plain URLs so you can copy/paste.)

Topic | URL | Why it matters
AI Studio Introduction | https://dev.writer.com/home/introduction | Platform overview and navigation
Quickstart | https://dev.writer.com/home/quickstart | First API call, how to create keys
API Keys guide | https://dev.writer.com/api-reference/api-keys | Bearer auth, API agents, permissions
Chat completion API reference | https://dev.writer.com/api-reference/completion-api/chat-completion | Request/response format for /v1/chat
Models | https://dev.writer.com/home/models | Model IDs, context windows, capabilities
Pricing | https://dev.writer.com/home/pricing | Per-token costs and tool pricing
Knowledge Graph guide | https://dev.writer.com/home/knowledge-graph | Create graphs and manage ingestion
Files guide | https://dev.writer.com/home/files | Upload/list/delete files
Web search tool API reference | https://dev.writer.com/api-reference/tool-api/web-search | Ground answers with web results
Rate limits | https://dev.writer.com/api-reference/rate-limits | RPM/TPM quotas and best practices
Changelog | https://dev.writer.com/home/changelog | Keep up with endpoint/SDK changes
Usage policy | https://dev.writer.com/home/usage-policy | Rules, key security expectations