Harvey AI API - Complete Developer Guide for Legal Workflows

Harvey is an AI platform built for legal and professional services teams. Its API suite is designed for secure, scalable automation: embed legal reasoning into internal apps, manage document libraries through Vault, ground answers in firm knowledge, export usage and query history, pull audit logs for compliance, and map activity to client matters for attribution.

Bearer-token auth
Region endpoints (US / EU / AU)
Assistant Completion endpoint
Vault document APIs + Vault RAG grounding
Audit Logs + History Exports
Client Matters attribution

What this page covers

This is a practical, “ship it” guide. It explains Harvey API access and authentication, the main endpoint families, how Vault grounding and citations work, how to design reliable production integrations, and what to consider for legal-grade security, auditability, and usage governance.

Educational note: This page is an independent guide. Always confirm contractual/feature availability in your order form and with Harvey support before relying on any workflow in production.

1) Overview: what the Harvey AI API is

The Harvey AI API is a set of endpoints that let organizations integrate Harvey’s legal-focused AI capabilities into their own systems and workflows—rather than requiring staff to work only inside a standalone web app. Think of it as “legal intelligence as a service” with enterprise controls: your systems can send prompts, attach documents, request grounded/cited answers, and record activity in ways that support compliance and governance.

Unlike generic LLM endpoints, Harvey’s API ecosystem is organized around the realities of legal work: document libraries, matter attribution, audit logs, and structured exports for leadership reporting.

Important: Harvey API access is typically provisioned for organizations and may depend on your agreement. Some capabilities require an additional purchase and are not necessarily enabled by default.

2) Access & provisioning: what you need before you code

In many developer platforms you can sign up, create an API key, and immediately start making requests. Harvey’s model is closer to enterprise provisioning: tokens and feature scopes are tied to your organization’s agreement, and API usage is expected to align with internal governance (especially in regulated legal contexts).

Typical prerequisites

When the API is the right fit

Use the API if you want to embed Harvey into tools your teams already use—like internal portals, matter management systems, DMS workflows, or review pipelines—while centralizing governance.

When the API might not be necessary

If your team only needs interactive use in the UI (no custom integration, no ETL ingestion, no internal system embedding), the web app alone may cover your needs.

Implementation hint: treat this like an enterprise integration. Plan for least-privilege tokens, centralized logging, a staging environment, and “break-glass” procedures for credential rotation.

3) Authentication & region endpoints

Harvey’s API uses bearer token authentication. You include your token in the Authorization header of every request. Keep tokens server-side only (never in browser JS), rotate them on a schedule, and redact them from logs and screenshots.
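One way to act on the "redact them from logs" advice is a logging filter that masks anything that looks like a bearer token. This is a minimal sketch, not something Harvey provides: the names `redact` and `RedactTokensFilter` are hypothetical, and the pattern only catches tokens that appear after the word "Bearer".

```python
import logging
import re

# Matches "Bearer <token>" so that only the token part is masked.
_BEARER = re.compile(r"(Bearer\s+)[^\s\"']+", re.IGNORECASE)


def redact(text: str) -> str:
    """Replace any bearer token found in text with a fixed placeholder."""
    return _BEARER.sub(r"\1[REDACTED]", text)


class RedactTokensFilter(logging.Filter):
    """Logging filter that masks bearer tokens before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its args first, then redact the result.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record; only its text was changed
```

Attach the filter to each handler (for example `handler.addFilter(RedactTokensFilter())`) so that records from every logger pass through it before they are written.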

Base URLs (region-aware)

Harvey supports region-specific API endpoints for organizations on EU-hosted or AU-hosted deployments. The primary base URL is the US endpoint, with alternates for EU and AU.

Region / Deployment  | Base URL                  | When to use
US-hosted (default)  | https://api.harvey.ai     | Most organizations, unless the contract specifies EU/AU hosting
EU-hosted            | https://eu.api.harvey.ai  | When your organization is provisioned on the EU deployment
AU-hosted            | https://au.api.harvey.ai  | When your organization is provisioned on the AU deployment

Auth header pattern

Authorization: Bearer YOUR_TOKEN_HERE

Quick token validation (whoami)

A practical first call is the “whoami” endpoint to verify that authentication works and to identify which service user is associated with a token (useful when investigating audit logs).

GET https://api.harvey.ai/api/whoami
Authorization: Bearer YOUR_TOKEN_HERE

Security reminder: treat tokens like passwords. Do not commit them to repositories, paste them into tickets, or store them in plaintext. Use a secrets manager (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, or equivalent).
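The region table and the whoami call above can be combined into a small helper. The sketch below uses only the Python standard library and builds the request without sending it. The function name `whoami_request` and the region keys are illustrative choices; the base URLs and the `/api/whoami` path are the ones listed in this guide.

```python
import urllib.request

# Base URLs as listed in the region table of this guide.
BASE_URLS = {
    "US": "https://api.harvey.ai",
    "EU": "https://eu.api.harvey.ai",
    "AU": "https://au.api.harvey.ai",
}


def whoami_request(region: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET /api/whoami request."""
    base = BASE_URLS[region.upper()]  # raises KeyError for an unknown region
    req = urllib.request.Request(f"{base}/api/whoami", method="GET")
    req.add_header("Authorization", f"Bearer {token}")
    return req
```

To send it, pass the result to `urllib.request.urlopen(req, timeout=30)` from server-side code, with the token loaded from your secrets manager. The response body format is not described in this guide, so inspect it before relying on specific fields.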

4) Rate limits (per organization, reset every minute)

Harvey applies rate limits per organization and resets counters every minute. You should design clients to handle 429 Too Many Requests gracefully with exponential backoff and jitter, and prefer batching operations where possible.

Endpoint category     | Limit (requests/minute) | Typical workload
Assistant Completion  | 20                      | Prompting, drafting, analysis calls
Vault APIs            | 10                      | Uploads, metadata, deletes, project operations
Audit Logs            | 60                      | Compliance/monitoring fetches
History Exports       | 60                      | Usage/query exports for reporting and forensics
Client Matters        | 150                     | Bulk onboarding / attribution updates

Recommended client behavior
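A common way to handle 429 responses is exponential backoff with "full jitter": wait a random fraction of a ceiling that doubles on each attempt. The sketch below is transport-agnostic. `send` is a hypothetical callable, supplied by you, that performs one HTTP call and returns `(status, body)`. The default delays are illustrative and not values published by Harvey.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Tuple[int, bytes]:
    """Call send() and retry on HTTP 429 with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; hand the last 429 back to the caller
        # Exponential ceiling (base, 2x base, 4x base, ...) capped at max_delay.
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        # Full jitter: wait a random fraction of the ceiling.
        sleep(rng() * ceiling)
    return status, body
```

The `sleep` and `rng` parameters exist so the retry schedule can be unit-tested without real waiting. Because counters reset every minute, a `max_delay` of about 60 seconds is a reasonable upper bound for a single wait.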

5) Endpoint map: what exists in the Harvey API

At a high level, Harvey’s API suite can be grouped into five families:

Assistant

The Completion endpoint for asking questions, drafting, and analyzing—optionally grounded in documents and accompanied by citations.

Vault

Manage document projects/knowledge bases, upload and delete files, preserve folder structure, and retrieve metadata for review workflows.

History Exports

Export usage history (high-level metadata) and query history (for deeper reviews where allowed) to support adoption monitoring, reporting, and investigations.

Audit Logs

Query audit logs by timestamp or ID, retrieve earliest/latest entries, and paginate for continuous compliance capture.

Client Matters

Create, retrieve, and delete client-matter mappings to attribute usage and support access or scope-based controls in downstream reporting.

Cross-cutting controls

Auth, region endpoints, and rate limit behavior apply across the entire API surface.

In the sections below, we’ll go endpoint family by endpoint family and highlight the core request/response structure, practical use cases, and production patterns.

6) Quickstart: your first successful Harvey API call

A “first call” should be safe, deterministic, and easy to troubleshoot. The usual sequence is:

  1. Verify your token works with /api/whoami.
  2. Make a basic Completion request with a short prompt (no files, no knowledge sources).
  3. Add streaming only after non-streaming works (it’s easier to debug).
  4. Then add grounding (Vault or web) and confirm citations are returned as expected.

Step 1 — Confirm identity

curl -X GET "https://api.harvey.ai/api/whoami" \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

Step 2 — Simple Completion request

The Completion endpoint uses multipart/form-data for requests. You’ll send the prompt and options as form fields.

curl --request POST \
  --url https://api.harvey.ai/api/v2/completion \
  --header "Authorization: Bearer YOUR_TOKEN_HERE" \
  --header "Content-Type: multipart/form-data" \
  --form "prompt=Summarize the practical difference between indemnity and limitation of liability in plain English." \
  --form "stream=false" \
  --form "mode=draft"

Tip: Start with small prompts and build up. Once you confirm success, gradually add grounding, client matter IDs, and streaming, one change at a time.
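For teams calling the API from Python rather than curl, the same non-streaming request can be assembled with the standard library. This is a sketch under assumptions: the helper names are hypothetical, the encoder handles plain text fields only (no file parts), and the form fields mirror the curl example above. The request is built but not sent.

```python
import urllib.request
import uuid
from typing import Dict


def encode_multipart(fields: Dict[str, str], boundary: str) -> bytes:
    """Encode plain text form fields as a multipart/form-data body."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing boundary
    lines.append("")
    return "\r\n".join(lines).encode("utf-8")


def completion_request(base_url: str, token: str, prompt: str,
                       mode: str = "draft") -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST /api/v2/completion request."""
    boundary = uuid.uuid4().hex
    body = encode_multipart(
        {"prompt": prompt, "stream": "false", "mode": mode}, boundary
    )
    req = urllib.request.Request(
        f"{base_url}/api/v2/completion", data=body, method="POST"
    )
    req.add_header("Authorization", f"Bearer {token}")
    # The boundary in the header must match the one used in the body.
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return req
```

If you already depend on an HTTP client library, its built-in multipart support is usually the better choice; the hand-rolled encoder is shown only to make the wire format visible.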

7) Assistant API: Completion endpoint

The Assistant API is centered on a single main endpoint: POST /api/v2/completion. This call supports freeform legal Q&A, document analysis, and drafting, and it can optionally return citations when grounded sources are provided.

Key request fields (high level)

Understanding citations

When citations are enabled (default), Harvey can return a response_with_citations field that includes inline citation markers like [1], plus a sources array with snippets and page references when documents/knowledge sources were provided.

For legal teams, this is a big deal: it supports a “trustable UI” where attorneys can review the grounded source excerpt and confirm that a claim is supported.
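To drive a citations panel, a client needs to pair each inline marker with its source entry. The sketch below assumes that `response_with_citations` and `sources` sit at the top level of the parsed JSON response and that marker `[n]` refers to the n-th element of `sources`. Both are assumptions to verify against a real response. The `snippet` and `page` keys in the usage example are hypothetical names.

```python
import re
from typing import Any, Dict, List


def cited_sources(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each inline [n] marker with the n-th entry of the sources array."""
    text = response.get("response_with_citations", "")
    sources = response.get("sources", [])
    pairs = []
    seen = set()
    for match in re.finditer(r"\[(\d+)\]", text):
        n = int(match.group(1))
        if n in seen:
            continue  # report each marker once, in order of first appearance
        seen.add(n)
        # ASSUMPTION: marker [n] refers to sources[n - 1] (1-based numbering).
        source = sources[n - 1] if 1 <= n <= len(sources) else None
        pairs.append({"marker": n, "source": source})
    return pairs
```

For example, with a hypothetical response such as `{"response_with_citations": "The cap applies [1].", "sources": [{"snippet": "...", "page": 3}]}`, the function returns one pair for marker 1. A marker with no matching source comes back with `source` set to `None`, which the UI can flag for reviewer attention.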

Draft vs assist (practical interpretation)

Mode selection is about how you want the output shaped.

Streaming vs non-streaming

Streaming can improve perceived latency in interactive UIs. Non-streaming is simpler for batch jobs, ETL tasks, or workflows where you store a complete result and then run post-processing.

Use streaming when

You are building a chat or drafting UI, and users benefit from seeing output immediately. Make sure your UI can handle partial updates and cancellation.

Use non-streaming when

You need predictable outputs for ingestion pipelines, you’re writing results to a database, or you need to run validations after completion.

Error handling patterns

The Completion endpoint uses standard HTTP status codes.

Design principle: in legal workflows, correctness and auditability matter more than raw speed. Prefer deterministic logging, controlled rollout, and a good human review UX.
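Because the endpoint uses standard status codes, a client can branch on the code alone. The mapping below is one possible policy, not Harvey's documented behavior. The action names are hypothetical, and treating 5xx responses as transient is an assumption.

```python
def next_action(status: int) -> str:
    """Map an HTTP status code to a coarse handling decision."""
    if 200 <= status < 300:
        return "ok"
    if status == 400:
        return "fix_request"        # retrying an invalid request will not help
    if status == 401:
        return "check_credentials"  # do not retry with the same token
    if status == 429:
        return "backoff_and_retry"
    if 500 <= status < 600:
        return "retry_with_limit"   # ASSUMPTION: server errors are transient
    return "fail"
```

Logging the returned action alongside the request metadata gives reviewers a deterministic record of why a call was retried or abandoned.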

8) Vault APIs: secure document projects, uploads, metadata, deletes

Vault is the API family for managing documents in structured “projects” (and knowledge bases), enabling your organization to ingest files from existing systems and then analyze them within Harvey. The Vault endpoints are designed for integrations with document management systems (DMS), contract lifecycle management (CLM) tools, file storage platforms, and internal ETL pipelines.

Common Vault operations

Vault endpoints (examples)

GET    /api/v1/vault/workspace/projects
POST   /api/v1/vault/upload_files/{project_id}
GET    /api/v1/vault/get_metadata/{project_id}
DELETE /api/v1/vault/delete_file/{file_id}
DELETE /api/v1/vault/delete_project/{project_id}
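Keeping the paths above in one table makes it harder to mistype an endpoint in an ingestion pipeline. In this sketch the operation names (`list_projects`, `upload_files`, and so on) are hypothetical labels; the methods and path templates are copied from the list above.

```python
from typing import Tuple
from urllib.parse import quote

# Method and path template for each Vault operation listed above.
VAULT_ENDPOINTS = {
    "list_projects": ("GET", "/api/v1/vault/workspace/projects"),
    "upload_files": ("POST", "/api/v1/vault/upload_files/{project_id}"),
    "get_metadata": ("GET", "/api/v1/vault/get_metadata/{project_id}"),
    "delete_file": ("DELETE", "/api/v1/vault/delete_file/{file_id}"),
    "delete_project": ("DELETE", "/api/v1/vault/delete_project/{project_id}"),
}


def vault_endpoint(base_url: str, operation: str, **ids: str) -> Tuple[str, str]:
    """Return (HTTP method, full URL) for a Vault operation."""
    method, template = VAULT_ENDPOINTS[operation]
    # quote() keeps an unexpected character in an ID from altering the path.
    safe_ids = {key: quote(value, safe="") for key, value in ids.items()}
    return method, base_url.rstrip("/") + template.format(**safe_ids)
```

For instance, `vault_endpoint("https://api.harvey.ai", "get_metadata", project_id="YOUR_PROJECT_ID")` yields the method and URL for a metadata fetch. A missing ID raises `KeyError` instead of producing a malformed URL.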

Practical use cases

Best practices for Vault structure

9) Vault RAG grounding: ask questions using Vault as a knowledge source

A powerful Harvey pattern is to ground completions in Vault projects and files using a knowledge_sources array. This enables “Vault RAG” (retrieval-augmented generation): the system can reference relevant documents and return citations that point back to specific snippets and pages.

How it works conceptually

  1. You ingest documents into Vault projects (or use existing knowledge base projects).
  2. You call /api/v2/completion with knowledge_sources set to Vault.
  3. The system grounds the response in the documents and returns citations when enabled.
  4. Your UI shows a “citations panel” so users can verify sources before using output.

Vault grounding example (knowledge_sources)

Note: knowledge_sources is passed as a JSON-encoded string in the form-data request.

curl --request POST \
  --url https://api.harvey.ai/api/v2/completion \
  --header "Authorization: Bearer YOUR_TOKEN_HERE" \
  --header "Content-Type: multipart/form-data" \
  --form "prompt=Summarize these documents and flag any non-standard indemnity language." \
  --form 'knowledge_sources=[{"type":"vault","folder_id":"YOUR_VAULT_PROJECT_ID","file_ids":["FILE_ID_1","FILE_ID_2"]}]' \
  --form "stream=false" \
  --form "mode=assist"

Vault vs direct file uploads

You can either upload files directly on a completion request (file field) or you can ground via Vault (knowledge_sources). The difference is operational.

Compatibility constraint: in the Completion API, direct file uploads cannot be used together with knowledge_sources. Choose one approach per request.
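The JSON-encoded `knowledge_sources` string and the one-approach-per-request rule can both be enforced in a small builder, so an invalid combination fails before any network call. The function names are hypothetical, and the field layout mirrors the curl example above.

```python
import json
from typing import Dict, List, Optional


def vault_knowledge_sources(project_id: str, file_ids: List[str]) -> str:
    """JSON-encode a single Vault knowledge source, as in the curl example."""
    return json.dumps(
        [{"type": "vault", "folder_id": project_id, "file_ids": file_ids}]
    )


def completion_fields(
    prompt: str,
    knowledge_sources: Optional[str] = None,
    file_paths: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Build the text form fields, rejecting files combined with knowledge_sources."""
    if knowledge_sources and file_paths:
        raise ValueError("use direct file uploads or knowledge_sources, not both")
    fields = {"prompt": prompt, "stream": "false"}
    if knowledge_sources:
        fields["knowledge_sources"] = knowledge_sources
    # File parts, if any, are attached separately by the HTTP client.
    return fields
```

Validating on the client side keeps a predictable 400 out of your error logs and makes the constraint visible to anyone reading the integration code.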

Web grounding

Harvey’s Completion API also supports a web knowledge source type. In practice, organizations should treat web grounding carefully in legal contexts: define when it’s allowed, capture citations, and require human review.

--form 'knowledge_sources=[{"type":"web"}]'

Controlling citations for speed

The Completion endpoint supports an include_citations parameter (defaults to true). Disabling citations can return results faster, but it reduces verifiability—which is often undesirable for legal work.

POST /api/v2/completion?include_citations=false
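A small URL helper keeps the citation switch explicit in code review. The sketch assumes, as the line above suggests, that `include_citations` is passed as a query parameter; the function name is hypothetical.

```python
from urllib.parse import urlencode


def completion_url(base_url: str, include_citations: bool = True) -> str:
    """Return the Completion URL, adding the query parameter only when disabling."""
    url = base_url.rstrip("/") + "/api/v2/completion"
    if include_citations:
        return url  # citations default to true, so no parameter is needed
    return url + "?" + urlencode({"include_citations": "false"})
```

Making `True` the default means a caller has to opt out deliberately, which matches the guide's advice to keep citations on for legal work.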

10) History Exports: usage history and query history

Harvey’s History Export APIs are designed to help organizations understand how the platform is being used, monitor adoption, support leadership reporting, and investigate questions about activity patterns.

Two key export types

Endpoints

GET /api/v1/history/usage
GET /api/v1/history/query

Usage history: what it’s for (and what it’s not)

Usage history supports programmatic reporting and oversight. It’s helpful for answering questions like:

Usage exports are generally built to avoid exposing sensitive content. Instead, they provide operational metadata that helps governance teams understand patterns without turning the export into a “content leakage” vector.

Scheduling export jobs

11) Audit Logs: compliance monitoring and incident response

Audit logs are the “paper trail” for activity in a workspace: they enable compliance teams to monitor actions, investigate incidents, and maintain an auditable record. Harvey provides endpoints to:

Endpoints (common)

GET /api/v1/logs/audit/search
GET /api/v1/logs/audit/earliest
GET /api/v1/logs/audit/latest
GET /api/v1/logs/audit

Timestamp-based search

A common pattern is to start from a time boundary (e.g., “start of day UTC”) and then paginate through results.

curl -X GET "https://api.harvey.ai/api/v1/logs/audit/search?time=1712066546" \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"
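The "start of day UTC" boundary can be computed rather than hard-coded. The sketch below assumes the `time` parameter is a Unix timestamp in seconds, which is consistent with the example value 1712066546 (2024-04-02 14:02:26 UTC) but should be confirmed. The function names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def start_of_day_utc(moment: datetime) -> int:
    """Return epoch seconds for 00:00:00 UTC on moment's UTC date.

    Pass a timezone-aware datetime; a naive one is interpreted as local time.
    """
    utc = moment.astimezone(timezone.utc)
    midnight = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp())


def audit_search_url(base_url: str, since_epoch: int) -> str:
    """Build the timestamp-based audit search URL."""
    query = urlencode({"time": since_epoch})
    return base_url.rstrip("/") + "/api/v1/logs/audit/search?" + query
```

A daily job could call `start_of_day_utc(datetime.now(timezone.utc))` and pass the result to `audit_search_url` to begin that day's capture.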

Earliest/latest for bootstrapping

For initial setup, retrieve earliest to backfill from the beginning, or retrieve latest to begin near the most recent events.

curl -X GET "https://api.harvey.ai/api/v1/logs/audit/earliest" \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

curl -X GET "https://api.harvey.ai/api/v1/logs/audit/latest" \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

Operational best practice: build a “collector”

Treat audit logs like a compliance feed:
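A collector usually reduces to a loop that pages through entries, forwards them, and saves a checkpoint. This guide does not describe the pagination format, so the sketch injects everything API-specific: `fetch_page`, `forward`, `load_cursor`, and `save_cursor` are hypothetical callables you would implement, and the "cursor" may be a timestamp or log ID in practice.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (entries on this page, cursor for the next page or None when caught up)
Page = Tuple[List[Dict], Optional[str]]


def collect(
    fetch_page: Callable[[Optional[str]], Page],
    forward: Callable[[Dict], None],
    load_cursor: Callable[[], Optional[str]],
    save_cursor: Callable[[str], None],
) -> int:
    """Forward audit entries page by page, checkpointing after each page."""
    cursor = load_cursor()  # None means "start from the beginning"
    forwarded = 0
    while True:
        entries, next_cursor = fetch_page(cursor)
        for entry in entries:
            forward(entry)  # e.g. write to your SIEM
            forwarded += 1
        if next_cursor is None:
            return forwarded  # caught up; the next run resumes from the checkpoint
        cursor = next_cursor
        # Save progress so a restart resumes here instead of re-reading everything.
        save_cursor(cursor)
```

This design is at-least-once: after a restart, the page at the saved cursor is fetched again, so the destination should deduplicate entries (for example by log ID, if entries carry one).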

12) Client Matters: attribution and scope-based controls

Client matters are a foundational concept for many legal organizations: work is tracked by billing codes, engagement IDs, or internal project identifiers. Harvey’s Client Matter API lets you programmatically create, retrieve, and remove these associations so that usage and queries can be attributed accurately.

Endpoints

POST   /api/v1/client_matters
GET    /api/v1/client_matters
DELETE /api/v1/client_matters

Add or update client matters (bulk-friendly)

curl -X POST https://api.harvey.ai/api/v1/client_matters \
  -H "Authorization: Bearer YOUR_TOKEN_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "client_matters": [
      { "cm_name": "123-45", "cm_desc": "Acme Corp Bankruptcy", "cm_allowed": "true" },
      { "cm_name": "M-2025-0094", "cm_desc": "Example Engagement", "cm_allowed": "false" }
    ]
  }'
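When matters come from a billing or matter management system, the request body is typically generated rather than written by hand. The sketch below builds the same JSON shape as the curl example from `(name, description, allowed)` rows. The function name and the validation rules (no empty or duplicate names) are assumptions added for illustration; `cm_allowed` is sent as the strings "true" and "false" only because the example above does so.

```python
import json
from typing import Iterable, Tuple


def client_matters_payload(rows: Iterable[Tuple[str, str, bool]]) -> str:
    """Build the JSON body for POST /api/v1/client_matters."""
    matters = []
    seen = set()
    for name, desc, allowed in rows:
        name = name.strip()
        if not name:
            raise ValueError("client matter name must not be empty")
        if name in seen:
            raise ValueError(f"duplicate client matter: {name}")
        seen.add(name)
        matters.append({
            "cm_name": name,
            "cm_desc": desc,
            # Mirrors the curl example, which sends string values.
            "cm_allowed": "true" if allowed else "false",
        })
    return json.dumps({"client_matters": matters})
```

Rejecting duplicates before sending keeps a bulk onboarding run from silently overwriting one matter with another that shares its code.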

How “allowed” flags can help governance

Many organizations want to restrict which matters can be used for certain workflows (or to prevent accidental attribution to a deprecated matter code). An allow/deny flag helps enforce “only approved matters” downstream.

Recommended matter design

Attribution is a legal ops superpower: when usage is cleanly mapped to matters, you can report ROI, align costs, answer client questions, and support audits with far less manual work.

13) Production architecture: a practical reference design

Legal-grade integrations tend to fail for boring reasons: concurrency spikes, missing audit trails, leaked tokens, unclear governance, or brittle ETL. The reference design below is one way to reduce those operational risks.

A simple architecture that scales

  1. Backend API gateway: your app talks to your server, not directly to Harvey.
  2. Request queue: all Completion calls go through a queue with concurrency caps to respect rate limits.
  3. Secrets manager: tokens are stored and rotated centrally.
  4. Vault ingestion pipeline: documents are uploaded to Vault with consistent paths and tracked file IDs.
  5. Observability & logging: store request metadata, response IDs, and error outcomes without leaking sensitive content.
  6. Audit collector: scheduled job pulls audit logs into your SIEM.
  7. History exporter: scheduled job exports usage/query history into BI/reporting.

Why you want a queue even at small scale

The Completion endpoint has a relatively modest per-minute limit. A single busy team can exceed it if every UI action becomes a new request. A queue makes throughput predictable, provides retry control, and gives you a place to implement “priority” (e.g., interactive drafting gets priority over batch summarization).
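The queue idea can be sketched as a priority heap combined with a per-minute counter. This is an in-process illustration only. The class name and priority constants are hypothetical, the default of 20 comes from the Completion limit listed in this guide, and the fixed one-minute window is measured on the local clock, which may not line up with how the server resets its counters.

```python
import heapq
import itertools
import time
from typing import Any, Callable, List, Optional, Tuple

INTERACTIVE, BATCH = 0, 1  # lower number = higher priority


class CompletionQueue:
    """Priority queue that releases at most `limit` jobs per one-minute window."""

    def __init__(self, limit: int = 20,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._clock = clock
        self._heap: List[Tuple[int, int, Any]] = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per priority
        self._window: Optional[int] = None
        self._used = 0

    def put(self, job: Any, priority: int = BATCH) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), job))

    def get(self) -> Optional[Any]:
        """Return the next job, or None if the queue is empty or the window is full."""
        window = int(self._clock() // 60)
        if window != self._window:
            self._window, self._used = window, 0  # new minute: reset the counter
        if not self._heap or self._used >= self._limit:
            return None
        self._used += 1
        return heapq.heappop(self._heap)[2]
```

A worker loop would call `get()`, run the job if one is returned, and sleep briefly otherwise. Setting `limit` a little below the published figure leaves headroom for retries and for other services sharing the same organization-wide limit.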

Designing a trustworthy legal UI

14) Security & compliance considerations

Harvey’s security posture is a core reason legal organizations adopt it: the platform emphasizes encryption and access controls and states that, by default, it does not train on customer data. Its security materials and trust center also reference annual SOC 2 Type II and ISO 27001 audits/certifications.

Practical security checklist for your implementation

Region hosting alignment

If your organization is on EU-hosted or AU-hosted deployments, enforce the correct base URL at the configuration level (not per request) to reduce the chance of accidental cross-region calls.

Legal-grade governance: Security isn’t only technical. You also need policy: which workflows are allowed, when web grounding is acceptable, what review steps are required, and how to respond to incidents.

15) Troubleshooting: common issues and fixes

401 Unauthorized

400 Bad Request

429 Too Many Requests

Vault uploads feel slow or fragile

16) Frequently asked questions

How do I get access to the Harvey API?

In practice, API access is typically provisioned for organizations, and tokens are obtained through your account/customer success channel. Feature availability can be tied to your agreement, and some endpoints (such as Completion) may require additional purchase depending on your contract.

How do I check that my token works?

Call the whoami endpoint: GET https://api.harvey.ai/api/whoami with your bearer token. It returns the underlying service user associated with the token, which is also useful for audit investigations.

Should I disable citations to get faster responses?

For legal workflows, citations are often the point: they support verification and review. If you disable citations you may gain speed, but you lose the strongest UX pattern for trust. Many teams keep citations on by default and only disable them for low-risk internal automation.

Should I use Vault or direct file uploads?

Vault is usually better for repeatable work: it supports project organization, tracking file IDs, and reuse across workflows. Direct file attachments are convenient for one-off analysis but can be harder to manage over time.

How do I stay within rate limits?

Implement a server-side queue with concurrency caps and retries. Use exponential backoff when you get 429 responses, and schedule exports (history/audit) instead of running them in response to interactive UI events.

How should a UI present Harvey’s output?

Show a draft answer plus a citations panel. Let users click a citation to see the quoted snippet and page number. Encourage editing and review before anything goes to a client or becomes part of work product.