Harvey AI API - Complete Developer Guide for Legal Workflows

Harvey is an AI platform built for legal and professional services teams. Its API suite is designed for secure, scalable automation: embed legal reasoning into internal apps, manage document libraries through Vault, ground answers in firm knowledge, export usage and query history, pull audit logs for compliance, and map activity to client matters for attribution.

Bearer-token auth

Region endpoints (US / EU / AU)

Assistant Completion endpoint

Vault document APIs + Vault RAG grounding

Audit Logs + History Exports

Client Matters attribution

What this page covers

This is a practical, “ship it” guide. It explains Harvey API access and authentication, the main endpoint families, how Vault grounding and citations work, how to design reliable production integrations, and what to consider for legal-grade security, auditability, and usage governance.


Educational note: This page is an independent guide. Always confirm contractual/feature availability in your order form and with Harvey support before relying on any workflow in production.

Table of contents

Overview: What Harvey API is for
Access & provisioning: What you need to get started
Authentication & regions: Bearer tokens + base URLs
Rate limits: Per-org limits by endpoint
Assistant API: Completion, streaming, citations
Vault APIs: Projects, uploads, metadata, deletes
Vault RAG grounding: knowledge_sources & citations
History Exports: Usage + query history
Audit Logs: Search, earliest/latest, pagination
Client Matters: Attribution & access control
Security & compliance: Encryption, training defaults, audits
FAQs: Common implementation questions

1) Overview: what the Harvey AI API is

The Harvey AI API is a set of endpoints that let organizations integrate Harvey’s legal-focused AI capabilities into their own systems and workflows—rather than requiring staff to work only inside a standalone web app. Think of it as “legal intelligence as a service” with enterprise controls: your systems can send prompts, attach documents, request grounded/cited answers, and record activity in ways that support compliance and governance.

Unlike generic LLM endpoints, Harvey’s API ecosystem is organized around the realities of legal work: document libraries, matter attribution, audit logs, and structured exports for leadership reporting. It supports:

Assistant completion for legal reasoning, drafting, analysis, and Q&A.
Vault APIs to upload, organize, and manage documents in projects/knowledge bases.
Grounding via Vault RAG (retrieve and cite internal documents) and optional web sources.
History exports for usage reporting and query forensics.
Audit logs for workspace activity monitoring and incident response.
Client matters to align usage with billing codes, projects, or engagements.

Important: Harvey API access is typically provisioned for organizations and may depend on your agreement. Some capabilities require an additional purchase and are not necessarily enabled by default.

2) Access & provisioning: what you need before you code

In many developer platforms you can sign up, create an API key, and immediately start making requests. Harvey’s model is closer to enterprise provisioning: tokens and feature scopes are tied to your organization’s agreement, and API usage is expected to align with internal governance (especially in regulated legal contexts).

Typical prerequisites

An enterprise relationship (or equivalent organizational access) with Harvey.
Confirmation of enabled features in your order form (e.g., Completion, Vault, exports).
An organization token (bearer token) obtained via your Harvey customer success/account team.
Security review readiness (vendor assessments, trust center docs, data handling alignment).
Integration plan for your DMS/CLM/storage systems if you’ll use Vault.

When the API is the right fit

Use the API if you want to embed Harvey into tools your teams already use—like internal portals, matter management systems, DMS workflows, or review pipelines—while centralizing governance.

When the API might not be necessary

If your team only needs interactive use in the UI (no custom integration, no ETL ingestion, no internal system embedding), the web app alone may cover your needs.

Implementation hint: treat this like an enterprise integration. Plan for least-privilege tokens, centralized logging, a staging environment, and “break-glass” procedures for credential rotation.

3) Authentication & region endpoints

Harvey’s API uses bearer token authentication. You include your token in the Authorization header of every request. Keep tokens server-side only (never in browser JS), rotate them on a schedule, and redact them from logs and screenshots.

Base URLs (region-aware)

Harvey supports region-specific API endpoints for organizations on EU-hosted or AU-hosted deployments. The primary base URL is the US endpoint, with alternates for EU and AU.

Region / Deployment	Base URL	When to use
US-hosted (default)	`https://api.harvey.ai`	Most organizations unless contract specifies EU/AU hosting
EU-hosted	`https://eu.api.harvey.ai`	When your organization is provisioned on the EU deployment
AU-hosted	`https://au.api.harvey.ai`	When your organization is provisioned on the AU deployment

Auth header pattern

Authorization: Bearer YOUR_TOKEN_HERE

Quick token validation (whoami)

A practical first call is the “whoami” endpoint to verify that authentication works and to identify which service user is associated with a token (useful when investigating audit logs).

GET https://api.harvey.ai/api/whoami
Authorization: Bearer YOUR_TOKEN_HERE
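
The same check can be sketched in Python using only the standard library. The base URLs and the /api/whoami path come from this guide; the function names, the region keys ("us", "eu", "au"), and the assumption that the response body is JSON are illustrative and should be confirmed against your own deployment.

```python
import json
import urllib.request

# Base URLs copied from the region table above.
BASE_URLS = {
    "us": "https://api.harvey.ai",
    "eu": "https://eu.api.harvey.ai",
    "au": "https://au.api.harvey.ai",
}


def build_whoami_request(token: str, region: str = "us") -> urllib.request.Request:
    """Build (but do not send) a GET /api/whoami request for one region."""
    if region not in BASE_URLS:
        raise ValueError(f"unknown region: {region!r}")
    return urllib.request.Request(
        f"{BASE_URLS[region]}/api/whoami",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def whoami(token: str, region: str = "us") -> dict:
    """Send the request and decode the body, assuming (unverified) that it is JSON."""
    request = build_whoami_request(token, region)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the region in one lookup table means a misconfigured region fails loudly at startup instead of silently calling the wrong deployment.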

Security reminder: treat tokens like passwords. Do not commit them to repositories, paste them into tickets, or store them in plaintext. Use a secrets manager (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, or equivalent).

4) Rate limits (per organization, reset every minute)

Harvey applies rate limits per organization and resets counters every minute. You should design clients to handle 429 Too Many Requests gracefully with exponential backoff and jitter, and prefer batching operations where possible.

Endpoint category	Limit (requests/minute)	Typical workload
Assistant Completion	20	Prompting, drafting, analysis calls
Vault APIs	10	Uploads, metadata, deletes, project operations
Audit Logs	60	Compliance/monitoring fetches
History Exports	60	Usage/query exports for reporting and forensics
Client Matters	150	Bulk onboarding / attribution updates

Recommended client behavior

Implement retries with backoff for 429 and transient 5xx errors.
Use request queues and concurrency caps per endpoint family.
Prefer idempotent designs: retries should not create duplicates or corrupt state.
Run exports on a schedule (e.g., hourly/daily) rather than once per user action.
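
A minimal Python sketch of the retry behavior described above, using exponential backoff with "full jitter". The send callable, the retry count, the 60-second cap, and the exact set of 5xx codes treated as transient are assumptions for illustration, not values published by Harvey.

```python
import random
import time
from typing import Callable, Iterator, Tuple

# Assumption: which statuses count as transient is a local policy choice.
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**n)]."""
    for n in range(attempts):
        yield random.uniform(0, min(cap, base * (2 ** n)))


def call_with_retries(send: Callable[[], Tuple[int, bytes]],
                      max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or retries run out."""
    status, body = send()
    for delay in backoff_delays(max_retries):
        if status not in RETRYABLE:
            break
        sleep(delay)
        status, body = send()
    return status, body
```

Because 400 and 401 are not in the retryable set, they are returned immediately; retrying them would only spend rate-limit budget without changing the outcome.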

5) Endpoint map: what exists in the Harvey API

At a high level, Harvey’s API suite can be grouped into five families:

Assistant

The Completion endpoint for asking questions, drafting, and analyzing—optionally grounded in documents and accompanied by citations.

Vault

Manage document projects/knowledge bases, upload and delete files, preserve folder structure, and retrieve metadata for review workflows.

History Exports

Export usage history (high-level metadata) and query history (for deeper reviews where allowed) to support adoption monitoring, reporting, and investigations.

Audit Logs

Query audit logs by timestamp or ID, retrieve earliest/latest entries, and paginate for continuous compliance capture.

Client Matters

Create, retrieve, and delete client-matter mappings to attribute usage and support access or scope-based controls in downstream reporting.

Cross-cutting controls

Auth, region endpoints, and rate limit behavior apply across the entire API surface.

In the sections below, we’ll go endpoint family by endpoint family and highlight the core request/response structure, practical use cases, and production patterns.

6) Quickstart: your first successful Harvey API call

A “first call” should be safe, deterministic, and easy to troubleshoot. The usual sequence is:

Verify your token works with /api/whoami.
Make a basic Completion request with a short prompt (no files, no knowledge sources).
Add streaming only after non-streaming works (it’s easier to debug).
Then add grounding (Vault or web) and confirm citations are returned as expected.

Step 1 — Confirm identity

curl -X GET "https://api.harvey.ai/api/whoami" \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

Step 2 — Simple Completion request

The Completion endpoint uses multipart/form-data for requests. You’ll send the prompt and options as form fields.

curl --request POST \
  --url https://api.harvey.ai/api/v2/completion \
  --header "Authorization: Bearer YOUR_TOKEN_HERE" \
  --header "Content-Type: multipart/form-data" \
  --form "prompt=Summarize the practical difference between indemnity and limitation of liability in plain English." \
  --form "stream=false" \
  --form "mode=draft"

Tip: Start with small prompts and build up. Once you confirm success, gradually add grounding, client matter IDs, and streaming—one change at a time.
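
For teams calling from Python rather than curl, the Step 2 request can be sketched with the standard library alone. The endpoint path and the prompt, stream, and mode fields come from the curl example; the helper names and the hand-rolled multipart encoder are illustrative, and the encoder handles text fields only (no file parts).

```python
import urllib.request
import uuid


def encode_form_fields(fields: dict, boundary: str) -> bytes:
    """Encode text fields as a multipart/form-data body (no file parts)."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    return "\r\n".join(lines).encode("utf-8")


def build_completion_request(token: str, prompt: str, mode: str = "draft",
                             stream: bool = False,
                             base_url: str = "https://api.harvey.ai") -> urllib.request.Request:
    """Build (but do not send) a request equivalent to the curl example."""
    boundary = uuid.uuid4().hex
    fields = {"prompt": prompt, "stream": str(stream).lower(), "mode": mode}
    return urllib.request.Request(
        f"{base_url}/api/v2/completion",
        data=encode_form_fields(fields, boundary),
        headers={
            "Authorization": f"Bearer {token}",
            # The boundary must match the one used inside the body.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it; separating "build" from "send" makes the request easy to inspect in tests and logs before any real call is made.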

7) Assistant API: Completion endpoint

The Assistant API is centered on a single main endpoint: POST /api/v2/completion. This call supports freeform legal Q&A, document analysis, and drafting, and it can optionally return citations when grounded sources are provided.

Key request fields (high level)

prompt: your question or drafting instruction (up to 20,000 characters).
mode: typically draft or assist depending on desired behavior.
stream: true for incremental output, false for full response.
client_matter_id: associate a completion with a specific client matter (optional).
knowledge_sources: JSON-encoded array to ground answers (Vault and/or web).
file: attach files directly (cannot be used together with knowledge_sources).
include_citations: query parameter (defaults to true) controlling whether citations are generated; setting it to false can return results faster.
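
The constraints in the field list above can be checked locally before a request is sent, which saves rate-limit budget on calls that would fail with a 400. This is a sketch: the function name is hypothetical, and treating draft and assist as the only valid modes is an assumption based on the two modes this guide names.

```python
import json
from typing import List, Optional

MAX_PROMPT_CHARS = 20_000            # limit stated in the field list above
ALLOWED_MODES = {"draft", "assist"}  # assumption: only the modes named in this guide


def validate_completion_fields(prompt: str,
                               mode: str = "draft",
                               knowledge_sources: Optional[str] = None,
                               has_file: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the fields look sendable."""
    problems = []
    if not prompt:
        problems.append("prompt is empty")
    elif len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if mode not in ALLOWED_MODES:
        problems.append(f"unexpected mode: {mode!r}")
    if has_file and knowledge_sources is not None:
        problems.append("file and knowledge_sources cannot be combined")
    if knowledge_sources is not None:
        try:
            parsed = json.loads(knowledge_sources)
        except json.JSONDecodeError:
            problems.append("knowledge_sources is not valid JSON")
        else:
            if not isinstance(parsed, list):
                problems.append("knowledge_sources must encode a JSON array")
    return problems
```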

Understanding citations

When citations are enabled (default), Harvey can return a response_with_citations field that includes inline citation markers like [1], plus a sources array with snippets and page references when documents/knowledge sources were provided.

For legal teams, this is a big deal: it supports a “trustable UI” where attorneys can review the grounded source excerpt and confirm that a claim is supported.
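
To drive such a review UI, the inline markers need to be paired with their sources. The sketch below assumes that marker [n] refers to the n-th entry of the sources array (1-based); that mapping, and whatever keys each source object carries, are assumptions to verify against a real response. Only the response_with_citations and sources field names come from this guide.

```python
import re
from typing import Dict, List

MARKER = re.compile(r"\[(\d+)\]")


def cited_sources(response: Dict) -> List[Dict]:
    """Pair each distinct [n] marker in the text with the n-th source (1-based)."""
    text = response.get("response_with_citations", "")
    sources = response.get("sources", [])
    seen, pairs = set(), []
    for match in MARKER.finditer(text):
        n = int(match.group(1))
        if n in seen:
            continue
        seen.add(n)
        # Assumption: [n] maps to sources[n - 1]; markers with no source map to None.
        source = sources[n - 1] if 1 <= n <= len(sources) else None
        pairs.append({"marker": n, "source": source})
    return pairs
```

A marker that resolves to None is worth surfacing to the reviewer rather than hiding, since it signals a claim the UI cannot back with a snippet.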

Draft vs assist (practical interpretation)

Mode selection is about how you want the output shaped:

draft: produce polished, client-ready prose (emails, clauses, summaries) that you can edit.
assist: produce more direct “analysis/help” answers for internal use or chat-style flows.

Streaming vs non-streaming

Streaming can improve perceived latency in interactive UIs. Non-streaming is simpler for batch jobs, ETL tasks, or workflows where you store a complete result and then run post-processing.

Use streaming when

You are building a chat or drafting UI, and users benefit from seeing output immediately. Make sure your UI can handle partial updates and cancellation.

Use non-streaming when

You need predictable outputs for ingestion pipelines, you’re writing results to a database, or you need to run validations after completion.

Error handling patterns

The Completion endpoint uses standard HTTP status codes. For robust integrations:

400: validate parameters and payload shape; surface actionable messages for developers.
401: invalid or missing token; rotate/verify secrets; confirm environment base URL.
429: backoff and retry; add queueing; reduce concurrency.
5xx: retry with backoff; log correlation IDs if provided; contact support if persistent.

Design principle: in legal workflows, correctness and auditability matter more than raw speed. Prefer deterministic logging, controlled rollout, and a good human review UX.

8) Vault APIs: secure document projects, uploads, metadata, deletes

Vault is the API family for managing documents in structured “projects” (and knowledge bases), enabling your organization to ingest files from existing systems and then analyze them within Harvey. The Vault endpoints are designed for integrations with document management systems (DMS), contract lifecycle management (CLM) tools, file storage platforms, and internal ETL pipelines.

Common Vault operations

List projects: discover what projects/knowledge bases exist in the workspace.
Create project: set up a new container for a deal, matter, client, or knowledge domain.
Upload files: ingest documents while preserving folder paths for organization.
Get metadata: retrieve file IDs, names, sizes, and other details for tracking and review.
Delete file: remove outdated or erroneous documents.
Delete project: remove an entire project and its contents (high impact; handle carefully).

Vault endpoints (examples)

GET    /api/v1/vault/workspace/projects
POST   /api/v1/vault/upload_files/{project_id}
GET    /api/v1/vault/get_metadata/{project_id}
DELETE /api/v1/vault/delete_file/{file_id}
DELETE /api/v1/vault/delete_project/{project_id}

Practical use cases

Secure document ingestion from iManage/NetDocuments/SharePoint-like systems into a deal project.
Bulk due diligence review by uploading a data room export and then asking targeted questions.
Policy and playbook grounding by maintaining a “knowledge base project” with canonical templates.
Automated clean-up for incorrect uploads, duplicates, or time-limited engagements.

Best practices for Vault structure

Organize by project: one project per deal/matter/client engagement when possible.
Preserve paths: keep folder structures consistent across uploads; it helps review workflows.
Track file IDs: persist file IDs in your database so you can reference them later for grounding.
Confirm destructive actions: require explicit approval before deleting projects or large sets of files.
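
One way to track file IDs and avoid duplicate uploads on retry is a local manifest keyed by project ID and content hash. This is a sketch under stated assumptions: the UploadManifest class, its JSON file layout, and the idea of checking the manifest before calling the upload endpoint are illustrative and not part of the Harvey API.

```python
import hashlib
import json
from pathlib import Path


class UploadManifest:
    """Local JSON record of uploaded files, keyed by project ID and content hash."""

    def __init__(self, path):
        self.path = Path(path)
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def key(project_id: str, content: bytes) -> str:
        return f"{project_id}:{hashlib.sha256(content).hexdigest()}"

    def lookup(self, project_id: str, content: bytes):
        """Return the stored record if this content was already uploaded, else None."""
        return self.entries.get(self.key(project_id, content))

    def record(self, project_id: str, content: bytes, file_id: str, folder_path: str):
        """Persist the file ID returned by the upload call for later grounding."""
        self.entries[self.key(project_id, content)] = {
            "file_id": file_id,
            "folder_path": folder_path,
        }
        self.path.write_text(json.dumps(self.entries, indent=2))
```

An ingestion job would call lookup before each upload and record after each success, so a retried batch skips files that already landed in the project.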

9) Vault RAG grounding: ask questions using Vault as a knowledge source

A powerful Harvey pattern is to ground completions in Vault projects and files using a knowledge_sources array. This enables “Vault RAG” (retrieval-augmented generation): the system can reference relevant documents and return citations that point back to specific snippets and pages.

How it works conceptually

You ingest documents into Vault projects (or use existing knowledge base projects).
You call /api/v2/completion with a knowledge_sources entry of type vault that names the project and files.
The system grounds the response in the documents and returns citations when enabled.
Your UI shows a “citations panel” so users can verify sources before using output.

Vault grounding example (knowledge_sources)

Note: knowledge_sources is passed as a JSON-encoded string in the form-data request.

curl --request POST \
  --url https://api.harvey.ai/api/v2/completion \
  --header "Authorization: Bearer YOUR_TOKEN_HERE" \
  --header "Content-Type: multipart/form-data" \
  --form "prompt=Summarize these documents and flag any non-standard indemnity language." \
  --form 'knowledge_sources=[{"type":"vault","folder_id":"YOUR_VAULT_PROJECT_ID","file_ids":["FILE_ID_1","FILE_ID_2"]}]' \
  --form "stream=false" \
  --form "mode=assist"
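
Hand-escaping the JSON string is a common source of 400 errors, so it can help to generate it. The sketch below reproduces the shape used in the curl example (type, folder_id, file_ids); the function name is hypothetical.

```python
import json
from typing import List


def vault_knowledge_sources(project_id: str, file_ids: List[str]) -> str:
    """Return the JSON-encoded string for the knowledge_sources form field."""
    source = {
        "type": "vault",
        "folder_id": project_id,   # the curl example passes the Vault project ID here
        "file_ids": list(file_ids),
    }
    return json.dumps([source], separators=(",", ":"))
```

The returned string is what goes into the knowledge_sources form field, unchanged.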

Vault vs direct file uploads

You can either upload files directly on a completion request (file field) or you can ground via Vault (knowledge_sources). The difference is operational:

Direct file upload is convenient for one-off analysis, but can be harder to track over time.
Vault grounding is better for repeatable workflows, shared projects, and audit-friendly reuse.

Compatibility constraint: in the Completion API, direct file uploads cannot be used together with knowledge_sources. Choose one approach per request.

Web grounding

Harvey’s Completion API also supports a web knowledge source type. In practice, organizations should treat web grounding carefully in legal contexts: define when it’s allowed, capture citations, and require human review.

--form 'knowledge_sources=[{"type":"web"}]'

Controlling citations for speed

The Completion endpoint supports an include_citations parameter (defaults to true). Disabling citations can return results faster, but it reduces verifiability—which is often undesirable for legal work.

POST /api/v2/completion?include_citations=false

10) History Exports: usage history and query history

Harvey’s History Export APIs are designed to help organizations understand how the platform is being used, monitor adoption, support leadership reporting, and investigate questions about activity patterns.

Two key export types

Usage history: metadata over a time range (user, timestamps, event type), designed not to include sensitive inputs/outputs.
Query history: for deeper analysis of queries and sources (availability and detail may depend on your permissions and configuration).

Endpoints

GET /api/v1/history/usage
GET /api/v1/history/query

Usage history: what it’s for (and what it’s not)

Usage history supports programmatic reporting and oversight. It’s helpful for answering questions like:

Are we seeing adoption across teams?
Which product areas (Assist vs Draft, files vs web) are most used?
Are certain departments hitting rate limits more often than others?
How does usage map to client matters for billing attribution?

Usage exports are generally built to avoid exposing sensitive content. Instead, they provide operational metadata that helps governance teams understand patterns without turning the export into a “content leakage” vector.

Scheduling export jobs

Weekly/monthly reporting: run scheduled jobs that load exports into your BI tool.
Compliance and investigations: run targeted exports for specific time ranges.
Continuous monitoring: fetch increments (e.g., every 15 minutes) and store in your SIEM/data lake.
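
For incremental fetches, it helps to split time into fixed, non-overlapping windows so that no interval is exported twice or skipped. The sketch below only computes the windows; how a window is passed to the history endpoints (parameter names, timestamp format) is not covered in this guide and is left to the caller.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def export_windows(start: datetime, end: datetime,
                   step: timedelta = timedelta(minutes=15)
                   ) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open [window_start, window_end) ranges covering start..end."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)
        yield cursor, nxt
        cursor = nxt
```

Each window's end equals the next window's start, so a job that stores the last completed window end can resume from exactly that point after an outage.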

11) Audit Logs: compliance monitoring and incident response

Audit logs are the “paper trail” for activity in a workspace: they enable compliance teams to monitor actions, investigate incidents, and maintain an auditable record. Harvey provides endpoints to:

Search logs starting at a timestamp
Retrieve the earliest log
Retrieve the latest log
Query/paginate through logs over time

Endpoints (common)

GET /api/v1/logs/audit/search
GET /api/v1/logs/audit/earliest
GET /api/v1/logs/audit/latest
GET /api/v1/logs/audit

Timestamp-based search

A common pattern is to start from a time boundary (e.g., “start of day UTC”) and then paginate through results.

curl -X GET "https://api.harvey.ai/api/v1/logs/audit/search?time=1712066546" \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

Earliest/latest for bootstrapping

For initial setup, retrieve earliest to backfill from the beginning, or retrieve latest to begin near the most recent events.

curl -X GET "https://api.harvey.ai/api/v1/logs/audit/earliest" \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

curl -X GET "https://api.harvey.ai/api/v1/logs/audit/latest" \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

Operational best practice: build a “collector”

Treat audit logs like a compliance feed:

Run a scheduled collector (e.g., every 5–15 minutes).
Store logs immutably (append-only) in your data lake or SIEM.
Track checkpoint state so you can resume after outages (by timestamp or log ID).
Alert on abnormal activity (unusual login patterns, bulk exports, admin changes).
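
The collector steps above can be sketched as a single function that is safe to run on a schedule. The fetch_since callable stands in for whatever wraps the audit search endpoint, and the integer "time" field on each entry is a hypothetical name; the real log schema should be checked before relying on it.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def collect_once(fetch_since: Callable[[int], List[Dict]],
                 checkpoint_file: Path,
                 sink_file: Path,
                 default_start: int = 0) -> int:
    """Fetch logs newer than the checkpoint, append them, then advance the checkpoint."""
    start = default_start
    if checkpoint_file.exists():
        start = json.loads(checkpoint_file.read_text())["last_time"]
    entries = fetch_since(start)
    if not entries:
        return 0
    # Append-only: existing lines are never rewritten.
    with sink_file.open("a", encoding="utf-8") as sink:
        for entry in entries:
            sink.write(json.dumps(entry) + "\n")
    # Assumption: each entry carries an integer "time" field (hypothetical name).
    newest = max(entry["time"] for entry in entries)
    checkpoint_file.write_text(json.dumps({"last_time": newest}))
    return len(entries)
```

The checkpoint is written only after the entries are appended, so a crash between the two steps causes a re-fetch rather than a gap. If the search endpoint treats the timestamp as inclusive, boundary entries can repeat, and deduplicating by log ID downstream covers that case.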

12) Client Matters: attribution and scope-based controls

Client matters are a foundational concept for many legal organizations: work is tracked by billing codes, engagement IDs, or internal project identifiers. Harvey’s Client Matter API lets you programmatically create, retrieve, and remove these associations so that usage and queries can be attributed accurately.

Endpoints

POST   /api/v1/client_matters
GET    /api/v1/client_matters
DELETE /api/v1/client_matters

Add or update client matters (bulk-friendly)

curl -X POST https://api.harvey.ai/api/v1/client_matters \
  -H "Authorization: Bearer YOUR_TOKEN_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "client_matters": [
      { "cm_name": "123-45", "cm_desc": "Acme Corp Bankruptcy", "cm_allowed": "true" },
      { "cm_name": "M-2025-0094", "cm_desc": "Example Engagement", "cm_allowed": "false" }
    ]
  }'
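
When matters come from a billing system, the request body can be generated rather than written by hand. The sketch below mirrors the body shape of the curl example, including sending cm_allowed as the strings "true" and "false"; the input keys (name, description, allowed) and the function name are hypothetical.

```python
import json
from typing import Dict, Iterable


def client_matters_payload(matters: Iterable[Dict]) -> str:
    """Serialize matters into the body shape used by the curl example."""
    rows = []
    for matter in matters:
        rows.append({
            "cm_name": matter["name"],
            "cm_desc": matter.get("description", ""),
            # The curl example sends the flag as a string, so this does too.
            "cm_allowed": "true" if matter["allowed"] else "false",
        })
    return json.dumps({"client_matters": rows})
```

Requiring an explicit allowed value for every matter avoids silently enabling a code that should have stayed blocked.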

How “allowed” flags can help governance

Many organizations want to restrict which matters can be used for certain workflows (or to prevent accidental attribution to a deprecated matter code). An allow/deny flag helps enforce “only approved matters” downstream.

Recommended matter design

Use a stable ID scheme that maps cleanly to billing systems (avoid “friendly names” that change).
Store matter metadata in your own system-of-record and sync to Harvey on a schedule or via events.
Automate deactivation for closed matters and require manual approval to reactivate.

Attribution is a legal ops superpower: when usage is cleanly mapped to matters, you can report ROI, align costs, answer client questions, and support audits with far less manual work.

13) Production architecture: a practical reference design

Legal-grade integrations tend to fail for boring reasons: concurrency spikes, missing audit trails, leaked tokens, unclear governance, or brittle ETL. The reference design below addresses each of these in turn.

A simple architecture that scales

Backend API gateway: your app talks to your server, not directly to Harvey.
Request queue: all Completion calls go through a queue with concurrency caps to respect rate limits.
Secrets manager: tokens are stored and rotated centrally.
Vault ingestion pipeline: documents are uploaded to Vault with consistent paths and tracked file IDs.
Observability & logging: store request metadata, response IDs, and error outcomes without leaking sensitive content.
Audit collector: scheduled job pulls audit logs into your SIEM.
History exporter: scheduled job exports usage/query history into BI/reporting.

Why you want a queue even at small scale

The Completion endpoint has a relatively modest per-minute limit. A single busy team can exceed it if every UI action becomes a new request. A queue makes throughput predictable, provides retry control, and gives you a place to implement “priority” (e.g., interactive drafting gets priority over batch summarization).
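
A minimal in-process sketch of such a queue, combining a priority heap with a rolling 60-second cap. The class name, the priority convention, and the rolling-window choice are assumptions; a rolling window is used because it never releases more than the limit within any one-minute span, regardless of when the server-side counter resets. The default of 20 matches the Completion limit in the rate-limit table.

```python
import heapq
import itertools
import time
from collections import deque


class RateLimitedQueue:
    """Priority queue that releases at most `limit` jobs per rolling window."""

    def __init__(self, limit: int = 20, window: float = 60.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self._heap = []                # (priority, sequence, job)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority
        self._sent = deque()           # timestamps of recently released jobs

    def put(self, job, priority: int = 10):
        """Lower numbers run first, e.g. 0 for interactive and 10 for batch."""
        heapq.heappush(self._heap, (priority, next(self._seq), job))

    def get_ready(self):
        """Return the next job if the window has capacity, otherwise None."""
        now = self.clock()
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if not self._heap or len(self._sent) >= self.limit:
            return None
        self._sent.append(now)
        return heapq.heappop(self._heap)[2]
```

A worker loop would poll get_ready, send each released job, and sleep briefly when it returns None. A multi-process deployment would need a shared store for the timestamps instead of an in-memory deque.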

Designing a trustworthy legal UI

Default to citations for any workflow that could influence legal advice, drafting, or client communications.
Show sources next to claims (click-to-expand snippet and page reference).
Log who requested what (user identity, timestamp, matter ID, project ID) for accountability.
Encourage review: present results as “draft / suggestion,” not as final authority.
Provide a “copy with citations” option so users can paste into memos with supporting references.

14) Security & compliance considerations

Harvey’s security posture is a core reason legal organizations adopt it: the platform emphasizes encryption and access controls, and (by default) states it does not train on customer data. It also references annual SOC 2 Type II and ISO 27001 audits/certifications in its security materials and trust center.

Practical security checklist for your implementation

Keep tokens server-side and use short-lived session tokens for your own app where possible.
Encrypt sensitive data in transit and at rest in your own systems, too.
Minimize data sent in prompts (avoid unnecessary personal data; prefer document grounding in Vault).
Implement access controls so only authorized users can query a given Vault project/matter.
Capture an audit trail in your app: who requested it, when, what matter, what documents.
Plan for retention: define how long prompts/outputs/logs should be kept and where.

Region hosting alignment

If your organization is on EU-hosted or AU-hosted deployments, enforce the correct base URL at the configuration level (not per request) to reduce the chance of accidental cross-region calls.

Legal-grade governance: Security isn’t only technical. You also need policy: which workflows are allowed, when web grounding is acceptable, what review steps are required, and how to respond to incidents.

15) Troubleshooting: common issues and fixes

401 Unauthorized

Confirm you’re using Authorization: Bearer … (not an API key header).
Confirm you’re calling the correct region base URL (US vs EU vs AU).
Verify token is active and has access to the endpoint family you’re calling.

400 Bad Request

Check you sent multipart/form-data for Completion.
Confirm knowledge_sources is a JSON-encoded string and properly escaped.
Do not send both file and knowledge_sources in the same Completion request.

429 Too Many Requests

Reduce concurrency, implement a queue, add exponential backoff with jitter.
Batch where possible (e.g., one completion prompt to summarize multiple documents instead of many prompts).
Move exports to scheduled jobs and avoid “live export” patterns.

Vault uploads feel slow or fragile

Use consistent file paths and maintain a manifest of uploaded files.
Retry safely: design uploads to be idempotent (avoid duplicating content on retries).
Separate ingestion jobs from user-facing UI calls.

16) Frequently asked questions

In practice, API access is typically provisioned for organizations, and tokens are obtained through your account/customer success channel. Feature availability can be tied to your agreement, and some endpoints (such as Completion) may require additional purchase depending on your contract.

Call the whoami endpoint: GET https://api.harvey.ai/api/whoami with your bearer token. It returns the underlying service user associated with the token, which is also useful for audit investigations.

For legal workflows, citations are often the point: they support verification and review. If you disable citations you may gain speed, but you lose the strongest UX pattern for trust. Many teams keep citations on by default and only disable them for low-risk internal automation.

Vault is usually better for repeatable work: it supports project organization, tracking file IDs, and reuse across workflows. Direct file attachments are convenient for one-off analysis but can be harder to manage over time.

Implement a server-side queue with concurrency caps and retries. Use exponential backoff when you get 429 responses, and schedule exports (history/audit) instead of running them in response to interactive UI events.

Show a draft answer plus a citations panel. Let users click a citation to see the quoted snippet and page number. Encourage editing and review before anything goes to a client or becomes part of work product.
