Harvey AI API - Complete Developer Guide for Legal Workflows
Harvey is an AI platform built for legal and professional services teams. Its API suite is designed for secure, scalable automation: embed legal reasoning into internal apps, manage document libraries through Vault, ground answers in firm knowledge, export usage and query history, pull audit logs for compliance, and map activity to client matters for attribution.
What this page covers
This is a practical, “ship it” guide. It explains Harvey API access and authentication, the main endpoint families, how Vault grounding and citations work, how to design reliable production integrations, and what to consider for legal-grade security, auditability, and usage governance.
Educational note: This page is an independent guide. Always confirm contractual/feature availability in your order form and with Harvey support before relying on any workflow in production.
1) Overview: what the Harvey AI API is
The Harvey AI API is a set of endpoints that let organizations integrate Harvey’s legal-focused AI capabilities into their own systems and workflows—rather than requiring staff to work only inside a standalone web app. Think of it as “legal intelligence as a service” with enterprise controls: your systems can send prompts, attach documents, request grounded/cited answers, and record activity in ways that support compliance and governance.
Unlike generic LLM endpoints, Harvey’s API ecosystem is organized around the realities of legal work: document libraries, matter attribution, audit logs, and structured exports for leadership reporting. It supports:
- Assistant completion for legal reasoning, drafting, analysis, and Q&A.
- Vault APIs to upload, organize, and manage documents in projects/knowledge bases.
- Grounding via Vault RAG (retrieve and cite internal documents) and optional web sources.
- History exports for usage reporting and query forensics.
- Audit logs for workspace activity monitoring and incident response.
- Client matters to align usage with billing codes, projects, or engagements.
2) Access & provisioning: what you need before you code
In many developer platforms you can sign up, create an API key, and immediately start making requests. Harvey’s model is closer to enterprise provisioning: tokens and feature scopes are tied to your organization’s agreement, and API usage is expected to align with internal governance (especially in regulated legal contexts).
Typical prerequisites
- An enterprise relationship (or equivalent organizational access) with Harvey.
- Confirmation of enabled features in your order form (e.g., Completion, Vault, exports).
- An organization token (bearer token) obtained via your Harvey customer success/account team.
- Security review readiness (vendor assessments, trust center docs, data handling alignment).
- Integration plan for your DMS/CLM/storage systems if you’ll use Vault.
When the API is the right fit
Use the API if you want to embed Harvey into tools your teams already use—like internal portals, matter management systems, DMS workflows, or review pipelines—while centralizing governance.
When the API might not be necessary
If your team only needs interactive use in the UI (no custom integration, no ETL ingestion, no internal system embedding), the web app alone may cover your needs.
Implementation hint: treat this like an enterprise integration. Plan for least-privilege tokens, centralized logging, a staging environment, and “break-glass” procedures for credential rotation.
3) Authentication & region endpoints
Harvey’s API uses bearer token authentication. You include your token in the
Authorization header of every request. Keep tokens server-side only (never in browser JS),
rotate them on a schedule, and redact them from logs and screenshots.
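As a small illustration of the "redact them from logs" advice, the sketch below masks bearer token values before a line is written to a log. The function name and the token character set in the pattern are assumptions made for this example, not part of Harvey's API.

```python
import re

# Matches "Bearer <token>" in header dumps or log lines (case-insensitive).
# The token character set is an assumption; widen it if your tokens differ.
_BEARER_RE = re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/=-]+", re.IGNORECASE)


def redact_bearer_tokens(text: str) -> str:
    """Replace any bearer token value in `text` with a fixed placeholder."""
    return _BEARER_RE.sub(r"\1[REDACTED]", text)


print(redact_bearer_tokens("Authorization: Bearer abc.DEF-123"))
# prints: Authorization: Bearer [REDACTED]
```

Applying a filter like this at the logging layer, rather than at each call site, makes it harder to forget.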
Base URLs (region-aware)
Harvey supports region-specific API endpoints for organizations on EU-hosted or AU-hosted deployments. The primary base URL is the US endpoint, with alternates for EU and AU.
| Region / Deployment | Base URL | When to use |
|---|---|---|
| US-hosted (default) | https://api.harvey.ai | Most organizations unless contract specifies EU/AU hosting |
| EU-hosted | https://eu.api.harvey.ai | When your organization is provisioned on the EU deployment |
| AU-hosted | https://au.api.harvey.ai | When your organization is provisioned on the AU deployment |
Auth header pattern
Authorization: Bearer YOUR_TOKEN_HERE
Quick token validation (whoami)
A practical first call is the “whoami” endpoint to verify that authentication works and to identify which service user is associated with a token (useful when investigating audit logs).
GET https://api.harvey.ai/api/whoami
Authorization: Bearer YOUR_TOKEN_HERE
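The same whoami call can be sketched in Python using only the standard library. The base URLs come from the region table above. The function name and the region keys ("us", "eu", "au") are assumptions for illustration. The request is built but not sent, so the snippet runs without a token.

```python
import urllib.request

# Base URLs as listed in the region table above.
BASE_URLS = {
    "us": "https://api.harvey.ai",
    "eu": "https://eu.api.harvey.ai",
    "au": "https://au.api.harvey.ai",
}


def build_whoami_request(token: str, region: str = "us") -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for /api/whoami."""
    base_url = BASE_URLS[region]  # raises KeyError for an unknown region
    return urllib.request.Request(
        url=f"{base_url}/api/whoami",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(build_whoami_request(token, "eu"), timeout=10) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

Resolving the region once from configuration, rather than passing a URL per call, reduces the chance of accidental cross-region requests.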
4) Rate limits (per organization, reset every minute)
Harvey applies rate limits per organization and resets counters every minute. You should design clients to
handle 429 Too Many Requests gracefully with exponential backoff and jitter, and prefer batching
operations where possible.
| Endpoint category | Limit (requests/minute) | Typical workload |
|---|---|---|
| Assistant Completion | 20 | Prompting, drafting, analysis calls |
| Vault APIs | 10 | Uploads, metadata, deletes, project operations |
| Audit Logs | 60 | Compliance/monitoring fetches |
| History Exports | 60 | Usage/query exports for reporting and forensics |
| Client Matters | 150 | Bulk onboarding / attribution updates |
Recommended client behavior
- Implement retries with backoff for 429 and transient 5xx errors.
- Use request queues and concurrency caps per endpoint family.
- Prefer idempotent designs: retries should not create duplicates or corrupt state.
- Make exports on a schedule (e.g., hourly/daily) rather than “per user action.”
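The retry recommendation above can be sketched as follows. This is a minimal illustration, not Harvey's client library: `send` is a hypothetical caller-supplied function that performs one HTTP call and returns (status, body), and the set of retryable 5xx codes is an assumption.

```python
import random
import time
from typing import Callable, Tuple

# 429 plus a typical set of transient server errors (assumed for this sketch).
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt))."""
    return random.random() * min(cap, base * (2 ** attempt))


def call_with_retries(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status not in RETRYABLE:
            break
        sleep(backoff_delay(attempt))  # wait before the next attempt
        status, body = send()
    return status, body
```

The `sleep` parameter is injectable so the loop can be exercised in tests without waiting. Only retry calls that are safe to repeat.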
5) Endpoint map: what exists in the Harvey API
At a high level, Harvey’s API suite can be grouped into five families:
Assistant
The Completion endpoint for asking questions, drafting, and analyzing—optionally grounded in documents and accompanied by citations.
Vault
Manage document projects/knowledge bases, upload and delete files, preserve folder structure, and retrieve metadata for review workflows.
History Exports
Export usage history (high-level metadata) and query history (for deeper reviews where allowed) to support adoption monitoring, reporting, and investigations.
Audit Logs
Query audit logs by timestamp or ID, retrieve earliest/latest entries, and paginate for continuous compliance capture.
Client Matters
Create, retrieve, and delete client-matter mappings to attribute usage and support access or scope-based controls in downstream reporting.
Cross-cutting controls
Auth, region endpoints, and rate limit behavior apply across the entire API surface.
In the sections below, we’ll go endpoint family by endpoint family and highlight the core request/response structure, practical use cases, and production patterns.
6) Quickstart: your first successful Harvey API call
A “first call” should be safe, deterministic, and easy to troubleshoot. The usual sequence is:
- Verify your token works with /api/whoami.
- Make a basic Completion request with a short prompt (no files, no knowledge sources).
- Add streaming only after non-streaming works (it’s easier to debug).
- Then add grounding (Vault or web) and confirm citations are returned as expected.
Step 1 — Confirm identity
curl -X GET "https://api.harvey.ai/api/whoami" \
-H "Authorization: Bearer YOUR_TOKEN_HERE"
Step 2 — Simple Completion request
The Completion endpoint uses multipart/form-data for requests. You’ll send the prompt and options
as form fields.
curl --request POST \
--url https://api.harvey.ai/api/v2/completion \
--header "Authorization: Bearer YOUR_TOKEN_HERE" \
--header "Content-Type: multipart/form-data" \
--form "prompt=Summarize the practical difference between indemnity and limitation of liability in plain English." \
--form "stream=false" \
--form "mode=draft"
7) Assistant API: Completion endpoint
The Assistant API is centered on a single main endpoint:
POST /api/v2/completion. This call supports freeform legal Q&A, document analysis, and drafting,
and it can optionally return citations when grounded sources are provided.
Key request fields (high level)
- prompt: your question or drafting instruction (up to 20,000 characters).
- mode: typically draft or assist, depending on desired behavior.
- stream: true for incremental output, false for full response.
- client_matter_id: associate a completion with a specific client matter (optional).
- knowledge_sources: JSON-encoded array to ground answers (Vault and/or web).
- file: attach files directly (cannot be used together with knowledge_sources).
- include_citations: query parameter (defaults to true) that controls whether citations are generated; disabling it can speed up responses.
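Checking these constraints locally before sending can save a round trip that would end in a 400. The sketch below is an illustration only: the function name and the `has_file` flag are hypothetical, and restricting mode to exactly draft and assist is an assumption based on the field list above.

```python
from typing import Optional

MAX_PROMPT_CHARS = 20_000            # limit stated in the field list above
ALLOWED_MODES = {"draft", "assist"}  # assumption: only these two modes


def build_completion_fields(prompt: str,
                            mode: str = "assist",
                            stream: bool = False,
                            client_matter_id: Optional[str] = None,
                            knowledge_sources_json: Optional[str] = None,
                            has_file: bool = False) -> dict:
    """Validate inputs and return the text form fields for a Completion call."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1 to 20,000 characters")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    if has_file and knowledge_sources_json:
        raise ValueError("file and knowledge_sources cannot be combined")
    # Form fields are strings, so booleans are sent as "true"/"false".
    fields = {"prompt": prompt, "mode": mode,
              "stream": "true" if stream else "false"}
    if client_matter_id is not None:
        fields["client_matter_id"] = client_matter_id
    if knowledge_sources_json:
        fields["knowledge_sources"] = knowledge_sources_json
    return fields
```

The returned dict can be passed to whichever HTTP client you use to encode the multipart/form-data body.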
Understanding citations
When citations are enabled (default), Harvey can return a response_with_citations field that includes
inline citation markers like [1], plus a sources array with snippets and page references
when documents/knowledge sources were provided.
For legal teams, this is a big deal: it supports a “trustable UI” where attorneys can review the grounded source excerpt and confirm that a claim is supported.
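A review UI can check that every inline marker resolves to a source before showing the answer. The sketch below assumes that a marker [n] refers to the n-th entry (1-based) of the sources array. That mapping is an assumption for illustration, so confirm it against a real response before relying on it.

```python
import re


def cited_source_numbers(response_with_citations: str) -> list:
    """Return the distinct citation numbers in order of first appearance."""
    seen = []
    for match in re.finditer(r"\[(\d+)\]", response_with_citations):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


def unresolved_citations(response_with_citations: str, sources: list) -> list:
    """List markers that do not map to an entry in `sources` (assumed 1-based)."""
    return [n for n in cited_source_numbers(response_with_citations)
            if not 1 <= n <= len(sources)]


text = "Liability is capped [1]. The indemnity is uncapped [2][1]. See also [3]."
print(cited_source_numbers(text))            # prints: [1, 2, 3]
print(unresolved_citations(text, [{}, {}]))  # prints: [3]
```

Flagging unresolved markers lets the interface warn reviewers instead of presenting an unsupported claim as grounded.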
Draft vs assist (practical interpretation)
Mode selection is about how you want the output shaped:
- draft: produce polished, client-ready prose (emails, clauses, summaries) that you can edit.
- assist: produce more direct “analysis/help” answers for internal use or chat-style flows.
Streaming vs non-streaming
Streaming can improve perceived latency in interactive UIs. Non-streaming is simpler for batch jobs, ETL tasks, or workflows where you store a complete result and then run post-processing.
Use streaming when
You are building a chat or drafting UI, and users benefit from seeing output immediately. Make sure your UI can handle partial updates and cancellation.
Use non-streaming when
You need predictable outputs for ingestion pipelines, you’re writing results to a database, or you need to run validations after completion.
Error handling patterns
The Completion endpoint uses standard HTTP status codes. For robust integrations:
- 400: validate parameters and payload shape; surface actionable messages for developers.
- 401: invalid or missing token; rotate/verify secrets; confirm environment base URL.
- 429: backoff and retry; add queueing; reduce concurrency.
- 5xx: retry with backoff; log correlation IDs if provided; contact support if persistent.
8) Vault APIs: secure document projects, uploads, metadata, deletes
Vault is the API family for managing documents in structured “projects” (and knowledge bases), enabling your organization to ingest files from existing systems and then analyze them within Harvey. The Vault endpoints are designed for integrations with document management systems (DMS), contract lifecycle management (CLM) tools, file storage platforms, and internal ETL pipelines.
Common Vault operations
- List projects: discover what projects/knowledge bases exist in the workspace.
- Create project: set up a new container for a deal, matter, client, or knowledge domain.
- Upload files: ingest documents while preserving folder paths for organization.
- Get metadata: retrieve file IDs, names, sizes, and other details for tracking and review.
- Delete file: remove outdated or erroneous documents.
- Delete project: remove an entire project and its contents (high impact; handle carefully).
Vault endpoints (examples)
GET /api/v1/vault/workspace/projects
POST /api/v1/vault/upload_files/{project_id}
GET /api/v1/vault/get_metadata/{project_id}
DELETE /api/v1/vault/delete_file/{file_id}
DELETE /api/v1/vault/delete_project/{project_id}
Practical use cases
- Secure document ingestion from iManage/NetDocuments/SharePoint-like systems into a deal project.
- Bulk due diligence review by uploading a data room export and then asking targeted questions.
- Policy and playbook grounding by maintaining a “knowledge base project” with canonical templates.
- Automated clean-up for incorrect uploads, duplicates, or time-limited engagements.
Best practices for Vault structure
- Organize by project: one project per deal/matter/client engagement when possible.
- Preserve paths: keep folder structures consistent across uploads; it helps review workflows.
- Track file IDs: persist file IDs in your database so you can reference them later for grounding.
- Confirm destructive actions: require explicit approval before deleting projects or large sets of files.
9) Vault RAG grounding: ask questions using Vault as a knowledge source
A powerful Harvey pattern is to ground completions in Vault projects and files using a
knowledge_sources array. This enables “Vault RAG” (retrieval-augmented generation): the system can
reference relevant documents and return citations that point back to specific snippets and pages.
How it works conceptually
- You ingest documents into Vault projects (or use existing knowledge base projects).
- You call /api/v2/completion with knowledge_sources set to Vault.
- The system grounds the response in the documents and returns citations when enabled.
- Your UI shows a “citations panel” so users can verify sources before using output.
Vault grounding example (knowledge_sources)
Note: knowledge_sources is passed as a JSON-encoded string in the form-data request.
curl --request POST \
--url https://api.harvey.ai/api/v2/completion \
--header "Authorization: Bearer YOUR_TOKEN_HERE" \
--header "Content-Type: multipart/form-data" \
--form "prompt=Summarize these documents and flag any non-standard indemnity language." \
--form 'knowledge_sources=[{"type":"vault","folder_id":"YOUR_VAULT_PROJECT_ID","file_ids":["FILE_ID_1","FILE_ID_2"]}]' \
--form "stream=false" \
--form "mode=assist"
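Hand-escaping JSON inside a form field is error-prone, so it is safer to generate the string. The sketch below builds the same knowledge_sources value as the curl example above. The key names (type, folder_id, file_ids) are copied from that example, and the function name is hypothetical.

```python
import json


def vault_knowledge_sources(project_id: str, file_ids: list) -> str:
    """Return the JSON string for a single Vault knowledge source.

    The keys (type, folder_id, file_ids) mirror the curl example above.
    """
    if not project_id:
        raise ValueError("project_id is required")
    source = {"type": "vault", "folder_id": project_id, "file_ids": list(file_ids)}
    # json.dumps handles quoting and escaping; compact separators keep it short.
    return json.dumps([source], separators=(",", ":"))


print(vault_knowledge_sources("P1", ["F1", "F2"]))
# prints: [{"type":"vault","folder_id":"P1","file_ids":["F1","F2"]}]
```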
Vault vs direct file uploads
You can either upload files directly on a completion request (file field) or you can ground via
Vault (knowledge_sources). The difference is operational:
- Direct file upload is convenient for one-off analysis, but can be harder to track over time.
- Vault grounding is better for repeatable workflows, shared projects, and audit-friendly reuse.
Note: file uploads cannot be used together with knowledge_sources. Choose one approach per request.
Web grounding
Harvey’s Completion API also supports a web knowledge source type. In practice, organizations should treat web grounding carefully in legal contexts: define when it’s allowed, capture citations, and require human review.
--form 'knowledge_sources=[{"type":"web"}]'
Controlling citations for speed
The Completion endpoint supports an include_citations parameter (defaults to true). Disabling citations
can return results faster, but it reduces verifiability—which is often undesirable for legal work.
POST /api/v2/completion?include_citations=false
10) History Exports: usage history and query history
Harvey’s History Export APIs are designed to help organizations understand how the platform is being used, monitor adoption, support leadership reporting, and investigate questions about activity patterns.
Two key export types
- Usage history: metadata over a time range (user, timestamps, event type), designed not to include sensitive inputs/outputs.
- Query history: for deeper analysis of queries and sources (availability and detail may depend on your permissions and configuration).
Endpoints
GET /api/v1/history/usage
GET /api/v1/history/query
Usage history: what it’s for (and what it’s not)
Usage history supports programmatic reporting and oversight. It’s helpful for answering questions like:
- Are we seeing adoption across teams?
- Which product areas (Assist vs Draft, files vs web) are most used?
- Are certain departments hitting rate limits more often than others?
- How does usage map to client matters for billing attribution?
Usage exports are generally built to avoid exposing sensitive content. Instead, they provide operational metadata that helps governance teams understand patterns without turning the export into a “content leakage” vector.
Scheduling export jobs
- Weekly/monthly reporting: run scheduled jobs that load exports into your BI tool.
- Compliance and investigations: run targeted exports for specific time ranges.
- Continuous monitoring: fetch increments (e.g., every 15 minutes) and store in your SIEM/data lake.
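For the continuous-monitoring pattern, one way to avoid gaps and overlaps is to cut time into fixed windows starting from the last checkpoint. The sketch below only generates the windows. How a window maps onto the history endpoints' query parameters is not covered in this guide, so that step is left to your integration.

```python
from datetime import datetime, timedelta, timezone


def export_windows(checkpoint: datetime, now: datetime,
                   step: timedelta = timedelta(minutes=15)):
    """Yield consecutive [start, end) windows from `checkpoint` up to `now`.

    Only complete windows are yielded, so a partial trailing window is left
    for the next run.
    """
    start = checkpoint
    while start + step <= now:
        yield start, start + step
        start += step


checkpoint = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 1, 0, 40, tzinfo=timezone.utc)
for start, end in export_windows(checkpoint, now):
    print(start.isoformat(), "->", end.isoformat())
# prints two windows: 00:00 -> 00:15 and 00:15 -> 00:30
```

After each window is exported and stored, persist its end time as the new checkpoint.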
11) Audit Logs: compliance monitoring and incident response
Audit logs are the “paper trail” for activity in a workspace: they enable compliance teams to monitor actions, investigate incidents, and maintain an auditable record. Harvey provides endpoints to:
- Search logs starting at a timestamp
- Retrieve the earliest log
- Retrieve the latest log
- Query/paginate through logs over time
Endpoints (common)
GET /api/v1/logs/audit/search
GET /api/v1/logs/audit/earliest
GET /api/v1/logs/audit/latest
GET /api/v1/logs/audit
Timestamp-based search
A common pattern is to start from a time boundary (e.g., “start of day UTC”) and then paginate through results.
curl -X GET "https://api.harvey.ai/api/v1/logs/audit/search?time=1712066546" \
-H "Authorization: Bearer YOUR_TOKEN_HERE" \
-H "Content-Type: application/json"
Earliest/latest for bootstrapping
For initial setup, retrieve earliest to backfill from the beginning, or retrieve latest to begin near the most recent events.
curl -X GET "https://api.harvey.ai/api/v1/logs/audit/earliest" \
-H "Authorization: Bearer YOUR_TOKEN_HERE"
curl -X GET "https://api.harvey.ai/api/v1/logs/audit/latest" \
-H "Authorization: Bearer YOUR_TOKEN_HERE"
Operational best practice: build a “collector”
Treat audit logs like a compliance feed:
- Run a scheduled collector (e.g., every 5–15 minutes).
- Store logs immutably (append-only) in your data lake or SIEM.
- Track checkpoint state so you can resume after outages (by timestamp or log ID).
- Alert on abnormal activity (unusual login patterns, bulk exports, admin changes).
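The collector steps above can be sketched as a single run that a scheduler invokes. Everything here is illustrative: `fetch_since` is a hypothetical function you would implement on top of the audit search endpoint, and the sketch assumes each log entry carries a numeric "time" field and that `fetch_since` returns only entries newer than the checkpoint.

```python
import json
from pathlib import Path
from typing import Callable, List


def collect_once(fetch_since: Callable[[int], List[dict]],
                 checkpoint_path: Path,
                 sink_path: Path,
                 default_start: int = 0) -> int:
    """Fetch logs after the saved checkpoint, append them, then advance it."""
    since = default_start
    if checkpoint_path.exists():
        since = json.loads(checkpoint_path.read_text())["time"]
    entries = fetch_since(since)
    if not entries:
        return 0
    # Append-only sink: one JSON object per line, never rewritten.
    with sink_path.open("a", encoding="utf-8") as sink:
        for entry in entries:
            sink.write(json.dumps(entry) + "\n")
    # Advance the checkpoint only after the batch has been written.
    newest = max(entry["time"] for entry in entries)
    checkpoint_path.write_text(json.dumps({"time": newest}))
    return len(entries)
```

Because the checkpoint moves only after a successful write, a crash mid-run leads to a re-fetch rather than a gap. De-duplicating by log ID downstream covers entries that share a timestamp.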
12) Client Matters: attribution and scope-based controls
Client matters are a foundational concept for many legal organizations: work is tracked by billing codes, engagement IDs, or internal project identifiers. Harvey’s Client Matter API lets you programmatically create, retrieve, and remove these associations so that usage and queries can be attributed accurately.
Endpoints
POST /api/v1/client_matters
GET /api/v1/client_matters
DELETE /api/v1/client_matters
Add or update client matters (bulk-friendly)
curl -X POST https://api.harvey.ai/api/v1/client_matters \
-H "Authorization: Bearer YOUR_TOKEN_HERE" \
-H "Content-Type: application/json" \
-d '{
"client_matters": [
{ "cm_name": "123-45", "cm_desc": "Acme Corp Bankruptcy", "cm_allowed": "true" },
{ "cm_name": "M-2025-0094", "cm_desc": "Example Engagement", "cm_allowed": "false" }
]
}'
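When matters are synced from a billing system, the request body is usually generated rather than typed. The sketch below builds the same JSON shape as the curl example above. The field names and the string-valued cm_allowed are copied from that example, while the function name and the duplicate check are assumptions added for illustration.

```python
import json


def client_matters_payload(matters: list) -> str:
    """Build the JSON body for POST /api/v1/client_matters.

    `matters` is a list of (name, description, allowed) tuples. The field
    names and the string-valued "cm_allowed" mirror the curl example above.
    """
    rows = []
    seen = set()
    for name, description, allowed in matters:
        if name in seen:
            raise ValueError(f"duplicate matter name: {name}")
        seen.add(name)
        rows.append({
            "cm_name": name,
            "cm_desc": description,
            "cm_allowed": "true" if allowed else "false",
        })
    return json.dumps({"client_matters": rows})


body = client_matters_payload([
    ("123-45", "Acme Corp Bankruptcy", True),
    ("M-2025-0094", "Example Engagement", False),
])
print(body)
```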
How “allowed” flags can help governance
Many organizations want to restrict which matters can be used for certain workflows (or to prevent accidental attribution to a deprecated matter code). An allow/deny flag helps enforce “only approved matters” downstream.
Recommended matter design
- Use a stable ID scheme that maps cleanly to billing systems (avoid “friendly names” that change).
- Store matter metadata in your own system-of-record and sync to Harvey on a schedule or via events.
- Automate deactivation for closed matters and require manual approval to reactivate.
13) Production architecture: a practical reference design
Legal-grade integrations tend to fail for boring reasons: concurrency spikes, missing audit trails, leaked tokens, unclear governance, or brittle ETL. The reference design below is one way to reduce that operational risk.
A simple architecture that scales
- Backend API gateway: your app talks to your server, not directly to Harvey.
- Request queue: all Completion calls go through a queue with concurrency caps to respect rate limits.
- Secrets manager: tokens are stored and rotated centrally.
- Vault ingestion pipeline: documents are uploaded to Vault with consistent paths and tracked file IDs.
- Observability & logging: store request metadata, response IDs, and error outcomes without leaking sensitive content.
- Audit collector: scheduled job pulls audit logs into your SIEM.
- History exporter: scheduled job exports usage/query history into BI/reporting.
Why you want a queue even at small scale
The Completion endpoint has a relatively modest per-minute limit. A single busy team can exceed it if every UI action becomes a new request. A queue makes throughput predictable, provides retry control, and gives you a place to implement “priority” (e.g., interactive drafting gets priority over batch summarization).
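A minimal in-process version of such a queue might look like the sketch below. The default limit of 20 per minute comes from the rate limit table for Assistant Completion. The class and method names are hypothetical, and the sketch uses a rolling 60-second window, which is a conservative stand-in for the per-minute reset described earlier.

```python
import heapq
import itertools
from collections import deque


class CompletionQueue:
    """Priority queue that releases at most `limit` jobs per rolling window."""

    def __init__(self, limit: int = 20, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._heap = []                # entries are (priority, sequence, job)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order
        self._sent = deque()           # release timestamps inside the window

    def submit(self, job, priority: int = 1) -> None:
        """Lower `priority` numbers are released first (0 = interactive)."""
        heapq.heappush(self._heap, (priority, next(self._seq), job))

    def next_job(self, now: float):
        """Return the next job if the budget allows, else None."""
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()       # drop timestamps outside the window
        if not self._heap or len(self._sent) >= self.limit:
            return None
        self._sent.append(now)
        return heapq.heappop(self._heap)[2]


queue = CompletionQueue(limit=2)
queue.submit("batch summary", priority=1)
queue.submit("interactive draft", priority=0)
print(queue.next_job(now=0.0))  # prints: interactive draft
```

A worker would call next_job with time.monotonic() in a loop and sleep briefly when it returns None. A multi-process deployment would need a shared store in place of the in-memory state.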
Designing a trustworthy legal UI
- Default to citations for any workflow that could influence legal advice, drafting, or client communications.
- Show sources next to claims (click-to-expand snippet and page reference).
- Log who requested what (user identity, timestamp, matter ID, project ID) for accountability.
- Encourage review: present results as “draft / suggestion,” not as final authority.
- Provide a “copy with citations” option so users can paste into memos with supporting references.
14) Security & compliance considerations
Harvey’s security posture is a core reason legal organizations adopt it: the platform emphasizes encryption and access controls, and (by default) states it does not train on customer data. It also references annual SOC 2 Type II and ISO 27001 audits/certifications in its security materials and trust center.
Practical security checklist for your implementation
- Keep tokens server-side and use short-lived session tokens for your own app where possible.
- Encrypt sensitive data in transit and at rest in your own systems, too.
- Minimize data sent in prompts (avoid unnecessary personal data; prefer document grounding in Vault).
- Implement access controls so only authorized users can query a given Vault project/matter.
- Capture an audit trail in your app: who requested it, when, what matter, what documents.
- Plan for retention: define how long prompts/outputs/logs should be kept and where.
Region hosting alignment
If your organization is on EU-hosted or AU-hosted deployments, enforce the correct base URL at the configuration level (not per request) to reduce the chance of accidental cross-region calls.
15) Troubleshooting: common issues and fixes
401 Unauthorized
- Confirm you’re using Authorization: Bearer … (not an API key header).
- Confirm you’re calling the correct region base URL (US vs EU vs AU).
- Verify token is active and has access to the endpoint family you’re calling.
400 Bad Request
- Check you sent multipart/form-data for Completion.
- Confirm knowledge_sources is a JSON-encoded string and properly escaped.
- Do not send both file and knowledge_sources in the same Completion request.
429 Too Many Requests
- Reduce concurrency, implement a queue, add exponential backoff with jitter.
- Batch where possible (e.g., one completion prompt to summarize multiple documents instead of many prompts).
- Move exports to scheduled jobs and avoid “live export” patterns.
Vault uploads feel slow or fragile
- Use consistent file paths and maintain a manifest of uploaded files.
- Retry safely: design uploads to be idempotent (avoid duplicating content on retries).
- Separate ingestion jobs from user-facing UI calls.
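One way to make upload retries safe is to key a manifest on a content hash, so the same bytes at the same path are uploaded at most once. The sketch below keeps the manifest in memory for brevity. The class is hypothetical, and `upload` stands for a caller-supplied function that performs the real Vault upload and returns a file ID.

```python
import hashlib


class UploadManifest:
    """Tracks uploaded content by SHA-256 so retries do not create duplicates."""

    def __init__(self):
        self._by_key = {}  # (project_id, path, sha256) -> file_id

    @staticmethod
    def _key(project_id: str, path: str, content: bytes) -> tuple:
        return (project_id, path, hashlib.sha256(content).hexdigest())

    def upload_once(self, project_id: str, path: str, content: bytes, upload) -> str:
        """Call `upload` only if this exact content was not already recorded."""
        key = self._key(project_id, path, content)
        if key not in self._by_key:
            # `upload` is caller-supplied: it performs the real upload and
            # returns the new file ID. If it raises, nothing is recorded,
            # so the next retry will try again.
            self._by_key[key] = upload(project_id, path, content)
        return self._by_key[key]
```

In production the manifest would live in your database next to the persisted file IDs, so it survives restarts of the ingestion job.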
16) Frequently asked questions
To check that a token works and see which identity it maps to, call GET https://api.harvey.ai/api/whoami with your bearer token.
It returns the underlying service user associated with the token, which is also useful for audit investigations.