Krisp API: what developers can integrate (SDKs, webhooks, and OAuth-based platform endpoints)
“Krisp API” is a search term people use when they want to build noise-free calling, real-time voice AI agents, meeting transcription pipelines, or a “send summaries to my app” integration. Krisp’s developer story spans multiple surfaces: real-time AI Voice SDKs (running on-device, in the browser, and on servers), as well as webhook-style automations for meeting outputs and an OAuth-based “Platform API” centered on subscriptions to meeting-note events.
This page is written for builders. It explains what Krisp exposes publicly, how those pieces fit together, and how to design a production architecture that is reliable rather than built on fragile “network tab” reverse engineering.
What most developers want
A clean way to improve audio quality (noise/echo/voices), power voice agents, and connect transcripts/summaries to downstream tools.
What Krisp provides
Commercial SDKs for real-time voice enhancement across server, browser, desktop, and mobile—plus meeting assistant webhook automation and an OAuth subscriptions API.
How to choose the right surface
Use the SDK for real-time audio processing. Use webhooks/OAuth Platform API for meeting-note automation and event delivery. Use STT API if you need transcription at scale.
Krisp’s product lineup and developer docs can change. Always confirm the latest licensing, security expectations, and platform capabilities with Krisp’s official documentation before shipping production features.
1) What “Krisp API” can mean
The phrase “Krisp API” is overloaded. In practice, it usually refers to one (or more) of these:
- Krisp AI Voice SDK — libraries you embed into your application to process audio in real time. This includes server-side components for voice AI agents and on-device/browser components for conferencing or call center apps.
- Meeting Assistant automation — webhook-style delivery of meeting outputs like transcripts, notes, and outlines to another system.
- Krisp Platform API — OAuth 2.0-based endpoints (documented via Postman) that focus on subscriptions and event notifications around meeting notes/summaries/action items.
- Speech-to-Text (STT) API — Krisp’s speech-to-text offering intended for call centers and BPO workflows, positioned as cost-efficient and privacy-focused.
A useful way to pick the right path is to decide what you’re building: if you need real-time audio enhancement, reach for the Voice SDK. If you need outputs and automations (transcripts, summaries, action items), reach for webhooks and/or the OAuth subscriptions API. If your core need is transcription at scale (call center analytics, compliance monitoring, QA), evaluate the STT API.
| Surface | Best for | Runs where? | Typical integration pattern |
|---|---|---|---|
| AI Voice SDK | Noise/voice/echo reduction, accent conversion, voice isolation, turn-taking | Server, browser, desktop, mobile | Embed SDK into audio path (WebRTC, SIP, voice agent stack) |
| Meeting Assistant Webhook API | Send transcripts/notes/outlines to your system | Krisp → your HTTPS endpoint | Configure webhook URL + auth headers; receive POSTs on events |
| Platform API (OAuth2) | Subscribe to meeting-note events; manage subscriptions | Cloud endpoint (Krisp) + your backend | OAuth2 authorization code with PKCE + webhook subscriptions |
| Speech-to-Text API | Transcription for CX/call center workflows | Device-side processing + your app | Integrate into existing CX/voice platforms; use for analytics/QA |
2) Krisp AI Voice SDK: the “API” most builders actually need
Krisp describes its SDK as a collection of real-time AI-powered technologies that improve speech clarity in real-time communication applications such as conversational AI, conferencing, collaboration, streaming/podcasting, and mobile calls. The developer hub presents multiple SDK tracks and components.
Think of the Krisp Voice SDK as a set of audio filters + models that you wire into your application’s audio path. Instead of “send audio to the cloud and get audio back,” the design emphasizes low-latency, real-time use with the audio staying under your control.
2.1 What Krisp’s models do (plain English)
- Noise Cancellation (NC): removes background noise in real time. There are distinct models for outbound (microphone) and inbound (speaker) streams.
- Background Voice Cancellation (BVC): removes other people’s voices near the primary speaker (useful for call centers and shared offices).
- Accent Conversion (AC): converts an agent’s accent to a target accent in real time for clearer calls.
- Voice Isolation (VIVA): designed for voice AI agents—removes background voices/noise to reduce false interruptions and improve turn-taking/STT accuracy.
- Turn-Taking: detects likely end-of-turn moments to make AI agents respond more naturally and avoid awkward overlaps.
2.2 Where the SDK runs
Krisp’s docs outline server-side usage (especially for voice AI agents) as well as browser, desktop, and mobile environments. Because many components are C-library based, they can be integrated into a wide range of stacks, including voice agent frameworks and WebRTC pipelines.
Treat Krisp as an “audio processing layer” rather than a typical REST API. Your job is to place the filter at the right point in the audio chain (before your STT, before your agent’s VAD, before sending audio to the network, etc.).
3) SDK families: VIVA (voice agents) vs RTC/on-device (conferencing & calls)
Krisp documentation distinguishes between server-side VIVA components aimed at voice AI agents and RTC/on-device components aimed at real-time communications (RTC) scenarios like calls and meetings.
3.1 VIVA SDK (server-side) for voice AI agents
The VIVA SDK is positioned to improve turn-taking and STT accuracy in voice AI agents using small CPU-based models (voice isolation, turn-taking, VAD). The idea is simple: clean the inbound audio and stabilize the turn-taking signal so the agent interrupts less and understands more reliably.
A typical voice agent pipeline looks like this:
- Ingest audio (WebRTC, SIP, telephony, or a streaming transport)
- Apply voice isolation/noise/voice cancellation layer
- Run VAD + turn-taking detection
- Feed STT with cleaner audio
- Agent reasoning + response generation
- TTS output (optionally post-processed for clarity)
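The ordering in the steps above can be sketched as a simple per-frame pipeline. All function names here are hypothetical placeholders (toy stubs), not Krisp API calls; the point is the ordering: clean first, then VAD, then STT.

```javascript
// Conceptual voice-agent frame pipeline (stage names are illustrative stubs).
function isolateVoice(frame) { return frame; }  // stand-in for voice isolation/NC
function detectSpeech(frame) {                   // toy energy-threshold VAD
  return frame.some((s) => Math.abs(s) > 0.01);
}
function transcribe(frame) {                     // STT stub
  return frame.length > 0 ? "..." : "";
}

function processFrame(frame) {
  const clean = isolateVoice(frame);    // 1) clean the audio first
  const speaking = detectSpeech(clean); // 2) VAD sees cleaner speech
  if (!speaking) return null;           // 3) skip silent frames entirely
  return transcribe(clean);             // 4) STT gets the cleaned frame
}
```

The design choice to filter before VAD is what reduces false interruptions: the VAD never sees the background voices that would otherwise look like speech.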
3.2 RTC / on-device SDK for calls, meetings, and conferencing
In RTC scenarios, the SDK usually sits inside the client application (desktop/mobile/web) and processes microphone and/or speaker streams before they are transmitted or played. That’s how you can improve audio for end users without requiring them to install a separate audio device or system driver.
Krisp’s docs also highlight multiple integration guides for popular platforms and frameworks—particularly for web-based apps using a JavaScript SDK approach and for well-known RTC stacks.
4) Krisp models in detail: NC, BVC, and AC
Below is a developer-friendly explanation of the main RTC-oriented models as described in Krisp’s documentation. (Even if your ultimate goal is “just remove background noise,” it helps to understand the model boundaries and constraints.)
4.1 Noise Cancellation (NC)
Noise Cancellation is designed to remove background noise during real-time communication. Krisp’s documentation explicitly differentiates between outbound (microphone) NC and inbound (speaker) NC, reflecting that the acoustic and network realities are different for each direction.
- Outbound NC: optimized for near-field microphone conditions (close to the mouth). Performance varies in far-field environments depending on distance, echo, SNR, and device characteristics.
- Inbound NC: handles overlapping speakers and robustness against codec degradations and low bandwidth scenarios; it is designed for the “speaker” stream.
- De-reverberation: the NC process includes de-reverberation, aiming to reduce room echo during processing.
Practical implication: if you are building a call center desktop app, you may want outbound NC (mic) by default and consider inbound NC selectively depending on how noisy the agent’s environment is and whether the speaker stream is degraded.
4.2 Model sizes and latency thinking
Krisp documentation references small models (default) and big models (on demand) for noise cancellation, trading CPU cost for quality. From an engineering perspective, this is a classic quality-performance trade-off: you might pick small models for low-end hardware, and big models for high-end systems or specialized deployments.
On latency: real-time audio filters must be fast. Krisp’s docs illustrate that algorithmic latency depends on sampling rate and frame duration, and that with common settings like 10ms frames at 16kHz, the latency can be in the tens of milliseconds range. This matters because it directly affects how “natural” live conversations feel.
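As a worked example of the frame math (this is generic DSP arithmetic, not a Krisp-specific API; the lookahead count is an illustrative assumption):

```javascript
// Samples per frame = sampleRate * frameDurationMs / 1000.
function samplesPerFrame(sampleRateHz, frameDurationMs) {
  return (sampleRateHz * frameDurationMs) / 1000;
}

// If a model needs N frames of lookahead, its algorithmic latency
// is N * frameDurationMs, independent of CPU speed.
function algorithmicLatencyMs(frameDurationMs, lookaheadFrames) {
  return frameDurationMs * lookaheadFrames;
}

// 10 ms frames at 16 kHz -> 160 samples per frame;
// a hypothetical 3 frames of lookahead -> 30 ms of algorithmic latency.
```

This is why frame duration matters as much as raw model speed: every frame of lookahead adds a fixed delay that no amount of CPU can remove.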
4.3 Background Voice Cancellation (BVC)
BVC is built to remove other voices near the primary speaker and also remove background noise and reverberation. This is especially useful for “cross-talk” scenarios (a nearby coworker’s voice leaking into the agent mic). Krisp notes that BVC does not require user enrollment or training, and it is designed to work with headsets and earbuds.
In a real deployment, BVC becomes a differentiator when:
- Agents sit close to each other in a call center floor
- Remote workers share a room with other talking people
- Hybrid offices have constant voice noise from nearby desks
4.4 Accent Conversion (AC)
Accent Conversion is described as a real-time AI voice conversion model designed to neutralize accents in call center environments. Krisp’s documentation highlights that it converts specific accent groups to a target accent and that it also removes background noise and voices. For engineering planning, treat AC as a specialized feature with stricter hardware requirements and careful UX design considerations.
| Model | Primary goal | Best environments | Engineering considerations |
|---|---|---|---|
| NC (Outbound/Inbound) | Remove background noise | Calls, meetings, conferencing | Frame size, sampling rate, CPU budget, avoid double-processing |
| BVC | Remove other voices near mic | Call centers, shared offices | Device compatibility and sampling constraints; monitor voice suppression risk |
| AC | Accent conversion to target accent | Call center agent speech | Latency budgets, CPU requirements, user experience and transparency |
5) Getting started (server-side): VIVA + real-time filtering for voice agents
Server-side audio filtering is common when you build voice AI agents, because the agent stack often lives on servers. Krisp’s docs describe VIVA SDK as server-side and mention multiple language bindings, including Python, Node.js, Go, Rust, C++, and C.
Conceptually, you initialize the SDK, create a session/filter instance with the right configuration (sample rate, frame duration), then process audio frames in a loop. The SDK returns processed frames that you can forward to STT or any downstream signal processing.
// PSEUDOCODE (conceptual) — your actual code depends on Krisp's SDK bindings
initKrispSdk(licenseKeyOrPath);
const cfg = {
inputSampleRate: 16000,
frameDurationMs: 10,
// model path or model selection settings
};
const filter = createNoiseOrVoiceIsolationFilter(cfg);
while (streamHasFrames) {
const frame = readNextPcmFrame(); // e.g., Float32 or Int16 PCM
const clean = filter.process(frame); // processed frame, same duration
sendToStt(clean); // or to VAD, or to agent pipeline
}
filter.dispose();
shutdownKrispSdk();
This frame-loop style is the standard pattern for audio DSP and makes it easy to reason about latency: every frame you process adds some algorithmic overhead. If you keep frames small and keep CPU overhead under control, the conversation remains natural.
If your agent uses VAD and turn-taking, place voice isolation/noise cancellation before VAD so the VAD sees cleaner speech. If your agent uses STT, place filtering before the STT input to reduce WER.
6) Supported platforms & integrations (browser + RTC ecosystem)
Krisp’s platform documentation emphasizes broad deployment across servers, desktop, mobile, OS, browsers, and frameworks. The docs point out that because core components are C-library based, integration is possible across many audio stacks, and also reference specific frameworks where Krisp filters are built-in.
6.1 Supported OS (server-side)
Krisp’s server-side platforms list includes Linux (x64, armv8a), Windows (x64), and macOS (x64, armv8a). This is useful if you deploy voice agents on Linux servers, run local speech infrastructure on Windows, or ship macOS tooling.
6.2 Integration guides (web/RTC)
Krisp’s JavaScript SDK documentation highlights integration paths with popular communication and RTC frameworks and platforms. For web-based voice and calling, this matters because you usually want “drop-in noise cancellation” without rewriting your entire media stack.
- Twilio Voice: Krisp documents a guide for integrating with Twilio’s Audio Processor API (notable for web calling apps).
- Amazon Connect: guidance for contact center use cases.
- Jitsi: guidance for self-hosted conferencing.
- SIP.js: guidance for SIP endpoints in the browser.
- React: integration guide for React applications.
- Electron: guidance for desktop apps built with web technologies.
The big idea: Krisp tries to meet you where you already are—WebRTC, Twilio Voice, SIP, and “framework world.” That reduces integration time and improves adoption.
If you apply noise cancellation in multiple places (device, browser, agent, server), the second model may see “already processed” audio and produce worse results. Pick a single primary place to apply Krisp and measure quality end-to-end.
7) Licensing & security: what production teams must plan for
Krisp’s SDK documentation states that the SDK is a commercial product and that developers need to obtain a commercial license to integrate it into their products. That means licensing is not an afterthought—your architecture should include a clear plan for how licenses are provisioned, validated, and rotated.
7.1 Why licensing affects architecture
- Deployment model: server-side deployment vs on-device deployment affects how license verification is performed.
- Environment separation: you’ll likely need different keys/licenses for dev, staging, and production.
- Operational readiness: you should have monitoring and alerting for license verification failures (because that can break audio quality features in real time).
7.2 Security posture (as described in the SDK docs)
Krisp’s security documentation claims that Krisp does not access, collect, or store audio data, and that the SDK is designed to operate on the customer’s premises, with processing performed on the customer side. It also notes that the SDK accesses the network only for license verification needs and that metadata generation is activated only upon explicit instruction.
From a builder’s standpoint, this implies a privacy-friendly model where audio remains under your control. It also implies you must:
- Review how license verification networking behaves in your environment
- Decide whether optional metadata generation is enabled
- Document your privacy posture clearly to end users (especially in regulated industries)
If you are in a regulated environment (healthcare, finance, government), run a privacy review that includes data flow diagrams: where audio exists, where it is processed, what gets logged, and who can access outputs.
8) Krisp Meeting Assistant Webhook API: sending transcripts/notes/outlines to your system
Krisp also offers a Webhook API in its help documentation for automatically sending meeting outputs—like transcripts, notes, and outlines—to another tool or internal system via an HTTPS endpoint.
8.1 What events can be sent?
The help doc describes events that fire when:
- A transcript is created
- Notes are generated (key points/action items)
- An outline is generated
8.2 What you need before enabling
You typically need:
- A Webhook URL that can receive HTTPS POST requests
- Optional authentication headers (for example, a token) if your endpoint requires auth
8.3 Recommended webhook receiver design
Don’t treat webhooks as a “fire-and-forget magic pipe.” Make them durable:
- Verify authenticity: validate any provided signatures or shared secrets if available; otherwise use per-endpoint tokens and strict IP allowlists when practical.
- Idempotency: store an event id or hash so retries don’t create duplicates.
- Queue: immediately enqueue the payload and return 200 OK; process asynchronously.
- Observability: log event type + meeting id + timestamp; capture failures with alerts.
// Example webhook receiver (pseudo-Node/Express; `app`, `express`, and `queue` come from your stack)
const crypto = require("crypto");
app.use(express.json()); // ensure JSON bodies are parsed

app.post("/krisp/webhook", async (req, res) => {
  // 1) authenticate (token header, signature, etc.)
  const token = req.header("X-Webhook-Token");
  if (token !== process.env.KRISP_WEBHOOK_TOKEN) return res.status(401).send("Unauthorized");
  // 2) idempotency key: prefer an event id, fall back to a payload hash
  const eventId = req.body?.event_id
    || crypto.createHash("sha256").update(JSON.stringify(req.body)).digest("hex");
  // 3) enqueue for processing
  await queue.publish("krisp_events", { eventId, payload: req.body });
  // 4) acknowledge fast
  res.status(200).send("OK");
});
With this pattern, even if your downstream systems are slow or temporarily down, you can still accept events reliably and process them later.
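On the consumer side, a minimal sketch of idempotent processing looks like this. An in-memory set stands in for a durable store (Redis, a database table) — in production, the dedup check must survive restarts.

```javascript
// Deduplicate events by id before doing real work.
const seenEventIds = new Set(); // replace with a durable store in production

function handleEvent(event, processor) {
  if (seenEventIds.has(event.eventId)) {
    return "duplicate"; // redelivery/retry: safe to ignore
  }
  seenEventIds.add(event.eventId);
  processor(event.payload); // e.g., write notes to your CRM
  return "processed";
}
```

With this guard in place, webhook retries (which are normal, not exceptional) never create duplicate CRM records.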
9) Krisp Platform API (OAuth 2.0): subscriptions + meeting-note events
Krisp has documentation in a Postman public workspace for a “Krisp Platform API.” It describes a RESTful API using JSON output, with OAuth 2.0 authorization. The stated current capabilities focus on subscribing to meeting note-related events and triggering actions within a connected application.
9.1 OAuth 2.0 model (high-level)
OAuth-based APIs are designed for scenarios where an end user authorizes your app to access specific data. In practice:
- Your app sends the user to an authorization URL with requested scopes
- User grants access
- Your backend exchanges authorization code for an access token (and often a refresh token)
- Your backend calls Krisp API endpoints with the access token
The Postman docs show scopes for managing subscriptions and a “meetings read” scope that is described as necessary to subscribe to meeting-related updates. The docs also illustrate authorization code + PKCE concepts (code_verifier/code_challenge) and token endpoints for exchanging and refreshing tokens.
9.2 Subscriptions API: connect events to your webhook
The Postman documentation includes subscription endpoints such as:
- GET /subscriptions — list subscriptions
- GET /subscriptions/:id — get a subscription
- POST /subscriptions — create a subscription
- PUT /subscriptions/:id — update a subscription
- DELETE /subscriptions/:id — delete a subscription
For event types, the docs highlight meeting-note related events like summary_generated and action_items_generated.
A subscription payload can include the event type, a subscription type (webhook), and an HTTPS destination URL.
// Conceptual JSON body to create a webhook subscription
{
"event": "summary_generated",
"type": "webhook",
"details": { "url": "https://example.com/webhooks/krisp" },
"params": {},
"status": "enabled"
}
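If you call the create endpoint over HTTP, a small helper can assemble the request. The path and body fields mirror the Postman documentation described above, but the base URL here is a placeholder assumption, not a confirmed Krisp host — take the real one from the docs.

```javascript
// Build the HTTP request for creating a webhook subscription.
// Base URL is a placeholder; path/body shape follows the Postman docs above.
function buildCreateSubscriptionRequest(accessToken, eventType, webhookUrl) {
  return {
    method: "POST",
    url: "https://api.example-krisp-host.com/subscriptions", // placeholder
    headers: {
      Authorization: `Bearer ${accessToken}`, // OAuth access token
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      event: eventType,            // e.g., "summary_generated"
      type: "webhook",
      details: { url: webhookUrl }, // your HTTPS receiver
      params: {},
      status: "enabled",
    }),
  };
}
```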
9.3 When to use Platform API vs Meeting Webhook API
They can look similar (both can deliver events), but they solve different integration needs:
- Meeting Assistant Webhook API is configured in-product and is a fast path for “send meeting outputs to my system.”
- Platform API is developer/OAuth-driven and is suitable when you are building an application integration that many users will install, authorize, and manage with scopes.
If you’re building a multi-tenant SaaS integration, prefer OAuth + scoped subscriptions so each customer can grant and revoke access cleanly. If you’re building an internal integration for one organization, the in-product webhook approach may be simpler.
10) Krisp Speech-to-Text (STT) API: transcription for call centers & BPOs
Krisp announced a Speech-to-Text (STT) API positioned for call centers and BPO environments. The announcement emphasizes device-side processing, real-time redaction capabilities (PII/PCI), out-of-the-box compatibility with CX and voice platforms, and a focus on cost efficiency. It also cites a word error rate (WER) figure from evaluations across datasets.
10.1 How to think about STT API in your architecture
In call center systems, transcription is rarely the end goal. Usually you want transcription because it powers:
- Quality assurance (QA): scoring calls, coaching agents, and analyzing objection handling
- Compliance: detecting disclosures, required statements, and risky phrases
- Speech analytics: discovering themes, churn signals, customer intent, and sentiment proxies
- Agent assist: real-time prompts, knowledge lookup, recommended next steps
- Summaries and action items: structured notes for CRM
10.2 Real-time redaction and privacy expectations
If your business handles sensitive information (credit cards, addresses, account numbers), real-time redaction can be a major requirement. From an engineering viewpoint, design your pipeline so sensitive snippets are redacted before storage, indexing, or downstream analytics. Even if you trust your storage, least-privilege design is the safest route.
Define retention rules (how long audio and transcripts are kept), enforce role-based access, and ensure your analytics pipelines never store raw sensitive strings when redaction is enabled.
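As a toy illustration of the redact-before-store rule: strip card-number-like strings before a transcript ever reaches storage or analytics. Real PII/PCI detection is far more involved than a regex — treat this as a pattern sketch, not a compliance control.

```javascript
// Replace card-number-like digit runs (13-19 digits, optional space/dash
// separators) before the transcript is persisted. Very rough heuristic.
function redactCardNumbers(transcript) {
  return transcript.replace(/\b\d(?:[ -]?\d){12,18}\b/g, "[REDACTED]");
}
```

Applying this at ingestion time (rather than at query time) means indexes, logs, and downstream analytics never contain the raw sensitive string in the first place.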
11) Use cases: what you can build with the Krisp “API ecosystem”
The easiest way to understand Krisp’s developer surfaces is to map them to shipping products. Below are common build patterns, with notes on which Krisp surface is usually the best fit.
11.1 Voice AI agent with fewer interruptions
- Use: VIVA server-side voice isolation + turn-taking
- Why: voice isolation reduces cross-talk and background voices; turn-taking reduces false interrupts and awkward overlaps
- Pair with: your STT + LLM + TTS stack
11.2 Web calling app with “Krisp-quality audio”
- Use: JS/Browser SDK integrated into WebRTC (or via Twilio Voice audio processing)
- Why: improves mic stream quality before sending audio over the network
- Pair with: echo management, AGC, and your RTC platform’s jitter buffer/transport
11.3 Call center desktop app with cross-talk protection
- Use: BVC + NC in desktop/on-device stack
- Why: call center floors are voice-noisy; BVC helps suppress other agents’ speech leaking into the mic
- Pair with: monitoring of CPU usage and “quality modes” for different hardware tiers
11.4 Meeting summaries automatically sent to CRM
- Use: Meeting Assistant Webhook API (fastest) or OAuth Platform API (multi-tenant)
- Why: webhook events let you sync transcripts, notes, outlines, summaries, and action items into your systems
- Pair with: idempotent receiver, queue-based processing, and CRM schema mapping
11.5 Compliance monitoring and analytics from transcripts
- Use: STT API + analytics pipeline (or STT output from your platform)
- Why: transcription is the substrate for scoring, compliance checks, and speech analytics
- Pair with: PII/PCI redaction, strict access control, retention policies
12) Production architecture: reliability, latency, and “don’t make it worse” rules
Audio is unforgiving. A great demo can become a terrible product if you don’t manage latency, CPU budgets, and signal-chain complexity. Here are practical engineering rules to keep quality high.
12.1 Decide where audio processing happens
If you process in the browser, the user’s device bears the CPU cost and you reduce network dependency. If you process on the server, you standardize results and can centralize monitoring—but you pay for compute and you must manage stream transport. In many real products, you’ll do one primary layer (client or server) and keep the other layer minimal.
12.2 Avoid “double processing”
Applying noise cancellation multiple times can create warbling artifacts and unnatural voice gating. Pick a single main noise cancellation layer and ensure other layers are disabled or set to “raw” wherever possible.
12.3 Provide a quality mode switch
Because models can have different CPU costs (and Krisp mentions small vs big model variants), it’s smart to expose “Balanced” vs “High Quality” modes. You can default to balanced and allow “high quality” for high-end devices or when CPU headroom is available.
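One way to implement the switch is a simple tier-selection function; the CPU-headroom thresholds below are illustrative assumptions, not Krisp recommendations — calibrate them against your own measurements.

```javascript
// Pick a model tier from user preference and available CPU headroom.
// Thresholds are illustrative; tune them per hardware tier.
function pickQualityMode(cpuHeadroomPercent, userPreference) {
  if (userPreference === "high" && cpuHeadroomPercent >= 40) return "big-model";
  if (cpuHeadroomPercent >= 20) return "small-model";
  return "bypass"; // not enough headroom: pass audio through unprocessed
}
```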
12.4 Instrument and monitor
- CPU usage during calls (per core, per process)
- Audio frame processing time (p50/p95/p99)
- End-to-end latency (mic → remote hear)
- Dropouts and buffer underruns
- User-reported “robotic voice” incidents
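Frame-time percentiles like the p50/p95/p99 above can be computed over a rolling sample window; a minimal nearest-rank sketch:

```javascript
// Nearest-rank percentile over a window of frame-processing times (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // nearest-rank method
  return sorted[Math.max(0, rank - 1)];
}
// e.g., alert if percentile(frameTimesMs, 95) approaches your frame budget:
// with 10 ms frames, processing must reliably finish in well under 10 ms.
```

Watching p95/p99 rather than the average is what catches the intermittent stalls that users report as “robotic voice.”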
12.5 Make failure safe
If the filter fails (license verification issues, initialization error, incompatible device), your app should continue to function, just without enhancement. A “hard failure” that breaks the call is unacceptable.
Always have a safe fallback path to unprocessed audio. Don’t let “better audio” become “no audio.”
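The fallback rule can be enforced with a thin wrapper around the filter call; `filter.process` here is a hypothetical SDK call in the style of the earlier pseudocode, not a confirmed API name.

```javascript
// Wrap the enhancement filter so any failure degrades to raw audio
// instead of breaking the call.
function safeProcess(filter, frame, onError) {
  try {
    return filter.process(frame); // enhanced frame on the happy path
  } catch (err) {
    if (onError) onError(err);    // count/alert, but keep the call alive
    return frame;                 // fall back to the unprocessed frame
  }
}
```

Pair this with an error counter: a handful of fallbacks per call is a degraded-quality incident to investigate, not a dropped call.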
13) FAQ
Is Krisp an API or an SDK?
For most developer use cases, Krisp is best described as an SDK: you embed it into your app’s audio path to process audio in real time. There are also webhook and OAuth-based platform endpoints for delivering meeting outputs and subscribing to meeting-note events.
What’s the difference between Noise Cancellation (NC) and Background Voice Cancellation (BVC)?
NC targets non-speech background noise. BVC targets other human voices near the primary speaker (and also reduces noise and reverberation), which is especially useful in call centers and shared office environments.
Where should I apply Krisp in a voice AI agent?
Typically before VAD and before STT so those components see cleaner speech. This can reduce false interruptions and improve transcription quality.
How do Krisp webhooks work?
You configure an HTTPS endpoint and (optionally) authentication headers. Krisp posts meeting outputs to your endpoint when events occur (for example transcript created, notes generated, outline generated). Your endpoint should be idempotent and queue-based.
When should I use the OAuth Platform API?
Use it when you are building a real “integration app” that many users install and authorize, and you want scoped access and subscription management (create/list/update/delete subscriptions for meeting-note events).
Is Krisp SDK free?
Krisp’s SDK documentation describes it as a commercial product and indicates you need a commercial license to integrate it into your products.
14) Official resources to bookmark
- Krisp Developer Hub / SDK Docs (AI Voice SDK documentation and guides)
- Integrations docs (Twilio Voice, Amazon Connect, Jitsi, SIP.js, React, Electron)
- Krisp Platform API (Postman public workspace docs: OAuth2 + subscriptions)
- Krisp Meeting Assistant Webhook API (help center article)
- Krisp STT API announcement (blog post describing positioning and capabilities)