Krisp API: what developers can integrate (SDKs, webhooks, and OAuth-based platform endpoints)
“Krisp API” is a search term people use when they want to build noise-free calling, real-time voice AI agents, meeting transcription pipelines, or a “send summaries to my app” integration. Krisp’s developer story spans multiple surfaces: real-time AI Voice SDKs (running on-device, in the browser, and on servers), as well as webhook-style automations for meeting outputs and an OAuth-based “Platform API” centered on subscriptions to meeting-note events.
This page is written for builders. It explains what Krisp exposes publicly, how those pieces fit together, and how to design a production architecture that is reliable rather than built on fragile “network tab” reverse engineering.
What most developers want
A clean way to improve audio quality (noise/echo/voices), power voice agents, and connect transcripts/summaries to downstream tools.
What Krisp provides
Commercial SDKs for real-time voice enhancement across server, browser, desktop, and mobile—plus meeting assistant webhook automation and an OAuth subscriptions API.
How to choose the right surface
Use the SDK for real-time audio processing. Use webhooks/OAuth Platform API for meeting-note automation and event delivery. Use STT API if you need transcription at scale.
Krisp’s product lineup and developer docs can change. Always confirm the latest licensing, security expectations, and platform capabilities with Krisp’s official documentation before shipping production features.
1) What “Krisp API” can mean
The phrase “Krisp API” is overloaded. In practice, it usually refers to one (or more) of these:
- Krisp AI Voice SDK — libraries you embed into your application to process audio in real time. This includes server-side components for voice AI agents and on-device/browser components for conferencing or call center apps.
- Meeting Assistant automation — webhook-style delivery of meeting outputs like transcripts, notes, and outlines to another system.
- Krisp Platform API — OAuth 2.0-based endpoints (documented via Postman) that focus on subscriptions and event notifications around meeting notes/summaries/action items.
- Speech-to-Text (STT) API — Krisp’s speech-to-text offering intended for call centers and BPO workflows, positioned as cost-efficient and privacy-focused.
A useful way to pick the right path is to decide what you’re building: if you need real-time audio enhancement, reach for the Voice SDK. If you need outputs and automations (transcripts, summaries, action items), reach for webhooks and/or the OAuth subscriptions API. If your core need is transcription at scale (call center analytics, compliance monitoring, QA), evaluate the STT API.
| Surface | Best for | Runs where? | Typical integration pattern |
|---|---|---|---|
| AI Voice SDK | Noise/voice/echo reduction, accent conversion, voice isolation, turn-taking | Server, browser, desktop, mobile | Embed SDK into audio path (WebRTC, SIP, voice agent stack) |
| Meeting Assistant Webhook API | Send transcripts/notes/outlines to your system | Krisp → your HTTPS endpoint | Configure webhook URL + auth headers; receive POSTs on events |
| Platform API (OAuth2) | Subscribe to meeting-note events; manage subscriptions | Cloud endpoint (Krisp) + your backend | OAuth2 authorization code with PKCE + webhook subscriptions |
| Speech-to-Text API | Transcription for CX/call center workflows | Device-side processing + your app | Integrate into existing CX/voice platforms; use for analytics/QA |
2) Krisp AI Voice SDK: the “API” most builders actually need
Krisp describes its SDK as a collection of real-time AI-powered technologies that improve speech clarity in real-time communication applications such as conversational AI, conferencing, collaboration, streaming/podcasting, and mobile calls. The developer hub presents multiple SDK tracks and components.
Think of the Krisp Voice SDK as a set of audio filters + models that you wire into your application’s audio path. Instead of “send audio to the cloud and get audio back,” the design emphasizes low-latency, real-time use with the audio staying under your control.
2.1 What Krisp’s models do (plain English)
- Noise Cancellation (NC): removes background noise in real time. There are distinct models for outbound (microphone) and inbound (speaker) streams.
- Background Voice Cancellation (BVC): removes other people’s voices near the primary speaker (useful for call centers and shared offices).
- Accent Conversion (AC): converts an agent’s accent to a target accent in real time for clearer calls.
- Voice Isolation (VIVA): designed for voice AI agents—removes background voices/noise to reduce false interruptions and improve turn-taking/STT accuracy.
- Turn-Taking: detects likely end-of-turn moments to make AI agents respond more naturally and avoid awkward overlaps.
2.2 Where the SDK runs
Krisp’s docs outline server-side usage (especially for voice AI agents) as well as browser, desktop, and mobile environments. Because many components are C-library based, they can be integrated into a wide range of stacks, including voice agent frameworks and WebRTC pipelines.
Treat Krisp as an “audio processing layer” rather than a typical REST API. Your job is to place the filter at the right point in the audio chain (before your STT, before your agent’s VAD, before sending audio to the network, etc.).
3) SDK families: VIVA (voice agents) vs RTC/on-device (conferencing & calls)
Krisp documentation distinguishes between server-side VIVA components aimed at voice AI agents and RTC/on-device components aimed at real-time communications (RTC) scenarios like calls and meetings.
3.1 VIVA SDK (server-side) for voice AI agents
The VIVA SDK is positioned to improve turn-taking and STT accuracy in voice AI agents using small CPU-based models (voice isolation, turn-taking, VAD). The idea is simple: clean the inbound audio and stabilize the turn-taking signal so the agent interrupts less and understands more reliably.
A typical voice agent pipeline looks like this:
- Ingest audio (WebRTC, SIP, telephony, or a streaming transport)
- Apply voice isolation/noise/voice cancellation layer
- Run VAD + turn-taking detection
- Feed STT with cleaner audio
- Agent reasoning + response generation
- TTS output (optionally post-processed for clarity)
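The ordering in the steps above can be sketched as a simple per-frame pipeline. All function names here are hypothetical placeholders (toy stubs), not Krisp API calls; the point is the ordering: clean first, then VAD, then STT.

```javascript
// Conceptual voice-agent frame pipeline (stage names are illustrative stubs).
function isolateVoice(frame) { return frame; }  // stand-in for voice isolation/NC
function detectSpeech(frame) {                   // toy energy-threshold VAD
  return frame.some((s) => Math.abs(s) > 0.01);
}
function transcribe(frame) {                     // STT stub
  return frame.length > 0 ? "..." : "";
}

function processFrame(frame) {
  const clean = isolateVoice(frame);    // 1) clean the audio first
  const speaking = detectSpeech(clean); // 2) VAD sees cleaner speech
  if (!speaking) return null;           // 3) skip silent frames entirely
  return transcribe(clean);             // 4) STT gets the cleaned frame
}
```

The design choice to filter before VAD is what reduces false interruptions: the VAD never sees the background voices that would otherwise look like speech.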
3.2 RTC / on-device SDK for calls, meetings, and conferencing
In RTC scenarios, the SDK usually sits inside the client application (desktop/mobile/web) and processes microphone and/or speaker streams before they are transmitted or played. That’s how you can improve audio for end users without requiring them to install a separate audio device or system driver.
Krisp’s docs also highlight multiple integration guides for popular platforms and frameworks—particularly for web-based apps using a JavaScript SDK approach and for well-known RTC stacks.
4) Krisp models in detail: NC, BVC, and AC
Below is a developer-friendly explanation of the main RTC-oriented models as described in Krisp’s documentation. (Even if your ultimate goal is “just remove background noise,” it helps to understand the model boundaries and constraints.)
4.1 Noise Cancellation (NC)
Noise Cancellation is designed to remove background noise during real-time communication. Krisp’s documentation explicitly differentiates between outbound (microphone) NC and inbound (speaker) NC, reflecting that the acoustic and network realities are different for each direction.
- Outbound NC: optimized for near-field microphone conditions (close to the mouth). Performance varies in far-field environments depending on distance, echo, SNR, and device characteristics.
- Inbound NC: handles overlapping speakers and robustness against codec degradations and low bandwidth scenarios; it is designed for the “speaker” stream.
- De-reverberation: the NC process includes de-reverberation, aiming to reduce room echo during processing.
Practical implication: if you are building a call center desktop app, you may want outbound NC (mic) by default and consider inbound NC selectively depending on how noisy the agent’s environment is and whether the speaker stream is degraded.
4.2 Model sizes and latency thinking
Krisp documentation references small models (default) and big models (on demand) for noise cancellation, trading CPU cost for quality. From an engineering perspective, this is a classic quality-performance trade-off: you might pick small models for low-end hardware, and big models for high-end systems or specialized deployments.
On latency: real-time audio filters must be fast. Krisp’s docs illustrate that algorithmic latency depends on sampling rate and frame duration, and that with common settings like 10ms frames at 16kHz, the latency can be in the tens of milliseconds range. This matters because it directly affects how “natural” live conversations feel.
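As a worked example of the frame math (this is generic DSP arithmetic, not a Krisp-specific API; the lookahead count is an illustrative assumption):

```javascript
// Samples per frame = sampleRate * frameDurationMs / 1000.
function samplesPerFrame(sampleRateHz, frameDurationMs) {
  return (sampleRateHz * frameDurationMs) / 1000;
}

// If a model needs N frames of lookahead, its algorithmic latency
// is N * frameDurationMs, independent of CPU speed.
function algorithmicLatencyMs(frameDurationMs, lookaheadFrames) {
  return frameDurationMs * lookaheadFrames;
}

// 10 ms frames at 16 kHz -> 160 samples per frame;
// a hypothetical 3 frames of lookahead -> 30 ms of algorithmic latency.
```

This is why frame duration matters as much as raw model speed: every frame of lookahead adds a fixed delay that no amount of CPU can remove.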
4.3 Background Voice Cancellation (BVC)
BVC is built to remove other voices near the primary speaker and also remove background noise and reverberation. This is especially useful for “cross-talk” scenarios (a nearby coworker’s voice leaking into the agent mic). Krisp notes that BVC does not require user enrollment or training, and it is designed to work with headsets and earbuds.
In a real deployment, BVC becomes a differentiator when:
- Agents sit close to each other in a call center floor
- Remote workers share a room with other talking people
- Hybrid offices have constant voice noise from nearby desks
4.4 Accent Conversion (AC)
Accent Conversion is described as a real-time AI voice conversion model designed to neutralize accents in call center environments. Krisp’s documentation highlights that it converts specific accent groups to a target accent and that it also removes background noise and voices. For engineering planning, treat AC as a specialized feature with stricter hardware requirements and careful UX design considerations.
| Model | Primary goal | Best environments | Engineering considerations |
|---|---|---|---|
| NC (Outbound/Inbound) | Remove background noise | Calls, meetings, conferencing | Frame size, sampling rate, CPU budget, avoid double-processing |
| BVC | Remove other voices near mic | Call centers, shared offices | Device compatibility and sampling constraints; monitor voice suppression risk |
| AC | Accent conversion to target accent | Call center agent speech | Latency budgets, CPU requirements, user experience and transparency |
5) Getting started (server-side): VIVA + real-time filtering for voice agents
Server-side audio filtering is common when you build voice AI agents, because the agent stack often lives on servers. Krisp’s docs describe VIVA SDK as server-side and mention multiple language bindings, including Python, Node.js, Go, Rust, C++, and C.
Conceptually, you initialize the SDK, create a session/filter instance with the right configuration (sample rate, frame duration), then process audio frames in a loop. The SDK returns processed frames that you can forward to STT or any downstream signal processing.
// PSEUDOCODE (conceptual) — your actual code depends on Krisp's SDK bindings
initKrispSdk(licenseKeyOrPath);
const cfg = {
inputSampleRate: 16000,
frameDurationMs: 10,
// model path or model selection settings
};
const filter = createNoiseOrVoiceIsolationFilter(cfg);
while (streamHasFrames) {
const frame = readNextPcmFrame(); // e.g., Float32 or Int16 PCM
const clean = filter.process(frame); // processed frame, same duration
sendToStt(clean); // or to VAD, or to agent pipeline
}
filter.dispose();
shutdownKrispSdk();
This frame-loop style is the standard pattern for audio DSP and makes it easy to reason about latency: every frame you process adds some algorithmic overhead. If you keep frames small and keep CPU overhead under control, the conversation remains natural.
If your agent uses VAD and turn-taking, place voice isolation/noise cancellation before VAD so the VAD sees cleaner speech. If your agent uses STT, place filtering before the STT input to reduce WER.
6) Supported platforms & integrations (browser + RTC ecosystem)
Krisp’s platform documentation emphasizes broad deployment across servers, desktop, mobile, OS, browsers, and frameworks. The docs point out that because core components are C-library based, integration is possible across many audio stacks, and also reference specific frameworks where Krisp filters are built-in.
6.1 Supported OS (server-side)
Krisp’s server-side platforms list includes Linux (x64, armv8a), Windows (x64), and macOS (x64, armv8a). This is useful if you deploy voice agents on Linux servers, run local speech infrastructure on Windows, or ship macOS tooling.
6.2 Integration guides (web/RTC)
Krisp’s JavaScript SDK documentation highlights integration paths with popular communication and RTC frameworks and platforms. For web-based voice and calling, this matters because you usually want “drop-in noise cancellation” without rewriting your entire media stack.
- Twilio Voice: Krisp documents a guide for integrating with Twilio’s Audio Processor API (notable for web calling apps).
- Amazon Connect: guidance for contact center use cases.
- Jitsi: guidance for self-hosted conferencing.
- SIP.js: guidance for SIP endpoints in the browser.
- React: integration guide for React applications.
- Electron: guidance for desktop apps built with web technologies.
The big idea: Krisp tries to meet you where you already are—WebRTC, Twilio Voice, SIP, and “framework world.” That reduces integration time and improves adoption.
If you apply noise cancellation in multiple places (device, browser, agent, server), the second model may see “already processed” audio and produce worse results. Pick a single primary place to apply Krisp and measure quality end-to-end.
7) Licensing & security: what production teams must plan for
Krisp’s SDK documentation states that the SDK is a commercial product and that developers need to obtain a commercial license to integrate it into their products. That means licensing is not an afterthought—your architecture should include a clear plan for how licenses are provisioned, validated, and rotated.
7.1 Why licensing affects architecture
- Deployment model: server-side deployment vs on-device deployment affects how license verification is performed.
- Environment separation: you’ll likely need different keys/licenses for dev, staging, and production.
- Operational readiness: you should have monitoring and alerting for license verification failures (because that can break audio quality features in real time).
7.2 Security posture (as described in the SDK docs)
Krisp’s security documentation claims that Krisp does not access, collect, or store audio data, and that the SDK is designed to operate on the customer’s premises, with processing performed on the customer side. It also notes that the SDK accesses the network only for license verification needs and that metadata generation is activated only upon explicit instruction.
From a builder’s standpoint, this implies a privacy-friendly model where audio remains under your control. It also implies you must:
- Review how license verification networking behaves in your environment
- Decide whether optional metadata generation is enabled
- Document your privacy posture clearly to end users (especially in regulated industries)
If you are in a regulated environment (healthcare, finance, government), run a privacy review that includes data flow diagrams: where audio exists, where it is processed, what gets logged, and who can access outputs.
8) Krisp Meeting Assistant Webhook API: sending transcripts/notes/outlines to your system
Krisp also offers a Webhook API in its help documentation for automatically sending meeting outputs—like transcripts, notes, and outlines—to another tool or internal system via an HTTPS endpoint.
8.1 What events can be sent?
The help doc describes events that fire when:
- A transcript is created
- Notes are generated (key points/action items)
- An outline is generated
8.2 What you need before enabling
You typically need:
- A Webhook URL that can receive HTTPS POST requests
- Optional authentication headers (for example, a token) if your endpoint requires auth
8.3 Recommended webhook receiver design
Don’t treat webhooks as a “fire-and-forget magic pipe.” Make them durable:
- Verify authenticity: validate any provided signatures or shared secrets if available; otherwise use per-endpoint tokens and strict IP allowlists when practical.
- Idempotency: store an event id or hash so retries don’t create duplicates.
- Queue: immediately enqueue the payload and return 200 OK; process asynchronously.
- Observability: log event type + meeting id + timestamp; capture failures with alerts.
// Example webhook receiver (pseudo-Node/Express; `app`, `express`, and `queue` come from your stack)
const crypto = require("crypto");
app.use(express.json()); // ensure JSON bodies are parsed

app.post("/krisp/webhook", async (req, res) => {
  // 1) authenticate (token header, signature, etc.)
  const token = req.header("X-Webhook-Token");
  if (token !== process.env.KRISP_WEBHOOK_TOKEN) return res.status(401).send("Unauthorized");
  // 2) idempotency key: prefer an event id, fall back to a payload hash
  const eventId = req.body?.event_id
    || crypto.createHash("sha256").update(JSON.stringify(req.body)).digest("hex");
  // 3) enqueue for processing
  await queue.publish("krisp_events", { eventId, payload: req.body });
  // 4) acknowledge fast
  res.status(200).send("OK");
});
With this pattern, even if your downstream systems are slow or temporarily down, you can still accept events reliably and process them later.
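On the consumer side, a minimal sketch of idempotent processing looks like this. An in-memory set stands in for a durable store (Redis, a database table) — in production, the dedup check must survive restarts.

```javascript
// Deduplicate events by id before doing real work.
const seenEventIds = new Set(); // replace with a durable store in production

function handleEvent(event, processor) {
  if (seenEventIds.has(event.eventId)) {
    return "duplicate"; // redelivery/retry: safe to ignore
  }
  seenEventIds.add(event.eventId);
  processor(event.payload); // e.g., write notes to your CRM
  return "processed";
}
```

With this guard in place, webhook retries (which are normal, not exceptional) never create duplicate CRM records.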
9) Krisp Platform API (OAuth 2.0): subscriptions + meeting-note events
Krisp has documentation in a Postman public workspace for a “Krisp Platform API.” It describes a RESTful API using JSON output, with OAuth 2.0 authorization. The stated current capabilities focus on subscribing to meeting note-related events and triggering actions within a connected application.
9.1 OAuth 2.0 model (high-level)
OAuth-based APIs are designed for scenarios where an end user authorizes your app to access specific data. In practice:
- Your app sends the user to an authorization URL with requested scopes
- User grants access
- Your backend exchanges authorization code for an access token (and often a refresh token)
- Your backend calls Krisp API endpoints with the access token
The Postman docs show scopes for managing subscriptions and a “meetings read” scope that is described as necessary to subscribe to meeting-related updates. The docs also illustrate authorization code + PKCE concepts (code_verifier/code_challenge) and token endpoints for exchanging and refreshing tokens.
9.2 Subscriptions API: connect events to your webhook
The Postman documentation includes subscription endpoints such as:
- GET /subscriptions — list subscriptions
- GET /subscriptions/:id — get a subscription
- POST /subscriptions — create a subscription
- PUT /subscriptions/:id — update a subscription
- DELETE /subscriptions/:id — delete a subscription
For event types, the docs highlight meeting-note related events like summary_generated and action_items_generated.
A subscription payload can include the event type, a subscription type (webhook), and an HTTPS destination URL.
// Conceptual JSON body to create a webhook subscription
{
"event": "summary_generated",
"type": "webhook",
"details": { "url": "https://example.com/webhooks/krisp" },
"params": {},
"status": "enabled"
}
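If you call the create endpoint over HTTP, a small helper can assemble the request. The path and body fields mirror the Postman documentation described above, but the base URL here is a placeholder assumption, not a confirmed Krisp host — take the real one from the docs.

```javascript
// Build the HTTP request for creating a webhook subscription.
// Base URL is a placeholder; path/body shape follows the Postman docs above.
function buildCreateSubscriptionRequest(accessToken, eventType, webhookUrl) {
  return {
    method: "POST",
    url: "https://api.example-krisp-host.com/subscriptions", // placeholder
    headers: {
      Authorization: `Bearer ${accessToken}`, // OAuth access token
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      event: eventType,            // e.g., "summary_generated"
      type: "webhook",
      details: { url: webhookUrl }, // your HTTPS receiver
      params: {},
      status: "enabled",
    }),
  };
}
```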
9.3 When to use Platform API vs Meeting Webhook API
They can look similar (both can deliver events), but they solve different integration needs:
- Meeting Assistant Webhook API is configured in-product and is a fast path for “send meeting outputs to my system.”
- Platform API is developer/OAuth-driven and is suitable when you are building an application integration that many users will install, authorize, and manage with scopes.
If you’re building a multi-tenant SaaS integration, prefer OAuth + scoped subscriptions so each customer can grant and revoke access cleanly. If you’re building an internal integration for one organization, the in-product webhook approach may be simpler.
10) Krisp Speech-to-Text (STT) API: transcription for call centers & BPOs
Krisp announced a Speech-to-Text (STT) API positioned for call centers and BPO environments. The announcement emphasizes device-side processing, real-time redaction capabilities (PII/PCI), out-of-the-box compatibility with CX and voice platforms, and a focus on cost efficiency. It also cites a word error rate (WER) figure from evaluations across datasets.
10.1 How to think about STT API in your architecture
In call center systems, transcription is rarely the end goal. Usually you want transcription because it powers:
- Quality assurance (QA): scoring calls, coaching agents, and analyzing objection handling
- Compliance: detecting disclosures, required statements, and risky phrases
- Speech analytics: discovering themes, churn signals, customer intent, and sentiment proxies
- Agent assist: real-time prompts, knowledge lookup, recommended next steps
- Summaries and action items: structured notes for CRM
10.2 Real-time redaction and privacy expectations
If your business handles sensitive information (credit cards, addresses, account numbers), real-time redaction can be a major requirement. From an engineering viewpoint, design your pipeline so sensitive snippets are redacted before storage, indexing, or downstream analytics. Even if you trust your storage, least-privilege design is the safest route.
Define retention rules (how long audio and transcripts are kept), enforce role-based access, and ensure your analytics pipelines never store raw sensitive strings when redaction is enabled.
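As a toy illustration of the redact-before-store rule: strip card-number-like strings before a transcript ever reaches storage or analytics. Real PII/PCI detection is far more involved than a regex — treat this as a pattern sketch, not a compliance control.

```javascript
// Replace card-number-like digit runs (13-19 digits, optional space/dash
// separators) before the transcript is persisted. Very rough heuristic.
function redactCardNumbers(transcript) {
  return transcript.replace(/\b\d(?:[ -]?\d){12,18}\b/g, "[REDACTED]");
}
```

Applying this at ingestion time (rather than at query time) means indexes, logs, and downstream analytics never contain the raw sensitive string in the first place.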
11) Use cases: what you can build with the Krisp “API ecosystem”
The easiest way to understand Krisp’s developer surfaces is to map them to shipping products. Below are common build patterns, with notes on which Krisp surface is usually the best fit.
11.1 Voice AI agent with fewer interruptions
- Use: VIVA server-side voice isolation + turn-taking
- Why: voice isolation reduces cross-talk and background voices; turn-taking reduces false interrupts and awkward overlaps
- Pair with: your STT + LLM + TTS stack
11.2 Web calling app with “Krisp-quality audio”
- Use: JS/Browser SDK integrated into WebRTC (or via Twilio Voice audio processing)
- Why: improves mic stream quality before sending audio over the network
- Pair with: echo management, AGC, and your RTC platform’s jitter buffer/transport
11.3 Call center desktop app with cross-talk protection
- Use: BVC + NC in desktop/on-device stack
- Why: call center floors are voice-noisy; BVC helps suppress other agents’ speech leaking into the mic
- Pair with: monitoring of CPU usage and “quality modes” for different hardware tiers
11.4 Meeting summaries automatically sent to CRM
- Use: Meeting Assistant Webhook API (fastest) or OAuth Platform API (multi-tenant)
- Why: webhook events let you sync transcripts, notes, outlines, summaries, and action items into your systems
- Pair with: idempotent receiver, queue-based processing, and CRM schema mapping
11.5 Compliance monitoring and analytics from transcripts
- Use: STT API + analytics pipeline (or STT output from your platform)
- Why: transcription is the substrate for scoring, compliance checks, and speech analytics
- Pair with: PII/PCI redaction, strict access control, retention policies
12) Production architecture: reliability, latency, and “don’t make it worse” rules
Audio is unforgiving. A great demo can become a terrible product if you don’t manage latency, CPU budgets, and signal-chain complexity. Here are practical engineering rules to keep quality high.
12.1 Decide where audio processing happens
If you process in the browser, the user’s device bears the CPU cost and you reduce network dependency. If you process on the server, you standardize results and can centralize monitoring—but you pay for compute and you must manage stream transport. In many real products, you’ll do one primary layer (client or server) and keep the other layer minimal.
12.2 Avoid “double processing”
Applying noise cancellation multiple times can create warbling artifacts and unnatural voice gating. Pick a single main noise cancellation layer and ensure other layers are disabled or set to “raw” wherever possible.
12.3 Provide a quality mode switch
Because models can have different CPU costs (and Krisp mentions small vs big model variants), it’s smart to expose “Balanced” vs “High Quality” modes. You can default to balanced and allow “high quality” for high-end devices or when CPU headroom is available.
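One way to implement the switch is a simple tier-selection function; the CPU-headroom thresholds below are illustrative assumptions, not Krisp recommendations — calibrate them against your own measurements.

```javascript
// Pick a model tier from user preference and available CPU headroom.
// Thresholds are illustrative; tune them per hardware tier.
function pickQualityMode(cpuHeadroomPercent, userPreference) {
  if (userPreference === "high" && cpuHeadroomPercent >= 40) return "big-model";
  if (cpuHeadroomPercent >= 20) return "small-model";
  return "bypass"; // not enough headroom: pass audio through unprocessed
}
```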
12.4 Instrument and monitor
- CPU usage during calls (per core, per process)
- Audio frame processing time (p50/p95/p99)
- End-to-end latency (mic → remote hear)
- Dropouts and buffer underruns
- User-reported “robotic voice” incidents
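Frame-time percentiles like the p50/p95/p99 above can be computed over a rolling sample window; a minimal nearest-rank sketch:

```javascript
// Nearest-rank percentile over a window of frame-processing times (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // nearest-rank method
  return sorted[Math.max(0, rank - 1)];
}
// e.g., alert if percentile(frameTimesMs, 95) approaches your frame budget:
// with 10 ms frames, processing must reliably finish in well under 10 ms.
```

Watching p95/p99 rather than the average is what catches the intermittent stalls that users report as “robotic voice.”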
12.5 Make failure safe
If the filter fails (license verification issues, initialization error, incompatible device), your app should continue to function, just without enhancement. A “hard failure” that breaks the call is unacceptable.
Always have a safe fallback path to unprocessed audio. Don’t let “better audio” become “no audio.”
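The fallback rule can be enforced with a thin wrapper around the filter call; `filter.process` here is a hypothetical SDK call in the style of the earlier pseudocode, not a confirmed API name.

```javascript
// Wrap the enhancement filter so any failure degrades to raw audio
// instead of breaking the call.
function safeProcess(filter, frame, onError) {
  try {
    return filter.process(frame); // enhanced frame on the happy path
  } catch (err) {
    if (onError) onError(err);    // count/alert, but keep the call alive
    return frame;                 // fall back to the unprocessed frame
  }
}
```

Pair this with an error counter: a handful of fallbacks per call is a degraded-quality incident to investigate, not a dropped call.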
13) FAQ
Is Krisp an API or an SDK?
For most developer use cases, Krisp is best described as an SDK: you embed it into your app’s audio path to process audio in real time. There are also webhook and OAuth-based platform endpoints for delivering meeting outputs and subscribing to meeting-note events.
What’s the difference between Noise Cancellation (NC) and Background Voice Cancellation (BVC)?
NC targets non-speech background noise. BVC targets other human voices near the primary speaker (and also reduces noise and reverberation), which is especially useful in call centers and shared office environments.
Where should I apply Krisp in a voice AI agent?
Typically before VAD and before STT so those components see cleaner speech. This can reduce false interruptions and improve transcription quality.
How do Krisp webhooks work?
You configure an HTTPS endpoint and (optionally) authentication headers. Krisp posts meeting outputs to your endpoint when events occur (for example transcript created, notes generated, outline generated). Your endpoint should be idempotent and queue-based.
When should I use the OAuth Platform API?
Use it when you are building a real “integration app” that many users install and authorize, and you want scoped access and subscription management (create/list/update/delete subscriptions for meeting-note events).
Is Krisp SDK free?
Krisp’s SDK documentation describes it as a commercial product and indicates you need a commercial license to integrate it into your products.
14) Official resources to bookmark
- Krisp Developer Hub / SDK Docs (AI Voice SDK documentation and guides)
- Integrations docs (Twilio Voice, Amazon Connect, Jitsi, SIP.js, React, Electron)
- Krisp Platform API (Postman public workspace docs: OAuth2 + subscriptions)
- Krisp Meeting Assistant Webhook API (help center article)
- Krisp STT API announcement (blog post describing positioning and capabilities)