Background speech denoising

Overview

Background speech denoising makes conversations clearer by filtering out unwanted sounds while users speak. Vapi offers two complementary denoising technologies: Smart Denoising, which works on its own, and experimental Fourier denoising, which is designed to be layered on top of Smart Denoising.

In this guide, you’ll learn to:

Enable Smart Denoising using Krisp technology (recommended for most users)
Configure experimental Fourier denoising with customizable parameters
Combine both methods for enhanced noise reduction
Fine-tune settings for different environments

For most use cases, Smart Denoising alone provides excellent results. Fourier denoising is a highly experimental feature that requires significant tuning and may not work well in all environments.

Denoising methods

Smart Denoising (Krisp)

Smart Denoising uses Krisp’s AI-powered technology to remove background noise in real time. This method is highly effective for common noise sources such as:

Keyboard typing
Background conversations
Traffic and street noise
Air conditioning and fans
Pet sounds

Fourier Denoising (Experimental)

Fourier denoising uses frequency-domain filtering to remove consistent background noise. This experimental method offers fine-grained control through multiple parameters and includes automatic media detection for TV/music/radio backgrounds.

Fourier denoising is highly experimental and comes with significant limitations:

Requires extensive tweaking to work properly
May not work well in all audio environments (e.g., when headphones are used)
Can introduce audio artifacts or distortions
Should only be used when Smart Denoising alone is insufficient

For most users, Smart Denoising should be sufficient. Only proceed with Fourier denoising if you have specific requirements and are prepared to test extensively.

Configuration

Background speech denoising is configured through the backgroundSpeechDenoisingPlan property on your assistant:

import { VapiClient } from "@vapi-ai/server-sdk";

const vapi = new VapiClient({
  token: process.env.VAPI_API_KEY
});

const assistant = await vapi.assistants.create({
  name: "Customer Support",
  backgroundSpeechDenoisingPlan: {
    // Enable Smart Denoising
    smartDenoisingPlan: {
      enabled: true
    },
    // Enable Fourier Denoising (optional)
    fourierDenoisingPlan: {
      enabled: true,
      mediaDetectionEnabled: true,
      staticThreshold: -35,
      baselineOffsetDb: -15,
      windowSizeMs: 3000,
      baselinePercentile: 85
    }
  }
});

Smart Denoising configuration

Smart Denoising has a simple on/off configuration:

smartDenoisingPlan.enabled

boolean, defaults to false

Enable or disable Krisp-powered smart denoising

Example: Smart Denoising only

const assistant = await vapi.assistants.create({
  name: "Support Agent",
  backgroundSpeechDenoisingPlan: {
    smartDenoisingPlan: {
      enabled: true
    }
  }
});

Fourier Denoising configuration

Fourier denoising offers multiple parameters for fine-tuning:

fourierDenoisingPlan.enabled

boolean, defaults to false

Enable or disable experimental Fourier denoising

fourierDenoisingPlan.mediaDetectionEnabled

boolean, defaults to true

Automatically detect and filter consistent background media (TV/music/radio)

fourierDenoisingPlan.staticThreshold

number, defaults to -35

Fallback threshold in dB when no baseline is established (-80 to 0)

fourierDenoisingPlan.baselineOffsetDb

number, defaults to -15

How far below the rolling baseline to filter audio, in dB (-30 to -5)

Values closer to zero (e.g., -10) = more aggressive filtering
Values further from zero (e.g., -20) = more conservative filtering
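To make the effect of the two example values concrete, here is a minimal sketch of how an offset relative to a baseline could translate into a cut-off level. It assumes the cut-off is simply the baseline plus the offset and that frames quieter than the cut-off are filtered. Vapi's internal algorithm is not described in this guide, so treat this as an illustration only; `cutoffDb` and `isFiltered` are hypothetical names.

```typescript
// Illustrative only: assumes cutoff = baseline + offset (both in dB).
// This is not Vapi's implementation; the function names are hypothetical.
function cutoffDb(baselineDb: number, baselineOffsetDb: number): number {
  return baselineDb + baselineOffsetDb;
}

// Under this assumption, a frame quieter than the cutoff is filtered out.
function isFiltered(frameDb: number, baselineDb: number, baselineOffsetDb: number): boolean {
  return frameDb < cutoffDb(baselineDb, baselineOffsetDb);
}

const baseline = -20; // example rolling baseline, in dB
const frame = -33;    // example background frame, in dB

console.log(isFiltered(frame, baseline, -10)); // cutoff is -30 dB, so the frame is filtered
console.log(isFiltered(frame, baseline, -20)); // cutoff is -40 dB, so the frame passes through
```

With an offset of -10 the cut-off sits closer to the baseline, so more of the quieter audio falls below it. That is why -10 behaves more aggressively than -20 under this model.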

fourierDenoisingPlan.windowSizeMs

number, defaults to 3000

Rolling window size in milliseconds for baseline calculation (1000 to 30000)

Larger windows = slower adaptation, more stability
Smaller windows = faster adaptation, less stability

fourierDenoisingPlan.baselinePercentile

number, defaults to 85

Percentile for baseline calculation (1 to 99)

Higher percentiles (e.g., 85) = focus on louder speech
Lower percentiles (e.g., 50) = include quieter speech
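The interaction between windowSizeMs and baselinePercentile can be pictured with a small sketch: collect the levels observed inside a rolling window, then take a percentile of them as the baseline. This is an assumed model for illustration, not Vapi's implementation. The nearest-rank percentile method and all names here (`LevelSample`, `percentile`, `rollingBaselineDb`) are hypothetical.

```typescript
// Illustrative sketch only; not Vapi's implementation. All names are hypothetical.
interface LevelSample {
  timeMs: number;  // when the frame was observed
  levelDb: number; // measured level of the frame, in dB
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Baseline over the samples inside the rolling window that ends at nowMs.
// Returns undefined when the window is empty, which is the situation where
// a fallback such as staticThreshold would apply.
function rollingBaselineDb(
  samples: LevelSample[],
  nowMs: number,
  windowSizeMs: number,
  baselinePercentile: number
): number | undefined {
  const inWindow = samples
    .filter((s) => s.timeMs > nowMs - windowSizeMs && s.timeMs <= nowMs)
    .map((s) => s.levelDb);
  return inWindow.length > 0 ? percentile(inWindow, baselinePercentile) : undefined;
}

const samples: LevelSample[] = [
  { timeMs: 0, levelDb: -50 },
  { timeMs: 1000, levelDb: -45 },
  { timeMs: 2000, levelDb: -20 },
  { timeMs: 3000, levelDb: -18 },
  { timeMs: 4000, levelDb: -40 },
];

console.log(rollingBaselineDb(samples, 4000, 5000, 85)); // -18: a high percentile tracks the loudest frames
console.log(rollingBaselineDb(samples, 4000, 5000, 50)); // -40: the median sits among quieter frames
```

A larger window keeps older samples in the calculation, so a single loud or quiet moment moves the baseline less. A smaller window reacts faster but is more easily swayed.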

Example: Adding Fourier Denoising to Smart Denoising

const assistant = await vapi.assistants.create({
  name: "Call Center Agent",
  backgroundSpeechDenoisingPlan: {
    // Always enable Smart Denoising first
    smartDenoisingPlan: {
      enabled: true
    },
    // Add Fourier Denoising for additional filtering
    fourierDenoisingPlan: {
      enabled: true,
      mediaDetectionEnabled: true,
      // More aggressive filtering for noisy environments
      baselineOffsetDb: -10,
      // Faster adaptation for dynamic environments
      windowSizeMs: 2000,
      // Focus on louder, clearer speech
      baselinePercentile: 90
    }
  }
});
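Because each Fourier parameter has a documented range, a small client-side check can catch out-of-range values before a plan is sent. The sketch below is one way to do that. The `FourierDenoisingPlan` interface is a local mirror of the documented fields (an assumption, not a type imported from the SDK), and `validateFourierPlan` is a hypothetical helper.

```typescript
// Local mirror of the documented fields; an assumption, not imported from the SDK.
interface FourierDenoisingPlan {
  enabled?: boolean;
  mediaDetectionEnabled?: boolean;
  staticThreshold?: number;
  baselineOffsetDb?: number;
  windowSizeMs?: number;
  baselinePercentile?: number;
}

// Ranges as stated in the parameter reference in this guide.
const RANGES = {
  staticThreshold: [-80, 0],
  baselineOffsetDb: [-30, -5],
  windowSizeMs: [1000, 30000],
  baselinePercentile: [1, 99],
} as const;

// Returns a list of human-readable problems; an empty list means every
// supplied numeric field is within its documented range.
function validateFourierPlan(plan: FourierDenoisingPlan): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(RANGES) as (keyof typeof RANGES)[]) {
    const value = plan[key];
    if (value === undefined) continue; // omitted fields are left to their defaults
    const [min, max] = RANGES[key];
    if (!Number.isFinite(value) || value < min || value > max) {
      problems.push(`${key} must be between ${min} and ${max}, got ${value}`);
    }
  }
  return problems;
}

console.log(validateFourierPlan({ enabled: true, baselineOffsetDb: -10, windowSizeMs: 2000 }));
console.log(validateFourierPlan({ enabled: true, baselineOffsetDb: -40, baselinePercentile: 100 }));
```

The first call returns an empty list. The second reports both baselineOffsetDb and baselinePercentile as out of range.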

Combined denoising

For maximum noise reduction, combine both methods. Processing order:

Smart Denoising (Krisp) processes the incoming audio first
Fourier Denoising then processes the Krisp output

Environment-specific configurations

Quiet office environment

Minimal speech denoising for clear environments:

const assistant = await vapi.assistants.create({
  name: "Office Assistant",
  backgroundSpeechDenoisingPlan: {
    smartDenoisingPlan: {
      enabled: true
    }
    // No Fourier denoising needed
  }
});

Noisy call center

Aggressive filtering for high-noise environments:

const assistant = await vapi.assistants.create({
  name: "Call Center Agent",
  backgroundSpeechDenoisingPlan: {
    smartDenoisingPlan: {
      enabled: true
    },
    fourierDenoisingPlan: {
      enabled: true,
      mediaDetectionEnabled: true,
      baselineOffsetDb: -10, // Aggressive filtering
      windowSizeMs: 2000,    // Fast adaptation
      baselinePercentile: 90 // Focus on clear speech
    }
  }
});

Home environment with TV/music

Optimized for media background noise:

const assistant = await vapi.assistants.create({
  name: "Home Assistant",
  backgroundSpeechDenoisingPlan: {
    smartDenoisingPlan: {
      enabled: true
    },
    fourierDenoisingPlan: {
      enabled: true,
      mediaDetectionEnabled: true, // Essential for TV/music
      baselineOffsetDb: -15,
      windowSizeMs: 4000,
      baselinePercentile: 80
    }
  }
});
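If you deploy assistants into several of these environments, the three configurations above can be kept in one place as a lookup table. The following is a sketch under that assumption. The environment keys, the `DenoisingPlan` interface, and `planFor` are hypothetical names; the parameter values are copied from the examples in this section.

```typescript
// Hypothetical preset table; values are copied from the three example configurations above.
type Environment = "quietOffice" | "noisyCallCenter" | "homeWithMedia";

interface DenoisingPlan {
  smartDenoisingPlan: { enabled: boolean };
  fourierDenoisingPlan?: {
    enabled: boolean;
    mediaDetectionEnabled: boolean;
    baselineOffsetDb: number;
    windowSizeMs: number;
    baselinePercentile: number;
  };
}

const PRESETS: Record<Environment, DenoisingPlan> = {
  quietOffice: {
    smartDenoisingPlan: { enabled: true },
  },
  noisyCallCenter: {
    smartDenoisingPlan: { enabled: true },
    fourierDenoisingPlan: {
      enabled: true,
      mediaDetectionEnabled: true,
      baselineOffsetDb: -10,
      windowSizeMs: 2000,
      baselinePercentile: 90,
    },
  },
  homeWithMedia: {
    smartDenoisingPlan: { enabled: true },
    fourierDenoisingPlan: {
      enabled: true,
      mediaDetectionEnabled: true,
      baselineOffsetDb: -15,
      windowSizeMs: 4000,
      baselinePercentile: 80,
    },
  },
};

// Returns a deep copy so callers can tweak a plan without changing the shared preset.
function planFor(environment: Environment): DenoisingPlan {
  return JSON.parse(JSON.stringify(PRESETS[environment])) as DenoisingPlan;
}

console.log(planFor("noisyCallCenter").fourierDenoisingPlan?.baselineOffsetDb); // -10
```

The returned object has the same shape as the backgroundSpeechDenoisingPlan values used in the examples, so it can be passed in that position when creating an assistant.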

Best practices

For most users, Smart Denoising alone is the recommended solution. It handles the vast majority of common noise scenarios effectively without configuration complexity. Only consider adding Fourier denoising if you have specific requirements that Smart Denoising cannot address.

When to use each method

Smart Denoising only:

General-purpose noise reduction
Unpredictable noise patterns
When simplicity is preferred

Smart Denoising + Fourier Denoising:

Maximum noise reduction required
Consistent background noise that Smart Denoising alone cannot fully handle
Complex acoustic environments with media (TV/music/radio)
Premium user experiences requiring fine-tuned control
Willing to invest time in testing and tuning
Not using headphones (Fourier may cause issues with headphone audio)

Fourier Denoising should never be used alone. It’s designed to complement Smart Denoising by providing additional filtering after Krisp has done the initial noise reduction.
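One way to enforce this rule in your own code is to build the plan through a small helper that refuses a Fourier-only configuration. The sketch below is an assumption about how you might structure that. `buildDenoisingPlan` and its option names are hypothetical, and the check runs only on the client side; it says nothing about what the API itself accepts or rejects.

```typescript
// Hypothetical helper; the output keys match the examples in this guide.
interface FourierOptions {
  mediaDetectionEnabled?: boolean;
  staticThreshold?: number;
  baselineOffsetDb?: number;
  windowSizeMs?: number;
  baselinePercentile?: number;
}

interface DenoisingPlan {
  smartDenoisingPlan: { enabled: boolean };
  fourierDenoisingPlan?: FourierOptions & { enabled: boolean };
}

// Builds a plan, rejecting Fourier denoising without Smart Denoising.
function buildDenoisingPlan(smart: boolean, fourier?: FourierOptions): DenoisingPlan {
  if (fourier && !smart) {
    throw new Error("Fourier denoising is meant to run after Smart Denoising; enable Smart Denoising as well.");
  }
  const plan: DenoisingPlan = { smartDenoisingPlan: { enabled: smart } };
  if (fourier) {
    plan.fourierDenoisingPlan = { ...fourier, enabled: true };
  }
  return plan;
}

console.log(buildDenoisingPlan(true));                           // Smart Denoising only
console.log(buildDenoisingPlan(true, { baselineOffsetDb: -10 })); // both methods
```

Calling `buildDenoisingPlan(false, { baselineOffsetDb: -10 })` throws, which surfaces the misconfiguration during development rather than in a live call.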

Performance considerations

Audio quality: Aggressive filtering may affect voice quality. Test different settings to find the right balance between noise reduction and natural speech preservation.

Testing recommendations

Test in your target environment
Start with default settings
Adjust parameters incrementally
Monitor user feedback
A/B test different configurations

Troubleshooting Fourier denoising

Voice sounds robotic or distorted

Reduce filtering aggressiveness:

Move baselineOffsetDb further from zero (e.g., -20 instead of -15)
Decrease baselinePercentile (e.g., 75 instead of 85)
Try Smart Denoising only

Background noise still audible

Increase filtering:

Enable both denoising methods
Move baselineOffsetDb closer to zero (e.g., -12 instead of -15)
Ensure mediaDetectionEnabled is true for TV/music

Speech cutting out intermittently

Adjust detection sensitivity:

Increase windowSizeMs for more stability
Adjust staticThreshold if a baseline isn’t being established
Check whether the user’s voice level is consistent
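The adjustments suggested in this troubleshooting section can be expressed as small, clamped steps so that repeated tuning never leaves the documented ranges. The helpers below are a sketch, not a Vapi feature. `soften`, `strengthen`, and `stabilize` are hypothetical names, and the step sizes are arbitrary starting points chosen to reproduce the example values given above.

```typescript
// Hypothetical tuning helpers; step sizes are arbitrary starting points, not Vapi recommendations.
interface FourierTuning {
  baselineOffsetDb: number;   // documented range: -30 to -5
  windowSizeMs: number;       // documented range: 1000 to 30000
  baselinePercentile: number; // documented range: 1 to 99
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Voice sounds robotic or distorted: use -20 rather than -15, and a lower percentile.
function soften(t: FourierTuning): FourierTuning {
  return {
    ...t,
    baselineOffsetDb: clamp(t.baselineOffsetDb - 5, -30, -5),
    baselinePercentile: clamp(t.baselinePercentile - 10, 1, 99),
  };
}

// Background noise still audible: use -12 rather than -15.
function strengthen(t: FourierTuning): FourierTuning {
  return { ...t, baselineOffsetDb: clamp(t.baselineOffsetDb + 3, -30, -5) };
}

// Speech cutting out intermittently: widen the rolling window for a steadier baseline.
function stabilize(t: FourierTuning): FourierTuning {
  return { ...t, windowSizeMs: clamp(t.windowSizeMs + 1000, 1000, 30000) };
}

const defaults: FourierTuning = { baselineOffsetDb: -15, windowSizeMs: 3000, baselinePercentile: 85 };

console.log(soften(defaults));     // offset -20, percentile 75
console.log(strengthen(defaults)); // offset -12
console.log(stabilize(defaults));  // window 4000 ms
```

Each helper returns a new object, so you can keep the previous settings around for comparison while testing one change at a time.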