🎤 Speaker Separation with Voice Activity Detection

Separate mixed audio into individual speakers

Choose between standard separation or separation with Voice Activity Detection (VAD).

📝 Example Audio

Select an example audio file below, or upload your own!

🎵 Input

Select Example Audio

📁 Upload or 🎙️ Record Mixed Audio


🤖 Model Selection

Models with VAD also report, for each separated speaker, when that speaker is active

🔧 Separation Only
🚀 Separation With VAD

📊 Processing Status

🎧 Separated Audio Outputs

👤 Speaker 1

👤 Speaker 2

📊 Audio Spectrograms

🎵 Mixed Audio Spectrogram

👤 Speaker 1 Spectrogram (with VAD overlay)

👤 Speaker 2 Spectrogram (with VAD overlay)

📋 Instructions:

Upload an audio file or record directly using the microphone
Select your preferred model (with or without VAD)
If using VAD, adjust the threshold as needed
Click "Separate Speakers" to process
Download the separated audio files and view the spectrograms
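
The effect of the VAD threshold in the steps above can be illustrated with a minimal sketch. It assumes the model emits one speech probability per frame for each speaker; the function name `vad_segments`, the example probabilities, and the default threshold of 0.5 are hypothetical and are not taken from the app.

```python
import numpy as np

def vad_segments(probs, threshold=0.5):
    """Return (start_frame, end_frame) pairs where probs >= threshold.

    start_frame is inclusive, end_frame is exclusive.
    Assumption: `probs` holds one speech probability per frame.
    """
    active = np.asarray(probs) >= threshold
    # Pad with False on both sides so every active run has a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

# Hypothetical per-frame probabilities for one speaker.
probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.9, 0.2]
loose = vad_segments(probs, threshold=0.5)
strict = vad_segments(probs, threshold=0.85)
print(loose)
print(strict)
```

Raising the threshold keeps only the frames the detector is most confident about, so active regions shrink or split; lowering it has the opposite effect.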

🔧 Technical Notes:

Audio is automatically resampled to 16 kHz
For multi-channel audio, only the first channel is used
Spectrograms: Show frequency content over time with VAD activity highlighted
VAD Overlay: A white line at the top indicates when the speaker is active
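
The input handling described in the notes above (first channel only, resampling to 16 kHz) can be sketched as follows. This is an illustration under stated assumptions, not the app's actual code: the function name `to_mono_16k` and the `(samples, channels)` array layout are hypothetical, and plain linear interpolation stands in for the resampler, whereas a production pipeline would more likely use a band-limited (anti-aliased) resampler.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate stated in the technical notes

def to_mono_16k(audio, sr, target_sr=TARGET_SR):
    """Keep the first channel and resample to target_sr.

    Assumptions: `audio` has shape (samples,) or (samples, channels).
    Linear interpolation is used for brevity; it does not low-pass filter
    before downsampling, so it can alias on real recordings.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio[:, 0]  # multi-channel input: use the first channel only
    if sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / sr))
    t_in = np.arange(len(audio)) / sr        # input sample times in seconds
    t_out = np.arange(n_out) / target_sr     # output sample times in seconds
    return np.interp(t_out, t_in, audio).astype(np.float32)

# One second of hypothetical stereo audio at 48 kHz: channel 0 is constant, channel 1 is silent.
stereo = np.stack([np.ones(48_000), np.zeros(48_000)], axis=1)
mono = to_mono_16k(stereo, sr=48_000)
print(mono.shape)
```

Because only channel 0 survives, a recording whose speech sits mainly in another channel should be downmixed or re-ordered before upload.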

📖 Reference

Opochinsky, R., Moradi, M., & Gannot, S. (2025).
Single-microphone speaker separation and voice activity detection in noisy and reverberant environments.
EURASIP Journal on Audio, Speech, and Music Processing, 2025(1), 18. Springer.

📋 BibTeX Citation

@article{opochinsky2025single,
  title={Single-microphone speaker separation and voice activity detection in noisy and reverberant environments},
  author={Opochinsky, Renana and Moradi, Mordehay and Gannot, Sharon},
  journal={EURASIP Journal on Audio, Speech, and Music Processing},
  volume={2025},
  number={1},
  pages={18},
  year={2025},
  publisher={Springer}
}