January 28, 2026
Introducing SAM Audio: The First Unified Multimodal Model for Audio Separation
SAM Audio transforms audio processing by making it easy to isolate any sound from complex audio mixtures using natural, multimodal prompts — whether...

TL;DR
- SAM Audio is a new unified multimodal model for audio separation that accepts text, visual, or time-segment prompts.
- It is powered by the Perception Encoder Audiovisual (PE-AV) engine, an advancement of the Perception Encoder model.
- SAM Audio-Bench is introduced as the first in-the-wild audio separation benchmark.
- SAM Audio Judge is a new automatic model for evaluating audio separation quality based on perceptual criteria.
- The model delivers state-of-the-art performance, runs faster than real time, and supports multimodal prompting.
- Limitations include no support for audio as a prompt and challenges in separating highly similar sounds.