January 28, 2026

Introducing SAM Audio: The First Unified Multimodal Model for Audio Separation

SAM Audio transforms audio processing by making it easy to isolate any sound from complex audio mixtures using natural, multimodal prompts — whether...

TL;DR

  • SAM Audio is a new unified multimodal model for audio separation using text, visual, or time segment prompts.
  • It is powered by the Perception Encoder Audiovisual (PE-AV) engine, which builds on the Perception Encoder model.
  • SAM Audio-Bench, the first in-the-wild audio separation benchmark, is introduced alongside the model.
  • SAM Audio Judge is a new automatic model that evaluates audio separation quality against perceptual criteria.
  • The model achieves state-of-the-art performance, processes audio faster than real time, and supports multimodal prompting.
  • Limitations: audio cannot yet be used as a prompt, and highly similar sounds remain difficult to separate.
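Speed claims such as "faster than real time" in the list above are commonly quantified with the real-time factor (RTF): processing time divided by the duration of the audio processed, where an RTF below 1 means the system finishes before the audio would have finished playing. The minimal sketch below shows only that arithmetic. The function names and the 12-second/60-second figures are illustrative assumptions, not SAM Audio measurements or part of any SAM Audio API.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return the real-time factor: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_faster_than_real_time(processing_seconds: float, audio_seconds: float) -> bool:
    """A system runs faster than real time when its RTF is strictly below 1."""
    return real_time_factor(processing_seconds, audio_seconds) < 1.0


# Hypothetical numbers for illustration only (not measured SAM Audio timings):
# separating a 60-second clip in 12 seconds gives an RTF of 0.2.
rtf = real_time_factor(12.0, 60.0)
print(f"RTF = {rtf:.2f}, faster than real time: {is_faster_than_real_time(12.0, 60.0)}")
```

An RTF of 0.2 means one second of audio takes about 0.2 seconds to process; an RTF above 1 would mean the system falls behind a live stream.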