Meta is extending its "Segment Anything" approach, previously applied to images and video, into the audio domain. The new AI model, SAM Audio, can isolate individual sound sources from complex audio mixtures using text commands, temporal markers, or visual clicks.
According to Meta, the system is the first unified model capable of handling sound-separation tasks across these diverse input modalities. Instead of relying on separate tools for different use cases, it responds flexibly to whichever kind of prompt the user provides.
The system offers three interchangeable control methods. Users can enter text prompts—such as “barking dog” or “singing voice”—to extract specific sounds. They can click directly on objects or people in a video to capture corresponding audio. Alternatively, they can use time-based markers, known as span prompts, to identify segments where the target sound occurs.
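Meta has not published a public API for SAM Audio, but the three interchangeable control methods can be sketched as data types that a unified model would accept through a single entry point. The class and field names below are illustrative assumptions, not Meta's actual interface:

```python
from dataclasses import dataclass

# Hypothetical prompt types mirroring SAM Audio's three control methods.
# All names and fields here are assumptions for illustration only.

@dataclass
class TextPrompt:
    description: str      # e.g. "barking dog" or "singing voice"

@dataclass
class VisualPrompt:
    frame_index: int      # video frame the user clicked on
    x: float              # normalized click coordinates (0-1)
    y: float

@dataclass
class SpanPrompt:
    start_s: float        # start of the segment where the target sound occurs
    end_s: float          # end of that segment


def describe_prompt(prompt) -> str:
    """Route any of the three prompt kinds to one summary string,
    the way a unified model accepts them interchangeably."""
    if isinstance(prompt, TextPrompt):
        return f"isolate sound matching text: {prompt.description!r}"
    if isinstance(prompt, VisualPrompt):
        return (f"isolate sound of object clicked at "
                f"({prompt.x:.2f}, {prompt.y:.2f}) in frame {prompt.frame_index}")
    if isinstance(prompt, SpanPrompt):
        return f"isolate sound active from {prompt.start_s}s to {prompt.end_s}s"
    raise TypeError(f"unsupported prompt type: {type(prompt).__name__}")
```

The point of the single `describe_prompt` entry point is the design Meta highlights: one model, many prompt modalities, rather than a separate tool per use case.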
Potential applications span music production, podcasting, and film editing—such as removing traffic noise from outdoor footage or isolating instruments within a recording.