Meta has expanded its open-source Segment Anything computer vision model suite with the launch of SAM 3 and SAM 3D, significantly enhancing object recognition and 3D reconstruction capabilities.
According to Meta, Segment Anything 3 (SAM 3) can detect and track objects in images and videos using text prompts, while SAM 3D generates highly realistic 3D representations of objects or people from any input image.
Both SAM 3 and SAM 3D fall under the category of "image segmentation" models. Image segmentation, a subfield of computer vision, is the task of identifying and delineating specific objects within images or video frames. The technology is widely used in applications ranging from satellite imagery analysis to photo editing.
Meta is widely recognized as a leader in image segmentation. The original Segment Anything Model (SAM) debuted in April 2023 alongside a massive dataset containing millions of annotated object images, designed to support the open-source AI research community.
A Breakthrough in Object Recognition
SAM 3 builds upon the foundation of the original SAM model, offering improved accuracy in detecting, segmenting, and tracking individual objects in both images and videos. It also enables users to manipulate segmented objects through detailed text prompts. For instance, a user could upload a photo of themselves wearing a blue shirt and ask the model to change the shirt to red.
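Concretely, once an object has been segmented, downstream edits can be confined to the masked pixels. The snippet below is a minimal, self-contained illustration of that idea using Pillow and NumPy; it is not Meta's editing pipeline, and the boolean mask is assumed to have come from a segmentation model such as SAM 3.

```python
import numpy as np
from PIL import Image

def recolor_masked_region(image: Image.Image, mask: np.ndarray, hue_shift: int) -> Image.Image:
    """Shift the hue of pixels inside a boolean HxW mask, leaving everything else untouched.

    The mask is assumed to come from a segmentation model (e.g. the pixels of a blue shirt);
    a hue_shift of roughly +85 on Pillow's 0-255 hue scale moves blue toward red.
    """
    hsv = np.array(image.convert("RGB").convert("HSV"))
    hue = hsv[..., 0].astype(int)
    hsv[..., 0] = np.where(mask, (hue + hue_shift) % 256, hue).astype(np.uint8)
    return Image.fromarray(hsv, mode="HSV").convert("RGB")
```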
Meta describes this as a significant advancement, noting that AI models have historically struggled to link natural language inputs with specific visual elements in images or videos. While many existing models can segment broad categories like “bus” or “car,” they often rely on limited sets of predefined text labels and fail to interpret more nuanced descriptions such as “yellow school bus.”
SAM 3 overcomes these limitations by supporting a much broader range of descriptive inputs. When prompted with “red baseball cap,” the model accurately segments all matching items in an image or video. Moreover, it can be integrated with multimodal large language models to interpret complex instructions like “a seated person who is not wearing a red baseball cap.”
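Meta's announcement does not spell out SAM 3's programming interface, so as an illustration of what open-vocabulary, text-prompted segmentation looks like in code, the sketch below uses the publicly available CLIPSeg model from Hugging Face's transformers library as a stand-in; the image filename and prompts are placeholders.

```python
# Illustrative only: CLIPSeg is a stand-in for text-prompted segmentation,
# not Meta's SAM 3 API.
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("street_scene.jpg")  # placeholder path to any RGB photo
prompts = ["red baseball cap", "yellow school bus"]

inputs = processor(text=prompts, images=[image] * len(prompts),
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# One low-resolution heatmap per prompt; threshold to get binary masks.
masks = torch.sigmoid(outputs.logits) > 0.5  # shape: (num_prompts, 352, 352)
```

Unlike this stand-in, SAM 3 also tracks the matching instances across video frames rather than segmenting a single still image.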
Meta believes SAM 3 unlocks new possibilities for photo and video editing tools, as well as creative media applications. The company is already testing the model in its new AI-powered video creation app, Edits, where it plans to introduce special effects that users can apply to specific objects or people in videos. Additionally, SAM 3 will be integrated into Vibes—a TikTok-like platform for short-form and AI-generated videos.
Reconstructing Objects and Humans in 3D
SAM 3D takes the segmentation capabilities of SAM 3 a step further by not only identifying but also reconstructing the 3D geometry of detected objects, humans, and animals. For example, someone with a photograph of a late relative could use SAM 3D to generate a lifelike 3D model and place it into virtual reality environments or digital videos, according to Meta.
SAM 3D is powered by two specialized models: SAM 3D Objects for reconstructing general objects and scenes, and SAM 3D Body for accurately estimating human body shapes and proportions to create realistic 3D avatars.
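Meta has not published interface details for SAM 3D in this announcement, so the following is only a structural sketch with hypothetical function and class names: a text-prompted segmenter proposes instance masks, masked objects are routed to a general reconstruction model (the role SAM 3D Objects plays), and detected people are routed to a body-shape model (the role SAM 3D Body plays).

```python
# Hypothetical sketch: every name here (segment_instances, reconstruct_object,
# reconstruct_body, Mesh) is a placeholder, not Meta's released API.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Mesh:
    vertices: np.ndarray   # (V, 3) vertex positions
    faces: np.ndarray      # (F, 3) vertex indices

def segment_instances(image: np.ndarray, prompt: str) -> List[np.ndarray]:
    """Stand-in for a text-prompted segmenter: returns one boolean mask per matching instance."""
    raise NotImplementedError

def reconstruct_object(image: np.ndarray, mask: np.ndarray) -> Mesh:
    """Stand-in for single-image object reconstruction (the SAM 3D Objects role)."""
    raise NotImplementedError

def reconstruct_body(image: np.ndarray, mask: np.ndarray) -> Mesh:
    """Stand-in for human shape and pose estimation (the SAM 3D Body role)."""
    raise NotImplementedError

def image_to_meshes(image: np.ndarray) -> List[Mesh]:
    """Route detected people to the body model and other objects to the general model."""
    meshes = [reconstruct_body(image, m) for m in segment_instances(image, "person")]
    meshes += [reconstruct_object(image, m) for m in segment_instances(image, "furniture")]
    return meshes
```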
Meta sees significant potential for SAM 3D across diverse fields such as robotics, scientific research, sports medicine, and creative industries. The technology can facilitate the creation of immersive 3D virtual worlds and augmented reality experiences based on real-world subjects, as well as streamline asset generation for video games. It also holds promise for AI-assisted 3D modeling workflows.
As part of its ongoing efforts, Meta is leveraging SAM 3D to power a new “Room View” feature on Facebook Marketplace. Shoppers browsing home decor items—such as lamps, tables, or chairs—can now visualize how these products would look in their own living rooms before making a purchase.
Both models are accessible via Meta’s new Segment Anything Playground, which requires no technical expertise to use. Users can simply upload images or videos and enter text prompts to isolate specific objects. With SAM 3D, they can even explore scenes from multiple angles and virtually rearrange objects or overlay dynamic effects like motion trails.
In line with its commitment to open research, Meta is releasing SAM 3 to the academic and developer communities, providing model weights, source code, a new evaluation benchmark, an open-vocabulary segmentation dataset, and a detailed research paper outlining the model’s architecture.
While SAM 3D is not yet fully open-sourced, Meta plans to share model checkpoints and inference code alongside a new 3D reconstruction benchmark. A comprehensive training dataset featuring diverse images and objects will also be made available to support further innovation in the field.