Meta AI's Segment Anything Model (SAM) is a computer vision system that can segment any object in an image from a single click. The model generalizes zero-shot to unfamiliar objects and images, with no additional training required.
SAM is designed to accept a variety of input prompts, including interactive points, bounding boxes and text, which means you can use it for a variety of segmentation tasks. For example, you can ask SAM to automatically segment everything in an image or to generate multiple masks for ambiguous prompts. Its promptable design means it can be integrated with other systems, for example, accepting user input from AR/VR headsets to select objects.
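As a minimal sketch of the interactive-point workflow, the snippet below uses the official segment_anything package from the GitHub repository; the checkpoint filename, image path, and click coordinates are placeholders you would swap for your own.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (ViT-H here; the filename is a placeholder for a downloaded checkpoint).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_checkpoint.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB image as an HxWx3 uint8 array.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click (label 1). multimask_output=True returns several
# candidate masks for ambiguous prompts, each with a predicted quality score.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]
```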
The output masks from SAM can be used as input to other AI systems for tasks like tracking objects in video, editing images, creating 3D objects or collaging. The model's abilities are made possible by training on more than 11 million images and 1 billion masks using its interactive data engine.
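As one hedged illustration of feeding SAM's output into an editing step, the few lines below cut the selected object out of the image with plain NumPy and OpenCV; `image` and `best_mask` are assumed to come from the snippet above.

```python
# Keep only the pixels the mask covers; everything else becomes transparent.
# Assumes `image` (HxWx3 uint8 RGB) and `best_mask` (HxW bool) from the previous snippet.
cutout = np.zeros((*image.shape[:2], 4), dtype=np.uint8)
cutout[..., :3] = image
cutout[..., 3] = best_mask.astype(np.uint8) * 255  # alpha channel from the mask
cv2.imwrite("cutout.png", cv2.cvtColor(cutout, cv2.COLOR_RGBA2BGRA))
```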
SAM pairs a heavyweight image encoder, which runs once per image, with a lightweight prompt encoder and mask decoder that can produce a mask in tens of milliseconds per prompt, even in a web browser. That fast response time is what let the model power its own data engine: annotators used SAM interactively to label new masks, and those masks were fed back to improve the model.
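A small sketch of that split, continuing with the `predictor` and `image` from the first snippet: the expensive image embedding is computed once by `set_image`, and each subsequent `predict` call only runs the lightweight parts. The click positions are hypothetical.

```python
# The costly image encoder runs once, inside set_image; each predict() call
# afterwards only runs the prompt encoder and mask decoder.
predictor.set_image(image)

object_masks = []
for x, y in [(120, 220), (300, 80), (450, 390)]:  # hypothetical click positions
    masks, _, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=False,
    )
    object_masks.append(masks[0])
```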
SAM is implemented in PyTorch, its lightweight mask decoder can be exported to ONNX, and the model runs on both CPU and GPU. The model is large: the image encoder has 632M parameters, while the prompt encoder and mask decoder together have about 4M.
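If GPU memory is a concern, a sketch like the following loads a checkpoint and moves it to whichever device is available; the model type and checkpoint filename are placeholders for whichever SAM variant you download.

```python
import torch
from segment_anything import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# "vit_b" is the smallest released variant; the checkpoint filename is a placeholder.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_checkpoint.pth")
sam.to(device=device)
predictor = SamPredictor(sam)
```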
The project is open source and available on GitHub; no pricing or use-case-specific details are shared. For more information and to try out SAM for yourself, check out the company's website.
Published on June 9, 2024