Segment Anything From Meta

Otman Heddouch

Segment Anything is a state-of-the-art image segmentation model introduced by Meta in 2023.

Segment Anything identifies which image pixels belong to an object, one of the challenging tasks in computer vision. It is used in applications ranging from medical imaging, object detection, and data annotation to photo editing.

Segment Anything was trained on SA-1B, the largest segmentation dataset ever released.

How does it work?

Segment Anything returns a valid segmentation mask for any prompt, where a prompt can be a box, a mask, free-form text, or even foreground/background points.
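To make this promptable interface concrete, here is a toy sketch in which a simple flood fill stands in for the model: given a point prompt, it returns one binary mask plus a confidence score, just as SAM does. (This is purely illustrative; the real model is loaded from Meta's segment-anything package and a downloaded checkpoint.)

```python
import numpy as np

def segment_from_point(image: np.ndarray, point: tuple) -> tuple:
    """Return a binary mask for the object under `point`, plus a confidence score.

    Toy stand-in for SAM's promptable interface: here an "object" is just a
    connected region of pixels sharing the same value (flood fill).
    """
    h, w = image.shape
    target = image[point]
    mask = np.zeros((h, w), dtype=bool)
    stack = [point]
    while stack:
        y, x = stack.pop()
        if 0 <= y < h and 0 <= x < w and not mask[y, x] and image[y, x] == target:
            mask[y, x] = True
            stack.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
    # Crude "confidence": how much of the mask's bounding box it fills.
    ys, xs = np.nonzero(mask)
    box_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return mask, mask.sum() / box_area

# Two square "objects" on a background.
img = np.zeros((16, 16), dtype=int)
img[2:6, 2:6] = 1    # object A
img[9:14, 9:14] = 2  # object B

mask, score = segment_from_point(img, (3, 3))  # point prompt inside object A
```

A point prompt inside object A yields a mask covering exactly object A and nothing else, with a score attached, which is the contract SAM exposes regardless of the prompt type.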

Under the hood, the input image is encoded into an image embedding using a Masked Autoencoder (MAE) pre-trained Vision Transformer (ViT).

The first stage of the image encoder is the patch embedding: the input image is split into patches using a convolution layer, and position embeddings are added to retain positional information. The resulting tokens are then processed by the Vision Transformer blocks.
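The patch-embedding step can be sketched with numpy alone: a convolution whose stride equals its kernel size is just a linear projection of each non-overlapping patch, so a reshape plus a matrix multiply reproduces it. The dimensions below are shrunk toy values, not SAM's real ones (SAM's ViT-H takes 1024x1024 inputs with 16x16 patches).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes so the sketch runs instantly (not SAM's real dimensions).
img_size, patch, channels, dim = 32, 8, 3, 64
num_patches = (img_size // patch) ** 2  # 16 patches

image = rng.normal(size=(channels, img_size, img_size))

# Stride == kernel convolution, expressed as reshape + matmul:
# cut the image into non-overlapping patches and flatten each one.
patches = (image
           .reshape(channels, img_size // patch, patch, img_size // patch, patch)
           .transpose(1, 3, 0, 2, 4)
           .reshape(num_patches, channels * patch * patch))

proj = rng.normal(size=(channels * patch * patch, dim))  # patch-embedding weights
pos = rng.normal(size=(num_patches, dim))                # learned position embeddings

tokens = patches @ proj + pos  # one position-aware token per patch
```

These 16 tokens of width 64 are what the transformer blocks then attend over; without the `pos` term, the model would have no idea where in the image each patch came from.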

The prompt, whether text, a box, or points, is encoded into an embedding vector. The two embeddings are then combined in a lightweight mask decoder that produces valid masks, each with a confidence score. "Valid" means that even if the prompt is ambiguous and could refer to multiple objects, the output should be a reasonable mask for at least one of them.
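The fusion step can be illustrated with a deliberately simplified decoder: the prompt embedding attends over the image tokens, the pooled feature is projected into several candidate masks with one confidence score each, and the highest-scoring candidate is kept. This is a toy numpy sketch of the data flow, not SAM's actual decoder architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

dim, num_tokens, num_masks, hw = 64, 16, 3, 32  # toy sizes

image_emb = rng.normal(size=(num_tokens, dim))   # from the image encoder
prompt_emb = rng.normal(size=(dim,))             # from the prompt encoder

# Toy "decoder": the prompt attends to the image tokens, then the pooled
# feature is projected to several candidate masks and their scores.
mask_heads = rng.normal(size=(num_masks, dim, hw * hw))
score_head = rng.normal(size=(num_masks, dim))

attn = np.exp(image_emb @ prompt_emb)
attn /= attn.sum()                               # prompt-to-image attention weights
pooled = attn @ image_emb                        # fused image+prompt feature

masks = (pooled @ mask_heads).reshape(num_masks, hw, hw) > 0  # candidate masks
scores = score_head @ pooled                     # one confidence per candidate

best = masks[scores.argmax()]                    # keep the highest-scoring mask
```

Emitting several scored candidates instead of a single mask is what lets the model stay "valid" under ambiguity: if a point prompt could mean the shirt, the person, or the crowd, at least one of the candidates should be a reasonable mask.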

source : https://ai.meta.com/blog/segment-anything-foundation-model-image-segmentation/

SAM was trained on a high-quality dataset containing over 1 billion masks.

What gives SAM its strong generalization is the "data engine": SAM was used to interactively annotate images, and the newly annotated data was then used to retrain SAM in turn.
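The shape of that loop can be sketched with a toy stand-in, where the "model" is just a scalar intensity threshold: the model proposes masks, an annotator corrects them, the corrected masks join the dataset, and the model is refit on the growing dataset. None of this is SAM's real training code, only the model-in-the-loop pattern.

```python
import numpy as np

rng = np.random.default_rng(2)

def propose_mask(image, threshold):
    return image > threshold          # model-assisted annotation proposal

def annotator_fix(image, mask):
    return image > 0.5                # the mask a human annotator would settle on

threshold, dataset = 0.9, []          # start with a deliberately bad "model"
for round_ in range(3):
    image = rng.random((8, 8))
    mask = annotator_fix(image, propose_mask(image, threshold))
    dataset.append((image, mask))     # newly annotated data enters the pool
    # "Retrain": pick the threshold that separates masked from unmasked pixels.
    fg = np.concatenate([img[m] for img, m in dataset])
    bg = np.concatenate([img[~m] for img, m in dataset])
    threshold = (fg.min() + bg.max()) / 2
```

Each round the model's proposals get closer to what annotators would produce, so annotation gets faster, which is exactly how the data engine scaled SA-1B to over a billion masks.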

Today this model is used in many data-labeling platforms, such as Roboflow, to automate labeling.

More resources: