SAM3 – AI-based Semantic Segmentation of Marine Biodiversity Images and Videos

Overview
Questions:
  • How can you automatically segment marine biodiversity objects in images or videos using a simple text prompt?

Objectives:
  • Segment marine animals in a photograph and/or video using a text prompt

Requirements:
Time estimation: 20 minutes
Translations: Français
Supporting Materials:
Published: May 7, 2026
Last modification: May 7, 2026
License: Tutorial content is licensed under Creative Commons Attribution 4.0 International License. The GTN framework is licensed under MIT
version Revision: 1

This tutorial will guide you through using the SAM3 (Segment Anything Model 3) tool (Galaxy version 1.0.1+galaxy4) on Galaxy. SAM3 can automatically detect and segment objects in images or videos using text prompts, with no specific training required.

We will work through two concrete examples from the Moorev project:

  1. A photograph of a jellyfish (Pelagia noctiluca)
  2. A video of shrimps
Agenda: In this tutorial, we will cover:
  1. Loading data into Galaxy
  2. Segmenting an image: the jellyfish photograph
    1. Configuring SAM3 for the image
  3. Segmenting a video: the shrimp video
    1. Configuring SAM3 for the video
  4. Conclusion

Log in to Galaxy

  1. Open your preferred web browser
  2. Go to your Galaxy instance (please check that the instance you want to use offers the SAM3 tool, as Galaxy Europe does)
  3. Log in or create an account

Screenshot of the Galaxy interface with the login button highlighted.

This screenshot shows the Galaxy Ecology instance, available at usegalaxy.eu

The Galaxy home page is divided into 3 panels:

  • Tools on the left
  • The visualisation panel in the centre
  • The history of analyses and files on the right

Screenshot of the Galaxy interface showing the three panels.

The first time you use Galaxy, there will be no files in your history panel.

Loading data into Galaxy

Before running the SAM3 Galaxy tool, you need to import the following files into Galaxy:

  • The jellyfish photo: https://zenodo.org/records/19890809/files/Moorev-jellyfish.jpg
  • The shrimp video: https://zenodo.org/records/19891364/files/2024-09-20-PorzBreign-shrimps.mp4
  1. Copy the link location
  2. Click galaxy-upload Upload at the top of the activity panel
  3. Select galaxy-wf-edit Paste/Fetch Data
  4. Paste the link(s) into the text field, e.g. https://zenodo.org/records/19890809/files/Moorev-jellyfish.jpg
  5. Press Start
  6. Close the window

Galaxy upload link.
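If you prefer to script this step, the same files can be fetched into a history with BioBlend, Galaxy's Python API client. This is an optional sketch, not part of the tutorial workflow: the API key is a placeholder you must replace with your own, and it assumes BioBlend is installed (pip install bioblend).

    from bioblend.galaxy import GalaxyInstance

    # Placeholders: point these at your Galaxy instance and personal API key
    gi = GalaxyInstance(url="https://usegalaxy.eu", key="YOUR_API_KEY")
    history = gi.histories.create_history(name="SAM3 tutorial")

    for url in (
        "https://zenodo.org/records/19890809/files/Moorev-jellyfish.jpg",
        "https://zenodo.org/records/19891364/files/2024-09-20-PorzBreign-shrimps.mp4",
    ):
        gi.tools.put_url(url, history["id"])  # fetch each URL into the new history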

Segmenting an image: the jellyfish photograph

In this first section, we will run the SAM3 Galaxy tool on the photo Moorev-jellyfish.jpg to detect and segment the jellyfish.

Type SAM3 in the tool search bar at the top left, then click on the tool in the results.

SAM3 tool search interface.

Figure 1: Search for SAM3 in Galaxy

Configuring SAM3 for the image

Hands On: Segment the jellyfish in the photo
  1. SAM3 Semantic Segmentation (Galaxy version 1.0.1+galaxy4) with these parameters:

    • param-file “Model data”: Segment Anything Model 3 (SAM 3) (default)
    • param-select “Input type”: One or more images (default)
    • param-file “Input images”: Moorev-jellyfish.jpg
    • param-select “Output formats”: COCO
    • param-text “Text prompt”: jellyfish
    • version “Confidence threshold”: 0.5
    • version “Video frame stride”: 5 (default)
    • param-toggle “Show bounding boxes on annotated output”: Yes (default)
    • param-toggle “Normalize outputs?”: No (default)

    The text prompt should describe the object to segment in English, using simple and precise terms. To detect multiple classes at once, separate them with commas: jellyfish, shrimp, fish

    Avoid overly vague descriptions like animal if you are specifically looking for a jellyfish. You can also use a more descriptive prompt such as small blue fish, but results may vary depending on the objects you want to detect.

  2. Click Run Tool

    Comment: Processing time

    Processing may take a few minutes depending on the image size and the resources available on the server. Wait until the outputs appear in green in the history.

  3. Once processing is complete, the following outputs appear in your history:
    • COCO Annotation: the annotations.json file containing the segmentation masks
    • Annotated Outputs: the collection of annotated images with overlaid masks
  4. Viewing the annotated result

    You should see the jellyfish outlined with a coloured mask and a bounding box.

    Click on Annotated Outputs in the history panel.

    Then use the galaxy-eye icon to display the image in the central panel.

    Or click galaxy-save to download the file directly.

    SAM3 segmentation mask on the jellyfish photo.

    Figure 2: Segmentation result
  5. Exploring the COCO file
    • Look at the content of your COCO Annotation file in your history
    • Use galaxy-eye to view the JSON, or galaxy-save to download it

    The file contains the images, annotations, and categories fields. Each annotation includes the following (a short inspection sketch follows the list):

    • segmentation: the polygon coordinates of the mask
    • bbox: the bounding box [x, y, width, height]
    • category_id: the identifier of the detected class (1 = jellyfish)
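    To get a feel for this structure, here is a minimal Python sketch that loads the file and summarises its annotations. It assumes you downloaded the COCO Annotation output locally as annotations.json (an assumed filename) and that the segmentation is polygon-based, as described above.

    import json

    # Load the COCO annotation file downloaded from the Galaxy history
    with open("annotations.json") as f:
        coco = json.load(f)

    # Map category ids to names, e.g. 1 -> "jellyfish"
    categories = {c["id"]: c["name"] for c in coco["categories"]}

    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO bbox convention: [x, y, width, height]
        n_points = len(ann["segmentation"][0]) // 2  # polygon stored as x1,y1,x2,y2,...
        print(f"{categories[ann['category_id']]}: "
              f"bbox=({x:.0f}, {y:.0f}, {w:.0f}, {h:.0f}), {n_points} polygon points")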

    If you need to train a YOLO model with your annotations, you can export results in YOLO format in addition to COCO. In the param-select “Output formats” parameter, select COCO and/or YOLO segmentation masks and/or YOLO bounding boxes.

    Each line in a YOLO segmentation label file follows this format:

    <class_id> <x1> <y1> <x2> <y2> ... <xn> <yn>
    

    Coordinates are normalised between 0 and 1 relative to the image dimensions. Example for a jellyfish (class 0): 0 0.423 0.312 0.456 0.298 ...
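    As a small illustration, the sketch below converts one such YOLO line back to pixel coordinates. The label line is the truncated example above, and the image dimensions are hypothetical stand-ins; use your image's real size.

    # Hypothetical YOLO segmentation line (truncated example from above)
    line = "0 0.423 0.312 0.456 0.298"
    img_w, img_h = 1920, 1080  # assumed image dimensions

    parts = line.split()
    class_id = int(parts[0])
    coords = [float(v) for v in parts[1:]]

    # Values alternate x, y; de-normalise back to pixels
    polygon_px = [(x * img_w, y * img_h) for x, y in zip(coords[0::2], coords[1::2])]
    print(class_id, polygon_px)  # 0 [(812.16, 336.96), (875.52, 321.84)]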

Segmenting a video: the shrimp video

In this second section, we will run the SAM3 tool on the video 2024-09-20-PorzBreign-shrimps.mp4. The SAM3 model analyses the video frame by frame, tracking the shrimps over time.

Configuring SAM3 for the video

Hands On: Segment the shrimps in the video
  1. SAM3 Semantic Segmentation (Galaxy version 1.0.1+galaxy4) with these parameters:
    • param-file “Model data”: Segment Anything Model 3 (SAM 3) (default)
    • param-select “Input type”: One video
    • param-file “Input video file”: 2024-09-20-PorzBreign-shrimps.mp4
    • param-select “Video quality”: "2000k" = video bitrate 2000 kbps (480p~720p)
    • param-select “COCO output mode”: Annotate the video — one COCO entry per frame, referencing the video file (default)
    • param-text “Text prompt”: shrimp
    • version “Confidence threshold”: 0.25 (default)
    • version “Video frame stride”: 5 (default)
    • param-toggle “Show bounding boxes on annotated output”: Yes (default)
    • param-toggle “Normalize outputs?”: No (default)

    version “Video frame stride”: determines how often frames are analysed. A stride of 5 means one frame in every five is processed (see the quick estimate after the list below).

    • Low stride (1–3): more precise analysis, but longer processing time
    • High stride (10–30): faster processing, useful for long videos where objects move slowly
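    To see what the trade-off means in practice, here is a back-of-the-envelope sketch. The duration and frame rate are illustrative assumptions, not properties of the sample video.

    # Estimate how many frames are analysed for a given stride
    duration_s = 120  # assumed 2-minute video
    fps = 25          # assumed frame rate

    total_frames = duration_s * fps
    for stride in (1, 5, 30):
        print(f"stride={stride}: ~{total_frames // stride} of {total_frames} frames analysed")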

    param-select “Video quality”: controls the quality of the annotated output video, with no impact on processing speed or annotations.

    param-select “COCO output mode”: controls how COCO annotations are generated.

    • Annotate the video: one COCO entry per frame, referencing the video file (default)
    • Annotate extracted frames: saves frames as JPGs with one COCO entry per image — useful for pre-processing, for example with the AnyLabeling Interactive tool as shown in the Moorev tutorial
  2. Click Run Tool

    Comment: Video processing time

    Video processing takes significantly longer than processing a single image. For a video of a few minutes, expect between 5 and 20 minutes depending on the server and the stride chosen.

  3. The following outputs appear in your history:
    • COCO Annotation: the JSON file with annotations for each processed frame (see the sketch below for a quick way to explore it)
    • Annotated Outputs: the annotated video with segmentation masks overlaid frame by frame
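    Since the default COCO output mode creates one image entry per processed frame, you can quickly check how detections are distributed over time. A minimal sketch, assuming the COCO Annotation output was downloaded locally as annotations.json (an assumed filename):

    import json
    from collections import Counter

    with open("annotations.json") as f:
        coco = json.load(f)

    # One COCO image entry per processed frame; count detections per frame
    per_frame = Counter(ann["image_id"] for ann in coco["annotations"])
    print(f"{len(per_frame)} frames with detections, "
          f"{sum(per_frame.values())} detections in total")
    print(per_frame.most_common(3))  # frames with the most shrimps at once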
  4. Viewing the annotated video

    • Click on Annotated Outputs in the history panel
    • Click galaxy-eye
    • Click galaxy-visualise
    • Select Media Player

    Display the annotated video in Galaxy.

    Warning: Video not loading?

    The video may not load in Galaxy for several reasons:

    • The file is too large for your internet connection
    • The param-select “Video quality”: Original quality (copy) setting makes in-browser playback unavailable

    In that case, use galaxy-save to download the video and play it locally with your usual media player.

    You will see the shrimps tracked with a coloured segmentation mask throughout the video.

    SAM3 segmentation of shrimps.

  5. Downloading the annotated video
    • Click the Annotated Outputs collection
    • Use galaxy-save to download the .mp4 video
    Comment: Limitations of SAM3 and pre-processing

    The SAM3 tool is a first attempt at providing a prompt-based segmentation tool in Galaxy. Because it relies on the SAM3 model, segmentation quality can vary widely depending on the objects you are trying to segment, notably on whether similar objects appeared in the data used to train the SAM3 model. Adjusting the confidence threshold can help, but it does not solve everything. Pre-processing your images or videos is often necessary to improve results.
    To learn more, check out the dedicated Moorev tutorial.

Conclusion

You now know how to use the SAM3 Galaxy tool to:

  • Segment objects in an image using a simple text prompt
  • Segment objects in a video frame by frame with temporal tracking
  • Export results in COCO format (for annotation and evaluation tools) or YOLO format (for model training)