REMBI - Recommended Metadata for Biological Images – metadata guidelines for bioimaging data

Overview
Questions:
  • What is REMBI and why should I use it?

  • What information should be included when collecting bioimage data?

Objectives:
  • Organise bioimage metadata

  • Find out what REMBI is and why it is useful

  • Categorise what metadata belongs to each of the submodules of REMBI

  • Gather the metadata for an example bioimage dataset

Time estimation: 15 minutes
Published: Oct 31, 2024
Last modification: Oct 31, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00364
Revision: 0

Metadata guidelines for bioimaging data

REMBI (Recommended Metadata for Biological Images) was proposed as a set of draft metadata guidelines to begin addressing the needs of diverse communities within light and electron microscopy. The guidelines are currently in draft form to encourage discussion within the community, but they already provide a useful guide to the metadata you should gather to make your image data FAIR. They divide the metadata requirements into eight modules, each of which is further split into attributes. That may seem like a daunting task, but it is also exciting news for the community! To find out more, have a look at the REMBI article.

Question

In the REMBI paper, the authors consider three potential user groups who require different metadata. Find out what these three groups are and what metadata each of them requires.

The three user groups identified are biologists, imaging scientists and computer-vision researchers.

  • A research biologist may be interested in the biological sample that has been imaged to compare it to similar samples that they are working with.
  • An imaging scientist may be interested in how the image was acquired so they can improve upon current image acquisition techniques.
  • A computer-vision researcher may be interested in annotated ground-truth segmentations that can be obtained from the image, so that they can develop faster and more accurate algorithms.

If you’re an instructor leading this training, you might ask people to work in small groups for this exercise and encourage the discussion. Ask group members to share which of the user groups they identify as and what metadata they would want.

Categories of metadata

REMBI divides metadata into eight modules:

  • study
  • study component
  • biosample
  • specimen
  • image acquisition
  • image data
  • image correlation
  • analysed data

Within each module, there are attributes that should be included to make the published data FAIR. We will explore all the modules and attributes suggested by REMBI and we’ll show some examples as well.
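As a rough sketch of how the eight modules could be tracked while you collect metadata, the snippet below keeps one entry per module and reports which ones are still empty. The dictionary layout and the names `empty_record` and `missing_modules` are illustrative assumptions, not part of the REMBI guidelines.

```python
# Module names follow the REMBI list above; the dict layout is an assumption.
REMBI_MODULES = [
    "study",
    "study component",
    "biosample",
    "specimen",
    "image acquisition",
    "image data",
    "image correlation",
    "analysed data",
]


def empty_record() -> dict:
    """Return a record with one empty attribute dict per REMBI module."""
    return {module: {} for module in REMBI_MODULES}


def missing_modules(record: dict) -> list:
    """List the modules that have no attributes filled in yet."""
    return [module for module in REMBI_MODULES if not record.get(module)]


record = empty_record()
record["study"]["Study type"] = "Regulation of mitotic cell division"
print(missing_modules(record))  # every module except "study"
```

A check like this can be run before submission to see at a glance which modules still need attention.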

Study

The first module of REMBI metadata describes the Study and should include:

  • Study type
  • Study description
  • General dataset information

Study type

Ideally, the study type will be a term from an ontology. You can look up the main subject of your study using a tool like OLS (the Ontology Lookup Service) to find a suitable ontology term. This will help others to see where your study sits within the wider research area.

Comment: Example
Study type: Regulation of mitotic cell division
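An ontology lookup like this can also be scripted. The sketch below only builds a search URL for a term; it assumes the OLS search endpoint at `https://www.ebi.ac.uk/ols4/api/search` with a `q` query parameter and an optional `ontology` filter, so check the current OLS documentation before relying on it. The function name is illustrative.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed OLS search endpoint; verify against the OLS documentation.
OLS_SEARCH_URL = "https://www.ebi.ac.uk/ols4/api/search"


def build_ols_search_url(term: str, ontology: Optional[str] = None) -> str:
    """Build a search URL for a free-text term, optionally limited to one ontology."""
    params = {"q": term}
    if ontology is not None:
        params["ontology"] = ontology
    return f"{OLS_SEARCH_URL}?{urlencode(params)}"


url = build_ols_search_url("mitotic cell division")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return candidate terms to choose from.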

Study description

A brief description of the project. The study description should include the title of the study, a brief description and any related publication details, such as authors, title and DOI. If you are gathering metadata before publication, you can fill in the publication details later, or enter a draft title or the name of the journal you plan to submit to. It’s still a good idea to include the publication details field from the start, so you don’t forget to complete it.

Comment: Example
Study description:
  Title: Imaging mitotic cells
  Description: Visualising HeLa cells using confocal microscopy
  Publication details: TBC

General dataset information

This should include all the information that relates to all the data in the project. This can include the names of contributors and the repository where the data is or will be stored. State the licence under which you intend to make the data available, the repository you intend to submit to and if you are using a schema for structuring your metadata. This helps to keep all collaborators on the same page. Any other general information with respect to the study can be included here, but try to keep this broad as more detailed information should be included in other sections of the metadata.

Comment: Example
General dataset information:
  Contributors: Alice and Bob
  Repository: BioImage Archive
  Licenses: CC-BY
  Schemas: DataCite Metadata

Study component

A study component can be thought of as an experiment, both the physical experiment and subsequent data analysis, or a series of experiments that have been conducted with the same aim in mind.

The associated metadata should describe the imaging method used and include a description of the image dataset. The REMBI guidelines store high-level metadata in the study component and then divide the more detailed metadata into other modules.

Within the study component, we include the Imaging Method, which should describe the techniques used to acquire the raw data. This could be one method or several, and each should be a term from a relevant ontology. For confocal microscopy data, we can use the Biological Imaging Methods Ontology, although the term is also present in a number of other ontologies.

The description of the study component should include an overview of what was imaged as well as any processed data that is created during analysis.

Comment: Example
Imaging Method: Confocal Microscopy
Study Component Description: Images of cells and segmented binary masks

You could either store the study component metadata in the same file as your study metadata or create a new file for each study component. These files could be stored in the same place as your study metadata, or you could create a subdirectory structure.
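One possible layout, sketched below, keeps the study metadata in a single file and gives each study component its own subdirectory. The JSON format, the file names `study.json` and `study_component.json`, and the directory name `component_01` are all assumptions made for illustration; REMBI does not prescribe a file format.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(root: Path, study: dict, components: dict) -> list:
    """Write study.json in root and one JSON file per study component subdirectory."""
    written = []
    root.mkdir(parents=True, exist_ok=True)
    study_file = root / "study.json"
    study_file.write_text(json.dumps(study, indent=2))
    written.append(study_file)
    for name, metadata in components.items():
        component_dir = root / name
        component_dir.mkdir(exist_ok=True)
        component_file = component_dir / "study_component.json"
        component_file.write_text(json.dumps(metadata, indent=2))
        written.append(component_file)
    return written


study = {"Study type": "Regulation of mitotic cell division"}
components = {
    "component_01": {  # hypothetical component name
        "Imaging Method": "Confocal Microscopy",
        "Study Component Description": "Images of cells and segmented binary masks",
    }
}

# A temporary directory keeps the example from touching real project folders.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_metadata(Path(tmp) / "metadata", study, components)
    relative = [p.relative_to(tmp).as_posix() for p in paths]
print(relative)
```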

Biosample

The first thing you need for the biosample metadata is an Identity. This is a code that you assign to each sample you are describing, which will link this metadata to the physical sample. Then, state what the biological entity is, which should come from a relevant ontology. Use a taxonomy to name the organism. Next, describe the variables in your experiment. The REMBI guidelines split the variables into three types:

  • intrinsic - describe an innate trait of the biosample, such as a genetic alteration
  • extrinsic - describe something you added to the sample, for example, a reagent
  • experimental - things that you intentionally vary, like time

You can leave out some of the variables if they are not part of your experiment.

Comment: Example
Identity: CM001
Biological entity: JURKAT E-6.1 cell
Organism: Homo sapiens
Intrinsic variable: Jurkat E6.1 transfected with emerald-VAMP7
Extrinsic variable: Aspirin
Experimental variables: Dose response of aspirin
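The biosample attributes described in this section could be captured in a small record type in which the three kinds of variable are optional. This is only a sketch: the class and field names are assumptions, and the values are taken from the example above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Biosample:
    identity: str
    biological_entity: str
    organism: str
    # Variables may be left out if they are not part of the experiment.
    intrinsic_variable: Optional[str] = None
    extrinsic_variable: Optional[str] = None
    experimental_variable: Optional[str] = None

    def to_metadata(self) -> dict:
        """Return the attributes as a dict, dropping variables that were left out."""
        return {key: value for key, value in asdict(self).items() if value is not None}


sample = Biosample(
    identity="CM001",
    biological_entity="JURKAT E-6.1 cell",
    organism="Homo sapiens",
    extrinsic_variable="Aspirin",
)
print(sample.to_metadata())
```

Making identity, biological entity and organism required while the variables default to `None` mirrors the guidance that some variables can be omitted.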

Specimen

The specimen metadata should include:

  • the experimental status (control or test)
  • the location within the biosample, such as a coordinate or a particular well in a plate
  • how the sample was prepared
  • how the signal is being generated
  • the content and biological entities of different channels.

Include enough information that someone with experience in the field could reproduce the sample by following it. Assume they know the typical techniques, and name these using ontology terms where possible. Go into detail only if you are describing a novel technique.

Comment: Example
Experimental status: Control
Location within biosample: Plasma membrane within 100 nm of coverslip (TIRF)
Preparation method: Cos-7 cells cultured in DMEM medium, and then plated on #1 coverslips and imaged live in L-15 medium
Signal/contrast mechanism: fluorescent proteins
Channel – content: Green: eGFP, Red: mCherry
Channel – biological entity: Green: EGFR, Red: Src

Image acquisition

Here you should include all the information about the instrument you used and how it was set up. As with the specimen metadata, describe this information as though you are speaking to someone who already knows how to use a similar instrument. What would they need to know to produce the same image data?

Check with your facility manager if they have any guidelines for what details need to be recorded for your particular instrument. Make sure that the parameters you record can actually be used by someone else if they don’t have exactly the same instrument or setup. For example, don’t say that you used a certain percentage of laser power, as this doesn’t tell you how much power was used unless you also provide the total power of the laser. If the instrument software has automatically generated a metadata file, remember to save this. Depending on its content, this may be sufficient.
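To illustrate the laser power point, the sketch below converts a percentage setting into a nominal absolute power, which is only possible when the maximum power of the laser is also known. The function name and the numbers are hypothetical, and the result is the nominal output of the source, not the power that actually reaches the sample.

```python
def absolute_laser_power_mw(percent: float, max_power_mw: float) -> float:
    """Convert a percentage setting into nominal laser power in milliwatts."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if max_power_mw < 0:
        raise ValueError("max_power_mw must not be negative")
    return max_power_mw * percent / 100


# Hypothetical values: a 50 mW laser run at 5 % of its maximum output.
print(absolute_laser_power_mw(5, 50))  # 2.5
```

Recording both the percentage and the maximum power, or the absolute value directly, lets someone with a different laser choose a comparable setting.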

Start with the details of the equipment for the Instrument Attributes. If this is commercial equipment, include the make and model, a short description of what type of instrument it is and details about its configuration. If the instrument is bespoke, you will need to include more details. Next, you should include image acquisition parameters. These relate to how the instrument was set up for the particular experiment. Some of these may be captured automatically by the instrument’s software, so make things easy for yourself and check if a file is generated and what’s in it. If a file is generated, then you only need to manually record anything that is missing from the file.

Comment: Example
Instrument attributes: Olympus FV3000, laser point scanning confocal, 500-550 nm filter, 37-degree chamber.
Image acquisition parameters:
  Objective: 20x
  Excitation Wavelength: 488 nm
  PMT gain: 500 V
  Pixel dwell time: 2 μs
  Confocal aperture: 200 μm
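Checking an automatically generated metadata file for gaps, as suggested above, can be sketched as a simple comparison against the parameters you want to record. The required list below reuses the parameter names from the example; the content of `auto_metadata` is hypothetical, since what the instrument software writes varies between instruments.

```python
# Parameters we want to end up with, taken from the example above.
REQUIRED_PARAMETERS = [
    "Objective",
    "Excitation Wavelength",
    "PMT gain",
    "Pixel dwell time",
    "Confocal aperture",
]


def parameters_to_record(auto_metadata: dict, required=REQUIRED_PARAMETERS) -> list:
    """Return the required parameters that the generated file does not contain."""
    return [name for name in required if name not in auto_metadata]


# Hypothetical content of a file written by the instrument software.
auto_metadata = {"Objective": "20x", "Excitation Wavelength": "488 nm"}
print(parameters_to_record(auto_metadata))
```

Anything the function returns still has to be noted down by hand at acquisition time.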

To help you collect the information for your own data, have a look at the local resources provided by your institution or university. For example, the University of Warwick has webpages describing the metadata that needs to be collected for some of its microscopes.

Image data

In this section, you record the information related to all the images you have. This covers not only the primary or raw images but also any processed images, such as binary files showing the resulting segmentation.

You need to say what format the images are in and if they have undergone any compression, the dimensions of the images, and what the physical size of the pixel or voxel is, including the units. Most of this information you should be able to get from the metadata or header of the image files.

Next, you need to state the physical size of the image or magnification, calculated from the pixel or voxel size and the dimension extents. Give any information related to how the channels are represented. For processed images, you need to provide the methods used for processing.

Finally, state whether you have used contrast inversion: do the bright features in the image correspond to areas of high signal, or is it the other way around?

Comment: Example
Type: Primary Image, Segmentation
Format and compression: Primary: .oir (Olympus), Segmentation: .tiff
Dimension extents: x: 512, y: 512, z: 25
Size description: 153.6 x 153.6 x 25 μm
Pixel/Voxel size description: 0.3 x 0.3 x 1 μm
Image processing method: Fiji: Median filter (3 pixel kernel), Otsu threshold
Contrast inversion: No
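The physical size mentioned in this section is the product of the dimension extent and the pixel or voxel size along each axis. The sketch below shows the calculation for the x and y values from the example; the function name is illustrative, and the same calculation applies to z.

```python
def physical_size(extents: dict, voxel_size: dict) -> dict:
    """Multiply the extent (in pixels) by the pixel size along each axis."""
    if extents.keys() != voxel_size.keys():
        raise ValueError("extents and voxel_size must describe the same axes")
    return {axis: extents[axis] * voxel_size[axis] for axis in extents}


extents = {"x": 512, "y": 512}         # pixels
pixel_size_um = {"x": 0.3, "y": 0.3}   # micrometres per pixel
size_um = physical_size(extents, pixel_size_um)
print({axis: round(value, 1) for axis, value in size_um.items()})
```

Running the same check on your own values is a quick way to confirm that the size description, extents and pixel size agree with one another.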

Image correlation

If you have used different imaging modalities with the same sample, this part of the metadata should describe how the images relate to one another. You could use this section to describe generally the relationship between images. In the example below, images from different modalities have been aligned.

Comment: Example
Spatial and temporal alignment: Manual
Fiducials used: Soil grains
Transformation matrix: See file: Transforms.csv
Size description: 153.6 x 153.6 x 25 μm
Related images and relationship: Primary XCT: Data/XCT; Primary XRF: Data/XRF; Processed XRF: Data/Transformed_XRF
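To show what a recorded transformation matrix lets a reader do, the sketch below maps a point from one image into the coordinate system of another. It assumes a 3 x 3 homogeneous matrix for a 2D affine transform; the layout of a file such as Transforms.csv is not specified here, and the matrix values are hypothetical.

```python
def apply_affine_2d(matrix, point):
    """Apply a 3 x 3 homogeneous affine matrix to an (x, y) point."""
    x, y = point
    (a, b, tx), (c, d, ty), _ = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


# Hypothetical transform: scale by 2, then shift by (10, -5).
matrix = [
    [2.0, 0.0, 10.0],
    [0.0, 2.0, -5.0],
    [0.0, 0.0, 1.0],
]
print(apply_affine_2d(matrix, (3.0, 4.0)))  # (16.0, 3.0)
```

Whatever convention you use, state it alongside the matrix (row or column order, units, which image is the reference) so that the alignment can be reproduced.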

Analysed data

This section should not include metadata for any image data, including processed images, as that should have been covered in the Image Data section. Instead, it should describe the analysis results you have, such as measurements. Have you done some numerical analysis or some phenotyping or something else? There is no need to describe the methods in great detail if they are already described in the relevant publication.

Comment: Example
Analysis results type: Speed of cell division
Data used for analysis: Preprocessed images, Cell tracks
Analysis method: Track cell lineage: BayesianTracker (btrack) with configuration track_config.json; Measure speed: Numerical analysis in Python

Final notes

For more examples, check out the REMBI Supplementary Information, available as a PDF or a spreadsheet.

At first glance, collecting all that metadata might seem like quite a stretch! But don’t get discouraged: following these guidelines will improve communication between scientists and will make your research FAIR (Findable, Accessible, Interoperable, Reusable). In the era of big data, when we are surrounded by so many resources, it’s crucial to develop good data management habits, share them with others and so contribute together to the development of science.