VerSe'20 has two sub-tasks:

  1. Vertebra labelling, comprising localisation and identification of vertebrae.
  2. Vertebra segmentation.

The following text gives additional details about each task: task description, annotation procedure, and evaluation metrics.

Task 1: Vertebra Labelling

Labelling of vertebrae has immediate diagnostic and modelling significance. For example, localised vertebrae are used as markers for detecting kyphosis or scoliosis, in surgical planning, and for subsequent analysis tasks such as vertebral segmentation or biomechanical modelling for load analysis.

Task Description: Given a spine CT scan, the task is to label all the vertebrae within the field-of-view. Essentially, this is a landmark detection task. The output should be a list of the labels of the vertebrae fully present in the image, each with the corresponding three-dimensional coordinates of the vertebra, expressed in the coordinate system described on the 'Data' page.

Different from VerSe'19: VerSe'20 deals with 'special cases' in spinal anatomy. Specifically, it deals with transitional vertebrae. Vertebral labels now refer to the following legend:

We only label “free” vertebrae, i.e. we do not label the sacrum or transitional vertebrae that are (partly) fused with the sacrum. Such fused vertebrae are referred to as Castellvi grade 3 and 4. In this regard, all “free” vertebrae (including those with ankylosis due to degeneration) are called lumbar. We consider L1 to be the first vertebra without ribs, or with rib remnants shorter than 4 cm on both sides in a horizontal alignment (including heterotopic ossification of the transverse process). The last thoracic vertebra should have at least one rib longer than 4 cm in a typical diagonal downward alignment. In ambiguous cases, the shapes of the vertebra and its facet joints are considered. If T1 is not present in the scan (i.e. not visible within the scan's field-of-view), the thoracic spine is considered to have 12 vertebrae.

Evaluation Metrics: We use two metrics for evaluating the labelling performance of your algorithm, both computed at the scan level.

1. Identification rate (in %): Denotes the percentage of vertebrae correctly 'identified' per scan. A vertebra is correctly 'identified' if the predicted vertebral location is closest to the ground-truth location of the same vertebra (e.g. predicted L1 to ground-truth L1) and this distance is less than 20 mm.

2. Localisation distance (in mm): Denotes the mean localisation distance over all vertebrae in the scan, where the localisation distance of a vertebra is the distance between its predicted location and its ground-truth location.

Both metrics are based on their dataset-level counterparts introduced in [1]. The only difference is the set of vertebrae over which they are computed. In [1], they are computed over all the vertebrae in the test set. In our case, they are computed over all vertebrae in a scan and then averaged over the dataset.
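The two labelling metrics can be sketched in a few lines of Python. This is only an illustrative sketch, not the official evaluation script: the function names, the dictionary-based input format and the string labels are hypothetical, and the handling of missing or spurious predictions is an assumption, since such exceptions are not specified here.

```python
import math


def scan_metrics(gt, pred, max_dist_mm=20.0):
    """Per-scan identification rate (%) and mean localisation distance (mm).

    gt, pred: dicts mapping a vertebra label to its (x, y, z) location in mm.
    Assumptions (not specified in the task text): predictions for labels that
    are absent from the ground truth are ignored, and ground-truth vertebrae
    without a prediction count as not identified.
    """
    hits = 0
    distances = []
    for label, location in pred.items():
        if label not in gt:
            continue
        d = math.dist(location, gt[label])
        distances.append(d)
        # Ground-truth vertebra that lies closest to this prediction.
        nearest = min(gt, key=lambda g: math.dist(location, gt[g]))
        if nearest == label and d < max_dist_mm:
            hits += 1
    id_rate = 100.0 * hits / len(gt)
    mean_dist = sum(distances) / len(distances) if distances else float("nan")
    return id_rate, mean_dist


def dataset_metrics(scans):
    """Average the per-scan metrics over a list of (gt, pred) pairs."""
    per_scan = [scan_metrics(gt, pred) for gt, pred in scans]
    n = len(per_scan)
    return (sum(m[0] for m in per_scan) / n, sum(m[1] for m in per_scan) / n)


# Toy example: the predicted L3 lies closer to the ground-truth L2,
# so only two of the three vertebrae count as identified.
gt = {"L1": (0.0, 0.0, 0.0), "L2": (0.0, 0.0, 30.0), "L3": (0.0, 0.0, 60.0)}
pred = {"L1": (0.0, 0.0, 3.0), "L2": (0.0, 0.0, 34.0), "L3": (0.0, 0.0, 35.0)}
id_rate, mean_dist = scan_metrics(gt, pred)
```

Computing the metrics per scan and then averaging gives every scan equal weight, whereas the dataset-level variant in [1] gives every vertebra equal weight.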

Task 2: Vertebra Segmentation

Spine segmentation is a crucial component in quantitative medical image analysis. It directly allows detection and assessment of vertebral fractures and indirectly supports modelling and monitoring of the spinal ageing process.

Task Description: Given a spine CT scan, the task is to generate accurate voxel-level segmentation maps of the vertebrae present in the scan. Essentially, this is a multi-label segmentation task. The output of this task should be another 3D volume of the same size and orientation as the input scan.

Different from VerSe'19: The segmentation mask now contains integer values between 1 and 28, where each value corresponds to the vertebra's label as listed in Task 1. We only segment “free” vertebrae, i.e. cervical, thoracic and lumbar vertebrae. We do not segment the sacrum or transitional vertebrae that are fused with the sacrum, as in Castellvi grade 3 and 4.

Evaluation Metrics: We use two metrics that are widely used in medical image segmentation.

1. Dice coefficient (in %): Measures the segmentation overlap in the form of an F1 score at voxel level. Here, Dice is computed per label as 2|A ∩ B| / (|A| + |B|), where 'A' is the set of ground-truth foreground voxels of a certain label and 'B' is the predicted set of voxels for that label.

2. Hausdorff surface distance (in mm): Measures the local maximum distance between the two surfaces constructed from the ground-truth and predicted segmentation maps.
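The per-label Dice formula above can be sketched with NumPy as follows. This is a minimal illustration rather than the official evaluation code: the function name is hypothetical, the value 0 is assumed to denote background, and only labels present in the ground truth are scored (how other cases are handled is not specified here).

```python
import numpy as np


def dice_per_label(gt_mask, pred_mask):
    """Return {label: Dice in %} for every label present in the ground truth.

    gt_mask, pred_mask: integer arrays of identical shape.
    Assumption: 0 is background and is not scored.
    """
    scores = {}
    for label in np.unique(gt_mask):
        if label == 0:
            continue
        a = gt_mask == label    # ground-truth voxels of this label (set A)
        b = pred_mask == label  # predicted voxels of this label (set B)
        intersection = np.logical_and(a, b).sum()
        scores[int(label)] = 100.0 * 2.0 * intersection / (a.sum() + b.sum())
    return scores


# Toy 2x2x2 volume: label 20 is predicted perfectly, label 21 is over-segmented.
gt_mask = np.zeros((2, 2, 2), dtype=int)
gt_mask[0] = 20        # 4 voxels
gt_mask[1, 0] = 21     # 2 voxels
pred_mask = np.zeros((2, 2, 2), dtype=int)
pred_mask[0] = 20      # 4 voxels, all correct
pred_mask[1] = 21      # 4 voxels, 2 of them correct
scores = dice_per_label(gt_mask, pred_mask)
```

For label 21 in this toy volume, |A| = 2, |B| = 4 and |A ∩ B| = 2, so the Dice score is 2·2 / (2 + 4), i.e. about 66.7%.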

NOTE: There will be certain exceptions that need to be handled while evaluating and ranking the algorithms. Such exceptions will be communicated to the participants when appropriate.


  1. Glocker, B., et al.: Automatic localization and identification of vertebrae in arbitrary field-of-view CT scans. In: MICCAI. (2012)