Key Information¶
Overview¶
- Two tasks:
  (1) Vertebral localisation and identification.
  (2) Vertebral segmentation.
- The best-performing approaches in each sub-task and of all varieties, along with off-beat approaches, will be collected, analysed, and presented in a journal article co-authored with the contributors.
- Contributors retain the intellectual rights to the submitted code. The organisers will use it solely for analyses and manuscripts related to the challenge.
Data¶
Download¶
- Train set: 100 MDCT scans
- Test set: 40 scans (public, Download) + 40 (hidden) MDCT scans
Data Structure¶
- Each data sample consists of the following files:
  - verse.nii.gz - Image
  - verse_seg.nii.gz - Segmentation mask
  - verse_ctd.json - Centroid annotations
  - verse_snapshot - A PNG overview of the annotations
- The images need NOT be in the same orientation, and their spacing need NOT be the same. However, an image and its corresponding mask will be in the same orientation.
  !! verse201.nii.gz and verse201_seg.nii.gz have different orientations. For now, we suggest reorienting them to a common orientation (see the sketch after this list). This case will be fixed in future releases. !!
- Both masks and centroids use the label values [1-24], corresponding to the vertebrae [C1-L5]. Some cases might contain the label 25, corresponding to L6.
- The centroid annotations are given with respect to the coordinate axes fixed on an isotropic (1mm) scan in a (P, I, R) or (A, S, L) orientation, described as follows (a parsing sketch follows this list):
  - Origin at Superior (S) - Right (R) - Anterior (A)
  - 'X' corresponds to the S -> I direction
  - 'Y' corresponds to the A -> P direction
  - 'Z' corresponds to the R -> L direction
  - 'label' corresponds to the vertebral label
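Since image/mask pairs are not guaranteed to share a common orientation across cases (and verse201 is a known exception even within a pair), the following is a minimal reorientation sketch using nibabel. It assumes RAS+ is an acceptable common orientation; file names are illustrative:

```python
# Minimal sketch: bring an image/mask pair to a common (RAS+) orientation
# with nibabel. File names are illustrative.
import nibabel as nib

img = nib.load("verse201.nii.gz")
msk = nib.load("verse201_seg.nii.gz")

# Reorient both volumes to the closest RAS+ orientation.
img_ras = nib.as_closest_canonical(img)
msk_ras = nib.as_closest_canonical(msk)

print(nib.aff2axcodes(img_ras.affine))  # -> ('R', 'A', 'S')
print(nib.aff2axcodes(msk_ras.affine))  # -> ('R', 'A', 'S')

nib.save(img_ras, "verse201_ras.nii.gz")
nib.save(msk_ras, "verse201_seg_ras.nii.gz")
```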
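Reading the centroid annotations then reduces to parsing the JSON. A small sketch follows; the exact schema (a list of entries with 'label', 'X', 'Y', and 'Z' keys) is an assumption on our part, so please verify it against a downloaded sample:

```python
# Sketch: parse a centroid file. The schema below (list of dicts with
# 'label', 'X', 'Y', 'Z') is assumed; check it against a real sample.
import json

with open("verse_ctd.json") as f:
    centroids = json.load(f)

for c in centroids:
    # 'X': S -> I, 'Y': A -> P, 'Z': R -> L, on a 1mm isotropic grid (see above).
    print(f"label {c['label']}: X={c['X']}, Y={c['Y']}, Z={c['Z']}")
```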
Data Usage Agreement¶
By downloading the data, you are agreeing to the following terms:
- Ownership: The downloaded material is owned by the Department of Neuroradiology, School of Medicine, Technical University Munich, 81675 Munich, Germany.
- Licensing: The data is released under the CC BY-SA 2.0 license.
- Liability: Annotations were created manually according to the best knowledge of two neuroradiologists; however, they may not be perfect in every regard. We explicitly exclude any responsibility regarding the correctness of all provided data, as well as any liability regarding the usage of the data and all consequences thereof.
- Citations: You are free to use and/or refer to the VerSe datasets in your own research, provided that you always cite two manuscripts: one related to the database [1] and one related to the methods with which the segmentations have been created [2, 3]. You will be updated by email regarding the final citations once they are available online.
References:
[1] Under preparation.
[2] Under preparation.
[3] Sekuboyina A. et al. (2018) Btrfly Net: Vertebrae Labelling with Energy-Based Adversarial Learning of Local Spine Prior. Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. Springer, Cham.
Description¶
The training dataset includes raw CT images and their corresponding voxel-level, multi-class annotations for every vertebra visible in the image. The dataset distribution closely resembles a typical clinical distribution of scan ranges, settings, and findings in emergency, oncological, and neurosurgical patient collectives.
- ~25% of scans include the complete cervico-/thoraco-lumbar spine (possibly in 2 stacks), mainly without contrast enhancement.
- ~75% of scans include the complete thoracolumbar spine or a part of it, with or without contrast enhancement.
- ~50% of scans will be available in sagittal reformations (i.e. a spatial resolution of at least 1mm × 1mm × 3mm).
- ~50% of scans will be available in isotropic spatial resolution (i.e. at least 1mm × 1mm × 1mm).
- ~25% show vertebral fractures.
- ~10% show implants or other foreign materials.
Tasks¶
Task 1: Vertebra Labelling¶
Labelling of vertebrae has immediate diagnostic and modelling significance, e.g. localised vertebrae are used as markers for detecting kyphosis or scoliosis, in surgical planning, or for follow-up analysis tasks such as vertebral segmentation or their bio-mechanical modelling for load analysis.
Given a spine CT scan, the task is to label all the vertebrae within the field-of-view. Essentially, this is a landmark-detection task. The output of this stage should be a list of the three-dimensional coordinates of the vertebrae according to the coordinate system described on the 'Data' page.
Evaluation:¶
We use four metrics for evaluating the labelling performance of your algorithm, two at the dataset level and two at the scan level.
1. Identification rate (in %): As defined in [1]. The ratio of vertebrae 'identified' over the full test set. A vertebra is correctly 'identified' if the ground-truth vertebral location is closest to the predicted location of the same vertebra (e.g. predicted L1 to ground-truth L1) and this distance is less than 20mm.
2. Localisation distance (in mm): As defined in [1]. The mean localisation distance over all vertebrae in the test set, i.e. the distance of each predicted vertebral location from its ground-truth location.
3. Recall (in %, subject to slight modification): As defined in [2]. R = #hits/#actual, where #hits is the number of vertebrae satisfying the condition of identification as defined for id. rate above and #actual is the number of vertebrae actually present in the image. It captures the ratio of correctly 'identified' vertebrae per scan.
4. Precision (in %, subject to slight modification): As defined in [2]. P = #hits/#predicted, where #predicted is the total number of vertebrae predicted to be in the image. For example, this penalises the case where the scan has five vertebrae L1-L5, while the algorithm predicts eight vertebrae T10-L5.
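To make the scan-level definitions concrete, below is an illustrative sketch of recall and precision under the 20mm identification rule. The function and its dict-based inputs are our own simplification, not the official evaluation code:

```python
# Illustrative sketch of the per-scan labelling metrics (not the official
# evaluation code). gt and pred map a vertebral label to a 3D location in mm
# (as numpy arrays); pred is assumed non-empty.
import numpy as np

def labelling_metrics(gt, pred, tol_mm=20.0):
    hits = 0
    for label, gt_loc in gt.items():
        # Find the predicted centroid (of any label) nearest to this ground truth.
        nearest = min(pred, key=lambda p: np.linalg.norm(gt_loc - pred[p]))
        dist = np.linalg.norm(gt_loc - pred[nearest])
        # 'Identified': the nearest prediction carries the same label and lies
        # within tol_mm of the ground-truth location.
        if nearest == label and dist < tol_mm:
            hits += 1
    recall = hits / len(gt)       # #hits / #actual
    precision = hits / len(pred)  # #hits / #predicted
    return recall, precision
```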
Task 2: Vertebra Segmentation¶
Spine segmentation is a crucial component in quantitative medical image analysis. It directly allows detection and assessment of vertebral fractures and indirectly supports modelling and monitoring of the spinal ageing process.
Given a spine CT scan, the task is to generate accurate voxel-level segmentation maps of the vertebrae present in the scan. Essentially, this is a multi-label segmentation task. The output of this task should be a 3D volume of the same size and orientation as the input scan, with integer values between 1 and 24.
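A quick sanity check of a predicted mask against these output requirements might look as follows. This is a hedged sketch (file names are illustrative, not part of any official pipeline); label 25 (L6) is additionally allowed per the rules:

```python
# Sketch: verify a predicted mask matches the input scan's geometry and uses
# valid labels (0 = background, 1-24 = C1-L5, 25 = L6 where applicable).
import nibabel as nib
import numpy as np

img = nib.load("verse999.nii.gz")
seg = nib.load("verse999_seg.nii.gz")

assert seg.shape == img.shape, "mask and scan sizes differ"
assert np.allclose(seg.affine, img.affine), "mask and scan orientation/spacing differ"

labels = np.unique(np.asanyarray(seg.dataobj))
assert labels.min() >= 0 and labels.max() <= 25, "unexpected label values"
```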
Evaluation:¶
We use two ubiquitous metrics prevalent in the medical image segmentation domain.
1. DICE Coefficient (in %): Measures segmentation overlap in the form of an F1 score at the voxel level. Here, DICE is computed per label as 2|A ∩ B| / (|A| + |B|), where 'A' is the set of ground-truth foreground voxels of a certain label and 'B' is the corresponding predicted set.
2. Hausdorff Surface Distance (in mm): Measures the maximum distance between the two surfaces constructed from the ground-truth and predicted segmentation maps.
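For reference, a minimal per-label DICE sketch over integer label volumes (NumPy arrays of identical shape); this is illustrative only, not the challenge's evaluation code:

```python
# Sketch: per-label DICE = 2|A ∩ B| / (|A| + |B|) for one vertebral label.
import numpy as np

def dice_per_label(gt, pred, label):
    a = gt == label    # ground-truth foreground voxels of this label
    b = pred == label  # predicted foreground voxels of this label
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # label absent in both volumes
    return 2.0 * np.logical_and(a, b).sum() / denom
```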
NOTE: There will be certain exceptions that need to be handled while evaluating and ranking the algorithms. Discussions are underway to elegantly handle such exceptions and this page will be updated as and when a conclusion is reached.
References:
[1] Glocker, B., et al.: Automatic localization and identification of vertebrae in arbitrary field-of-view CT scans. In: MICCAI (2012).
[2] Sekuboyina, A., et al.: Btrfly Net: Vertebrae Labelling with Energy-based Adversarial Learning of Local Spine Prior. In: MICCAI (2018).
Rules¶
Please read this page carefully before submission. Certain parts of the rules are subject to change as we strive to make the participation smoother. Please refer to the 'Change Log' at the end of this page. Major changes will also be announced on the Home Page.
General¶
Teams: Participants can form teams. In such cases, the team will be regarded as one unit and is asked to submit one set of results (see 'Submission' below). Please use the forum here to find your teammates!
Tasks: It is NOT mandatory to participate in both tasks of labelling and segmentation. However, for an optimal overall ranking on the grand-challenges webpage, both results are needed. We do not intend to limit the usage of annotations across tasks, i.e. you can use the segmentation masks to train for the task of vertebrae labelling, or vice versa.
Discussion: The discussion forum here can be used for any and all sorts of discussions related to the challenge. Please keep the discussion civil and respectful. The moderators reserve the right to remove unrelated posts.
External Data Usage: Use of external data is permitted as long as this data is publicly accessible. The participants are required to explicitly mention the usage of external data in the PDF report.
Special cases in evaluation:
- In the ground truth, a vertebra is 'labelled' or 'segmented' only if it is 'fully visible' within the scan. A vertebra is 'fully visible' if it lies entirely within the field-of-view in the cranio-caudal direction. However, a vertebra will still be labelled if a small part of its transverse process is missing due to a limited field-of-view in the lateral direction.
- If your algorithm labels or segments such partial vertebrae, the prediction will NOT be penalised. For example, the mask of a partially visible L1 vertebra (label 20) will not be counted towards DICE. The same applies to labelling.
- Labels are restricted to the numbers 1-25, corresponding to C1-L6. We removed cases with 13 thoracic vertebrae or 9 cervical vertebrae from the dataset due to limited numbers.
Submission¶
A submission to the grand-challenge page has two parts:
- A zipped folder team_name.zip containing the predictions. Predictions for the labelling task should be in .json format with the same name as the image file with '_ctd' appended. Predictions for the segmentation task should be in .nii.gz format with the same name as the image file with '_seg' appended. For example, predictions for an image verse999.nii.gz will be verse999_ctd.json and verse999_seg.nii.gz.
  NOTE: The orientation and spacing of the scan and its segmentation should be identical!
  NOTE-2: In case you see unexpected results in the evaluation portal, try creating the zip out of your files directly. In other words, unzipping team_name.zip should yield the predicted files themselves, not a 'folder with the files' (a sketch for building such a flat archive follows below).
- A PDF report in LNCS format describing the method. In case of semi-automated submissions, please describe appropriate actions and version numbers of any third-party tools that have been used. There is no minimum page limit, but a report of 4-6 pages is encouraged.
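To avoid the 'folder with the files' pitfall from NOTE-2 above, the archive can be built flat, for example as in this sketch (the predictions directory name is an assumption):

```python
# Sketch: zip prediction files at the archive root (no enclosing folder).
import zipfile
from pathlib import Path

pred_dir = Path("predictions")  # holds verse*_ctd.json and verse*_seg.nii.gz
with zipfile.ZipFile("team_name.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for f in sorted(pred_dir.iterdir()):
        zf.write(f, arcname=f.name)  # arcname=f.name keeps the archive flat
```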
For Test Phase 1: To finalise the submission, send the team_name.zip and the PDF report to verse@deep-spine.de. Only the sent submissions are considered final.
For Test Phase 2: Finally, a full submission for the overall ranking must include a third component: the code in a docker container that the organisers can run to evaluate the submission on the hidden test set. The structure for this docker will be announced shortly. Please note that the authors hold all the rights to the code. The organisers will only use the code for evaluating it on the hidden test set and report the performance in the ensuing publication.
Scoring¶
Challenge overview:
- Challenge has two tasks: vertebrae labelling and vertebrae segmentation.
- Submissions will be evaluated on two test sets: 50 public scans (Test Set 1) and 50 hidden scans (Test Set 2).
- Submissions can be fully-automated or semi-automated.
Grand Challenges Ranking: The following ranking is adapted for the challenge while adhering to the service provided by grand-challenge.org (GC). The score on the GC leaderboard is computed as follows:
overall_position = 0.5 * vertebrae_labelling_position + 0.5 * vertebrae_segmentation_position
vertebrae_labelling_position = mean(id_rate_position, localisation_distance_position, precision_position, recall_position)
vertebrae_segmentation_position = mean(DICE_position, hausdorff_distance_position)
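Purely as an illustration of this aggregation (not official scoring code), where each argument is a team's rank position on the corresponding metric, 1 being best:

```python
# Sketch of the GC leaderboard aggregation described above.
def overall_position(id_rate, loc_dist, precision, recall, dice, hausdorff):
    labelling = (id_rate + loc_dist + precision + recall) / 4.0
    segmentation = (dice + hausdorff) / 2.0
    return 0.5 * labelling + 0.5 * segmentation
```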
Final Ranking: The final ranking, however, will be more robust and statistically sound.
- Similar to the Medical Decathlon and the BraTS challenges, we will determine the final ranking based on a pair-wise comparison of all methods on all metrics across both tasks. The method 'significantly' better than the others on a majority of the metrics is the winner.
- There will be three sets of ranks: one for Test Set 1, one for Test Set 2, and one overall ranking. We intend to release a list for individual tasks as well.
- Note that, owing to the commonality between the two tasks, the relative importance of 'labelling' and 'segmentation' for the 'overall rank' will be 30/70.
Miscellany¶
- Fully-automated submissions are encouraged. Since, by design, semi-automated methods can only be tested on the public dataset, they will NOT be ranked in the final ranking.
- A method qualifies as 'fully-automated' only if a docker container is made available to be tested on the hidden test set.
- Co-authorship: Two authors per submission will be included in the ensuing publication. In this publication, we intend to include as many fully-automated submissions as possible, along with the best (and any interesting) semi-automated submissions.
- Beyond the summary of the challenge, we intend to perform an extensive study on inter-rater variability in spine/vertebral segmentation. The submitted docker will be used for this study too.
- NOTE: The authors will retain all rights to the submitted code. The organisers will only use the code for publications related to the VerSe 2019 challenge.