Accuracy and reproducibility of automated white matter hyperintensities segmentation with lesion segmentation tool: A European multi-site 3T study

Ribaldi, F.; Altomare, D.; Jovicich, J.; Ferrari, C.; Picco, A.; Pizzini, F. B.; Soricelli, A.; Mega, A.; Ferretti, A.; Drevelegas, A.; Bosch, B.; Muller, B. W.; Marra, Camillo; Cavaliere, C.; Bartres-Faz, D.; Nobili, F.; Alessandrini, F.; Barkhof, F.; Gros-Dagnac, H.; Ranjeva, J. -P.; Wiltfang, J.; Kuijer, J.; Sein, J.; Hoffmann, K. -T.; Roccatagliata, L.; Parnetti, L.; Tsolaki, M.; Constantinidis, M.; Aiello, M.; Salvatore, M.; Montalti, M.; Caulo, M.; Didic, M.; Bargallo, N.; Blin, O.; Rossini, Paolo Maria; Schonknecht, P.; Floridi, P.; Payoux, P.; Visser, P. J.; Bordet, R.; Lopes, R.; Tarducci, R.; Bombois, S.; Hensch, T.; Fiedler, U.; Richardson, J. C.; Frisoni, G. B.; Marizzoni, M.

doi:10.1016/j.mri.2020.11.008

Brain vascular damage accumulates in aging and often manifests as white matter hyperintensities (WMHs) on MRI. Despite increased interest in automated methods to segment WMHs, a gold standard has not been achieved and their longitudinal reproducibility has been poorly investigated. The aim of the present work is to evaluate the accuracy and reproducibility of two freely available segmentation algorithms. A harmonized MRI protocol was implemented on 3T scanners across 13 European sites, each scanning five volunteers twice (test-retest) using 2D-FLAIR. Automated segmentation was performed using the Lesion Segmentation Tool (LST) algorithms: the lesion growth algorithm (LGA) in SPM8 and SPM12 and the lesion prediction algorithm (LPA). To assess reproducibility, we applied the LST longitudinal pipeline to the LGA and LPA outputs for both the test and retest scans. We evaluated volumetric and spatial accuracy by comparing LGA and LPA with manual tracing, and reproducibility by comparing test with retest. The median volume difference between automated and manual WMH segmentations (mL) was −0.22 [IQR = 0.50] for LGA-SPM8, −0.12 [0.57] for LGA-SPM12 and −0.09 [0.53] for LPA, while the spatial accuracy (Dice coefficient, DC) was 0.29 [0.31], 0.33 [0.26] and 0.41 [0.23], respectively. The reproducibility analysis showed a median reproducibility error of 20% [IQR = 41] for LGA-SPM8, 14% [31] for LGA-SPM12 and 10% [27] for LPA with the cross-sectional pipeline. Applying the LST longitudinal pipeline considerably reduced the reproducibility errors (LGA: 0% [IQR = 0], p < 0.001; LPA: 0% [3], p < 0.001) compared to those derived using the cross-sectional algorithms. The DC using the longitudinal pipeline was excellent (median = 1) for LGA [IQR = 0] and LPA [0.02]. The LST algorithms showed moderate accuracy and good reproducibility. Therefore, they can be used as reliable cross-sectional and longitudinal tools in multi-site studies.
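The abstract reports spatial agreement as a Dice coefficient and test-retest agreement as a percentage reproducibility error, but does not give the formulas. The following is a minimal sketch, not the authors' implementation. It assumes the standard Dice definition, 2|A∩B| / (|A| + |B|), and, as an assumption not stated in the abstract, a reproducibility error defined as the absolute test-retest volume difference relative to the mean of the two volumes. The function names, the handling of empty masks and the toy data are hypothetical.

```python
from statistics import median


def dice_coefficient(mask_a, mask_b):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def reproducibility_error(vol_test, vol_retest):
    """Absolute test-retest volume difference as a percentage of the mean volume."""
    mean_vol = (vol_test + vol_retest) / 2.0
    # Assumption: zero lesion volume in both sessions counts as 0% error.
    return 0.0 if mean_vol == 0 else 100.0 * abs(vol_test - vol_retest) / mean_vol


# Hypothetical toy data, not taken from the study.
manual = [0, 1, 1, 1, 0, 0]      # manual tracing, flattened
automated = [0, 0, 1, 1, 1, 0]   # automated segmentation, flattened
print("Dice:", dice_coefficient(manual, automated))

# Hypothetical (test, retest) WMH volumes in mL for three subjects.
volume_pairs = [(1.0, 1.2), (0.5, 0.5), (2.0, 1.5)]
errors = [reproducibility_error(t, r) for t, r in volume_pairs]
print("Median reproducibility error (%):", median(errors))
```

Under these definitions, a Dice coefficient of 1 and a reproducibility error of 0% both correspond to identical test and retest segmentations, which is how the longitudinal pipeline results reported above would read.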

Ribaldi, F., Altomare, D., Jovicich, J., Ferrari, C., Picco, A., Pizzini, F. B., Soricelli, A., Mega, A., Ferretti, A., Drevelegas, A., Bosch, B., Muller, B. W., Marra, C., Cavaliere, C., Bartres-Faz, D., Nobili, F., Alessandrini, F., Barkhof, F., Gros-Dagnac, H., Ranjeva, J. -P., Wiltfang, J., Kuijer, J., Sein, J., Hoffmann, K. -T., Roccatagliata, L., Parnetti, L., Tsolaki, M., Constantinidis, M., Aiello, M., Salvatore, M., Montalti, M., Caulo, M., Didic, M., Bargallo, N., Blin, O., Rossini, P. M., Schonknecht, P., Floridi, P., Payoux, P., Visser, P. J., Bordet, R., Lopes, R., Tarducci, R., Bombois, S., Hensch, T., Fiedler, U., Richardson, J. C., Frisoni, G. B., Marizzoni, M., Accuracy and reproducibility of automated white matter hyperintensities segmentation with lesion segmentation tool: A European multi-site 3T study, <<MAGNETIC RESONANCE IMAGING>>, 2021; 76 (76): 108-115. [doi:10.1016/j.mri.2020.11.008] [http://hdl.handle.net/10807/179168]