University of California, San Diego


Imaging Poster Presentation

P0583 - High throughput lesion evaluation and quality control for incorporating quantitative imaging metrics into clinical practice (ID 1502)

Presentation Number: P0583
Presentation Topic: Imaging

Abstract

Background

Automated multiple sclerosis lesion counts and volumes are poised to become salient clinical biomarkers of disease progression; however, algorithmic variability and low expert agreement prevent widespread adoption in clinical practice. Because every method has a non-negligible error rate, visual quality control (QC) is required before a clinical decision can be made, and this QC step is a bottleneck to using automated lesion count and volume metrics in the clinic. A method is needed to 1) quickly evaluate experts and non-experts in order to understand and resolve disagreements, and 2) quickly QC the output of automated lesion segmentation methods.

Objectives

To evaluate the feasibility of a web application called braindr (Keshavan et al., 2019) for high-throughput QC of automated lesion segmentation by measuring 1) intra-rater reliability and 2) inter-rater reliability, and 3) by characterizing the types of lesions on which raters disagree.

Methods

3D T1 and FLAIR images from 32 subjects were co-registered, N4 bias-field corrected, and z-scored. Subtraction images (Z_FLAIR - Z_T1) were thresholded at varying levels. A triplanar image of each resulting segmentation (called a potential lesion, PL) was generated, yielding over 80,000 individual PLs needing QC, which simulates a high-throughput scenario with a high error rate. Expert and non-expert raters were asked to pass or fail PLs based on the 2D triplanar image shown in the app. We measured variability between and within raters by calculating intraclass correlation coefficients (ICCs).
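As a minimal sketch of the two computational steps above (candidate generation from the subtraction image, and ICC estimation from rater labels), the Python snippet below uses nibabel, SciPy, and pingouin as illustrative tooling; the file names, threshold value, and ratings-table columns are assumptions for illustration, not part of the study.

```python
# Sketch only: illustrative file names, threshold, and column names.
import nibabel as nib
import numpy as np
import pandas as pd
import pingouin as pg
from scipy.ndimage import label

# Load co-registered, N4-corrected volumes and a brain mask (assumed inputs).
t1 = nib.load("sub-01_T1w_preproc.nii.gz").get_fdata()
flair = nib.load("sub-01_FLAIR_preproc.nii.gz").get_fdata()
mask = nib.load("sub-01_brainmask.nii.gz").get_fdata() > 0

def zscore(img, brain):
    vals = img[brain]
    return (img - vals.mean()) / vals.std()

# Subtraction image: Z_FLAIR - Z_T1.
subtraction = zscore(flair, mask) - zscore(t1, mask)

# Threshold and split into connected components; each component is a
# potential lesion (PL) that a rater would pass or fail in the app.
threshold = 2.5  # illustrative value
candidates, n_pls = label((subtraction > threshold) & mask)
print(f"{n_pls} potential lesions at threshold {threshold}")

# Inter-rater reliability from a long-format table of pass/fail ratings
# (columns: pl_id, rater, rating). The ICC2 and ICC2k rows correspond to
# the single-rating and average-rating estimates reported in the Results.
ratings = pd.read_csv("braindr_ratings.csv")
icc = pg.intraclass_corr(data=ratings, targets="pl_id",
                         raters="rater", ratings="rating")
print(icc[["Type", "ICC"]])
```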

Results

1) Feasibility: 14,973 PLs were labelled by 5 raters. The raters were a neuroradiologist (MI), a general neurologist (BD), 2 experienced technicians (AK, KL), and 1 beginner (MB). 2) Intra-rater reliability (ICC(1,1)): a) neuroradiologist: 0.97, b) beginner: 0.90, c) experienced technicians: 0.87 and 0.85, and d) neurologist: 0.84. 3) Inter-rater reliability was ICC(2,k) = 0.92 for an average rating and ICC(2,1) = 0.74 for an individual rating. 4) Disagreements occurred more frequently on PLs in the brainstem, cerebellum, hippocampus, and basal ganglia.

Conclusions

We simultaneously evaluated raters and QC'd lesions from an automated method using a quick, scalable web application. This enables us to 1) improve expert agreement on lesion identification, 2) develop better-quality educational materials for experts and non-experts alike, 3) train new raters quickly, and 4) ensure the quality of the measurements at scale.
