Genentech


Machine Learning/Network Science Poster Presentation

P0007 - Detecting treatment response on T2 lesion burden in multiple sclerosis clinical trials with deep learning (ID 1171)

Abstract

Background

Deep learning (DL) methods show promise for automating, or assisting radiologists with, many of the measurements currently made manually during clinical trials in drug development, thereby reducing inter- and intra-rater variability and reading time. T2 lesion burden is a routine imaging endpoint in multiple sclerosis (MS) clinical trials. Ocrelizumab (OCR) is a humanized anti-CD20 monoclonal antibody approved for the treatment of relapsing and primary progressive forms of MS. Based on manual neuroradiologist assessments, it has been shown to strongly suppress the number of new and enlarging T2 lesions, compared with control treatment, in relapsing-remitting MS (RRMS).

Objectives

To evaluate DL models for quantifying T2 lesion burden and for detecting the therapeutic response identified by manual neuroradiologist assessments.

Methods

We evaluated three DL models using multimodal brain MRI (T2w, FLAIR and T1w; voxel size 1 × 1 × 3 mm³): a stack model that takes three consecutive slices as input; a patch model that takes 3D sub-volumes as input; and a novel model architecture. Models were trained on baseline datasets from OPERA I (NCT01247324, n=898) and tested on datasets from OPERA II (NCT01412333, n=905) acquired at baseline and at weeks 24, 48 and 96. The number of new and enlarging T2 lesions was estimated heuristically from the serial lesion masks predicted by each model.
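The abstract does not specify the heuristic used to derive new and enlarging lesion counts from serial masks. The following is a minimal sketch of one plausible approach, not the authors' method. It assumes masks are represented as sets of (z, y, x) voxel coordinates, that lesions are 6-connected components, and that a follow-up lesion counts as enlarging when its volume is at least a hypothetical growth_ratio times the baseline volume it overlaps. Only the 3-voxel minimum lesion size comes from the abstract; the function names and the default growth_ratio are illustrative.

```python
from collections import deque

Voxel = tuple[int, int, int]

# Face-sharing (6-connected) neighbours; the connectivity is an assumption.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_lesions(mask: set[Voxel], min_size: int = 3) -> list[set[Voxel]]:
    """Split a binary lesion mask into connected components of >= min_size voxels."""
    remaining = set(mask)
    lesions = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                nb = (z + dz, y + dy, x + dx)
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) >= min_size:
            lesions.append(component)
    return lesions


def count_new_and_enlarging(
    baseline: set[Voxel],
    followup: set[Voxel],
    min_size: int = 3,
    growth_ratio: float = 1.5,  # hypothetical threshold, not from the abstract
) -> tuple[int, int]:
    """Return (new, enlarging) lesion counts between two time points."""
    # Keep every baseline component, however small, so that a follow-up
    # lesion touching any prior lesion voxel is never counted as new.
    base_lesions = connected_lesions(baseline, min_size=1)
    new = enlarging = 0
    for lesion in connected_lesions(followup, min_size):
        overlapping = [b for b in base_lesions if b & lesion]
        if not overlapping:
            new += 1
        else:
            prior_volume = sum(len(b) for b in overlapping)
            if len(lesion) >= growth_ratio * prior_volume:
                enlarging += 1
    return new, enlarging
```

Under these assumptions, a 3-voxel baseline lesion that grows to 6 voxels is counted as enlarging, a separate 3-voxel lesion with no baseline overlap is counted as new, and an isolated single voxel is discarded by the size threshold. Because small components pass or fail this threshold on only a few voxels, the counts are sensitive to false positives among small lesions.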

Results

Lesion size distributions were heavily skewed toward small lesions (~40% < 10 voxels and ~80% < 50 voxels, with a minimum lesion size threshold of 3 voxels). All three models achieved a similar mean Dice coefficient (0.7), with mean lesion true positive rates (LTPR) of 0.87, 0.85 and 0.82 and mean lesion false positive rates (LFPR) of 0.23, 0.24 and 0.18. In general, the models overestimated the number of new and enlarging T2 lesions relative to the ground truth (GT) masks, driven by a high number of false positives (FPs) for small lesions. All three models detected a significant treatment response in favor of OCR at weeks 48 (p<0.01) and 96 (p<0.001). Only the novel model showed a significant treatment effect at week 24 (p<0.01).
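The abstract reports Dice, LTPR and LFPR without defining them. The sketch below uses definitions that are common in the lesion segmentation literature and may differ from those the authors applied: Dice is voxel-wise overlap, LTPR is the fraction of GT lesions touched by any predicted voxel, and LFPR is the fraction of predicted lesions touching no GT voxel. Masks are assumed to be sets of (z, y, x) voxel coordinates and lesions are assumed to be already separated into one set per lesion; all function names are illustrative.

```python
Voxel = tuple[int, int, int]


def dice(pred: set[Voxel], truth: set[Voxel]) -> float:
    """Voxel-wise Dice coefficient: 2|P ∩ T| / (|P| + |T|)."""
    if not pred and not truth:
        return 1.0  # convention for two empty masks
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def lesion_tpr(pred_mask: set[Voxel], truth_lesions: list[set[Voxel]]) -> float:
    """Fraction of GT lesions that overlap the prediction by at least one voxel."""
    if not truth_lesions:
        return float("nan")  # undefined when there are no GT lesions
    hits = sum(1 for lesion in truth_lesions if lesion & pred_mask)
    return hits / len(truth_lesions)


def lesion_fpr(pred_lesions: list[set[Voxel]], truth_mask: set[Voxel]) -> float:
    """Fraction of predicted lesions that overlap no GT voxel."""
    if not pred_lesions:
        return 0.0  # no predictions, so no false positives
    false_pos = sum(1 for lesion in pred_lesions if not (lesion & truth_mask))
    return false_pos / len(pred_lesions)
```

With these definitions, a high LTPR can coexist with a moderate Dice score when lesions are small, because a partial overlap counts as a full lesion-level detection while contributing few overlapping voxels.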

Conclusions

The cross-sectional performance metrics of all three models were comparable to those reported in the literature. All models reproduced the treatment response seen in the GT manual reads at weeks 48 and 96, but only the novel model detected the early response at week 24 that was seen in the GT assessments. To further improve reliability, future work will aim to reduce the number of FPs for small lesions.
