Extended Abstract (for invited Faculty only)
Cartilage Imaging and Functional Testing

8.3.2 - AI in OA Diagnosis

Presentation Topic
Cartilage Imaging and Functional Testing
Date
13.04.2022
Lecture Time
11:00 - 11:15
Room
Potsdam 3
Session Name
Session Type
Special Session
Speaker
  • S. Nehrer (Krems, AT)
Authors
  • S. Nehrer (Krems, AT)
  • R. Ljuhar
  • C. Goetz (Vienna, AT)
  • T. Paixao (Vienna, AT)

Abstract

Introduction

Radiographic classification of osteoarthritis (OA) in the knee has typically been performed using semi-quantitative grading schemes [1], the most widely used of which is the Kellgren-Lawrence (KL) scale [2], recognized by the World Health Organization in 1961 as the standard for clinical studies of OA. The KL grading scheme requires assessing the presence and severity of several individual radiographic features (IRFs), including osteophytes, sclerosis, and joint space narrowing. These assessments are then summarized into a five-point scale reflecting the severity of OA. However, the KL grading scheme has come under criticism for assuming a single mode of OA progression [3] and for depending on subjective assessments [4,5], exacerbated by the vague verbal definitions of the individual radiographic features at each stage [6]. To address these issues, the Osteoarthritis Research Society International (OARSI) proposed a classification system for each of the IRFs, supported by a reference atlas that depicts canonical examples of each IRF classification [7].

Joint space width (JSW) has long been the gold standard for assessing loss of cartilage in knee OA. In a first study (Part A), we describe a novel quantitative measure of joint space width: standardized JSW (stdJSW). We assess the performance of this quantitative metric at tracking changes in joint space narrowing (JSN) OARSI grade and provide reference values for the different JSN OARSI grades and their annual change.
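
The exact stdJSW algorithm is not reproduced here; the following is a rough, hypothetical sketch of the underlying idea only, namely rescaling the measured width by a per-image anatomical size reference so that body-size effects cancel. The reference measurement used (here a tibial plateau width) is an assumption for illustration.

```python
# Hypothetical sketch only: not the actual stdJSW algorithm.
# The idea illustrated is size normalization: dividing the measured JSW by
# an anatomical reference length from the same radiograph removes the
# component of JSW variation that merely reflects patient size/height.

def standardized_jsw(jsw_mm: float, reference_width_mm: float) -> float:
    """Return a dimensionless, size-adjusted joint space width.

    jsw_mm:             measured minimum joint space width (mm).
    reference_width_mm: a hypothetical anatomical scale reference measured
                        in the same image (e.g., tibial plateau width).
    """
    return jsw_mm / reference_width_mm
```

Under this kind of normalization, two knees with the same relative cartilage loss but different skeletal sizes map to similar values, which is consistent with the observation in Part A that stdJSW cancels height-driven JSW variation.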

One of the main purposes of a systematic OA grading scheme, such as the KL and OARSI systems, is to standardize the diagnosis and assessment of OA. However, several studies report that the KL grading scheme, as well as the accessory assessments, suffers from subjectivity and low inter-observer reliability [8,9]. This leads to differences in estimates of the prevalence of the disease [4] and to variability in diagnoses of the same patient. This is especially problematic for the early stages of the disease: severe forms of OA are easily recognized in radiographs, but readings of the early stages are less consistent [10]. In part, this stems from the high degree of subjectivity of the assessments [11], even with the guidance of the OARSI atlas. This problem has consequences at several levels. In clinical practice, it can lead to misdiagnosis, resulting in unnecessary examination procedures or omitted treatment, and to psychological stress for the patient [12]. In the context of clinical trials, the variability of assessments can decrease the statistical power to detect treatment effects [13] and complicate the estimation of prevalence and incidence rates [14].

One potential solution to the problem of diagnostic variability would be to have the same radiograph reviewed by several physicians and then apply a consensus procedure, as is done when establishing gold-standard readings in many clinical studies. This is clearly not practical in routine clinical work. However, a computer-assisted detection system could be used to standardize the reading of the relevant features. Artificial intelligence, and especially deep learning, has proven remarkably effective at recognizing complex visual patterns. Applied to medical imaging, such systems can provide the reader with robust guidance and recommendations for radiographic assessments. Because these systems can be trained on the assessments of several clinicians (or on consensus readings established after several physicians have reviewed each case), they incorporate the experience of multiple clinicians and could potentially simulate a consensus procedure. Here we take this latter approach.

In Part B we make use of a computer-assisted detection system (KOALA, IB Lab GmbH) that was trained on a large dataset of radiographs graded for KL grade and for JSN, sclerosis, and osteophyte OARSI grades through a consensus procedure. KOALA uses deep learning networks to provide fully automated KL and OARSI grades in the form of a report. Here, we assess how the use of this computer-assisted detection system affects physicians' inter-observer variability when assessing KL grade and IRFs, as well as their accuracy at detecting several clinically relevant conditions.

Content

Methods Part A: We collected 18,934 individual knee images from the OAI study, covering follow-up visits up to month 48 (baseline plus 4 follow-up exams). Absolute JSW and JSN readings were taken from the OAI study. Standardized JSW was calculated for each knee, as well as 12-month JSN grade changes. For each JSN grade and 12-month grade change, the distribution of JSW loss was calculated for stdJSW and for the absolute JSW measurements retrieved from the OAI study. The area under the ROC curve (AUC) was calculated for the performance of both absolute JSW and stdJSW at discriminating between different JSN grades. The standardized response mean (SRM) was used to compare the responsiveness of the two measures to change in JSN grade.
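
For readers unfamiliar with the two statistics, a minimal sketch of the Part A analysis on synthetic data follows; the data, effect sizes, and grade model are invented for illustration, but the AUC computation and the standardized response mean (SRM = mean change / SD of change) are standard.

```python
# Sketch of the Part A statistics on synthetic data (the OAI readings are
# not reproduced here). AUCs are computed for the binary splits JSN > 0,
# JSN > 1, JSN > 2; responsiveness is summarized by the SRM.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical data: one JSN OARSI grade (0-3) and one JSW value per knee,
# with JSW narrowing as grade increases.
jsn_grade = rng.integers(0, 4, size=1000)
jsw = 5.0 - 1.2 * jsn_grade + rng.normal(0, 0.7, size=1000)  # mm

# Discrimination: lower JSW should indicate higher JSN, hence the sign flip.
for k in (0, 1, 2):
    auc = roc_auc_score(jsn_grade > k, -jsw)
    print(f"AUC for JSN > {k}: {auc:.2f}")

# Responsiveness: SRM of hypothetical 12-month JSW changes in knees whose
# JSN grade changed over the same interval.
delta_jsw = rng.normal(-0.4, 0.5, size=300)
srm = delta_jsw.mean() / delta_jsw.std(ddof=1)
print(f"SRM: {srm:.2f}")
```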

Part B: A set of 124 unilateral knee radiographs from the OAI study was analyzed by a computerized method with regard to KL grade, as well as joint space narrowing, osteophyte, and sclerosis OARSI grades. Physicians scored all images with respect to osteophyte, sclerosis, and joint space narrowing OARSI grades and KL grade in two modalities: a plain radiograph (unaided) and a radiograph presented together with the report from the computer-assisted detection system (aided). The intra-class correlation (ICC) between the physicians was calculated for both modalities. Furthermore, physicians' performance was compared to the gradings of the OAI study, and accuracy, sensitivity, and specificity were calculated in both modalities for each of the scored features.
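
A minimal sketch of the Part B reader statistics on synthetic gradings is given below; the choice of the ICC(2,1) variant, the number of readers, and the "grade > 0" binarization threshold are assumptions for illustration, not a claim about the exact variants used in the study.

```python
# Sketch of the Part B reader-study statistics on synthetic data.
import numpy as np
from sklearn.metrics import confusion_matrix

def icc2_1(ratings: np.ndarray) -> float:
    """ICC(2,1), two-way random effects, single rater (Shrout & Fleiss),
    for a (targets x raters) matrix of scores."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-image means
    col_means = ratings.mean(axis=0)   # per-reader means
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between images
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between readers
    sse = ((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

rng = np.random.default_rng(1)
truth = rng.integers(0, 5, size=124)                       # hypothetical KL grades
readers = truth[:, None] + rng.integers(-1, 2, (124, 5))   # 5 noisy readers
readers = readers.clip(0, 4)

print(f"ICC(2,1): {icc2_1(readers.astype(float)):.2f}")

# Accuracy, sensitivity, specificity for one reader at the KL > 0 threshold.
tn, fp, fn, tp = confusion_matrix(truth > 0, readers[:, 0] > 0).ravel()
acc = (tp + tn) / truth.size
sens = tp / (tp + fn)
spec = tn / (tn + fp)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a study like Part B, these quantities would be computed per feature and per modality (aided vs. unaided) and then compared across modalities.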

Results Part A: The areas under the ROC curve for stdJSW at discriminating between successive JSN grades were AUC = 0.87, 0.95, and 0.96 for JSN > 0, JSN > 1, and JSN > 2, respectively, whereas for absolute JSW they were AUC = 0.79, 0.90, and 0.98. We find that standardized JSW is significantly more responsive than absolute JSW, as measured by the SRM.

Part B: Agreement rates for KL grade and for sclerosis and osteophyte OARSI grades were significantly higher in the aided than in the unaided modality. Readings of joint space narrowing OARSI grade did not show a statistically significant difference between the two modalities. Readers' accuracy and specificity for KL grade > 0, KL grade > 1, sclerosis OARSI grade > 0, and osteophyte OARSI grade > 0 were significantly increased in the aided modality. Reader sensitivity was high in both modalities.

Conclusions Our results (Part A) show that stdJSW outperforms absolute JSW at discriminating between JSN grades and at tracking changes in JSN. They further show that this advantage arises in part because stdJSW cancels the variation in JSW that comes from variation in patient height. For Part B, our study suggests that the use of a computer-assisted detection system, such as KOALA, improves both agreement rates and accuracy when assessing radiographic features relevant to the diagnosis of knee osteoarthritis. These improvements in physician performance and reliability come without a trade-off in sensitivity, which remained high in both modalities. These results argue for the use of this type of software as a way to improve the standard of care in diagnosing knee osteoarthritis.

References

1. Braun HJ, Gold GE. Diagnosis of osteoarthritis: imaging. Bone. 2012;51(2):278-288. doi:10.1016/j.bone.2011.11.019

2. Kellgren JH, Lawrence JS. Radiological assessment of osteo-arthrosis. Ann Rheum Dis. 1957;16(4):494-502. http://www.ncbi.nlm.nih.gov/pubmed/13498604.

3. Kohn MD, Sassoon AA, Fernando ND. Classifications in Brief: Kellgren-Lawrence Classification of Osteoarthritis. Clin Orthop Relat Res. 2016;474(8):1886-1893. doi:10.1007/s11999-016-4732-4

4. Culvenor AG, Engen CN, Øiestad BE, Engebretsen L, Risberg MA. Defining the presence of radiographic knee osteoarthritis: a comparison between the Kellgren and Lawrence system and OARSI atlas criteria. Knee Surgery, Sport Traumatol Arthrosc. 2015;23(12):3532-3539. doi:10.1007/s00167-014-3205-0

5. Wright RW, MARS Group, et al. Osteoarthritis Classification Scales: Interobserver Reliability and Arthroscopic Correlation. J Bone Joint Surg Am. 2014;96(14):1145-1151. doi:10.2106/JBJS.M.00929

6. Schiphof D, Boers M, Bierma-Zeinstra SMA. Differences in descriptions of Kellgren and Lawrence grades of knee osteoarthritis. Ann Rheum Dis. 2008;67(7):1034-1036. doi:10.1136/ard.2007.079020

7. Altman RD, Gold GE. Atlas of individual radiographic features in osteoarthritis, revised. Osteoarthr Cartil. 2007;15 Suppl A:A1-56. doi:10.1016/j.joca.2006.11.009

8. Gunther KP, Sun Y. Reliability of radiographic assessment in hip and knee osteoarthritis. Osteoarthr Cartil. 1999;7(2):239-246. doi:10.1053/joca.1998.0152

9. Damen J, Schiphof D, ten Wolde S, Cats HA, Bierma-Zeinstra SMA, Oei EHG. Inter-observer reliability for radiographic assessment of early osteoarthritis features: the CHECK (cohort hip and cohort knee) study. Osteoarthr Cartil. 2014;22(7):969-974. doi:10.1016/j.joca.2014.05.007

10. Hart DJ, Spector TD. Kellgren & Lawrence grade 1 osteophytes in the knee--doubtful or definite? Osteoarthr Cartil. 2003;11(2):149-150. doi:10.1053/JOCA.2002.0853

11. Gossec L, Jordan JM, Mazzuca SA, et al. Comparative evaluation of three semi-quantitative radiographic grading techniques for knee osteoarthritis in terms of validity and reproducibility in 1759 X-rays: report of the OARSI-OMERACT task force. Osteoarthr Cartil. 2008;16(7):742-748. doi:10.1016/j.joca.2008.02.021

12. Balint G, Szebenyi B, et al. Diagnosis of osteoarthritis - Guidelines and current pitfalls. Drugs. 1996;52(Suppl 3):1-13.

13. Sadler ME, Yamamoto RT, Khurana L, Dallabrida SM. The impact of rater training on clinical outcomes assessment data: a literature review. Int J Clin Trials. 2017;4(3):101. doi:10.18203/2349-3259.ijct20173133

14. Marshall DA, Vanderby S, Barnabe C, et al. Estimating the Burden of Osteoarthritis to Plan for the Future. Arthritis Care Res. 2015;67(10):1379-1386. doi:10.1002/acr.22612
