Meta-analysis finds good news, bad news for radiology AI research

By Erik L. Ridley, staff writer

April 9, 2021 -- A meta-analysis of radiology artificial intelligence (AI) research found the technology had high overall accuracy in chest and breast imaging, but also widespread methodological and reporting issues that may preclude definitive assessments of clinical utility.

In a report published online April 7 in NPJ Digital Medicine, a team from Imperial College London shared their analysis of over 200 studies published on the use of deep-learning algorithms in breast or respiratory imaging applications. Although the models offered high diagnostic performance overall, the researchers cautioned that the research studies were also highly heterogeneous, with extensive variation in methodology, terminology, and outcome measures.

"While the results demonstrate that [deep learning] currently has a high diagnostic accuracy, it is important that these findings are assumed in the presence of poor design, conduct, and reporting of studies, which can lead to bias and overestimating the power of these algorithms," wrote first author Dr. Ravi Aggarwal, corresponding author Dr. Hutan Ashrafian, and colleagues.

The study is the latest to address the potential shortcomings of AI research in radiology, which have included the following:

  • Methodological flaws, including "Frankenstein" datasets, in COVID-19 AI studies
  • Training sets that often aren't geographically diverse
  • A need for greater involvement by radiologists and data scientists
  • Claims of expert-level performance that could endanger patients
  • A widespread lack of proper external validation

In this meta-analysis, the researchers sought to quantify the diagnostic accuracy of AI in specialty-specific radiology applications, as well as to assess the variation in methodology and reporting of deep learning-based radiological diagnosis. After initially identifying nearly 12,000 abstracts for deep learning in medical imaging, Aggarwal et al eventually winnowed the list down to 279 total studies, including 115 in respiratory medicine, 82 in breast cancer, and 82 in ophthalmology.

The researchers found high overall performance for radiology AI applications:

  • Diagnosing lung nodules or lung cancer on chest x-ray or CT: Area under the curve (AUC) range = 0.864-0.937
  • Diagnosing breast cancer on mammography, ultrasound, MRI, or digital breast tomosynthesis (DBT): AUC range = 0.868-0.909

The authors found high sensitivity, specificity, and AUC for algorithms in identifying chest pathology on CT scans and chest x-rays. Deep-learning algorithms on CT had higher sensitivity and AUC for detecting lung nodules, while chest x-ray algorithms produced higher specificity, positive predictive value, and F1 scores. In addition, deep-learning models for CT yielded higher sensitivity than those for chest x-ray in diagnosing cancer or lung mass.
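The metrics compared above are all derived from a binary confusion matrix. As an illustration only (this is not code from the study, and the counts below are hypothetical), a minimal Python sketch of how sensitivity, specificity, positive predictive value, and F1 score are computed from true/false positive and negative counts:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic-accuracy metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: fraction of diseased cases detected
    specificity = tn / (tn + fp)   # fraction of healthy cases correctly ruled out
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "f1": f1}

# Hypothetical counts for illustration only
m = diagnostic_metrics(tp=90, fp=10, tn=80, fn=20)
```

A model can trade these quantities off against one another, which is why the studies' CT algorithms could lead on sensitivity while chest x-ray algorithms led on specificity, PPV, and F1.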

In breast imaging, the researchers found generally high diagnostic accuracy -- and very similar performance by AI between modalities -- for identifying breast cancer on mammography, ultrasound, and DBT. Diagnostic accuracy was lower, however, for AI in breast MRI, perhaps due to small datasets and the use of 2D images, according to the authors. Utilizing larger databases and multiparametric MRI may increase diagnostic accuracy, they said.

Despite the results showing AI's high accuracy, it's difficult to determine if the algorithms are clinically acceptable or applicable, according to the researchers.

"This is partially due to the extensive variation and risk of bias identified in the literature to date," they wrote. "Furthermore, the definition of what threshold is acceptable for clinical use and tolerance for errors varies greatly across diseases and clinical scenarios."

The researchers found a large degree of variation in methodology, reference standards, terminology, and reporting. The most common sources of variation included the quality and size of datasets, the metrics used to report performance, and validation methods, according to the authors.

The researchers offered five recommendations for improving the quality of future AI research:

  • Availability of large, open-source, diverse anonymized datasets with annotations
  • Collaboration with academic centers to utilize their expertise in pragmatic trial design and methodology
  • Creation of AI-specific reporting standards
  • Development of specific tools for determining the risk of study bias and applicability
  • Creation of an updated ethical and legal framework specific to AI


Copyright © 2021