Artificial intelligence (AI) algorithms and clinicians are virtually equal when diagnosing fractures on x-rays and CT scans, and AI may be a useful aid for clinicians in busy emergency departments, according to a study published March 29 in Radiology.
In a systematic review and meta-analysis, a group at the Botnar Research Centre in Oxford, England, compared the diagnostic performance in fracture detection between AI and clinicians across peer-reviewed publications and the gray literature. The authors found no significant differences between the two, and they suggest AI algorithms hold promise as diagnostic adjuncts in future clinical practice.
"[AI] could also be helpful as a 'second reader,' providing clinicians with either reassurance that they have made the correct diagnosis or prompting them to take another look at the imaging before treating patients," said corresponding author Dr. Rachel Kuo, in an RSNA news release.
Between April 2019 and April 2020, 1.2 million patients presented to an emergency department in the U.K. with acute fractures or dislocations, an increase of 23% over the year before. Missed or delayed diagnosis of fractures on radiographs is a common diagnostic error, with reported rates ranging from 3% to 10%. Thus, a growing number of studies apply AI techniques to fracture detection as an adjunct to clinician diagnosis, the authors wrote.
Previous narrative reviews have reported high accuracy for deep learning in fracture detection, yet just one meta-analysis of nine studies has reported pooled estimates: a sensitivity of 87% and a specificity of 91%.
Against this backdrop, the U.K. group aimed to take a deeper dive. They extracted contingency tables from 42 studies to construct hierarchical summary receiver operating characteristic (ROC) curves and calculated pooled sensitivities and specificities to compare performance between AI and clinicians.
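For readers unfamiliar with the underlying arithmetic: each extracted contingency table is a 2x2 count of true/false positives and negatives, from which a study's sensitivity and specificity follow directly (the review then pools these per-study values with hierarchical models). A minimal sketch, using made-up illustrative counts rather than any figures from the review:

```python
# Hypothetical 2x2 contingency table for a single fracture-detection study.
# The counts below are illustrative only, not taken from the meta-analysis.
tp, fp = 91, 9    # reader flags a fracture: true positives, false positives
fn, tn = 9, 91    # reader calls the image normal: false negatives, true negatives

sensitivity = tp / (tp + fn)  # share of true fractures that were detected
specificity = tn / (tn + fp)  # share of fracture-free images correctly cleared

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Pooling across studies is more involved than averaging these numbers, which is why the authors used hierarchical summary ROC models rather than simple means.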
Thirty-seven studies identified fractures on radiographs, of which 18 focused on lower limb, 15 on upper limb, and four on other fractures. Five studies identified fractures on CT images.
Thirty-six of the studies developed and internally validated an algorithm, and nine of these studies also externally validated their algorithm. Sixteen studies compared the performance of AI with expert clinicians, seven compared AI to experts and nonexperts, and one compared AI to nonexperts only. Six studies included clinician performance with and without AI assistance as a comparison group.
Results indicate AI had high reported diagnostic accuracy, with a pooled sensitivity of 91% and specificity of 91%. In addition, AI and clinicians had comparable performance at external validation, with a pooled sensitivity of 94% and specificity of 94%, according to the findings.
"The results from this meta-analysis cautiously suggest that AI is noninferior to clinicians in terms of diagnostic performance in fracture detection, showing promise as a useful diagnostic tool," the authors stated.
The authors noted that there were significant flaws in study methods among the 42 studies that may limit the real-world applicability of their findings. For instance, it is likely that clinician performance was underestimated, with only one study providing clinicians with background clinical information, the authors wrote.
"External validation and evaluation of algorithms in prospective randomized clinical trials is a necessary next step toward clinical deployment," they concluded.
In an accompanying editorial, Dr. Jeremie Cohen, PhD, of the University of Paris, and Dr. Matthew McInnes, PhD, of the University of Ottawa in Ontario, Canada, wrote that the authors should be commended for their efforts and noted one of the strengths of the review was the authors' use of "state-of-the-art methods." These included the prospective registration of their protocol, a specific quality assessment tool, dedicated statistical methods for synthesis of diagnostic test data, and complete reporting.
Yet they picked up where the study authors left off, discussing the methodological flaws of the studies included in the review.
"AI may have advantages over humans because machines are not influenced by stress and fatigue and are not limited by interrater variability. However, AI must be compared with radiologists under fair experimental conditions that reflect typical clinical practice before conclusions can be drawn regarding comparative performance," they concluded.