A team of researchers led by Dr. Nisha Sharma of Leeds Teaching Hospitals NHS Trust in the U.K. simulated the use of AI as a second reader by retrospectively applying an AI algorithm to over 40,000 screening mammograms and then comparing the software's results with the initial radiologist's interpretation. They found that over 80% of the exams could have been categorized as either normal or abnormal without requiring any additional interpretation.
"Our study shows that an AI algorithm is a viable option to replace the second human reader in the double reading of screening mammograms," Sharma said.
Standard of care
Although the practice isn't common in the U.S., double reading of screening mammography studies is the standard of care in many countries around the world. In the U.K., women ages 50 to 70 are eligible to receive breast screening every three years under the National Health Service (NHS) Breast Screening Programme. Screening mammograms are interpreted independently by two radiologists.
If both readers agree that a mammogram is normal, the woman is invited back for routine screening in three years. If both readers consider the mammogram to be abnormal, she is recalled for second-stage screening. If the two readers have discordant findings, then a third reader or group of readers will review the study to determine the diagnosis.
This model is labor-intensive, however, and difficult to sustain given the ongoing workforce crisis in the U.K., according to Sharma. Furthermore, 26% of breast radiologists in the U.K. are expected to retire in the next five years.
Replacing second human reader
As a result, the researchers sought to explore the potential of using AI software to replace the second human reader. They gathered 40,588 anonymized mammograms acquired at three breast screening centers from January 2012 to 2019. All of these mammograms had been interpreted via double reading, with either a single human reader or a group of readers providing arbitration for discordant opinions, Sharma said.
Next, the authors obtained the original human reading opinions and the patients' outcomes for recalled cases with pathology from the National Breast Screening information system. Sharma noted that the mammograms used in the study were a random sample and had not been used in developing or training the AI algorithm.
Of the 40,588 mammograms in the study, 1,216 (3%) had a discordant opinion that required arbitration. Overall, there were 358 biopsy-proven cancers and 40,230 normal exams. The recall rate was 4%, with a cancer detection rate of 8.5 per 1,000.
Simulated double reading
To simulate double reading, the researchers paired the AI algorithm's opinion (normal or cancer) with the opinion of the first human reader. They then calculated the sensitivity, specificity, and discordant opinion rate.
After applying the AI algorithm to the test set, the researchers found that it agreed with the radiologist on whether or not to recall the patient in 33,255 (81.9%) of the cases. The remaining 7,333 (18.1%) exams had discordant opinions.
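The agreement figures above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not the researchers' code: the function names are hypothetical, the sensitivity and specificity formulas are the standard definitions, and how the study combined the two opinions into a single decision before scoring it is not stated in the article, so that step is left to the caller.

```python
from typing import Sequence, Tuple


def agreement_summary(concordant: int, discordant: int) -> dict:
    """Percentage of exams on which AI and reader 1 agreed or disagreed."""
    total = concordant + discordant
    return {
        "total": total,
        "concordant_pct": round(100 * concordant / total, 1),
        "discordant_pct": round(100 * discordant / total, 1),
    }


def discordant_rate(reader1: Sequence[bool], ai: Sequence[bool]) -> float:
    """Fraction of exams where the two recall opinions differ (True = recall)."""
    if len(reader1) != len(ai):
        raise ValueError("opinion lists must have the same length")
    return sum(r != a for r, a in zip(reader1, ai)) / len(reader1)


def sensitivity_specificity(
    predicted: Sequence[bool], truth: Sequence[bool]
) -> Tuple[float, float]:
    """Standard definitions: TP / (TP + FN) and TN / (TN + FP)."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    fp = sum(p and (not t) for p, t in zip(predicted, truth))
    return tp / (tp + fn), tn / (tn + fp)


# Counts reported in the article: 33,255 concordant and 7,333 discordant exams.
summary = agreement_summary(33255, 7333)
print(summary)
```

With the reported counts, the two groups add up to the 40,588 exams in the study, and the discordant share rounds to the 18.1% that Sharma cites.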
Table: Diagnostic performance of AI on 40,588 screening mammograms (combined results of AI and reader 1)
The combination of AI and reader 1 yielded a cancer detection rate of 8.4 per 1,000 and a recall rate of 4%, according to Sharma.
Using an AI algorithm to replace the second human reader would have allowed 81.9% of the women to obtain a definitive diagnosis of normal or abnormal, according to Sharma.
"Only 18.1% of cases would need the input of an additional human reader, providing a feasible solution to combat the workforce crisis within breast imaging," Sharma said. "This solution would allow the opportunity to create efficiency within the workforce."
A definitive diagnosis
Under their proposed model, screening mammograms that have matching normal interpretations by the human reader and AI would require no further human reads. These women would then go back to routine screening, while those considered to have abnormal studies would be required to attend second-stage screening, Sharma said. Cases that have discordant opinions would need to be reviewed by another human reader or a group of readers to decide on the final diagnosis of normal or abnormal.
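The routing rule in the proposed model can be written as a small decision function. This is a minimal sketch under stated assumptions: the names `Outcome` and `triage` are hypothetical, and it assumes that "abnormal studies" sent to second-stage screening are those that both the human reader and the AI flag as abnormal.

```python
from enum import Enum


class Outcome(Enum):
    ROUTINE_SCREENING = "return to routine screening"
    SECOND_STAGE_SCREENING = "attend second-stage screening"
    ARBITRATION = "review by another human reader or group of readers"


def triage(reader_abnormal: bool, ai_abnormal: bool) -> Outcome:
    """Route one screening exam based on the human reader's and the AI's opinions."""
    if reader_abnormal != ai_abnormal:
        # Discordant opinions: a further human read decides the final diagnosis.
        return Outcome.ARBITRATION
    if reader_abnormal:
        # Assumption: both opinions abnormal means recall for second-stage screening.
        return Outcome.SECOND_STAGE_SCREENING
    # Both opinions normal: no further human read is needed.
    return Outcome.ROUTINE_SCREENING


print(triage(reader_abnormal=False, ai_abnormal=True).value)
```

Only the discordant branch calls for extra reading time, which is where the workload savings described by Sharma would come from.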
Sharma acknowledged a number of limitations of their study, including its retrospective nature and relatively small number of cases. Also, interval cancer data was not included, she said.
Copyright © 2020 AuntMinnie.com