Researchers at Google Health used two large datasets, one of more than 750,000 images from India and the other of 112,120 images from the U.S. National Institutes of Health. Their goal was to develop, train, and test AI models that address challenges such as interreader variability and subpar sensitivity in detecting key clinical findings on chest x-rays, an area of radiologic interpretation that often involves qualitative assessment.
A panel of radiologists was assembled to produce reference standards for specific abnormalities visible on the chest x-rays used to train the AI models.
In testing, the deep-learning models performed as well as radiologists in identifying four findings on frontal chest x-rays: fractures, masses or nodules, opacity, and pneumothorax.
Radiologist adjudication produced greater expert consensus on the labels used for model tuning and performance evaluation: overall consensus rose from just over 41% after the initial read of the x-ray to nearly 97% after adjudication.
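To make the consensus numbers concrete, the sketch below shows one simple way such an agreement rate could be computed, with disagreements resolved by a majority vote among readers. This is an illustrative assumption, not Google Health's actual adjudication protocol, and the data and function names are invented for the example.

```python
from collections import Counter

def consensus_rate(label_sets):
    """Fraction of cases where all readers assigned the same label."""
    agree = sum(1 for labels in label_sets if len(set(labels)) == 1)
    return agree / len(label_sets)

def adjudicate(labels):
    """Resolve a disagreement by majority vote (a stand-in for the
    panel adjudication described in the article)."""
    winner, _ = Counter(labels).most_common(1)[0]
    return tuple(winner for _ in labels)

# Toy data: three readers label each case "present"/"absent" for one finding.
initial_reads = [
    ("present", "absent", "present"),   # disagreement -> adjudicated
    ("absent", "absent", "absent"),     # unanimous
    ("present", "present", "absent"),   # disagreement -> adjudicated
]

adjudicated = [labels if len(set(labels)) == 1 else adjudicate(labels)
               for labels in initial_reads]

print(consensus_rate(initial_reads))  # 1/3 before adjudication
print(consensus_rate(adjudicated))    # 1.0 after adjudication
```

In the study, the same idea played out at scale: cases where initial reads disagreed were revisited until the panel converged, which is how overall label consensus could climb from about 41% to about 97%.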