Can AI diagnose pneumonia better than radiologists?

By Erik L. Ridley, AuntMinnie staff writer

November 17, 2017 -- An artificial intelligence (AI) algorithm developed by Stanford University researchers can detect 14 types of medical conditions on chest x-ray images, and it's even able to diagnose pneumonia better than expert radiologists, according to a paper published November 14.

Their model, called CheXNet, is one of the first to benefit from a public dataset of chest x-ray images released by the U.S. National Institutes of Health (NIH) Clinical Center in September to stimulate development of deep-learning algorithms. That dataset of more than 100,000 frontal-view chest radiographs -- labeled with up to 14 possible pathologies -- was released along with an algorithm that could diagnose many of those pathologies. The NIH hoped the initial algorithm would inspire other researchers to advance the work.

Thanks to the NIH's ChestX-ray14 dataset, the Stanford team made rapid progress in developing CheXNet. After just a week of training, the deep-learning model was able to diagnose 10 of the 14 pathologies more accurately than the initial NIH algorithm, and within a month, CheXNet was better for all 14 pathologies.

What's more, it could also diagnose pneumonia more accurately than four Stanford radiologists, according to the team led by Pranav Rajpurkar and Jeremy Irvin of Stanford's Machine Learning Group.

Diagnostic challenges

A chest x-ray is currently the best available method for detecting pneumonia. Spotting pneumonia can be challenging, however, due to an often vague appearance and an overlap with other diagnoses. It may also mimic many other benign conditions, according to the Stanford group.

The researchers used the ChestX-ray14 dataset to train the 121-layer dense convolutional neural network. Of the 112,120 x-rays, 80% were used for training and 20% were reserved for validation of the model. The images were preprocessed prior to being input into the network, and the training data were also augmented with random horizontal flipping.
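As a rough illustration of the data handling described above, the sketch below splits image indices 80/20 and applies a random horizontal flip. It is not the Stanford group's code: the function names, the fixed seed, and the representation of an image as a list of pixel rows are all assumptions made for the example.

```python
import random

def split_indices(n_images, train_percent=80, seed=0):
    """Shuffle image indices and split them into training and validation lists."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for repeatability
    cut = n_images * train_percent // 100
    return indices[:cut], indices[cut:]

def random_horizontal_flip(image, rng, p=0.5):
    """Mirror an image (a list of pixel rows) left to right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image

# 112,120 is the dataset size quoted in the article.
train_idx, val_idx = split_indices(112_120)
print(len(train_idx), len(val_idx))  # 89696 22424
```

Flipping is applied only to training images, so the validation images are scored exactly as they were acquired.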

In addition to calculating a percentage probability of pneumonia, the model provides a "heat map" on the x-ray that shows the radiologist the areas of the image that are most indicative of pneumonia and other pathology.
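The article does not say how the heat map is computed. One common approach for convolutional networks is class activation mapping, in which the final feature maps are combined using the weights the classifier assigns to one class. The sketch below shows that idea on toy numbers; the function name, the tiny 2x2 feature maps, and the weights are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class, rescaled to the range [0, 1]."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [
        [sum(w * fmap[r][c] for w, fmap in zip(class_weights, feature_maps))
         for c in range(cols)]
        for r in range(rows)
    ]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat map
    return [[(value - lo) / span for value in row] for row in cam]

# Two toy 2x2 feature maps and the (hypothetical) weights for the pneumonia class.
feature_maps = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
heat = class_activation_map(feature_maps, class_weights=[1.0, 2.0])
print(heat)  # [[0.25, 0.0], [0.25, 1.0]]
```

In practice the low-resolution map would be upsampled to the size of the x-ray and drawn as a color overlay.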

To see how the algorithm would compare with radiologist interpretations, the researchers had four practicing academic radiologists independently annotate a subset of 420 images from the NIH dataset for possible indications of pneumonia. For the purposes of the study, the majority vote of the radiologists was considered to be the ground truth. Receiver operating characteristic (ROC) analysis showed that CheXNet had an area under the curve of 0.788.
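The two steps in that evaluation, a majority-vote reference label and an area under the ROC curve, can be sketched as follows. The votes and scores are made-up toy values. The article does not say how a 2-2 split among four readers was resolved, so the tie rule here (a tie counts as negative) is an assumption.

```python
def majority_vote(votes):
    """Return 1 if more than half of the readers marked the image positive.

    Assumption: a tie (for example 2-2 among four readers) counts as negative.
    """
    return 1 if sum(votes) * 2 > len(votes) else 0

def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: five images, four readers each, plus the model's pneumonia score.
reader_votes = [[1, 1, 1, 0], [0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
model_scores = [0.9, 0.4, 0.7, 0.2, 0.6]

labels = [majority_vote(v) for v in reader_votes]
print(labels, roc_auc(labels, model_scores))
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect ranking, which puts the reported 0.788 in context.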

"The sensitivity-specificity point for each radiologist and for the average lie below the ROC curve, signifying that CheXNet is able to detect pneumonia at a level matching or exceeding radiologists," the authors wrote.

ROC analysis also showed that the algorithm achieved the highest performance for all 14 pathologies among any research published so far from the NIH chest x-ray dataset, according to the group.
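The authors' statement that each radiologist's sensitivity-specificity point lies below the ROC curve can be checked mechanically: at the reader's false-positive rate (1 minus specificity), compare the reader's sensitivity with the sensitivity the curve reaches. The sketch below does that on toy data; the function names and values are hypothetical, and the linear interpolation between curve points is an assumption.

```python
def roc_points(labels, scores):
    """Return (false-positive rate, sensitivity) pairs, one per score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def curve_sensitivity_at(points, fpr):
    """Highest sensitivity the curve reaches at the given false-positive rate."""
    best = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            # Vertical segment: take its top; otherwise interpolate linearly.
            y = y1 if x1 == x0 else y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = y if best is None else max(best, y)
    if best is None:
        raise ValueError("fpr must lie between 0 and 1")
    return best

def reader_below_curve(points, sensitivity, specificity):
    """True if the reader's operating point lies strictly below the ROC curve."""
    return sensitivity < curve_sensitivity_at(points, 1.0 - specificity)

# Toy reference labels and model scores for five images.
toy_labels = [1, 0, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
curve = roc_points(toy_labels, toy_scores)

# A hypothetical reader with 50% sensitivity and 50% specificity.
print(reader_below_curve(curve, sensitivity=0.5, specificity=0.5))
```

A point below the curve means the model can match the reader's specificity while achieving higher sensitivity; it says nothing about whether the difference is statistically significant.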

CheXNet has the potential to reduce the number of missed pneumonia cases and significantly accelerate workflow by showing radiologists where to look first, according to the researchers. They also hope it may be useful in areas of the world that might not have easy access to a radiologist.

In addition to the CheXNet work, Stanford's Machine Learning Group has been developing AI algorithms for diagnosing irregular heartbeat and mining electronic medical record data.


57 comments so far ...
11/20/2017 12:05:51 PM

Interesting article about the Stanford group's CheXNet project for looking at chest radiographs. The full paper is posted on a non-peer-reviewed site; perhaps this is a prepublication edition.
Supposedly, the algorithm can identify chest pathology "better than a radiologist," which may be true in a narrow sense, as the machine's ROC curve did beat one of the four rads who read the sample dataset. But there are a few methodological problems with the study in my humble opinion, which are better voiced by a very sharp fellow named Luke Oakden-Rayner, MD, PhD Candidate. (His blog is a must-read for all interested in this topic.) He notes that the detection of pneumonia is not quite as cut and dried as the Stanford authors imply. Throw in a bit more confusion with atelectasis, consolidation, collapse, and so on, and the results might not be quite what they seem. Also, to quote Dr. O-R:
They used text mining in the radiology reports to get the labels. So even accepting the confusing labels, the NIH team estimates that they are only 90% accurate compared to the original reports (which they are unable to release for privacy reasons)....
They also have lots of x-rays from the same people, often ten or more films taken across several days. These x-rays are almost indistinguishable from each other, and have very similar labels.
The paper does not really go into how the labels are, well, labeled. The images presented in the paper show color overlays to let us know where the region of interest lies as shown in the image above.
My thoughts go immediately to how this nascent technology can help us in Radiology. Without seeing the actual labels, and just how localized they are, I'm thinking that a more mature version of CheXNet could be used to provide a color overlay to help direct us toward pathology on the radiograph. A super-CAD if you will. But I would add a caveat. What if the computer sees one pathology and not another? Even if it's more accurate than a human reader, this is still in the realm of possibility. And so the human rad must be as vigilant, or even more vigilant, about what's beyond the red zone, lest he succumb to the Forest for the Trees syndrome.
I should note that Andrew Ng has had a very prolific career in IT and AI, mainly with commercial entities, although he is on the Stanford faculty and co-founded the educational website Coursera. He has stated a number of times that Radiology will be taken over by AI, but to the best of my search abilities, this paper represents his only real foray into this space. Similar claims have come from other non-radiologists like Zeke Emanuel and even non-physicians like Geoffrey Hinton. Even this paper was written with the assistance of radiologists, who served as guinea-pig readers, but apparently not BY radiologists. I have to think that those with overly optimistic opinions don't really understand what radiologists do, and/or have a financial (or vindictive) reason to push the hype of AI.
Still, this represents an opportunity...What have the Stanford rads done with the AI group other than serve as test subjects? Have they asked for further development along the lines of how the AI techniques can symbiotically (if you can have -biosis with a machine...) make us better rads? If they haven't, they should. Anyone know any rads in that program?
CheXNet has a long way to go, but it does demonstrate a number of interesting concepts. Let's see what WE can do with it.

11/20/2017 12:35:38 PM
Interesting take. I'm with you: I think AI will be a helpful tool for some time to come. The only way AI takes over for any modality is if it can be proven that AI alone = AI + person.
For any med students reading, to think that an algorithm like this would supplant radiologists anytime soon is crazy. Not only does it sound like there were big methodological problems with this study, but even if the machine is "better", it would have to be proven in more robust studies, then take a few years for FDA approval to supplant radiologists. And that's only for one disease, and one modality. In addition, much of what we do as radiologists is assess disease progression or regression - so far none of the AI applications have addressed this. Let alone comparing a CXR to a chest CT to a PET. 
This algorithm was for pneumonia only. Consider all the other diseases out there (they name 14, but think of all the other ones that are evident on CXR): pericardial calcification, pleural plaques, and destructive bone lesions, just to name a few.
The other aspect I have questions about is artifacts - while adversarial training and ensembles of algorithms may help machines adapt to changes in pixels and portions of images, the sheer magnitude of variability in artifacts in imaging poses a significant challenge. Think of all the ways a piece of tubing, or a hair braid, could lie across the chest on a CXR and screw up the algorithm. Without knowing a great deal about machine learning, I suspect it will be some time before these technologies will be able to fly on their own. My gut is that diagnostic-only radiologists will be obsolete only when a general AI is a mature technology, at which time humanity is probably screwed anyhow.
The one thing that concerns me is corporatization + AI. The way it stands now, if someone came up with machine learning that could read, say a knee XR without supervision, private rad groups would simply purchase a license, and would need less manpower. We'd have a bit slower workday until growth caught up - no displacement of rads. But with corporate radiology, less work for humans = layoffs. I suppose something similar could happen with unscrupulous PP groups and their associates. You could argue that the hospital would purchase the license and cut us out, but that's basically a turf battle - we would point out that we provide 24/7 IR services and all other imaging services, and that we want that business as well. In general if you aren't a line item cost on a hospital's balance sheet, they don't care. 
For what it's worth, my humble advice for newly minted rads is this - make your face seen by doctors and admins. Encourage the MDs to come to the reading room to go over cases. Drop what you're doing when they walk in and attend to their needs. Make them rely on you to guide them through the studies and show them the findings. Those are things that they will come to depend on, and that a machine won't be able to do for decades, if ever. And you may find your workday more satisfying than sitting in the dark cranking out reports and talking to no one. 

11/20/2017 12:52:29 PM
I think the pneumothorax picture kind of proves a great point.  A patient with a chest tube for a pneumothorax would be expected to have some residual pneumothorax in the immediate post-treatment period.  Setting off a bunch of emergency alerts for someone that is being treated for the condition isn't useful. 
Would love for the system to page a doc and emergency response team at 3AM for a patient with a chest tube and a small pneumothorax.
It's interesting that they would use an example of why this system fails as a proof-of-concept....

11/20/2017 1:17:12 PM
Great. Here's my stack of 50 ICU films that I read for free. Have at it, I'll verify your crappy false-positive-galore prelim reports.

11/20/2017 2:24:01 PM
Quote from deadwing

Great. Here's my stack of 50 ICU films that I read for free. Have at it, I'll verify your crappy false-positive-galore prelim reports.

Exactly. Could make us more productive and profitable, as well as better at what we do. 
Sad thing about "he who is not to be mentioned" from the other thread that is currently in the penalty box, is that there was some wheat in all the chaff. I think there are possibilities of losing some business. The ICU portables wouldn't break my heart at all. But what about practices where the ER reads XR overnight and rad reads the next day? ER might decide the AI is good enough backup, and start reading/billing their own. Orthopedists who have MRI and have radiologists read them may decide AI is sufficient. We need to be prepared to mitigate the losses ahead of time - in my own group, for example, we currently use nighthawk services. I think we're going to have to internalize our nights, thus reading the XR in real-time so the ER depends on it.