June 21, 2018 -- How can radiologists make the most of artificial intelligence (AI) software in breast ultrasound? By using the software concurrently during the interpretation process, according to research presented at the recent Society for Imaging Informatics in Medicine (SIIM) annual meeting.
Researchers led by Lev Barinov of AI software developer Koios Medical, Princeton University, and Rutgers University Robert Wood Johnson Medical School found that three radiologists achieved significantly higher diagnostic performance when interpreting breast ultrasound exams concurrently with AI software than when reading without it. That wasn't the case, however, when the radiologists viewed the software's analysis only after providing an initial diagnosis.
What's more, the concurrent reading paradigm reduced interreader variability and brought the performance of the least-experienced radiologist nearly up to expert level.
AI's real-world impact
"Measuring standalone AI performance without considering the physician involvement in how they're using it isn't really a sufficient metric to evaluate the performance or impact," Barinov said. "It's understandable to start with just [measuring] the system performance against a ground truth; I think that's where we all started when designing these types of systems. But as we think about implementation and utilization [of AI], we have to move closer and closer to [assessing] the physician-system interface."
In their study, the researchers sought to evaluate two reading paradigms for implementing radiology AI. In the "second-read" approach, the physician makes the initial diagnosis and the software then provides confirmation or a second opinion. This has been the traditional workflow used with mammography computer-aided detection (CAD) software, for example, Barinov said.
The second-read model has a number of benefits: it's efficient, less expensive, and currently the accepted methodology in the industry, he said. On the downside, using AI as a second reader is vulnerable to confirmation bias and may not reflect clinical practice for certain types of diagnoses.
By contrast, using AI concurrently gives the radiologist the system-generated analysis alongside the imaging studies during the initial interpretation. Previous studies have found that viewing multiple imaging modalities concurrently produces results equivalent to those of the second-read workflow. Concurrent reading has the potential to eliminate problems with confirmation bias, but at the cost of being more time-consuming than the second-read model, Barinov noted.
Comparing reading methodologies
The researchers set out to test both reading methods to establish their effects on radiologist accuracy and variability, and to compare the system's performance with that of the radiologists. Three radiologists -- with three, more than 10, and more than 20 years of breast ultrasound experience, respectively -- used the Koios DS software to review 500 pathology-proven breast ultrasound studies. After rendering an initial diagnosis, they were immediately shown the software's analysis and could revise their diagnosis. Four weeks later, the radiologists read the cases again, this time concurrently with the software's analysis.
Receiver operating characteristic (ROC) analysis showed that the AI software outperformed all three radiologists at baseline, with an area under the curve (AUC) an average of 11.5% higher than theirs. Using the software as a second reader led to a small but statistically nonsignificant improvement in performance for the three readers. Using AI concurrently, however, gave the radiologists a statistically significant improvement over their baseline interpretations.
Effect of breast ultrasound AI software on radiologist performance

| | Using AI as second reader | Using AI concurrently in interpretation |
|---|---|---|
| Average reader improvement in AUC from use of AI | 3.3% | 7.1% |
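The ROC comparison in this study rests on AUC. As a rough illustration of what that number measures, the following Python sketch computes an empirical AUC from reader suspicion scores using the rank-based (Mann-Whitney) formulation. The function name, the 1-to-5 rating scale, and the eight-case data are hypothetical and do not come from the Koios study. The article also does not say whether its AUC percentages are relative changes or percentage-point differences, so the sketch prints both.

```python
from itertools import product


def empirical_auc(scores, labels):
    """Empirical ROC AUC: the probability that a randomly chosen malignant
    case (label 1) gets a higher suspicion score than a randomly chosen
    benign case (label 0), with ties counted as one half. This equals the
    Mann-Whitney U statistic divided by n_pos * n_neg."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    # Compare every malignant case with every benign case.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data: 1 = pathology-proven malignant, 0 = benign.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    unaided = [4, 3, 5, 2, 3, 2, 1, 4]  # hypothetical baseline ratings
    aided = [5, 4, 5, 3, 2, 2, 1, 3]    # hypothetical AI-assisted ratings
    auc_a = empirical_auc(unaided, labels)
    auc_b = empirical_auc(aided, labels)
    print(f"unaided AUC = {auc_a:.3f}, aided AUC = {auc_b:.3f}")
    print(f"absolute change = {auc_b - auc_a:.3f}")
    print(f"relative change = {100 * (auc_b - auc_a) / auc_a:.1f}%")
```

The pairwise loop is quadratic in the number of cases, which is fine for a few hundred studies; a real analysis would typically use a library routine and a formal test (such as DeLong's method) to judge statistical significance.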
"Contrary to a lot of the prior studies within diagnostic decision support, [our research shows that] you can't really call these two different [reading] methodologies equivalent anymore," he said. "Whether it's for a clinical study approval for the [U.S. Food and Drug Administration] or just an academic research study that's happening at an institution, I think it's important to tuck this information away in the back of your mind."
The researchers also assessed interreader and intrareader variability. While both the second-read and concurrent approaches reduced variability from the baseline interpretation, concurrent use led to the highest level of interreader agreement.
"This makes sense, since you're effectively having them listen to a better-performing system more often in one methodology than another," he said. "And in this case, the agreement that they had with the concurrent reader system was actually higher than the intrareader variability. So if you tested these readers over a four-week period, they will change their own minds more than they will as a collective group, which is really interesting."
The system increased both reader accuracy and agreement across a group of readers with varying levels of experience, according to Barinov. Simply by using the system, the radiologist with three years of experience raised his performance close to the level of a highly experienced reader, he said.
"That's a very exciting notion that you can cross this experience gap just by using a new system," he said.
Barinov noted that the superiority of concurrent reading has practical implications for how these types of systems are integrated into the clinical workflow.
"This is where, as we move forward as an industry, we can think about what's the standard for how we display these results, for example, in the diagnostic space," he said.