Can ChatGPT support diagnostic imaging decisions?

Jul 6, 2023


A proof-of-concept study suggests that ChatGPT-4 can generate relevant differential diagnoses for specific imaging patterns, according to a group in Germany.

Researchers led by Jonathan Kottlors, MD, of the University of Cologne, compared the large language model's (LLM) output with that of a panel of experts in four radiology subspecialties and found high concordance rates. The study highlights ChatGPT's potential to support diagnostic decisions, the group wrote.

"One important benefit of the proposed approach is potentially significant time savings compared to traditional literature research, which might be particularly relevant for radiologists in training aiming to reconcile clinical productivity and continuous expansion of knowledge," the researchers wrote in an article published July 5 in Radiology.

Recognizing imaging patterns and attributing them to certain pathologies are key steps of the diagnostic process in radiology, with doctors often consulting relevant literature to verify or expand diagnoses. LLMs such as ChatGPT-4 by OpenAI make it possible to access and contextualize vast amounts of information.

In this study, the group hypothesized that ChatGPT-4's ability to comprehend and generate human-like text could be used to emulate the process of deriving important differential diagnoses for certain imaging patterns.

The researchers selected four imaging patterns per subspecialty, each with several potential differential diagnoses, in areas including neuroradiology and abdominal and musculoskeletal radiology. They then entered text-based descriptions of the patterns into GPT-4 and prompted it to provide the five most important differential diagnoses for each.

Example of a prompt and subsequent outputs of GPT-4 that attained lower concordance (60% [3/5]) and acceptance (80% [4/5]). Image courtesy of Radiology through CC BY 4.0.

Next, three experts in each subspecialty provided their consensus on the five most important differential diagnoses for each pattern. The experts were also asked to determine how many of the AI-generated diagnoses were "acceptable."

According to the analysis, GPT-4 attained a concordance of 68.8% (55 of 80) with the experts at determining top differential diagnoses based on imaging patterns, and 93.8% (75 of 80) of differential diagnoses proposed by GPT-4 were deemed acceptable alternatives.
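For readers who want to check the arithmetic, the reported percentages follow directly from the counts. The short Python sketch below recomputes them; the variable names are illustrative, and the figure of 16 patterns is inferred here from 80 diagnoses at five per pattern rather than stated in the article.

```python
# Counts as reported in the article.
DIAGNOSES_PER_PATTERN = 5   # GPT-4 was asked for five diagnoses per pattern
TOTAL_DIAGNOSES = 80        # denominator used for both reported rates
CONCORDANT = 55             # diagnoses matching the expert consensus
ACCEPTABLE = 75             # diagnoses the experts deemed acceptable


def rate_percent(hits: int, total: int) -> float:
    """Return hits out of total as a percentage."""
    return 100.0 * hits / total


# Inferred, not stated in the article: number of imaging patterns assessed.
n_patterns = TOTAL_DIAGNOSES // DIAGNOSES_PER_PATTERN

concordance = rate_percent(CONCORDANT, TOTAL_DIAGNOSES)
acceptance = rate_percent(ACCEPTABLE, TOTAL_DIAGNOSES)

print(f"patterns: {n_patterns}")
print(f"concordance: {concordance:.2f}%")
print(f"acceptance: {acceptance:.2f}%")
```

The exact values are 68.75% and 93.75%, which the article reports rounded to one decimal place as 68.8% and 93.8%.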

"Our investigation serves as a proof-of-concept for the ability of LLMs to generate relevant differential diagnoses for specific imaging patterns, and hence their potential for diagnostic decision support," the researchers wrote.

Further research is warranted, they noted.

"Our results are preliminary and prone to bias from the retrospective study design, requiring verification in a prospective, real-world setting, implying contextualization and integration of non-imaging clinical information," the researchers concluded.

The full study can be found here.