Shawn Sun, MD, will share details of a study that took 70 gastrointestinal and genitourinary imaging cases from a radiology textbook, converted the case images and history into standardized prompts, and fed the prompts into both the ChatGPT-4 and ChatGPT-3.5 large language models.
The top-one and top-three accuracies were defined as the percentage of ChatGPT-generated responses that matched the original diagnosis and the complete differential provided in the original literature, according to Sun et al.
While both generations of ChatGPT were able to produce a differential diagnosis from prompts containing descriptive radiological findings, the responses showed only limited agreement with the expert literature, though ChatGPT-4 achieved a statistically significant improvement in top-one diagnosis accuracy over ChatGPT-3.5.
An additional differential diagnosis score was defined as the proportion of differentials that matched the original literature's answers for each case. The top-one and top-three accuracies for ChatGPT-3.5 versus ChatGPT-4 were 35.7% compared with 51.4% (p = 0.031) and 7.1% compared with 10% (p = 0.27), respectively. The average differential diagnosis score for ChatGPT-3.5 versus ChatGPT-4 was 42.4% compared with 44.7% (p = 0.39). ChatGPT-3.5 and ChatGPT-4 hallucinated 38.2% versus 18.8% (p = 0.0012) of the references provided and generated 23 versus four false statements, respectively.
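To make the metric definitions concrete, the top-k accuracy and per-case differential score described above could be computed along the following lines. This is an illustrative sketch, not the authors' code; the data structure and field names are hypothetical.

```python
# Hypothetical per-case records: the correct diagnosis, the model's
# ranked differential, and the literature's complete differential.
def top_k_accuracy(cases, k):
    """Percentage of cases whose correct diagnosis appears in the
    model's top-k differential."""
    hits = sum(1 for c in cases if c["correct_dx"] in c["model_ddx"][:k])
    return 100.0 * hits / len(cases)

def differential_score(case):
    """Percentage of the literature differential reproduced in the
    model's differential, for one case."""
    matched = sum(1 for dx in case["literature_ddx"] if dx in case["model_ddx"])
    return 100.0 * matched / len(case["literature_ddx"])

# Toy example with two made-up cases (diagnoses abbreviated A-G).
cases = [
    {"correct_dx": "A", "model_ddx": ["A", "B", "C"], "literature_ddx": ["A", "B"]},
    {"correct_dx": "D", "model_ddx": ["E", "D", "F"], "literature_ddx": ["D", "G"]},
]
print(top_k_accuracy(cases, 1))  # 50.0
print(top_k_accuracy(cases, 3))  # 100.0
```

Averaging `differential_score` across all 70 cases would yield the study's average differential diagnosis score.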
Sun et al also observed that hallucinations were more common in the citations the algorithm produced than in the statements it made, and that the hallucination rate improved with ChatGPT-4 compared with ChatGPT-3.5.
Get all the details by sitting in on this Tuesday afternoon talk.