ChatGPT, Bard similar in suggesting best method for urologic imaging

Oct 23, 2023

ChatGPT and Bard are similar in suggesting the most appropriate urologic imaging method based on American College of Radiology (ACR) criteria, a study published October 20 in Current Problems in Diagnostic Radiology found.

Researchers led by Sishir Doddi from the University of Toledo in Ohio found that both chatbots had an appropriate imaging modality rate of over 60% and showed no significant difference from each other in the proportion of correct imaging modality selected.

“Nonetheless, both ... lack consistent accuracy and further development is necessary for implementation in clinical settings,” Doddi and colleagues wrote.

Large language models such as OpenAI’s ChatGPT and Google Bard have been tools of interest within the past year for medical professionals, including radiologists. The idea is that one day, these tools will improve daily clinical workflows. While previous studies have shown that chatbots have a way to go before that becomes a viable option, those studies have also demonstrated the chatbots’ ability to provide responses that are in line with ACR criteria.

Doddi and co-authors investigated whether ChatGPT and Bard could provide consistently accurate responses on the best imaging modality for urologic clinical situations, and whether those responses are in line with the ACR appropriateness criteria.

The criteria provide a series of 21 urologic clinical scenarios. For each scenario, the criteria provide one to six different prompts or presentations. For each scenario, all prompts were entered into ChatGPT and Bard, and the responses were rated as “not appropriate,” “may be appropriate,” or “usually appropriate.”

The researchers found that for all urologic situations, the chatbots had a “usually appropriate” imaging modality response rate of 62%. They also had similar “not appropriate” response rates of 34% and 36%, respectively.

The team also found no significant difference overall in the proportion of correct imaging modalities selected (p > 0.05). It also reported no significant difference in “usually appropriate” responses between the chatbots for each individual urologic scenario.

ChatGPT, however, produced shorter responses than Bard for all urologic prompts. This included a response length of 1,595 characters compared with Bard’s 1,925 characters (p < 0.01). The team also reported that Bard had a significantly faster response time of 10.3 seconds compared with ChatGPT’s 13.5 seconds (p < 0.01).

Finally, the researchers found that Bard was not able to determine the appropriate imaging modality for three clinical scenarios, while ChatGPT had one scenario where no appropriate imaging modality was determined.

The study authors suggested that while the chatbots were not consistent in their responses, they “have a promising role in aiding healthcare providers in determining the best imaging modality.”
