LLM shows promise in translating radiology reports for patients

Large language models (LLMs) could help create accurate, patient-friendly radiology reports, suggest findings published April 9 in the Journal of the American College of Radiology.  

Researchers led by Marc Succi, MD, from Mass General Brigham in Boston, found that GPT-o1 (OpenAI) produced simplified emergency radiology reports that were rated as accurate, clear, and useful for patients. It also produced translations into Spanish, Arabic, and Mandarin that medical interpreters preferred over those from Google Translate.

“These findings suggest that LLMs could serve as tools to enhance health literacy for [limited English proficiency] populations,” Succi and colleagues wrote. 

Radiology reports often contain complex medical jargon that can be difficult for patients to understand, especially those with limited English proficiency. The researchers highlighted the need for patient-friendly language and multilingual interpretation in radiology reports, citing increased access to electronic health records.

The Succi team studied how GPT-o1 performed on these tasks and compared its translations with those from Google Translate in Spanish, Arabic, and Mandarin.

The study included 30 deidentified emergency radiology reports. Three board-certified emergency radiologists evaluated reports that were simplified by GPT-o1. From there, nine medical interpreters (three per language) assessed outputs by GPT-o1 and Google Translate. Both groups rated the outputs on a five-point Likert scale. 

This made for a total of 90 evaluations of the simplified radiology reports. Of these, 69 evaluations (76.7%) rated the reports as either “extremely accurate” or “very accurate.” In addition, 41 evaluations (45.6%) rated the reports as “very clear” and 89 (98.9%) rated them as useful for patient understanding.

GPT-o1 achieved significantly higher translation accuracy than Google Translate. Translations by GPT-o1 received Likert scores of 4.0 in each of the three languages, compared with 3.0 in each language for Google Translate (p < 0.001). The same trend held for the highest language register, which uses more formal and elaborate language.

Finally, the interpreters rated translations by GPT-o1 as more comprehensible than those by Google Translate in all languages tested.

Using LLMs in this way could help address health disparities and improve adherence to follow-up recommendations, the study authors noted. However, they called for human oversight due to the inherent risks of AI-based interpretation.

“These tools should function strictly as clinical adjuncts rather than autonomous systems,” the authors wrote. “Integration of LLM-generated summaries for patients into the radiology workflow could support shared decision-making and enhance overall medical transparency when combined with expert interpreter and translation services.” 

They called for future research to directly assess patient understanding and health outcomes when patients receive LLM-generated radiology reports. The authors also wrote that future research should explore how LLM-generated summaries can be integrated into electronic health record systems in coordination with professional interpreter workflows.

