Researchers explore concept of locally run LLMs

Researchers have shown that it may be feasible to run large language models (LLMs) on local systems to label findings in x-ray reports without compromising patient privacy, according to a study published October 10 in Radiology.

A team at the National Institutes of Health (NIH) led by Ronald Summers, MD, PhD, assessed the feasibility of using a locally run LLM called Vicuna-13B. The model labeled specific findings on chest x-ray reports from a large NIH data set and performed well compared with standard labeling tools, the group found.

“This proof-of-concept study showed that it may be feasible to use publicly available large language models, such as Vicuna-13B, run locally in a privacy-preserving manner,” Summers noted.

LLMs such as ChatGPT have garnered substantial attention due to their ability to perform language generation and language understanding tasks, including question answering and conversion of free text to structured formats, the researchers explained.

However, these models are unsuitable for use with radiology reports due to patient privacy constraints: they require the user to send data to external servers, such as OpenAI’s, for processing, the researchers added.

Vicuna is an open-source chatbot developed by researchers at the University of California, Berkeley. The model was created by fine-tuning a previously released foundational LLM (Meta’s Llama) on approximately 70,000 user-shared ChatGPT conversations.

Of the open-source LLMs available, the group chose Vicuna because preliminary evaluations have reported that it performs comparably to ChatGPT, the researchers noted. In this study, the Vicuna model was used as-is, with no additional training or adjustments, they added.
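
As a rough illustration of what running such a model locally can look like, the sketch below loads a publicly released Vicuna-13B checkpoint with the Hugging Face Transformers library and asks a single yes/no labeling question. The checkpoint ID, prompt format, and question wording here are assumptions for illustration only; the study’s exact configuration is not described in the article.

```python
# A minimal sketch of local, privacy-preserving inference with Vicuna-13B.
# Assumptions: the "lmsys/vicuna-13b-v1.5" public checkpoint and a simple
# USER/ASSISTANT prompt format; the study's actual setup may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-13b-v1.5"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

report = "Heart size is mildly enlarged. No focal consolidation or effusion."
prompt = (
    "USER: Does the following chest x-ray report indicate cardiomegaly? "
    f"Answer yes or no.\n{report}\nASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
answer = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)  # expected: "Yes" for this report
```

Because the weights are downloaded once and inference runs entirely on local hardware, no report text ever leaves the institution, which is the privacy property the study set out to demonstrate.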

The group tested Vicuna using two publicly available, de-identified data sets: one comprising 3,269 chest x-ray reports (MIMIC-CXR, developed by a group at the Massachusetts Institute of Technology) and an NIH data set consisting of 25,596 reports.

Using two prompts for two tasks, the researchers asked Vicuna to identify and label the presence or absence of 13 specific findings in the reports, including cardiomegaly, edema, fracture, lung lesion, and pneumonia. They then compared the LLM’s performance with that of two widely used non-LLM natural language processing labelers, CheXpert and CheXbert.
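
The article does not reproduce the study’s actual prompts. A hypothetical version of the labeling task might loop over the findings and map each free-text answer to a binary label, as in this sketch; the finding list (here, the 13 CheXpert-style findings) and the parsing rule are assumptions:

```python
# Hypothetical mapping of per-finding yes/no answers to binary labels.
# The finding names follow the CheXpert convention; the study's exact
# prompt wording and parsing logic are not given in the article.
FINDINGS = [
    "atelectasis", "cardiomegaly", "consolidation", "edema",
    "enlarged cardiomediastinum", "fracture", "lung lesion",
    "lung opacity", "pleural effusion", "pleural other",
    "pneumonia", "pneumothorax", "support devices",
]

def parse_answer(text: str) -> int:
    """Treat an answer starting with 'yes' as present (1), else absent (0)."""
    return 1 if text.strip().lower().startswith("yes") else 0

def label_report(report: str, ask_model) -> dict[str, int]:
    """ask_model is a placeholder for a local LLM call like the one above."""
    return {
        finding: parse_answer(
            ask_model(
                f"Does this report indicate {finding}? "
                f"Answer yes or no.\n{report}"
            )
        )
        for finding in FINDINGS
    }
```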

According to the findings, a statistical analysis showed that Vicuna’s output achieved moderate to substantial agreement with the labelers on the MIMIC-CXR data set (median κ, 0.57) and the NIH data set (median κ, 0.52). In addition, Vicuna performed on par with both labelers (median area under the curve [AUC], 0.84) on nine of 11 findings.
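
For context on the statistics: Cohen’s kappa corrects raw agreement for agreement expected by chance, with values of 0.41 to 0.60 conventionally read as moderate and 0.61 to 0.80 as substantial. A toy computation with scikit-learn (the label arrays below are invented, not study data):

```python
# Toy per-finding agreement between LLM labels and a reference labeler.
# cohen_kappa_score corrects observed agreement for chance agreement.
from sklearn.metrics import cohen_kappa_score

vicuna = [1, 0, 0, 1, 1, 0, 1, 0]    # invented labels for eight reports
chexpert = [1, 0, 1, 1, 1, 0, 0, 0]  # invented reference labels

print(f"kappa = {cohen_kappa_score(vicuna, chexpert):.2f}")  # kappa = 0.50
```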

“Our study demonstrated that the LLM’s performance was comparable to the current reference standard. With the right prompt and the right task, we were able to achieve agreement with currently used labeling tools,” Summers said.

Ultimately, Summers noted that LLMs like Vicuna could be run locally to extract features from the text of x-ray reports and combine them with features from images to answer clinical questions.

“LLMs that are free, privacy-preserving, and available for local use are game changers,” he said.

