Natural language processing yields rad-path correlation

By Erik L. Ridley, AuntMinnie staff writer

December 19, 2018 -- The combination of open-source natural language processing (NLP) technology and commonly used radiology structured reporting systems can facilitate correlation between radiologic and pathologic findings, according to research presented at the recent RSNA 2018 meeting in Chicago.

Making use of NLP and the prostate and thyroid imaging reporting and data systems (PI-RADS and TI-RADS), researchers from the University of California, Davis created an automated radiologic-pathologic correlation system that enables radiologists to track their performance and anonymously compare their results with those of their peers.

"Using NLP and structured reporting, radiology and pathology report data can be mined to automate radiology-pathology correlation," said Dr. Geoffrey McWilliams, a fourth-year radiology resident who presented the study at the meeting.

Rad-path correlation

Comparing imaging interpretations with the corresponding tissue obtained at biopsy or surgery is a critical part of a radiologist's job, according to McWilliams.

"If a radiologist suggests the presence of cancer, it's extremely helpful to know if cancer was actually present or not," he told

Correlation of radiologic findings with pathologic specimens helps radiologists and institutions calibrate their interpretations, improve their accuracy, and track their progress. Structured reporting systems for lesions -- such as PI-RADS, TI-RADS, and others -- are meant to convey objective estimates of the malignancy risk for a lesion.

However, there is currently no automated system for linking biopsy results to the structured radiological estimate, McWilliams noted.

"Manually following pathology results, comparing to imaging interpretations, and processing the data is time-consuming and requires constant input and oversight that is often infeasible for the average practice," he said.

As a result, McWilliams and his mentor, Dr. Thomas Loehfelm, PhD, sought to create an automated radiologic-pathologic correlation system. Because data from structured reports are relatively simple to process, the researchers elected to focus on reports produced using structured reporting.

System design

Using the Python programming language, the researchers first performed regular expression pattern matching to search radiology reports for the standard phrasing used in TI-RADS and PI-RADS. If TI-RADS or PI-RADS expression matches were found, then the TI-RADS or PI-RADS score was saved for use in the system. Because a radiology report can contain multiple lesions and there was no way to automatically link a specific lesion in the radiology report to a specific biopsy target, the report was classified based on the worst lesion described in the report, McWilliams said.
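The matching and worst-lesion steps might look something like the following minimal Python sketch. The regular expressions and the function names (`worst_score`, `classify_report`) are assumptions made for illustration; the researchers' actual patterns were not described, and real report phrasing varies by institution.

```python
import re

# Hypothetical patterns: they assume phrasing such as "PI-RADS 4",
# "PI-RADS category 4", "TI-RADS 3", or "TI-RADS category TR3".
PIRADS_RE = re.compile(
    r"\bPI-?RADS\s+(?:category\s+|score\s+)?([1-5])\b", re.IGNORECASE
)
TIRADS_RE = re.compile(
    r"\bTI-?RADS\s+(?:category\s+|level\s+)?(?:TR)?([1-5])\b", re.IGNORECASE
)


def worst_score(report_text, pattern):
    """Return the highest score matched in the report, or None if no match."""
    scores = [int(value) for value in pattern.findall(report_text)]
    return max(scores) if scores else None


def classify_report(report_text):
    """Classify a report by the worst PI-RADS and TI-RADS lesion it mentions."""
    return {
        "PI-RADS": worst_score(report_text, PIRADS_RE),
        "TI-RADS": worst_score(report_text, TIRADS_RE),
    }
```

Taking the maximum score reflects the design choice described by the researchers: because individual lesions cannot be tied automatically to individual biopsy targets, the whole report inherits the score of its most suspicious lesion.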

"For example, if a report contained three lesions -- two PI-RADS 3 and one PI-RADS 5 lesion -- then it would be classified as PI-RADS 5, since that was the worst lesion," he said.

To process the unstructured pathology reports, the researchers used the Apache Clinical Text Analysis and Knowledge Extraction System (cTAKES), an open-source NLP system that can extract data from clinical free text. cTAKES was used to process more than 100,000 pathology reports produced over 20 years at their institution, McWilliams said.

After mining 1,446 unique anatomical terms in the "specimen" section of the pathology report, the researchers found seven terms referring to the thyroid gland or its subparts and three terms referring to the prostate gland or its subparts. They also used cTAKES to mine the "diagnosis" field of the pathology reports for 30 terms referring to thyroid malignancy and 18 terms referring to prostate malignancy. The clinical significance of the cancers was determined by reviewing thyroid terminology with an endocrine surgeon and by extracting the combined Gleason score for prostate cancers, McWilliams said.
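Once the term lists exist, the downstream pathology step could be sketched in plain Python roughly as follows. This is not the cTAKES pipeline itself: the term lists are short hypothetical stand-ins for the ones the researchers mined, the substring matching does not handle negation ("no evidence of carcinoma"), and the `min_gleason=7` cutoff for clinical significance is an assumption, since the threshold used in the study was not reported.

```python
import re

# Hypothetical, abbreviated term lists; the study's mined lists were longer.
ORGAN_TERMS = {
    "thyroid": ("thyroid",),
    "prostate": ("prostate", "prostatic"),
}
MALIGNANCY_TERMS = {
    "thyroid": ("papillary thyroid carcinoma", "follicular carcinoma"),
    "prostate": ("adenocarcinoma",),
}
# Assumes phrasing such as "Gleason score 3+4=7" or "Gleason 4 + 3".
GLEASON_RE = re.compile(r"Gleason\s+(?:score\s+)?(\d)\s*\+\s*(\d)", re.IGNORECASE)


def find_organ(specimen):
    """Return the organ named in the specimen text, or None if not recognized."""
    text = specimen.lower()
    for organ, terms in ORGAN_TERMS.items():
        if any(term in text for term in terms):
            return organ
    return None


def combined_gleason(diagnosis):
    """Return the highest combined Gleason score in the text, or None."""
    sums = [int(a) + int(b) for a, b in GLEASON_RE.findall(diagnosis)]
    return max(sums) if sums else None


def summarize(specimen, diagnosis, min_gleason=7):
    """Summarize one pathology report; returns None for other organs."""
    organ = find_organ(specimen)
    if organ is None:
        return None
    # Naive substring match: negated mentions would be counted as malignant.
    malignant = any(term in diagnosis.lower() for term in MALIGNANCY_TERMS[organ])
    gleason = combined_gleason(diagnosis) if organ == "prostate" else None
    # Thyroid significance came from clinician review, so it is left unset here.
    significant = None
    if organ == "prostate":
        significant = malignant and gleason is not None and gleason >= min_gleason
    return {"organ": organ, "malignant": malignant,
            "gleason": gleason, "significant": significant}
```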

"The data that we acquired in cTAKES is then used for ongoing processing of pathology reports with Python and within our rad-path database," he said.

Rad-path dashboard

Using a web browser, radiologists can then view their performance results with the automated rad-path dashboard. For example, radiologists can visualize -- on interactive JavaScript charts from the Highcharts charting library -- their cancer detection rate by PI-RADS level for clinically significant prostate cancers per patient biopsied, compared with the rest of their section and also with the expected PI-RADS cancer detection rate. They can also see if they are prone to undercalls or overcalls by PI-RADS category.
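The per-category figure behind such a chart is a simple ratio: the number of biopsied patients with clinically significant cancer divided by the number of patients biopsied at that PI-RADS level. A minimal Python sketch of that calculation is shown below; the record layout `(radiologist_id, pirads_score, significant_cancer)` and the function names are assumptions, not details of the researchers' database.

```python
from collections import defaultdict


def detection_rates(cases):
    """Map each PI-RADS score to its cancer detection rate.

    `cases` is an iterable of (pirads_score, significant_cancer) pairs,
    one pair per biopsied patient.
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [positives, biopsied]
    for score, significant in cases:
        counts[score][0] += int(significant)
        counts[score][1] += 1
    return {score: pos / total for score, (pos, total) in sorted(counts.items())}


def compare_with_peers(records, radiologist_id):
    """Return (own_rates, peer_rates) for one radiologist.

    `records` is an iterable of
    (radiologist_id, pirads_score, significant_cancer) tuples.
    """
    records = list(records)
    own = [(s, c) for r, s, c in records if r == radiologist_id]
    peers = [(s, c) for r, s, c in records if r != radiologist_id]
    return detection_rates(own), detection_rates(peers)
```

A rate well below the peer or expected rate for a given category would suggest overcalling at that level, while a rate well above it would suggest undercalling.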

Radiologists can access the dashboard on the local network or remotely via a virtual private network (VPN) through a personal login page. The system is in clinical use at their institution, McWilliams said.

"The implications of this project are already being felt, with plans to build automated radiologic-pathological correlation tools across many different sections of diagnostic and interventional radiology," he said. "We expect that similar tools will be deployed across multiple sections within radiology over the coming year. This project also demonstrates a reproducible method for building a radiologic-pathologic correlation tool, and we would welcome any inquiries from institutions wishing to implement such a system."

Copyright © 2018