Machine learning may aid annotation of radiology reports

Nov 2, 2014

Thursday, December 4 | 11:50 a.m.-12:00 p.m. | SSQ11-09 | Room S403A

In this presentation, a study team will explore whether machine learning techniques can annotate unstructured data, such as the free text of radiology reports, by adding structured data such as ICD-9 codes.

The group's investigation of machine annotation of radiology reports stems from a project to automatically identify incidental findings in the text of reports, said presenter Eamon Johnson of the department of electrical engineering and computer science at Case Western Reserve University.

"Over the past few years, trauma centers have been leading the way in management of incidental findings, but we know that the tracking of incidental findings identified in diagnostic reports is not perfect," Johnson told AuntMinnie.com. "So we started looking at ways to automatically identify reports containing incidentals."

The researchers found dozens of studies that used "supervised" learning approaches to teach computational models (i.e., machine classifiers) how to analyze data. However, supervised approaches are time-intensive because a physician must review every example used to build the model, Johnson said.

As a result, the group investigated "unsupervised" learning methods, in which a physician does not review the examples. Without that review, the machine's interpretation of the data warrants less confidence, but far more data can be used to build the model.
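The article does not say which unsupervised method the group used. As a rough illustration of the general idea only, the sketch below groups unlabeled report text into clusters without any physician labels, using k-means over word counts with cosine similarity. That technique, the function names (`vectorize`, `cosine`, `centroid`, `kmeans`), and the sample report sentences are all hypothetical choices made for this example, not the study's method or data.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for one report (lowercased word tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Average the term counts of a cluster's member reports."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def kmeans(reports, k, iterations=10):
    """Group unlabeled reports into k clusters; no physician labels are used."""
    vectors = [vectorize(r) for r in reports]
    # Deterministic initialization: the first k reports seed the centroids.
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each report to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        # Recompute each centroid; keep the old one if a cluster empties out.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment

# Hypothetical report snippets, invented for illustration.
reports = [
    "Incidental pulmonary nodule noted in the right lung. Recommend follow-up CT.",
    "No acute fracture. Alignment of the cervical spine is normal.",
    "Small incidental nodule in the left lung. Follow-up CT recommended.",
    "Cervical spine alignment normal without fracture.",
]
print(kmeans(reports, 2))  # → [0, 1, 0, 1]
```

Here the two reports mentioning an incidental lung nodule land in one cluster and the two spine reports in the other, even though no one labeled them. In the spirit of Johnson's closing remarks, groupings like these would at most help a manual reviewer decide where to look, not replace the review.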

"The folk wisdom in machine learning tells us that more data generally beats a better algorithm, so unsupervised approaches are an attractive direction," he said.

The group's effort revealed promise as well as challenges.

"We feel that some problems in medical text analysis are intractable and will always require manual review, especially considering ethical and legal concerns," Johnson said. "But we can help reduce the effort of annotating medical text by using unsupervised learning methods to augment the information available to the manual reviewer."