Natural language processor autogenerates standardized knee MRI reports

Nov 17, 2010

Tuesday, November 30 | 11:00 a.m.-11:10 a.m. | SSG08-04 | Room S402AB

For all their advantages, structured reporting templates can be cumbersome and tedious for radiologists to use compared with conventional dictation or dictation with speech recognition systems and macros.

In this presentation, a research team from Stanford University School of Medicine in Stanford, CA, will describe how they developed and validated a natural language processor (NLP) that identifies semantic content in knee MRI statements from unstructured text, and automatically generates full, structured knee MRI reports.

"No, it isn't magic," Bao H. Do, MD, a clinical instructor in the department of diagnostic radiology, told AuntMinnie.com.

First, the NLP reads the entire knee MRI report. It relies on a lexicon of "signals": regular expressions that match anatomy, disease terms, or findings.

Each sentence in the report is assigned to one of eight categories in a standardized knee MRI template.

The semantic categories include joint/effusion/synovitis/loose bodies, menisci, cruciate ligaments, collateral ligaments, extensor mechanism, cartilage, bone and marrow, and miscellaneous, a category that includes muscle, tendon, and Baker's cyst.
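The sentence-by-sentence approach described above can be illustrated with a minimal Python sketch. It is not the Stanford team's software: the article does not publish their lexicon, so every pattern below, along with the names `LEXICON`, `classify_sentence`, and `structure_report`, is a hypothetical stand-in. The sketch also assumes that a sentence matching no signal falls into the miscellaneous category and that ties go to the category listed first.

```python
import re

# Hypothetical lexicon of "signals": illustrative patterns only,
# not the lexicon used by the research team.
LEXICON = {
    "joint/effusion/synovitis/loose bodies": [
        r"\beffusion\b", r"\bsynovitis\b", r"\bloose bod(?:y|ies)\b"],
    "menisci": [r"\bmenisc(?:us|i|al)\b"],
    "cruciate ligaments": [
        r"\b(?:anterior|posterior) cruciate\b", r"\b[AP]CL\b"],
    "collateral ligaments": [
        r"\b(?:medial|lateral) collateral\b", r"\b[ML]CL\b"],
    "extensor mechanism": [
        r"\bextensor mechanism\b", r"\bquadriceps tendon\b",
        r"\bpatellar tendon\b"],
    "cartilage": [r"\bcartilage\b", r"\bchondr\w+"],
    "bone and marrow": [r"\bmarrow\b", r"\bfracture\b", r"\bbone\b"],
}
MISCELLANEOUS = "miscellaneous"  # assumed fallback when no signal matches

# Compile once so each sentence is scanned with ready-made patterns.
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in LEXICON.items()
}


def classify_sentence(sentence):
    """Return the category whose signals match the sentence most often."""
    scores = {
        category: sum(1 for rx in regexes if rx.search(sentence))
        for category, regexes in COMPILED.items()
    }
    best = max(scores, key=scores.get)  # ties go to the first category listed
    return best if scores[best] > 0 else MISCELLANEOUS


def structure_report(text):
    """Split free text into sentences and group them by category."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    report = {}
    for sentence in sentences:
        report.setdefault(classify_sentence(sentence), []).append(sentence)
    return report


if __name__ == "__main__":
    sample = ("There is a small joint effusion. "
              "The medial meniscus shows a horizontal tear. "
              "The ACL is intact. There is a Baker's cyst.")
    for category, sentences in structure_report(sample).items():
        print(f"{category}: {' '.join(sentences)}")
```

A real lexicon would need far more patterns, plus handling for negation and for sentences that mention several structures at once, which is where the domain knowledge Do describes comes in.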

Do will explain how the research team developed the NLP. Two musculoskeletal subspecialists initially reviewed approximately 2,000 sentences from 125 knee reports prepared between 2005 and 2009 to develop the eight semantic categories. An additional 25 knee MRI reports were randomly selected for validation, and accuracy was assessed.

In the validation study, the NLP classified 381 sentences into the eight categories. Ten sentences spread across nine reports were miscategorized, for an overall accuracy of 97% per sentence and 64% per report (16 of the 25 validation reports contained no errors).
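The gap between the two accuracy figures follows from the arithmetic, sketched below in Python. The sketch assumes that the 25 validation reports are the per-report denominator and that a report counts as correct only when every one of its sentences is categorized correctly; the variable names are illustrative.

```python
# Figures reported for the validation study.
total_sentences = 381
misclassified_sentences = 10
total_reports = 25          # assumed denominator: the 25 validation reports
reports_with_errors = 9

# Per-sentence accuracy: share of sentences placed in the right category.
sentence_accuracy = (total_sentences - misclassified_sentences) / total_sentences

# Per-report accuracy: share of reports with no misclassified sentence.
report_accuracy = (total_reports - reports_with_errors) / total_reports

print(f"Per-sentence accuracy: {sentence_accuracy:.1%}")  # 97.4%
print(f"Per-report accuracy: {report_accuracy:.0%}")      # 64%
```

Because a single misplaced sentence makes a whole report count as wrong, a small number of sentence errors spread over several reports pulls the per-report figure down sharply.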

"We are encouraged by these accuracy rates, but these rates are likely to fall as we expand the capabilities of our NLP to process all types of musculoskeletal reports, not only the knee," Do said.

Do expects the development team to eventually integrate the NLP with its speech recognition system. That would enable the NLP to assign sentences to semantic categories in real time, as the radiologist dictates.

"The significance of this is that the system might provide real-time decision support. For example, if the NLP identifies that a radiologist is describing an abnormal meniscus, the NLP could retrieve classification schemes of meniscal injuries, statistics, and images to aid the radiologist," he said.

"The key to designing NLP-based automated report structure software is to combine computer programming and informatics knowledge with domain-specific knowledge. The software designer must be familiar with and understand how radiology reports are generated, and where, when, and why a radiologist uses a specific term," Do explained.