Data mining with natural language processing identifies mammo trends

Thursday, December 2 | 11:50 a.m.-12:00 p.m. | SSQ10-09 | Room S402AB
Mapping mammography trends can potentially measure the impact of screening mammography, provide information about the relative prevalence of breast diseases in a given setting, and help identify inappropriate ordering of high-cost imaging procedures, such as breast MRI exams.

However, the sheer volume of data generated makes distilling any knowledge from it a considerable computational challenge, according to Supriya Gupta, MD, clinical research coordinator at the Institute for Technology Assessment and Imaging Informatics at Massachusetts General Hospital in Boston.

In a scientific session focused on the use of informatics for education and research applications, Gupta will present the findings of a study to identify the most common breast pathologies and their notation in mammography reports.

Researchers at Massachusetts General Hospital used a clinical data mining and natural language processing (NLP) application (Leximer, Nuance Communications, Burlington, MA) together with RadLex ontology trees to identify the imaging observations in nearly 75,000 mammography reports produced from 2004 through 2009.

The analysis identified the most common breast pathologies and their prevalence in the mammography reports. It also determined the percentage of reports containing each of the most common findings and the percentage that generated automated e-mail alerts flagging the need for urgent clinical intervention.
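The prevalence calculation described above can be illustrated with a minimal sketch. It is not the Leximer application or a RadLex lookup: it assumes plain keyword matching against a hypothetical term list, and the names `TERMS` and `term_prevalence` are invented for illustration. A real NLP system would also handle negation (for example, "no mass seen"), which this sketch does not.

```python
import re
from collections import Counter

# Hypothetical term list for illustration only; the study drew its
# terms from RadLex ontology trees, which are not reproduced here.
TERMS = ["mass", "carcinoma", "fibroadenoma", "suspicious nodule"]


def term_prevalence(reports, terms=TERMS):
    """Return, for each term, the share of term-bearing reports that mention it.

    Matching is naive: case-insensitive whole-word search that also
    accepts a simple plural suffix ("s" or "es"). Negation is ignored.
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"(?:s|es)?\b", re.IGNORECASE)
        for term in terms
    }
    counts = Counter()
    matched_reports = 0
    for text in reports:
        found = [term for term, pattern in patterns.items() if pattern.search(text)]
        if found:
            matched_reports += 1
            counts.update(found)  # each term counted at most once per report
    if matched_reports == 0:
        return {term: 0.0 for term in terms}
    return {term: counts[term] / matched_reports for term in terms}
```

Calling `term_prevalence` on a list of report strings returns a dictionary of fractions keyed by term. Because one report can mention several terms, the fractions need not sum to 1.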

Approximately 15% of reports containing the most common pathology terms generated e-mail alerts. Most alerts reported masses (80%), followed by carcinomas (6%), fibroadenomas (4%), and suspicious nodules (4%). Data were charted for each calendar year of the study period, and upward trends were identified for both mass and carcinoma. However, urgent notifications fell significantly for fibroadenomas and necrosis.

Overall, the most common pathology terms identified were mass (80%), carcinoma (5%), suspicious nodules (3%), and fibroadenoma (3%). The key trend was that mass remains the most common pathologic finding of concern and continues to grow.

"Quality assurance and quality control can be enhanced through the use of interactive data mining. The systems using NLP can efficiently analyze interphysician reliability and consistency among both radiologists and referring physicians. It also enables evaluation of referring physicians on their breast exam ordering practices," Gupta said.
