Because breast tissue density categorization is subjective, the findings suggest that AI could reduce variability between radiologists in classifying breast tissue density, according to lead author Dr. Constance Lehman, PhD, of Massachusetts General Hospital (MGH) in Boston.
"We developed a deep-learning model that can perform a function that has been heavily human-dependent: assessing breast tissue density," Lehman told AuntMinnie.com. "We trained the model, tested it, and implemented it into clinical practice. What's exciting is that we now have a structure we can continue to refine."
Dense breast tissue is associated with increased cancer risk, and because dense tissue can mask lesions on mammography, more than 30 states have passed legislation that mandates women be notified of their tissue density. In addition, if they have dense tissue, they must be advised of supplemental screening options.
Density assessment can vary widely between mammographers, and deep-learning technology could help mitigate that variability, wrote Lehman's group. Although there are commercially available methods for categorizing breast tissue, they have produced mixed results when it comes to agreement between the method and the human reader. That's why artificial intelligence interests breast imagers.
"Deep learning has been gaining traction in radiology," the group wrote. "Specifically, there has been preliminary work with deep-learning methods to assess breast density. ... Our purpose was to develop a deep-learning algorithm we could use to reliably assess breast density and to measure the acceptance of its predictions in real-time clinical practice."
The researchers worked with a group led by Regina Barzilay, PhD, from the Massachusetts Institute of Technology in Cambridge to build an algorithm that could measure breast density. Their investigation included the following steps:
- A total of 41,479 digital mammograms taken between January 2009 and May 2011 at MGH were used to train the algorithm.
- Of these mammograms, 8,677 were reserved as a test set; these were interpreted for density by both the algorithm and 12 radiologists.
- The group then conducted a reader study in which five radiologists worked by consensus to assess tissue density on 500 mammograms taken from the larger test set; their results were compared with the algorithm's.
- Finally, within a clinical practice setting, the model and eight radiologists assessed density on 10,763 mammograms taken between January and May 2018.
Lehman's team used kappa (κ) statistics to evaluate the agreement between the algorithm and the three sets of readings (with a value of 1 indicating perfect agreement).
Agreement between deep-learning algorithm and radiologist readers on breast density

| Comparison | No. of mammograms | Agreement (κ statistic) |
| --- | --- | --- |
| Deep-learning model vs. original interpreting radiologist (test set) | 8,677 | |
| Deep-learning model vs. reader consensus | 500 | |
| Deep-learning model vs. radiologist assessment in clinical practice | 10,763 | |
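As a minimal illustration of the agreement measure the researchers used, the sketch below computes Cohen's kappa for two raters from scratch. The density labels are hypothetical BI-RADS-style categories (a through d) invented for the example, not data from the study.

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters.

    Returns (observed - expected) / (1 - expected), where expected
    agreement is derived from each rater's marginal category frequencies.
    """
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    # Observed agreement: fraction of cases where the raters match.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected chance agreement: sum over categories of the product
    # of the two raters' marginal frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    return (observed - expected) / (1 - expected)

# Hypothetical density readings for 10 mammograms (BI-RADS categories a-d)
model_reads  = ["a", "b", "b", "c", "d", "b", "c", "c", "a", "d"]
reader_reads = ["a", "b", "c", "c", "d", "b", "b", "c", "a", "d"]
print(round(cohen_kappa(model_reads, reader_reads), 2))  # prints 0.73
```

Because kappa discounts agreement expected by chance, it is a stricter measure than raw percent agreement, which is why the study reports it alongside the 94% raw agreement figure.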
The researchers found that the interpreting radiologists in the clinical practice setting and the algorithm agreed on density 94% of the time. Disagreement in the remaining 6% of cases wasn't necessarily due to errors on the algorithm's part and could have been due to reader variability, they noted.
"Our deep-learning model was deployed in the mammography clinic to assess performance and acceptance in a large academic breast imaging practice," the authors wrote. "In this setting, the deep-learning model density assessment was accepted as the final reading in [more than] 90% of mammograms by an experienced breast imager."
Variability in breast tissue density assessments has long been acknowledged as a problem; it not only can cause anxiety in patients but also may result in unnecessary imaging, according to the authors. The application of artificial intelligence to this clinical task could help, but more work needs to be done, Lehman said.
"Going forward, we need to determine what the gold standard for deep learning will be, and how best to develop, test, evaluate, and implement it," she said. "We'll need to figure out what kind of studies will do this and find common ground between the computer science and medical science worlds."
Copyright © 2018 AuntMinnie.com