NIH issues huge database of CT scans for AI testing

Jul 19, 2018


Following its recent release of a massive database of chest x-rays, the U.S. National Institutes of Health (NIH) has now made nearly 10,600 CT scans publicly available to support the development and testing of artificial intelligence (AI) algorithms for medical applications.

Called DeepLesion, the storehouse of imaging data was created by Dr. Ronald Summers and colleagues at the NIH by culling clinically relevant annotations that radiologists at their institution had previously made on CT scans. Summers is a senior investigator and staff radiologist at the NIH Imaging Biomarkers and Computer-Aided Diagnosis Laboratory.

These annotations are often complex and involve a collection of arrows, lines, segmentations, and text that describe the size and location of lesions so that clinicians can monitor changes, according to the NIH. Annotating medical images demands extensive clinical experience, and organizing this information manually can be time-consuming.

Indeed, the scarcity of large databases of medical images available to train AI algorithms is believed to be one of the major stumbling blocks for the technology. Summers and colleagues helped ease the situation, at least for x-ray, with last year's release of the ChestX-ray8 database, a collection of 100,000 x-ray images.

DeepLesion may help bypass these obstacles by supplying a sufficiently robust database of CT scans and accompanying annotations to train deep neural network algorithms. The NIH suggested that this could one day "enable the scientific community to create a large-scale universal lesion detector with one unified framework."

In all, the database includes approximately 10,600 studies from more than 4,400 unique patients at the NIH Clinical Center in Bethesda, MD. Whereas most current databases contain 10 to several hundred lesions of a single type, the group designed DeepLesion to hold more than 32,000 lesions, covering a wide variety of radiological findings such as lung nodules, enlarged lymph nodes, and liver tumors.

The NIH DeepLesion database includes more than 32,000 CT slices. Image courtesy of the NIH.

With a multicategory lesion database, DeepLesion offers researchers the opportunity to develop AI algorithms capable of automating radiological detection and diagnosis for multiple lesion types, the NIH noted. It could also make possible a universal lesion detector that would serve as an initial screening tool and pass its results to other, more specialized algorithms. Furthermore, researchers may be able to study the relationship between distinct kinds of lesions on the same CT scan for whole-body assessment of cancer burden.

To begin demonstrating this potential, Summers and colleagues used the DeepLesion database to train a prototype universal lesion detector to spot various kinds of lesions. Their detector achieved a sensitivity of 81.1% at five false positives per image (Journal of Medical Imaging, July 20, 2018).
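Quoting sensitivity at a fixed number of false positives per image is the usual way detection results are summarized on a free-response ROC (FROC) curve: the score threshold is lowered until the detector produces, on average, that many false alarms per image, and the fraction of real lesions found at that point is reported. The Python sketch below illustrates the idea only. It is not the NIH team's evaluation code, and the function name, the (score, is-true-positive) input format, and the assumption that each lesion is matched by at most one detection are all hypothetical.

```python
from typing import List, Tuple


def sensitivity_at_fp_rate(
    detections: List[Tuple[float, bool]],
    num_lesions: int,
    num_images: int,
    max_fp_per_image: float,
) -> float:
    """Hypothetical helper: lesion-level sensitivity at the loosest score
    threshold whose average false positives per image stays within
    max_fp_per_image.

    detections: (confidence score, is_true_positive) pairs pooled over all
    images. Assumes each ground-truth lesion is matched by at most one
    detection, so the true-positive count never exceeds num_lesions.
    """
    tp = fp = 0
    best = 0.0
    # Lower the threshold one detection at a time, highest score first.
    for _score, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp / num_images <= max_fp_per_image:
            best = tp / num_lesions
        else:
            # False positives only accumulate, so no lower threshold can qualify.
            break
    return best


# Toy example: 2 images containing 4 lesions in total.
dets = [(0.95, True), (0.90, False), (0.80, True), (0.70, False),
        (0.60, True), (0.50, False), (0.40, True)]
print(sensitivity_at_fp_rate(dets, num_lesions=4, num_images=2, max_fp_per_image=1.0))
```

In this made-up example, allowing one false positive per image (two in total) lets the detector keep three of the four true lesions, a sensitivity of 0.75. Tightening the limit to 0.5 false positives per image drops it to 0.5, which is why a sensitivity figure is only meaningful alongside its false-positive rate.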

The researchers plan to continue adding images to DeepLesion to improve the accuracy of their detector, and they hope to include MRI scans in the database as well as combine data from multiple hospitals in the future. Beyond lesion detection, the database may help train algorithms to classify lesions and predict lesion growth based on existing patterns, the team believes.

The full database is available for download as of July 20.