The Lung-RADS classification system for dealing with suspicious findings on CT lung cancer screening exams may be a double-edged sword, reducing false positives but also sensitivity, according to a new study published February 10 in the Annals of Internal Medicine.
In a retrospective analysis of the more than 50,000 subjects in the National Lung Screening Trial (NLST), researchers applied the Lung-RADS nodule classification criteria developed by the American College of Radiology (ACR) to the task of classifying nodules as positive or negative.
If the Lung-RADS scheme had been used in NLST, the false-positive detection rate would have been cut by more than half, to 12.8% at baseline screening, compared with the original nodule classification criteria applied in the trial. In subsequent screening years, the false-positive rate would have dropped by as much as 75% using Lung-RADS.
However, baseline sensitivity fell by nearly nine percentage points to 84.9%, compared with the NLST results, and it stayed low in subsequent screening years (Ann Intern Med, February 10, 2015).
Fortuitous findings
The new findings couldn't be timelier given the just-released final lung cancer screening guidelines from the U.S. Centers for Medicare and Medicaid Services (CMS), which must be followed for sites to secure Medicare reimbursement. The new rules stipulate the use of a standardized nodule classification system such as Lung-RADS in CT screening.
"Lung-RADS may substantially reduce the false-positive result rate; however, sensitivity is also decreased," wrote Paul Pinsky, PhD, from the U.S. National Institutes of Health (NIH), and colleagues from several other institutions. "The effect of using Lung-RADS criteria in clinical practice must be carefully studied."
Based largely on the 2011 results of NLST, which found a 20% mortality reduction from annual CT lung cancer screening, the U.S. Preventive Services Task Force (USPSTF) granted CT lung cancer screening a grade "B" recommendation in 2013. On February 5, CMS issued its final decision to reimburse for annual CT screening in asymptomatic individuals ages 55 to 77 years with a minimum 30-pack-year tobacco smoking history.
"In 2011, NLST came out showing that screening with low-dose CT could reduce mortality by 15% to 20% or so, and that was the good news; the not-so-good news was the very high false-positive rate of about 25%," Pinsky told AuntMinnie.com.
That meant that a quarter of the patients, nearly all of whom did not have cancer, would have to undergo additional imaging or, in a few cases, biopsy to resolve the finding.
"In part to reduce the high false-positive rate and in part just to standardize reporting for lung cancer screening as it comes into population-wide use ... the American College of Radiology embarked on this classification scheme they call Lung-RADS," Pinsky said. "It was based on data in the literature; not really a person-by-person analysis, but just looking at overall summary statistics from NLST and some other studies."
NLST was designed in 2002 with a nodule size cutoff of 4 mm for a positive screening result, but much has been learned since then. For example, a recent NLST analysis found that increasing the positive threshold to 6 mm or 8 mm would reduce the false-positive rate substantially while barely reducing sensitivity for cancer detection. The eventual design for Lung-RADS relied on several lung cancer trials to set a positivity point at 6 mm or larger.
Compared with the NLST criteria, Lung-RADS does two main things: It increases the size threshold for a positive baseline screening result from a 4-mm greatest transverse diameter to a 6-mm transverse bidimensional average (and to 20 mm for nonsolid nodules), and it requires growth for pre-existing nodules in order for a nodule to continue to be classified as positive.
How would Lung-RADS affect results?
"What we wanted to do is go back and say if we retrospectively applied this Lung-RADS criteria to NLST -- and NLST used a less restrictive criteria for a positive screen -- what would the false-positive rate be, and also the sensitivity?" Pinsky said.
The study used nodule- and participant-level data to apply Lung-RADS criteria to NLST. The researchers aimed to evaluate the effect of Lung-RADS on the performance of low-dose CT screening, including sensitivity, false-positive rate, positive predictive value (PPV), and negative predictive value (NPV). The group also compared the characteristics of cancer cases that were detectable using Lung-RADS criteria with those of the cases it would have missed.
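For readers who want the four measures spelled out, the following is a minimal sketch of how each is computed from a cross-tabulation of screening results against cancer status. The class and function names are illustrative, and the counts are made up for the example; they are not NLST data.

```python
from dataclasses import dataclass


@dataclass
class ScreenCounts:
    """Screening results cross-tabulated against cancer status."""
    true_pos: int   # positive screen, cancer present
    false_neg: int  # negative screen, cancer present
    false_pos: int  # positive screen, no cancer
    true_neg: int   # negative screen, no cancer


def screening_metrics(c: ScreenCounts) -> dict:
    """Return the four performance measures as fractions between 0 and 1."""
    return {
        # Share of cancers that the screen flagged as positive.
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # Share of cancer-free screens that were nevertheless positive.
        "false_positive_rate": c.false_pos / (c.false_pos + c.true_neg),
        # Share of positive screens where cancer was present.
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        # Share of negative screens where cancer was absent.
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Made-up counts for illustration only; these are NOT NLST data.
example = ScreenCounts(true_pos=85, false_neg=15, false_pos=1000, true_neg=9000)
metrics = screening_metrics(example)
```

Because cancer is rare among those screened, even a modest false-positive rate produces far more false positives than true positives, which is why PPV stays low in this kind of calculation.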
Lung-RADS categories 1 and 2 (negative and benign appearance, respectively) are deemed negative screening results, and categories 3 (probably benign) and 4 (suspicious) are positive results.
Lung-RADS category 4 is subdivided as follows:
- 4A. Solid: ≥ 8 mm to < 15 mm; part solid: ≥ 6 mm with solid component ≥ 6 mm and < 8 mm
- 4B. Solid: ≥ 15 mm; part solid: solid component ≥ 8 mm
- 4X. Category 3 or 4 nodules with additional features or imaging findings that increase suspicion of malignancy
The categories differ slightly between baseline and follow-up screening, with the emphasis shifting from size to whether nodules have grown since the last scan.
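The size cutoffs above can be sketched as a simple lookup for solid nodules at baseline. This is an illustrative simplification, not the ACR's full scheme: the function names are hypothetical, the category 2 and 3 ranges are inferred from the 6-mm positivity threshold and the 8-mm lower bound of category 4A, treating "no nodule" as category 1 is an assumption, and part-solid nodules, nonsolid nodules, follow-up growth rules, and the category for additional suspicious features are left out.

```python
from typing import Optional


def solid_baseline_category(mean_diameter_mm: Optional[float]) -> str:
    """Rough Lung-RADS category for a solid nodule at baseline, by size only.

    Pass None when no nodule was found (assumed here to map to category 1).
    """
    if mean_diameter_mm is None:
        return "1"
    if mean_diameter_mm < 6:
        return "2"   # below the 6-mm positivity threshold
    if mean_diameter_mm < 8:
        return "3"   # positive, but below the 4A lower bound
    if mean_diameter_mm < 15:
        return "4A"  # solid: >= 8 mm to < 15 mm
    return "4B"      # solid: >= 15 mm


def is_positive(category: str) -> bool:
    """Categories 1 and 2 are negative screens; 3 and 4 are positive."""
    return category in {"3", "4A", "4B"}


# A 7-mm solid nodule lands in category 3 and counts as a positive screen,
# while a 5-mm solid nodule lands in category 2 and counts as negative.
seven_mm = solid_baseline_category(7.0)
five_mm = solid_baseline_category(5.0)
```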
To apply Lung-RADS to the NLST results, the average diameter for NLST nodules was calculated as the mean of the longest diameter and the longest perpendicular diameter. The NLST attenuation classifications of soft tissue, ground glass, and mixed were mapped to the Lung-RADS classifications of solid, nonsolid, and part-solid lesions. Lung cancer was said to be present at screening if it was diagnosed within a year or before the next screening, or for positive results, if it was diagnosed later but not more than a year after diagnostic procedures.
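The conversion described above is simple enough to sketch in a few lines. The record layout and field names below are hypothetical, and the example nodule is invented; the size tests assume inclusive thresholds (4 mm or more by greatest diameter for NLST, 6 mm or more by average diameter for a solid nodule under Lung-RADS).

```python
# NLST attenuation label -> Lung-RADS nodule type, as described in the text.
ATTENUATION_TO_LUNG_RADS = {
    "soft tissue": "solid",
    "ground glass": "nonsolid",
    "mixed": "part solid",
}


def mean_diameter_mm(longest_mm: float, perpendicular_mm: float) -> float:
    """Average of the longest diameter and the longest perpendicular diameter."""
    return (longest_mm + perpendicular_mm) / 2


def to_lung_rads_inputs(nlst_nodule: dict) -> dict:
    """Convert one NLST-style nodule record (hypothetical field names)."""
    return {
        "type": ATTENUATION_TO_LUNG_RADS[nlst_nodule["attenuation"]],
        "mean_diameter_mm": mean_diameter_mm(
            nlst_nodule["longest_mm"], nlst_nodule["perpendicular_mm"]
        ),
    }


# Invented nodule: 7 mm long, 4 mm across, soft-tissue attenuation.
nodule = {"attenuation": "soft tissue", "longest_mm": 7.0, "perpendicular_mm": 4.0}
converted = to_lung_rads_inputs(nodule)

# Size tests only, assuming inclusive thresholds.
nlst_size_positive = nodule["longest_mm"] >= 4.0
lung_rads_size_positive = converted["mean_diameter_mm"] >= 6.0
```

In this example the nodule clears the NLST cutoff on its 7-mm greatest diameter, but its 5.5-mm average falls short of the Lung-RADS threshold, so the same finding would be positive under one scheme and negative under the other.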
Among screening results when cancer was present, most cases were category 4A (26.7%) or 4B (42.5%). Among screening results without cancer, most were classified as Lung-RADS category 1 (56.2%) or 2 (31.0%). Cancer prevalence generally increased with the Lung-RADS category, rising from 0.1% (category 1) to 34.7% (category 4B).
NLST vs. Lung-RADS
At baseline, sensitivity using Lung-RADS was 84.9%, lower than the original NLST criteria sensitivity of 93.5%, a drop of 8.6 percentage points. The Lung-RADS false-positive rate of 12.8% was less than half of the false-positive rate using NLST criteria, at 26.6%, a difference of 13.8 percentage points.
For subsequent screenings, Lung-RADS sensitivity decreased to 78.6%, compared with 93.8% using NLST criteria, a difference of 15.2 percentage points. At the same time, the false-positive rate dropped to 5.3%, compared with 21.8% for NLST criteria.
Lung-RADS vs. NLST criteria

| Measure | Lung-RADS, baseline | NLST, baseline | Lung-RADS, after baseline | NLST, after baseline |
| --- | --- | --- | --- | --- |
| Sensitivity | 84.9% | 93.5% | 78.6% | 93.8% |
| False-positive rate | 12.8% | 26.6% | 5.3% | 21.8% |
| PPV | 6.9% | 3.8% | 11% | 3.5% |
| NPV | 99.8% | 99.9% | 99.8% | 99.9% |

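The differences quoted in this section can be checked directly from the sensitivity and false-positive figures in the table. The sketch below is only arithmetic on those published percentages; the dictionary layout and function names are illustrative.

```python
# Percentages from the table (sensitivity and false-positive rate only).
RATES = {
    "baseline": {
        "sensitivity": {"nlst": 93.5, "lung_rads": 84.9},
        "false_positive_rate": {"nlst": 26.6, "lung_rads": 12.8},
    },
    "after_baseline": {
        "sensitivity": {"nlst": 93.8, "lung_rads": 78.6},
        "false_positive_rate": {"nlst": 21.8, "lung_rads": 5.3},
    },
}


def point_drop(nlst: float, lung_rads: float) -> float:
    """Absolute change, in percentage points."""
    return round(nlst - lung_rads, 1)


def relative_drop(nlst: float, lung_rads: float) -> float:
    """Change expressed as a percentage of the NLST value."""
    return round(100 * (nlst - lung_rads) / nlst, 1)


for period, measures in RATES.items():
    for measure, v in measures.items():
        print(
            f"{period:15s} {measure:20s} "
            f"{point_drop(v['nlst'], v['lung_rads']):5.1f} points, "
            f"{relative_drop(v['nlst'], v['lung_rads']):5.1f}% relative"
        )
```

Run this way, the false-positive rate falls by 13.8 points (about 52% in relative terms) at baseline and by 16.5 points (about 76%) after baseline, while sensitivity falls by 8.6 and 15.2 points, respectively.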
"The biggest change, really, was in subsequent screenings, where you have the ability to compare to baseline," Pinsky said. "By the Lung-RADS criteria, even if you had a 6-mm nodule, which would be positive at baseline, if you saw that again a year later and it hadn't grown at all, then that would be considered a negative screen on Lung-RADS."
At that point, "you'd just come back next year; you wouldn't come back for a diagnostic workup," he said. "That was a big change from NLST, but at NLST even if it hadn't grown, they would basically still call that a positive screen."
Because Lung-RADS uses prior results to assess positivity after baseline, the false-positive rate dropped even more in subsequent screenings than it did at baseline, from about 22% down to 5%, Pinsky explained. And most screening will eventually be follow-up.
"Once clinical practice gets going and people do regular screening, most screens will be repeat screens, so that's where most of the action will be," he said.
Reducing that false-positive rate is not only important in terms of patient safety, but also to keep the cost of screening reasonable, since high false-positive rates are a major cost driver, the authors noted.
The other side of the coin is that sensitivity went down by more than it did at baseline -- by about 15 percentage points, Pinsky said. In this case, some nodules that didn't grow substantially and were not treated as positive by Lung-RADS turned out to be cancer anyway.
Adding it all up
"What you have [with Lung-RADS] is a large reduction in the false-positive rate and less of a reduction, but still a reduction, in sensitivity," Pinsky said.
The study's retrospective design was a limitation: By the time Lung-RADS criteria were applied, the radiologists had already made their clinical decisions based on different criteria.
"In actual clinical practice, if they see something that's 5 mm and they're using Lung-RADS and it has to be 6 mm, but they think it's suspicious, they may round up to 6," he said. "But we don't know [how] radiologists are going to use this, and there may be a learning curve as it develops over time."
Therefore, large prospective studies of people getting screened will be needed to learn the true effect of Lung-RADS: whether results differ from this analysis and, if so, by how much, Pinsky said. The registries envisioned by CMS won't have enough information to answer the question on their own, because they will show positivity rates but not missed cancers. Determining actual sensitivity will require the CMS-mandated registries to draw on Surveillance, Epidemiology, and End Results (SEER) or other cancer registries, according to Pinsky.
The $64,000 question
In assessing Lung-RADS, the crucial question to answer is whether the trade-off of lower sensitivity versus fewer false positives is worth it. Will the loss in sensitivity affect the mortality benefit of screening -- and if so, by how much? Maybe many of the missed cancers will be slow-growing, so missing them until the next round of screening will not be critical, but maybe it will be. And conclusive answers could take years, Pinsky said.
If one were to speculate on the effect, is Lung-RADS -- with its positives and negatives -- still worth it?
"A general mantra of the screening world is that you have to have a very high specificity, meaning a very low false-positive rate, just because you're treating healthy people," Pinsky said. "You really don't want to intervene in healthy people when you don't need to."
Moreover, the 20% or 25% seen without Lung-RADS is not a low false-positive rate; it's higher than that of any other screening test, he said. Also, invasive procedures in the lungs could be dangerous -- much more dangerous than a breast biopsy -- meaning that it will be critical to keep false positives as low as possible.
"So I think it's worth seeing how this works in practice, and after we see some of these results, we can have Lung-RADS 2.0, and you can tweak it depending on how it's working," he said.