AI models for classifying lung nodule malignancy on chest CT imaging show high sensitivity but only moderate specificity when tested against external datasets, according to a study published June 3 in Radiology: Artificial Intelligence.
The finding suggests these types of models could offer “a potential role as adjunctive tools for ruling out malignancy” rather than classifying it, wrote a team led by Oke Dimas Asmara, MD, of Frisius Medical Center in Leeuwarden, the Netherlands.
“Current AI models may support rule-out of malignancy in lung nodules; however, moderate specificity limits their use for definitive classification of malignant nodules,” the group noted.
Lung cancer is the leading cause of cancer death and the most commonly diagnosed cancer worldwide, the group noted. Recent studies have reported that AI-based malignancy classification on CT imaging shows promise, but how to integrate this technology into clinical practice isn’t clear, they explained. They wrote that “an important gap remains in understanding how AI performance generalizes across different clinical environments,” that is, asymptomatic screening populations versus symptomatic patients.
The researchers searched PubMed, Embase, Web of Science, CINAHL, and the Cochrane Library in January 2025 to identify studies that evaluated AI models for malignancy classification of lung nodules on chest CT, using pathology and/or at least two years of follow-up as the reference standard. The search yielded 21 studies that included 7,454 lung nodules; lung cancer prevalence ranged from 5.7% to 91.5%.
All of the AI models used in these studies were based on deep learning. Of the included studies, 17 (81%) involved Asian populations, 15 (71%) used non-screening populations, 14 (67%) reported on 2D or 3D convolutional neural network (CNN) architectures, and eight (38%) focused on predefined malignancy thresholds.
The group reported the following overall results:
- Pooled sensitivity was 88%.
- Pooled specificity was 75%.
- Positive likelihood ratio was 3.55, while negative likelihood ratio was 0.16.
- Area under the receiver operating characteristic curve (AUROC) was 0.89.
- The diagnostic odds ratio was 22.4.
- Variation across AI algorithms was high, with an I² heterogeneity statistic greater than 90%.
- Studies that used 2D or 3D CNNs showed higher specificity than those that did not report a model architecture (83% versus 58%, p = 0.03).
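
To see why these figures point toward a rule-out role, the pooled sensitivity and specificity can be run through the standard textbook formulas for likelihood ratios and post-test probability. The Python sketch below is only an illustration, not the authors' method: the function names are hypothetical, it treats the rounded pooled values as if they came from a single test, and it uses the lowest and highest reported prevalence as example pre-test probabilities.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Rounded pooled values as reported in the article (assumption: used as-is).
SENSITIVITY = 0.88
SPECIFICITY = 0.75

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)
# The diagnostic odds ratio is the ratio of the two likelihood ratios.
dor = lr_pos / lr_neg

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")

# Example pre-test probabilities: the lowest and highest prevalence reported.
for prevalence in (0.057, 0.915):
    after_negative = post_test_probability(prevalence, lr_neg)
    after_positive = post_test_probability(prevalence, lr_pos)
    print(
        f"prevalence {prevalence:.1%}: "
        f"malignancy probability {after_negative:.1%} after a negative result, "
        f"{after_positive:.1%} after a positive result"
    )
```

Computed this way, the likelihood ratios come out near 3.5 and 0.16 and the diagnostic odds ratio near 22, close to the reported 3.55 and 22.4. The small gap presumably reflects rounding and the pooling method, which the article does not describe. At the lowest reported prevalence, a negative result leaves roughly a 1% probability of malignancy while a positive result raises it to less than 20%, which is consistent with the authors' view that the models are better suited to ruling out malignancy than to confirming it.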
Asmara and colleagues urged further research that makes greater use of standardized thresholds, diverse populations, and detailed architecture reporting to measure the real-world clinical impact of using AI to classify lung nodules.


















