
ML models find key predictors, racial disparities in breast cancer

Machine learning (ML) models can highlight racial disparities and key predictors of breast cancer, according to research published March 28 in Annals of Epidemiology.

An analysis led by Raji Sundararajan, PhD, from Purdue University in West Lafayette, IN, and colleagues found that while white women experience higher breast cancer incidence in older age groups, Black women have the highest rates in younger age groups. Trends also differed by breast cancer type.

“These results agree with the current known risk factors triggering breast cancer,” Sundararajan and coauthors wrote. 

Published data shows that while non-Hispanic white women have the highest breast cancer incidence rate, Black women and American Indian/Alaska Native women have the highest mortality rates.  

Prior research exploring these trends found that Black and Hispanic women tend to have poorer breast cancer outcomes than white women, owing to later diagnosis. Other studies, including those using AI and ML, suggest that several factors contribute to these disparities, including access to care, socioeconomic status, and environmental factors, among others.

The Sundararajan team studied key factors that influence breast cancer risk by using ML and explainable AI, identifying racial differences. They used mammographic data from the Breast Cancer Surveillance Consortium and applied the following ML models to show key disease predictors: Naïve Bayes, logistic regression, and extreme gradient boosting. The team also used variable importance and SHapley Additive exPlanations (SHAP) values to interpret the models and find the most predictive factors. 
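
To illustrate the idea behind SHAP values, here is a minimal sketch that computes exact Shapley values for a toy logistic model. It is not the authors' pipeline: the feature names, coefficients, and baseline below are hypothetical, and the SHAP library approximates these quantities for real models rather than enumerating every feature subset as this example does.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical coefficients for a toy logistic model; NOT values from the study.
WEIGHTS = {"prior_biopsy": 1.2, "age_group": 0.8, "dense_breasts": 0.5}
INTERCEPT = -3.0


def predict(features):
    """Return the toy model's predicted probability for one feature dict."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))


def shapley_values(instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    result = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude `name`.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        result[name] = total
    return result


# Hypothetical patient and reference point, for illustration only.
instance = {"prior_biopsy": 1, "age_group": 3, "dense_breasts": 1}
baseline = {"prior_biopsy": 0, "age_group": 1, "dense_breasts": 0}

phi = shapley_values(instance, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes Shapley-based attributions useful for ranking predictors.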

Across all models and races included in the study, history of biopsy (50%) and age group (25.9%) were the strongest predictors.

White women had the highest overall incidence (nine per 100,000 women), with the highest rate among those ages 65 and older (18.1 per 100,000). Black women had higher incidence rates in younger age groups (7.1 per 100,000 for ages 18 to 29).

The team also reported the following data:  

  • In middle age groups, non-Hispanic white women ages 60 to 64 years had the highest incidence rate (7.7 per 100,000).

  • Among older age groups, non-Hispanic white women had the highest incidence rates at ages 80 to 84 years (21.3 per 100,000) and at ages 85 and older (29.1 per 100,000).

  • Native American women had the highest incidence rate (17.2 per 100,000) in the 75 to 79 age range. 

  • Triple-negative breast cancer accounts for 10% to 15% of all breast cancers and is more common in Black/African American women (15% to 30%). 

  • Asian/Pacific Islander women have a higher prevalence of HER2-positive breast cancers (30%). 

  • Non-Hispanic white women have higher rates of diagnosis for ER+ and PR+ breast cancers. 

Finally, the study authors reported menopausal status, breast density, and age at first childbirth as strong disease predictors from the ML models. 

The authors called for future studies to refine these models by incorporating more socioeconomic and lifestyle variables and by improving early detection efforts, especially among underserved populations, to reduce disparities in breast cancer outcomes.

“Future work also could focus on balancing the dataset through techniques such as oversampling underserved groups,” they added. “Additionally, exploring more advanced models or incorporating additional features that better capture the diversity of these populations may improve predictive accuracy.” 
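
As a rough illustration of the oversampling idea the authors mention, the sketch below resamples smaller groups with replacement until every group matches the largest one. The authors do not specify a method; random oversampling is just one simple variant, and the records, group labels, and function name here are synthetic and hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample_by_group(records, group_key, seed=0):
    """Resample smaller groups with replacement to match the largest group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    target = max(len(rows) for rows in groups.values())
    balanced = []
    for rows in groups.values():
        balanced.extend(rows)
        # Draw the shortfall at random from the group's own records.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced


# Synthetic records for illustration; not data from the study.
records = (
    [{"group": "A", "label": i % 2} for i in range(8)]
    + [{"group": "B", "label": i % 2} for i in range(3)]
    + [{"group": "C", "label": 1} for _ in range(2)]
)

balanced = oversample_by_group(records, "group")
counts = Counter(record["group"] for record in balanced)
print(counts)
```

In practice, resampling of this kind is usually applied only to the training split, so that duplicated records do not leak into the data used for evaluation.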

