Background: Unlike most cancers, breast cancer poses a persistent risk of distant recurrence, often years after initial treatment, making long-term risk stratification uniquely challenging. Current tools fall short in predicting late metastatic events, particularly for early-stage patients.

Methods: We present an interpretable machine learning (ML) pipeline to predict distant recurrence-free survival at 5, 10, and 15 years, integrating Bayesian network-based causal feature selection, deep feed-forward neural network models (DNMs), and SHAP-based interpretation. Using electronic health record (EHR)-based clinical data from over 6000 patients, we first applied the Markov blanket and interactive risk factor learner (MBIL) to identify minimally sufficient predictor subsets. These were then used to train optimized DNM classifiers, with hyperparameters tuned via grid search and benchmarked against models from 10 traditional ML methods and models trained using all predictors.

Results: Our best models achieved area under the curve (AUC) scores of 0.79, 0.83, and 0.89 for 5-, 10-, and 15-year predictions, respectively, substantially outperforming baselines. MBIL reduced input dimensionality by over 80% without sacrificing accuracy. Importantly, MBIL-selected features (e.g., nodal status, hormone receptor expression, tumor size) overlapped strongly with top SHAP contributors, reinforcing interpretability. Calibration plots further demonstrated close agreement between predicted probabilities and observed recurrence rates. The percentage performance improvement due to grid search ranged from 25.3% to 60%.

Conclusions: This study demonstrates that combining causal selection, deep learning, and grid search improves prediction accuracy, transparency, and calibration for long-horizon breast cancer recurrence risk. The proposed framework is well-positioned for clinical use, especially to guide long-term follow-up and therapy decisions in early-stage patients.
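The tuning step described in the Methods (an exhaustive grid search scored by AUC on held-out data) can be illustrated with a minimal sketch. This is not the authors' code: a small logistic regression stands in for the deep feed-forward network, the data are synthetic, and every name here (`train_logistic`, `grid_search`, `GRID`, the grid values) is a hypothetical choice made for illustration only.

```python
import itertools
import math
import random


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train_logistic(X, y, lr, epochs):
    """Stochastic gradient descent logistic regression (stand-in classifier)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def grid_search(X_tr, y_tr, X_val, y_val, grid):
    """Try every hyperparameter combination; keep the best validation AUC."""
    best_score, best_params = None, None
    for lr, epochs in itertools.product(grid["lr"], grid["epochs"]):
        model = train_logistic(X_tr, y_tr, lr, epochs)
        score = auc(y_val, predict(model, X_val))
        if best_score is None or score > best_score:
            best_score, best_params = score, {"lr": lr, "epochs": epochs}
    return best_score, best_params


# Synthetic data (assumption): 3 features, the outcome depends on the first two.
random.seed(0)


def make_row():
    x = [random.gauss(0, 1) for _ in range(3)]
    label = 1 if x[0] + 0.5 * x[1] + random.gauss(0, 0.5) > 0 else 0
    return x, label


rows = [make_row() for _ in range(200)]
X = [r[0] for r in rows]
y = [r[1] for r in rows]
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

GRID = {"lr": [0.01, 0.1], "epochs": [5, 20]}  # hypothetical search space
best_score, best_params = grid_search(X_train, y_train, X_val, y_val, GRID)
print(best_params, round(best_score, 3))
```

In practice a library implementation (for example a cross-validated grid search with a neural network estimator) would likely replace the hand-rolled loop; the sketch only shows the selection logic, where each candidate configuration is trained once and compared on the same validation AUC.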

Original article:

Leveraging Deep Learning, Grid Search, and Bayesian Networks to Predict Distant Recurrence of Breast Cancer

Published: 30 July 2025

DOI: 10.3390/cancers17152515

Type: Article

Open access: Yes
