Background/Objectives: Gastric cancer is a leading cause of cancer-related mortality, particularly in East Asia, and imposes a notable burden in the Republic of Korea. This study aimed to develop machine learning models to predict gastric cancer mortality and to identify its risk factors.

Methods: All data were obtained from the Korean Clinical Data Utilization for Research Excellence program, which draws on multiple medical centers in the Republic of Korea. Among 23,717 gastric cancer patients, two mortality outcomes were investigated according to cause of death: all-cause mortality (n = 2,664) and disease-specific mortality (n = 1,620). We used comprehensive data integrating clinical, pathological, lifestyle, and socioeconomic factors. Cox proportional hazards analysis was used to estimate hazard ratios for mortality. Five machine learning models (random forest, gradient boosting machine, XGBoost, LightGBM, and CatBoost) were developed to predict mortality, and the models were interpreted with SHapley Additive exPlanations (SHAP), an explainable AI technique.

Results: For all-cause mortality, the gradient boosting machine model achieved the highest performance, with an AUC-ROC of 0.795. For disease-specific mortality, the LightGBM model outperformed the others, with an AUC-ROC of 0.867. Significant predictors included AJCC 7th edition stage, tumor size, lymph node count, and lifestyle and comorbidity factors such as smoking, alcohol consumption, and diabetes.

Conclusions: This study underscores the importance of integrating clinical and lifestyle data to improve mortality prediction in gastric cancer patients. The findings highlight the need for personalized treatment approaches in the Korean population and the value of population-specific data in predictive modeling.
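Model discrimination is reported as AUC-ROC (0.795 for all-cause and 0.867 for disease-specific mortality). The following is a minimal sketch, not the authors' evaluation code, showing how AUC-ROC can be computed from a model's predicted risks. It uses the rank-based (Mann-Whitney U) formulation: AUC-ROC is the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. The function name `auc_roc` and the toy labels and scores are hypothetical illustrations.

```python
def auc_roc(labels, scores):
    """Rank-based AUC-ROC (Mann-Whitney U / (n_pos * n_neg)).

    labels: 1 = died (positive), 0 = survived (negative); hypothetical encoding.
    scores: predicted mortality risk from any model (higher = riskier).
    Tied scores receive their average rank, so ties count as 1/2.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores order[i..j].
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for positives
    return u / (n_pos * n_neg)


# Toy example (hypothetical data): 3 deaths, 3 survivors.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
print(round(auc_roc(labels, scores), 3))  # 0.889 (8 of 9 death/survivor pairs ranked correctly)
```

Sorting makes this O(n log n), which is practical at the scale of 23,717 patients. For binary outcomes, scikit-learn's `roc_auc_score` computes the same quantity.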