Background: Gastric cancer (GC) remains a major global health challenge, and its incidence is rising among patients who have undergone Helicobacter pylori (H. pylori) eradication, particularly those with persistent intestinal metaplasia (IM). Current risk stratification tools are of limited use in this high-risk population.

Aim: To develop, validate, and externally test a machine learning-based prediction model, termed the Early Gastric Cancer Model (EGCM), for identifying the risk of early gastric cancer (EGC) in H. pylori-eradicated patients with IM, and to implement it as a web-based clinical tool.

Methods: This retrospective, dual-center study enrolled 214 H. pylori-eradicated patients with histologically confirmed IM from 900 Hospital and Fujian Provincial People's Hospital. The dataset was split 70:30 into a training cohort and an internal validation cohort, and data from the second center served as the external test cohort. Twenty-one machine learning algorithms were screened using cross-validation and hyperparameter optimization. Boruta and SHAP (SHapley Additive exPlanations) analyses were used for feature selection, and the final EGCM was built on the top five predictors: atrophy range, xanthoma, map-like redness (MLR), MLR range, and age. Model performance was evaluated with receiver operating characteristic (ROC) curves, precision–recall curves, calibration plots, and decision curve analysis (DCA), and compared against conventional inflammatory biomarkers such as the neutrophil-to-lymphocyte ratio (NLR) and the platelet-to-lymphocyte ratio (PLR).

Results: The CatBoost algorithm showed the best overall performance, with an area under the curve (AUC) of 0.743 (95% CI: 0.70–0.80) in the internal validation cohort and 0.905 in the external test cohort. The EGCM showed significantly better discrimination than any individual inflammatory marker (p < 0.01). Calibration analysis showed close agreement between predicted and observed outcomes, and DCA indicated a greater net clinical benefit for the EGCM. A web calculator was developed to facilitate clinical application.

Conclusions: The EGCM is a validated, interpretable, and practical tool for stratifying EGC risk in H. pylori-eradicated patients with IM, with performance confirmed across two centers. Its integration into clinical practice could improve the precision of surveillance and the detection of early cancer.
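The comparison between the EGCM and the single inflammatory markers rests on the ROC AUC. As a minimal sketch, assuming only the standard definitions (NLR = neutrophil count / lymphocyte count, PLR = platelet count / lymphocyte count; AUC = probability that a randomly chosen EGC case scores higher than a randomly chosen non-EGC case, with ties counted as one half), the hypothetical helpers `blood_ratio` and `roc_auc` below compute both from plain lists. All values are invented toy data, not study data, and the CatBoost model itself is not reproduced.

```python
from typing import Sequence


def blood_ratio(numerator: float, lymphocytes: float) -> float:
    """Blood-count ratio: NLR (neutrophils/lymphocytes) or PLR (platelets/lymphocytes)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return numerator / lymphocytes


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney form); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy cohort: 1 = EGC, 0 = no EGC.
    labels = [1, 1, 0, 0, 0]
    model_scores = [0.9, 0.6, 0.7, 0.2, 0.1]  # stand-in for model probabilities
    neutrophils = [4.0, 3.0, 5.0, 2.0, 3.0]
    lymphocytes = [2.0, 1.5, 1.0, 2.0, 1.0]
    nlr = [blood_ratio(n, l) for n, l in zip(neutrophils, lymphocytes)]
    print(f"model AUC: {roc_auc(model_scores, labels):.3f}")  # model AUC: 0.833
    print(f"NLR AUC:   {roc_auc(nlr, labels):.3f}")  # NLR AUC:   0.333
```

The pairwise form is quadratic in cohort size but makes the probabilistic meaning of the AUC explicit; for real analyses a library routine such as scikit-learn's `roc_auc_score` gives the same value.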
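Decision curve analysis compares strategies by their net benefit at a threshold probability p_t, usually computed as TP/n - (FP/n) x p_t / (1 - p_t), alongside the "treat all" strategy and the "treat none" strategy (net benefit 0). The sketch below assumes this standard formulation; the function names and toy data are hypothetical, and the abstract does not state which thresholds or software the study used.

```python
from typing import Sequence


def net_benefit(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight at p_t
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Net benefit of flagging every patient (the 'treat all' reference line)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


if __name__ == "__main__":
    # Hypothetical toy predictions and outcomes (1 = EGC).
    probs = [0.9, 0.6, 0.7, 0.2, 0.1]
    labels = [1, 1, 0, 0, 0]
    for pt in (0.2, 0.5):
        print(pt, round(net_benefit(probs, labels, pt), 3),
              round(net_benefit_treat_all(labels, pt), 3))
```

A model is clinically useful at a given p_t when its net benefit exceeds both the "treat all" value and zero; plotting these values across a range of thresholds yields the decision curve.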