Background: Because postoperative complications after gastrectomy for gastric cancer are associated with poor clinical outcomes, it is important to predict complications preoperatively and prepare for them. Conventional models for predicting complications have limitations, prompting interest in machine learning algorithms. Machine learning models are better able to capture complex interactions among variables and nonlinear relationships, and may thereby reveal new risk factors. This study aimed to explore previously overlooked risk factors for postoperative complications and to compare machine learning models with linear regression.

Materials and Methods: We retrospectively reviewed data from 865 patients who underwent gastrectomy for gastric cancer from 2018 to 2022. A total of 85 variables, including demographics, clinical features, laboratory values, intraoperative parameters, and pathologic results, were used to build the machine learning models. The dataset was partitioned into 80% for training and 20% for validation. To identify the most accurate prediction model, we performed missing-data handling, variable selection, and hyperparameter tuning.

Results: Machine learning models performed best with backward elimination and a moderate missing-data strategy, reaching the highest area under the curve (0.744). A total of 15 variables associated with postoperative complications were identified using a machine learning algorithm. Operation time was the most impactful variable, followed closely by preoperative albumin and mean corpuscular hemoglobin levels. Machine learning models, especially Random Forest and XGBoost, outperformed linear regression.

Conclusions: Machine learning, coupled with advanced variable selection techniques, showed promise in enhancing risk prediction of postoperative complications after gastric cancer surgery.
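The evaluation steps named in the Materials and Methods (an 80/20 split, backward elimination, and comparison by area under the curve) can be illustrated with a minimal sketch. This is not the study's code: the function names, the random seed, the stopping rule for backward elimination, and the toy scoring function are all assumptions made for illustration, and the `noise_a` and `noise_b` variables are hypothetical. In practice, `evaluate` would train a model such as Random Forest on the training rows and return its validation AUC.

```python
import random
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train_validation_split(
    n_rows: int, train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and cut them into training and validation parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def backward_elimination(
    features: Sequence[str], evaluate: Callable[[List[str]], float]
) -> Tuple[List[str], float]:
    """Greedy backward elimination: start from all features and repeatedly
    drop the one whose removal gives the best score. Stop as soon as every
    possible removal would lower the score (assumed stopping rule)."""
    selected = list(features)
    best = evaluate(selected)
    while len(selected) > 1:
        candidates = [
            (evaluate([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        score, drop = max(candidates)
        if score < best:
            break
        best = score
        selected.remove(drop)
    return selected, best


# Toy stand-in for "train a model, return validation AUC" (hypothetical):
# informative variables raise the score, uninformative ones lower it slightly.
INFORMATIVE = {"operation_time", "albumin", "mch"}


def toy_evaluate(subset: List[str]) -> float:
    useful = sum(1 for f in subset if f in INFORMATIVE)
    return 0.5 + 0.08 * useful - 0.01 * (len(subset) - useful)


train_idx, valid_idx = train_validation_split(865)
kept, kept_score = backward_elimination(
    ["operation_time", "albumin", "mch", "noise_a", "noise_b"], toy_evaluate
)
```

With 865 rows, the split yields 692 training and 173 validation indices. The pairwise AUC shown here is quadratic in the number of cases, which is acceptable at this sample size; a rank-based formula would be the usual choice for larger datasets.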