Risk prediction before oncologic esophagectomy is crucial for supporting joint, informed decision-making by surgeons and patients. Recently, a new risk prediction model for 90-day mortality after esophagectomy, based on the International Esodata Study Group (IESG) database, was proposed; it allows patients to be assigned preoperatively to different risk categories. However, given the non-linear dependencies between the patient- and tumor-related risk factors that contribute to cumulative surgical risk, machine learning (ML) may emerge as a novel and more integrated approach to mortality prediction. We evaluated the IESG risk model and compared its performance to that of ML models. Multiple classifiers were trained and validated on data from 552 patients who underwent oncologic esophagectomy at two independent centers. The discrimination performance of each model was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), and the Matthews correlation coefficient (MCC). The 90-day mortality rate was 5.8%. We found that IESG categorization allowed for adequate group-based risk prediction. However, ML models provided better discrimination performance, reaching superior AUROCs (0.64 [0.63–0.65] vs. 0.44 [0.32–0.56]), AUPRCs (0.25 [0.24–0.27] vs. 0.11 [0.05–0.21]), and MCCs (0.27 [0.25–0.28] vs. 0.15 [0.03–0.27]). In conclusion, ML shows promising potential to identify patients at risk prior to surgery, surpassing conventional statistics. Still, larger datasets are needed to achieve the higher discrimination performance required for large-scale clinical implementation in the future.
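To make the three reported discrimination metrics concrete, the following is a minimal sketch of how AUROC, AUPRC, and MCC can be computed for a binary mortality outcome. It is not the study's pipeline: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions chosen for illustration, and the AUPRC is approximated here by average precision on untied scores.

```python
from math import sqrt


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(y_true, scores):
    """Average precision: mean precision at the rank of each positive.

    Simplification: assumes no tied scores and at least one positive.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from the 2x2 confusion matrix."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data (NOT study data): 1 = death within 90 days.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.60, 0.55, 0.35, 0.80]

# MCC needs hard labels; the 0.5 cut-off is an arbitrary assumption.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("AUROC:", round(auroc(y_true, scores), 4))
print("AUPRC:", round(auprc(y_true, scores), 4))
print("MCC:  ", round(mcc(y_true, y_pred), 4))
```

AUROC and AUPRC are computed from the continuous risk scores, whereas MCC depends on the chosen threshold. For a rare outcome such as 90-day mortality, an uninformative model has an expected AUPRC close to the event rate, which is one reason to report AUPRC alongside AUROC.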