Background: Colorectal cancer is one of the most prevalent cancers and is associated with a high mortality rate. An increasing number of adults under 50 are also being diagnosed with the disease, which underscores the importance of using modern technologies such as artificial intelligence to support early diagnosis and treatment. Methods: Eight classifiers were used in this study: Random Forest, XGBoost, CatBoost, LightGBM, Gradient Boosting, Extra Trees, k-nearest neighbors (KNN), and Decision Tree. Their hyperparameters were optimized using the Optuna, RayTune, and HyperOpt frameworks. The study was conducted on a public dataset from Brazil containing information on tens of thousands of patients. Results: The models achieved high classification accuracy in predicting one-, three-, and five-year survival, as well as overall mortality and cancer-specific mortality. The CatBoost, LightGBM, Gradient Boosting, and Random Forest classifiers performed best, achieving an accuracy of approximately 80% across all evaluated tasks. Conclusions: This study produced effective classification models that can be applied in clinical practice.