Background: Cancer is one of the leading threats to global health, and early, personalized prediction of cancer incidence is crucial for at-risk populations. This study introduces a novel cancer prediction model based on modern recurrent deep learning algorithms for survival analysis.
Methods: The study includes 160,407 participants from the blood-based cohort of the Korea Cancer Prevention Research-II Biobank, which has been ongoing since 2004. Data linkage was designed to preserve anonymity, and data were collected through nationwide medical examinations. Predictive performance for ten cancer sites was evaluated with the concordance index (c-index) and compared across nDeep, its multitask variant, Cox proportional hazards (PH) regression, DeepSurv, and DeepHit.
Results: Our models achieved a c-index above 0.8 for all ten cancers, peaking at 0.8922 for lung cancer, and outperformed Cox PH regression and the other survival deep neural networks.
Conclusion: To the best of our knowledge, this study presents the survival deep learning model with the highest predictive performance reported on a censored health dataset. In future work, we plan to investigate the causal relationships between the explanatory variables and cancer in order to reduce cancer incidence and mortality.
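Because every comparison above rests on the c-index, a minimal sketch of how it is computed on right-censored data may help. This assumes Harrell's formulation; the abstract does not say which variant the study uses (DeepHit-style evaluations often use a time-dependent version instead). The function name `harrell_c_index` and the toy data are hypothetical illustrations, not taken from the study. A pair of participants is comparable when the one with the earlier follow-up time actually received a cancer diagnosis; the pair is concordant when that participant was also assigned the higher predicted risk.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times  : follow-up time until diagnosis or censoring
    events : 1 if the cancer was diagnosed at `times`, 0 if censored
    risks  : predicted risk score; higher means an earlier expected event
    Pairs with tied follow-up times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored participant cannot anchor a comparable pair
        for j in range(n):
            # Pair (i, j) is comparable when i was diagnosed strictly before j's time.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # higher risk, earlier event: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: five participants, two of them censored.
    times = [2, 5, 6, 8, 10]
    events = [1, 0, 1, 1, 0]
    risks = [0.9, 0.3, 0.6, 0.7, 0.1]
    # 6 of the 7 comparable pairs are concordant.
    print(round(harrell_c_index(times, events, risks), 4))  # prints 0.8571
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a c-index above 0.8 across all ten cancer sites indicates strong discrimination. The quadratic double loop is chosen for readability; a cohort of 160,407 participants would call for a sorted or library-based implementation.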
A Study on Survival Analysis Methods Using Neural Network to Prevent Cancers