Background and objectives: Deep learning (DL)-based models for predicting the survival of patients with localized breast cancer use only time-fixed covariates, i.e., patient and cancer data at the time of diagnosis. These predictions are inherently error-prone because they ignore time-varying events that occur after the initial diagnosis. Our objective is to improve the predictive modeling of survival in patients with localized breast cancer by considering both time-fixed and time-varying covariates, thereby accounting for the progression of a patient’s health status over time.
Methods: We extended four DL-based survival models for right-censored time-to-event data (DeepSurv, DeepHit, Nnet-survival, and Cox-Time) to consider not only a patient’s time-fixed covariates (patient and cancer data at diagnosis) but also the patient’s time-varying covariates (e.g., treatments, comorbidities, advancing age, frailty index, and adverse events from treatment). Our study data were the SEER-Medicare linked dataset from 1991 to 2016, from which we studied women diagnosed with stage I–III breast cancer (BC) who were enrolled in Medicare at 65 years or older, qualifying by age. The time-fixed variables, recorded at the time of diagnosis, were age, race, marital status, breast cancer stage, tumor grade, laterality, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status, and comorbidity index. We analyzed six distinct prognostic categories: stage I, II, and III BC, each divided by ER/PR+ or ER/PR− status. The time-varying covariates, recorded at each visit, were administered treatments, treatment-induced adverse events, comorbidity index, and age. To demonstrate the model’s utility, we predicted the survival of three hypothetical patients.
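One common way to feed time-varying covariates to a survival model is to expand each patient into one row per visit interval, repeating the time-fixed covariates on every row. The sketch below illustrates that layout only. It is not the authors’ preprocessing, and the `Visit` class, the `to_intervals` function, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    month: int        # months since diagnosis (hypothetical time unit)
    covariates: dict  # time-varying values observed at this visit


def to_intervals(fixed, visits, end_month, died):
    """Expand one patient into (start, stop] rows, one per visit interval.

    Time-fixed covariates are copied onto every row; time-varying ones
    take the value recorded at the visit that opens the interval.
    """
    # Assumption: only visits before the end of follow-up open an interval.
    visits = sorted((v for v in visits if v.month < end_month),
                    key=lambda v: v.month)
    rows = []
    for i, visit in enumerate(visits):
        is_last = i + 1 == len(visits)
        stop = end_month if is_last else visits[i + 1].month
        if stop <= visit.month:
            continue  # skip zero-length intervals (two visits in one month)
        rows.append({**fixed, **visit.covariates,
                     "start": visit.month, "stop": stop, "event": 0})
    # Only the final interval can carry the event; earlier rows are censored.
    if rows and died:
        rows[-1]["event"] = 1
    return rows


# Hypothetical patient: stage II, ER/PR+, followed for 30 months, then died.
fixed = {"stage": 2, "er_pr_positive": 1, "age_at_dx": 70}
visits = [
    Visit(0, {"chemo": 0, "comorbidity": 1}),
    Visit(6, {"chemo": 1, "comorbidity": 1}),
    Visit(18, {"chemo": 0, "comorbidity": 2}),
]
rows = to_intervals(fixed, visits, end_month=30, died=True)
for row in rows:
    print(row)
```

Each row can then be treated as a separate left-truncated, right-censored observation, which is the usual counting-process view of time-varying covariates.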
Main Outcomes and Measures: The primary outcomes were measures of the models’ prediction error: the concordance index, the most commonly applied evaluation metric in survival analysis, and the integrated Brier score, a metric of a model’s discrimination and calibration.
Results: Extending the patients’ covariates to include both time-fixed and time-varying covariates significantly reduced the DL models’ prediction error and improved the discrimination and calibration of their estimates. With time-fixed covariates only, the four DL models had a prediction error of approximately 30% in all six prognostic categories. With the proposed extension to include time-varying covariates, the accuracy of all four models improved significantly, with the error decreasing to approximately 10%. The models’ predictive accuracy was independent of the differing published survival predictions from time-fixed covariates in the six prognostic categories. We demonstrate the utility of the model in three hypothetical patients with unique patient, cancer, and treatment variables. The model predicted survival from each patient’s individual time-fixed and time-varying features, and its predictions varied considerably from Social Security age-based predictions and from stage- and race-based breast cancer survival predictions.
Conclusions: DL models that predict the survival of patients with early-stage breast cancer have a prediction error of around 30% when they consider only time-fixed covariates at the time of diagnosis; the error decreases to under 10% when time-varying covariates are added as input to the models, regardless of the prognostic category of the patient group. These models can be used to predict individual patients’ survival probabilities based on their unique repertoire of time-fixed and time-varying features. They can provide guidance to patients and their caregivers to assist in decision making.
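For readers unfamiliar with the concordance index named under Main Outcomes and Measures, the following is a minimal sketch of Harrell’s C-index for right-censored data. It is an illustration, not the evaluation code used in the study: the function name, the toy data, and the convention that a higher risk score means shorter expected survival are all assumptions. It does not cover the integrated Brier score, which additionally requires censoring weights.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i has an observed event and
    a strictly shorter time than subject j. The pair is concordant when
    the model gives i the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in months, event flag (1 = death), risk.
times = [5, 10, 12, 20]
events = [1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.1, 0.6]
c_index = concordance_index(times, events, risk_scores)
print(c_index)  # 3 of 4 comparable pairs are concordant -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of the comparable pairs.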