Background/Objectives: Advances in medical therapy have improved survival, so a growing number of cancer patients are living with bone metastases. Understanding the prognostic implications of bone metastases, however, requires larger population-based studies of their incidence and prevalence across primary cancer types, including those with lower incidence. This study aimed to evaluate the incidence and prevalence of bone metastases in solid organ tumors by applying natural language processing (NLP) to reports of staging CT studies. Methods: In this retrospective study, 639,470 reports from 129,326 unique patients were analyzed; 6279 randomly selected reports were manually annotated for the presence or absence of bone metastases. A BERT-based NLP model was developed from these annotated reports and applied to the patient database. The 5-year cumulative incidence and the prevalence of bone metastases were calculated for each cancer type. Results: On a validation set, the NLP model achieved an accuracy of 97.1%, a positive predictive value (precision) of 88.0%, and a sensitivity (recall) of 86.3%. The 5-year cumulative incidence of bone metastases was highest in prostate, breast, head and neck, and lung cancer (52%, 41%, 36%, and 33%, respectively) and lowest in central nervous system and testicular cancer (8% and 5%, respectively). Prevalence was highest in prostate, breast, and lung cancer (32%, 25%, and 23%, respectively) and lowest in central nervous system and testicular cancer (4% each). Conclusions: NLP revealed patterns of bone metastases across a broad range of cancer types and is a valuable tool for population-based assessment of bone metastases.
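The three validation metrics reported in the Results (accuracy, positive predictive value, and sensitivity) all derive from the same confusion matrix of model predictions against manual annotations. The following is a minimal sketch of that arithmetic, not the study's evaluation code: the class name `ConfusionCounts`, the function `classification_metrics`, and the counts themselves are hypothetical and chosen only for illustration. They are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of report-level predictions against manual annotation (hypothetical)."""
    tp: int  # bone metastases present, predicted present
    fp: int  # bone metastases absent, predicted present
    fn: int  # bone metastases present, predicted absent
    tn: int  # bone metastases absent, predicted absent


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision (PPV), and recall (sensitivity) as fractions."""
    total = c.tp + c.fp + c.fn + c.tn
    if total == 0:
        raise ValueError("no reports to evaluate")
    predicted_positive = c.tp + c.fp
    actual_positive = c.tp + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        # Precision is undefined when the model never predicts the positive class.
        "precision": c.tp / predicted_positive if predicted_positive else float("nan"),
        # Recall is undefined when no report is truly positive.
        "recall": c.tp / actual_positive if actual_positive else float("nan"),
    }


# Hypothetical counts for illustration only; NOT taken from the study.
counts = ConfusionCounts(tp=90, fp=10, fn=15, tn=885)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these illustrative counts, accuracy exceeds both precision and recall because negative reports dominate the sample. The same kind of class imbalance would be one plausible explanation for a reported accuracy that is higher than the reported precision and recall.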