Background/Objectives: Accurate delineation of primary tumors (GTVp) and metastatic lymph nodes (GTVn) in head and neck (HN) cancers is essential for effective radiation treatment planning, yet it remains a challenging and laborious task. This study aims to develop a deep-learning-based auto-segmentation (DLAS) model that is trained on external datasets and uses clinical diagnosis reports to eliminate false positives.

Methods: The DLAS model was trained on a multi-institutional public dataset of 882 cases. Forty-four institutional cases were randomly selected as the external testing dataset. DLAS-generated GTVp and GTVn contours were validated against clinical diagnosis reports to identify false-positive and false-negative segmentation errors using two large language models, ChatGPT-4 and Llama-3. False positives were ruled out by matching the centroids of AI-generated contours with the slice locations or anatomical regions described in the reports. Performance was evaluated using the Dice similarity coefficient (DSC), the 95th percentile Hausdorff distance (HD95), and tumor detection precision.

Results: ChatGPT-4 outperformed Llama-3 in accurately extracting tumor locations from the diagnostic reports. False-positive contours were identified in 15 of the 44 cases. After the ruling-out process, the mean DSC of the DLAS contours increased from 0.68 to 0.75 for GTVp and from 0.69 to 0.75 for GTVn. Notably, the average HD95 for GTVn decreased from 18.81 mm to 5.2 mm. After ruling out, the model achieved 100% precision for both GTVp and GTVn when compared with physician-determined contours.

Conclusions: The false-positive ruling-out approach based on diagnostic reports effectively enhances the precision of DLAS in the HN region. The model accurately identifies the tumor location and detects all false-negative errors.
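The centroid-matching step described in the Methods can be illustrated with a minimal sketch. It is not the study's implementation. It assumes that each candidate contour is already a separately labeled region in a 3D array indexed as (slice, row, column), and that the axial slice ranges mentioned in the report have already been extracted (for example by a language model, which is not shown here). Matching against anatomical regions, which the study also uses, is omitted. The function names `dice`, `centroid`, and `rule_out_false_positives`, and the `tol` parameter, are hypothetical.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom


def centroid(mask):
    """Mean voxel coordinate (slice, row, column) of a non-empty binary mask."""
    return np.argwhere(mask).mean(axis=0)


def rule_out_false_positives(labels, reported_slices, tol=0):
    """Keep only labeled contours whose centroid slice matches the report.

    labels: integer array (slice, row, column); 0 is background and each
        positive integer marks one candidate contour (assumed input format).
    reported_slices: list of inclusive (low, high) axial slice ranges taken
        from the diagnosis report (assumed to be extracted beforehand).
    tol: hypothetical slack, in slices, added on both sides of each range.
    """
    kept = np.zeros_like(labels)
    for lab in np.unique(labels):
        if lab == 0:
            continue
        region = labels == lab
        z = centroid(region)[0]
        if any(lo - tol <= z <= hi + tol for lo, hi in reported_slices):
            kept[region] = lab
    return kept


if __name__ == "__main__":
    # Toy volume: contour 1 lies in the reported range, contour 2 does not.
    labels = np.zeros((10, 8, 8), dtype=int)
    labels[2:4, 2:5, 2:5] = 1
    labels[7:9, 1:3, 1:3] = 2
    truth = labels == 1

    cleaned = rule_out_false_positives(labels, reported_slices=[(2, 3)])
    print("DSC before:", dice(labels > 0, truth))
    print("DSC after: ", dice(cleaned > 0, truth))
```

In this toy case, removing the contour whose centroid falls outside the reported slice range raises the DSC against the reference mask, which mirrors the direction of the improvement reported in the Results.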