Background/Objectives: This study investigated the accuracy of Tumor, Node, Metastasis (TNM) classification from radiology reports using GPT3.5-turbo (GPT3.5), and the utility of multilingual large language models (LLMs) for reports in both Japanese and English. Methods: Using GPT3.5, we developed a system that automatically generates TNM classifications from chest computed tomography reports for lung cancer, and we evaluated its performance. We used a generalized linear mixed model to statistically analyze the impact of providing full or partial TNM definitions in both languages. Results: The highest accuracy was attained with full TNM definitions and radiology reports in English (M = 94%, N = 80%, T = 47%, and TNM combined = 36%). Providing the definition for each of the T, N, and M factors significantly improved the accuracy of the corresponding factor (T: odds ratio [OR] = 2.35, p < 0.001; N: OR = 1.94, p < 0.01; M: OR = 2.50, p < 0.001). Japanese reports showed lower N and M accuracies (N accuracy: OR = 0.74; M accuracy: OR = 0.21). Conclusions: This study underscores the potential of multilingual LLMs for automatic TNM classification of radiology reports. Even without additional model training, providing TNM definitions improved performance, indicating LLMs’ relevance in radiology contexts.
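The Results report one accuracy per factor (T, N, M) plus a "TNM combined" accuracy. The abstract does not define how the combined figure is scored; the minimal sketch below assumes that a report counts as correct for the combined figure only when all three factors match the reference staging. The function name `score_tnm`, the dictionary layout, and the exact-string comparison are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical record layout: one dict per report, e.g. {"T": "T2a", "N": "N1", "M": "M0"}.
Staging = Dict[str, str]

FACTORS = ("T", "N", "M")


def score_tnm(predicted: List[Staging], reference: List[Staging]) -> Dict[str, float]:
    """Return per-factor accuracy and combined accuracy as fractions in [0, 1].

    Assumption: the combined ("TNM") score counts a report as correct only
    when the T, N, and M factors all match the reference exactly.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one report is required")

    n_reports = len(reference)
    correct = {factor: 0 for factor in FACTORS}
    all_correct = 0

    for pred, ref in zip(predicted, reference):
        # A missing factor in the prediction is scored as incorrect.
        hits = {factor: pred.get(factor) == ref[factor] for factor in FACTORS}
        for factor in FACTORS:
            correct[factor] += hits[factor]
        all_correct += all(hits.values())

    result = {factor: correct[factor] / n_reports for factor in FACTORS}
    result["TNM"] = all_correct / n_reports
    return result
```

Under this scoring rule the combined accuracy can never exceed the lowest per-factor accuracy, which is consistent with the reported combined figure (36%) falling below the T accuracy (47%).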