**Background/Objectives** Recent advances in large language models, such as ChatGPT-4o, have created new opportunities for analyzing complex multi-modal data, including medical images. This study assesses the potential of ChatGPT-4o to distinguish benign from malignant thyroid nodules using multi-modality ultrasound imaging: grayscale ultrasound, color Doppler ultrasound (CDUS), and shear wave elastography (SWE). **Materials and Methods** Patients who underwent thyroid nodule ultrasound examinations and had confirmed pathological diagnoses were included. ChatGPT-4o analyzed the multi-modality ultrasound data using two approaches: (1) a dual-modality strategy that used grayscale ultrasound and CDUS, and (2) a triple-modality strategy that combined grayscale ultrasound, CDUS, and SWE. Diagnostic performance was compared against pathological findings using receiver operating characteristic (ROC) curve analysis, and agreement with pathology was evaluated using Cohen's Kappa analysis. **Results** A total of 106 thyroid nodules were evaluated; 65.1% were benign and 34.9% were malignant. With the dual-modality approach, ChatGPT-4o achieved an area under the ROC curve (AUC) of 66.3%, moderate agreement with pathology results (Kappa = 0.298), a sensitivity of 70.3%, a specificity of 62.3%, and an accuracy of 65.1%. In contrast, the triple-modality approach showed higher specificity (97.1%) but much lower sensitivity (18.9%), with an accuracy of 69.8%, lower overall agreement (Kappa = 0.194), and an AUC of 58.0%. **Conclusions** ChatGPT-4o shows some potential for classifying thyroid nodules using multi-modality ultrasound imaging. However, the dual-modality approach unexpectedly outperformed the triple-modality approach in AUC, sensitivity, and agreement with pathology. This suggests that ChatGPT-4o may have difficulty integrating and prioritizing different data modalities, particularly when they provide conflicting information, which could reduce its diagnostic effectiveness.
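The reported metrics can be cross-checked with a short calculation. The sketch below is illustrative only: the abstract does not report confusion-matrix counts, so the counts used here (37 malignant and 69 benign nodules, and the per-strategy true/false positives and negatives) are assumptions back-calculated from the published percentages. The class name `Confusion` and the variables `DUAL` and `TRIPLE` are hypothetical. The AUC is computed as the mean of sensitivity and specificity, which is what the area under the ROC curve reduces to for a single binary decision; whether the study computed it this way is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion matrix with malignant as the positive class."""
    tp: int  # malignant nodules classified as malignant
    fn: int  # malignant nodules classified as benign
    tn: int  # benign nodules classified as benign
    fp: int  # benign nodules classified as malignant

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def kappa(self) -> float:
        """Cohen's Kappa: observed agreement corrected for chance agreement."""
        n = self.total
        p_observed = self.accuracy()
        predicted_pos = self.tp + self.fp
        predicted_neg = self.tn + self.fn
        actual_pos = self.tp + self.fn
        actual_neg = self.tn + self.fp
        p_chance = (predicted_pos * actual_pos + predicted_neg * actual_neg) / n**2
        return (p_observed - p_chance) / (1 - p_chance)

    def single_point_auc(self) -> float:
        """AUC of a ROC curve with one operating point (binary output)."""
        return (self.sensitivity() + self.specificity()) / 2


# ASSUMED counts, inferred from the reported percentages (not given in the source).
DUAL = Confusion(tp=26, fn=11, tn=43, fp=26)
TRIPLE = Confusion(tp=7, fn=30, tn=67, fp=2)

for name, cm in (("dual", DUAL), ("triple", TRIPLE)):
    print(
        f"{name}: sens={cm.sensitivity():.1%} spec={cm.specificity():.1%} "
        f"acc={cm.accuracy():.1%} kappa={cm.kappa():.3f} "
        f"auc={cm.single_point_auc():.1%}"
    )
```

Under these assumed counts, the recomputed sensitivity, specificity, accuracy, Kappa, and AUC match the reported values for both strategies to the stated precision. The calculation also makes the trade-off concrete: the triple-modality strategy labels very few nodules as malignant, which raises specificity and accuracy slightly while lowering sensitivity, Kappa, and AUC.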