Background/Objectives: Thyroid nodules are a very common finding; most are benign, but some are malignant, so accurate diagnosis is required. Ultrasound and fine-needle biopsy are currently the most widely used and reliable diagnostic methods, but they are sometimes limited in distinguishing benign from malignant nodules; ultrasound in particular depends on the operator’s experience. Radiomics (the extraction of quantitative features from medical images) and machine learning offer promising avenues to improve diagnosis. The aim of this work was to develop a machine learning model based on thyroid ultrasound images to classify nodules as benign or malignant. Methods: Ultrasound images from 142 subjects were collected. Of these subjects, 40 patients (28.2%) belonged to the “malignant” class and 102 patients (71.8%) to the “benign” class, according to histological diagnosis from fine-needle aspiration. This image set was used for the training, cross-validation and internal testing of three different machine learning models. A robust radiomic approach was applied, under the hypothesis that radiomic features could capture the disease heterogeneity between the two groups. Three models, each consisting of four ensembles of one type of machine learning classifier (random forests, support vector machines or k-nearest neighbor classifiers), were developed for this binary classification task. 
The best-performing model was then externally tested on a cohort of 21 new patients. Results: In the internal test cohort, the best model (ensemble of random forests) showed a receiver operating characteristic area under the curve (ROC-AUC) (%) of 85 (majority vote), 83.7 ** (mean) [80.2–87.2], accuracy (%) of 83, 81.2 ** [77.1–85.2], sensitivity (%) of 70, 67.5 ** [64.3–70.7], specificity (%) of 88, 86.5 ** [82–91], positive predictive value (PPV) (%) of 70, 66.5 ** [57.9–75.1] and negative predictive value (NPV) (%) of 88, 87.1 ** [85.5–88.8] (* p < 0.05, ** p < 0.005). In the external testing cohort, it achieved an accuracy of 90.5%, a sensitivity of 100%, a specificity of 86.7%, a PPV of 75% and an NPV of 100%. Conclusions: The model consisting of four ensembles of random forest classifiers identified all the malignant nodules and the large majority of the benign nodules in the external testing cohort.
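As a consistency check on the external-cohort figures, the short sketch below recomputes the five reported metrics from a confusion matrix. The counts used (6 true positives, 2 false positives, 13 true negatives, 0 false negatives) are not stated in the abstract; they are inferred here as the counts that appear to be the only ones compatible with 21 patients and the reported percentages. The names `ConfusionCounts` and `diagnostic_metrics` are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix, with 'malignant' as the positive class."""
    tp: int  # malignant nodules predicted malignant
    fp: int  # benign nodules predicted malignant
    tn: int  # benign nodules predicted benign
    fn: int  # malignant nodules predicted benign


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard diagnostic metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Assumed counts, inferred from the reported percentages (not given in the text):
# 21 patients = 6 malignant + 15 benign.
external = ConfusionCounts(tp=6, fp=2, tn=13, fn=0)
metrics = diagnostic_metrics(external)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these assumed counts, the external cohort would contain 6 malignant and 15 benign nodules, with two benign nodules misclassified as malignant. With only 6 malignant cases, the 100% sensitivity rests on a small sample.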