Background: Glioblastoma (GBM) is one of the most common malignant primary brain tumors, accounting for 60–70% of all gliomas. Conventional diagnosis and post-operative treatment planning for glioblastoma are based mainly on feature-based qualitative analysis of hematoxylin and eosin (H&E)-stained histopathological slides by an experienced medical technologist and a pathologist. The recent development of digital whole slide scanners has made AI-based histopathological image analysis feasible; such analysis can support cancer diagnosis through accurate cell-type counting and/or quantitative analysis. However, the technology available for digital slide image analysis remains very limited. This study aimed to build an image feature-based computer model that uses histopathology whole slide images to differentiate patients with GBM from healthy controls (HC). Method: Two independent cohorts of patients were used. The first cohort comprised 262 GBM patients from The Cancer Genome Atlas Glioblastoma Multiforme (TCGA-GBM) collection in The Cancer Imaging Archive (TCIA) database. The second cohort comprised 60 GBM patients from a local hospital. In addition, a group of 60 participants with no known brain disease was included. All H&E slides were collected. Thirty-three image features (22 gray-level co-occurrence matrix [GLCM] features and 11 gray-level run-length matrix [GLRLM] features) were extracted from the tumor region delineated by a medical technologist on the H&E slides. Five machine-learning algorithms, namely decision tree (DT), extreme boost (EB), support vector machine (SVM), random forest (RF), and linear model (LM), were used to build five models from the image features extracted from the first cohort. The models were then deployed on the second cohort (local patients) for testing, using the selected key image features, to identify and verify the key image features for GBM diagnosis.
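To make the two texture-feature families named in the Method concrete, the following is a minimal pure-Python sketch of how a GLCM and a horizontal GLRLM can be computed from a small quantized grayscale patch, together with a few commonly used descriptors (contrast, energy, homogeneity, short run emphasis). The function names, the 4x4 toy patch, the single pixel offset, and the choice of descriptors are illustrative assumptions; the abstract does not state which software, offsets, gray-level quantization, or exact 33 features the study used.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix as {(i, j): probability}.

    `image` is a list of rows of integer gray levels; (dx, dy) is the pixel
    offset between the two pixels of each counted pair (an assumed default).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                if symmetric:
                    counts[(j, i)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_features(p):
    """Three standard GLCM descriptors computed from a normalized GLCM."""
    contrast = sum(v * (i - j) ** 2 for (i, j), v in p.items())
    energy = sum(v * v for v in p.values())  # angular second moment
    homogeneity = sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


def horizontal_runs(image):
    """Run-length matrix at 0 degrees as {(gray_level, run_length): count}."""
    runs = Counter()
    for row in image:
        start = 0
        for c in range(1, len(row) + 1):
            # Close the current run at the end of the row or on a level change.
            if c == len(row) or row[c] != row[start]:
                runs[(row[start], c - start)] += 1
                start = c
    return runs


def short_run_emphasis(runs):
    """GLRLM descriptor that weights short runs more heavily."""
    total = sum(runs.values())
    return sum(n / length ** 2 for (_, length), n in runs.items()) / total


# Toy 4x4 patch with four gray levels (not data from the study).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

p = glcm(patch)
features = glcm_features(p)
runs = horizontal_runs(patch)
features["short_run_emphasis"] = short_run_emphasis(runs)
print(features)
```

In practice, such descriptors would be computed over the delineated tumor region for several offsets and directions and then passed to the classifiers as a feature vector.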
Results: All five machine-learning algorithms performed excellently in GBM diagnosis, achieving an overall accuracy of 100% in the training and validation stages. A total of 12 GLCM and 3 GLRLM image features were identified that differed significantly between normal and GBM images. However, when the models were deployed on the independent local cohort, only the SVM model maintained its excellent performance, with an accuracy of 93.5%, a sensitivity of 86.95%, and a specificity of 99.73%. Conclusion: We identified 12 GLCM and 3 GLRLM image features that can aid GBM diagnosis. Among the five models built, the SVM model demonstrated excellent accuracy with very good sensitivity and specificity, and could potentially be used for GBM diagnosis in future clinical applications.
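For reference, accuracy, sensitivity, and specificity of the kind reported in the Results follow the standard confusion-matrix definitions. The sketch below shows that arithmetic; the function name and the label vectors are hypothetical and purely illustrative, and they do not reproduce the study's data or its reported figures.

```python
def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate (GBM detected)
        "specificity": tn / (tn + fp),  # true-negative rate (HC recognized)
    }


# Made-up labels: 1 = GBM, 0 = healthy control (not the study's results).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] * 1 + [0] * 10

metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters here because a model can score well on accuracy while still missing a meaningful share of GBM cases.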