Background/Objectives: High-grade serous carcinomas (HGSCs) are highly heterogeneous tumors, both between patients and within a single tumor. Differences in molecular mechanisms account for much of this heterogeneity. The Cancer Genome Atlas Consortium previously described four molecular subtypes: differentiated, immunoreactive, mesenchymal, and proliferative. These subtypes may differ in progression, relapse-free survival, overall survival, and response to therapy. Accurate subtype determination is therefore needed both for diagnosis and for the future development of targeted therapies in personalized medicine. Methods: We analyzed gene expression data from bulk RNA-seq, scRNA-seq, and spatial transcriptomics across six cohorts (535 samples in total, including 60 single-cell samples). Differential expression analysis was performed with the edgeR package. The KEGG database and the GSVA package were used for pathway enrichment analysis. A deep neural network built with the keras and tensorflow libraries served as the predictive model. Results: We identified 357 differentially expressed genes across the four subtypes: 96 for differentiated, 33 for immunoreactive, 91 for mesenchymal, and 137 for proliferative. Based on these genes, we created OVsignGenes, a neural network model that is robust to platform effects (test dataset AUC = 0.969). We then applied the model to data from five additional cohorts, including scRNA-seq and spatial transcriptomics. Conclusions: Because the differentiated subtype lies at the intersection of the other three subtypes in the PCA projection and lacks a unique profile of differentially expressed genes or enriched pathways, it can be considered an initiating tumor subtype that subsequently develops into one of the other three subtypes.
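The abstract reports a single test-set AUC for a four-class model but does not state how the per-subtype results were aggregated. One common convention is the macro-averaged one-vs-rest AUC, and the sketch below assumes that convention purely for illustration. The function names `binary_auc` and `macro_ovr_auc` are hypothetical, and the code uses no data from the study.

```python
from typing import List, Sequence

# Subtype names as listed in the abstract; the column order is an assumption.
SUBTYPES: List[str] = [
    "differentiated",
    "immunoreactive",
    "mesenchymal",
    "proliferative",
]


def binary_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as the probability that a positive sample outscores a negative one.

    This is the Mann-Whitney formulation; tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: Sequence[Sequence[float]], truth: Sequence[str]) -> float:
    """Unweighted mean of the one-vs-rest AUCs over the four subtypes.

    probs[i][k] is the predicted probability that sample i belongs to
    SUBTYPES[k]; truth[i] is the reference subtype label of sample i.
    """
    aucs = []
    for k, name in enumerate(SUBTYPES):
        scores = [row[k] for row in probs]
        labels = [t == name for t in truth]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)
```

In this sketch each subtype is scored against the other three combined, so a subtype with few samples weighs as much as a large one. A prevalence-weighted average or a one-vs-one scheme would be equally plausible readings of the reported figure.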