Background/Objectives: Ovarian cancer is a heterogeneous malignancy whose molecular subtypes strongly influence prognosis and therapy. High-dimensional mRNA data can capture this biological diversity, but its complexity and noise limit robust subtype characterization. Furthermore, current classification approaches often fail to reflect subtype-specific transcriptional programs, underscoring the need for computational strategies that reduce dimensionality and identify discriminative molecular features.
Methods: We designed a multi-stage feature selection and network analysis framework tailored to high-dimensional transcriptomic data. Starting with ~65,000 mRNA features, we applied unsupervised variance-based filtering and correlation pruning to eliminate low-information genes and reduce redundancy. Supervised Select-K Best filtering then further refined the feature space. To enhance robustness, we implemented a hybrid selection strategy that combines recursive feature elimination (RFE) using random forests with LASSO regression to identify discriminative mRNA features. Finally, the selected features were used to construct a gene co-expression similarity network.
Results: The pipeline reduced approximately 65,000 gene features to a subset of 83 discriminative transcripts, which were then used for network construction to reveal subtype-specific biology. The analysis identified four distinct groups. One group exhibited classical high-grade serous features defined by TP53 mutations and homologous recombination deficiency, while another was enriched for PI3K/AKT and ARID1A-associated signaling, consistent with clear cell and endometrioid-like biology. A third group displayed drug resistance-associated transcriptional programs with receptor tyrosine kinase activation, and the fourth showed a hybrid profile bridging serous and endometrioid expression modules.
Conclusions: This pilot study shows that combining unsupervised and supervised feature selection with network modeling enables robust stratification of ovarian cancer subtypes.
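The staged selection described under Methods can be sketched as follows. This is a minimal illustration with scikit-learn on synthetic data, not the study's implementation. The abstract does not report any thresholds, so every value here is an assumption (`var_threshold`, `corr_threshold`, `k_best`, `n_rfe`, `tau`), as is the ANOVA F-test scoring function. L1-penalised logistic regression stands in for LASSO in a classification setting, and taking the union of the RFE and L1 selections is one possible reading of "hybrid". The function names `prune_correlated`, `select_features`, and `coexpression_network` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def prune_correlated(X, threshold=0.9):
    """Greedily keep a column only if |r| with every kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return np.array(kept)


def select_features(X, y, var_threshold=0.1, corr_threshold=0.9,
                    k_best=50, n_rfe=20, seed=0):
    """Return indices (into the columns of X) of the selected features."""
    idx = np.arange(X.shape[1])

    # Stage 1 (unsupervised): drop near-constant features.
    idx = idx[VarianceThreshold(threshold=var_threshold).fit(X).get_support()]

    # Stage 2 (unsupervised): drop features redundant with an already kept one.
    idx = idx[prune_correlated(X[:, idx], corr_threshold)]

    # Stage 3 (supervised): univariate ranking (ANOVA F-test is an assumption).
    k = min(k_best, idx.size)
    idx = idx[SelectKBest(f_classif, k=k).fit(X[:, idx], y).get_support()]

    # Stage 4 (hybrid): RFE wrapped around a random forest, plus an L1 model.
    Xs = StandardScaler().fit_transform(X[:, idx])
    rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=seed),
              n_features_to_select=min(n_rfe, idx.size), step=0.2).fit(Xs, y)
    lasso = LogisticRegression(penalty="l1", solver="saga", C=0.5,
                               max_iter=5000, random_state=seed).fit(Xs, y)
    lasso_mask = np.any(lasso.coef_ != 0, axis=0)

    # Combining by union is an assumption; intersection would be stricter.
    return idx[rfe.support_ | lasso_mask]


def coexpression_network(X, selected, tau=0.5):
    """Absolute Pearson similarity between selected genes, thresholded at tau."""
    sim = np.atleast_2d(np.abs(np.corrcoef(X[:, selected], rowvar=False)))
    adj = (sim >= tau).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return sim, adj


# Synthetic stand-in for an expression matrix (samples x genes), four classes.
rng = np.random.default_rng(0)
n, p = 120, 300
y = np.repeat(np.arange(4), n // 4)
X = rng.normal(size=(n, p))
X[:, :10] += 1.5 * y[:, None]                    # informative genes
X[:, 10] = X[:, 0] + 0.01 * rng.normal(size=n)   # redundant copy of gene 0
X[:, -50:] *= 0.01                               # near-constant genes

selected = select_features(X, y)
sim, adj = coexpression_network(X, selected)
print(f"{selected.size} features selected, {adj.sum() // 2} network edges")
```

On real data, the selection stages would normally be fitted inside cross-validation folds so that the supervised steps do not see held-out samples. Groups could then be derived from the network with a community detection or clustering method, which the abstract does not name.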