Background: Ovarian cancer has a high mortality rate, primarily because it is usually diagnosed at a late stage. Current biomarkers, such as CA125, have shown limited efficacy for early detection. High-dimensional proteomics offers a more comprehensive view of systemic biology, but analyzing data in which the number of features far exceeds the number of samples presents a significant computational challenge.

Methods: This study used a nested case–control cohort of longitudinal pre-diagnostic serum samples from the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS), profiled for eight candidate ovarian cancer biomarkers (CA125, HE4, PEBP4, CHI3L1, FSTL1, AGR2, SLPI, DNAH17) and 92 additional cancer-associated proteins from the Olink Oncology II panel. We employed a Synolitic Graph Neural Network framework, which uses a synolitic network approach to transform high-dimensional multi-protein data into sample-specific, interconnected graphs. These graphs, which encode the relational patterns between proteins, were then used to train Graph Neural Network (GNN) models for classification. The network approach was evaluated alongside conventional machine learning approaches using 5-fold cross-validation on samples collected within one year of diagnosis, and on a separate holdout set of samples collected one to two years before diagnosis.

Results: In samples collected within one year of ovarian cancer diagnosis, conventional machine learning models (including XGBoost, random forests, and logistic regression) achieved the highest discriminative performance, with XGBoost reaching an ROC-AUC of 92%. Graph Convolutional Networks (GCNs) achieved moderate performance in this interval (ROC-AUC ~71%), with balanced sensitivity and specificity comparable to mid-performing conventional models.
In the 1–2 year early-detection window, conventional model performance declined sharply (XGBoost ROC-AUC 46%), whereas the GCN maintained robust discriminative ability (ROC-AUC ~74%) with relatively balanced sensitivity and specificity. These findings indicate that while conventional approaches excel at detecting late pre-diagnostic signals, GNNs are more stable and effective at capturing subtle early molecular changes.

Conclusions: The synolitic GNN framework demonstrates robust performance in early pre-diagnostic detection of ovarian cancer, maintaining accuracy where conventional methods decline. These results highlight the potential of network-informed machine learning to identify subtle proteomic patterns and pathway-level dysregulation prior to clinical diagnosis. This proof-of-concept study supports further development of GNN approaches for early ovarian cancer detection and warrants validation in larger, independent cohorts.
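
The abstract does not spell out how the sample-specific graphs are built. As a rough illustration of the general idea only, the sketch below assumes that each pair of proteins forms an edge and that a sample's weight on that edge is the case-likeness score from a small classifier fitted on that pair alone. The nearest-centroid scorer is a deliberately simple stand-in, not the classifier used in the study, and all function names and toy values are hypothetical.

```python
import math
from itertools import combinations


def fit_pair_scorer(points, labels):
    """Toy two-feature scorer (assumed stand-in for a pairwise classifier).

    Projects a point onto the line joining the control and case centroids
    and squashes the signed distance from their midpoint to a value
    between 0 and 1. Assumes both classes (0 = control, 1 = case) occur.
    """
    def centroid(cls):
        pts = [p for p, y in zip(points, labels) if y == cls]
        return (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))

    c0, c1 = centroid(0), centroid(1)
    w = (c1[0] - c0[0], c1[1] - c0[1])              # direction: controls -> cases
    mid = ((c0[0] + c1[0]) / 2, (c0[1] + c1[1]) / 2)
    norm = math.hypot(w[0], w[1]) or 1.0            # avoid dividing by zero

    def score(p):
        d = ((p[0] - mid[0]) * w[0] + (p[1] - mid[1]) * w[1]) / norm
        d = max(-50.0, min(50.0, d))                # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-d))

    return score


def build_sample_graphs(train_X, train_y, samples):
    """Return one {(i, j): weight} edge dictionary per sample."""
    n_features = len(train_X[0])
    scorers = {}
    for i, j in combinations(range(n_features), 2):
        pair_points = [(row[i], row[j]) for row in train_X]
        scorers[(i, j)] = fit_pair_scorer(pair_points, train_y)
    return [
        {(i, j): scorer((s[i], s[j])) for (i, j), scorer in scorers.items()}
        for s in samples
    ]


# Toy data: 4 samples x 3 "proteins"; the first two rows are controls.
toy_X = [[0.0, 0.0, 1.0], [0.2, 0.1, 0.9], [1.0, 1.0, 1.1], [0.9, 1.2, 1.0]]
toy_y = [0, 0, 1, 1]
toy_graphs = build_sample_graphs(toy_X, toy_y, toy_X)
```

Each resulting dictionary can be read as a weighted adjacency structure for one sample, which is the kind of input a graph classifier such as a GCN consumes. In a real analysis the pairwise scorers would be fitted on training folds only, so that no holdout information leaks into the edge weights.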
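
The results are reported as ROC-AUC, sensitivity, and specificity. For readers less familiar with these metrics, the following is a minimal sketch of how they are commonly computed from classifier scores. ROC-AUC is written here in its rank form: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half. Under this reading 50% is chance level, so a value such as 46% means the ranking is no better than chance. The function names, the 0.5 threshold, and the toy scores are illustrative and not taken from the study.

```python
def roc_auc(scores, labels):
    """ROC-AUC as P(case score > control score), ties counted as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called cases."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: three cases (label 1) and three controls (label 0).
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)                 # 8 of 9 pairs ranked correctly
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of samples, which is fine for small cohorts. A rank-based implementation would be preferable for large ones.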