Despite significant advances in tumor biology and clinical therapeutics, metastasis remains the primary cause of cancer-related deaths. Although RNA-seq has been used extensively to characterize metastatic cancer, acquiring adequate transcriptomic data from metastatic samples remains difficult. To address this shortage, we propose MetGen, a deep learning tool based on generative contrastive learning. MetGen generates synthetic metastatic cancer expression profiles from primary cancer and normal tissue expression data. Our results demonstrate that MetGen generates samples comparable to real metastatic cancer samples: cancer and tissue classification of the generated samples reaches accuracies of 99.8 ± 0.2% and 95.0 ± 2.3%, respectively. A benchmark analysis suggests that the proposed model outperforms traditional generative models such as the variational autoencoder. In metastatic subtype classification, our generated samples show 97.6% predictive power relative to true metastatic samples. Additionally, we demonstrate MetGen’s interpretability using metastatic prostate cancer and metastatic breast cancer. MetGen has learned highly relevant cancer, tissue, and tumor microenvironment signatures, such as immune responses and the metastasis process, which can potentially foster a more comprehensive understanding of metastatic cancer biology. MetGen represents a significant step forward in the study of metastatic cancer biology by providing a generative model that can identify candidate therapeutic targets for the treatment of metastatic cancer.