crossNN is an explainable framework for cross-platform DNA methylation-based classification of tumors
Original publication date: 2025-06-06
Original link:
Abstract:
DNA methylation-based classification of (brain) tumors has emerged as a powerful and indispensable diagnostic technique. Initial implementations used methylation microarrays for data generation, while most current classifiers rely on a fixed methylation feature space. This makes them incompatible with other platforms, especially different flavors of DNA sequencing. Here, we describe crossNN, a neural network-based machine learning framework that can accurately classify tumors using sparse methylomes obtained on different platforms and with different epigenome coverage and sequencing depth. It outperforms other deep and conventional machine learning models regarding accuracy and computational requirements while still being explainable. We use crossNN to train a pan-cancer classifier that can discriminate more than 170 tumor types across all organ sites. Validation in more than 5,000 tumors profiled on different platforms, including nanopore and targeted bisulfite sequencing, demonstrates its robustness and scalability with 99.1% and 97.8% precision for the brain tumor and pan-cancer models, respectively.
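The abstract does not spell out how a classifier can cope with sparse methylomes from different platforms, so the following is only a toy sketch of one way the idea could work, not the authors' implementation. It assumes (none of this is stated above) that beta values are binarized at a threshold of 0.6, that uncovered CpG sites are encoded as 0, and that a bias-free linear softmax model is trained while sites are randomly masked on every pass. All function names and the synthetic data are hypothetical.

```python
import math
import random


def encode(betas, threshold=0.6):
    """Binarize beta values: +1 methylated, -1 unmethylated, 0 if the site is not covered (None)."""
    return [0 if b is None else (1 if b > threshold else -1) for b in betas]


def mask(x, keep_fraction, rng):
    """Randomly zero features to imitate a sparse, low-coverage methylome."""
    return [v if rng.random() < keep_fraction else 0 for v in x]


def predict_proba(W, x):
    """Softmax over per-class linear scores; no bias, so uncovered (0) sites contribute nothing."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(X, y, n_classes, epochs=20, lr=0.05, keep_fraction=0.2, seed=0):
    """Plain SGD on cross-entropy loss, re-masking every sample on every pass."""
    rng = random.Random(seed)
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            xm = mask(x, keep_fraction, rng)
            p = predict_proba(W, xm)
            for c in range(n_classes):
                grad = p[c] - (1.0 if c == label else 0.0)
                for j, v in enumerate(xm):
                    if v:  # masked sites have zero gradient
                        W[c][j] -= lr * grad * v
    return W


def make_toy_data(n_per_class, n_features, n_classes, rng, flip=0.05):
    """Synthetic tumors: class c is methylated where j % n_classes == c, with random flips."""
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            betas = []
            for j in range(n_features):
                methylated = (j % n_classes == c) != (rng.random() < flip)
                betas.append(0.9 if methylated else 0.1)
            X.append(encode(betas))
            y.append(c)
    return X, y


def sparse_accuracy(W, X, y, keep_fraction, rng):
    """Fraction of samples classified correctly after random subsampling of sites."""
    hits = 0
    for x, label in zip(X, y):
        p = predict_proba(W, mask(x, keep_fraction, rng))
        hits += p.index(max(p)) == label
    return hits / len(X)


if __name__ == "__main__":
    rng = random.Random(42)
    X_train, y_train = make_toy_data(30, 200, 3, rng)
    X_test, y_test = make_toy_data(20, 200, 3, rng)
    W = train(X_train, y_train, n_classes=3)
    print("accuracy at 10% site coverage:", sparse_accuracy(W, X_test, y_test, 0.1, rng))
```

Because the model is linear and has no bias term, each class score is a sum of per-site weights, so the contribution of every covered site to a prediction can be read off directly. That is one plausible reading of "explainable" here, offered as an interpretation rather than a description of the published method.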
……