For humans, the parallel processing capability of visual recognition allows faster comprehension of complex scenes and patterns. This matters especially for clinicians interpreting big data, for whom visualization tools play a vital role in turning raw data into clinical decisions by managing its inherent complexity and allowing patterns to be monitored interactively in real time. The size and data variety of The Cancer Genome Atlas (TCGA) database make it difficult for clinicians and biologists to use this valuable resource effectively. We re-analyzed five molecular data types (mutation, transcriptome profile, copy number variation, miRNA, and methylation data) from ~11,000 cancer patients across all 33 cancer types and integrated existing TCGA patient cohorts from the literature into a free and efficient web application: TCGAnalyzeR. TCGAnalyzeR provides an integrative visualization of pre-analyzed TCGA data through several novel modules: (i) simple nucleotide variations with driver prediction; (ii) recurrent copy number alterations; (iii) differential expression in tumor versus normal, with pathway and survival analysis; (iv) TCGA clinical data, including metastasis and survival analysis; (v) external subcohorts from the literature and from the curatedTCGAData and BiocOncoTK R packages; (vi) internal patient clusters determined using the iClusterPlus R package or signature-based expression analysis of the five molecular data types. TCGAnalyzeR integrates the multi-omics, pan-cancer TCGA data with ~120 subcohorts from the literature and provides clipboard panels, allowing users to create their own subcohorts, compare them against existing external subcohorts (MSI, Immune, PAM50, Triple Negative, IDH1, miRNA, metastasis, etc.) and our internal patient clusters, and visualize cohort-centric or gene-centric results interactively.
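To make the idea of comparing a user-defined subcohort against an external subcohort concrete, the following is a minimal sketch in Python. It is not TCGAnalyzeR's implementation: the function name `overlap_enrichment`, the patient identifiers, and the choice of a one-sided hypergeometric test are all assumptions made for illustration only.

```python
from math import comb


def overlap_enrichment(user_cohort, external_cohort, all_patients):
    """Count shared patients and test whether the overlap is larger than chance.

    Hypothetical helper: returns (overlap_size, p_value), where p_value is the
    one-sided hypergeometric probability of seeing at least this many shared
    patients if the user cohort were drawn at random from all_patients.
    """
    universe = set(all_patients)
    user = set(user_cohort) & universe          # ignore IDs outside the universe
    external = set(external_cohort) & universe
    n_total, n_user, n_ext = len(universe), len(user), len(external)
    k = len(user & external)

    # P(X >= k) for X ~ Hypergeometric(population=n_total,
    #                                  successes=n_ext, draws=n_user)
    denom = comb(n_total, n_user)
    tail = sum(
        comb(n_ext, i) * comb(n_total - n_ext, n_user - i)
        for i in range(k, min(n_user, n_ext) + 1)
    )
    return k, tail / denom


# Made-up patient IDs, for illustration only.
patients = [f"P{i}" for i in range(10)]
my_subcohort = ["P0", "P1", "P2", "P3"]        # e.g. a user-defined clipboard cohort
external_subcohort = ["P0", "P1", "P2", "P3"]  # e.g. a literature-derived subcohort

shared, p_value = overlap_enrichment(my_subcohort, external_subcohort, patients)
print(shared, p_value)
```

A small p-value here would suggest that the two patient groups overlap more than expected by chance; a real analysis would also need to account for multiple comparisons across the many available subcohorts.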