Cancer, in all its forms, is a major cause of death. Identifying the genomic basis of cancer requires the discovery of biomarkers. In this paper, genomic data of bladder cancer are examined for the purpose of biomarker discovery. Genomic biomarkers are indicators derived from the study of the genome, either at a very low level, based on the genome sequence itself, or at a more abstract level, such as measuring the level of gene expression across different disease groups. The latter approach is central to this work, since the available datasets consist of RNA sequencing data transformed to gene expression levels, together with data on a multitude of clinical indicators. On this basis, the experiments leading to biomarker discovery use several methods: statistical modeling via logistic regression with elastic-net regularization, clustering, survival analysis through Kaplan–Meier curves, and heatmaps. The experiments led to the discovery of two gene signatures capable of predicting therapy response and disease progression with considerable accuracy for bladder cancer patients. The signatures correlate well with clinical indicators such as Therapy Response and T-Stage at surgery, and with Disease Progression in a time-to-event manner.