Background: Previous studies have described sex-specific patient subtyping in glioblastoma. The cluster labels associated with these “legacy data” were used to train a predictive model capable of recapitulating this clustering in contemporary contexts. Methods: We used robust ensemble machine learning to train a model on gene microarray data that can make predictions across platforms, including RNA-seq and potentially scRNA-seq. Results: The engineered feature set was composed of many previously reported genes that are associated with patient prognosis. Interestingly, these well-known genes formed a predictive signature only for female patients, and applying the signature to male patients produced unexpected results. Conclusions: This work demonstrates how annotated “legacy data” can be used to build robust predictive models capable of multi-target predictions across multiple platforms.
Robust Cluster Prediction Across Data Types Validates Association of Sex and Therapy Response in GBM