Background and Objective: Prostate cancer remains one of the most prevalent and potentially lethal malignancies among men worldwide. Timely, accurate diagnosis and stratification of patients by disease severity are critical for personalized treatment and improved outcomes. Bioinformatics is one of the tools used for diagnosis. However, traditional biomarker discovery methods often lack transparency and interpretability, which makes it difficult for clinicians to trust the resulting biomarkers in a clinical setting.

Methods: This paper introduces a novel approach that leverages Explainable Machine Learning (XML) techniques to identify and prioritize biomarkers associated with different levels of prostate cancer severity. The proposed XML approach combines traditional machine learning (ML) algorithms with transparent models so that the importance of individual features in the bioinformatics analysis can be understood, supporting more informed clinical decisions. Several ML classifiers were implemented: Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), and Bagging (Bg). SHAP (SHapley Additive exPlanations) values were then computed to explain the models. In pre-processing, missing values were imputed, and SMOTE (Synthetic Minority Oversampling Technique) and the Tomek link method were applied to handle class imbalance. The ML models were evaluated with stratified k-fold cross-validation.

Results: This study used a novel tissue microarray dataset covering 102 individuals, comprising prostate cancer patients and healthy controls. The proposed model satisfactorily identifies genes as biomarkers, with the highest accuracy, 81.01%, obtained using RF. The top 10 potential biomarkers identified in this study are DEGS1, HPN, ERG, CFD, TMPRSS2, PDLIM5, XBP1, AJAP1, NPM1, and C7.

Conclusions: As XML continues to unravel the complexities within prostate cancer datasets, the identification of severity-specific biomarkers stands at the forefront of precision oncology. This integration paves the way for targeted interventions, improved patient outcomes, and more individualized care in the fight against prostate cancer.
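The Methods rely on SHAP values to rank features. As a minimal sketch of the idea behind them, the following pure-Python example computes exact Shapley values for a three-feature toy model. The scoring function `toy_model`, the feature names `gene_a`, `gene_b`, and `gene_c`, the input values, and the all-zero baseline are all hypothetical and are not taken from the study. The abstract does not specify which SHAP explainer or baseline was used, and practical SHAP libraries typically rely on faster model-specific or sampling-based algorithms rather than this exhaustive enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for very small illustrative models.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Input with only the coalition's features "switched on".
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                # Same input with feature i switched on as well.
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical feature names and scoring function (not from the study).
FEATURES = ["gene_a", "gene_b", "gene_c"]


def toy_model(v):
    # Two linear terms plus one interaction between gene_a and gene_c.
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[0] * v[2]


sample = [1.0, 2.0, 4.0]    # hypothetical expression values
baseline = [0.0, 0.0, 0.0]  # hypothetical reference sample

phi = shapley_values(toy_model, sample, baseline)

# Rank features by absolute contribution, as one would when
# shortlisting candidate biomarkers from SHAP values.
for name, value in sorted(zip(FEATURES, phi), key=lambda p: -abs(p[1])):
    print(f"{name}: {value:+.3f}")
```

In this toy case the attributions come out at approximately 3, -2, and 1 for `gene_a`, `gene_b`, and `gene_c`: each linear term is credited to its own feature, and the interaction term is split evenly between the two features involved. The values sum to the difference between the model output for the sample and for the baseline, which is the property that makes per-feature rankings comparable across samples.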


Original article link:

Identification of Potential Biomarkers in Prostate Cancer Microarray Gene Expression Leveraging Explainable Machine Learning Classifiers

……


Original publication date: 30 November 2025

DOI: 10.3390/cancers17233853

Type: Article

Open access: Yes
