Background: Accurate staging of multiple myeloma (MM) is essential for optimizing treatment strategies. Predicting the progression of asymptomatic patients (a condition also referred to as monoclonal gammopathy of undetermined significance, MGUS) to symptomatic MM remains a significant challenge because data are limited. This study aimed to develop machine learning models that improve MM staging accuracy and stratify asymptomatic patients by their risk of progression.
Methods: We developed machine learning models on gene expression microarray datasets, combined with various data transformations. For MM staging, models were trained on a single dataset and validated on five independent datasets, with performance evaluated by multiclass area under the curve (AUC). To predict progression in asymptomatic patients, we employed two approaches: (1) training models on a dataset of asymptomatic patients who either progressed to MM or remained stable, and (2) training models on multiple datasets combining asymptomatic and MM samples, then testing their ability to distinguish stable asymptomatic patients from asymptomatic patients who progressed. We performed feature selection and enrichment analyses to identify key signaling pathways underlying disease stages and progression.
Results: The MM staging models performed well: ElasticNet achieved multiclass AUC values of 0.9 consistently across datasets and transformations, indicating robust generalizability. For asymptomatic progression, both modeling approaches yielded similar results, with AUC values exceeding 0.8 across datasets and algorithms (ElasticNet, Boosting, and Support Vector Machines), underscoring their potential for identifying progression risk. Enrichment analyses identified key pathways, including PI3K-Akt, MAPK, Wnt, and mTOR, as central to MM pathogenesis.
Conclusions: To the best of our knowledge, this is the first study to utilize gene expression datasets for classifying patients across different stages of multiple myeloma and to integrate multiple myeloma with asymptomatic cases to predict disease progression, offering a novel methodology with potential clinical applications in patient monitoring and early intervention.
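The multiclass AUC used to evaluate the staging models can be illustrated with a minimal sketch. The abstract does not say which multiclass variant was used, so the macro-averaged one-vs-rest form below is an assumption, and the class labels (`A`, `B`, `C`), the scores, and the function names `binary_auc` and `macro_ovr_auc` are hypothetical and not taken from the study.

```python
from typing import Dict, List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[str], scores: Sequence[Dict[str, float]]) -> float:
    """Average the one-vs-rest AUC over every class present in y_true."""
    classes = sorted(set(y_true))
    aucs: List[float] = []
    for c in classes:
        # Treat class c as positive and all other classes as negative.
        labels = [1 if y == c else 0 for y in y_true]
        class_scores = [s[c] for s in scores]
        aucs.append(binary_auc(labels, class_scores))
    return sum(aucs) / len(aucs)


# Toy data: hypothetical stage labels and made-up per-class model scores.
y_true = ["A", "A", "B", "B", "C", "C"]
scores = [
    {"A": 0.8, "B": 0.1, "C": 0.1},
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.2, "B": 0.5, "C": 0.3},
    {"A": 0.3, "B": 0.3, "C": 0.4},
    {"A": 0.1, "B": 0.2, "C": 0.7},
    {"A": 0.1, "B": 0.4, "C": 0.5},
]
auc = macro_ovr_auc(y_true, scores)
print(f"macro one-vs-rest AUC: {auc:.4f}")
```

In this toy case classes `A` and `C` are perfectly separated and class `B` is not, so the macro average falls below 1. A weighted or one-vs-one average would be an equally plausible reading of "multiclass AUC" and can give a different value on imbalanced data.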