Occupation and Health (职业与健康), 2025, Vol. 41, Issue 15: 2098-2106.

• Original Article •

Screening of key genes in pulmonary tuberculosis combined with lung adenocarcinoma and construction and evaluation of a prognostic prediction model based on the GEO and TCGA databases

WEI Yifan1,2, LI Tianxin1,3, LI Yuchen4, YANG Xinyu4, LIU Jialiang4, YI Na1

  1. Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Xinjiang Medical University; State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia; Xinjiang Key Laboratory of Molecular Biology for Endemic Diseases, Urumqi, Xinjiang 830017, China;
  2. The Fifth Clinical Medical College, Xinjiang Medical University, Urumqi, Xinjiang 830017, China;
  3. School of Pharmacy, Xinjiang Medical University, Urumqi, Xinjiang 830017, China;
  4. Department of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, Xinjiang 830017, China
  • Received: 2024-11-03 Revised: 2024-11-12 Online: 2025-08-15 Published: 2025-12-12
  • Contact: YI Na, Lecturer, E-mail: 124504195@qq.com
  • About the authors: WEI Yifan, female, undergraduate student; LI Tianxin, female, master's student, research interest: respiratory diseases. WEI Yifan and LI Tianxin are co-first authors.
  • Funding:
    Open Project of the State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia (SKL-HIDCA-2021-46)

Abstract: Objective A long history of pulmonary tuberculosis predisposes patients to lung cancer, and patients with lung cancer are in turn more susceptible to tuberculosis infection. This study aimed to identify and validate tuberculosis-related genes that may serve as prognostic biomarkers in patients with lung adenocarcinoma, and to construct a prognostic prediction model. Methods Differential expression analysis and weighted gene co-expression network analysis (WGCNA) were performed on the pulmonary tuberculosis dataset GSE126614, and candidate genes were obtained from the intersection of the two results. Survival analysis was then performed in The Cancer Genome Atlas (TCGA) lung adenocarcinoma database. Univariate and LASSO-Cox multivariate analyses selected five core genes, from which a prediction model was constructed and externally validated. Results The intersection of the differential expression and WGCNA results for GSE126614 yielded 241 genes. After survival analysis and univariate and LASSO-Cox multivariate analyses, five genes were selected to construct the risk model: risk score = (0.00067 × PPTC7 expression) + (-0.0005 × RHOQ expression) + (0.0001 × TRIM28 expression) + (-0.031 × USP49 expression) + (-0.0002 × ZNF710 expression). In internal validation, the 1-year, 3-year and 5-year AUC values in the TCGA-LUAD training cohort were 0.755, 0.747 and 0.723, respectively, and predictive performance was consistent in the TCGA-LUAD validation cohort. In external validation, the 1-year, 3-year and 5-year AUC values of the ROC curves were 0.732, 0.68 and 0.646, respectively, also indicating good predictive performance. Multivariate Cox analysis showed that the risk score (P<0.001) was an independent prognostic factor. At the transcriptional level, TRIM28, PPTC7, ZNF710 and USP49 were relatively highly expressed in lung adenocarcinoma tissues. At the protein (translation) level, TRIM28, PPTC7 and ZNF710 were relatively highly expressed in lung adenocarcinoma tissues. Conclusion The prognostic model composed of these five genes has value in predicting the prognosis of patients with lung adenocarcinoma and can provide a reference for clinical diagnosis and treatment.

Key words: Pulmonary tuberculosis, Lung adenocarcinoma, Core genes, Prognosis prediction model
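
The risk model reported in the abstract is a weighted sum of five expression values, and it can be illustrated with a minimal Python sketch. The coefficients are those stated in the abstract. The function name `risk_score`, the dictionary layout, and the example expression values are assumptions made for illustration only and are not taken from the study. The expression units and normalization used by the authors are not stated in the abstract, so the numeric result below carries no clinical meaning.

```python
# Coefficients of the five-gene risk model as reported in the abstract.
COEFFICIENTS = {
    "PPTC7": 0.00067,
    "RHOQ": -0.0005,
    "TRIM28": 0.0001,
    "USP49": -0.031,
    "ZNF710": -0.0002,
}


def risk_score(expression):
    """Return the weighted sum of the five genes' expression values.

    `expression` maps gene symbol -> expression value (hypothetical layout).
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {', '.join(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression values, for illustration only (not study data).
example_patient = {
    "PPTC7": 1000.0,
    "RHOQ": 2000.0,
    "TRIM28": 5000.0,
    "USP49": 10.0,
    "ZNF710": 500.0,
}

score = risk_score(example_patient)
print(f"risk score: {score:.4f}")
```

Positive coefficients (PPTC7, TRIM28) raise the score as expression increases, while negative coefficients (RHOQ, USP49, ZNF710) lower it. How scores are grouped into risk strata is not described in the abstract, so no cutoff is applied here.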
