
OCCUPATION AND HEALTH ›› 2023, Vol. 39 ›› Issue (12): 1719-1725.
• Overview •
ZOU Qiong1,2, WANG Chong3
Received:2022-10-18
Revised:2022-11-21
Published:2026-03-15
Contact:
WANG Chong,Associate professor,E-mail:w-goahead@163.com
ZOU Qiong, WANG Chong. Research progress of unbalanced data classification and its application in disease diagnosis [J]. OCCUPATION AND HEALTH, 2023, 39(12): 1719-1725.
| [1] | LI Yubo, YE Lixiang, XU Zhaozhao, ZHANG Shuxiang. Research advances in application of artificial intelligence techniques in predicting outcomes in patients with chronic disease multimorbidities [J]. OCCUPATION AND HEALTH, 2026, 42(4): 566-570. |
| [2] | HUO Rongrui, TANG Chunli, LI Xuan, CHEN Peiqin, LUO Huiyu, YOU Xuemei. Predicting the development trend of job burnout in editors of Chinese science and technology journal based on artificial intelligence algorithms [J]. OCCUPATION AND HEALTH, 2024, 40(15): 2064-2070. |
| [3] | GUO Xuening, GAO Yi, LI Jing, ZHANG Min, ZHAO Liang. Study on cost and DRG of discharged patients with herpes zoster in a grade-A tertiary TCM hospital of Tianjin [J]. OCCUPATION AND HEALTH, 2023, 39(19): 2719-2723. |
| [4] | KOU Lei, XU Chang-nan, DENG Jie. Construction of college students' physique early warning model based on multi-dimensional data fusion [J]. OCCUPATION AND HEALTH, 2021, 37(1): 92-96. |