Unbalanced Big Data Classification Algorithm Based on Whale Optimization and Deep Learning

SUN Er-hua; HU Yun-bing

doi:10.13718/j.cnki.xsxb.2021.05.019

2021 Issue 5


SUN Er-hua, HU Yun-bing. Unbalanced Big Data Classification Algorithm Based on Whale Optimization and Deep Learning[J]. Journal of Southwest China Normal University(Natural Science Edition), 2021, 46(5): 127-133. doi: 10.13718/j.cnki.xsxb.2021.05.019


SUN Er-hua¹, HU Yun-bing²

1. School of Information Engineering, Chongqing Real Estate College, Chongqing 401331, China
2. College of Information Science and Technology, Xiamen University, Xiamen Fujian 361005, China


Received Date: 16/03/2020
Available Online: 20/05/2021
MSC: TP393

Abstract

To address the low classification accuracy of current unbalanced data classification algorithms and their tendency to become trapped in local optima, an unbalanced big data classification algorithm based on whale optimization and deep learning is proposed. The algorithm consists of three parts: feature selection, preprocessing and classification. First, to improve classification accuracy, the whale optimization algorithm (WOA) is used to find the optimal feature subset of the unbalanced data, eliminating irrelevant and redundant features. Second, the locality sensitive hashing synthetic minority oversampling technique (LSH-SMOTE) is used to preprocess the dataset and resolve the class imbalance problem. Finally, a bidirectional recurrent neural network (BRNN) optimized by WOA is used to classify the preprocessed dataset. The experimental results show that the proposed algorithm can effectively solve the classification problem for unbalanced datasets. Compared with other algorithms, it has clear advantages in classification accuracy and in the rate of avoiding local optima.
- unbalanced big data classification
- whale optimization algorithm
- deep learning
- synthetic minority oversampling technique
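
The whale optimization step named in the abstract can be illustrated with a minimal sketch. It follows the commonly published WOA update rules (encircling the best solution, random exploration, and a spiral move) and is not the authors' implementation. The function name `woa_minimize`, its parameters, the per-whale scalar coefficients, the spiral constant b = 1, and the toy `sphere` objective are all assumptions made for illustration. The abstract does not give the feature-subset fitness function used in the paper, so a continuous test function stands in for it here.

```python
import math
import random


def woa_minimize(objective, dim, bounds, n_whales=20, n_iter=100, seed=0):
    """Minimize `objective` over a box using a basic WOA loop (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population inside the search box.
    whales = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_score = objective(best)

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a  # step coefficient in [-a, a]
            C = 2.0 * rng.random()          # weight on the reference position
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around
                # a randomly chosen whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # Spiral move towards the best whale (assumed constant b = 1).
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(l) * math.cos(2.0 * math.pi * l)
                new = [abs(bi - xi) * spiral + bi for bi, xi in zip(best, x)]
            # Clamp the new position back into the search box.
            whales[i] = [min(max(v, lo), hi) for v in new]
            score = objective(whales[i])
            if score < best_score:
                best, best_score = whales[i][:], score
    return best, best_score


def sphere(x):
    """Toy objective (hypothetical stand-in for a feature-subset fitness)."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    position, value = woa_minimize(sphere, dim=3, bounds=(-5.0, 5.0), seed=1)
    print(position, value)
```

For feature selection, one common adaptation is to threshold each coordinate into a keep-or-drop bit and score the resulting subset with a classifier; whether the paper does this is not stated in the abstract.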


