DATA BALANCING IN CLASSIFICATION WITH SUPPORT VECTOR MACHINE ON BANK LOAN PAYMENT DATA

Authors

  • Delvin Wang, Universitas Sanata Dharma
  • Paulina Heruningsih Prima Rosa, Universitas Sanata Dharma

DOI:

https://doi.org/10.34151/prosidingsnast.v1i1.5091

Keywords:

Balancing, Loan Default, Near Miss, Radial Basis Function, Random Over Sampling, Support Vector Machine

Abstract

Loan defaults can cause losses for banks, so lenders need to predict which customers are likely to fail to repay their loans. In this study, a classification model was built to predict customers who default on bank loans using the Support Vector Machine (SVM) algorithm, in particular with the Radial Basis Function (RBF) kernel. Because defaults and on-time repayments do not occur in equal proportions, the data were balanced before training. The study also compared the effect of two balancing methods, Random Over Sampling (ROS) and Near Miss, on the performance of the SVM. The dataset is the Loan Default Prediction Dataset from Kaggle, which consists of 255,347 records and 18 attributes. The SVM model trained without data balancing had the highest accuracy, 88.49%, but a recall of only 10% and an F1-score of 17%. With ROS, accuracy decreased slightly to 83.52%, but recall increased substantially to 94% and the F1-score to 89%. With Near Miss, accuracy dropped further to 65.29%, but precision and recall were still better than without balancing. It can be concluded that balancing with ROS gives the best trade-off between precision and recall, as shown by the highest F1-score among the three configurations.
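
The pattern reported above, high accuracy together with very low recall on unbalanced data, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy labels, the helper names random_over_sample and binary_metrics, and the 90/10 class split are assumptions made for illustration only, and the sketch uses only the Python standard library rather than an SVM. It shows (a) how accuracy, precision, recall and F1-score are computed for the positive (default) class and (b) one simple form of random over sampling, which duplicates randomly chosen minority rows until the classes are equal in size. Near Miss, an under-sampling method, is not shown.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(rows)  # sample with replacement
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (assumed): 90 non-default rows (0) and 10 default rows (1).
X = [[i] for i in range(100)]
y_true = [0] * 90 + [1] * 10

# A model that almost always predicts the majority class:
# it flags only one of the ten defaults.
y_pred = [0] * 90 + [1] + [0] * 9
m = binary_metrics(y_true, y_pred)
print(m)

# Balance the classes by random over sampling before training.
X_bal, y_bal = random_over_sample(X, y_true, seed=42)
print(Counter(y_bal))
```

In this toy case the majority-biased predictions are correct for 91 of 100 rows, yet only one of the ten defaults is found, which is why recall and F1-score, not accuracy alone, are the relevant measures when comparing balancing methods.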

Published

23-11-2024

Section

Articles