Classification of Students Academic Achievement Using a Random Forest Algorithm Based on Educational Data Mining

Floricytha Sihombing; Sherlyta Sherlyta; Marihot Pandapotan Parhusip; Kevin Frans Samuel Gultom

Abstract

Classifying students' academic achievement is challenging because academic, behavioral, and social factors interact in complex ways. Accurately identifying achievement categories is important for educational decision-making and early intervention. This study developed and evaluated a model for classifying student academic performance using the Random Forest algorithm within an Educational Data Mining framework. A supervised machine learning approach was applied to the Student Performance dataset, which comprises 2,392 student records with 15 attributes covering demographic details, study patterns, parental involvement, participation in extracurricular activities, and academic results. The proposed methodology consisted of data preprocessing, exploratory data analysis, feature selection, an 80:20 train-test split, model training, and performance evaluation using accuracy, precision, recall, F1-score, confusion matrix analysis, feature importance, and five-fold cross-validation. The Random Forest model reached an accuracy of 90.81% on the test set and classified the five academic achievement categories robustly. The model performed best on GradeClass 4 and GradeClass 2, whereas performance was lower on the minority classes, likely due to class imbalance. The analysis also showed that factors related to study habits and student engagement significantly influenced the classification results. These outcomes suggest that Random Forest is an effective method for multi-class classification of academic performance and could serve as a dependable resource for data-driven educational strategies, student monitoring, and targeted academic interventions.
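The evaluation metrics named in the abstract can all be read off a multi-class confusion matrix. The following is a minimal sketch, not the authors' code: the function name `per_class_metrics` and the three-class matrix `toy_cm` are hypothetical, and the counts are invented for illustration only (they are not the paper's results, which cover five classes). It shows how a small class can have noticeably lower recall than a large one even when overall accuracy is high.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Compute accuracy plus per-class precision, recall, and F1.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    rows = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: predicted as k
        actual_k = sum(cm[k])                          # row sum: truly class k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append(
            {"precision": precision, "recall": recall, "f1": f1, "support": actual_k}
        )
    # Macro F1 weights every class equally, so minority classes count as much
    # as majority classes.
    macro_f1 = sum(r["f1"] for r in rows) / n
    return {"accuracy": correct / total, "per_class": rows, "macro_f1": macro_f1}


# Hypothetical 3-class confusion matrix (illustrative numbers only).
toy_cm = [
    [8, 2, 0],     # minority class: 10 samples
    [1, 40, 4],    # mid-sized class: 45 samples
    [0, 5, 140],   # majority class: 145 samples
]
report = per_class_metrics(toy_cm)

print(f"accuracy = {report['accuracy']:.4f}, macro F1 = {report['macro_f1']:.4f}")
for label, row in enumerate(report["per_class"]):
    print(
        f"class {label}: precision={row['precision']:.3f} "
        f"recall={row['recall']:.3f} f1={row['f1']:.3f} support={row['support']}"
    )
```

Reporting macro-averaged F1 alongside accuracy is one common way to keep minority-class performance visible when classes are imbalanced.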

JURNAL Excellent:International Journal of computational Intelligence and Sustainable Innovation by Universitas Dharmawangsa Medan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
