PERFORMANCE EVALUATION OF THE C4.5 ALGORITHM FOR DATA CLASSIFICATION

Zelvi Gustiana

doi:10.46576/djtechno.v5i2.4654

Abstract

C4.5 is one of the most widely used algorithms for decision making and data classification. This article evaluates the performance of C4.5 under a range of dataset conditions, including datasets with numeric and categorical attributes, datasets with missing values, and imbalanced datasets. The study uses several datasets from the UCI Machine Learning Repository, such as Iris, Adult, Breast Cancer, and Wine. The evaluation process covers data preprocessing, splitting the data into training and test sets, implementing the C4.5 algorithm, and measuring performance with metrics such as accuracy, precision, recall, and F-measure. The results show that C4.5 performs well across these dataset conditions, although its performance can be affected by class imbalance and by the number of missing values. The study also evaluates how parameters such as the minimum gain ratio and the minimum leaf-node size affect the algorithm's performance. These findings offer useful guidance for researchers and practitioners seeking to optimize the use of C4.5 in data classification applications.
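The abstract names the gain ratio and the precision, recall, and F-measure metrics without stating their formulas. The following is a minimal Python sketch of the standard textbook definitions, assuming a single categorical attribute with no missing values and a binary notion of a "positive" class. The function names (`entropy`, `gain_ratio`, `precision_recall_f1`) and the toy data are illustrative assumptions, not the author's implementation, and the sketch omits the numeric-attribute thresholds, missing-value handling, and pruning that a full C4.5 implementation includes.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute: information gain / split information."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy(values)
    # An attribute with a single value cannot split the data; report 0.
    return info_gain / split_info if split_info > 0 else 0.0


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure (F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data (hypothetical): the attribute separates the two classes perfectly.
    attribute = ["a", "a", "b", "b"]
    classes = ["x", "x", "y", "y"]
    print(gain_ratio(attribute, classes))
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0], positive=1))
```

A minimum gain ratio parameter, as mentioned in the abstract, would typically be compared against the value returned by `gain_ratio` to decide whether a candidate split is worth making. On an imbalanced dataset, precision and recall computed for the minority class are usually more informative than overall accuracy.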

Keywords

C4.5 Algorithm, Classification, Decision Tree, Algorithm Performance, Evaluation
