Comparative Analysis of Machine Learning Models for Predicting Employee Burnout Problem
DOI:
https://doi.org/10.31649/1997-9266-2023-170-5-25-31
Keywords:
machine learning, Bayesian models, burnout syndrome, small data sets
Abstract
The article explores the problem of predicting emotional burnout syndrome in employees, which is relevant due to the high level of stress in the modern world. The study uses the publicly available dataset "Are your employees burning out" from a competition on the HackerEarth platform. It compares three classical machine learning models (linear regression, Random Forest, XGBoost) with three Bayesian models (Bayesian linear regression, a varying intercept model, and a varying intercept and slope model). The change in model quality is studied for training sets of different sizes, ranging from 13,000 observations (i.e., the full training set, which accounted for 70% of all data) down to 25 observations, including testing on the full data set. XGBoost is shown to be the best model for large data sets. However, when the training sample is reduced to fewer than 5000 observations, the validation performance of XGBoost degrades significantly and falls below that of the Bayesian models. Optimizing hyperparameters such as tree depth, number of trees, learning rate, and others improved the quality of XGBoost significantly, but did not make it stable enough to outperform the Bayesian models on samples of fewer than 600 observations. Bayesian models, in addition to performing better on small samples, also allow estimating the "confidence" in the predicted values, which is an important feature for some tasks. However, they have a significant disadvantage: much greater computational complexity, which increases training time. In conclusion, the results of this study emphasize the importance of carefully selecting a model that takes into account the amount and quality of the available data.
Bayesian models have proven to be highly effective with a small amount of data, due to their ability to consider uncertainty and insufficient information.
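The point about estimating "confidence" on shrinking training sets can be illustrated with a minimal sketch. It is not the study's code: it uses synthetic data instead of the HackerEarth dataset, a single feature, a known noise level, and a closed-form conjugate Bayesian linear regression instead of the PyMC models cited in the references. All names (fit_posterior, predictive, TRUE_INTERCEPT, TRUE_SLOPE, NOISE_SD) and all numeric values are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical ground truth for the synthetic data (not taken from the study).
TRUE_INTERCEPT, TRUE_SLOPE, NOISE_SD = 0.2, 0.5, 0.1


def fit_posterior(xs, ys, noise_sd, prior_sd=10.0):
    """Posterior over (intercept, slope) for y = a + b*x + N(0, noise_sd^2),
    with independent N(0, prior_sd^2) priors on a and b (conjugate case)."""
    beta = 1.0 / noise_sd ** 2   # noise precision
    alpha = 1.0 / prior_sd ** 2  # prior precision
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Posterior precision matrix A = alpha * I + beta * X^T X (2 x 2).
    a11, a12, a22 = alpha + beta * n, beta * sx, alpha + beta * sxx
    det = a11 * a22 - a12 * a12
    # Posterior covariance is the inverse of A.
    c11, c12, c22 = a22 / det, -a12 / det, a11 / det
    # Posterior mean = beta * covariance * X^T y.
    mean = (beta * (c11 * sy + c12 * sxy), beta * (c12 * sy + c22 * sxy))
    return mean, ((c11, c12), (c12, c22))


def predictive(x, mean, cov, noise_sd):
    """Mean and standard deviation of the posterior predictive at x."""
    mu = mean[0] + mean[1] * x
    var = noise_sd ** 2 + cov[0][0] + 2 * x * cov[0][1] + x * x * cov[1][1]
    return mu, math.sqrt(var)


rng = random.Random(0)
xs = [rng.random() for _ in range(1000)]
ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, NOISE_SD) for x in xs]

# Refit on nested subsets of decreasing size and record the predictive spread.
results = {}
for n in (1000, 100, 25):
    mean, cov = fit_posterior(xs[:n], ys[:n], NOISE_SD)
    mu, sd = predictive(0.5, mean, cov, NOISE_SD)
    results[n] = {"mean": mean, "pred_mu": mu, "pred_sd": sd}
    print(f"n={n:5d}  prediction at x=0.5: {mu:.3f} +/- {sd:.4f}")
```

In this sketch the predictive standard deviation grows as the training subset shrinks, so the model reports less certainty when it has seen less data. The hierarchical (varying intercept and slope) models in the study are richer than this single-feature case and would typically be fitted by sampling rather than in closed form, which is where the extra training time comes from.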
References
D. A. J. Salvagioni, F. N. Melanda, A. E. Mesas, A. D. González, F. L. Gabani and S. M. de Andrade, “Physical, psychological and occupational consequences of job burnout: A systematic review of prospective studies,” PLOS ONE, no. 12, pp. e0185781, October 2017.
М. С. & І. С., “The Role of the Stress in Development of the Diseases,” Precarpathian Bulletin of the Shevchenko Scientific Society Pulse, pp. 25-32, October 2019.
М. Гурська, “I burned out and I am afraid of being fired: what should I do? Top IT companies explained how they respond to burnout in employees and candidates,” DOU.ua, 15.11.2022. [Online]. Available: https://dou.ua/lenta/articles/emotional-burnout-at-work . Accessed on: 20.09.2023.
“Hacker Earth Machine Learning Challenge: Are your employees burning out?” HackerEarth, 21.10.2021. [Online]. Available: https://www.hackerearth.com/challenges/new/competitive/hackerearth-machine-learning-challenge-predict-burnout-rate. Accessed on: 20.09.2023.
L. Breiman, “Random Forests,” Machine Learning, no. 45, pp. 5-32, 2001.
T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2016.
O. Abril-Pla, et al. “PyMC: a modern, and comprehensive probabilistic programming framework in Python,” PeerJ Computer Science, no. 9, pp. e1516, September 2023.
A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2006.
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).