Method of Local Anomalies Identification for Environmental Indicators Values Using Half-Wave Decomposition
DOI:
https://doi.org/10.31649/1997-9266-2024-172-1-88-100
Keywords:
time series analysis, simulation, machine learning, time series anomalies, air quality, time series decomposition, EcoCity
Abstract
In an era of mass digitalization of every sphere of human activity, the amount of data is constantly growing, and the ability to work with such volumes is crucial for solving a wide variety of problems. One of the most common data structures is the time series: a sequence of data points collected over some period of time, usually in chronological order. Time series include various financial indicators, environmental monitoring data, medical information, etc. This wide range of application areas makes time series analysis an important and relevant problem. The quality of a time series forecast depends greatly on the quality of the preceding analysis, which may include data standardization, detection of significant indicators, correlation analysis, etc. Anomaly detection occupies a very important place among these steps. Anomalies are data points that differ in some way from the other values in the dataset or violate certain patterns of data behavior. The presence of such records greatly reduces the ability of machine learning models to make accurate predictions, which is why it is necessary to be able to identify these anomalies.
A new method for identifying local anomalies in environmental state indicators using half-wave decomposition has been developed. The main idea of the method is to decompose the time series into half-waves: the trend points at which a fall changes to growth, or vice versa, are used to split the series into fragments. Each fragment is analyzed separately and checked for anomalies by combining multiple methods. The accuracy of the methods is verified using the expert method. The main steps of the proposed method are described, and an example of its application is given on real air quality monitoring data obtained from one of the stations of the EcoCity public monitoring network within the international program “Clean Air for Ukraine”.
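The half-wave idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the trend is obtained or which detectors are combined, so the sketch assumes a centered rolling median as the trend and a single median/MAD rule as the per-fragment check. The function names `split_half_waves` and `flag_local_anomalies` and the parameters `window` and `k` are hypothetical.

```python
import numpy as np
import pandas as pd


def split_half_waves(trend):
    """Split a trend into half-waves at points where its direction changes.

    Returns (start, end) index pairs with inclusive ends; neighbouring
    fragments share their turning point.
    """
    trend = np.asarray(trend, dtype=float)
    n = len(trend)
    if n < 3:
        return [(0, n - 1)] if n else []
    signs = np.sign(np.diff(trend))
    # Flat steps inherit the previous direction so plateaus do not split a wave.
    for i in range(1, len(signs)):
        if signs[i] == 0:
            signs[i] = signs[i - 1]
    turning = [i + 1 for i in range(len(signs) - 1)
               if signs[i] != 0 and signs[i + 1] != signs[i]]
    bounds = [0] + turning + [n - 1]
    return [(bounds[j], bounds[j + 1]) for j in range(len(bounds) - 1)]


def flag_local_anomalies(values, window=5, k=3.0):
    """Flag points that deviate strongly from the trend within their half-wave.

    Assumption: the trend is a centered rolling median, and one median/MAD
    rule stands in for the combination of methods mentioned in the abstract.
    """
    values = np.asarray(values, dtype=float)
    trend = (pd.Series(values)
             .rolling(window, center=True, min_periods=1)
             .median()
             .to_numpy())
    flags = np.zeros(len(values), dtype=bool)
    for start, end in split_half_waves(trend):
        resid = values[start:end + 1] - trend[start:end + 1]
        dev = np.abs(resid - np.median(resid))
        mad = 1.4826 * np.median(dev)  # MAD scaled to match a normal std
        if mad > 0:  # no robust scale available: flag nothing in this fragment
            flags[start:end + 1] |= dev > k * mad
    return flags


# Synthetic triangular wave (not monitoring data) with one injected spike.
series = np.array(list(range(0, 11)) + list(range(9, -1, -1)) + list(range(1, 11)),
                  dtype=float)
series[5] = 25.0
trend = (pd.Series(series)
         .rolling(5, center=True, min_periods=1)
         .median()
         .to_numpy())
fragments = split_half_waves(trend)
anomalies = np.flatnonzero(flag_local_anomalies(series)).tolist()
```

Computing the turning points on a smoothed trend rather than on the raw values is a deliberate choice in this sketch: a single spike in the raw series would otherwise create its own pair of turning points and end up as a fragment boundary instead of being checked inside a fragment.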
The proposed method was implemented and tested in a notebook on the Kaggle platform. The results of anomaly detection were used to build a Facebook Prophet model, and the accuracy of its time series approximation was compared with that of a Prophet model with default parameters. Tests showed an 11% decrease in the approximation error of the time series for the RMSE metric and an 8% decrease for the MAE metric. This result confirms the effectiveness of the method.
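For reference, the two error metrics and the relative decrease between a baseline and an improved model can be computed as in the sketch below. It does not call Prophet; the helper names are hypothetical, and the numbers in the usage lines are toy values chosen for illustration, not the results reported above.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def relative_decrease(baseline_error, improved_error):
    """Percentage by which the improved error is lower than the baseline."""
    return 100.0 * (baseline_error - improved_error) / baseline_error


# Toy values only: a short "observed" series and two hypothetical fits.
observed = [10.0, 12.0, 11.0, 13.0]
baseline_fit = [12.0, 10.0, 13.0, 11.0]   # stands in for a default model
improved_fit = [11.0, 11.0, 12.0, 12.0]   # stands in for a tuned model

rmse_gain = relative_decrease(rmse(observed, baseline_fit),
                              rmse(observed, improved_fit))
mae_gain = relative_decrease(mae(observed, baseline_fit),
                             mae(observed, improved_fit))
```

RMSE penalizes large individual errors more heavily than MAE, so the two percentages generally differ when a model mainly improves on the worst-fitted points.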
License
This work is licensed under a Creative Commons Attribution 4.0 International License.