Russian Federation
The paper proposes a machine learning framework for forest fire classification based on the fusion of big data from heterogeneous sources: ERA5 reanalysis data from the Copernicus Climate Data Store, historical fire records of the Federal Forestry Agency (Rosleshoz), and geospatial data for Siberia. Spatiotemporal filtering and resampling algorithms are applied to build the training dataset and to compensate for the rarity of fire events. The resulting model detects fires reliably while minimizing false alarms and can be integrated into early warning systems for Russian forests.
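By way of illustration only (this is not the authors' code), the sketch below shows one way the spatiotemporal alignment described in the abstract could be implemented in Python: point fire records are snapped to the 0.25° ERA5 grid and joined with same-day meteorological features. All data values, column names and the bounding box are placeholders standing in for the real Rosleshoz and ERA5 exports.

```python
# Minimal sketch of the spatiotemporal filtering/join step (assumed layout, not the paper's code).
import pandas as pd

GRID_STEP = 0.25  # horizontal resolution of ERA5 single-level fields, degrees

def snap_to_grid(value: float, step: float = GRID_STEP) -> float:
    """Round a coordinate to the nearest ERA5 grid node."""
    return round(value / step) * step

# Placeholder fire records (Rosleshoz style): detection point and date.
fires = pd.DataFrame({
    "lat": [56.13, 61.42], "lon": [92.87, 89.01],
    "date": pd.to_datetime(["2018-07-12", "2018-07-12"]),
})

# Placeholder daily-aggregated ERA5 features for three grid cells.
weather = pd.DataFrame({
    "lat": [56.25, 61.50, 58.00], "lon": [92.75, 89.00, 100.25],
    "date": pd.to_datetime(["2018-07-12"] * 3),
    "t2m": [301.2, 296.4, 288.7], "total_precip": [0.0, 3.1, 7.5], "wind10m": [4.2, 2.8, 3.5],
})

# Snap both tables to the same grid so they can be matched cell by cell.
for df in (fires, weather):
    df["lat_cell"] = df["lat"].apply(snap_to_grid)
    df["lon_cell"] = df["lon"].apply(snap_to_grid)

# Spatial filter: keep only cells inside an approximate Siberian bounding box.
siberia = weather.query("50 <= lat_cell <= 77 and 60 <= lon_cell <= 120")

# Spatiotemporal join: each grid cell/day gets its weather features plus a fire label.
labels = fires[["lat_cell", "lon_cell", "date"]].drop_duplicates().assign(fire=1)
dataset = siberia.merge(labels, on=["lat_cell", "lon_cell", "date"], how="left")
dataset["fire"] = dataset["fire"].fillna(0).astype(int)
print(dataset)
```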
forest fire, classification, machine learning, supervised learning, dataset, big data, random forest, XGBoost, LightGBM, SMOTE, NearMiss, SMOTE-ENN
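The keywords name three resampling strategies (SMOTE, NearMiss, SMOTE-ENN) and three ensemble classifiers (random forest, XGBoost, LightGBM). The following sketch shows how such a comparison is commonly wired up with imbalanced-learn and scikit-learn; the synthetic data only imitates the rarity of fire events and is not the paper's dataset, and the metric choice (F1) is an assumption for illustration.

```python
# Minimal sketch of resampling + classifier comparison on an imbalanced binary task.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NearMiss
from imblearn.combine import SMOTEENN
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Fires are rare events, so the positive class is kept at roughly 5 %.
X, y = make_classification(n_samples=20_000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

resamplers = {"SMOTE": SMOTE(random_state=0),
              "NearMiss": NearMiss(),
              "SMOTE-ENN": SMOTEENN(random_state=0)}
models = {"RandomForest": RandomForestClassifier(random_state=0),
          "XGBoost": XGBClassifier(random_state=0),
          "LightGBM": LGBMClassifier(random_state=0)}

for r_name, resampler in resamplers.items():
    # Balance only the training split; the test split stays untouched.
    X_res, y_res = resampler.fit_resample(X_train, y_train)
    for m_name, model in models.items():
        model.fit(X_res, y_res)
        score = f1_score(y_test, model.predict(X_test))
        print(f"{r_name:>10} + {m_name:<12} F1 = {score:.3f}")
```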
1. Rosleshoz (Federal Forestry Agency). Open data on forest fires, 2000–2018. – URL: https://rosleshoz.gov.ru (accessed 15.03.2024).
2. Tyukavina, A. Global trends of forest loss due to fire from 2001 to 2019 / A. Tyukavina, P. Potapov, M.C. Hansen // Frontiers in Remote Sensing. – 2022. – Vol. 3. – P. 825190.
3. Ghate, S.N. Forest wildfire detection and forecasting utilizing machine learning and image processing / S.N. Ghate, P. Sapkale, M. Mukhedkar // 2023 International Conference for Advancement in Technology (ICONAT). – IEEE, 2023. – P. 1–8.
4. Hersbach, H. ERA5 hourly data on single levels from 1940 to present / H. Hersbach et al. // Copernicus Climate Change Service (C3S) Climate Data Store (CDS). – 2023. – URL: https://cds.climate.copernicus.eu
5. Rosstat. Standard classification of the constituent entities of the Russian Federation. – URL: https://rosstat.gov.ru (accessed 10.02.2024).
6. Rosreestr. GIS service «Gidrografiya» (Hydrography), 2022. – URL: https://pkk.rosreestr.ru (accessed 12.02.2024).
7. Kaur, P. Data integration framework with multi-source big data for enhanced forest fire prediction / P. Kaur et al. // Manuscript under review. – 2023.
8. GOST R 57976-2017. Methodology for assessing forest fire danger. – Moscow: Standards Publishing House, 2017. – 32 p.
9. Lemaître, G. Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning / G. Lemaître, F. Nogueira, C.K. Aridas // Journal of Machine Learning Research. – 2017. – Vol. 18(17). – P. 1–5.
10. Chawla, N.V. SMOTE: Synthetic minority over-sampling technique / N.V. Chawla et al. // Journal of Artificial Intelligence Research. – 2002. – Vol. 16. – P. 321–357.
11. Batista, G.E. A study of the behavior of several methods for balancing machine learning training data / G.E. Batista et al. // ACM SIGKDD Explorations Newsletter. – 2004. – Vol. 6(1). – P. 20–29.
12. Rodriguez-Galiano, V.F. An assessment of the effectiveness of a random forest classifier for land-cover classification / V.F. Rodriguez-Galiano et al. // ISPRS Journal of Photogrammetry and Remote Sensing. – 2012. – Vol. 67. – P. 93–104.
13. Chen, T. XGBoost: A scalable tree boosting system / T. Chen, C. Guestrin // Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. – 2016. – P. 785–794.
14. Ke, G. LightGBM: A highly efficient gradient boosting decision tree / G. Ke et al. // Advances in Neural Information Processing Systems. – 2017. – Vol. 30. – P. 3146–3154.



