Russian Federation
The paper proposes a machine learning framework for forest fire classification based on the fusion of big data from heterogeneous sources: ERA5 reanalysis data from the Copernicus Climate Data Store, historical fire records of the Federal Forestry Agency (Rosleshoz), and geospatial data for Siberia. Spatiotemporal filtering and resampling algorithms are applied to build the training dataset and to compensate for the rarity of fire events. The resulting model detects fires reliably while minimizing false alarms and can be integrated into early warning systems for Russian forests.
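By way of illustration only (this is not the authors' code), the sketch below shows one way the spatiotemporal alignment described in the abstract could be implemented in Python: point fire records are snapped to the 0.25° ERA5 grid and joined with same-day meteorological features. All data values, column names and the bounding box are placeholders standing in for the real Rosleshoz and ERA5 exports.

```python
# Minimal sketch of the spatiotemporal filtering/join step (assumed layout, not the paper's code).
import pandas as pd

GRID_STEP = 0.25  # horizontal resolution of ERA5 single-level fields, degrees

def snap_to_grid(value: float, step: float = GRID_STEP) -> float:
    """Round a coordinate to the nearest ERA5 grid node."""
    return round(value / step) * step

# Placeholder fire records (Rosleshoz style): detection point and date.
fires = pd.DataFrame({
    "lat": [56.13, 61.42], "lon": [92.87, 89.01],
    "date": pd.to_datetime(["2018-07-12", "2018-07-12"]),
})

# Placeholder daily-aggregated ERA5 features for three grid cells.
weather = pd.DataFrame({
    "lat": [56.25, 61.50, 58.00], "lon": [92.75, 89.00, 100.25],
    "date": pd.to_datetime(["2018-07-12"] * 3),
    "t2m": [301.2, 296.4, 288.7], "total_precip": [0.0, 3.1, 7.5], "wind10m": [4.2, 2.8, 3.5],
})

# Snap both tables to the same grid so they can be matched cell by cell.
for df in (fires, weather):
    df["lat_cell"] = df["lat"].apply(snap_to_grid)
    df["lon_cell"] = df["lon"].apply(snap_to_grid)

# Spatial filter: keep only cells inside an approximate Siberian bounding box.
siberia = weather.query("50 <= lat_cell <= 77 and 60 <= lon_cell <= 120")

# Spatiotemporal join: each grid cell/day gets its weather features plus a fire label.
labels = fires[["lat_cell", "lon_cell", "date"]].drop_duplicates().assign(fire=1)
dataset = siberia.merge(labels, on=["lat_cell", "lon_cell", "date"], how="left")
dataset["fire"] = dataset["fire"].fillna(0).astype(int)
print(dataset)
```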
forest fire, classification, machine learning, supervised learning, dataset, big data, random forest, XGBoost, LightGBM, SMOTE, NearMiss, SMOTE-ENN
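The keywords name three resampling strategies (SMOTE, NearMiss, SMOTE-ENN) and three ensemble classifiers (random forest, XGBoost, LightGBM). The following sketch shows how such a comparison is commonly wired up with imbalanced-learn and scikit-learn; the synthetic data only imitates the rarity of fire events and is not the paper's dataset, and the metric choice (F1) is an assumption for illustration.

```python
# Minimal sketch of resampling + classifier comparison on an imbalanced binary task.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NearMiss
from imblearn.combine import SMOTEENN
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Fires are rare events, so the positive class is kept at roughly 5 %.
X, y = make_classification(n_samples=20_000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

resamplers = {"SMOTE": SMOTE(random_state=0),
              "NearMiss": NearMiss(),
              "SMOTE-ENN": SMOTEENN(random_state=0)}
models = {"RandomForest": RandomForestClassifier(random_state=0),
          "XGBoost": XGBClassifier(random_state=0),
          "LightGBM": LGBMClassifier(random_state=0)}

for r_name, resampler in resamplers.items():
    # Balance only the training split; the test split stays untouched.
    X_res, y_res = resampler.fit_resample(X_train, y_train)
    for m_name, model in models.items():
        model.fit(X_res, y_res)
        score = f1_score(y_test, model.predict(X_test))
        print(f"{r_name:>10} + {m_name:<12} F1 = {score:.3f}")
```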
1. Rosleshoz (Federal Forestry Agency). Open data on forest fires, 2000–2018. – URL: https://rosleshoz.gov.ru (accessed 15.03.2024).
2. Tyukavina, A. Global trends of forest loss due to fire from 2001 to 2019 / A. Tyukavina, P. Potapov, M.C. Hansen // Frontiers in Remote Sensing. – 2022. – Vol. 3. – P. 825190.
3. Ghate, S.N. Forest wildfire detection and forecasting utilizing machine learning and image processing / S.N. Ghate, P. Sapkale, M. Mukhedkar // 2023 International Conference for Advancement in Technology (ICONAT). – IEEE, 2023. – P. 1–8.
4. Hersbach, H. ERA5 hourly data on single levels from 1940 to present / H. Hersbach et al. // Copernicus Climate Change Service (C3S) Climate Data Store (CDS). – 2023. – URL: https://cds.climate.copernicus.eu
5. Rosstat. Standard classification of the constituent entities of the Russian Federation. – URL: https://rosstat.gov.ru (accessed 10.02.2024).
6. Rosreestr. GIS service «Gidrografiya» (Hydrography), 2022. – URL: https://pkk.rosreestr.ru (accessed 12.02.2024).
7. Kaur, P. Data integration framework with multi-source big data for enhanced forest fire prediction / P. Kaur et al. // Manuscript under review. – 2023.
8. GOST R 57976-2017. Methodology for assessing forest fire danger. – Moscow: Standards Publishing House, 2017. – 32 p.
9. Lemaître, G. Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning / G. Lemaître, F. Nogueira, C.K. Aridas // Journal of Machine Learning Research. – 2017. – Vol. 18(17). – P. 1–5.
10. Chawla, N.V. SMOTE: Synthetic minority over-sampling technique / N.V. Chawla et al. // Journal of Artificial Intelligence Research. – 2002. – Vol. 16. – P. 321–357.
11. Batista, G.E. A study of the behavior of several methods for balancing machine learning training data / G.E. Batista et al. // ACM SIGKDD Explorations Newsletter. – 2004. – Vol. 6(1). – P. 20–29.
12. Rodriguez-Galiano, V.F. An assessment of the effectiveness of a random forest classifier for land-cover classification / V.F. Rodriguez-Galiano et al. // ISPRS Journal of Photogrammetry and Remote Sensing. – 2012. – Vol. 67. – P. 93–104.
13. Chen, T. XGBoost: A scalable tree boosting system / T. Chen, C. Guestrin // Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. – 2016. – P. 785–794.
14. Ke, G. LightGBM: A highly efficient gradient boosting decision tree / G. Ke et al. // Advances in Neural Information Processing Systems. – 2017. – Vol. 30. – P. 3146–3154.



