May 10th, 2022
The H2O.ai Wildfire Challenge Winners Blog Series – Team Too Hot Encoder
Category: AI4Good, Community
By: H2O.ai Team
The aim of the project is to predict the probability of wildfire occurrence in Turkey for each month of 2020. These predictions are meant to support more intensive monitoring of likely fire areas and a faster response once a fire starts. A further aim is to derive generalizable relationships by interpreting the model outputs and the importance the model assigns to each variable.
The wildfires that started in Turkey in August 2021 spread over very large areas and, for lack of timely intervention, destroyed large tracts of land and many living things. They dominated the national agenda. The public and politicians often complained about this technical inadequacy and called for improvements. This project set out to see whether wildfire occurrence could be estimated across the whole country and, if so, how successful the results would be.
The goal of the project is to estimate, for each month of 2020, the probability of a wildfire in each grid cell obtained by dividing the area of Turkey into cells of 1 degree of latitude by 1 degree of longitude. The H2O Competition Overview describes this task as “Predicting the behavior of wildfires”.
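To make the grid idea concrete, here is a minimal sketch of how a point can be assigned to a 1-degree cell and how the set of (cell, month) prediction targets for 2020 could be listed. The bounding box values and the names `grid_cell` and `prediction_index` are assumptions chosen for illustration; they are not taken from the competition code.

```python
import math
from itertools import product

# Assumed, approximate bounding box for Turkey in whole degrees.
# These limits are illustrative and not taken from the competition data.
LAT_MIN, LAT_MAX = 35, 43   # cells start at 35..42 degrees north
LON_MIN, LON_MAX = 25, 45   # cells start at 25..44 degrees east


def grid_cell(lat, lon):
    """Return the south-west corner (whole degrees) of the 1-degree cell containing the point."""
    return (math.floor(lat), math.floor(lon))


def prediction_index(year=2020):
    """List one (cell_lat, cell_lon, year, month) tuple per grid cell and month."""
    cells = product(range(LAT_MIN, LAT_MAX), range(LON_MIN, LON_MAX))
    return [(lat, lon, year, month) for (lat, lon) in cells for month in range(1, 13)]


# A fire observed near Istanbul (about 41.01 N, 28.98 E) falls in cell (41, 28).
print(grid_cell(41.01, 28.98))
print(len(prediction_index()))
```

Each tuple in this index would correspond to one probability the model has to output. A real pipeline would probably drop cells that lie entirely over the sea or outside the border.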
Since the probability of wildfire occurrence in certain areas in the future is being calculated, the following groups and individuals can benefit from this project:
- Firefighters: Fire departments can keep firefighter density higher in risky areas so that they can respond faster in case of a wildfire.
- Municipal Administrative Staff: The municipal administration can take protective and prohibitive precautions in various areas, which helps minimize loss of life and property.
- Civil Society Organizations (CSOs): Civil society organizations can strengthen their networks in advance so that they can collect aid quickly if a disaster strikes a risky area.
Anıl Öztürk entered the competition alone. Anıl is a Machine Learning Engineer with a Master’s Degree in Computer Engineering from Istanbul, Turkey. He has mostly worked on tabular data, deep learning and deep reinforcement learning. He is passionate about following the state of the art, competing on Kaggle and whining about stochasticity. He tries to gain experience in different domains by participating in as many local and global competitions as possible. This competition was especially interesting for Anıl because he had no prior experience with geospatial data.
Historical active fire and temperature observations were used as features. The model is LightGBM, a gradient-boosted decision tree algorithm. The following factors led to this choice:
- Decision-tree-based algorithms often give better results than other statistical algorithms on tabular data, so they are among the first algorithms to try.
- LightGBM is often faster to train than similar decision-tree-based algorithms.
- It comes with a library that makes it easy to use, and trained models can be ported to other programming languages.
- It includes various evaluation and analysis methods, so users who want to measure model performance do not have to write that code from scratch or search for it.
The competition also required an interactive web application. I designed an interface where users can carry out all the analyses and make adjustments to the model. I tried to use self-explanatory visualizations wherever possible. From within the application, users can access non-technical details of the project, dataset explanations, dataset analysis graphics, model predictions and visualizations of those predictions, and a model evaluation and metrics screen.
The Key Takeaways
When I examined the feedback after the first-submission stage, I saw that accuracy was one of the factors with a low impact on the score. All scores and feedback statements focused on usability, simplicity, and explainability. That’s why I paid the most attention to the following during the competition:
- The application should be easy to download and run. (Unfortunately, problems arose during the evaluation period when a few of the libraries it used released updates with critical changes.)
- Non-technical personnel should be able to use the application easily, and the application should be easily deployed.
- The app should be easy to navigate.
- The application should hold the user’s hand as much as possible, and explanations should be written with the non-technical user in mind, even for things a data scientist would understand easily.
- The application should also present critical metrics that technical staff will understand.
This competition was very beneficial for me because it taught me to consider corporate and stakeholder requirements while designing a solution. In most competitions, these real-life requirements can be overlooked when trying to maximize a score metric.