May 10th, 2023
Insights from AI for Good Hackathon: Using Machine Learning to Tackle Pollution
By: Parul Pandey and Shivam Bansal
At H2O.ai, we believe technology can be a force for good, and we’re committed to leveraging its power to create a positive impact in the world. As part of this commitment, we recently organized an AI for Good Hackathon during the H2O World India event, where participants had the opportunity to apply their data science skills to a real-world use case related to pollution in India.
The hackathon ran from April 8th to April 16th and saw over 250 participants submit innovative solutions to combat pollution. Participants were given access to data sets related to air pollution and were asked to develop solutions using machine learning models.
Use Case: Predicting the Air Quality Index of Indian Cities using Machine Learning
Air is what keeps humans alive. Monitoring it and understanding its quality is of immense importance to our well-being.
In this hackathon, participants were given the opportunity to use their data analysis and machine learning skills to forecast the AQI for major Indian AQI stations for the next 28 days. The dataset had information about several air pollutants that directly affect the Air Quality Index.
The dataset consisted of historical daily average levels of pollutants, including SO2, CO, and PM2.5, along with other factors that affect the air quality index. The challenge was to forecast average AQI levels across the stations for the next 28 days. The training data covered 2 years of history for 40 Indian AQI stations and contained the following attributes:
- ID_Date: Unique identifier combining state, station ID, and date
- StateCode: State where the AQI station is located
- StationId: AQI station ID
- Date: Date when the observations were recorded
- PM2.5: Average PM2.5 pollutant level
- PM10: Average PM10 pollutant level
- O3: Average O3 pollutant level
- CO: Average CO pollutant level
- SO2: Average SO2 pollutant level
- AQI: Average Air Quality Index – target variable
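As a rough illustration of how a table with these attributes might be prepared for modeling, here is a minimal pandas sketch. The column names follow the list above, but the `ID_Date` string format, the station and state codes, and every value in the sample rows are made up for demonstration; the function name `prepare_training_frame` is hypothetical and not part of the competition materials.

```python
import pandas as pd

# Column names taken from the attribute list above.
EXPECTED_COLUMNS = [
    "ID_Date", "StateCode", "StationId", "Date",
    "PM2.5", "PM10", "O3", "CO", "SO2", "AQI",
]


def prepare_training_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Check the documented columns, parse dates, and sort by station and date."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = df[EXPECTED_COLUMNS].copy()
    out["Date"] = pd.to_datetime(out["Date"])
    return out.sort_values(["StationId", "Date"]).reset_index(drop=True)


# Two made-up rows, deliberately out of order. The ID_Date format is an assumption.
sample = pd.DataFrame({
    "ID_Date": ["DL_ST1_2023-01-02", "DL_ST1_2023-01-01"],
    "StateCode": ["DL", "DL"],
    "StationId": ["ST1", "ST1"],
    "Date": ["2023-01-02", "2023-01-01"],
    "PM2.5": [110.0, 95.0],
    "PM10": [210.0, 180.0],
    "O3": [30.0, 28.0],
    "CO": [1.2, 1.1],
    "SO2": [12.0, 11.0],
    "AQI": [250.0, 230.0],
})

prepared = prepare_training_frame(sample)
print(prepared[["StationId", "Date", "AQI"]])
```

Sorting by station and date up front keeps later time series steps, such as lags or rolling windows, aligned within each station.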
A sample submission file was also made available to specify the submission format. We hosted the competition on H2O Olympics, our in-house competition platform.
The solutions were evaluated based on their performance, completeness, and storytelling. The performance metric was based on the model’s final performance on an unseen test dataset. Completeness assessed the overall solution and its components, including pre-processing, visualizations, feature engineering, model tuning, and model explainability. Storytelling focused on the business impact and top insights derived from the dataset and the model. Bonus points were also awarded for using H2O.ai libraries during the competition.
The hackathon was judged by a panel of H2O.ai data scientists, whose insights and feedback helped select the top-performing teams. The event inspired participants to showcase their best work.
After a rigorous evaluation process, the top ten teams demonstrated the potential for machine learning to combat pollution and contribute to a cleaner environment. We were impressed by the participants’ creativity and knowledge in developing end-to-end solutions.
The winners were announced during the H2O World India event.
Interview with the Winners
We had the pleasure of interviewing the top three hackathon winners, who shared their motivation for participating and their experiences during the hackathon.
1st Place Winner: Dipayan Sarkar
Dipayan Sarkar, who finished first, mentioned that participating in the hackathon allowed him to push himself out of his comfort zone, network with other professionals, and stay up-to-date on the latest trends and best practices. The challenge exposed him to new ideas and approaches and motivated him to participate in more challenges in the future.
[Figure: overview of Dipayan’s approach]
2nd Place Winner: Sagar Thackar and Shuchita Mishra
The team combined comprehensive data preprocessing, exploratory analysis, analysis of time series features, and feature engineering, then trained a series of different machine learning models to predict the AQI.
The team also put together a really impressive H2O Wave Application to show the results and predictions. Live Demo: https://h2o-pipeline-aqi.herokuapp.com/site
3rd Place Winner: Nikhil Mishra and Nishchay Dhankar
The team used a straightforward approach: a rule-based model that blends the mean and median of the last seven days. The team also engineered features such as the month of the year, lag features, and rolling features for all pollutants. They set up a sound validation strategy as well, using the last 28 days as the holdout and rerunning the setup after removing the last 1, 2, and 3 months. The simple model worked well and gave impressive results. The team also developed an H2O Wave application.
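The ideas described above can be sketched in a few lines of pandas. This is an illustrative reading of the description, not the team's actual code: the equal blend weight, the 30-day approximation of a month, the use of mean absolute error as the score, and all function names are assumptions, and the demo series is synthetic.

```python
import numpy as np
import pandas as pd

HORIZON = 28  # forecast horizon in days, from the challenge description


def add_features(df: pd.DataFrame, cols=("AQI",)) -> pd.DataFrame:
    """Add month, 1-day lag, and 7-day rolling-mean features per station."""
    out = df.sort_values(["StationId", "Date"]).reset_index(drop=True)
    out["month"] = out["Date"].dt.month
    for col in cols:
        grouped = out.groupby("StationId")[col]
        out[f"{col}_lag_1"] = grouped.shift(1)
        # Shift before rolling so the window only sees past values.
        out[f"{col}_roll_mean_7"] = grouped.transform(
            lambda s: s.shift(1).rolling(7).mean()
        )
    return out


def blend_forecast(history: pd.Series, weight: float = 0.5, window: int = 7) -> float:
    """Flat forecast: weighted mix of the mean and median of the last `window` days."""
    recent = history.tail(window)
    return float(weight * recent.mean() + (1 - weight) * recent.median())


def holdout_mae(series: pd.Series, drop_days: int = 0) -> float:
    """Drop the last `drop_days`, hold out the final HORIZON days, score the blend."""
    s = series.iloc[: len(series) - drop_days]
    train, test = s.iloc[:-HORIZON], s.iloc[-HORIZON:]
    pred = blend_forecast(train)
    return float(np.mean(np.abs(test.to_numpy() - pred)))


# Synthetic daily AQI for one hypothetical station (values are made up).
rng = np.random.default_rng(0)
aqi = pd.Series(150 + rng.normal(0, 10, size=365))

# Repeat the 28-day holdout after removing roughly 0, 1, 2, and 3 months.
for months in range(4):
    print(months, round(holdout_mae(aqi, drop_days=30 * months), 2))
```

Repeating the holdout on truncated histories gives several validation scores instead of one, which helps check that a rule this simple is not just lucky on a single 28-day window.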
You can watch the winners’ interview panel to understand their solutions in depth.
The AI for Good Hackathon allowed participants to showcase their skills and contribute to a worthy cause. We hope the solutions developed during the hackathon will inspire others to use machine learning to address environmental issues and create a better future for all of us.