Insights from AI for Good Hackathon: Using Machine Learning to Tackle Pollution

By Parul Pandey | May 10, 2023

At H2O.ai, we believe technology can be a force for good, and we’re committed to leveraging its power to create a positive impact in the world. As part of this commitment, we recently organized an AI for Good Hackathon during the H2O World India event, where participants had the opportunity to apply their data science skills to a real-world use case related to pollution in India.

[Figure: H2O.ai World India Olympics hackathon]

The hackathon ran from April 8th to April 16th and saw over 250 participants submit innovative solutions to combat pollution. Participants were given access to data sets related to air pollution and were asked to develop solutions using machine learning models.

Use Case: Predicting the Air Quality Index of Indian Cities using Machine Learning

[Figure: Air Quality Index of Indian cities graph]

Air is what keeps humans alive. Monitoring it and understanding its quality is of immense importance to our well-being. 

In this hackathon, participants were given the opportunity to use their data analysis and machine learning skills to forecast the AQI for major Indian AQI stations for the next 28 days. The dataset had information about several air pollutants that directly affect the Air Quality Index.

Dataset Details

The dataset consisted of historical daily averages of pollutants, including SO2, CO, and PM2.5, along with other important factors that affect the Air Quality Index. The challenge in this competition was to forecast average AQI levels across different stations in India for the next 28 days. The training data covered 2 years of history for 40 Indian AQI stations and had the following attributes:

  • ID_Date: Unique identifier combining state, station ID, and date
  • StateCode: State where the AQI station is located
  • StationId: AQI station ID
  • Date: Date when the observations were recorded
  • PM2.5: Average PM2.5 pollutant level
  • PM10: Average PM10 pollutant level
  • O3: Average O3 pollutant level
  • CO: Average CO pollutant level
  • SO2: Average SO2 pollutant level
  • AQI: Average Air Quality Index – target variable
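
As a rough illustration of how these attributes might be read into typed records, here is a minimal sketch using only the Python standard library. The column names come from the list above; the `Observation` class, the `read_observations` helper, the ISO date format, the shape of `ID_Date`, and the two sample rows are all assumptions made for the example, not part of the competition data.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# Pollutant columns as named in the attribute list above.
POLLUTANTS = ["PM2.5", "PM10", "O3", "CO", "SO2"]


@dataclass
class Observation:
    """One daily record for one station (hypothetical container)."""
    state_code: str
    station_id: str
    day: date
    pollutants: Dict[str, Optional[float]]  # None marks a missing reading
    aqi: Optional[float]                    # target variable


def _to_float(text: str) -> Optional[float]:
    """Convert a CSV cell to float, treating an empty cell as missing."""
    text = text.strip()
    return float(text) if text else None


def read_observations(handle) -> List[Observation]:
    """Parse rows from a CSV file object with the columns listed above."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append(Observation(
            state_code=raw["StateCode"],
            station_id=raw["StationId"],
            day=date.fromisoformat(raw["Date"]),  # assumes YYYY-MM-DD dates
            pollutants={name: _to_float(raw[name]) for name in POLLUTANTS},
            aqi=_to_float(raw["AQI"]),
        ))
    return rows


# Two made-up rows, only to show the shape of the data.
SAMPLE = """ID_Date,StateCode,StationId,Date,PM2.5,PM10,O3,CO,SO2,AQI
DL_ST01_2022-01-01,DL,ST01,2022-01-01,120.5,210.0,30.2,1.4,12.0,310
DL_ST01_2022-01-02,DL,ST01,2022-01-02,,190.0,28.1,1.2,11.5,295
"""

observations = read_observations(io.StringIO(SAMPLE))
```

Keeping missing readings as `None` rather than zero leaves the decision about imputation to a later preprocessing step.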

A sample submission file was also made available to specify the submission format. We hosted the competition on H2O Olympics, our in-house competition platform.

[Figure: H2O Olympics dashboard]

Evaluation Process

The solutions were evaluated based on their performance, completeness, and storytelling. Performance was measured by the model's final score on an unseen test dataset. Completeness assessed the overall solution and its components, including pre-processing, visualizations, feature engineering, model tuning, and model explainability. Storytelling focused on the business impact and top insights derived from the dataset and the model. Bonus points were also awarded for using H2O.ai libraries during the competition.

[Figure: Evaluation criteria]

The hackathon was judged by a panel of H2O.ai data scientists who provided valuable insights and feedback to help select the top-performing teams. It inspired participants to showcase their best work.

[Figure: Judges panel]

Top Teams

After a rigorous evaluation process, the top ten teams demonstrated the potential for machine learning to combat pollution and contribute to a cleaner environment. We were impressed by the participants’ creativity and knowledge in developing end-to-end solutions.

[Figure: Winners list]

The winners were announced during the H2O World India event.

Interview with the Winners

We had the pleasure of interviewing the top three hackathon winners, who shared their motivation for participating and their experiences during the hackathon. 

1st Place Winner: Dipayan Sarkar

Dipayan Sarkar, who finished first, mentioned that participating in the hackathon allowed him to push himself out of his comfort zone, network with other professionals, and stay up-to-date on the latest trends and best practices. The following is an overview of Dipayan’s approach:

[Figure: 1st Place Winner Dipayan Sarkar's approach workflow]

The challenge exposed him to new ideas and approaches and motivated him to participate in more challenges in the future.

2nd Place Winner: Sagar Thackar and Shuchita Mishra

The team combined comprehensive data preprocessing, exploratory analysis, analysis of time series features, and feature engineering, then trained a series of different machine learning models to predict the AQI.
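
The post does not list the exact features this team built. Lag and rolling-window statistics are a common choice for daily series, so the following is a minimal sketch under that assumption; the function names and sample values are illustrative only.

```python
from statistics import mean
from typing import List, Optional


def lag_feature(series: List[float], lag: int) -> List[Optional[float]]:
    """Value observed `lag` days earlier; None where history is too short."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]


def rolling_mean(series: List[float], window: int) -> List[Optional[float]]:
    """Mean of the `window` days strictly before each day.

    The current day is excluded so the feature never leaks the value
    it is meant to help predict.
    """
    return [
        mean(series[i - window:i]) if i >= window else None
        for i in range(len(series))
    ]


# Made-up daily AQI values for a single station.
aqi = [100.0, 110.0, 120.0, 130.0, 140.0]

lag_1 = lag_feature(aqi, 1)     # [None, 100.0, 110.0, 120.0, 130.0]
roll_3 = rolling_mean(aqi, 3)   # [None, None, None, 110.0, 120.0]
```

For a 28-day forecast, features like these would need lags of at least the forecast distance (or a recursive prediction scheme), since more recent values are not available at prediction time.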

[Figure: Model results]

The team also put together a really impressive H2O Wave Application to show the results and predictions. Live Demo: https://h2o-pipeline-aqi.herokuapp.com/site

3rd Place Winner: Nikhil Mishra and Nishchay Dhankar

The team used a straightforward approach: a rule-based model that mixes the mean and median of the last seven days. The team also engineered features such as the month of the year, lag features, and rolling features for all pollutants. They also set up a sound validation strategy, using the last 28 days as the holdout and repeating the setup after removing the last 1, 2, and 3 months. The simple model worked well and gave impressive results. The team also developed an H2O Wave application.
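
To make the idea concrete, here is a minimal sketch of a mean/median blend with a 28-day holdout repeated at shifted cutoffs. It is one possible reading of the description above, not the team's actual code: the 50/50 blend weight, the use of mean absolute error as the score, the 30-day month, and all function names are assumptions.

```python
from statistics import mean, median
from typing import Dict, List

HORIZON = 28  # forecast length stated in the post


def blend_forecast(history: List[float], weight: float = 0.5,
                   window: int = 7, horizon: int = HORIZON) -> List[float]:
    """Flat forecast: weighted mix of the mean and median of the last `window` days."""
    recent = history[-window:]
    level = weight * mean(recent) + (1 - weight) * median(recent)
    return [level] * horizon


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error (an assumed metric; the post does not name one)."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def backtest(series: List[float], months_removed=(0, 1, 2, 3),
             days_per_month: int = 30, weight: float = 0.5,
             window: int = 7) -> Dict[int, float]:
    """Score the blend on the last 28 days, then again after trimming
    1, 2, and 3 months from the end of the series."""
    scores = {}
    for m in months_removed:
        trimmed = series[:len(series) - m * days_per_month]
        if len(trimmed) < HORIZON + window:
            raise ValueError("series too short for this cutoff")
        train, holdout = trimmed[:-HORIZON], trimmed[-HORIZON:]
        scores[m] = mae(holdout, blend_forecast(train, weight, window))
    return scores


# Synthetic series with a weekly pattern, only to exercise the functions.
series = [100.0 + (day % 7) for day in range(365)]
scores = backtest(series)
print(scores)
```

Scoring the same rule at several cutoffs gives a rough sense of how stable it is over time, which a single holdout window cannot show.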

[Figure: Data aggregation]

You can watch the winners’ interview panel to understand their solutions in depth.

Conclusion

The AI for Good Hackathon allowed participants to showcase their skills and contribute to a worthy cause. We hope the solutions developed during the hackathon will inspire others to use machine learning to address environmental issues and create a better future for all of us.

Parul Pandey

Parul focuses on the intersection of H2O.ai, data science and community. She works as a Principal Data Scientist and is also a Kaggle Grandmaster in the Notebooks category.

Shivam Bansal

Shivam is a 3x Kaggle Grandmaster, a five-time winner of Kaggle’s Analytics / Data Science for Good Competition, and the winner of several other offline and online competitions. He holds a master's degree from the National University of Singapore, where he was valedictorian. He has extensive cross-industry and hands-on experience in building data science products and applications. He brings a strong blend of technical and business skills with a practical and solution-driven approach. He supports various functions within the company, including engineering, pre-sales, and customer success. His LinkedIn profile can be found here.