March 26th, 2020

COVID-19: Doing Good with Data + AI

Category: AI4Good, Data Science, Healthcare, Machine Learning, Time Series

During times of severe societal strain, individuals have historically shown an inclination to offer aid and assistance. Often these sacrifices have come at great cost to life or livelihood. In other cases, the efforts have seemed more mundane but were nevertheless essential. The work of the more than 10,000 women code breakers of World War II is one such example. From 1941 to 1945, these women, recruited for their math, science and foreign language abilities, worked tirelessly to break and understand constantly changing code systems. On any given day, a single individual’s efforts likely seemed minor, but collectively the results were substantial. At the conclusion of the war, Major General Chamberlin noted that these efforts “saved us many thousands of lives” and “shortened the war by no less than two years.” As data scientists, we currently have the ability to contribute meaningfully, in our own small way, to a contemporary battle: understanding and preventing the spread of COVID-19.

It seems clear that our most productive work on this topic will be done in coordination with healthcare facilities and researchers. Just as the work of the WWII code breakers was collaborative and coordinated, so too should our efforts be carried out together with those on the medical front line. That said, there are a growing number of opportunities for interested data scientists. These include:

Moreover, an increasing number of open-source data sets are available to those willing to contribute to the effort. In our own work, for example, we have made use of the following data:

  • The popular 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE contains confirmed, recovered and deceased cases of COVID-19 around the world. For the USA, it also provides some of this information at the state and county level. The same information can also be retrieved, in different formats, from the following source. Bear in mind that the data sets are not perfect: they contain inaccuracies and duplicated entries, but they should provide a good basis for a reasonable understanding of how the virus is spreading around the globe.
  • The following website has information on total beds and ICU beds for multiple hospitals across the USA, along with estimates of their current capacity. Similar information can also be retrieved from the following online spreadsheet.
  • The COVID Tracking Project has information regarding COVID-19 tests for multiple states in the USA, along with a breakdown of whether they were positive or negative.
  • For comparing COVID-19’s average length of hospital stay against that of other diseases, the OECD website contains very useful information covering multiple diseases and many countries.
  • Hospital admission rates for the USA can be retrieved from here. For state-level hospital admission rates, there is a breakdown here.
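Several of the case-count sources above report running (cumulative) totals and, as noted, can contain duplicated entries. A minimal sketch of turning such a series into daily new cases follows, assuming the input is a list of (date, cumulative count) pairs for a single region. The function name and the cleaning policy (keep the last value reported for a duplicated date, clip downward revisions to zero) are illustrative assumptions, not rules defined by any of the listed data sets.

```python
from datetime import date


def daily_new_cases(records):
    """Convert (date, cumulative_count) pairs for one region into daily new cases.

    Illustrative assumptions, not taken from any specific data source:
    - a duplicated date keeps the last value reported for it;
    - a drop in the cumulative total (a downward revision) is clipped to 0 new cases.
    The first date has no previous total, so it produces no output row.
    """
    latest = {}
    for day, cumulative in records:
        latest[day] = cumulative  # later duplicates overwrite earlier ones

    days = sorted(latest)
    result = []
    for prev_day, day in zip(days, days[1:]):
        increase = latest[day] - latest[prev_day]
        result.append((day, max(increase, 0)))
    return result


if __name__ == "__main__":
    records = [
        (date(2020, 3, 1), 10),
        (date(2020, 3, 2), 15),
        (date(2020, 3, 2), 16),  # duplicated entry for the same date
        (date(2020, 3, 3), 14),  # downward revision of the running total
        (date(2020, 3, 4), 20),
    ]
    for day, new in daily_new_cases(records):
        print(day.isoformat(), new)
```

Because revisions are clipped, the daily values no longer sum exactly to the change in the cumulative total; whether that matters depends on the analysis.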

Using these (and other such) data, it is possible to construct time series models that predict future COVID-19 cases for different geographic regions, forecast hospital admissions, and assess when a given region will reach maximum capacity.
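As a deliberately simple illustration of such a time series forecast (not the H2O Driverless AI models behind the application described below), one can fit exponential growth to recent daily counts by ordinary least squares on the log scale and extrapolate it forward. This sketch assumes the counts are still in an early, roughly exponential phase; the function names are hypothetical.

```python
import math


def fit_log_linear(counts):
    """Fit log(count_t) = a + b * t by ordinary least squares, with t = 0, 1, 2, ...

    Days with a count of 0 are skipped because log(0) is undefined.
    Returns (a, b), where b is the daily growth rate on the log scale.
    """
    points = [(t, math.log(c)) for t, c in enumerate(counts) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two days with positive counts")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


def forecast(counts, horizon):
    """Extrapolate the fitted exponential curve `horizon` days past the last observation."""
    a, b = fit_log_linear(counts)
    start = len(counts)
    return [math.exp(a + b * t) for t in range(start, start + horizon)]


def doubling_time(counts):
    """Days for the fitted curve to double; only meaningful when growth is positive."""
    _, b = fit_log_linear(counts)
    return math.log(2) / b


if __name__ == "__main__":
    observed = [12, 15, 21, 26, 35, 44, 58]  # hypothetical daily new cases
    print([round(x) for x in forecast(observed, 3)])
    print(round(doubling_time(observed), 1))
```

An exponential fit never bends, so it will overshoot once growth slows; that saturation is what compartmental models such as SEIR are designed to capture.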

For example, consider the following SEIR (Susceptible-Exposed-Infected-Resistant) dashboard application, developed with H2O Q and H2O Driverless AI, which is automatically updated as new daily data become available.

The application first takes as input, in addition to the available data, selected hospital and demographic parameters for a given hospital system. Then, using the selected parameters, it forecasts new cases for a given region, with daily updates:
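The application’s internals are not described here, but the core of a standard SEIR model can be sketched in a few lines. The sketch below uses the textbook equations dS/dt = -βSI/N, dE/dt = βSI/N - σE, dI/dt = σE - γI and dR/dt = γI, stepped forward one day at a time with a simple Euler update. The parameter values in the usage example (transmission rate β, incubation rate σ, recovery rate γ) are illustrative assumptions rather than calibrated COVID-19 estimates, and the function name is hypothetical.

```python
def simulate_seir(population, exposed0, infected0, beta, sigma, gamma, days):
    """Discrete-time SEIR simulation using a daily Euler step.

    beta:  transmission rate per day
    sigma: rate at which exposed people become infectious (1 / incubation days)
    gamma: rate at which infectious people become resistant (1 / infectious days)

    Returns (states, new_cases): states is a list of (S, E, I, R) tuples for
    days 0..days, and new_cases holds the daily E -> I flow for days 1..days.
    """
    s = float(population - exposed0 - infected0)
    e = float(exposed0)
    i = float(infected0)
    r = 0.0
    states = [(s, e, i, r)]
    new_cases = []
    for _ in range(days):
        # All flows use the previous day's compartment sizes.
        exposures = beta * s * i / population  # S -> E
        onsets = sigma * e                     # E -> I
        removals = gamma * i                   # I -> R
        s -= exposures
        e += exposures - onsets
        i += onsets - removals
        r += removals
        states.append((s, e, i, r))
        new_cases.append(onsets)
    return states, new_cases


if __name__ == "__main__":
    # Illustrative parameters only: basic reproduction number beta / gamma = 2.5
    states, new_cases = simulate_seir(
        population=1_000_000, exposed0=100, infected0=10,
        beta=0.25, sigma=1 / 5, gamma=1 / 10, days=180,
    )
    peak_day = max(range(len(new_cases)), key=new_cases.__getitem__) + 1
    print(f"peak new cases on day {peak_day}: {new_cases[peak_day - 1]:.0f}")
```

The daily step keeps the sketch easy to read; a smaller step or an ODE solver would track the continuous equations more closely.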


Second, using publicly available hospital bed data for a given region, capacity can be assessed for both overall hospital beds and ICU beds:
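A simple version of this capacity check can be sketched as follows. It assumes a projected number of currently infected people per day (for example, the I compartment of an SEIR projection), an assumed fraction of those people who need a hospital bed or an ICU bed at any given time, and the share of beds already occupied by other patients. All of these inputs and the function names are hypothetical, and treating bed demand as a fixed fraction of current infections is a simplification of real length-of-stay dynamics.

```python
def available_beds(total_beds, baseline_occupancy):
    """Beds left for COVID-19 patients after non-COVID occupancy (a fraction in [0, 1])."""
    return total_beds * (1.0 - baseline_occupancy)


def first_day_over_capacity(projected_infected, bed_fraction, beds_available):
    """Return the first day index on which projected bed demand exceeds supply, else None.

    Demand on each day is approximated as projected_infected[day] * bed_fraction.
    """
    for day, infected in enumerate(projected_infected):
        if infected * bed_fraction > beds_available:
            return day
    return None


if __name__ == "__main__":
    projected_infected = [100, 200, 400, 800, 1600, 3200]  # hypothetical projection
    hospital = available_beds(total_beds=400, baseline_occupancy=0.75)
    icu = available_beds(total_beds=40, baseline_occupancy=0.5)
    print("hospital beds exceeded on day",
          first_day_over_capacity(projected_infected, 0.10, hospital))
    print("ICU beds exceeded on day",
          first_day_over_capacity(projected_infected, 0.02, icu))
```

Running the same check separately for overall beds and ICU beds mirrors the two assessments described above.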

Finally, based on the latest data, flags and warnings can be designed and implemented.
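One way to turn such projections into flags is to compare projected bed demand against available capacity and label each day by utilization. The 80% warning and 100% critical thresholds below are hypothetical defaults chosen for illustration; in practice they would be agreed on with the hospital system.

```python
def capacity_flag(demand, capacity, warn_at=0.8, critical_at=1.0):
    """Label projected bed utilization (demand / capacity) as OK, WARNING or CRITICAL."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = demand / capacity
    if utilization >= critical_at:
        return "CRITICAL"
    if utilization >= warn_at:
        return "WARNING"
    return "OK"


def daily_flags(projected_demand, capacity, **thresholds):
    """Flag every day of a demand projection; thresholds are passed through to capacity_flag."""
    return [capacity_flag(d, capacity, **thresholds) for d in projected_demand]


if __name__ == "__main__":
    projected_demand = [40, 70, 85, 110]  # hypothetical projected occupied beds
    for day, flag in enumerate(daily_flags(projected_demand, capacity=100)):
        print(day, flag)
```

Keeping the thresholds as parameters lets each hospital system tune how early the dashboard raises a warning.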

Other simple but useful applications are also possible, and in some areas substantial progress has already been made. Image processing, for example, has been found useful in diagnosing COVID-19. Likewise, using EHR (electronic health record) data, it is possible to identify variables associated with severe complications. A number of pharmaceutical research firms are currently using AI for COVID-19 drug development. Further applications might include assessing the impact of the virus on economic indicators and/or understanding the influence of weather on the spread of COVID-19.

In the end, there appear to be many fruitful areas in which data science can contribute to the effort to understand and combat COVID-19. Our hope is that, by joining forces, data scientists and medical practitioners can make effective and significant progress in these efforts.


About the Authors

David Engler

David Engler is a Senior Data Scientist and the Director of Customer Success at H2O. He has 15 years of experience leading data science teams in healthcare research and analytics and has over 20 publications in medical analytics as a primary author. He most recently built and led the analytics team for healthcare strategy at the University of Utah hospitals and clinics. David obtained his PhD in Biostatistics from Harvard University.

Marios Michailidis

Marios Michailidis is a competitive data scientist at and a Kaggle Grandmaster (ex-World #1 out of 500,000 members). He holds a BSc in Accounting and Finance from the University of Macedonia in Greece and an MSc in Risk Management from the University of Southampton. He obtained his PhD in machine learning at University College London (UCL), with a focus on ensemble modelling. He has worked in both the marketing and credit sectors in the UK market and has led many analytics projects on themes including acquisition, retention, recommenders, uplift, fraud detection, portfolio optimisation and more. He is the creator of KazAnova, a Java project for quick credit scoring, as well as the StackNet Meta-Modelling Framework. Marios’ LinkedIn profile can be found here, with more information about his current and past projects.
