During times of severe societal strain, individuals have historically shown an inclination to offer aid and assistance. Often these sacrifices have come at great cost to life or livelihood. In other cases, the efforts have been seemingly more mundane but nevertheless essential. The efforts of the more than 10,000 women code breakers of World War II are one such example. From 1941 to 1945, these women, recruited for their math, science and foreign language abilities, worked tirelessly to break down and understand constantly mutating code systems. On any given day, a single individual’s efforts likely seemed minor. But collectively, the results were substantial. At the conclusion of the war, Major General Chamberlin noted that these efforts “saved us many thousands of lives” and “shortened the war by no less than two years.” As data scientists, we now have the ability to contribute, in our own small way, to a contemporary battle: understanding and preventing the spread of COVID-19.
It seems clear that our most productive work on this topic will be done in coordination with healthcare facilities and researchers. Just as the work of the WWII code breakers was collaborative and coordinated, so too should our efforts be undertaken together with those on the medical front line. That said, there are a growing number of opportunities for interested data scientists. These include:
Moreover, a growing number of open-source data sets are available for those willing to contribute to the effort. In our own efforts, for example, we have made use of the following data:
Using these (and other such) data, it is possible to construct time series models that predict future cases of COVID-19 for different geographic regions, forecast hospital admissions, and assess when maximum capacity will be reached for a given region.
For example, consider the following SEIR (Susceptible-Exposed-Infected-Resistant) dashboarding application, developed with H2O Q and H2O Driverless AI, which is automatically updated as new daily data becomes available.
The application first takes as input (in addition to the available data) selected hospital and demographic input for a given hospital system. Then, using the selected parameters, new cases can be forecast for a given region with daily updates:
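To make the forecasting step more concrete, here is a minimal sketch of a discrete-time SEIR model in Python. It is not the H2O application itself: the parameter names (`beta`, `sigma`, `gamma`), the forward-Euler integration, and the example values are all assumptions chosen for illustration, not fitted estimates.

```python
from dataclasses import dataclass


@dataclass
class SEIRParams:
    """Hypothetical rate parameters (per day); values must be supplied by the user."""
    beta: float   # transmission rate
    sigma: float  # 1 / incubation period
    gamma: float  # 1 / infectious period


def simulate_seir(population, exposed0, infected0, params, days, steps_per_day=10):
    """Integrate the SEIR equations with forward Euler.

    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - exposed0 - infected0)
    e, i, r = float(exposed0), float(infected0), 0.0
    dt = 1.0 / steps_per_day
    daily = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = params.beta * s * i / population * dt
            new_infected = params.sigma * e * dt
            new_resistant = params.gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_resistant
            r += new_resistant
        daily.append((s, e, i, r))
    return daily


def daily_new_cases(daily):
    """Approximate new infections per day as the daily drop in the S compartment."""
    return [prev[0] - cur[0] for prev, cur in zip(daily, daily[1:])]


if __name__ == "__main__":
    # Illustrative values only, not estimates for any real region.
    params = SEIRParams(beta=0.5, sigma=0.2, gamma=0.1)
    trajectory = simulate_seir(1_000_000, 100, 10, params, days=60)
    print(daily_new_cases(trajectory)[:7])
```

In practice the rates would be re-estimated as each day's data arrives, which is what allows the forecast to be refreshed daily.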
Second, using publicly available hospital bed data for a given region, capacity can be assessed for both overall hospital bed usage and ICU bed usage:
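One simple way to sketch such a capacity assessment in Python is shown below. It assumes a forecast of daily new cases is already available, and that every admitted patient occupies a bed for a fixed number of days. The hospitalization rate, ICU share, lengths of stay, and bed counts are hypothetical inputs, not values from the application described here.

```python
def occupancy(daily_admissions, length_of_stay):
    """Beds occupied each day if every admission stays `length_of_stay` days."""
    occupied = []
    for day in range(len(daily_admissions)):
        start = max(0, day - length_of_stay + 1)
        occupied.append(sum(daily_admissions[start:day + 1]))
    return occupied


def first_day_over_capacity(occupied, capacity):
    """Index of the first day on which occupancy exceeds capacity, or None."""
    for day, beds in enumerate(occupied):
        if beds > capacity:
            return day
    return None


def assess_capacity(new_cases, hosp_rate, icu_share, stay_days, icu_stay_days,
                    total_beds, icu_beds):
    """Return the first over-capacity day for overall beds and for ICU beds."""
    admissions = [cases * hosp_rate for cases in new_cases]
    icu_admissions = [a * icu_share for a in admissions]
    return {
        "overall": first_day_over_capacity(occupancy(admissions, stay_days), total_beds),
        "icu": first_day_over_capacity(occupancy(icu_admissions, icu_stay_days), icu_beds),
    }


if __name__ == "__main__":
    # Illustrative inputs only.
    forecast = [100] * 10
    print(assess_capacity(forecast, hosp_rate=0.1, icu_share=0.2,
                          stay_days=5, icu_stay_days=5,
                          total_beds=45, icu_beds=20))
```

A fixed length of stay is a deliberate simplification. A fuller model would draw lengths of stay from a distribution and account for beds already occupied by non-COVID patients.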
Then, based on the latest data, flags and warnings can be designed and implemented.
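As a rough illustration of what such flags might look like, the sketch below maps forecast bed occupancy to a status label. The thresholds (80% for a warning, 95% for critical) and the label names are assumptions made for this example. A real deployment would set them together with hospital staff.

```python
def capacity_flag(occupied, capacity, warn_at=0.80, critical_at=0.95):
    """Classify utilization as 'ok', 'warning', or 'critical' (assumed thresholds)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = occupied / capacity
    if utilization >= critical_at:
        return "critical"
    if utilization >= warn_at:
        return "warning"
    return "ok"


def daily_flags(forecast_occupancy, capacity, warn_at=0.80, critical_at=0.95):
    """Apply capacity_flag to every day of a forecast."""
    return [capacity_flag(o, capacity, warn_at, critical_at) for o in forecast_occupancy]


def first_flagged_day(flags, level="warning"):
    """Index of the first day whose flag equals `level`, or None if never raised."""
    for day, flag in enumerate(flags):
        if flag == level:
            return day
    return None


if __name__ == "__main__":
    # Illustrative occupancy forecast against a hypothetical 100-bed capacity.
    flags = daily_flags([50, 70, 85, 97, 110], capacity=100)
    print(flags, first_flagged_day(flags, "critical"))
```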
Other simple but useful applications are also possible. In some areas, substantial progress has already been made. Image processing, for example, has been found to be useful in the effective diagnosis of COVID-19. Likewise, using EHR (electronic health record) data, it is possible to identify variables associated with severe complications. Currently, a number of pharmaceutical research firms are using AI for COVID-19 drug development. Further applications might include assessing the impact of the virus on economic indicators and/or understanding the impact of weather on the spread of COVID-19.
In the end, it seems fruitful to explore areas of application where data science can contribute to the efforts to understand and combat COVID-19. Our hope is that, by joining forces, data scientists and medical practitioners can make effective and significant progress in these efforts.