In 1846, a physician named Ignatz Semmelweis, working at the Allgemeine Krankenhaus in Vienna, faced a dire healthcare crisis. He observed that the maternity ward in his own hospital (as well as those in other area hospitals) had a maternal mortality rate of over 15%. That is, roughly one out of every six mothers who came to his hospital to give birth died either during childbirth or shortly thereafter. Mothers would arrive at the hospital healthy, give birth, and then contract a high fever and die. These deaths were attributed to puerperal fever, but its underlying causes were unknown. Substantial speculation ensued, and factors such as stale air, tight petticoats, cosmic influences, errors in diet and personal disposition were all blamed. Fear and uncertainty gripped the community.
Dr. Semmelweis began by focusing on the available data. He was immediately able to narrow his focus by comparing outcomes between mothers with equivalent underlying health who gave birth in different locations (e.g., at home, in the midwives' ward in his own hospital and in the doctors' ward). In the end, he was able to determine that the sickness was introduced to mothers by physicians who moved between different areas of the hospital without sanitizing their tools and hands. After he ordered all physicians and students to briefly disinfect their hands in a chlorinated wash before visiting the maternity ward, the mortality rate fell to 1%.
The use of observational data to understand and combat both the spread of disease and the spread of fear and uncertainty stretches back well over a century. In addition to Dr. Semmelweis, John Snow and Florence Nightingale are two well-known early pioneers in this use of observational data. Many have followed in their footsteps.
Here at H2O.ai, like much of the world, we have been deeply concerned by the spread of the Coronavirus (COVID-19). As data scientists, we feel an added responsibility to contribute to the multiple efforts currently underway to use available data and build models that will help the medical community understand, prepare for and prevent the further spread of this virus. We are grateful to those who have worked to make timely and reliable data available to the broader data science community. All efforts are predicated on access to these data. We are likewise grateful to the many medical researchers and data practitioners who are making headway in Coronavirus testing, tracking and forecasting, and in the first available treatments. We ourselves, in partnership with numerous healthcare facilities and providers, are engaged in several efforts across these different areas of data research. We look forward to continued, successful work in the fight against this disease.