Laying a Strong Foundation for Data Science Work

By H2O.ai Team | November 24, 2017

By William Merchan, CSO, DataScience.com
 
In the past few years, data science has become the cornerstone of enterprise companies’ efforts to understand how to deliver better customer experiences. Even so, when DataScience.com commissioned Forrester to survey more than 200 data-driven businesses last year, only 22% reported that they were leveraging big data well enough to get ahead of their competition.
That’s because building predictive models is very different from putting them into production effectively. Data science teams need the support of IT from the very beginning to ensure that issues with large-scale data management, governance, and access don’t stand in the way of operationalizing key insights about your customers. However, many enterprise companies still treat IT involvement as an afterthought, which ultimately delays the timeline for seeing value from their data science efforts.
There are many ways that better IT management can help scale the impact of data science at your organization. Three best practices are using containers for data science environments, managing compute resources effectively, and using tools that put work into production faster. Here’s how it’s done.
1. Use software containers for data science environments. This is one of the most impactful steps you can take to implement IT management best practices. These standardized development environments ensure that the hard work your data scientists put into building predictive models won’t go to waste when it’s time to deploy their code. Without a container-based workflow, a data scientist starting a new analysis must either wait for IT to build an environment from scratch or build one themselves from their preferred combination of packages and resources, and then wait for those to install or compile.
Both of these approaches share two major issues: they don’t scale, and they’re slow. When data scientists are individually responsible for configuring environments as needed, their work isn’t reproducible; if their code is moved to a different environment, it might not even run. Containers put the power in the hands of IT to standardize environment configuration in advance using images, the read-only templates from which containers are launched. Data scientists can then launch environments from those images, which IT has already vetted, saving a lot of time in the long run.
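Containers are the mechanism recommended here, but the reproducibility problem they solve can be made concrete in a few lines of Python. The sketch below assumes IT publishes a manifest of vetted package versions (the kind of pinning an image bakes in) and reports which packages in the current environment have drifted from it. The manifest format and the `find_drift` helper are hypothetical illustrations, not part of Docker or any container tool.

```python
"""Minimal sketch: compare the running Python environment against a
hypothetical IT-vetted manifest of pinned package versions."""
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def find_drift(manifest: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (expected_version, installed_version_or_None)}
    for every package whose installed version differs from the manifest."""
    drift: dict[str, tuple[str, str | None]] = {}
    for package, expected in manifest.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is missing entirely
        if installed != expected:
            drift[package] = (expected, installed)
    return drift


if __name__ == "__main__":
    # Hypothetical manifest; the version numbers are placeholders.
    vetted = {"numpy": "1.26.4", "pandas": "2.2.2"}
    for pkg, (want, have) in find_drift(vetted).items():
        print(f"{pkg}: expected {want}, found {have or 'not installed'}")
```

An empty result means the environment matches the manifest; launching every analysis from the same image is what keeps that result empty without anyone running a check.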
2. Provide ample computing power to support your data scientists’ analysis from start to finish. Empowering them to spin up compute resources in the cloud as needed ensures they never get held up by limited computing power, and it avoids the cost of maintaining nodes that sit idle. The same principle applies to on-premises data centers, where IT must carefully monitor the growth of data science work and scale resources accordingly. It may seem obvious, but the stakes are high: IHS Markit estimates that companies lose approximately $700 billion a year to IT downtime.
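Elastic provisioning can be sketched as a simple sizing rule: size the cluster to the queued workload, but never below a floor (so work can always start) or above a budget ceiling. The function name, its parameters, and the assumption that each node runs a fixed number of jobs are all hypothetical; real cloud autoscalers use richer signals such as CPU, memory, and queue latency.

```python
def target_node_count(pending_jobs: int, jobs_per_node: int,
                      min_nodes: int = 1, max_nodes: int = 20) -> int:
    """Return how many compute nodes to run for the current job queue.

    Hypothetical rule: enough nodes to run every pending job at once,
    clamped to [min_nodes, max_nodes] so work can always start and
    spend stays under a budget ceiling.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    if pending_jobs < 0:
        raise ValueError("pending_jobs cannot be negative")
    if not 0 <= min_nodes <= max_nodes:
        raise ValueError("need 0 <= min_nodes <= max_nodes")
    # Integer ceiling division: 10 jobs at 4 per node needs 3 nodes.
    needed = -(-pending_jobs // jobs_per_node)
    return max(min_nodes, min(needed, max_nodes))


if __name__ == "__main__":
    current = 5
    target = target_node_count(pending_jobs=10, jobs_per_node=4)
    # Positive delta: add nodes; negative delta: release idle nodes.
    print(f"scale by {target - current:+d} node(s) to {target}")
```

Shrinking to `min_nodes` when the queue is empty is what removes the cost of idle nodes; the ceiling is a budget guard, which is why on-premises teams still need to watch demand and raise it deliberately.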
3. Put data science work into production right away to start seeing its value sooner. Imagine your data science team has built a recommender system to predict what products a customer is likely to enjoy based on the products he or she has already purchased. Even if you’re satisfied with the model’s accuracy and have identified some unexpected relationships that should inform your targeting strategies, this information still needs to be integrated into your application or website for it to be valuable.
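The article doesn’t say how such a recommender would be built. One simple, well-known approach is item-to-item co-occurrence: recommend products that other customers frequently bought alongside what this customer already owns. The sketch below assumes purchase histories are available as lists of product names; the data and function names are made up for illustration, and production systems typically use more sophisticated models.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(baskets: list[list[str]]) -> dict[str, Counter]:
    """Map each item to a Counter of items bought in the same basket."""
    co: dict[str, Counter] = {}
    for basket in baskets:
        # Every ordered pair of distinct items in the basket counts once.
        for a, b in permutations(set(basket), 2):
            co.setdefault(a, Counter())[b] += 1
    return co


def recommend(co: dict[str, Counter], purchased: list[str], k: int = 3) -> list[str]:
    """Score unowned items by how often they co-occur with owned ones."""
    owned = set(purchased)
    scores: Counter = Counter()
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    # Highest score first; break ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    history = [
        ["tent", "stove", "lantern"],
        ["tent", "sleeping bag"],
        ["stove", "fuel"],
        ["tent", "stove"],
    ]
    model = build_cooccurrence(history)
    print(recommend(model, ["tent"]))
```

With this toy history, a customer who owns only a tent is offered the stove first, because the two appear together in two baskets.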
Traditionally, the pipeline that delivers those recommendations to your customers would be built by engineers and require extensive support from IT. The rise of microservices, however, gives data scientists the opportunity to deploy models as APIs that can be integrated directly into an application.
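Deploying a model as an API can be as small as wrapping its prediction function in an HTTP endpoint. The sketch below uses only Python’s standard library (`wsgiref`) so it runs anywhere; the `/recommend` route, the `item` query parameter, and the hard-coded `RELATED` table standing in for a trained model are all hypothetical. A production microservice would typically run behind a production-grade WSGI server and load a real model artifact.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical stand-in for a trained model: related products per item.
RELATED = {
    "tent": ["stove", "lantern"],
    "stove": ["fuel", "tent"],
}


def predict(purchased: list[str]) -> list[str]:
    """Return related items the customer doesn't already own, in order."""
    seen = set(purchased)
    recs: list[str] = []
    for item in purchased:
        for other in RELATED.get(item, []):
            if other not in seen and other not in recs:
                recs.append(other)
    return recs


def app(environ, start_response):
    """WSGI app: GET /recommend?item=tent&item=stove returns JSON."""
    if environ.get("PATH_INFO") != "/recommend":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [json.dumps({"error": "not found"}).encode("utf-8")]
    query = parse_qs(environ.get("QUERY_STRING", ""))
    purchased = query.get("item", [])
    body = json.dumps({"recommendations": predict(purchased)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    with make_server("127.0.0.1", 8000, app) as server:
        print("Serving on http://127.0.0.1:8000/recommend?item=tent")
        server.serve_forever()
```

Once the service is running, the website or application only needs to call `GET /recommend?item=tent` and render the returned `{"recommendations": ["stove", "lantern"]}`; the model behind the endpoint can be retrained and swapped without touching the calling code.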
If you’re among the 78% of companies not fully realizing the return on your data science investment, chances are there’s room to improve the IT foundation you’ve laid. To learn more about the next steps, find out how to take an agile approach to data science.
About the Author 
William Merchan leads business and corporate development, partner initiatives, and strategy at DataScience.com as chief strategy officer. He most recently served as SVP of Strategic Alliances and GM of Dynamic Pricing at MarketShare, where he oversaw global business development and partner relationships, and successfully led the company to a $450 million acquisition by Neustar.

H2O.ai Team

At H2O.ai, democratizing AI isn’t just an idea. It’s a movement. And that means that it requires action. We started out as a group of like-minded individuals in the open source community, collectively driven by the idea that there should be freedom around the creation and use of AI. Today we have evolved into a global company built by people from a variety of different backgrounds and skill sets, all driven to be part of something greater than ourselves. Our partnerships now extend beyond the open-source community to include business customers, academia, and non-profit organizations.