November 24th, 2017

Laying a Strong Foundation for Data Science Work

RSS icon RSS Category: Data Science, IT
Fallback Featured Image


By William Merchan, CSO, DataScience.com

In the past few years, data science has become the cornerstone of enterprise companies’ efforts to understand how to deliver better customer experiences. Even so, when DataScience.com commissioned Forrester to survey over 200 data-driven businesses last year, only 22% reported they were leveraging big data well enough to get ahead of their competition.
That’s because there’s a big difference between building predictive models and putting them into production effectively. Data science teams need the support of IT from the very beginning to ensure that issues with large-scale data management, governance, and access don’t stand in the way of operationalizing key insights about your customers. However, many enterprise companies are still treating IT involvement as an afterthought, which ultimately delays the timeline for seeing value from their data science efforts.
There are many ways that better IT management can help scale the impact of data science at your organization. Three best practices include using containers for data science environments, managing compute resources effectively, and putting work into production faster with the help of tools. Here’s how it’s done.
1. Using software containers is one of the most impactful steps you can take to implement IT management best practices. These standardized development environments ensure that the hard work your data scientists put into building predictive models won’t go to waste when it’s time to deploy their code. Without a container-based workflow, a data scientist starting a new analysis must either wait for IT to build an environment from scratch, or build one themselves using the unique combination of packages and resources they prefer — and waiting for those to install or compile.
There are two major issues associated with both of these approaches: they don’t scale, and they’re slow. When data scientists are individually responsible for configuring environments as needed, their work isn’t reproducible — if it’s used in a different environment, it might not even run. Containers put the power in the hands of IT to standardize environment configuration in advance using images, which are snapshots of containers. Data scientists can launch environments from those images — which have already been vetted by IT — saving a lot of time in the long run.
2. Provide ample computing power to support your data scientists’ analysis from start to finish. Empowering them to spin up compute resources in the cloud as needed ensures they never get held up by limited computing power. It also eliminates the potential additional cost of maintaining unnecessary nodes. The same idea applies to on-prem data centers. IT must carefully monitor the expansion of data science work and scale resources accordingly. It may seem obvious, but IHS Markit reports that companies not anticipating this need lose approximately $700 billion a year to IT downtime.
3. Put data science work into production right away to start seeing its value earlier on. Imagine your data science team has built a recommender system to predict what products a customer is likely to enjoy based on the products he or she has already purchased. Even if you’re satisfied with the model’s accuracy and have identified some unexpected relationships that should inform your targeting strategies, this information still needs to be integrated into your application or website for it to be valuable.
Traditionally, the pipeline that delivers those recommendations to your customers would be built by engineers and require extensive support from IT. The rise of microservices, however, gives data scientists the opportunity to deploy models as APIs that can be integrated directly into an application.
If you’re among the 78% of companies not fully realizing the return on your data science investment, chances are there’s room to improve the IT foundation you’ve laid. To learn more about the next steps, find out how to take an agile approach to data science.
About the Author
William Merchan leads business and corporate development, partner initiatives, and strategy at DataScience.com as chief strategy officer. He most recently served as SVP of Strategic Alliances and GM of Dynamic Pricing at MarketShare, where he oversaw global business development and partner relationships, and successfully led the company to a $450 million acquisition by Neustar.

Leave a Reply

+
Developing and Retaining Data Science Talent

It’s been almost a decade since the Harvard Business Review proclaimed that “Data Scientist” is

May 12, 2022 - by Jon Farland
+
The H2O.ai Wildfire Challenge Winners Blog Series – Team Too Hot Encoder

Note: this is a community blog post by Team Too Hot Encoder - one of

May 10, 2022 - by H2O.ai Team
+
The H2O.ai Wildfire Challenge Winners Blog Series – Team HTB

Note: this is a community blog post by Team HTB - one of the H2O.ai

May 10, 2022 - by H2O.ai Team
+
Bias and Debiasing

An important aspect of practicing machine learning in a responsible manner is understanding how models

April 15, 2022 - by Kim Montgomery
+
Comprehensive Guide to Image Classification using H2O Hydrogen Torch

In this article, we will learn how to build state-of-the-art models in computer vision and

March 29, 2022 - by H2O.ai Team
+
H2O Wave Snippet Plugin for PyCharm

Note: this blog post by Shamil Dilshan Prematunga was first published on Medium. What is PyCham? PyCharm

March 24, 2022 - by Shamil Prematunga

Start Your Free Trial