

Boosting your ROI with AutoML & Automatic Feature Engineering


By Karthik Guruswamy | minute read | February 25, 2019


If your business has started using AI/ML tools, or has just started thinking about it, this blog is for you. Whether you are a data scientist, a VP of data science, or a line-of-business owner, you are probably wondering how AI will affect your organization, or why your current strategies are not working. Even if you are not using AI/ML, chances are good that your vendors and customers already are in some way.

Machine Learning/AI

Machine Learning was developed over the last few decades to build expert systems that learn patterns from historical data and build models, so they can be embedded and operate in the real world without exhaustive manual rules or code for every possibility. Standard Machine Learning techniques require a data scientist to try algorithms such as Deep Learning, XGBoost, Random Forest, etc., on historical data to extract the features that influence some known outcome, such as churn or adoption of a product. The data scientist cleanses the data, does manual feature engineering (modifying columns of data through binning, encoding, etc.), and tunes algorithms to maximize the accuracy of the models. Common uses of Machine Learning include:

  • To build high-accuracy models that perform reasonably well in production on data they have not seen before.
  • In some cases, models are built just to explain the relationship between features and outcomes.
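
To make the manual feature engineering step mentioned above more concrete, here is a minimal sketch of binning a numeric column and one-hot encoding a categorical one in plain Python. The column names (age, plan), the bin edges and the helper names `bin_value` and `one_hot` are all illustrative assumptions, not taken from any particular tool.

```python
from bisect import bisect_right

def bin_value(value, edges):
    """Return the index of the bin that `value` falls into.

    `edges` is a sorted list of cut points. A value below the first edge
    lands in bin 0; a value at or above the last edge lands in bin len(edges).
    """
    return bisect_right(edges, value)

def one_hot(value, categories):
    """Encode a categorical value as 0/1 flags, one flag per known category."""
    return [1 if value == category else 0 for category in categories]

# Hypothetical customer rows: (age, plan). Both columns are made up for illustration.
rows = [(23, "basic"), (41, "premium"), (67, "basic")]
age_edges = [30, 50]           # three bins: under 30, 30 to 49, 50 and over
plans = ["basic", "premium"]   # known categories for the plan column

# Each engineered row is [age_bin, is_basic, is_premium].
features = [[bin_value(age, age_edges)] + one_hot(plan, plans) for age, plan in rows]
print(features)  # [[0, 1, 0], [1, 0, 1], [2, 1, 0]]
```

Choosing the edges and categories by hand for every column is exactly the kind of work that adds up over a large dataset.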

Artificial Intelligence (AI) is a much broader umbrella than Machine Learning: you can think of Machine Learning as an application of Artificial Intelligence. The lines blur all the time, given that Machine Learning is the foundation for AI, which keeps evolving and pushing the boundaries every day. For now, I’d like to call it AI/ML in this blog.

Why AI/ML?

AI/ML can automate your business, remove inefficiencies and grease the wheels. It works well in the era of big data, where you need to make expert decisions on large volumes of data arriving each day. Loan approvals, churn prevention, fraud detection, product recommendations, targeted marketing campaigns, etc., can be automated, and the resulting decisions can be explained and accounted for using advanced tools like MLI (Machine Learning Interpretability).

Costs & Challenges associated with AI/ML in production today

To deploy AI/ML in your enterprise, a business needs to invest in at least the following:

  • Cloud provisioning costs or on-prem infrastructure with GPUs, etc.
  • Data Scientists
  • Open Source or Proprietary AI/ML Software
  • Production application software, integration, DevOps personnel, etc.
  • Training

So, if this looks so cut and dried, why don’t most enterprises deploy tens, hundreds or thousands of AI/ML projects in production, like the big players?

Delays in AI/ML projects are often misconstrued today as being primarily related to data cleansing, data prep, data availability, etc. These are just means to an end. Many enterprises have already figured out how to create data lakes/warehouses with ‘good enough data’ and have invested in data cleansing/curating tools, yet only a fraction of them have quality models in production that perform well.

What’s stopping or slowing production AI deployment?

The real challenge today is that data scientists spend a lot of time struggling with feature engineering (not being sure which features are important or how to transform them suitably) to build good models without overfitting. They also have to balance this against the need to explain to the business what’s going on in their models.

Even if the architecture supports GPUs, not all algorithms run on GPUs, and there is no practical way to figure out manually which algorithm, or combination of algorithms and parameters, is suitable for a model. Weeks or months are spent on all of this, which keeps your highly paid data scientists focused on just a handful of problems while data accumulates in the data lake!

Why you shouldn’t deploy a low accuracy model in production …

Why not just deploy a low accuracy model that was found in the first week of the project? Low accuracy models create what is known as misclassification cost for the business. Here are some scenarios that can happen if your model doesn’t predict correctly. It’s not difficult to see how the costs of wrong decisions add up:

  • Missed potential fraud/hacking: money stolen, costs to customers, investigation/follow-up costs, PR
  • Wrong diagnosis/prognosis: higher hospital readmission costs as the disease progresses further
  • Missed non-compliance cases: fines from regulatory agencies
  • Incorrect recommendations: the customer doesn’t feel his/her needs are met, so stickiness is low. They’ll know you don’t get them or aren’t making an effort, and you won’t sell what you recommend
  • Missed churn signal: cost of customer reacquisition, asset diminishment (customer moves assets away), customer lost to competition
  • Promotional email to the wrong customer: the customer will ignore it, and the opportunity with a more relevant customer is lost, so revenue is lost

In general, every false positive and false negative in an AI/ML-assisted decision has a cost associated with it. It can affect your brand and customer perception of your business, and lead to missed opportunities to capture wallet share from a potential customer.
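
As a rough illustration of how these costs add up, the sketch below totals the cost of false positives and false negatives for a batch of yes/no predictions. The function name `misclassification_cost`, the labels and the dollar figures are assumptions chosen for the example; real per-error costs would come from your own business.

```python
def misclassification_cost(y_true, y_pred, fp_cost, fn_cost):
    """Total cost of wrong yes/no decisions.

    y_true, y_pred: sequences of 0/1 labels (1 = positive, e.g. fraud).
    fp_cost: cost of flagging a negative case as positive.
    fn_cost: cost of missing a positive case.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    false_positives = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    false_negatives = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
    return false_positives * fp_cost + false_negatives * fn_cost

# Illustrative numbers only: a missed fraud case is assumed to cost far more
# than a false alarm that needs a manual review.
actual    = [1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 0, 0]
cost = misclassification_cost(actual, predicted, fp_cost=50, fn_cost=2000)
print(cost)  # 1 false positive * 50 + 2 false negatives * 2000 = 4050
```

Because the two kinds of error rarely cost the same, a model with slightly lower accuracy can still be the cheaper one to run if it makes fewer of the expensive mistakes.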

If you see a data scientist struggling for weeks or hesitant about productionizing models, there is a pretty good chance he or she is trying to minimize the misclassification cost of the model!

How can Auto ML/Automatic Feature Engineering help?

The goal of Automatic ML/Feature Engineering is to give your business the most accurate model possible and make the data scientist super-productive. It’s like strapping a powered exoskeleton or rocket engines to your data scientist so they can go faster on AI projects and finish ahead of time.


With a lot less effort, your data scientist can create tens or hundreds of models automatically and let AI pick the model with the highest accuracy possible. The result is better use of the data scientist’s time and lower misclassification costs. Instead of spending weeks or months of model development on one project, the data scientist can work on multiple projects, with high accuracy models built within hours or days. As new data comes in, models can be retrained quickly to keep them fresh in production. Part of the automation is also auto-deploying the model to production once it has been tested in the framework.
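
The idea of generating many candidate models and keeping the best can be sketched with a toy search loop. This is not how H2O's products work internally; it is a deliberately tiny stand-in in which each "model" is a one-rule threshold on a single feature, and the data, feature meanings and function names are hypothetical.

```python
from itertools import product

def stump_predict(row, feature, threshold):
    """Predict 1 when the chosen feature exceeds the threshold, else 0."""
    return 1 if row[feature] > threshold else 0

def accuracy(rows, labels, feature, threshold):
    """Fraction of rows that the one-rule model classifies correctly."""
    hits = sum(stump_predict(row, feature, threshold) == label
               for row, label in zip(rows, labels))
    return hits / len(rows)

def search_best_stump(rows, labels, thresholds):
    """Score every (feature, threshold) candidate and return the best one."""
    candidates = product(range(len(rows[0])), thresholds)
    return max(candidates, key=lambda c: accuracy(rows, labels, c[0], c[1]))

# Hypothetical holdout data: [monthly_logins, support_tickets] -> churned (1) or not (0)
holdout_rows = [[2, 5], [30, 0], [4, 3], [25, 1], [1, 4], [28, 2]]
holdout_labels = [1, 0, 1, 0, 1, 0]

best_feature, best_threshold = search_best_stump(
    holdout_rows, holdout_labels, thresholds=[1, 2, 3]
)
print(best_feature, best_threshold)  # 1 2  (support_tickets > 2)
```

A real AutoML system searches a far larger space (algorithms, hyperparameters, engineered features) and fits each candidate on training data before scoring it, but the loop has the same shape: enumerate candidates, score each one on held-out data, keep the winner.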

In the end, across many production applications, every single yes/no decision, categorization or estimate is made with fewer errors and with context and relevance to your customer, boosting your ROI.

Want to try Auto ML and Automatic Feature Engineering with your data? Request a demo of H2O’s AI Cloud.


Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition. Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions, using an eclectic combination of algorithms drawn from different domains. Over the years he has published 50+ blogs on “all things data science” for the business audience on the LinkedIn, Forbes and Medium publishing platforms, and he speaks at vendor data science conferences. He also holds multiple patents around Desktop Virtualization and Ad networks, and was a co-founding member of two startups in Silicon Valley.