July 25th, 2019

A Driverless Approach to Make Forecasting Easy — Part 1

Category: H2O Driverless AI

You are from the supply chain department or in a role in charge of creating future estimates on product sales, patient admissions, retail store staffing, energy use, ticket sales, etc., based on historical data. A common problem is to forecast numbers one week, 4 weeks, 6 months, or 1–5 years into the future: basically, short-term and long-term forecasts. Needless to say, well-informed forecasting lets you create optimized budgets, avoid excess inventory and wasteful expenditure, and in general plan for success and profitability.

For data scientists and business analysts who have done time-series forecasting, some of the reasons it takes an inordinate amount of time to get to an acceptable and repeatable solution may sound familiar.

  • Getting the data in a good format is always a common challenge. But that’s just the tip of the iceberg.
  • Understanding seasonality, short-term and long-term trends, and holidays, and factoring them into the model.
  • Dealing with categorical data.
  • Creating lag features — Which lag features should I build and use?
  • What algorithms and feature transformers should I choose? How do I pick the best one for the business problem? How do I tune the models? How do I slide bleeding-edge algorithms into the prediction process without much effort?
  • How do I explain my future point predictions to the business? My model predicts that sales will be $1.5M 10 days from today, while the remaining days average $0.5M. Why is it deciding that way, and what did it infer from the historical data?
  • How do I constantly retrain my model and make it production deployable?

The list above covers the common issues, if not all of them. In this blog series, I will try to address some of them. The rest of the topics will be covered in Part 2 of the blog post.

I used Driverless AI 1.7.0, the latest version from H2O.ai, to forecast Google’s stock price for the month of July 2019. It’s a simple univariate forecast, which makes it a good first example.

Disclaimer: I do not recommend that you trade stock or derivatives based on the example here. It’s for illustration only. Future values are never guaranteed or deemed to be reliable by any model, so …

Driverless AI is an Automatic Machine Learning/Automatic Feature Engineering tool that can do time series forecasting in addition to regression, classification, etc. It not only has built-in time series recipes from Kaggle Grandmasters; you can also bring your own algorithms and feature engineering code to enhance the model building process, aka BYOR (Bring Your Own Recipe). Through the custom recipe feature, you can bring in additional algorithms and feature engineering code such as Auto ARIMA, FB Prophet, etc., and run the Automatic ML to predict a target value.

Here’s a link to the open-source BYOR GitHub (we will use some of this in Part 2 of the blog post): https://github.com/h2oai/driverlessai-recipes

For my experiment, I downloaded 5 years of daily Google stock price data from the Yahoo! Finance portal.

We split the downloaded CSV into training and test data sets using a spreadsheet (XLS).

We will build the Driverless AI model on a training data set with daily closing stock price from:

07/23/14 to 06/28/19

Columns Dropped: Open, High, Low, Adj Close, Volume

Target Column: Close

Time Column: Date

The test data set has values from:

07/01/19 to 07/23/19
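The same date-based split can also be scripted instead of done by hand in a spreadsheet. The following is a minimal sketch, assuming pandas and the Yahoo! Finance column layout listed above; the `split_by_date` helper and the sample rows are hypothetical (the prices are made up, not real quotes).

```python
import io

import pandas as pd

# Made-up rows in the Yahoo! Finance column layout (not real quotes).
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2019-06-27,100.0,102.0,99.0,101.0,101.0,1000
2019-06-28,101.0,103.0,100.0,102.0,102.0,1100
2019-07-01,102.0,104.0,101.0,103.0,103.0,1200
2019-07-02,103.0,105.0,102.0,104.0,104.0,1300
2019-07-24,104.0,106.0,103.0,105.0,105.0,1400
"""


def split_by_date(df, train_end, test_start, test_end):
    """Keep only the time and target columns, then split chronologically."""
    df = df[["Date", "Close"]].sort_values("Date")
    train = df[df["Date"] <= pd.Timestamp(train_end)]
    test = df[
        (df["Date"] >= pd.Timestamp(test_start))
        & (df["Date"] <= pd.Timestamp(test_end))
    ]
    return train, test


raw = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Date"])
train, test = split_by_date(raw, "2019-06-28", "2019-07-01", "2019-07-23")

# Each frame could then be written out and uploaded as its own data set,
# e.g. train.to_csv("train.csv", index=False).
print(len(train), "training rows;", len(test), "test rows")
```

Splitting strictly by date, with no shuffling, keeps every test row later than every training row, which is what a forecasting experiment needs.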

We chose a forecast horizon of 16 days (you can obviously play with this number) and SMAPE as the scorer. The Driverless recipe picked LightGBM and XGBoost, along with potential feature engineering that could be used in the Automatic ML model/feature selection. After you click “Launch Experiment” (the horizontal yellow bar) and the experiment finishes, you will see a screen similar to the one below. Clearly, the features for the final model are Exponential Moving Averages and some Derived Date Features.
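To give a feel for what exponential moving average and derived date features look like, here is a rough sketch in pandas. It is not Driverless AI's internal implementation: the `add_features` helper, the spans, and the column names are assumptions chosen for illustration, and the price series is synthetic. The one idea it tries to show is that, with a 16-step horizon, a feature should only use values at least 16 steps old.

```python
import numpy as np
import pandas as pd

HORIZON = 16  # forecast horizon in rows (trading days), as in the experiment


def add_features(df, horizon=HORIZON, spans=(5, 20)):
    """Add lagged exponential moving averages and simple date features."""
    out = df.copy()
    # Shift by the horizon first, so each feature only uses values that
    # would already be known when forecasting `horizon` steps ahead.
    lagged = out["Close"].shift(horizon)
    for span in spans:
        col = f"ema_{span}_lag_{horizon}"
        out[col] = lagged.ewm(span=span, adjust=False).mean()
    # Derived date features taken from the time column.
    out["day_of_week"] = out["Date"].dt.dayofweek
    out["month"] = out["Date"].dt.month
    return out


# Synthetic business-day series, purely for illustration.
dates = pd.bdate_range("2019-01-01", periods=40)
prices = pd.DataFrame({"Date": dates, "Close": np.linspace(100.0, 139.0, 40)})
features = add_features(prices)
print(features.tail(3))
```

The first 16 rows of each moving-average column are empty because no sufficiently old value exists yet for them.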

We can then Click on “SCORE ON ANOTHER DATASET” and pick the Test Data Set to score and chart the actual vs predicted.

Here’s how my model predicted vs actual on the test data set.

[Close], blue is the original value.

[Close.predicted], green is what the model guessed for the test set, by only learning from the training set and not looking at the test set 🙂

It’s stunning to see that, with a difference of approximately $25, it caught the upward trend three weeks out!

So, how exactly does the time series recipe in Driverless AI work? See link here.
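Once the scored test set is downloaded, the gap between the two columns can be quantified directly. The sketch below assumes a frame with `Close` and `Close.predicted` columns as in the chart above, and uses one common definition of SMAPE (absolute error divided by the mean of the absolute actual and predicted values, averaged and expressed as a percentage); the exact formula used by the product's scorer may differ. The numbers are made up and are not the results from this experiment.

```python
import numpy as np
import pandas as pd


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    diff = np.abs(actual - predicted)
    denom = (np.abs(actual) + np.abs(predicted)) / 2.0
    # Where both values are zero the error is zero; avoid dividing by zero.
    safe_denom = np.where(denom == 0, 1.0, denom)
    return float(100.0 * np.mean(diff / safe_denom))


def mean_abs_error(actual, predicted):
    """Average absolute difference, in the target's own units (dollars here)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


# Made-up scored output, standing in for the downloaded predictions.
scored = pd.DataFrame(
    {"Close": [100.0, 110.0, 120.0], "Close.predicted": [90.0, 110.0, 130.0]}
)
score = smape(scored["Close"], scored["Close.predicted"])
mae = mean_abs_error(scored["Close"], scored["Close.predicted"])
print(f"SMAPE: {score:.2f}%  MAE: {mae:.2f}")
```

SMAPE is scale-free, so it is comparable across series, while the mean absolute error stays in dollars and is easier to explain to the business.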

The stock price example can be extended to any time series data to create future estimates!

What about Custom Recipes?

When Algorithms and Features compete, your forecast project(s) wins!

In this blog, we did not talk about adding other traditional and popular time series algorithms and transformers like FB Prophet, AutoArima, MACD, etc. Adding those would make Driverless AI try even more algorithms, do model tuning, and do evolutionary feature engineering on new features, which could lead to more accuracy in your final model. I plan to cover the Custom Recipe settings for forecasting problems in Part 2 of A Driverless Approach to Make Forecasting Easy.

Want to play with Driverless AI on your time-series data? Here’s the 21-day trial link

About the Author

Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition.

Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions by using an eclectic combination of algorithms drawn from different domains. Over the years, he has published 50+ blogs on “all things data science” on the LinkedIn, Forbes, and Medium publishing platforms for the business audience, and he speaks at vendor data science conferences. He also holds multiple patents around desktop virtualization and ad networks, and was a co-founding member of two startups in Silicon Valley.
