July 25th, 2019
A Driverless Approach to Make Forecasting Easy - Part 1
Category: H2O Driverless AI
By: Karthik Guruswamy
You are from the supply chain department, or in a role in charge of creating future estimates of product sales, patient admissions, retail store staffing, energy use, ticket sales, etc., based on historical data. A common problem is forecasting numbers one week, 4 weeks, 6 months, or 1 to 5 years into the future: basically, short-term and long-term forecasts. Needless to say, well-informed forecasting allows you to create optimized budgets, avoid excess inventory and wasteful expenditure, and in general plan for success and profitability.
What are the common challenges in AI/ML Forecasting?
For data scientists and business analysts who have done time-series forecasting, some of this may sound familiar as to why it takes an inordinate amount of time to get to an acceptable and repeatable solution.
- Getting the data in a good format is always a common challenge. But that’s just the tip of the iceberg.
- Understanding seasonality, short-term and long-term trends, and holidays, and factoring them into the model.
- Dealing with categorical data.
- Creating lag features: which lag features should I build and use?
- Which algorithms and feature transformers should I choose? How do I pick the best one for the business problem? How do I tune the models? How do I slide bleeding-edge algorithms into the prediction process without much effort?
- How do I explain my future point predictions to the business? Say my model predicts that sales will be $1.5M 10 days from today, with the rest of the days averaging $0.5M. Why is it deciding that way, and what did it infer from historical data?
- How do I constantly retrain my model and make it production deployable?
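To make the lag-feature question in the list above concrete, here is a minimal sketch of building lag and rolling-mean features by hand with pandas. The function name, the column names, and the sales numbers are hypothetical and for illustration only; this is not how Driverless AI builds its features internally.

```python
import pandas as pd


def add_lag_features(df, target, lags, windows):
    """Return a copy of df with lagged and rolling-mean columns of `target`."""
    out = df.copy()
    for lag in lags:
        # Value of the target `lag` rows earlier.
        out[f"{target}_lag_{lag}"] = out[target].shift(lag)
    for window in windows:
        # Mean of the previous `window` values. shift(1) keeps the current
        # row's own target out of its feature, which would leak the answer.
        out[f"{target}_rollmean_{window}"] = (
            out[target].shift(1).rolling(window).mean()
        )
    return out


# Hypothetical daily sales figures, for illustration only.
sales = pd.DataFrame(
    {"sales": [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]},
    index=pd.date_range("2019-01-01", periods=6, freq="D"),
)
features = add_lag_features(sales, "sales", lags=[1, 2], windows=[3])
print(features)
```

The first rows of each new column are empty (NaN) because there is no history to look back on yet. Choosing which lags and windows to keep is exactly the part that is tedious to do manually.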
The above is a list of common challenges, though not an exhaustive one. In this blog series, I will try to address some of these issues; the topics not covered here will be covered in Part 2 of the blog post.
A simple stock price forecast model:
I used H2O.ai’s latest version of Driverless AI, 1.7.0, to forecast Google’s stock price for the month of July 2019. It’s a simple univariate forecast, and a great way to see how to do that first.
Disclaimer: I do not recommend that you trade stock or derivatives based on the example here. It’s for illustration only. Future values are never guaranteed or deemed to be reliable by any model, so …
Driverless AI is an Automatic Machine Learning/Automatic Feature Engineering tool that can do time series forecasting, besides regression, classification, etc. Not only does it have built-in time series recipes from Kaggle Grandmasters, but you can also bring your own algorithms and feature engineering code to enhance the model building process, aka BYOR (Bring Your Own Recipe). Through the custom recipe feature, you can bring in additional algorithms and feature engineering code, such as Auto ARIMA, FB Prophet, etc., and run the Automatic ML to predict a target value.
Here’s a link to the open-source BYOR GitHub (we will use some of this in Part 2 of the blog post): https://github.com/h2oai/driverlessai-recipes
For my experiment, I downloaded 5 years of daily Google stock price data from the Yahoo! Finance portal.
We split the downloaded CSV into training and test data sets in a spreadsheet (XLS).
We will build the Driverless AI model on a training data set with daily closing stock prices from 07/23/14 to 06/28/19:
- Columns dropped: Open, High, Low, Adj Close, Volume
- Target column: Close
- Time column: Date
The test data set has values from 07/01/19 to 07/23/19.
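The same split can also be done in code instead of a spreadsheet. The sketch below assumes the Yahoo! Finance column layout listed above; the rows are made-up placeholders rather than real prices, and `split_by_date` is a hypothetical helper name.

```python
import io

import pandas as pd

# Placeholder rows in the Yahoo! Finance column layout; the prices are made up.
raw_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2019-06-27,100.0,101.0,99.0,100.5,100.5,1000\n"
    "2019-06-28,100.5,102.0,100.0,101.5,101.5,1100\n"
    "2019-07-01,101.5,103.0,101.0,102.5,102.5,1200\n"
    "2019-07-02,102.5,104.0,102.0,103.5,103.5,1300\n"
)


def split_by_date(prices, time_col, cutoff):
    """Rows up to and including `cutoff` go to train, later rows go to test."""
    ordered = prices.sort_values(time_col)
    train = ordered[ordered[time_col] <= cutoff]
    test = ordered[ordered[time_col] > cutoff]
    return train, test


prices = pd.read_csv(raw_csv, parse_dates=["Date"])

# Keep only the time column and the target; the other columns are dropped.
prices = prices[["Date", "Close"]]

train, test = split_by_date(prices, "Date", pd.Timestamp("2019-06-28"))
print(len(train), "training rows,", len(test), "test rows")
```

Splitting by date rather than at random matters for time series: the test set must lie strictly after the training set, otherwise the model gets to peek at the future.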
We chose a forecast horizon of 16 days (you can obviously play with this number) and chose SMAPE as the scorer. The Driverless recipe picked LightGBM and XGBoost, along with potential feature engineering that could be used in the Automatic ML model/feature selection. After clicking “Launch Experiment” (the horizontal yellow bar), you will see a screen similar to the one below once the experiment has finished. Clearly, the features for the final model are Exponential Moving Averages and some derived date features.
We can then click on “SCORE ON ANOTHER DATASET” and pick the test data set to score, then chart the actual vs. predicted values.
Here’s how my model’s predictions compared with the actual values on the test data set.
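For readers who want a feel for what exponential moving averages and derived date features look like, here is a rough pandas sketch. It only illustrates the general idea; the function name, column names, spans, and prices are assumptions of mine, and Driverless AI's own transformers are likely to differ in their details.

```python
import pandas as pd


def add_ema_and_date_features(df, time_col, target, spans):
    """Return a copy of df with EMA columns of `target` and calendar columns."""
    out = df.copy()
    for span in spans:
        # EMA over past values only: shift(1) excludes the current row's target.
        out[f"{target}_ema_{span}"] = (
            out[target].shift(1).ewm(span=span, adjust=False).mean()
        )
    # Calendar features derived from the time column.
    out["day_of_week"] = out[time_col].dt.dayofweek  # Monday is 0
    out["day_of_month"] = out[time_col].dt.day
    out["month"] = out[time_col].dt.month
    return out


# Made-up closing prices, for illustration only.
closes = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2019-07-01", "2019-07-02", "2019-07-03"]),
        "Close": [100.0, 102.0, 104.0],
    }
)
engineered = add_ema_and_date_features(closes, "Date", "Close", spans=[3])
print(engineered)
```

An EMA weights recent observations more heavily than older ones, so it follows a trend more quickly than a plain rolling mean with a comparable window.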
[Close], blue is the original value.
[Close.predicted], green is what the model guessed for the test set, by only learning from the training set and not looking at the test set 🙂
It’s stunning to see that, with a difference of approximately $25, it caught the upward trend 3 weeks out!
So, how exactly does the time series recipe in Driverless AI work? See the link here.
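If you export the scored test set, you can quantify the gap between [Close] and [Close.predicted] yourself. The sketch below computes the mean absolute error and SMAPE using one common SMAPE definition; the exact formula behind the Driverless AI scorer may differ slightly, and the numbers here are made up rather than taken from the experiment.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Treat a pair of zeros as a perfect match to avoid dividing by zero.
        total += 0.0 if denom == 0 else 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)


# Made-up values standing in for the Close and Close.predicted columns.
close_actual = [100.0, 200.0]
close_predicted = [110.0, 190.0]

mae_value = mean_absolute_error(close_actual, close_predicted)
smape_value = smape(close_actual, close_predicted)
print(f"MAE: {mae_value:.2f}, SMAPE: {smape_value:.2f}%")
```

MAE is in the same units as the target (dollars here), while SMAPE is a percentage, which makes it easier to compare across series with very different price levels.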
The stock price example can be extended to any time series data to create future estimates!
What about Custom Recipes?
When Algorithms and Features compete, your forecast project(s) wins!
In this blog, we did not talk about adding other traditional and popular time series algorithms and transformers like FB Prophet, AutoArima, MACD, etc. Adding those would make Driverless AI try even more algorithms, do model tuning, and do evolutionary feature engineering on new features, which could lead to more accuracy in your final model. I plan to cover the Custom Recipe settings for forecasting problems in A Driverless Approach to Make Forecasting Easy - Part 2.
Want to play with Driverless AI on your time-series data? Here’s the 21-day trial link