ON DEMAND
Accuracy Masterclass Part 4 - Time Series Modeling
This webinar will dive deeper into workflows for time series modeling such as forecasting. We'll show how to make sure that temporal causality is preserved during modeling, how to automatically generate lag-based features, how to deal with trends, and how to measure the performance of the model as time advances.
Hi, everybody, I'm Megan Kurka. I'm a data scientist at H2O, and I've worked at H2O for about six years. Today we're going to be discussing time series and how to build time series models with H2O. So with that, I'm going to start sharing my screen and go over the agenda and what we're going to do today. In today's talk, we're first going to cover what time series modeling is, how it works, and why it may be important to you. Then we're going to talk specifically about how to build time series models. We're going to focus on Driverless AI in today's call, which is our automated machine learning platform at H2O. Then we're going to talk about how we validate those models, and finally, how we deploy those models to use them in a real-life use case in production. So first, we're going to be talking about time series modeling.
So what is time series? At a very simple definition, a time series is really a series of data points ordered by time. So I just have information and some time component for when it happened. Why this is important is that learning from the past can provide insights into the future and helps to answer this important question of what will happen next. There are many different use cases in real life where time series can be applicable. A really common use case might be forecasting revenue. We have data over time on how much revenue has been made historically. Can we use that to project sales or revenue going forward?
Same thing with demand and the supply chain. We have information about how much was sold historically; can we use that information to project how much will be demanded in the future? And think about other use cases. For example, in healthcare, can we predict how many beds will be utilized at a hospital in the next day, next week, next month, or next year? So all of this is pretty flexible. The use case, the time units, and the forecast horizon are very flexible and very dependent on the business use case. So that's what time series is at a high level. We're going to talk more from a practical perspective about how to build time series models to help with that real-life business use case. So here are the general steps for modeling. There are five high-level steps, and we're going to be talking about them from a time series perspective.
But these are pretty much the steps we would do no matter what. The first thing to do is identify the business use case. Why do I want to build a model? Why is that important to me? What am I trying to fix and solve in real life? The second step is to prepare the data. How do we set up the data set to match this business problem? The third step is to build models. We're going to be focusing today on Driverless AI, which is our automatic machine learning platform. It uses best practices that have been developed by our top data scientists and are baked into the product. So we're going to see how we can automate some of the more tedious parts of the model building process with Driverless AI. The next step is to validate these models. Once I have a ton of models that I've built, how do we see which one we want to use?
Which one does well over time? Which one does well on more recent data? How accurate are our models? What inputs influence the model's decisions? So not just how well does the model perform, but what does the model learn about our use case and our data? And then finally, deployment. How do we deploy this model to automatically start forecasting for us? There are a lot of different ways to deploy and a lot of different meanings for deployment. But really, deployment is how I get this model, which is more theoretical, into a setup where we can take actionable steps for business value. So we're going to talk more about that as well. These are the top five steps for modeling in general. The first step is identifying the business use case. This is the most important step, in my opinion. We really want to start with the pain point that we want to solve. What I'm saying is, start with the why, then work your way back to the how. What do we need to fix?
What do we need to improve? Then let's figure out how to do it. For today's talk, we're not going to be doing a real-life use case, but we'll do a pretend use case where I own a small chain of coffee shops. I know that we waste about 60% of our inventory on average by oversupplying, and the waste accounts for about $10,000 of unnecessary cost per month. So my pain point is that I want to reduce waste because I want to reduce unnecessary spend. So I have identified this first step, which is the business use case. That's really the most important, and if you can put dollar values around it, that's even better. I've figured out what my issue is. Now how do I solve that issue? We're going to work backwards. I have the pain point. Now I'm going to define the use case: I want to forecast the quantity sold each day per store.
That way I can limit waste. If I can accurately predict how much of each product is sold per day, then I can start, ideally, customizing the amount of each product that we make to minimize this unnecessary cost. Now our second step is to prepare the data. This can be almost the hardest part, and it's really tightly tied to your business use case. Once you know your use case and your pain points, it really sets you up for success, because you'll pretty much know exactly what to do. We need to take raw data and make it ready for modeling. Let's take our use case: we're trying to reduce waste at this coffee shop. We're going to have raw data like sales receipts. The first thing we need to do is aggregate the data so it's in line with my use case. What that really means is you want to say out loud, in a sentence, what you're looking to predict, and that's going to answer how to aggregate your data. So my use case is that I want to know how much we sell of each product for each store for each day.
That's how I have to group my data. I want one record per product per store per day. If your use case is different, let's say we want to know how many beds will be filled at a hospital over the next three days, I have my answer: I want to aggregate total beds per hospital over the next three days. So by verbalizing or writing down what your use case is, it becomes very clear how to set up your data. You take your raw data and do that group-by step, and the next step is to augment. Is there any additional data that might be helpful in predicting this use case? We have our use case of waste: how much are we selling the next day? But maybe there's other information, not just the sales receipts, that might be helpful. Maybe if I know what the weather will be like tomorrow, that might be helpful. Maybe people are less likely to buy baked goods when it's raining out, or maybe they're more likely. Maybe if I note that we're doing a marketing promotion, that's helpful as well. Maybe we see a huge uptick in sales if we do a specific kind of marketing promotion.
So now you're going to start to review all of the other data that's available to see if it might be helpful. We'll talk more about Driverless later, but the value of Driverless is that you can give it more data and it will determine which features are helpful and which are not. And then the last step is to understand your data. We're going to review all the columns in the data set and figure out what we know in advance and what we don't. We're basically going to go through the columns and figure out where each one lives. Some things we'll know in advance; maybe marketing promotions are planned ahead of time. But we don't know the weather tomorrow. So we have to start to identify how this can be used in real life: what will we know, and what is going to cause leakage? So here's our raw data. We have some sales receipts: the transaction ID, the date, the time, the store, the product, and how much was sold. We're not ready for modeling yet. We have this raw data, and it's not in line with our goal. I don't want to know the quantity sold for each transaction; I want to know how much of each product will be sold at each store. So my first step is to aggregate: I want to group the data by the date, by the store, and by the product, and calculate the total quantity.
That's my first step. I want to make sure that this data set is in line with my use case. The important reason for that is it's much easier to predict how much will be sold the next day than it is to try to predict how much every transaction will sell. That's just a really hard problem, almost impossible, I would say. So you want to make it as easy as possible and really make sure that whatever we're solving for is exactly in line with our use case so we can measure the model performance. Okay, we've done our aggregation; that's in the dark gray. I've grouped the data by transaction date, store, and product, and I've calculated the total sold. Now I've got to look and see if there's any other data that might be valuable. For example, maybe the description that we have of the product might be predictive. Maybe the current wholesale price might be predictive. Maybe a marketing promotion or a new product flag is predictive. We can add all these features now to see what's helpful.
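As a rough illustration of the aggregation step described above, here is a minimal pandas sketch. The column names (`transaction_date`, `sales_outlet_id`, `product`, `quantity`) and the sample rows are assumptions made up for this example; they are not taken from the actual data set shown in the webinar.

```python
import pandas as pd

# Hypothetical raw receipts: one row per transaction line (made-up values).
receipts = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4, 5],
    "transaction_date": ["2019-04-01", "2019-04-01", "2019-04-01",
                         "2019-04-02", "2019-04-02"],
    "sales_outlet_id": [3, 3, 5, 3, 3],
    "product": ["Ginger Scone", "Ginger Scone", "Hazelnut Biscotti",
                "Ginger Scone", "Ginger Scone"],
    "quantity": [2, 1, 1, 3, 2],
})
receipts["transaction_date"] = pd.to_datetime(receipts["transaction_date"])

# One record per date, store, and product, with the total quantity sold.
daily = (
    receipts
    .groupby(["transaction_date", "sales_outlet_id", "product"], as_index=False)["quantity"]
    .sum()
    .rename(columns={"quantity": "quantity_sold"})
)
print(daily)
```

The grouping keys are exactly the words in the use case sentence: per day, per store, per product.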
For anyone who wants to follow along, this is a public data set on the Kaggle datasets website, so feel free to try it out on Driverless AI later. The third step for preparing our data is to understand our columns. We have to go through everything now and figure out where everything fits. Our first three columns, the transaction date, the store, and the product, are all grouping columns. We want to answer at that level: we want to know the quantity sold per transaction date, store, and product. Now, there are other features in our data set, like the product description, which we know in advance. In this case, we know in advance how we describe the product. And there might be some features we don't know in advance, like the current wholesale price. How much does it actually cost to make this product? That will change over time, and we might not be able to predict that.
Or know the next day's wholesale price. And then finally, we have our target, which is the total quantity sold. The idea is that this data set should look something like how you want your answer to be provided. You want to be able to say: for this date, for this store, for this product, we predict the quantity sold to be about eight, ideally, if we get this completely right. So that's the sanity check: does this match, again, my business use case? Once we have our data set ready, we're going to move to the modeling phase. Our goal is really to see if we can learn from past behavior to predict the future. Here I have a time series graph, and we can see there are a couple of different components as we look at the history. One is a general trend, where maybe my coffee shops are growing in popularity over time. And then there's some cyclical nature. Maybe the number of sales peaks on the weekend, drops on Monday, and then slowly grows back up.
So there's definitely a clear pattern forming as we look at the data historically. The goal of time series modeling is to leverage that pattern. We want to take what's known and what we found in the past and use it to forecast into the future. And again, for our use case today, if we know how much will be sold tomorrow, next week, or next quarter, we can, first, more appropriately stock our inventory and, second, more appropriately bake the right number of goods to minimize waste. So now that we've talked a little bit about the business use case and setting up our data to make sure it's in line, we're going to talk about Driverless AI, how it works, and how it handles time series models.
So Driverless AI is our automated machine learning product. It automatically builds models based on your use case to try to improve some model metric. It's really going to apply the best practices of a data scientist under the hood. For time series specifically, it's still going to do that automated modeling, but through the lens of a forecasting problem. When we do this demo today, we're going to see that the way Driverless approaches modeling is very time series specific. It's going to automatically build features that look into the past, and it's going to automatically validate our model into the future. So it keeps that whole temporal aspect in mind as it builds models. The other thing that Driverless AI offers from the modeling perspective is leaderboards. Driverless has many different settings that it can try, which we'll go over. The leaderboard mode builds models with different settings to see which one does better. Perhaps a simple linear model might just be better overall, or perhaps a more complex model. It's going to try different settings and give you a leaderboard to examine.
The other thing that Driverless AI offers is automatic post hoc analysis. It offers visual diagnostics and interpretability for forecasting models. When you build a time series model, you're often going to get a metric of how well it performed. So on average, I was off by maybe three product units, or $5. But it's also important to understand how the model is generating insights. What is driving the predictions? Is there some kind of cyclical nature to our model that is really important for other pieces of the business to understand? Are there some features that drive the model predictions? And then diagnostics: how does our model perform? Are certain products easier to forecast than others? Maybe some seasonal cakes are really easy to forecast because they're always very popular around a certain time, but maybe other things are harder. And also, where do we perform really poorly? Where do we do badly, and why? Is it okay that we did badly? Is there something we can do to fix it? How do we handle those anomalies? This is all automatically offered in Driverless AI.
So again, we have our raw data, and we're going to take this data and try to figure out the quantity sold. Suppose I build my model without looking into the past; essentially, I just treat this like a regular IID use case and build a tree-based model on top of it. In this example, I'm off by about four and a half units, so the model is not too great. If I use Driverless AI, it will start augmenting my data with lookback features, or lags. It does this automatically; I don't need to guide it. It will start to say, okay, maybe the quantity sold yesterday is predictive, maybe what was sold two days ago is predictive, maybe what was sold last week, maybe whether the date is a holiday, maybe what the wholesale price was yesterday. So it's going to start creating these features: lags, holiday features, date-time units, and moving averages. Is there a general trend where we see some stores just growing in popularity because of their location? It's going to create all of these features for me automatically, with the hope that they will help improve the prediction of quantity sold. And it's going to use a traditional machine learning approach on top of this data, so we'll typically see XGBoost and LightGBM algorithms get utilized. But they're going to be utilized on top of data that's already been given historical context.
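To make the idea of lags and moving averages concrete, here is a small pandas sketch of what such lookback features could look like if built by hand. It is only an illustration under assumed column names and made-up values, and it assumes one row per group per day with no missing dates; it is not how Driverless AI generates its features internally.

```python
import pandas as pd

# Hypothetical daily data, already aggregated per date, store, and product.
daily = pd.DataFrame({
    "transaction_date": pd.to_datetime(
        ["2019-04-01", "2019-04-02", "2019-04-03", "2019-04-04"] * 2),
    "sales_outlet_id": [3] * 4 + [5] * 4,
    "product": ["Ginger Scone"] * 8,
    "quantity_sold": [8, 10, 12, 9, 2, 4, 3, 5],
})

group_cols = ["sales_outlet_id", "product"]
daily = daily.sort_values(group_cols + ["transaction_date"]).reset_index(drop=True)
by_group = daily.groupby(group_cols)["quantity_sold"]

# Lags: what was sold one and two days ago for the same store and product.
daily["lag_1"] = by_group.shift(1)
daily["lag_2"] = by_group.shift(2)

# Moving average of the two previous days. Shifting first keeps the current
# day's value out of its own feature, so the target does not leak.
daily["moving_avg_2"] = by_group.transform(lambda s: s.shift(1).rolling(2).mean())

# A simple date-time unit: day of week (Monday is 0).
daily["day_of_week"] = daily["transaction_date"].dt.dayofweek
print(daily)
```

The first rows of each group have no history, so their lag values are missing; a model or a preprocessing step has to handle those gaps.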
So we're going to keep adding these columns that give more and more insight into what happened over time. When I run Driverless AI with what we call a time series recipe, which adds these historical features, we reduce the prediction error by one. So now we're only off by three and a half items. The stronger the temporal patterns in the data, the more you'll see that the lags, the lag-based recipe, and the moving averages are helpful. So when there's a really strong pattern, like the one we saw a couple of slides ago, that's where we're really going to see the value of looking back at what happened yesterday, last week, or last year. You're usually going to see a big improvement in model performance by looking back at these historical features. Okay, so I'm just going to talk quickly about how to set up this experiment, and then we'll jump into a demo and do it together. When you use Driverless AI, you're going to be brought to an experiment setup page where you answer questions about your use case. And again, these should really be in line with your business problem. If you have your business use case or your problem statement written out, we're just looking through it and answering those questions for Driverless AI so that it can build the correct model. So the first question is, what are the time group columns?
And this is really: what level do you want predictions at? In my example, I want to know the quantity sold per store, which is the sales outlet ID, per day, and per product. So I'm going to give those three columns to Driverless AI, so that it's aware of the level I want these predictions at. The second input I'm going to give is which columns are unavailable at prediction time. We were essentially doing this exercise together when we looked at our data: which columns may we not know at the time of scoring? Maybe I found that the weather that day was really predictive, but I don't actually know the weather in advance. So I need to tell Driverless AI that that information is unknown at scoring time, so look at the historical data instead. In other words: Driverless AI, don't use the weather today, use the weather yesterday. We basically want to make sure that once we build this model, we can actually use it, and that we're not using any information that we wouldn't have available when we productionize. The next thing we want to fill in is the forecast horizon. So how far ahead should we forecast? How many days, how many years?
How many months? How far in advance are we forecasting? I could try to predict tomorrow, but it might be more realistic that I need to forecast over the next seven days to give my baker enough time to prep. So we can play around with this value. Again, it should be in line with our use case to make sure that whatever we predict is actionable. And finally, the gap. Is there any gap you should account for when forecasting? A gap means that we don't want to forecast directly after our data ends; we want to forecast further into the future. One example of this could be that it takes one week to lay out the plan for how many goods should be made. So I don't want to forecast tomorrow; I want to forecast one week from today, seven days from today. So do you need a gap in your use case to make it actionable? Maybe nothing can be acted upon immediately, and we need to actually predict further out, where we can make a change.
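One way to picture the forecast horizon and the gap is as a time-ordered hold-out split. The sketch below is an assumption-laden illustration in plain Python: `split_with_gap` is a hypothetical helper, and a single hold-out window is a simplification, not the validation scheme Driverless AI actually builds.

```python
from datetime import date, timedelta

def split_with_gap(dates, horizon, gap=0):
    """Hold out the last `horizon` periods for validation and leave `gap`
    unused periods between the end of training and the start of validation."""
    if horizon < 1 or gap < 0:
        raise ValueError("horizon must be at least 1 and gap must not be negative")
    ordered = sorted(set(dates))
    if len(ordered) <= horizon + gap:
        raise ValueError("not enough history for this horizon and gap")
    valid = ordered[-horizon:]
    train = ordered[: len(ordered) - horizon - gap]
    return train, valid

# Hypothetical history: 30 consecutive days of data.
history = [date(2019, 4, 1) + timedelta(days=i) for i in range(30)]

# Forecast a 7-day horizon, with 2 days between the last training day
# and the first forecast day.
train_days, valid_days = split_with_gap(history, horizon=7, gap=2)
print(train_days[-1], valid_days[0], valid_days[-1])
```

Training never sees the gap or the validation window, which mirrors the production situation where the most recent days are not yet available when the forecast has to be made.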
Okay. The next thing we're going to fill in is our evaluation metric. How do we say that this model does well or not? There are a lot of different metrics for evaluating regression models like this, where we're trying to predict a number, the quantity sold. And again, we're going to try to make sure that this is in line with your use case. Really, when you're setting up an experiment or doing any modeling, all of the pieces that you pick should directly match your use case in some way. Root mean squared error is a really common metric for scoring regression models; it's called RMSE for short. One thing about it is that it's sensitive to outliers. So if it's really important that we predict outliers correctly, we probably want to use root mean squared error. Mean absolute error is another common performance metric, and it's going to be the average absolute error. So if, on average, I was three units off, that would be our mean absolute error. This is robust to outliers, meaning that if we use the mean absolute error metric, it's more important to get the behavior right on a typical occasion.
And then finally, median error rate, which is going to be the percent error. This is great if we have different scales across different groups. Maybe one store sells a lot and another store sells a little, but we still want to know, on average, the percent error that we're getting. So I was off by 10%, or I was off by 5%. We have more, but these are three of the common model metrics that we see being utilized. And again, it's really about how the metric lines up with your business use case. Okay. With that, I'm going to switch over to Driverless AI so we can set this up together and start the modeling process. Okay. So this is our AI Cloud, if anyone's familiar with it. What you can do is go to My AI Engines, which is going to start Steam, which will help me launch a Driverless AI instance. I have one already running, so I'm just going to open it up. Right, I've loaded my data. I have two data sets, a training data set and a testing data set. The testing data set consists of my last day of data.
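The three metrics mentioned above can be written out in a few lines of plain Python. This is a sketch with made-up numbers; in particular, the percent-error function below takes the median of the absolute percentage errors, which is an assumption about what "median error rate" means here and may differ in detail from the scorer in the product.

```python
import math
from statistics import mean, median

def rmse(actual, predicted):
    """Root mean squared error: squaring makes large misses count heavily."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))

def mae(actual, predicted):
    """Mean absolute error: the average number of units we are off by."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def median_percent_error(actual, predicted):
    """Median absolute percentage error, skipping rows where actual is 0."""
    return median(100.0 * abs(a - p) / abs(a)
                  for a, p in zip(actual, predicted) if a != 0)

# Made-up example: three good predictions and one large miss.
actual = [10, 10, 10, 10]
predicted = [9, 11, 10, 2]

print(mae(actual, predicted))                   # 2.5
print(round(rmse(actual, predicted), 3))        # 4.062
print(median_percent_error(actual, predicted))  # 10.0
```

The single large miss pulls RMSE well above MAE, while the median percent error is unaffected by it, which is the outlier sensitivity described above.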
And if we click on Details, we can see a little bit about this data set. We can see distributions for quantity sold and the products. I can look at the data set rows as well. So we can see this is my data set, and here's my target column, the quantity sold. Already, looking at it, we can identify that we want to group by the transaction date, the sales outlet ID, and the product. We know promotions and new products in advance; maybe we don't know the current wholesale price. So we take a look and get an understanding of our data set. What we're going to do with Driverless AI is predict. So we click on the Predict button. The first thing we'll fill in is our target column. What are we looking to solve for? We want to figure out the quantity sold. The next thing we're going to fill in is our time column. We're going to figure out the quantity sold in the future, so this time column is how Driverless AI knows how to look into the future.
And that's going to be our transaction date. When you give it a time column, it's going to automatically open up a panel on the right-hand side with the time series specific settings that we walked through. The first thing it's going to ask is how to group the data. I want to figure out the number of items sold per store and per product. That's what I need, because maybe there's one baker per store and they need to know how much of each product to make, right? So I have to group by those columns to get my prediction at that level. The next thing I'm going to fill in is what's unavailable at the time of prediction. Which columns do I not know about? Maybe we don't know, let's say, the current retail price and the current wholesale price. We don't know how much things are going to sell for in advance. Or, sorry, maybe just the wholesale price: we don't know how much it will cost to make this product tomorrow. Finally, we have our forecast horizon, which will be one; we want to know one day into the future. We don't need any gap in this use case. But if we did, we could start to increase this and indicate that we need a couple of days, maybe, to get this answer or to actually act on it.
You can also provide a test data set to determine how the model performs on holdout data that Driverless AI will not view until the model has been created, so there won't be any leakage. I'm going to lower these settings for the purpose of the demo. Accuracy and time control how many models are built and how complex these models are. I'm going to lower them for the demo, but you can usually use the default settings and see how it performs. It's going to try different settings, different feature engineering transformers, different models, and so on. And the last thing you want to set is the scorer. I'm going to optimize for mean absolute error, which means: on average, how many units off am I? One thing to note about the scorer: we picked mean absolute error, but it may be that in this use case, it's better to have a little bit of waste than to run out of things and have people unhappy that they couldn't get what they wanted.
So while we have some default scorers available, we can even upload custom scorers. If we say that it's 10 times worse, for example, to underpredict and have people unhappy, we can incorporate that in a custom scorer and make sure that, again, it's really in line with my use case. So we see that Driverless AI is building models, and as I've been talking, it has built about six. While it's building, you'll see this variable importance graph keep updating. The most important feature currently is the quantity sold yesterday for that product and sales outlet. And this makes intuitive sense: what happened yesterday is probably really indicative of what happens tomorrow. The other things that are coming up automatically are what happened 12 days ago and a moving average over time, so are the sales going up or down in general over time for this product and store? And it's going to automatically keep trying different features; each model tries different features.
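The "10 times worse to underpredict" idea can be expressed as an asymmetric error function. The sketch below shows only the scoring logic in plain Python; `asymmetric_mae` and its `under_penalty` parameter are hypothetical names for this example, and wrapping the function into an uploadable custom scorer recipe is not shown here.

```python
def asymmetric_mae(actual, predicted, under_penalty=10.0):
    """Mean absolute error where each unit of underprediction costs
    `under_penalty` times as much as a unit of overprediction."""
    total = 0.0
    for a, p in zip(actual, predicted):
        error = p - a
        if error < 0:
            total += under_penalty * (-error)  # we predicted too few: costly
        else:
            total += error                     # we predicted too many: some waste
    return total / len(actual)

# Made-up example: both forecasts miss by two units per day.
actual = [10, 10]
print(asymmetric_mae(actual, [12, 12]))  # 2.0  (overpredicting)
print(asymmetric_mae(actual, [8, 8]))    # 20.0 (underpredicting)
```

Under plain mean absolute error both forecasts would score 2.0; the asymmetric version makes running out of product look ten times worse than a little waste.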
So if you hover over different models, you'll see different feature engineering being performed. On the right-hand side, you'll see the residuals of the model, and then the actual versus the predicted. So how well are we performing over time? Where are we performing poorly? And then finally, if you click on the Insights button here, you'll get to see how Driverless AI is creating this validation. It's automatically taking this data for training and then this data for validation. You can use this to determine how it's creating these graphs and performance metrics on the right-hand side. Okay, I'm going to jump back to our slides for a moment just to talk about the other possibilities with Driverless AI. We built a single model, but we can also build a leaderboard. What a leaderboard will do is create a series of experiments automatically, with different settings and parameters. So here is our time series leaderboard; it's specific to the fact that this is time series. It will automatically build these models so that you can compare them. I'm not going to go through everything, but just to point out a couple: it will try the default settings, whatever Driverless AI automatically wanted to do, which is what we were just trying. It'll try turning certain things off, like not looking for holidays, or not using features like year, month, and day of week, in case it overfits on those. And it will try things like building a simple linear model with simple feature engineering. So Driverless AI will try all of these different experiments to see which performs best. The last two are where it tries, let's say, time-unaware transformations.
That means treating it more like an IID problem. This can be helpful if your use case, while forecasting, doesn't have strong predictive power from lags. Maybe it's more random, there's no clear pattern forming over time, and the time-unaware features are really predictive. So this leaderboard will automatically create a whole series of experiments for me, and we can try that out right now. If I go here, we can run a new experiment. But rather than asking to launch the experiment, I'm going to click on the right-hand button to create a leaderboard. This is going to create a project for me, and we'll see all of these experiments start to queue up. They'll be built with all of those different parameters that we discussed in the slide. Then I can start to compare them: how long does it take to score with them? How do they perform? And I can start to evaluate my models.
Just one other thing I wanted to touch on: we used all the default settings, but Driverless AI does offer expert settings specific to time series, and I'm going to highlight a couple that might be interesting. By default, Driverless AI performs autocorrelation analysis to figure out which lags are the most important. Is it really predictive to look one day before, two days, 12 days, 365 days? It's going to do all that automatically. But you can override that behavior and specify exactly how you want the lags to be made. The other thing you can do is modify the probability of creating non-target lag features. When you run time series, Driverless will spend 90% of its time focusing on lagging the target column. But it might be equally important to lag other features. Maybe lagging the wholesale price of the baked goods is really predictive. So we can ask Driverless to change its behavior and focus on lagging other features that we think are helpful. And then finally, we also have a centering or detrending transformation that you can turn on or off. This will normalize the target first, either by centering it or by removing some linear trend, and then fit the model on that residual, essentially. So that's another common expert setting. There are a lot more in there if you take a look. I just wanted to highlight a couple in case you're concerned that the default settings don't give you enough control, or you have a specific way you want to do it. This is how you would go about modifying that.
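For intuition on how autocorrelation can point to useful lags, here is a plain-Python sketch that scores candidate lags on a made-up series with a weekly pattern. The `autocorrelation` helper and the sample numbers are assumptions for illustration; the lag selection inside Driverless AI is more involved and is not reproduced here.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    num = sum((series[t] - mu) * (series[t - lag] - mu)
              for t in range(lag, len(series)))
    return num / denom

# Made-up daily sales with a repeating weekly shape, six weeks long.
sales = [5, 6, 7, 8, 9, 14, 12] * 6

# Score every candidate lag from 1 to 14 days and rank them.
scores = {lag: autocorrelation(sales, lag) for lag in range(1, 15)}
best_lags = sorted(scores, key=scores.get, reverse=True)[:3]
print(best_lags)
```

On this series the 7-day lag ranks first, which matches the intuition that what sold on the same weekday last week is the strongest signal when the pattern is weekly.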
All right, so the second to last thing we'll talk about is validating our models. We built all these models, and we want to validate them. What Driverless offers is automatic interpretability, which provides insights into how the model performed and also why it performed that way. Once our model is complete, we can ask to interpret it. What this will do is show me different visuals as a way to assess the model performance. One thing I can do is look at a specific group and see the actual in yellow versus the forecasted, and how well the model performed. So I'm going to move over now to my Driverless instance. I could show the slides, but this is the content of the slides, so I'm going to show it live. Got it? Going, okay.
All right. So when I click on interpretation, maybe I'll show that too, very quickly. Once you have your experiment done, you can click on Interpret This Model. I already have the interpretation done, and it's going to open up these panels with different ways to explain the model. Our time series explainer is focused on the diagnostics, the validation. So I can choose a product and a sales outlet and see how the model performed over time: what's the actual versus the forecasted? I'm going to show this one just so I can show it for a longer time holdout. All right, so we can see the actual in yellow and the predicted in white, and you can see how your model is doing over time. What's nice about this is that you'll quickly get a table with the top performing groups. So we're really good at predicting the number of cranberry scones sold at this sales outlet. But at the bottom you can also see that we're really bad at predicting the hazelnut biscotti at sales outlet eight.
So we can use this to identify where we want to focus. Maybe we're really bad at predicting the hazelnut biscotti. Why is that? Okay, we're really under-predicting on a specific date. We can click on it, and on the right-hand side you'll see Shapley values, which are essentially reason codes for how the model predicted what it did. You'll see features that increase the prediction: what happened eight days ago increased the prediction by 1.2 units. And we can scroll down and also see what decreased the prediction: what happened yesterday reduced the prediction by half a unit. You can click on different points to analyze that and get a sense of why it performed poorly or performed well. The other thing that the interpretability offers is classical post hoc interpretability analysis, like partial dependence plots.
So we can see here something like how the product affected the prediction. It shows me, for specific products, how the prediction changed in general. So if every item was a ginger scone, the average prediction would be about 12. Ginger scones are really popular, but the other products, hazelnut biscotti and so on, are much less popular. So we can see how the historical information is affecting the predictions. But we can also see from a global view how different features affect the predictions: are certain stores or certain products more popular or less popular? And this will give us greater insight into our model. Maybe we're going to use our model to forecast, but we also want to understand general trends happening in our data set, from a marketing perspective, sales perspective, and so on. Okay, we're going to finish up our talk with deploying these models. And then once that's over, we'll leave about 15 minutes for the Q&A.
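The partial dependence idea described above ("if every item was a ginger scone, what would the average prediction be?") can be sketched in a few lines of Python. This is a generic illustration, not the Driverless AI implementation: `toy_predict`, the feature names, and all the numbers are hypothetical stand-ins for a real trained model and data set.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, values):
    """For each candidate value, overwrite `feature` in every row and
    average the model's predictions over all rows."""
    result = {}
    for value in values:
        modified_rows = [{**row, feature: value} for row in rows]
        result[value] = mean(predict(row) for row in modified_rows)
    return result


# Hypothetical stand-in for a trained model: a per-product effect plus
# a contribution from yesterday's quantity.
PRODUCT_EFFECT = {
    "ginger scone": 8.0,
    "hazelnut biscotti": 1.0,
    "cranberry scone": 3.0,
}


def toy_predict(row):
    return PRODUCT_EFFECT[row["product"]] + 0.5 * row["lag_1_quantity"]


# Hypothetical scoring rows.
rows = [
    {"product": "ginger scone", "lag_1_quantity": 10},
    {"product": "hazelnut biscotti", "lag_1_quantity": 6},
    {"product": "cranberry scone", "lag_1_quantity": 8},
]

pd_by_product = partial_dependence(toy_predict, rows, "product", list(PRODUCT_EFFECT))
for product, avg_prediction in pd_by_product.items():
    print(product, avg_prediction)
```

Each row keeps its own history (here `lag_1_quantity`) while only the product is swapped, so the averages isolate the effect of the product across the whole data set.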
So I'm going to finish up our talk with how we now do something with these models. We've built models, we've interpreted them, we've validated them. Now we want to do something with them to make impactful business decisions. In our poll, we asked how the audience uses time series forecasting models, and a lot of folks said productionizing. That's what we're going to focus on today. But the actions from the model don't necessarily have to be a model in production that's scoring regularly. It may be that the action item is to produce a dashboard with forecasted quantity, supply, revenue, and so on to present to a team. There can be a lot of different ways this model can be, let's say, actionable, but today we're going to focus on actually deploying this model in a production environment. So there are three deployment options for time series. The first is that we can use our Driverless AI instance to forecast. When we are in Driverless AI, we can go to our experiment again and ask Driverless to predict on a new data set, so we can ask for forecasts once we get future data. We can also do it through the Python API, so we can use Python code to communicate with Driverless to run the predictions. So our first and simplest productionizing step would be to ask Driverless to score at some interval.
The second way we offer for scoring is to download a scoring pipeline. There are two scoring pipelines: one is Java and one is Python. For time series, we usually recommend the Python scoring pipeline, because it is able to do what we call test time augmentation, which means it is able to regenerate these lag features with appropriate data and essentially allows you to use the model further out into the future. The other option is the MOJO, which is a Java object for deployment. But again, it's limited to a specific time horizon before it becomes stale, and I'm going to talk about this. And then the last way we could deploy is with our AI Cloud, or H2O cloud. We have deployment options there for helping deploy models with ease: one-click deployment, model metrics, monitoring, and so on. We're going to talk a little bit about how we actually deploy with this test time augmentation approach. I mentioned before that the Python scoring pipeline is the preferred method because it's able to do test time augmentation. That really means it can take historical information from after the training set to recreate the lag features. So let's take a look at our data. We have some features like the quantity sold yesterday, the quantity sold two days ago, whether it's a holiday, and, I'm sorry, that was supposed to be wholesale price yesterday.
So we have used historical features to help build our model. When we want to deploy this, let's say our data ended in April, but now it's August and we still want to get a prediction. We need to fill in this little gap here of what has happened since April. Driverless AI only has knowledge about what happened up until April. So we have to make sure we pass this required contextual data, so that we give it the ability to remake these lag features. In our example, we want to predict in August. So when we score with that Python scoring pipeline, or when we score with that data in the AI Cloud, we need to make sure we give it information about what's been happening since April. Driverless only knows about data up to April, so we need to give it that information. If we don't, we risk getting a really poorly performing prediction. So when the model looks back at our historical data, it's going to try to figure out: what was the quantity sold yesterday, what was the quantity sold two days ago, what was the wholesale price yesterday.
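The gap described above can be illustrated with a small, generic Python sketch. It does not use the Driverless AI scoring pipeline; `build_lag_features`, the dates (including the year), and the quantities are all hypothetical, and the point is only to show why recent context is needed to rebuild lag features at scoring time.

```python
from datetime import date, timedelta


def build_lag_features(history, target_day, lags=(1, 2)):
    """Build lag features for `target_day` from `history`, a dict mapping
    date -> quantity sold. A day missing from `history` yields None."""
    return {
        f"quantity_lag_{k}": history.get(target_day - timedelta(days=k))
        for k in lags
    }


# Hypothetical tail of the training data, which ended in April.
training_history = {date(2021, 4, 29): 4, date(2021, 4, 30): 6}

# Hypothetical contextual data: what actually happened just before the
# day we want to forecast in August.
recent_context = {date(2021, 8, 8): 5, date(2021, 8, 9): 7}

target_day = date(2021, 8, 10)

# Without the context, the lag features cannot be rebuilt.
without_context = build_lag_features(training_history, target_day)

# With the context passed alongside the scoring request, they can.
with_context = build_lag_features({**training_history, **recent_context}, target_day)

print(without_context)
print(with_context)
```

Without the August context both lags come back empty, which is the "nothing to go on" situation; with it, yesterday's and the day before's quantities are available again.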
So we need to make sure we give the scoring pipeline the information to create these features again. Otherwise, it has nothing to go on, and these really important lag features will be missing. So that's our test time augmentation, and it's available in the Python scoring pipeline, in Driverless AI, or in different deployment options in the AI Cloud. As for what it would look like for deployment purposes, and this is just a simple example: you can ask Driverless to generate that Python scoring pipeline we saw in the UI. This is an independent Python wheel, so it's no longer dependent on Driverless AI. You can import it like you would normally import any Python library. And the scorer will have information on how to take a raw record of our bakery data and get back a prediction. I can give it a whole frame of data with the contextual historical information that I need. The output is going to be the prediction, which in this case is five units, with an upper threshold of seven and a lower threshold of three. So it'll give the prediction and also upper and lower bounds to help figure out what's a typical prediction and where it would range. And that's how we would use the Python scoring pipeline in a simple example, but this can be deployed in many different ways since it's completely independent of Driverless AI and automatically generated. Okay, and with that, I think we're right around the time where we wanted to leave for questions.