
Accuracy Masterclass Part 5 - The Last Mile of Accuracy

AutoML tools help data scientists avoid common pitfalls and achieve their desired accuracy and interpretability. AutoML products are available as both open-source and closed-source tools. We discuss how H2O Driverless AI stacks up against other tools for accuracy and extensibility.

3 Main Learning Points

How H2O Driverless AI seeks to obtain the best accuracy and helps avoid common data science pitfalls
How H2O Driverless AI performs against other AutoML tools on diverse datasets
How H2O Driverless AI is easily extended to custom problems via custom recipes

Transcript

So today, I'll be talking about AutoML, and basically how tools in general obtain quite accurate results using a variety of techniques. I'll talk about some of the tools that are used in the world to do this AutoML trick, and I'll discuss some of the data science best practices that showcase how these tools work. I'll also talk about customizing Driverless AI, because that's one of the ways in which you can really dig into the domain-specific science of your own data. And then I'll summarize. So what should AutoML tools do? By that, I mean, what do they help with? You're probably familiar with this in general, but I wanted to give this background to make it clear what we're talking about. Roughly speaking, AutoML tools help with accuracy, of course. They try to make it so that you don't have to worry about whether or not you're going to get the best model. And in particular, which I'll discuss a little bit, you want to make sure that they at least don't give you the worst model. That would be the worst outcome: you use some advanced tool, and you end up getting the worst model you could get just because of a problem with the tool. It will also help with interpretability, which means that it will be able to balance how many features you have against the latency needs for scoring and against the size of the model, and basically find the balance between all those different things. In particular, interpretability is something you can control, so hopefully that's something you're able to tune. It should also easily handle diverse situations. The problem can be regression or binary classification, but maybe there are 350 classes, that is, 350 different outcomes that you're trying to predict.
There may be 300 different kinds of foods or different kinds of commodities that you're trying to predict. It needs to handle a varying number of columns: it could be just one column, for example one feature and one target, or it could be a million columns, and not all tools can handle that. Of course, it also needs to handle a large number of rows, or even a very small number of rows. Sometimes you might have a very small dataset that's very important to you; it doesn't matter that it's only 1,000 rows or so, you still need the best model. What do you do when you have only 1,000 rows? And of course, it needs to handle different kinds of data, which could be called multimodal. It should handle a variety of data types, like text, or categoricals, which look kind of like text but aren't written words so much as labels, like cat, dog, mouse. It may need to handle images, and it may need to handle actual text, as in a natural language processing type of model. It needs to find a balance between all those things, and, as I mentioned, it probably also needs to help you explore custom, domain-specific aspects of your problem. And of course, the big thing is avoiding mistakes. If you create one of these kinds of tools, like I have, you'll find that it's very easy to make mistakes, and I make mistakes all the time. Having a tool helps you formalize things so that you don't make mistakes, like having data leakage or something like that. It also needs to be easy to reuse. When you create a model, you don't want that to be the end of the story.
You want to be able to reuse it, study it, deploy it, and share it with somebody else. You want to be able to track it as it's making predictions, and you want to be able to maintain it across different sets of people, as with a lot of products. Another aspect of AutoML is how it consumes data. The very first step is, of course, that you need data. That's an important part of any AutoML workflow, and it's not part of every tool. So one part is data preparation. You want some kind of feature store that manages the data, and not just a data lake, but one that knows about the importance of the features to a variety of models. And maybe you need to write arbitrary code; in Driverless AI, for example, you can write arbitrary Python code to work with the data using its internals. Then, of course, during data ingestion you want a variety of statistics and visualizations about the data.

And whenever you're doing AutoML, you need to make sure that the data makes sense. You have to check the training data, the validation data, and the test data, and make sure the types are consistent across all those different sets. Deal with NaNs and infinities. Check the target to make sure it makes sense. Check the time column to make sure it has no missing values, because that wouldn't make any sense. Check any weight or fold column. And of course, you can check for duplicates, which are quite common in a lot of cases. If the training set has duplicates, that might be okay if it's intentional, but if rows in the training set also appear in the test set, you probably want to know about that. We also need to determine the data science types: once we've taken in the actual data, we need to determine whether a stringy-looking column is alphanumeric, categorical, or text-like, and that helps cut down on the search space the AutoML tool eventually has to explore. And of course, it should figure out the date format in case you're doing time series. It should check for data leakage in a variety of ways: it could compute something like AUC for leakage, or R squared, or the correlation. And it should check for data shift. Data shift is when you have a training set and, say, a test set, and you assumed that the rows are independent, what we call i.i.d., but maybe they're not; maybe there's some kind of trend in time, for example, from the training set into the test set. You need to know about that, and maybe deal with it somehow. And of course, as covered in other talks, in particular Dmitry Larko's talk on feature selection, one important aspect, even up front, when you have too much data, is to somehow reduce the dimensionality. If you have a large number of categoricals, how do you do that?
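A few of the checks just listed can be sketched in a handful of lines. The block below is a minimal illustration, not how Driverless AI implements them: it assumes numeric features in pandas DataFrames, and every function name is hypothetical. It counts NaNs and infinities per column, counts test rows that duplicate a training row, and computes a single-feature AUC. That one AUC helper serves both checks from the talk: scored against the target it flags possible leakage, and scored against an "is this row from the test set?" label it flags train/test shift (a univariate version of adversarial validation). The 0.95 threshold is an arbitrary assumption.

```python
import numpy as np
import pandas as pd


def count_non_finite(df: pd.DataFrame) -> pd.Series:
    """Number of NaN or +/-inf values in each numeric column."""
    numeric = df.select_dtypes(include="number")
    return (~np.isfinite(numeric)).sum()


def train_test_overlap(train: pd.DataFrame, test: pd.DataFrame, cols: list) -> int:
    """Number of test rows whose values on `cols` exactly match some training row."""
    return len(test[cols].merge(train[cols].drop_duplicates(), on=cols, how="inner"))


def univariate_auc(x, labels) -> float:
    """AUC of one numeric column used directly as a score for 0/1 labels (Mann-Whitney form)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.5  # AUC is undefined; treat the column as uninformative
    ranks = pd.Series(x).rank(method="average").to_numpy()  # ties share the average rank
    return float((ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))


def suspicious_features(df: pd.DataFrame, labels, threshold: float = 0.95) -> dict:
    """Numeric columns that almost perfectly separate the 0/1 labels on their own."""
    labels = np.asarray(labels)
    flagged = {}
    for col in df.select_dtypes(include="number").columns:
        values = df[col].to_numpy(dtype=float)
        finite = np.isfinite(values)  # ignore NaN/inf rows for this column
        auc = univariate_auc(values[finite], labels[finite])
        score = max(auc, 1.0 - auc)  # a column can also separate the labels "in reverse"
        if score >= threshold:
            flagged[col] = float(round(score, 3))
    return flagged


def shift_features(train: pd.DataFrame, test: pd.DataFrame, threshold: float = 0.95) -> dict:
    """Which features can tell training rows from test rows? (pass feature columns only)"""
    both = pd.concat([train, test], ignore_index=True)
    is_test = np.r_[np.zeros(len(train), dtype=int), np.ones(len(test), dtype=int)]
    return suspicious_features(both, is_test, threshold)
```

For leakage you would call `suspicious_features(train_features, y)`; for shift, `shift_features(train_features, test_features)`. A real tool would go further, for example by training a full model on all columns to separate train from test, since shift or leakage can hide in combinations of features that no single column reveals.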
And now that you have the data, and you've prepared for the kind of data you have, you need to make models. This is just an overview. You might start tuning, trying different things. This is what AutoML does, but typically in a more guided way. You might try a variety of target transformations, changing the target for regression. Then of course, you might have a variety of models: tree models, linear models, neural networks, NLP models, image models, and they each have their own hyperparameters, which you of course want to try. You might also have a variety of features; you want to do feature engineering, and, for example, you want features that were text or text-like to get some kind of frequency encoding, or something like TF-IDF, which looks at the actual counts of different patterns in the data.

Then, after you've done all that tuning, which I'll focus on a little later since it's the more interesting aspect of AutoML, you build a final model. That final model might be a single model or it might be an ensemble. If you're doing time series, you can do some back-testing to determine what's happening across different time slices. You can check the stability of the model: compare the final model to what you saw during tuning, and check whether the features are behaving appropriately and whether the model is behaving appropriately across all the cross-validation splits. And of course, you can generate artifacts for scoring later: either what we call a MOJO in H2O-3, which is basically Java code to run in Spark or something like that, or Python code to run on an MLOps platform or for local use. Once you're done with all this, it's still not the end of the story. Even with AutoML, you need more than just the model. You probably need documentation, like AutoDoc in Driverless AI: we generate experiment documentation that tells you what happened in the experiment, which is very useful, and it can be roughly 50 pages of explanations and other material. You probably want Shapley values for the model. You might want to run interpretation through our MLI, or any other tool, where you're trying to understand the significance of certain features through partial dependence plots, that is, what the trend of the predictions is as a feature changes. You might want to analyze the model through a variety of metrics, what we call diagnostics. And then of course, you want to share the model with other people; we do that through projects or import/export. You also want to be able to rerun the experiment without having to go through the whole process.
We also call that refit. And you might want to tweak the model. Maybe you like it, but there's one feature you want to remove; you should be allowed to do that, otherwise it's too much of a black-box tool. And finally, you want to be able to track performance, and maybe swap out models on demand in an MLOps platform. So there are a variety of AutoML tools out there, and you can consider them as falling into two rough groups: one is open source, and the other is closed source or a bit mixed. Open source includes H2O-3 AutoML, and there's also a large number of other packages. Some of the most powerful ones, I would say, are auto-sklearn 2.0, which has a lot of advances, and FLAML by Microsoft, which is pretty good. Amazon has AutoGluon. And there's a variety of other tools. Some of these are not in favor anymore; TPOT, for example. Nobody would rely only on TPOT at this point. It was a good starting point a long time ago, but not anymore. Then there are closed-source products like Driverless AI. And they're mixed, because Driverless AI does provide full exposure of the MOJO artifact; you can stare at its full contents as if it were open source. We also expose the options within Driverless AI through Python: you can write arbitrary Python code to modify any experiment. Other companies also provide some access, for example to the source code. So there's a variety of tools. How do you choose which tool to use? I just want to bring up that in the recent Gartner report, H2O.ai's AutoML use case was the highest ranked. That's good, and I'll explain a little bit why that is: why is H2O doing well in AutoML, and why is it doing well in accuracy? Although you could say that, in reality, 3.7 versus 3.6 is not a big deal.
So the platform perspective is also required: how well would this tool work within an ecosystem of MLOps, interpretation, and documentation, for any use case where you need to report on the performance of the model, for example to your review board?
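Among the interpretation steps mentioned above, a partial dependence curve is simple enough to sketch directly: force one feature to each value on a grid for every row, and average the model's predictions. The helper below is a hypothetical, model-agnostic illustration (it is not the MLI API). It only assumes `predict` maps a 2-D NumPy array to one prediction per row.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Average model prediction with column `feature` forced to each value in `grid`.

    `predict` is any function mapping an (n_rows, n_features) array to n_rows predictions.
    """
    X = np.asarray(X, dtype=float)
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # same value for every row; other columns untouched
        curve.append(float(np.mean(predict(X_mod))))
    return np.array(curve)


# Usage with a toy "model": prediction = 2 * x0 + x1**2
X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
curve = partial_dependence(lambda A: 2 * A[:, 0] + A[:, 1] ** 2, X, feature=0, grid=[0.0, 1.0, 2.0])
```

For this toy model the curve rises by 2 per unit of `x0`, which is exactly the trend the plot is meant to reveal. Because the curve averages over all rows, it can hide interactions, which is why tools often show per-row (ICE) curves alongside it.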

So how do you choose? There's always a risk of getting into a benchmark war, as we affectionately call it, when we're looking at these different tools. This happens in any field. But there is a nice tool for benchmarking AutoML on tabular data, and it was put together in part by our Chief Machine Learning Scientist, Erin LeDell. She and her collaborators built it on OpenML, which I'm showing here. OpenML is a very nice website where you can download tens of thousands of different datasets, and they're very well annotated. This particular AutoML benchmark, which is part of the website, runs any arbitrary tool that you set up on a variety of datasets, maybe 50 or hundreds, and tells you how it performs. This is quite important to do, at least internally, to understand how your own tool is performing. So this is just showing a sample of how Driverless AI, which I worked on, performs compared to some of the other tools I mentioned. These are just open-source tools. Later, we'll have blogs about how we compare to our own other tools internally, like H2O-3 AutoML, or to other closed-source tools. For this talk, I'll just talk about how we compare against other open-source tools that are quite strong. To make it a little bit clearer, here we have a table. It looks like a very busy table, so I've chopped it in two. We have a variety of datasets shown by name, both on the left and on the right, in the first column. Then it tells you how many rows the dataset has, how many columns it has, and how many classes. There are 39 datasets here, and all of them are classification problems, where you're trying to classify, instead of regression problems, where you're trying to predict an actual number.
So you can see Driverless AI here. What's been done is to take either the AUC or the log loss and write it in a way where smaller is better. So each number is either the log loss for a particular dataset after one hour of running, or it's one minus AUC, if you know what that means: we take the area under the ROC curve and take one minus that. This is what other tools have done in various papers. Now the smaller the number, the better, and you can compare across tools. What I've done is make a value bold wherever a tool has done the best, and underline it wherever a tool has done the worst. We tally those up over here, so there's a certain number of bests and a certain number of worsts for each tool. This column is Driverless AI, this one is LAMA, this one is Microsoft's FLAML, and these two are auto-sklearn and auto-sklearn 2. auto-sklearn 2 is one of the more advanced tools that's come out in the last year. So, at least against these good open-source tools, you can see Driverless AI does a pretty good job. It's not always the best: there are 39 datasets, and we're best on 20 of them. But we're not worst on any of them, whereas each of the other tools is worst on something. So you could ask, why is that relevant? Well, the first thing I can point out is that in some cases, Driverless AI is quite a lot better. For example, this dataset has a quite large number of rows, 4 million rows and 60 columns, but it also has a large number of classes, about 335, which makes it a really challenging case. The log loss, which is basically the measurement of how accurate the model is, is substantially better than for a lot of other tools. Some other tools really struggle with a problem like this.
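The scoring scheme just described, "smaller is better" via log loss or 1 − AUC, then counting for each tool how often it was best and how often it was worst, can be sketched as follows. This is an illustrative reimplementation, not the benchmark's own code. The tie handling (every tied tool gets the credit) is an assumption, and the tool and dataset names in any usage are placeholders, not benchmark results.

```python
import math


def log_loss(y_true, proba, eps=1e-15):
    """Multiclass log loss: mean of -log(probability given to the true class)."""
    losses = [-math.log(min(max(p[y], eps), 1.0 - eps)) for y, p in zip(y_true, proba)]
    return sum(losses) / len(losses)


def auc(y_true, scores):
    """ROC AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tally_best_worst(errors):
    """errors: {dataset: {tool: error}}, where error is log loss or 1 - AUC (smaller is better).

    Returns per-tool counts of datasets on which the tool was best and on which it was worst.
    """
    tools = sorted({tool for per_tool in errors.values() for tool in per_tool})
    best = dict.fromkeys(tools, 0)
    worst = dict.fromkeys(tools, 0)
    for per_tool in errors.values():
        lowest, highest = min(per_tool.values()), max(per_tool.values())
        for tool, err in per_tool.items():
            best[tool] += int(err == lowest)   # ties: every tied tool counts as best
            worst[tool] += int(err == highest)  # ties: every tied tool counts as worst
    return best, worst
```

The "worst" column is the one the talk emphasizes: a tool can win fewer datasets than another and still be the safer choice if its worst count is zero.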

Another case where Driverless AI does incredibly better is this jungle_chess dataset, a different kind of problem: a very small dataset of about 50,000 rows, with seven columns and three classes. Here the log loss is 10 times smaller than for any other tool. You might say, well, John, that's great, Driverless AI does well sometimes, but that's not super interesting, because most of the time it's just going to perform similarly. That's why I mentioned this last thing, which is how often we're the worst. If you look at some of these other tools, for example LAMA, you can see that in one case they're not just the worst, they're really bad. That doesn't usually happen, but it sometimes does. This is a bank marketing case, where you're trying to predict whether or not a marketing campaign is going to do well for a particular bank. And there are other cases where other tools, like auto-sklearn, do substantially worse. Here we have a case, this jungle_chess one, where it's substantially worse than all of the others. This is supposed to be the most advanced tool, and it's even worse than its prior iteration, auto-sklearn 1. So the point of saying all this is that you don't only want the tool that does the best; you also don't want it to do the worst, because you might have an arbitrary use case, and you want to avoid the risk of ending up with a model that is worse than what any other tool would have given you. That's what Driverless AI helps avoid; we spent the time to figure that out. So let's move on to how you do AutoML. What are the best practices? Why is Driverless AI successful compared to some of these other open-source tools? They use quite advanced techniques, and Driverless AI uses advanced techniques too, but a lot of the reason for its good behavior is that we keep it simple.
We don't overcomplicate. For example, when you're doing AutoML, the very first question after you have the data is how you should search over all possible hyperparameters and all possible features. Well, there are very simple baseline models you can make. If you have low cardinality, which means a column has very few unique values, say just four different values, you just do one-hot encoding; that's reasonable. If it's higher cardinality, with many unique values in the column, you can do frequency or target encoding. You also want to try a variety of models, so that you at least know how they perform. That's one of the good things about Driverless AI: it can be a benchmark. Regardless of the use case, it should try different methods, like a GLM or a multilayer perceptron, at least one of each, to see how they would perform. You also need to try cases where features make things worse. For example, one-hot encoding can overfit in some cases and do a poor job, so you can't just assume one-hot encoding will work; you need to try disabling one-hot encoding, or disabling target encoding. Let me quickly explain what these encodings are. One-hot encoding is where you take a feature that has a certain number of levels, say five, and turn it into five new columns, each holding a 1 or a 0, so the new columns tell you which level each row had. Frequency encoding just counts how often each value appears in the column. Target encoding literally takes the target and asks, for a given value of the feature, what the target is, and memorizes that. That sounds like a really bad idea, but if you do it well, in a controlled way, it's actually quite good in a lot of cases. Lastly, you might also want to try only target encoding and no other kinds of encodings.

You also need to start from that baseline but have some models that do a little exploration. This is the concept of exploitation versus exploration: you have to balance the two, since you can't only rely on what you've always done; you need to explore a little bit. So what we do, and this is a fairly common approach, is try a model and use some measurement of its variable importance to see how valuable each column is. Knowing that, you can tell the next model to build features that use that fact. For example, you might want to combine columns into a new feature, and you can use the top-performing features from the prior model as the inputs for that new feature. Target encoding and frequency encoding, for instance, can take multiple columns and combine them into a single string or value; that merged value is the key on which you memorize the target, or count the frequency. Sometimes you want that higher interaction depth, as we call it in Driverless AI. But you have to be careful about which features you choose, because the search space is so large, so you can use what the prior model said to iterate and choose those features. And sometimes you just need random features; sometimes you get lucky. You might also run 10 experiments at once, which we call a leaderboard in Driverless AI, to see what works. And you might have exceptional situations, for example wide data, which is very risky. Wide data means you have many, many columns but very few rows; maybe you have 100,000 columns and 1,000 rows. This is very common in genomics data. You might think you'll just use a GLM, but it's quite risky to use anything like a GLM here, because it will pick out the exact features that fit the training data.
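The baseline encodings described above, plus the column-combining trick used for interactions, can be sketched with pandas as below. This is a minimal illustration, not Driverless AI's transformers: all function names are hypothetical, the 10-level one-hot cutoff and the smoothing strength are arbitrary assumptions, and the target encoding is computed out-of-fold (each row is encoded from the other folds only) as one simple way to do it "in a controlled way" without leaking a row's own target into its feature.

```python
import numpy as np
import pandas as pd


def choose_encoding(n_unique: int, max_one_hot_levels: int = 10) -> str:
    """Baseline rule: one-hot for low cardinality, otherwise frequency/target encoding.
    The cutoff of 10 levels is an illustrative assumption."""
    return "one_hot" if n_unique <= max_one_hot_levels else "frequency_or_target"


def one_hot(col: pd.Series) -> pd.DataFrame:
    """One new 0/1 column per level of `col`."""
    return pd.get_dummies(col, prefix=col.name, dtype=int)


def frequency_encode(col: pd.Series) -> pd.Series:
    """Replace each value by how often it appears in the column."""
    return col.map(col.value_counts())


def combine_columns(df: pd.DataFrame, cols: list, sep: str = "|") -> pd.Series:
    """Merge several columns into one string key, e.g. for an interaction encoding."""
    return df[cols].astype(str).apply(sep.join, axis=1)


def target_encode_oof(key, y, n_folds: int = 5, smoothing: float = 10.0, seed: int = 0) -> np.ndarray:
    """Out-of-fold mean-target encoding, smoothed toward the fold's global mean."""
    key = pd.Series(key).reset_index(drop=True)
    y = pd.Series(y, dtype=float).reset_index(drop=True)
    folds = np.random.default_rng(seed).permutation(len(key)) % n_folds  # balanced folds
    encoded = np.empty(len(key))
    for k in range(n_folds):
        fit, held_out = folds != k, folds == k
        prior = y[fit].mean()
        stats = y[fit].groupby(key[fit]).agg(["sum", "count"])
        # Rare levels are pulled toward the prior; frequent levels keep their own mean.
        smoothed = (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)
        # Levels never seen in the other folds fall back to the prior.
        encoded[held_out] = key[held_out].map(smoothed).fillna(prior).to_numpy()
    return encoded
```

An interaction target encoding is then just `target_encode_oof(combine_columns(df, ["a", "b"]), y)`, where `a` and `b` would be the top-importance columns from a previous model.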
The question with wide data is whether those selected features will work again, whether they will generalize, and often they don't, because there's always some shift when you go from the training set to a real test set. So it's very risky, and you might want to use a random forest instead, which is less likely to be biased, even though it has a large variance. Finally, when you do the search, different tools do it differently. You could use a genetic algorithm, which treats models as individuals; this is one of the things Driverless AI does, and it's what TPOT originally did. Basically, you take every model, with its parameters and features, as an individual. Then there's a population of individuals, a population of models, and you score them and see how they do. You can make them compete, for example in a pairwise competition. The losers drop out, and the surviving individuals create offspring with a mutation. A mutation is just a change in a hyperparameter, or one additional feature, or maybe removing a feature that had low importance, that kind of thing. So that's the genetic algorithm. The individuals can also share features among themselves. And of course, while you're doing this, you have to be very careful to balance the exploration versus exploitation problem: you might explore early, and then, as you dig in and have a more accurate model, you want to exploit what you've already learned. But then maybe the modeling process gets stuck, and you need to do more exploration again. You can imagine that doing this manually would be quite tedious, so that's what an AutoML tool should do. And of course, while you're doing all this, it's very important to validate everything, for example with cross-validation repeats. A fold repeat is where you just change the seed of the sampling used for the cross-validation splits.
That gives different data in each of those splits, so each repeat is like a whole new experiment. If you have time series, you need to do time-based splits. And some of the features we build, like target encoding, are very sensitive to leakage: they can leak the target into your feature. So sometimes you need to do cross-validation within cross-validation, which is called nested cross-validation. All these things put together mean that, in the end, you have a lot of good practices, and that's what we've put into Driverless AI. So I'll switch the screen now.
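The genetic search described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: an "individual" is just a tree depth plus a feature set, and `toy_score` is a made-up loss standing in for actually fitting and cross-validating a model. What the sketch does show faithfully is the loop from the talk: score a population, run pairwise competitions so that losers drop out, and refill the population with mutated offspring (change a hyperparameter, add a feature, or remove one).

```python
import random

FEATURES = ["f0", "f1", "f2", "f3", "f4", "f5"]
USEFUL = {"f1", "f3"}  # hypothetical: the features that "really" matter in this toy


def toy_score(ind):
    """Hypothetical validation loss (lower is better), standing in for fitting and
    cross-validating a real model with these hyperparameters and features."""
    missing = len(USEFUL - ind["features"])
    extra = len(ind["features"] - USEFUL)
    return missing * 1.0 + extra * 0.1 + abs(ind["depth"] - 6) * 0.05


def mutate(ind, rng):
    """Offspring = parent plus one small change: a hyperparameter, an added or a removed feature."""
    child = {"depth": ind["depth"], "features": set(ind["features"])}
    move = rng.choice(["depth", "add", "remove"])
    if move == "depth":
        child["depth"] = max(1, child["depth"] + rng.choice([-1, 1]))
    elif move == "add":
        unused = [f for f in FEATURES if f not in child["features"]]
        if unused:
            child["features"].add(rng.choice(unused))
    elif len(child["features"]) > 1:
        # A real tool would prefer dropping a low-importance feature; here it is random.
        child["features"].remove(rng.choice(sorted(child["features"])))
    return child


def evolve(pop_size=8, generations=30, seed=0):
    """pop_size should be even so that every individual gets a pairwise opponent."""
    rng = random.Random(seed)
    population = [
        {"depth": rng.randint(1, 10), "features": set(rng.sample(FEATURES, 2))}
        for _ in range(pop_size)
    ]
    history = [min(map(toy_score, population))]
    for _ in range(generations):
        rng.shuffle(population)
        # Pairwise competition: the loser of each pair drops out.
        survivors = [a if toy_score(a) <= toy_score(b) else b
                     for a, b in zip(population[::2], population[1::2])]
        # Survivors create mutated offspring to refill the population.
        children = [mutate(rng.choice(survivors), rng) for _ in range(pop_size - len(survivors))]
        population = survivors + children
        history.append(min(map(toy_score, population)))
    return min(population, key=toy_score), history
```

Because the current best individual always wins its pair, the best score in `history` never gets worse from one generation to the next. That is pure exploitation; a real search would also inject random individuals or larger mutations when progress stalls, the "more exploration" the talk mentions.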

Now let's go to Driverless AI, and I'll give you an example of what I was just showing on the main screen. This is an example run where I wanted to show how Driverless AI improves. We have here a plot of the log loss, where lower is better, versus iteration. You can see that at the beginning it has a log loss of 0.467, which is high, but it's still a pretty good model. It has one-hot encoding and target encoding; it's still trying to do the best it can, and it has maybe a few more features than you originally had. Then it does the search process I just described, where early on we're trying to figure out which is best among all these different baseline models plus some random models. Then it switches gears into a new mode, where it's doing the genetic algorithm, and it starts doing these mutations and trying things. All these dots up above correspond to where it tried something that didn't work, with too high a log loss. It's trying to find the best track, and eventually it finds a model that is significantly improved compared to before. But in this case it has something like 400 features, and maybe that's too many, so you need a tool that lets you control all that. So let me show you an example. Okay, so here's Driverless AI. I'm focusing on the part of the process where we've already consumed the data and we're thinking about AutoML. So we might ingest a dataset, here a credit card dataset, with train and test sets. In Driverless AI, you can check this data out and look at the details; I'm only using a left click. You can view all the different histograms and other properties of the data. You can look at particular dataset rows to see the raw data. You can change how we treat the data.
You can also modify the data with a recipe; that is, you can actually write arbitrary Python code, which I'll talk about in a moment. Once you have that data, you can do a variety of things. You can visualize the data, which I won't get into right now. You can split the data, in case you want to split it yourself. But the most important thing for this talk is how you predict. Arno, the CTO, gave a masterclass talk a while back, about two weeks ago, on what we call the wizard. The wizard is very nice; it makes using Driverless AI very easy. I'm going to show just the standard classic version, since he already went through the wizard, and focus on the AutoML aspect as opposed to the experiment setup process. So we click Predict, and Predict just means that you're trying to do something with the data, that's all. It can be supervised or unsupervised. We support a variety of unsupervised models out of the box: things like k-means, truncated SVD, an aggregator, and isolation forest anomaly detection. But maybe we're just doing supervised learning; if so, you just select your target. For this credit card dataset, we're trying to figure out whether or not the person is going to default on their payment next month, given data on whether they've been paying, their age, and things like that. This is the classic, non-wizard view of Driverless AI. It's a little bit dense, which is why we think the wizard is quite useful for anybody who wants to learn more about data science, or who wants an easier experience setting up Driverless AI. The wizard is very good, it's growing quite quickly, and it avoids overwhelming you. But in this classic view, you can see that you can choose a variety of things. So what should I do next?
Well, that's what I want to think about. It looks like I can choose a time column, but I'm not going to choose one here. I can control the accuracy, which really just means choosing how confident you want to be in the accuracy. It's a little bit confusing, which is why, again, the wizard is useful.

You can set how long you want to spend on this process with the time dial, or, if you go to a different dialog, you can choose exactly how many hours or minutes. And this interpretability dial controls how many features there are: more interpretable means fewer features and, for example, a GLM becomes more likely. You can choose which metric you have; here we have just AUC. You can choose what kind of problem type you have. And over here it provides a preview that tells you, given these settings and the data, we'll build a GLM, LightGBM, and XGBoost, we'll do three-fold cross-validation, and in the final model we'll blend them. It also tells you a little about the features: we're going to do target encoding, clustering, frequency encoding, and interactions between features, like plus, minus, and divide. We'll do one-hot encoding and weight of evidence, and we'll also keep the original feature as it is, as a numeric feature. So if we go with that, we launch. You can launch the experiment directly, or you can launch a leaderboard, which starts 10 experiments at the same time; I won't do that at the moment, but you can then study in a project how well they do. So let me go back to this one. Let me let it finish real quick. It's showing the accuracy again over here, and, very importantly, what kind of features it's using, like one-hot encoding, or maybe an aggregation transformation. Let's let that complete; it's relatively quick. I didn't try to get it to improve the accuracy, but what the tool is basically doing is searching for accuracy, like I showed in the other picture. What I want to shift to now is this: you have a model, you believe it's accurate, and, as I showed over here, it's improved. That's the AutoML tool.
Well, what do you do with it, and how can you go further? While we let this finish up: with Driverless AI you can not only look at datasets and visualizations, manage your experiments, including through the whole project view, and share them with other people in the project, but you can also interpret the experiments, which I won't get into in this talk. What I will talk about is something called recipes. So I'll just cancel this AutoReport, since I'm not going to show that either. So we're done here. Now what can I do? Well, I can continue tuning the experiment; that's one thing I can do. But what I'll focus on in this talk is what we call recipes in general. With an experiment, you can actually create a recipe out of the experiment. You can upload recipes, we'll validate them, and you can write your own recipes. This is arbitrary Python code; the font is a little bit small, I apologize for that. It's arbitrary Python code that, for any experiment within Driverless AI, time series or whatever, gives you full control over every detail, including the model hyperparameters. For example, down here, the model type, if you can see, is LightGBM; then come the model parameters, and every single feature that will be created is listed down here. We have columns like PAY_0 and EDUCATION. And if you want, you can say, well, of all these features, I don't want to use SEX; I don't want the model to be biased based on that. There are different perspectives: you can say you need to force it in to see how biased your model is, or you need to remove it so that your model is unbiased. But let's just say you want to remove it. So I remove that feature and save, and now I have a new recipe. If I use it in a new experiment, you'll notice in the preview over here that it's talking about custom individuals. This is what we call a custom individual recipe.
I'm able to list the individuals here; this one has a funny name because it was auto-generated. If I launch this, I'm guaranteed that it won't use that feature. And not only can you remove features or make little tweaks to the hyperparameters; you can add features, you can do pretty much anything you want. This is what we call per-feature control. It lets you take any experiment and change anything about it. It's just Python code: you can write loops, you can do anything you want in this Python code. So let's go back to that.
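The edit shown in the demo (drop one column, optionally tweak hyperparameters, save as a new individual) can be sketched in plain Python. This is a hypothetical, simplified stand-in and not the Driverless AI custom individual class: the dict layout, the `remove_columns` and `override_params` helpers, and the transform names are all assumptions for illustration. Only the column names PAY_0, EDUCATION, and SEX come from the demo.

```python
import copy

# Hypothetical stand-in for an auto-generated "individual": model type,
# hyperparameters, and the list of engineered features. The real recipe is a
# Python class with its own structure; this dict only mirrors the idea.
INDIVIDUAL = {
    "model_type": "lightgbm",
    "model_params": {"n_estimators": 500, "learning_rate": 0.05, "max_depth": 6},
    "features": [
        {"transform": "original", "columns": ["PAY_0"]},
        {"transform": "original", "columns": ["EDUCATION"]},
        {"transform": "target_encoding", "columns": ["SEX"]},
        {"transform": "interaction", "columns": ["EDUCATION", "SEX"]},
    ],
}


def remove_columns(individual, banned):
    """Return a copy with every feature that uses a banned column dropped.

    Derived features such as interactions are dropped too, so the column
    cannot leak back into the model indirectly.
    """
    banned = set(banned)
    new = copy.deepcopy(individual)
    new["features"] = [
        f for f in new["features"] if not banned.intersection(f["columns"])
    ]
    return new


def override_params(individual, **params):
    """Return a copy with some model hyperparameters replaced."""
    new = copy.deepcopy(individual)
    new["model_params"].update(params)
    return new
```

For example, `remove_columns(INDIVIDUAL, ["SEX"])` keeps only the PAY_0 and EDUCATION features and leaves the original untouched; deep copies are used so each saved variant stays independent.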

This is the individual here. So you can do anything you want in this arbitrary Python code; it's just a class, and if you're good at Python, this gives you that access. Now suppose you want to do more: you don't just want to control the experiment, you want to modify the type of model, because you have some domain expertise. You can do that. For example, to show you one recipe (I'm not sure I can make this bigger; let me go to the version I have zoomed in a little), this is again what we call a recipe: arbitrary Python code that you can edit and upload to Driverless. In this case it's a custom model that even uses our own internal Driverless code; you can reuse our own components. We have about 200 to 250 recipes online that show examples of all this. This custom model basically just uses a custom objective, which happens to be an asymmetric objective, and you can see it's fairly simple Python code and not that long. You can do this too: you can write an arbitrary model yourself, it may include domain-specific features that you expect, and you can use those features. You can do a lot of powerful things with an arbitrary model in Driverless; we call that a model recipe. Another type of recipe is a transformer. Let me show a slightly different recipe: a log transformer. This one is quite short. It also builds the MOJO, the Java code; I can show that in the other view so you can see it a little better. See, it's right here. It's very simple: it just takes the log of the input, and it uses datatable. datatable is our own; it's an open-source product that our team at H2O built. But you can use pandas, that's fine. You can also import arbitrary packages in any of these kinds of recipes.
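The custom model with an asymmetric objective and the log transformer described above both come down to small pieces of numeric code. The sketch below is not the recipe from the talk and does not use the Driverless AI recipe classes; it only illustrates the two ideas in plain Python. Gradient-boosting libraries such as XGBoost and LightGBM accept a custom objective as per-row first and second derivatives (gradient and hessian) of the loss with respect to the prediction, so `asymmetric_squared_error` returns those for a squared error that penalizes under-prediction more heavily; the `under_weight` parameter is an assumption. `LogTransformer` is a hypothetical fit/transform class that takes the natural log of a numeric column; mapping non-positive or missing values to NaN is also an assumption.

```python
import math


def asymmetric_squared_error(preds, targets, under_weight=5.0):
    """Per-row gradient and hessian of 0.5 * w * (pred - target) ** 2.

    w = under_weight when the model under-predicts (pred < target), else 1.0,
    so under-prediction costs more than over-prediction.
    """
    grad, hess = [], []
    for p, y in zip(preds, targets):
        w = under_weight if p < y else 1.0
        grad.append(w * (p - y))  # first derivative with respect to pred
        hess.append(w)            # second derivative is the constant weight
    return grad, hess


class LogTransformer:
    """Hypothetical one-column transformer: natural log of a numeric input."""

    def fit_transform(self, column):
        # Stateless: nothing is learned from the training data.
        return self.transform(column)

    def transform(self, column):
        # Non-positive or missing values have no real log, so map them to NaN.
        return [
            math.log(x) if x is not None and x > 0 else float("nan")
            for x in column
        ]
```

The hessian being the weight itself is what makes this objective cheap for a gradient booster: each tree step is just a reweighted least-squares fit.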
Installing a package in a recipe works as if you were pip installing it. For model recipes and data recipes, you can even have an arbitrary environment that is totally different from Driverless's own, so it gives you full capability, and there are other examples of that on the website. We have this GitHub repository, h2oai/driverlessai-recipes, which is open source. There are many types of recipes: individual recipes like the one I showed, model recipes, transformer recipes like that log transformer, and data recipes. A data recipe is one of the last things I'll show, like this one. What's interesting in this case is that you can even download data through some safe, secure URL, which could be internal to your company, and you can create data. This can change the number of rows; it can do basically anything. It's still just arbitrary Python code, and it could even have its own environment. It's quite flexible. So recipes are quite extensive, and they're another tool Driverless gives you to customize things for your domain-specific needs. So let me wrap up. In summary, AutoML helps by automatically, efficiently, and reliably exploring all your options.
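To make the data recipe idea from the demo concrete (fetch data from a URL, then reshape it, possibly changing the row count), here is a minimal plain-Python sketch. It is not the Driverless AI data recipe API. `fetch_csv_text` uses the standard library's `urllib.request.urlopen`, and the URL you pass would be your own, for example an internal one. `prepare_rows` is a hypothetical helper that drops unlabeled rows and adds a derived ratio column; the column names `limit`, `bill`, and `target` are placeholders.

```python
import csv
import io
import urllib.request


def fetch_csv_text(url, timeout=30.0):
    """Download a CSV file and return its contents as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def prepare_rows(csv_text, target="target", numerator="bill", denominator="limit"):
    """Parse CSV text, drop rows without a label, and add a ratio feature.

    Dropping unlabeled rows means the output can have fewer rows than the input.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get(target):
            continue  # empty or missing label: skip this row
        try:
            row["ratio"] = float(row.get(numerator)) / float(row.get(denominator))
        except (TypeError, ValueError, ZeroDivisionError):
            row["ratio"] = None  # missing, non-numeric, or zero denominator
        rows.append(row)
    return rows
```

Keeping the download separate from the transformation lets you test `prepare_rows` on an inline string, for example `prepare_rows("limit,bill,target\n1000,250,0\n")`, without any network access.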

And if you want to find a balance (I didn't go into the details of this), you might want to focus on interpretability. In that case, you might want to reduce the number of features, and we make that easy in Driverless AI. You can raise the interpretability dial, and we'll automatically work out what we should be doing: focusing on a single GLM model with few, simple features. Or you can fully control it with that custom individual. Or you can go to the expert panel, which I didn't show, where you can control all of Driverless from the UI without writing Python code, and the wizard helps with that, as Arlo showed in one of the other talks. An AutoML tool needs to handle diverse situations, find the balance between all these kinds of issues, and avoid making mistakes. As I mentioned, Driverless is one of the best tools out there, especially compared to open-source tools, and it helps you avoid the risk of your model ending up as one of the worst. That's a good thing. And finally, it helps you explore all kinds of customization. You can have custom models, and although I didn't go into custom metrics, you can choose how to optimize your model arbitrarily. This really helps you apply your domain expertise to whatever specific problem you have. Okay, that's it. Thank you, guys.
