
ON-DEMAND WEBINAR

Introduction to Machine Learning for All of Us

Read the Full Transcript

Vance: Our experts just continue on here at the Intelligent Data Summit. This is the session for H2O.ai, and rejoining us is Rafael Coss, Director of Technical Marketing. Rafael, welcome back.

Rafael Coss: Thanks, Vance.

Vance: Rafael has spent much of his career in the world of intelligent data, and we’re really glad to have him with us this morning. Prior to joining H2O.ai, he was community director and a developer advocate at Hortonworks, and at IBM he served in several data-centric roles, including with BigInsights. In fact, many of you may know Rafael or know his name, as he is a co-author of “Hadoop for Dummies.” His passion now is to make AI and ML achievable for every company. And to that point, we have his session this morning: “An Introduction to Machine Learning for All of Us,” with a focus on a beginner’s guide to automatic machine learning. We all know machine learning is a specific subset of AI and it’s exploding in apps and adoption, but ML often requires special skills, which can delay the big benefits that many companies are looking for.

In his session this morning, Rafael is going to bridge that gap; we’ll learn what he sees as the core basics of ML and how automatic machine learning is making capabilities more accessible to a wider community of people. And before I turn it to Rafael, just a quick reminder that you can download the slides. Just click the big red button under the view screen. We also have some great takeaway assets for you today; they’re all available with those links with no extra registration required. You did that to join us today. And there’s even a link to the free trial and tutorials, which we highly recommend. So to connect with any of those just click the links below. And with that, Rafael, let me turn it to you and tell us about An Introduction to Machine Learning for All of Us.

Rafael Coss: Thanks, Vance. So today, what I want to do is tell you a little bit about H2O.ai, get into the AI fundamentals, talk about how AI is transforming all industries, and then position automatic machine learning within that context. And then, to wrap it all up and show how this capability is really accessible to everybody, I want to do a short little demo.

So with that, who is H2O? H2O.ai is the open source leader in AI and machine learning. We’re focused on democratizing AI for everyone; we want to make your company into an AI company. We were founded in 2012, and we just got Series D funding this summer. We have a wide array of products; many of you might’ve heard of H2O.ai from the open source community. We have a distributed machine learning engine that’s been available in open source for almost eight years, something like 20,000 companies using that open source tool, over a thousand universities, and a very open and big community around the open source and our commercial offerings. We have offices around the world, and that’s a quick introduction to H2O.ai.

So let’s get into what AI is. AI is a field of study under computer science. If you look it up on Wikipedia, an ideal “intelligent” machine is a flexible rational agent that perceives its environment and takes actions that maximize its chance of success at an arbitrary goal. And so, you might listen to that definition and go, “Huh?” So let’s make it a little simpler: AI is the ability of a computer to learn and reason like humans, and there are various techniques that can make that possible. AI actually has a very rich history. At its core, in its foundation, it’s really about math and statistics. The thinking around AI really developed around the ‘50s, and there have certainly been multiple generations of AI since then.

But there are three key trends that have made AI progress through all of this: first, the algorithms and techniques to find those patterns; second, the ability to leverage data to find those patterns; and third, compute. Because finding these patterns, and running these algorithms across lots of data, nowadays big data, needs a lot of compute. So the fact that these three things have been commoditized is a key enabler in making AI a reality today. And that’s why in 2020, AI is spreading like wildfire through various enterprises.

So within AI there’s an area called machine learning. Historically in AI there were expert systems, but machine learning is really being able to leverage a series of algorithms to learn and make predictions on data without being explicitly programmed. So machine learning is about learning from data. And as I mentioned, data is key, and data is everywhere. With big data and the increasing digitalization of the world, more and more data is becoming available, and that data has patterns. We can identify customer experiences and the interactions with our customers; it’s continuous, and it’s on everything. It can be in our supply chain, it can be in our devices, it can even be in wearables; it’s everywhere. The question is, how do we leverage that?

So machine learning is that study of computer science where we want to learn without being explicitly programmed. So what are the kinds of things we want to learn? We want to be able to find a category. Is this tweet positive or negative? Is this person going to default on a credit card, yes or no? Or we want to be able to find a number: I have my sales history, and I want to predict my sales into the future. And lastly, machine learning can find a grouping. Say people watch videos online on Netflix, and I want to figure out the different groupings of those viewers; this group of people are all similar, so that’s a cluster or a grouping of folks. So these are the three key capabilities of machine learning.
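For readers who want to make those three tasks concrete, here is a minimal sketch using scikit-learn; the data, features, and thresholds below are made up purely for illustration.

```python
# A minimal sketch of the three machine learning task types with scikit-learn.
# The data here is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # 100 rows, 3 numeric features

# 1. Find a category (classification), e.g., default: yes or no.
y_class = (X[:, 0] + rng.normal(size=100) > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)
print(clf.predict(X[:2]))                    # predicted class labels

# 2. Find a number (regression), e.g., future sales.
y_num = 3 * X[:, 1] + rng.normal(size=100)
reg = LinearRegression().fit(X, y_num)
print(reg.predict(X[:2]))                    # predicted numeric values

# 3. Find a grouping (clustering), e.g., viewer segments.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])                       # cluster assignment per row
```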

Let me dive a little deeper into finding a category. When we look at finding a category, I want to use something called supervised learning. In supervised learning, what happens is, I’m going to learn by example. So in this case I’m looking at a dataset around credit cards, and I’m trying to figure out if someone’s going to default on a loan. I need to give it a bunch of examples, and from those examples, which I refer to as my training data, I’m going to try to find a pattern. And then, if there’s new data that doesn’t have that label, I want to apply that pattern to it. What happens in machine learning when we find that pattern is that it goes into this thing called a model. And once you have a model, you can use that model to make a guess on new data; in this case, did someone default or not default?
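As a hedged sketch of that train-then-predict flow using the open-source H2O-3 library (the file names and the “default” column name here are hypothetical placeholders, not taken from the webinar’s actual files):

```python
# A sketch of supervised learning with open-source H2O-3: learn from labeled
# examples, then predict on new, unlabeled rows. Paths and column names are
# placeholders.
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
train = h2o.import_file("credit_card_train.csv")   # labeled training examples
train["default"] = train["default"].asfactor()     # mark the label as categorical

model = H2OGradientBoostingEstimator()
model.train(x=[c for c in train.columns if c != "default"],
            y="default", training_frame=train)

new_data = h2o.import_file("credit_card_new.csv")  # rows with no label
print(model.predict(new_data))                     # default / no-default guesses
```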

Part of this is we’re seeing many people embark on this AI journey. The AI journey we see has five key elements. It’s about creating a data culture and creating insights from that data culture. It’s about asking the right questions. It’s about leveraging the community, both from a learning perspective and because it takes a village to make this happen. So it’s not just the data scientists; it’s the developers, the data scientists, and the business leaders, and within your enterprise you have to work together to make this AI transformation really happen. Clearly there’s also a technology consideration: what kind of tools are you going to leverage, and can you leverage automation in those tools to accelerate you on this path?

And lastly, as you’re looking at machine learning and having a model or a machine make decisions, you need to develop trust within that enterprise. And sometimes it’s not just developing trust within the business leaders but oftentimes there’s regulations. So there are corporate regulations in finance, and more and more, we’re starting to see different governments develop regulations about, “Hey, if you’re going to use AI, you need to follow a series of rules.”

So as we talked about, this is a team sport. Making this transformation really involves your data scientists, who are looking at this whole process of building these models; it also involves the developers, who are changing applications, leveraging the models, integrating them into their environment, and sometimes also starting to dabble in data science. Lastly, it involves the business leaders, because they’re providing insights into where the business is going and hopefully building the trust needed to shift decision making from the business leaders to these algorithms.

So AI is transforming every industry. We’re seeing a massive increase in spending on AI; year over year, we see over a 300% increase in spending. We’re also seeing jobs for folks working in AI increase by 200%. And lastly, we’re seeing AI, and particularly machine learning, become a priority for these various companies. So as we look at trends in AI and machine learning, we’re really starting to see AI graduate from an innovation lab to something that’s across the enterprise. And we’re seeing companies moving from experimenting, maybe building a couple of models, to tens of models, to potentially building hundreds of models and generating this model factory.

And so as folks are building this model factory, they’re seeing challenges around managing that change in AI, whether it’s implementing that within their enterprise and the deployment of it, or just getting that cultural change to happen. And lastly, we’re seeing more verticalization of AI solutions and a wider array of people wanting to get involved. So it’s like the data scientists, the citizen data scientists, and maybe even a savvy business user that can potentially start leveraging AI.

So let’s take a look at some use cases in AI. As we look at the use cases in AI, we can see that they’re across all these different industries: financial services, healthcare, telecom, marketing and retail, IoT, manufacturing; the list goes on and on. There are folks leveraging AI in all industries. It’s really about taking that next step in analytics. Maybe you’ve been doing predictive analytics using a data warehouse, using Hadoop, looking back in time; but can you start looking forward in time and leveraging that forward insight to predict churn in a customer, to better understand the patterns that are happening, and maybe to predict fraud within your enterprise? That’s going to deliver more value to you. We’re trying to figure out how to save you time, save you money, and get you a competitive advantage.

But we’ve seen a series of challenges in making AI a reality. The first is talent, such as finding the developers who can put this into production. The second is how much time it takes to go through this process; machine learning can be very compute intensive, and sometimes just waiting for the machine and the algorithm to finish can take days, maybe even a week. And the third is trust: if you find the talent and invest the time, can you trust the result? And how can you develop that trust, and build it quickly?

So as we look at a machine learning workflow, it can be fairly rich and complex. So we can just start with the data: exploring the data, preparing the data, going through an optimization phase of tuning models, and selecting models. And then you are moving from a phase of training models to the deployment of models, and you’re actually getting to a point where you’re making predictions in an application. So this can be very complicated and rich, and that’s part of the challenge as to why you need so much talent. And that’s where we’re really hoping to leverage automatic machine learning to make this even simpler.

So as we look at that same workflow at a more macro level, there’s a data preparation aspect and a machine learning aspect. And part of the technology, and the benefit, is that we can leverage automation here. Like many things in software, can we bring more automation to make it available to even more people? That’s really where automatic machine learning comes in: can we automate this whole data transformation, quality exploration, model building, and model deployment, and in an automated fashion make it available to even more people?

So what kind of impact can this automation have? Well, it can definitely reduce your time challenge, because you can now have the benefits and the insights of expert data scientists and Kaggle grandmasters built into a tool. Through automation, you’re going to reduce the time spent on processing, and because this automation leverages not only CPUs but GPUs, it’s also going to reduce some of the time you need for building models. And lastly, it’s not just about building the model and exploring the data; it’s about looking at the results afterwards. Can you generate explanations that make this understandable by a wider array of people, or even by a regulated industry?

So with that, we want to introduce this whole notion around an AI platform to help you make your company into an AI company. So we’re looking at automating the whole machine learning process and actually leveraging AI to do AI. And not only just delivering and building models, but helping you move to the deployment of those models and understanding and generating explanations for those models. And this platform should not only automate this process and give you all this nice automation, but it should be something that’s open and extensible. So that way, if you want to bring your secret sauce, your IP, your understanding in the business, and you want to mix it with the capability of the tool, you want to have your cake and eat it, too, and be able to leverage the combination of those two.

So what does that look like? We’re going to talk about a tool called Driverless AI. Here’s a flow for how the machine learning process works. First, you’re going to connect to a dataset. In that dataset, you’re going to identify where the pattern is; that’s sometimes referred to as the label, or the y in this case. Then you’re going to go through the process of exploring the data, making sure that you have good quality, because garbage in, garbage out. And once you’re done with that quality check and have had that business discussion, you go through this model optimization phase, and this is where the beauty and the power of automatic machine learning really comes in: a tool can automatically select an algorithm, do the hyperparameter tuning for that algorithm, do the feature engineering, and do this in an iterative fashion against a leaderboard to give you the optimal model.
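Driverless AI itself is driven through its UI or its Python client, but as a rough open-source analogue of the same idea, here is a hedged sketch with H2O-3’s AutoML, which likewise tries multiple algorithms, tunes them, and keeps a leaderboard (file and column names are placeholders):

```python
# A sketch of automatic machine learning with open-source H2O-3 AutoML:
# it selects algorithms, tunes hyperparameters, and ranks results on a
# leaderboard. File and column names are illustrative placeholders.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("credit_card_train.csv")
train["default"] = train["default"].asfactor()   # the label ("y") column

aml = H2OAutoML(max_models=20, max_runtime_secs=600, seed=1)
aml.train(y="default", training_frame=train)     # everything else is automated

print(aml.leaderboard.head())   # candidate models ranked by the chosen metric
best = aml.leader               # the current best model, ready to predict
```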

And fourth, we want to have a platform that’s extensible. So yes, there are these great things as part of the model optimization phase, but we want the ability to bring your own feature engineering, your own transformers, your own algorithms, and your own scorers that you’re optimizing for. And then once you have an output, you want to look at the results, look at documentation, look at explanations, and when you’re ready, you want to go quickly from a training phase to a deployment phase, where you have an artifact that’s ready to go into deployment.

So let’s talk a little bit more about deployment. We’re going to have this notion of train once, run anywhere. As we described, machine learning is being used across various enterprises and various use cases. So sometimes you’re going to run on prem, sometimes on the cloud, sometimes in a backend system. Or you might be working in an IoT situation, looking at a smartphone, or a car, or a watch, and you want to deploy that model into those environments.

So we want this deployment-ready artifact, and we want the capability to train once and run anywhere. One of the benefits of H2O.ai is that we produce this thing called a MOJO. MOJO stands for Model Object, Optimized, and it’s a representation of the model; it includes the model and the feature engineering, it’s a binary representation, it’s fast, and it’s portable.

So this is train once, run anywhere. You can run it in the cloud, run it on prem, or run it hybrid. You can use various runtimes: a Java runtime, a Python runtime, an R runtime, a C++ runtime; it’s very flexible and embeddable. You can run it in batch within an application or a database, or in real time via REST, streaming, or an IoT deployment into your real-time environment. Lastly, you want an environment that’s algorithm-independent to simplify the deployment mechanism: instead of having to deal with all these different algorithms and all these different runtimes, you just have one runtime that works with all the different algorithms. So the beauty of the MOJO is that it gives you a deployment-ready artifact that you can train once and run anywhere.
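As a hedged sketch of what that looks like from Python with open-source H2O-3 (it assumes the trained `model` from the earlier sketch, placeholder file paths, and a Java runtime on the machine, since the MOJO scorer below shells out to the genmodel jar):

```python
# Export a trained H2O-3 model as a MOJO, then score new rows with the
# standalone MOJO runtime, with no H2O cluster needed at scoring time.
# Assumes `model` was trained earlier; paths are placeholders.
import h2o

mojo_path = model.download_mojo(path=".", get_genmodel_jar=True)

preds = h2o.mojo_predict_csv(
    input_csv_path="credit_card_new.csv",   # new, unlabeled rows
    mojo_zip_path=mojo_path,                # the portable model artifact
    genmodel_jar_path="h2o-genmodel.jar",   # Java runtime for the MOJO
)
print(preds[:2])                            # per-row predictions
```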

The next thing is explainability. We talked about building models using optimization and automation, and making them easy to deploy, but we also want to be able to understand what is being built. We’re going to help generate trust and understanding by not only automating the process of building models but also automating the process of building explanations. There are lots of statistical and machine learning techniques out there, like LIME, Shapley values, variable importance, and partial dependence, that Driverless AI can generate out of the box. It can also automatically generate documentation, as well as start doing things around bias; for example, disparate impact analysis to understand whether there’s bias in your models or, more importantly, whether there’s bias in the data that was used to build your models. And if the model is not reacting or behaving the way you’re expecting, can you debug what’s happening? So this is what comes out of the box: not only building the models but actually generating the explanations.
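For a taste of the same idea in open-source code (not Driverless AI’s own MLI dashboard), recent H2O-3 releases bundle an explain API; this hedged sketch assumes the `model` and a held-out `test` frame from the earlier sketches:

```python
# A sketch of automated explanations with open-source H2O-3; Driverless AI
# generates its own, richer MLI dashboard. Assumes `model` and a held-out
# H2OFrame `test` from earlier steps.
import h2o

h2o.explain(model, test)                   # variable importance, SHAP, PDPs
h2o.explain_row(model, test, row_index=0)  # local explanation for one row
model.varimp_plot()                        # classic global variable importance
```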

So how about we actually see a demo and check this out? This is Driverless AI, and this is the UI. Unlike many other machine learning tools where you have to program, here it’s optional: you can use the UI to go through the process of building a model, or you can leverage a Python or R client to build a model. So let’s start from the beginning. You can go into the tool and load a dataset. You can connect to a file system, you can connect to a file system on the cloud like an object store, whether it’s Amazon, Azure, or Google, or you can connect to a relational database. We have many connectors that you can leverage to bring data into the environment.

And today what we’re going to do is go through a credit card scenario. So I’ve already brought in the data; let’s start exploring it. So let’s first look at some details of this credit card dataset. And so we very quickly can get a profile of the schema and the distribution of data within that schema. So in this case, it’s a credit card dataset, we’re looking at some demographic information such as their education, their marriage status, their age, and then we get a series of historical features.

For example, we can see what’s been their payment history over the last six months, whether they made their payments on time or late. We can also see their bill amount over the last six months, and how much they have been paying over the last six months. And most importantly, we want to see an example of whether they have paid or not. So in this case, it’s looking at the default.

As we continue that exploration of the data, we want to actually visualize what’s happening. Many times, data scientists can spend a lot of time generating all kinds of graphs, trying to understand the quality of the data and the behavior that’s happening, to deepen their understanding. This is where you start to see AI doing AI: we automatically generate a series of visualizations depending on the patterns in your data. Sometimes it’s six visualizations, sometimes it’s 13; in this case we see 10. So let’s take a look at a couple.

For example, we can look at outliers. Here’s your distribution for bill amount five. And you can see there are two outliers here, and if there are multiple ones, you can scroll through them. You can actually look at details. So we’re looking at bill amount four here. And so here it gives you an opportunity to have a discussion with your line of business: “Hey, is this a typical range for bill amount four?” Or maybe it gives you an opportunity to have a discussion with your data engineer that says, “How was this value created? Where do they come from? How did you do the data prep? Did you merge multiple datasets?” It gives you an opportunity to explore the quality of the data and have a discussion with folks working together on this.

One of the key things that we want to look at is correlation. Each graph has an explanation. Typically in math class, if you got your calculus homework back and it’s all red, you did badly; but here, when we look at this correlation graph, high correlation is in red, so red is actually good, and low correlation is in blue. As we look at our correlation graphs, we can filter and see there’s a high correlation around bill amount. If you owe $1,000 next month, you’re probably going to owe something similar, which makes sense. And there’s a correlation in how you make payments month to month; maybe you start missing payments, or you get ahead of or behind on payments, so there are some correlations there.

But we also want to explore some of these other correlations, so we can click on an individual value and see that payment zero has some weaker correlations with the following months. Or we can look at bill amount five and see some high correlations with payment four, and the month before and the month after, and weaker correlations with other historical payments; but there seems to be only a weak correlation around bill amount five itself. So again, this is where we’re automating, helping the data scientists and the folks doing machine learning understand the patterns in their data and the quality of the data, and helping them strengthen their understanding, because garbage in, garbage out.
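Outside of Driverless AI, the same correlation check is a few lines of pandas; this hedged sketch borrows UCI-style credit card column names (BILL_AMT5 and so on), which are assumptions rather than names taken from the webinar’s files:

```python
# A minimal correlation check with pandas and seaborn, mirroring the demo's
# red-high / blue-low heatmap. File and column names are assumptions based
# on the common UCI credit card default dataset naming.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("credit_card_train.csv")
corr = df.select_dtypes("number").corr()        # pairwise Pearson correlations

sns.heatmap(corr, cmap="coolwarm", center=0)    # red = high, blue = low
plt.show()

# Drill into one column, like clicking a cell in the demo:
print(corr["BILL_AMT5"].sort_values(ascending=False).head())
```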

Once we go through this process, we’re ready to actually start building a model. So let’s quickly go through the process of building a model with this credit card data. We can go in and say we want to make a prediction; we’ll give it a name, “cool demo one.” We can select a test dataset, so here’s my test dataset for the credit card. I don’t need to provide a validation set. I could drop some columns if I needed to, and lastly, I want to provide my example: where’s my label data? And again, this is where we’re using AI to do AI. Driverless AI has figured out this is a classification problem. We’re working in an environment with GPUs, and it gives suggestions for key settings: accuracy, time, interpretability, and a scorer. And these are just the high-level knobs; there are many more knobs available under expert settings.

I want to take a look at accuracy and interpretability a little bit. Accuracy really influences the algorithms available; in this case, it’s suggesting algorithms to build a model here. It’s going to look at building an ensemble of eight models, and here’s a set of feature engineering techniques. As I crank up accuracy, I can see the set of algorithms potentially change, and it did change here; the level of ensembling changed and the set of feature engineering changed. There’s a similar relationship between interpretability and complexity.

As I crank up interpretability, what’s going to happen is, I’m going to start looking at building a little bit simpler models. So I’m going to lower the level of ensembling, change the level of models, and change the level of feature engineering. So I can see some feature engineering techniques being eliminated, some algorithms being reduced, and at the end, we can see some ensembling being significantly reduced. So at this point now, we’ve completely removed the ensembling.

For the quick demo, I’m just going to reduce this back to seven; actually, I’m going to reduce it back to six so we get to seven. Bring this down to one. Let’s build a model quickly. So let’s launch this experiment. As we launch this experiment, it’s going to go through a whole series of phases looking at the training dataset. And one of the nice things about building an experiment is that we give you all these notifications. We look at how balanced the data is, and we look at shifts in your data between your training and test sets. If there are suggestions for improvements, we provide those suggestions in the notifications. This environment is actually running with eight GPUs, so you can see the GPUs starting to fire as it works through understanding these datasets. Again, it’s starting to detect some differences in some of the different values.

We’re going to check for data leakage; we want to make sure that the answer is not hiding in the problem or in the dataset. Then we can start the process of building models. As it starts building models, Driverless AI uses a genetic algorithm: it’s going to select an algorithm, do some feature engineering, do some hyperparameter tuning for that algorithm, and do it all in an iterative fashion. At this point we’re one minute in, and we’ve already developed four models. Each one of these dots is a model that’s been built; the winning model is LightGBM, and the AUC for that is initially 0.78. But we can hover over some of the other ones, and we can see here’s a model that was built using a decision tree. As we hover over them, we can see the variable importance change, because each is looking at different variables and leveraging different feature engineering techniques.
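Driverless AI’s genetic algorithm is far more sophisticated than this, but as a deliberately simplified stand-in for the same iterate, score, and keep-a-leaderboard loop, here is a sketch using plain random search over hyperparameters:

```python
# A simplified illustration of the iterative search idea: try candidate
# models with random hyperparameters, score each by AUC, keep a leaderboard.
# (Driverless AI's actual genetic algorithm also evolves features.)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
leaderboard = []
for _ in range(10):                         # each iteration = one candidate
    params = dict(n_estimators=int(rng.integers(50, 300)),
                  max_depth=int(rng.integers(2, 8)),
                  learning_rate=float(rng.uniform(0.01, 0.3)))
    m = GradientBoostingClassifier(random_state=0, **params).fit(X_tr, y_tr)
    auc = roc_auc_score(y_val, m.predict_proba(X_val)[:, 1])
    leaderboard.append((auc, params))

leaderboard.sort(key=lambda t: t[0], reverse=True)   # best AUC first
print(leaderboard[0])
```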

With that, we give you lots of different metrics: an ROC curve, a confusion matrix, key data points, and we can look at precision-recall, lift, and gains. Now it’s going through this iterative process in the genetic algorithm, trying all these different permutations, and this is going to go on for a while. So let’s treat this as a cooking show and look at something that’s been prebuilt. I already built a model earlier, so I’m going to look at that one. Once it’s done, it’s built, in this case, somewhere on the order of 150 models; I can see a summary of the number of models it built. The winning model at the end is an ensemble of two LightGBM models and a third model, which gave us an AUC of 0.78.

So with that, we can look at an experiment summary, and we can automatically generate this report. This is a 30-page report that explains everything that happened with the model: the results that came out of it, the initial settings that went into it, and the whole methodology behind the automatic tuning. These are the various hyperparameters that were used and some of the different feature engineering techniques that were leveraged. So it’s a very detailed report on the whole building process. After you do that, maybe you want to look at an explanation, so we can look at an explanation for this model.

With that explanation, we give you lots of key metrics, some of them directly about the model. We also give you some surrogate ones, and we give you a nice interactive dashboard so we can walk through a use case. In this case, we see the most important feature was pay zero, and we can actually go through a scenario and see what happened. We can give you reason codes for the scenario and show you the path through the decision tree: because you missed your last payment, or maybe your payment three months ago or two months ago, we’re making this particular prediction. And we can give you feature importance not only at the global level, but also at the local level.

So we can see in this case that pay two wasn’t as important at the global level, but because of this person’s situation in the second, third, fourth, and fifth months, it really impacted this particular prediction. So with that, that’s a quick summary of Driverless AI. We saw a quick demo of automatic machine learning. We’re leveraging AI to do AI: not only going through the process of building models, but generating explanations and reports, and giving you a deployment-ready artifact to go into production. So does your company want to be an AI company? At H2O.ai, we are here to partner with you on your journey to help democratize AI. So are you ready to make your company an AI company? Let me turn it back to Vance and see if there are any questions.

Vance: Wow, Rafael, what a great session; a terrific overview of the state of AI/ML, and who doesn’t love a demo? Lots of activity going on there. So as you might expect, we’ve got some questions. With your permission, we’ll get to them. Are you ready?

Rafael Coss: Yeah, go for it.

Vance: Just as a top level, Rafael, let’s talk a little bit about the thinking behind the AI pipeline, or the ML pipeline, whatever we might call it. You mentioned the idea that data needs to be assembled and validated and perhaps even correlated. You mentioned the model portion, and then there’s production, which we didn’t have a lot of time to go into. It’s a really complicated pipeline, and I think you made a great case for why a lot of folks struggle with it. Talk a little at a high level about what H2O.ai is doing to bring code-free automation to the AI pipeline.

Rafael Coss: So as you saw in the demo, with Driverless AI, we’re really automating this whole machine learning process so you don’t need to code; you can point and click, or you can code if you want to. You can go through a GUI, and we handle the very tedious process of trying all these different combinations and permutations. We have the know-how to develop high-quality models in the shortest amount of time. And really, the key thing is that once you’ve built these models, you probably need some basic statistics to understand the results, but you can focus on interpreting the models instead of the tedious process of building them; that’s what automatic machine learning enables.

Vance: Rafael, it’s really eye-popping; it’s really rich, and it offers a lot of dimensions for many of the job titles that we have here at the Intelligent Data Summit. Let’s zoom in a little bit on the model portion. Talk a little bit about how H2O.ai is built to not just streamline the delivery of a model, but actually encourage fact finding and collaboration.

Rafael Coss: So definitely, building machine learning models in this transformation is a team effort. You’re going to have your data scientist or your citizen data scientist who’s going to be focusing on building models, and you have your business analyst and your business decision maker. So you want to have tools that make it easy for the data scientists to build models. We work with a whole rich ecosystem of tools that help you prepare data and bring it into the environment. And we also want to give the person building the models the right insights so that they can have the discussions, whether it’s with the business analyst or the business decision maker. Within Driverless AI, we also have a series of collaboration tools so that the various folks working on building models can collaborate with each other: they can share data, they can share experiments, and they can piggyback on each other as they go through the process of building experiments and models.

Vance: Yeah, that’s really great. In fact, given your open source heritage, Rafael, a question here: it sounds like we could use H2O.ai as a platform for a full AI ecosystem; is there anything to that? Are you seeing that among some of your adopters?

Rafael Coss: Absolutely. We’re seeing many people leverage our various products to build their AI infrastructure and make their companies into AI companies, whether with open source or not. Our open source product is called H2O-3, and that can go from one machine to a cluster. Then, to address those talent, time, and trust challenges, you can move to an automatic machine learning platform like Driverless AI to streamline that process.

Vance: This is an excellent discussion, Rafael. A couple of implementation type questions here: one of them is, “Can I extend or customize any of my work product out of the H2O.ai?”

Rafael Coss: That’s a great question. We have this whole notion around making your company an AI company. Really, there are two factors. One is automation, which is great, because we’re going to streamline this whole process. But then people often say, “Well, I have my own intellectual property, my IP, and my understanding of the data, and I know of another feature engineering technique that might be helpful.” Or, “Maybe there’s a different algorithm that I want you to leverage in this automation.” Or, “I want you to optimize to a different score.”

That’s the idea behind recipes: Driverless AI is an open and extensible platform. We have a whole set of recipes in the box that are among the industry leaders for feature engineering, plus key algorithms that can leverage both CPUs and GPUs. We also have an open catalog with more recipes that people can leverage, and if you don’t find what you need in the open catalog, it’s extensible, so you can bring your own: you can write some Python code that says, “add this feature engineering technique, add this algorithm, or add this particular scorer.” We have some customers who have very strict requirements and need to score models on the order of 20 or 30 milliseconds, and you can get some really high throughput and low latency out of the box.
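As a hedged sketch of what such a bring-your-own recipe can look like, here’s a minimal custom transformer modeled on the shape of the publicly documented examples in the h2oai/driverlessai-recipes repository; the h2oaicore imports only resolve inside a Driverless AI instance, and exact class attributes vary by version:

```python
# A minimal custom feature engineering recipe for Driverless AI, modeled on
# the publicly documented recipe examples; this only runs inside Driverless
# AI, where the h2oaicore package is available.
import numpy as np
import datatable as dt
from h2oaicore.transformer_utils import CustomTransformer

class Log1pTransformer(CustomTransformer):
    """Creates a log(1 + x) version of a numeric column as a new feature."""

    @staticmethod
    def get_default_properties():
        # Operate on one numeric column at a time.
        return dict(col_type="numeric", min_cols=1, max_cols=1,
                    relative_importance=1)

    def fit_transform(self, X: dt.Frame, y: np.ndarray = None):
        # No state to learn for this transform, so defer to transform().
        return self.transform(X)

    def transform(self, X: dt.Frame):
        col = X.to_pandas().iloc[:, 0]
        return np.log1p(col.clip(lower=0))   # guard against negative values
```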

Vance: Fantastic. I see time is just about up, Rafael, but before you go, what would you say is the best way to get started with H2O.ai and especially that automatic machine learning we learned about today?

Rafael Coss: I think that’s really a two-step approach. One, you want to make sure you can identify your problem. You can pick a problem within your company that you want to try to solve, there are a lot of public datasets out there where you can find one, and there are various competitions, like Kaggle, that define a problem and a dataset. Then you want to identify some technology. We really think that Driverless AI and automatic machine learning are a great way for folks to get started. Or if you’re an existing data scientist, you can see how you can leverage some of this automation to streamline your process and focus on more use cases instead of the tedious work of building a model. And then work with a team, so that maybe it’s not just you but a couple of people, maybe someone with more experience, and connect with folks, whether online or in person, who can help you.

Because this is a journey, sometimes things are going to work, and sometimes things are not going to work. Who can help you answer some of your questions to get you through the places where you’re stuck or where you need more explanation? With that, H2O offers this thing called a Driverless AI Test Drive. It’s a free two-hour environment; it’s in the cloud, and you can actually use it multiple times: once you’re done with your two-hour test drive, you can come back and do it again. So this gives you an environment very quickly where you can try this out, and we give you a series of prescriptive tutorials on how to get started.

The first one walks you through the registration process, and then it’s a whole learning path. You can go through a series of tutorials, and the core ones teach you the basics: the different processes, the metrics, how to explain and interpret models, different use cases around time series and NLP, and then how to customize things. So to me, it’s about getting hands-on experience quickly. Leverage a Test Drive to start quickly and leverage our tutorials to go through a prescriptive set of scenarios, and then, with that, maybe get a little dangerous and bring your own data to the challenge.

Vance: Dangerous and exciting, I would say. Rafael Coss, Director of Technical Marketing at H2O.ai. That was a fantastic package of material: not only a terrific demo and slide deck, but now we’ve got a wonderful beginner’s toolset of options, both the free trial and a set of tutorials. It was a really fantastic session. Thank you very much for coming to the Intelligent Data Summit, Rafael.

Rafael Coss: Thank you, Vance.

Vance: It’s been totally our pleasure. And just a quick reminder – many of these assets, including the terrific free trial and tutorials, are linked right below the view screen here in Rafael’s breakout room. We recommend that you take a look at that, and as you can tell, there is a lot going on in AI/ML for both the beginner and just about any company, as a matter of fact, at H2O.ai. Here’s a slide that’ll take you to some other great assets directly at the H2O.ai website. Download Rafael’s slides and these links will be live. Thanks again, everyone.