ON-DEMAND WEBINAR
Introduction to Machine Learning for All of Us
Vance: Our experts just continue on here at the Intelligent Data Summit. This is the session for H2O.ai, and rejoining us is Rafael Coss, Director of Technical Marketing. Rafael, welcome back.
Rafael Coss: Thanks, Vance.
Vance: Rafael has spent much of his career in the world of intelligent data, and we're really glad to have him with us this morning. Prior to joining H2O.ai, he was community director and a developer advocate at Hortonworks, and at IBM, he served in several data-centric roles, including with BigInsights. In fact, many of you may know Rafael or know his name, as he is a co-author of "Hadoop for Dummies." His passion now is to make AI and ML achievable for every company. And to that point, we have his session this morning, "An Introduction to Machine Learning for All of Us," with a focus on a beginner's guide to automatic machine learning. We all know machine learning is a specific subset of AI and it's exploding in apps and adoption, but ML can require some special skills, which often delays the big benefits that many companies are looking for.
In his session this morning, Rafael is going to bridge that gap; we'll learn what he sees as the core basics of ML and how automatic machine learning is making these capabilities accessible to a wider community of people. And before I turn it over to Rafael, just a quick reminder that you can download the slides: just click the big red button under the view screen. We also have some great takeaway assets for you today; they're all available at those links with no extra registration required, since you already did that to join us today. And there's even a link to the free trial and tutorials, which we highly recommend. To connect with any of those, just click the links below. And with that, Rafael, let me turn it over to you to tell us about An Introduction to Machine Learning for All of Us.
Rafael Coss: Thanks, Vance. So today, what I want to do is tell you a little bit about H2O.ai, get into the AI fundamentals, talk about how AI is transforming all industries, and then position automatic machine learning within that context. And to wrap it all up and show how accessible this capability really is to everybody, I want to do a short little demo.
So with that, who is H2O.ai? H2O.ai is the open source leader in AI and machine learning. We're focused on democratizing AI for everyone; we want to make your company into an AI company. We were founded in 2012, and we just got Series D funding this summer. We have a wide array of products; many of you might have heard of H2O.ai from the open source community. We have a distributed machine learning engine that's been available in open source for almost eight years, with something like 20,000 companies using that open source tool, over a thousand universities, and a very open and big community around the open source and our commercial offerings. We have offices around the world, and that's a quick introduction to H2O.ai.
So let's get into what AI is. AI is a field of study within computer science. If you look it up on Wikipedia, an ideal intelligent machine is a flexible rational agent that perceives its environment and takes actions that maximize its chance of success at an arbitrary goal. You might listen to that definition and go, "Huh?" So let's make it a little simpler: AI is the ability of a computer to learn and reason like humans, and there are various techniques that can make that possible. AI actually has a very rich history. At its core and foundation, it's really about math and statistics. The thinking around AI really developed around the '50s, and there have certainly been multiple generations of AI since then.
But the key trends that have made AI progress through all of this are, one, the algorithms and techniques to find those patterns; two, the ability to leverage data to find patterns; and three, compute, because finding these patterns, and running these algorithms across lots of data, and nowadays big data, needs a lot of compute. The fact that these three things have been commoditized is a key enabler in making AI a reality today, and that's why in 2020, AI is spreading like wildfire through various enterprises.
So within AI there's an area called machine learning. Historically in AI there were expert systems, but machine learning is really about leveraging a series of algorithms to learn and make predictions on data without being explicitly programmed. So machine learning is about learning from data. And as I mentioned, data is key, and data is everywhere. With big data and the ever-growing digitalization of the world, more and more data is becoming available, and that data has patterns. We can identify customer experiences and the interactions between our customers; it's continuous, it's on everything. It can be in our supply chain, in our devices, even in wearables; it's everywhere. The question is, how do we leverage that?
So machine learning is that area of computer science where we want to learn without being explicitly programmed. What are the kinds of things we want to learn? First, we want to be able to find a category. Is this tweet positive or negative? Is this person going to default on a credit card, yes or no? Second, we want to be able to find a number. I have my sales, and I want to predict my sales into the future. And lastly, machine learning can find a grouping. Say people watch videos online on Netflix, and I want to figure out the different groupings of those viewers; this group of people are all similar, so that's a cluster or a grouping of folks. These are the three key capabilities of machine learning.
Let me dive a little deeper into finding a category. When we want to find a category, I use something called supervised learning. In supervised learning, I'm going to learn by example. In this case I'm looking at a dataset around a credit card, and I'm trying to figure out if someone's going to default on a loan. I need to give it a bunch of examples, and from those examples, which I refer to as my training data, I'm going to try to find a pattern. Then, if there's new data that doesn't have that label, I'm trying to identify that pattern in it. What happens in machine learning when we find that pattern is that it goes into this thing called a model. And once you have a model, you can use that model to make a prediction on new data; in this case, did someone default or not default?
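[Editor's note: here is a minimal sketch of that supervised-learning flow in Python with scikit-learn. The file name and column names ("credit_card.csv", "default") are hypothetical stand-ins for the dataset Rafael describes, not artifacts from the demo.]

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Labeled examples: the "training data" Rafael describes.
df = pd.read_csv("credit_card.csv")    # hypothetical file
X = df.drop(columns=["default"])       # features: payments, bills, demographics
y = df["default"]                      # the label we learn by example

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# "Finding the pattern" produces a model...
model = GradientBoostingClassifier().fit(X_train, y_train)

# ...which can then guess the label for new, unseen data.
preds = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, preds))
```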
Part of this is we're seeing many people embark on this AI journey, and the AI journey we see has five key elements. It's about creating a data culture and creating insights from that data culture. It's about asking the right questions. It's about leveraging the community, both from a learning perspective and in the sense that it takes a village to make this happen: it's not just the data scientists, but the developers and the business leaders, and within your enterprise you have to work together to make this AI transformation really happen. Clearly there's a technology consideration: what kind of tools you're going to leverage, and whether you can leverage automation in those tools to accelerate you on this path.
And lastly, as you're looking at machine learning and having a model or a machine make decisions, you need to develop trust within the enterprise. Sometimes it's not just developing trust with the business leaders; oftentimes there are regulations. There are corporate regulations in finance, and more and more, we're starting to see different governments develop regulations saying, "Hey, if you're going to use AI, you need to follow a series of rules."
So as we talked about, this is a team sport. Making this transformation really involves your data scientists, who are looking at this whole process of building these models; it's also the developers, who are looking at changing applications, leveraging the models, integrating them into their environment, and sometimes also starting to dabble in data science. Lastly, it's the business leaders, because they're providing insights into where the business is going and hopefully building the trust needed to shift decision making from the business leaders to these algorithms.
So AI is transforming every industry. We're seeing a massive increase in spending on AI: year over year, over a 300% increase. We're also seeing jobs for folks working in AI grow, with a 200% increase. And lastly, we're seeing AI, and particularly machine learning, become a priority for these various companies. As we look at trends in AI and machine learning, we're really starting to see AI graduate from the innovation lab to something that's across the enterprise, and we're seeing companies move from experimenting and maybe building a couple of models, to tens of models, to potentially building hundreds of models and generating this model factory.
And as folks are building this model factory, they're seeing challenges around managing that change in AI, whether it's implementing it within their enterprise and the deployment of it, or just getting that cultural change to happen. And lastly, we're seeing more verticalization of AI solutions and a wider array of people wanting to get involved: the data scientists, the citizen data scientists, and maybe even a savvy business user who can potentially start leveraging AI.
So let's take a look at some use cases in AI. As we look at the use cases, we can see they're across all these different industries: financial services, healthcare, telecom, marketing and retail, IoT, manufacturing; the list goes on and on. There are folks leveraging AI in all industries. It's really about taking that next step in analytics. Maybe you've been doing predictive analytics using a data warehouse or Hadoop, looking back in time; but can you start looking forward in time and leveraging that forward insight to predict churn in a customer, to better understand the patterns that are happening, and maybe to predict fraud within your enterprise? That's going to deliver more value to you. We're trying to figure out how to save you time, save you money, and get you a competitive advantage.
But we've seen a series of challenges in making AI a reality. Some of those challenges are around talent, such as finding the developers who can put this into production. Second is how much time it takes to go through this process; machine learning can be very compute intensive, and sometimes just waiting for the machine and the algorithm to finish can take days, maybe even a week. And lastly: if you find the talent, and you invest the time, can you trust it? And how can you develop that trust and facilitate building it quickly?
So as we look at a machine learning workflow, it can be fairly rich and complex. We start with the data: exploring the data, preparing the data, going through an optimization phase of tuning models and selecting models. Then you move from a phase of training models to the deployment of models, and you actually get to a point where you're making predictions in an application. So this can be very complicated and rich, and that's part of why you need so much talent. And that's where we're really hoping to leverage automatic machine learning to make this simpler.
So if we look at that same workflow at a more macro level, there's a data preparation part and then a machine learning part, and part of the benefit of the technology is that we can leverage automation here. Like many things in software, can we bring more automation to make it available to even more people? That's really where automatic machine learning comes in: can we automate this whole flow of data transformation, quality exploration, model building, and model deployment, and in an automated fashion make it available to even more people?
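[Editor's note: this idea is concrete in H2O's open source H2O-3 library, whose AutoML API automates algorithm selection and tuning. A minimal sketch follows; the file path and column name are hypothetical.]

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # starts or connects to a local H2O-3 cluster

train = h2o.import_file("credit_card_train.csv")  # hypothetical path
train["default"] = train["default"].asfactor()    # mark the target as categorical

# AutoML tries many algorithms and hyperparameters within the budget below.
aml = H2OAutoML(max_models=20, max_runtime_secs=600, seed=1)
aml.train(y="default", training_frame=train)

print(aml.leaderboard.head())  # models ranked by the default metric
best = aml.leader              # the winning model, ready to predict
```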
So what kind of impact can this automation have? Well, it can definitely reduce your time challenge, because you now have the benefits and the insights of expert data scientists and Kaggle grandmasters built into a tool. The automation is going to reduce the time spent on processing, and because it leverages not only CPUs but GPUs, it's also going to reduce some of the time you need to build models. And lastly, it's not just about building the model and exploring the data; it's about looking at the results afterwards. Can you generate explanations that make this understandable by a wider array of people, or even by a regulated industry?
So with that, we want to introduce this whole notion of an AI platform to help you make your company into an AI company. We're looking at automating the whole machine learning process, actually leveraging AI to do AI: not just delivering and building models, but helping you move to the deployment of those models and understanding and generating explanations for those models. And this platform should not only automate this process and give you all this nice automation; it should be something that's open and extensible. That way, if you want to bring your secret sauce, your IP, your understanding of the business, and mix it with the capability of the tool, you can have your cake and eat it, too, and leverage the combination of the two.
So what does that look like? We're going to talk about a tool called Driverless AI. Here's a flow of how the machine learning process works. First of all, you're going to connect to a dataset, and in that dataset you're going to identify where the pattern is; we sometimes refer to that as the label, or the y in this case. Then you're going to go through the process of exploring the data, making sure you have good quality, because garbage in, garbage out. And once you're done with that quality check and you've had that business discussion, you go through this model optimization phase, and this is where the beauty and the power of automatic machine learning really comes in: the tool can automatically select an algorithm, do the hyperparameter tuning for that algorithm, do the feature engineering, and do this in an iterative fashion against the leaderboard to give you the optimal model.
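[Editor's note: for readers who prefer code over the UI, here is a sketch of the same flow with the driverlessai Python client. The address, credentials, file path, and column name are placeholders, and the exact keyword arguments should be treated as assumptions to check against the client docs.]

```python
import driverlessai

# Connect to a running Driverless AI server (placeholder address/credentials).
dai = driverlessai.Client(address="http://localhost:12345",
                          username="user", password="pass")

# 1. Connect to a dataset.
ds = dai.datasets.create(data="credit_card_train.csv", data_source="upload")

# 2-3. Identify the label (the "y") and run the model-optimization phase;
# accuracy/time/interpretability are the high-level knobs from the demo.
exp = dai.experiments.create(
    train_dataset=ds,
    target_column="default",
    task="classification",
    accuracy=7, time=2, interpretability=7,
)
print(exp.metrics())  # headline scores for the winning pipeline
```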
And fourth, we want to have a platform that's extensible. Yes, there are all these great things as part of the model optimization phase, but we want the ability to bring your own feature engineering, your own transformers, your own algorithms, and your own scorers that you're optimizing for. Then, once you have an output, you want to look at the results, the documentation, and the explanations, and when you're ready, you want to go quickly from a training phase to a deployment phase, where you have an artifact that's ready to go into deployment.
So let's talk a little bit more about deployment. We have this notion of train once and run anywhere. As we described, machine learning is being used across various enterprises and use cases, so sometimes you're going to run the model on prem, sometimes on the cloud, sometimes in a backend system. Or you might be working in an IoT situation, looking at a smartphone, or a car, or a watch, and you want to deploy that model into those environments.
So we want this deployment-ready artifact and the capability to train once and run anywhere. One of the benefits of H2O.ai is that we produce this thing called a MOJO. A MOJO is a Model Object, Optimized; it's a representation of the model that includes the model and the feature engineering. It's a binary representation, it's fast, and it's portable.
So this is train once and run anywhere. You can run it in the cloud, on prem, or hybrid. You can use various runtimes: a Java runtime, a Python runtime, an R runtime, a C++ runtime; it's very flexible and embeddable. You can run it in batch within an application or a database, or in real time, behind REST, in streaming, or in an IoT situation where you're deploying it into your real-time environment. Lastly, you want an environment that's algorithm-independent to simplify the deployment mechanism: instead of having to deal with all these different algorithms and all these different runtimes, you just have one runtime that works with all the different algorithms. So the beauty of the MOJO is that it gives you a deployment-ready artifact that you can train once and run anywhere.
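[Editor's note: as a concrete illustration, the open source h2o package can score an H2O-3 MOJO from Python; a sketch follows. The paths are placeholders, and note that Driverless AI pipeline MOJOs ship with their own mojo2 runtime rather than this helper.]

```python
import pandas as pd
import h2o

rows = pd.read_csv("new_customers.csv")  # hypothetical unseen data to score

# Score a pandas frame against a MOJO by shelling out to the Java runtime.
preds = h2o.mojo_predict_pandas(
    rows,
    mojo_zip_path="model.zip",             # the deployment-ready artifact
    genmodel_jar_path="h2o-genmodel.jar",  # matching Java scoring runtime
)
print(preds.head())
```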
The next thing is explainability. We talked about building models using optimization and automation and making them easy to deploy, but we also want to be able to understand what is being built. We're going to help generate trust and understanding by automating not only the process of building models but also the process of building explanations. There are lots of statistical and machine learning techniques out there, like LIME, Shapley values, variable importance, and partial dependence, that Driverless AI can generate out of the box. It can also automatically generate documentation, as well as start doing things around bias: for example, disparate impact analysis to understand whether there's bias in your models, or more importantly, whether there's bias in the data that built your models. And if the model is not reacting or behaving the way you're expecting, can you debug what's happening? So this is what comes out of the box: not only building the models but actually generating the explanations.
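[Editor's note: Driverless AI generates these explanations automatically, but the underlying Shapley technique can be seen with the open source shap library. A minimal sketch follows, reusing the model and X_test from the earlier supervised-learning sketch.]

```python
import shap

# Shapley values attribute each prediction to the input features.
explainer = shap.TreeExplainer(model)  # `model` from the earlier sketch
shap_values = explainer.shap_values(X_test)

# Global view: which features drive predictions across the whole test set.
shap.summary_plot(shap_values, X_test)
```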
So how about we actually see a demo and check this out? This is Driverless AI, and this is the UI. Unlike many other machine learning tools where you have to program, here it's optional: you can use the UI to go through the process of building a model, or you can leverage the Python or R client to build a model. So let's start from the beginning. You can go into the tool and load a dataset. You can connect to a file system, you can connect to a file system on the cloud like an object store, whether it's Amazon, Azure, or Google, or you can connect to a relational database. We have many connectors that you can leverage to bring data into the environment.
And today what we're going to do is go through a credit card scenario. I've already brought in the data; let's start exploring it. Let's first look at some details of this credit card dataset. We can very quickly get a profile of the schema and the distribution of data within that schema. In this case, it's a credit card dataset; we're looking at some demographic information such as education, marriage status, and age, and then we get a series of historical features.
For example, we can see what their payment history has been over the last six months, whether they made their payments on time or late. We can also see their bill amount over the last six months, and how much they have been paying over the last six months. And most importantly, we want to see an example of whether they have paid or not; in this case, it's looking at the default column.
As we continue that exploration of the data, we want to actually visualize what's happening. Often, data scientists can spend a lot of time generating all kinds of graphs, trying to understand what's happening, the quality of the data, and the behavior in it, to deepen their understanding. This is where you start to see the AI do AI: we automatically generate a series of visualizations depending on the patterns in your data. Sometimes it's six visualizations, sometimes it's 13; in this case we see 10. So let's take a look at a couple.
For example, we can look at outliers. Here's your distribution for bill amount five, and you can see there are two outliers here; if there are multiple ones, you can scroll through them. You can actually look at details. So we're looking at bill amount four here. This gives you an opportunity to have a discussion with your line of business: "Hey, is this a typical range for bill amount four?" Or maybe it gives you an opportunity to have a discussion with your data engineer: "How was this value created? Where did it come from? How did you do the data prep? Did you merge multiple datasets?" It gives you an opportunity to explore the quality of the data and have a discussion with the folks working together on this.
One of the key things we want to look at is correlation. Each graph has an explanation. Typically in math, if you got your calculus homework back and it's all red, you did badly; but here, when we look at this correlation graph, high correlation is in red, so red is actually good, and low correlation is in blue. As we look at our correlation graphs, we can filter and see there's a high correlation around the bill amounts: if you owe $1,000 this month, you're probably going to owe something similar next month, which makes sense. And there's a correlation in how you make payments month to month; maybe you start missing payments, or you start getting ahead of or behind on payments; there are some correlations there.
But we also want to explore some of these other correlations, so we can click on an individual value and see that payment zero has some weaker correlations with the following months. Or we can look at bill amount five and see that there are high correlations with payment four, and the month before and the month after, and weaker correlations with the other historical payments; overall the payments seem only weakly correlated with bill amount five. So again, this is where we're automating, helping the data scientists and the folks doing machine learning understand the patterns in their data and the quality of the data, and helping them strengthen their understanding, because garbage in, garbage out.
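[Editor's note: a hand-rolled version of this correlation view, for readers who want to try it on their own data; a sketch with pandas and seaborn follows, with the file name as a placeholder.]

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("credit_card.csv")  # hypothetical dataset
corr = df.corr(numeric_only=True)    # pairwise feature correlations

# Match the demo's color convention: red = high correlation, blue = low.
sns.heatmap(corr, cmap="coolwarm", center=0)
plt.title("Feature correlations")
plt.show()
```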
Once we go through this process, we're ready to actually start building a model, so let's quickly go through the process of building a model with this credit card data. We can go in and say we want to make a prediction; we'll give it a name, something like "cool demo one." We can select a test dataset, so here's my test dataset for the credit card. I don't need to provide a validation set. I could drop some columns if I needed to, and lastly, I want to provide my example: where's my label data? And again, this is where we're using AI to do AI: Driverless AI has figured out this is a classification problem. We're working in an environment with GPUs, and it gives some suggestions for the key settings. So we're looking at accuracy, time, interpretability, and a scorer. And these are just the high-level knobs; there are many more available under expert settings.
I want to take a look at accuracy and interpretability a little more. Accuracy really influences the algorithms available, so in this case, it's suggesting algorithms to build a model here; it's going to look at building an ensemble of eight models, and here's a set of feature engineering techniques. As I crank up accuracy, I can see the set of algorithms potentially change, and it did change here: the level of ensembling changed and the set of feature engineering changed. There's a similar relationship around interpretability/complexity.
As I crank up interpretability, what's going to happen is I'm going to start building somewhat simpler models: lowering the level of ensembling, changing the set of models, and changing the level of feature engineering. So I can see some feature engineering techniques being eliminated, some algorithms being reduced, and at the end, the ensembling being significantly reduced. At this point, we've completely removed the ensembling.
For the quick demo, I'm just going to reduce this back to seven (actually, I'm going to reduce it back to six so we get to seven) and bring this down to one. Let's build a model quickly and launch this experiment. As we launch the experiment, it's going to go through a whole series of phases looking at the training dataset. One of the nice things about running an experiment is that we give you all these notifications: we look at how balanced the data is, and we look at shifts in your data between your training and test sets. If there are suggestions for improvements, we provide those in the notifications. This environment is actually running with eight GPUs, so you can see the GPUs starting to fire as we work through understanding these datasets. Again, it's starting to detect some differences in some of the different values.
We're going to check for data leakage; we want to make sure the answer is not hiding in the dataset. Then we start the process of building models. As it starts building models, Driverless AI uses a genetic algorithm: it's going to select an algorithm, do some feature engineering, do some hyperparameter tuning for that algorithm, and repeat this in an iterative fashion. At this point we're one minute in, and we've already developed four models. Each one of these dots is a model that's being built; the winning model is LightGBM, and the AUC for that is initially 0.78. We can hover over some of the other ones; here's a model that was built using a decision tree. And as we hover over them, we can see the variable importance change, because each model is looking at different variables and leveraging different feature engineering techniques.
With that, we give you lots of different metrics: we give you an ROC curve, the confusion matrix, key data points, and we can look at precision recall, lift, and gains. So now it's going through this iterative process with the genetic algorithm, trying all these different permutations, and this is going to go on for a while. So let's treat this as a cooking show and look at something that's been prebuilt. I already built a model earlier, so I'm going to look at that one. Once it's done, it has gone through this whole process; in this case, it built somewhere on the order of 150 models, and I can see a summary of the number of models it built. The winning model at the end is an ensemble of two LightGBM models and one other model, which gave us an AUC of 0.78.
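[Editor's note: these headline metrics are standard and easy to compute by hand; here is a sketch with scikit-learn, reusing y_test and preds from the earlier supervised-learning sketch.]

```python
from sklearn.metrics import roc_auc_score, confusion_matrix, precision_recall_curve

print("AUC:", roc_auc_score(y_test, preds))   # area under the ROC curve
print(confusion_matrix(y_test, preds > 0.5))  # counts at a 0.5 threshold
precision, recall, _ = precision_recall_curve(y_test, preds)  # for PR/lift-style views
```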
With that, we can look at an experiment summary, and from the experiment summary we can automatically generate this report. This is a 30-page report that explains everything that happened with the model: the results that came out of it, the initial settings that went into it, and the whole methodology that went into automatically tuning it. These are the various hyperparameters that were used, and some of the different feature engineering techniques that were leveraged. So it's a very detailed report of what happened in this whole building process. After that, you may want to look at an explanation, so we can look at an explanation for this model.
With that explanation we give you lots of different key metrics, some of them directly on the model. We also give you some surrogate ones, and a nice interactive dashboard so we can walk through a use case. In this case, we see the most important feature was pay zero, and we can actually go through a scenario to see what happened. We can give you reason codes for the scenario and a path through the decision tree: because you missed your last payment, or maybe your payment three months ago or two months ago, we are making this particular prediction. And we can give you feature importance not only at the global level, but also at the local level.
We can see in this case that pay two at the global level wasn't as important, but because of this person's situation in the second, third, fourth, and fifth months, it really impacts this particular prediction. So with that, that's a quick summary of Driverless AI. We saw a quick demo of automatic machine learning: we're leveraging AI to do AI, not only going through the process of building models, but generating explanations and reports, and giving you a deployment-ready artifact to go into production. So does your company want to be an AI company? At H2O.ai, we are here to partner with you on your journey to help democratize AI. Are you ready to make your company an AI company? Let me turn it back to Vance and see if there are any questions.
Vance: Wow. Rafael, what a great session, a terrific overview of the state of AI/ML, and who doesn't love a demo? Lots of activity going on there. As you might expect, we've got some questions. With your permission, we'll get to them. Are you ready?
Rafael Coss: Yeah, go for it.
Vance: Just at a top level, Rafael, let's talk a little bit about the thinking behind the AI pipeline, or the ML pipeline, whatever we might call it. You mentioned the idea that data needs to be assembled and validated and perhaps even correlated. You mentioned the model portion, and then there's production, which we didn't have a lot of time to go into. It's a really complicated pipeline, and I think you made a great case for why a lot of folks struggle with it. Talk a little bit, at a high level, about what H2O.ai is doing to bring code-free operation, or automation, to the AI pipeline.
Rafael Coss: So as you saw in the demo, with Driverless AI we're really automating this whole machine learning process so you don't need to code; you can point and click, or you can code if you want to. You can go through a GUI to point and click, and we go through the very tedious process of trying all these different combinations and permutations. We have the know-how to develop high-quality models in the shortest amount of time. And really, the key thing is, once you've built these models, you probably need some basic statistics to understand some of the results, but you can focus on interpreting the models instead of the tedious process of building them. That's what automatic machine learning is enabling.
Vance: Rafael, it's really eye-popping; it's really rich, and it offers a lot of dimensions for many of the job titles we have here at the Intelligent Data Summit. Let's zoom in a little bit on the model portion. Talk a little bit about how H2O.ai is built to not just streamline the delivery of a model, but actually encourage fact finding and collaboration.
Rafael Coss: So definitely, building machine learning models in this transformation is a team effort. You're going to have your data scientist or your citizen data scientist focusing on building models; you have your business analyst; you have your business decision maker. So you want tools that make it easy for the data scientists to build models. We work with a whole rich ecosystem of tools that help you prepare data and bring it into the environment. And we also want to give the person building the models the right insights so they can have the discussions, whether it's with the business analyst or the business decision maker. Within Driverless AI, we also have a series of collaboration tools so that the various folks working on building models can collaborate with each other: they can share data, they can share experiments, and they can piggyback on each other as they go through the process of building experiments and models.
Vance: Yeah, that's really great. In fact, given your open source heritage, Rafael, a question here: it sounds like we could use H2O.ai as a platform for a full AI ecosystem; is there anything to that? Are you seeing that among some of your adopters?
Rafael Coss: Absolutely. We're seeing many people leverage our various products to build their AI infrastructure and make their companies into AI companies, whether it's open source or not. Our open source product is called H2O-3; it can go from one machine to a cluster. And to address those talent/time/trust challenges, they can move to an automatic machine learning platform like Driverless AI to streamline that process.
Vance: This is an excellent discussion, Rafael. A couple of implementation-type questions here. One of them is, "Can I extend or customize any of my work product out of H2O.ai?"
Rafael Coss: That's a great question. We have this whole notion of making your company an AI company, and really, there are two factors. One is automation, which is great, because we're going to streamline this whole process. But then people often say, "Well, I have my own intellectual property, my IP, and my understanding of the data, and I know of another feature engineering technique that might be helpful." Or, "Maybe there's a different algorithm that I want you to leverage in this automation." Or, "I want you to optimize to a different score."
That's where recipes come in: Driverless AI is an open and extensible platform. We have a whole set of recipes in the box that are among the industry-leading ones for feature engineering, plus key algorithms that can leverage both CPUs and GPUs. But we also have an open catalog with more recipes that people can leverage, and if you don't find it in the open catalog, it's extensible, so you can bring your own: you can write some Python code that says "add this feature engineering technique," "add this algorithm," or "add this particular scorer." We have some customers who have very strict requirements and need to score models on the order of 30 milliseconds or 20 milliseconds, and you can get some really high throughput and low latency out of the box.
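[Editor's note: for a flavor of what "bring your own" looks like, here is a hypothetical custom transformer modeled on the patterns in H2O's public driverlessai-recipes repository; the class layout and method names follow that repo, but treat the details as assumptions to verify against the current recipe templates.]

```python
import datatable as dt
import numpy as np
from h2oaicore.transformer_utils import CustomTransformer  # Driverless AI recipe base class

class MyLogPlusOneTransformer(CustomTransformer):
    """Hypothetical recipe: log-transform a numeric column as a new feature."""
    _regression = _binary = _multiclass = True  # problem types this recipe supports

    @staticmethod
    def get_default_properties():
        return dict(col_type="numeric", min_cols=1, max_cols=1, relative_importance=1)

    def fit_transform(self, X: dt.Frame, y: np.ndarray = None):
        return self.transform(X)

    def transform(self, X: dt.Frame):
        # Your "secret sauce" feature engineering goes here.
        return np.log1p(np.abs(X.to_numpy()))
```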
Vance: Fantastic. I see time is just about up, Rafael, but before you go, what would you say is the best way to get started with H2O.ai, and especially that automatic machine learning we learned about today?
Rafael Coss: I think it's really a two-step approach. One, you want to make sure you can identify your problem. If you can, leverage a problem within your company that you want to try to solve; otherwise there are a lot of public datasets where you can find one, and there are various competitions out there, like Kaggle, that define a problem and a dataset. Then you want to identify some technology. We really think that Driverless AI and automatic machine learning are a great way for folks to get started. Or, if you're an existing data scientist, you can see how you can leverage some of this automation to streamline your process and focus on more use cases versus the tedious nature of building a model. Then work with a team, so that maybe it's not just you but a couple of people, maybe someone with more experience, and connect with folks, whether it's online or in person, who can help you.
Because this is a journey, sometimes things are going to work, and sometimes they're not. Who can help answer your questions to get you through the places where you're stuck or where you need more explanation? With that, H2O offers this thing called Driverless AI Test Drive. It's a free two-hour environment in the cloud, and you can actually use it multiple times: once you're done with your two-hour test drive, you can come back and do it again. So this gives you an environment very quickly where you can try this out, and we give you a series of prescriptive tutorials on how to get started.
The first one walks you through the registration process, and then it's a whole learning path. You can go through a series of tutorials; the core ones teach you the basics, the different processes, the metrics, and how to explain and interpret models, and then you can go into different use cases around time series and NLP, and then how to customize things. To me, it's about getting hands-on experience and doing it quickly: leverage Test Drive to start quickly, leverage our tutorials to go through a prescriptive set of scenarios, and then maybe become dangerous and bring your own data to the challenge.
Vance: Dangerous and exciting, I would say. Rafael Coss, Director of Technical Marketing at H2O.ai. That was a fantastic package of material: not only a terrific demo and slide deck, but now we've got a wonderful beginner's toolset of options, both the free trial and a set of tutorials. It was a really fantastic session. Thank you very much for coming to the Intelligent Data Summit, Rafael.
Rafael Coss: Thank you, Vance.
Vance: It's been totally our pleasure. And just a quick reminder: many of these assets, including the terrific free trial and tutorials, are linked right below the view screen here in Rafael's breakout room. We recommend you take a look, and as you can tell, there is a lot going on in AI/ML at H2O.ai, for both the beginner and just about any company, as a matter of fact. Here's a slide that'll take you to some other great assets directly on the H2O.ai website. Download Rafael's slides, and these links will be live. Thanks again, everyone.