Automatic Model Documentation with H2O

This video was recorded on June 18, 2020.

Slides from the session are available here: https://www.slideshare.net/0xdata/automatic-model-documentation-with-h2o

For many companies, model documentation is a requirement for any model to be used in the business. For other companies, model documentation is part of a data science team’s best practices. Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, and information on model performance.

Collecting and documenting this information can take a data scientist days to complete for each model. The model document needs to be comprehensive and consistent across various projects. The process of creating this documentation is tedious for the data scientist and wasteful for the business because the data scientist could be using that time to build additional models and create more value. Inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.

In this virtual meetup, we will learn how to create comprehensive, high-quality model documentation in minutes that saves time, increases productivity and improves model governance.

Speaker’s Bio:

Nikhil Shekhar: Nikhil is a Machine Learning Engineer at H2O.ai. He is currently working on our automatic machine learning platform, Driverless AI. He graduated from the University of Buffalo majoring in Artificial Intelligence and is interested in developing scalable machine learning algorithms.

Read the Full Transcript

Saurabh Kumar:

Again, a very warm welcome. Thank you again for joining us for our webinar titled Automatic Model Documentation with H2O. My name is Saurabh Kumar and I’m on the marketing team here at H2O.ai. I’d love to start off by introducing our speaker for the day. We have with us Nikhil Shekhar. Nikhil is a machine learning engineer at H2O.ai. He is currently working on our flagship product Driverless AI, which is an automatic machine learning platform. He graduated from the University of Buffalo, majoring in Artificial Intelligence, and is interested in developing scalable machine learning algorithms.

Before I hand it over to him, I’d like to go over the following housekeeping items. Please feel free to send us your questions throughout the session via the questions tab. We will be very happy to answer them towards the end of the session. This presentation is being recorded, a copy of the recording and a slide deck will be available shortly after the presentation is over. Without further delay, I’d now like to hand it over to Nikhil.

Nikhil Shekhar:

Thanks, Saurabh. Good morning everyone. I’ll start by presenting my screen, just give me a second. Can you guys see my screen now?

Saurabh Kumar:

Yes, Nikhil. Yeah.

Nikhil Shekhar:

Yeah. Okay, good. We’ll be going over ML AutoDoc, which is a new product from H2O. Basically, it’s for documenting the models which you develop, including the model pipeline, feature engineering, et cetera, et cetera. We’ll go into the details of what kind of models are supported in AutoDoc and how the idea of AutoDoc came into picture, et cetera, over the course of the next one hour or so.

Let’s begin. Whenever there is a regulatory body, there is a need to document the models that you develop, and there are multiple challenges which an organization as a whole faces, especially the data scientists. A few challenges are, like, it’s tedious to document each and every detail of the models, with the parameters and all the hyperparameters, in the format which is acceptable to the regulatory bodies. It’s tedious for data scientists. It’s usually time-consuming. It needs a lot of review iterations between the folks who are in charge of building the model, then somebody inside the organization who would be reviewing the model, and then finally you submit it to the regulatory bodies before you can use the model in production. So, it’s really time-consuming.

Then, if there are multiple different teams within an organization, they might have their own templates and ways of documenting the model, so the documents which get created at the end within these sub-teams might be inconsistent, and there might not be organization-level consistency in the documents which are submitted by the same organization to the regulatory bodies.

Also, if there are multiple different organizations, the regulatory body which reviews the documents has to review different kinds of documentation provided by each organization. Since there is usually no standard template, the documentation often ends up incomplete; there might be some missing parameters, or missing details of what feature engineering happened. Or maybe, if at the end you have a stacked model, the details in the document are available for only a couple of models and not for all the models, et cetera.

In many cases, the document which gets submitted or created for the model is incomplete. And since it’s a manual process, human beings are expected to make some errors. So there might be issues with the documentation. Then it goes back to the previous point: it would need several iterations of review for the document to be complete and free of errors.

Then, coming into 2020, or maybe beginning 2019, there have been regulatory bodies set up in the US and APAC, a few countries in APAC and India wherein there is a compliance requirement for the companies, especially in the banking sector and the healthcare sector wherein they have to submit the documented version of the model, which they want to use for a particular use case in production, which affects the users or the consumers. For instance, the credit score, right? It affects the end-user because he might be given a loan or denied a loan.

In such cases, there are compliance authorities which are set up in various parts of the world. The companies who run these use cases, like scoring the credit history of a particular customer, they need to adhere to the compliance to these authorities. For this, they need to submit the document which details the hyperparameters, the model, the features, et cetera, which is being used.

These are a few of the challenges which companies are facing. Since regulatory compliance requirements began around 2019, this has become a major point of focus for companies, because all the models should be well-documented and then tick-marked by the regulatory authorities. So these are a few of the challenges which companies are facing in documenting the models which they build.

There are also challenges, like a company which has been using, for instance, a model for the last 10 years. Going ahead, they need to go back and document those models as well and submit that documentation to the compliance authorities if they want to continue using the model, or they will have to deprecate it. So these are a few of the reasons why AutoDoc is very important in the current world.

This is an overview of what the H2O AutoDoc does. When you build a model, it automatically generates a Word document, which you can edit. It has the details of algos, the techniques, or the feature engineering that happened during the modeling phase. Since it’s a Word document, you can go ahead and edit it after it generates the documentation in a particular format, and it can be customized to your needs.

It saves a lot of time for the data scientists because 95% of the document would be built in the format in which you want. If you want to add a few other features, you can go ahead and add it. But the format in which the AutoDoc gives you, the final document, is the one which can be directly submitted to most of the regulatory bodies across US, India, and parts of APAC like Singapore, et cetera. Then, if it’s an internal document wherein you want to document a model for your internal use, so you can also edit it and customize it based on your business needs.

It’s very simple to use. You will see in the next few slides how you can generate an AutoDoc for the model which you have already built. You just need to add a couple of lines of code to it and then you get a full-fledged document for the model.

I’m going to the next slide. Give me a second. Okay. Yeah. The concept of AutoDoc came from our product called Driverless AI, which is an enterprise product, because many of our customers needed the pipeline to be well-documented. Then we went ahead and extended it to support H2O open-source models and Scikit-learn models. The idea is to support a lot of other packages in the AutoDoc going ahead. Currently, it supports these three libraries: Driverless AI, the open-source H2O-3, and Scikit-learn.

First, let’s go through how you can document the model which you are generating in H2O-3 and create a document out of it. These are the algorithms supported inside of H2O-3 for which you can create an AutoDoc out of the box. There’s XGBoost, which everyone is pretty familiar with. Inside H2O-3, we have a variant of XGBoost which can run in parallel. Then, the Gradient Boosting Machine, which is internal to H2O-3. So, you can generate an AutoDoc for this. Then, for all the GLM flavors which are available in the H2O-3 package, an AutoDoc can be created as well. The Deep Learning models, for which there are APIs in H2O-3, can also be documented using the AutoDoc.

You can create documentation even for the Distributed Random Forest and Stacked Ensembles. When you have an ensemble of models, the document which you get will have the details of each and every model in the ensemble. So that’s pretty useful, as in you don’t have to go ahead and pick the parameters or the metrics, et cetera, of each and every model which is part of the stacked ensemble. If you have an ensemble of, for instance, 20-plus models, you’ll have to do the documentation for each and every model manually if you don’t use AutoDoc. But here, AutoDoc will do it out of the box; it will capture all the details of all the models in the ensemble. It’s pretty useful.
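As a rough illustration of what capturing every model in an ensemble involves, the sketch below walks a nested model record and emits one row per model. The `ModelInfo` class, the parameter names, and the metric values are all hypothetical stand-ins invented for this example; they are not H2O-3 or AutoDoc classes or real results.

```python
from dataclasses import dataclass, field


@dataclass
class ModelInfo:
    """Hypothetical record of one fitted model (not an H2O class)."""
    name: str
    params: dict
    metrics: dict
    base_models: list = field(default_factory=list)


def collect_model_rows(model):
    """Return one row per model: the ensemble first, then every base model."""
    rows = [{"model": model.name, **model.params, **model.metrics}]
    for base in model.base_models:
        rows.extend(collect_model_rows(base))
    return rows


# Illustrative values only.
ensemble = ModelInfo(
    name="stacked_ensemble",
    params={"metalearner": "glm"},
    metrics={"auc": 0.95},
    base_models=[
        ModelInfo("gbm_1", {"ntrees": 50, "max_depth": 5}, {"auc": 0.93}),
        ModelInfo("xgboost_1", {"ntrees": 80, "max_depth": 6}, {"auc": 0.94}),
    ],
)

rows = collect_model_rows(ensemble)
for row in rows:
    print(row)
```

With 20-plus base models the same loop produces 20-plus rows, which is the part that is tedious to do by hand.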

The AutoDoc package is integrated into H2O Steam, which is again an enterprise product, and it also comes as a Python package. As in, you can just install the package in your Python environment and start using it. There is no external dependency. As in, you don’t need H2O Steam as a necessity for using AutoDoc. You can use it out of the box just by installing the Python package which is available. If you go to the docs there, you will find all the details of how to use AutoDoc, et cetera. It’s just a link where we have all the documentation, and with subsequent releases the same documentation will be updated as we support other models and new packages, et cetera.

This is a snapshot of how the AutoDoc for an H2O-3 model would look. If you look here in the middle, it shows what sections are there in the document. It says that it’s an H2O-3 experiment. The first subsection is where you get the experiment overview. As in, what was the dataset that was used, how many columns were there, and some summary statistics on each and every column. Then, you get the data overview. Then, the validation strategy that was used during the modeling phase. It documents whether it was a cross-validation technique which you used, or it was an [inaudible], for example, that you were using, or you provided a different test set altogether, et cetera.

Then, it gives you a feature importance of the final model. If you look at the right bottom, it gives you feature importance, which is a Shapley plot. It gives you a scaled native importance, as well as the relative Shapley importance. These things are pretty handy and all the regulatory bodies would want these… actually, not want. They need these to be documented when the document is being prepared. All of this, you get out of the box.

Then, the next subsection is the final model. For the final model, it will document all the parameter details. There are some metrics of… For instance, say, if you use AUC as your scorer, then it will give you the AUC and a few other metric values for the particular model. Then, what were the alternative models which were built, you get some details of that. You can also get a PDP plot. If you look at the top left, it shows what the partial dependency plot looks like. It gives you a PDP as well as an out-of-range PDP.
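For readers unfamiliar with partial dependence, here is a minimal sketch of the general technique: force one feature to each value on a grid, keep every other column as it is, and average the model’s predictions. The `toy_predict` function and the data are made up for illustration, and this is not how AutoDoc itself computes its plots. An out-of-range PDP would presumably use grid values beyond those seen in training.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Copy every row, overriding only the feature of interest.
        modified = [{**row, feature: value} for row in rows]
        preds = [predict(r) for r in modified]
        curve.append((value, sum(preds) / len(preds)))
    return curve


def toy_predict(row):
    """Stand-in for a fitted model (hypothetical)."""
    return 2.0 * row["rooms"] + 0.5 * row["age"]


data = [
    {"rooms": 1, "age": 10},
    {"rooms": 3, "age": 20},
    {"rooms": 2, "age": 30},
]

pdp = partial_dependence(toy_predict, data, "rooms", grid=[1, 2, 3])
print(pdp)  # [(1, 12.0), (2, 14.0), (3, 16.0)]
```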

The top right has the prediction stats for each and every quantile in the order of your training dataset. Then, on the left bottom, you have the actual versus predicted graph. All these graphs, you’ll see in the AutoDoc along with a lot of text, which would be there to document multiple different things.

I’ll move to the next slide, we’ll have some text there. Okay. This shows how you generate an AutoDoc. The first example is for Scikit. For instance, if you want to generate an AutoDoc for a Scikit model, you just install the AutoDoc package and then you import it. Then, you just need to add the line there wherein you just say: render_autodoc H2O, config, best_model. You need to provide the train dataset, the features, and the label. Then, you get the entire documentation created for the particular Scikit model.

Similarly, for the AutoDoc, autodoc.h2o3, you need to provide the H2O context, then the config, and the best model. Then, you get the model documented. So you get the AutoDoc by the name AutoDoc_H2O3.docx. So it’s that simple to create documentation for the model which you have already built, so you don’t need to do anything. Not much, except for adding a few lines of code. So you do your EDA, you do your feature engineering, and then you do your modeling. And at the end, you just add these four or five lines of code and then you get the entire document generated for it.

Moving to the next. We have a product called Enterprise Steam, which is used mainly to start H2O-3 clusters. If any of our customers are using H2O Steam, they can also use the AutoDoc because it is well integrated into it, and Steam exposes the service to generate a model report for open-source H2O-3 models. The same service is used in Driverless AI to generate the auto documentation.

The idea of AutoDoc, as I was saying, came from Driverless AI and then we extended it to support H2O-3 models, and now Scikit-learn models, and going ahead with supporting a lot of other models. If you look at the screenshot on the bottom left, it shows you what the AutoDoc will have. The pictures are not very clear. We’ll go through them in the next few slides. But it has a lot of content. Usually, it has a textual data, it has some graphs, and it has some tables, then whatever is needed for documenting the model well.

Now, we move on to the Scikit-learn examples. This is the first third-party library which the AutoDoc is supporting. For now, it supports only supervised learning models. The Scikit-learn linear model which AutoDoc supports currently is logistic regression. The ensemble methods are the Random Forest Classifier, the Gradient Boosting Classifier, and the Gradient Boosting Regressor. We support these four models right now, but we’ll have support for a plethora of other models going ahead. In addition to Scikit-learn, we plan to support other models or other packages as well.

If you look at the AutoDoc examples for Scikit-learn, so here, if you look at the middle image, the subsections are more or less the same. There might be a few additions or a few that are missing based on what information the particular library provides. Whatever the library can provide, we document all of it. This Scikit-learn experiment has the experiment overview, data overview, feature importance, final model, alternative models, and the partial dependency plots. In one of the subsections, we will have a confusion matrix if it’s a classification model. If it’s not a classification model and it’s a regression problem, then you won’t see a confusion matrix. Instead, you will see the values of RMSE or discordant ratio for the particular model.

In the middle, you see the shift detection. If there is a shift in the data between the training and the test, then you should see the shift detection chart as well. If no shift is detected, it will just be skipped in the documentation. But if you specifically want the shift detection to be there, irrespective of whether there’s a shift or not, you can specify that using the configs. Then, again you have the partial dependency plot, the response rate, and the AUC on the left side.

This is for a classification problem. The graphs would look different for a regression problem. You won’t see the AUC curve, you won’t see the confusion matrix, but you would still see the partial dependency plot, shift detection if there is a shift in the training and test data, and the response rate. The AutoDoc would figure out based on the model that you build what all to show and you can also configure what all needs to be part of the final model. We’ll come to the configuration part during the latter half of the presentation.
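The selection logic described here can be sketched as a small function. Everything in it is an assumption made for illustration: the function name, the `force_shift_section` flag, and the section labels are hypothetical and do not correspond to actual AutoDoc configuration options.

```python
def select_sections(problem_type, shift_detected, force_shift_section=False):
    """Pick report sections based on the problem type and detected shift."""
    sections = [
        "experiment overview",
        "data overview",
        "feature importance",
        "final model",
        "alternative models",
        "partial dependence plots",
    ]
    if problem_type == "classification":
        # Classification-only content.
        sections += ["confusion matrix", "AUC curve"]
    elif problem_type == "regression":
        sections += ["regression metrics (e.g. RMSE)"]
    else:
        raise ValueError(f"unsupported problem type: {problem_type}")
    # Shift detection appears when a shift exists, or when explicitly forced.
    if shift_detected or force_shift_section:
        sections.append("shift detection")
    return sections


print(select_sections("classification", shift_detected=False))
print(select_sections("regression", shift_detected=False, force_shift_section=True))
```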

Next, we’ll go through a couple of examples of how the AutoDoc looks for Driverless AI models. This is where everything began. This is an enterprise product. Many of our customers came back to us and they wanted their models to be well-documented so that they can use it for internal use, as well as to submit to regulatory body. So we built AutoDoc and integrated it into Driverless AI.

Inside Driverless AI, the algorithms which are supported by AutoDoc are XGBoost, LightGBM, Tensorflow, and so on. Basically, whatever libraries we support in Driverless AI, all of them are well-documented and AutoDoc captures the details of all of them. For now, we have the XGBoost, LightGBM, Tensorflow, but that’s not an exhaustive list. So if inside of Driverless AI we add in new algorithms and they start getting shipped out with Driverless AI, the AutoDoc would be able to capture the details of those models as well.

Additional features. AutoDoc is included in Driverless AI. It has the explainability. There are some charts and a lot of techniques which we use inside of Driverless AI to explain the model in our MLI module. So all of that is also well-documented in the AutoDoc which you get for Driverless AI. Also, there you can customize the report, as in you can specify what to have in the document at the end of it or what not to have. If you want some additional features to be added, those can also be added. We’ll go through each of them one by one.

If you guys use Driverless AI in some capacity, then you would know that once an experiment completes or once your modeling completes, this is the screen which you get, the screenshot that you see on the screen. Then, there’s an option which says build auto report. If you click on it, it will build auto report. You can also configure Driverless AI to build auto report automatically for each and every experiment.

Once you have the AutoDoc, this is how it looks like. These are a few more subsections. It says, okay, this AutoDoc was built by the Driverless AI experiment. And then it gives you the name of the experiment. Then, it says experiment overview, data overview. These two are the same, which we saw for the H2O-3 open source and the Scikit-learn models. And then the methodology. It’s something which was not there in H2O-3 and Scikit-learn, because in Driverless AI, once you feed in the data for a supervised learning problem, we do a lot of feature engineering, we create new features, then we do the modeling on top of those features, create multiple different models, and then finalize on what model should be part of the final pipeline.

The feature engineering details would be captured… a high-level of the entire pipeline would be captured in the methodology subsection here, and then whether the data sampling was used or not during the entire experiment, which comprises of the feature engineering plus the modeling phase. If there was any sampling done, the details of those would be there in that subsection, and then what was the validation strategy that was used. So the validation strategy in subsection remains the same across Scikit-learn, H2O-3, and Driverless AI.

In model tuning, for instance, if you’re not using AutoML from H2O-3, then you won’t have a model tuning subsection for H2O-3. In Scikit-learn, by default, the model tuning subsection will be skipped because Scikit-learn just creates a model and it doesn’t really fine-tune the hyperparameters by itself. Driverless AI has another section, the feature evolution, wherein it captures the details of how many features were tried, what were the features which were selected to be used in the final model, et cetera. We will go through most of these in the next few slides as well.

Then, the feature transformation, which were used whether we did a one hot encoding or we did a cross-validated weight of evidence, or something like that, so what were the feature transformations which were applied to each and every column or other input features which will be a part of the final pipeline model. All those details would be in the subsection feature transformation, then you’ll have the final model, alternative models which were tried but were not a part of the final pipeline.

And then, if there is a deployment which you did from the Driverless AI screen, those details would also be there, whether a MOJO is created, which is a Java artifact, or whether you deployed it on AWS using the one-click feature which comes packaged with Driverless AI. All those details will be there. Then, the partial dependency plot, which remains consistent across H2O-3, Scikit, and Driverless AI.

Let’s go through some of the details of each of these subsections in the next few slides. Okay. This is how the final screen looks like. The one that we highlighted in the previous section is the middle yellow piece, the second column if you might say. This experiment, the name was Airbnb. The training dataset is called Train. The target column was Price. There was a test dataset which was provided to it. It created tons of features. It shows the variable importance in the middle, right? These were the features which were created from the original features which were inputted into it.

If you look at the first one here, it says 35_CVTE, then it gives you the column name. It’s 35 cross-validated target encoding on the column, which is specified next to it. You see it there, right? The name of the feature gets truncated, but when the same variable importance, when you see it in AutoDoc, it will have the full feature name and then it’s much easier to read. Otherwise, in this case, you just have to click on it or over it to see the entire name, the feature name.

In the middle section, the last section is where you click and then you get the auto report for this particular experiment. That’s what it shows here. Also, you get the AutoDoc if you click on the download experiment summary. That gives you a zip file and one of the files inside it is the AutoDoc. That’s another way in which you can download AutoDoc.

Now we’ll go through a sample AutoDoc which gets created for Driverless AI and the various sections which we have been discussing till now. The first subsection is the experiment overview. It says: Driverless AI built a stacked ensemble of two LightGBM models to predict positive review. Then, there were four original features in the dataset and this is a classification problem. And it took 23 minutes and 54 seconds to finish. In the final model, it’s using none of the original features, but it’s using 22 engineered features, and the experiment tried a total of 2,000-plus features. So from the four original features which were given to Driverless AI, it built 2,000-plus features and then the final pipeline selected 22 features. That gives you a high-level overview of the experiment that you ran.

The next subsection is the performance, wherein it gives the internal validation score, so it says 0.946 for AUC. Since no test dataset was provided for this experiment, the AUC on the test dataset is not available. The next section talks about accuracy, time, and interpretability: what value of these three knobs was selected. If I may go back, you see here six, four, and six. This is what we are talking about right now, the accuracy, time, and interpretability knob setting values.

For this particular experiment, it was 7, 3, and 7. Then it says what kind of machine was Driverless AI running on. It was running on Linux and in Docker environment. It gives some system details as in the system memory and the number of CPUs and the number of GPUs that was attached to the system. And the version of Driverless AI. So these things operate in the old version of Driverless AI, but the version is 152.

Next, in the data overview, you get some summary stats for each and every column that was there in the original input dataset. The left side, if you look, the name of the column was id, score, et cetera, et cetera. Then, it gives you the type of the data, whether it’s in strings, categorical, et cetera. Then, summary stats, median, max, standard deviation, unique, frequency of mode, et cetera.
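A per-column summary of this kind can be sketched with the Python standard library. The function name and the sample columns are hypothetical, and the statistics chosen simply mirror the ones named in the talk (median, max, standard deviation, unique count, frequency of mode); this is not AutoDoc’s own code.

```python
import statistics
from collections import Counter


def summarize_column(values):
    """Summary statistics for one column; numeric stats only when applicable."""
    counts = Counter(values)
    _, mode_freq = counts.most_common(1)[0]
    summary = {"unique": len(counts), "freq_of_mode": mode_freq}
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        summary.update(
            min=min(values),
            median=statistics.median(values),
            max=max(values),
            std=statistics.stdev(values),  # sample standard deviation
        )
    return summary


score_summary = summarize_column([3, 5, 5, 4, 5])
label_summary = summarize_column(["a", "b", "a"])
print(score_summary)
print(label_summary)
```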

Then you have one Boolean column, which is positive review, and it gives you… There’s so many stats for that. Then, comes a categorical column. The next subsection talks about whether there was a shift detected or not; it will tell you how Driverless AI performs shift detection. And if there is any, it will show you the graph for the shift, which would look something like this. What this says is, there’s a significant difference detected between training and test distribution for the column each, so it’s dropping this feature. If a shift is detected in the columns, then it will show you the shift detection graph, otherwise not.

Then comes the methodology. This is specific to Driverless AI. You will not see this for Scikit-learn models and H2O-3 models. It says the first step is to ingest the data, and when it ingested the data, it detected the column types. Then, we did some feature processing, turned raw features into numeric. Then, in model tuning and feature tuning, it says it found the optimal parameters for XGBoost and LightGBM models by training models with different parameters. For instance, here it says XGBoost and LightGBM, so the parameters were fine-tuned for those. But if your model uses only Tensorflow, it would say the fine-tuning happened only for the Tensorflow model.

So all this text is generated on the fly for the particular model that you’re trying to document. It’s not like a preset text, it’s dynamic and it captures the feature for the model which you generate and which you’re trying to document using AutoDoc. Then, it says the best parameter are those that generate the greatest AUC. Basically, we are trying to optimize AUC for this particular experiment.
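To make “the best parameters are those that generate the greatest AUC” concrete, here is a minimal sketch. AUC is computed as the fraction of positive/negative pairs that the model ranks correctly, and the candidate with the highest value wins. The candidate names, labels, and scores are invented for illustration and are not from the experiment in the talk.

```python
def auc(labels, scores):
    """AUC: probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Validation labels and the scores produced by two hypothetical settings.
labels = [0, 0, 1, 1, 0, 1]
candidate_scores = {
    "max_depth=4": [0.1, 0.4, 0.35, 0.8, 0.2, 0.7],
    "max_depth=6": [0.2, 0.3, 0.6, 0.9, 0.1, 0.5],
}

auc_by_candidate = {
    name: auc(labels, scores) for name, scores in candidate_scores.items()
}
best = max(auc_by_candidate, key=auc_by_candidate.get)
print(auc_by_candidate, "best:", best)
```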

The next bullet point says that there were 241 models trained and scored to evaluate features and model parameters. So there were 241 models which were trained and scored to evaluate which features should be a part of the final pipeline, as well as the final model which was created. Then there is the feature evolution. As we went through in the experiment summary, 2,096 features were used over 41 iterations, and then 720 models were trained and scored to further evaluate engineered features. So 720 models were trained during the feature evolution phase is what it’s trying to say.

Then, the final model was stacked ensemble of two LightGBM models. It did not have an XGBoost model, or Tensorflow model, or some linear models. This is the final model for this particular experiment. Then, for the scoring pipeline, it says the Python scoring pipeline was created. Mojo scoring, which is a java artifact was not created for this particular experiment. It gives you the name of the Python scoring pipeline, which is h2Oai_experiment_ the name of the experiment, then the scoring pipeline, and scorer.zip. You can download the scoring pipeline from the UI and use it for either real-time scoring or for batch scoring.

It gives you the time that Driverless AI took for each of these steps. For the data preparation, it took 12.86 seconds. For the model and feature tuning, it took 280 seconds and 241 models were built. During feature evolution, it took 1,021 seconds and 720 models were built. The feature evolution is usually the step where Driverless AI takes the most amount of time. That’s where feature engineering tries to figure out which are the best features. So that’s a pretty cool feature of Driverless AI, which takes time because there are a lot of models which get built there. For instance, in this example, there were 720 models built during the feature evolution phase, so it’s obvious that it would take the longest amount of time. And then to create the final pipeline, it took 96 seconds and there were six models which were built.
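As a quick sanity check on these timings, the stage durations can be added up. This assumes the feature evolution figure is 1,021 seconds (reading the transcript’s “1.021” as a thousands separator), which is an interpretation and not something the talk states explicitly. Under that assumption the stages sum to about 23.5 minutes, roughly in line with the 23 minutes and 54 seconds quoted in the experiment overview.

```python
# Stage durations in seconds, as quoted in the talk.
# The feature evolution value is an assumed reading of "1.021".
stage_seconds = {
    "data preparation": 12.86,
    "model and feature tuning": 280,
    "feature evolution": 1021,
    "final pipeline": 96,
}

total = sum(stage_seconds.values())
minutes, seconds = divmod(total, 60)
slowest = max(stage_seconds, key=stage_seconds.get)

print(f"total: {total:.2f} s (about {int(minutes)} min {seconds:.0f} s)")
print(f"slowest stage: {slowest}")
```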

This is the validation strategy which was used for this particular experiment. A five-fold cross-validation was used because a validation dataset was not provided. From the training dataset, it created five folds, and then, in turn, each of them was used as the validation set and the rest was used as the training dataset. What validation strategy was used? That will be well-documented in this. If there was a test dataset or a validation dataset that was provided to Driverless AI, it will not do a five-fold cross-validation, but it would just use the validation dataset for validation. The strategy that you see here would then look different because it would be using the validation dataset, not the cross-validation technique.
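The five-fold scheme described above can be sketched in plain Python. This is the generic technique only: it splits row indices in order, without the shuffling or stratification a real tool would typically apply, and it says nothing about how Driverless AI builds its folds internally.

```python
def k_fold_splits(n_rows, k=5):
    """Yield (train_idx, valid_idx); each row is in validation exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


splits = list(k_fold_splits(10, k=5))
for fold, (train_idx, valid_idx) in enumerate(splits):
    print(f"fold {fold}: train={train_idx} valid={valid_idx}")
```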

This documents the model tuning. The first table shows the job order. Job order is something which is internal to Driverless AI, but each row is a model which was tried during the tuning phase of Driverless AI. Job order 14, or basically job 14, tried tuning a LightGBM model and the number of features was 63. Since AUC is what we were trying to optimize, the scores will all be AUC. So the score of the model was 0.94, and then you have the training time. Similarly, it will give you the details of all the models which were used during model tuning.

It gives more details of each and every model, as in what was the tree method that was used. So it says, okay, GPU histogram was used. And the grow policy was lossguide. The max depth. The maximum number of leaves. And the other details of the models that were used during the model tuning phase.

Then comes the LightGBM tuning. For LightGBM, they used the GPU hist tree method. The grow policy for the first one was depthwise, the next one lossguide, and the last one again was depthwise. And you get the other parameters as well that were tried to tune the model.

So after tuning the model, you finalize which is the final model and what are the final hyperparameters which you want to use. So let’s go there. These are a few features and the feature engineering details from the experiment. But this is not an exhaustive list, it just shows the details of a few features which were a part of the final pipeline. If you look at the table on the left side, the first one says 6_TxtTE.

For instance, if you’re looking at the feature name, you might not be able to figure out what exactly happened to create this feature. So we have a description here. It says: “Predicted probability of class one based on the linear model on Tfidf feature from text column.” Description: internal parameters 2,0.01. So you get a better understanding of what the feature or created feature means.

The next column says which transformer was used to create it. So it uses a bag of words regressor to create this feature. Then, the final one is the relative importance of this particular feature for the final model which was built. The next feature is a cross-validated target encoding. It will give you a description of it and then the relative importance. So you get this kind of table for each and every feature. This table would have the details of each and every feature that is part of the final pipeline.
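
To make the cross-validated target encoding feature more concrete, here is a minimal sketch of the general technique: each row's category is replaced by the mean target computed only from the other folds, so a row never sees its own label. The function name `cv_target_encode`, the round-robin fold assignment, and the fallback to the global mean are assumptions for illustration; the talk does not describe how the Driverless AI transformer is implemented internally.

```python
from collections import defaultdict


def cv_target_encode(categories, targets, k=5):
    """Out-of-fold mean-target encoding for one categorical column."""
    n = len(categories)
    global_mean = sum(targets) / n
    encoded = [0.0] * n
    for fold in range(k):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Collect target statistics from every row outside the held-out fold.
        for i in range(n):
            if i % k != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        # Encode the held-out rows using only those outside statistics.
        for i in range(n):
            if i % k == fold:
                cat = categories[i]
                # Categories unseen outside this fold fall back to the global mean.
                encoded[i] = sums[cat] / counts[cat] if counts[cat] else global_mean
    return encoded


encoded = cv_target_encode(["a", "a", "b", "b", "a", "b"], [1, 0, 1, 1, 1, 0], k=3)
print(encoded)
```

Production implementations typically add smoothing toward the global mean for rare categories, which this sketch omits.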

If you look at the right side, it describes the date transformer, which Driverless AI uses. It retrieves any date or time values, including year, quarter, et cetera. It’s showing you the date transformer because one of the features created for this particular experiment came from transforming a date, and that’s why it’s there.

And then the next one is cross-validated target encoding. If you see the second feature on the left side, it has a CVTE description. That’s why there is some text around this particular transformer, which was used to create a particular feature for this particular experiment. Basically, what I am trying to emphasize here is that it will show you the details of only those features and transformers which were used in this particular experiment, and not all the transformers which Driverless AI has and can create features with.

Okay. I’ll move on to the next slide. This documents the final model. There were two LightGBM models. It says the final stacked ensemble pipeline with ensemble level two, transforming three original features into 22 features in each of six models, each fit on three internal holdout splits, then linearly blended. So the final model is a linear blend of the two models which were built. The model weight is 0.66 for model index zero and 0.33 for model index one. It gives you the validation scheme for each of these models and the number of features that were used in each of these models.
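
A linear blend of this kind can be sketched as a weighted sum of each model's predictions. The weights 0.66 and 0.33 are the ones quoted in the talk (presumably rounded, since they sum to 0.99); the function name `linear_blend` and the prediction values are made up for illustration.

```python
def linear_blend(predictions_per_model, weights):
    """Combine per-model predictions row by row as a weighted sum."""
    if len(predictions_per_model) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_rows = len(predictions_per_model[0])
    return [
        sum(w * preds[row] for w, preds in zip(weights, predictions_per_model))
        for row in range(n_rows)
    ]


# Hypothetical predicted probabilities from the two models in the blend.
model_0 = [0.90, 0.20, 0.60]
model_1 = [0.80, 0.10, 0.30]

blended = linear_blend([model_0, model_1], weights=[0.66, 0.33])
print(blended)
```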

The model index zero model is using 22 features, but the other model is using 21 features on which the model was trained. It has a few other details of model index zero and one: the tree method that was used, GPU hist, and then the maximum leaves, maximum depth, subsampling rate, grow policy, and the colsample by tree. Similarly, it gives details of the other model. Then, you see the ROC curve for the model and then the PR curve for the model. These ROC and PR curves are for the final model, which is a blend of model index zero and model index one, and not for the individual models.

This is another example wherein the idea is just to show that the AutoDoc will look different based on the experiment that you have run. I’ll skip the slides which would look similar or are not different from the previous example, but I’ll come to the ones which might be interesting. Here, if you look at the numerical columns, so it has ID, host ID, latitude, longitude, et cetera, for which you have the summary stats.

These are the input columns to this particular experiment. As you can see, the data overview changes and it has only the details of the dataset that a particular experiment uses. Here, in the model tuning and feature evolution, we are trying to optimize MAPE and not AUC, so it’s showing those details. And then, the final model here is a stacked ensemble of one LightGBM model and one XGBoost GBM model. Here in the feature evolution and the final model, we’ll see the details of one LightGBM model and one XGBoost GBM model, but in the previous example there were just two LightGBM models.
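
For reference, MAPE (mean absolute percentage error) is the average of |actual - predicted| / |actual| across rows. The sketch below reports it as a percentage; whether Driverless AI displays a percentage or a fraction is not stated in the talk, and the numbers are made up.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical prices and predictions, for illustration only.
score = mape(actual=[100.0, 200.0, 400.0], predicted=[110.0, 180.0, 400.0])
print(round(score, 2))
```

Unlike AUC, lower is better for MAPE, so the scores in the tuning tables are read in the opposite direction from the previous example.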

Here, again, it documented what all was tried during the model tuning. This is the list of features here. If you look on the right side, there are no details of the date transformer here because the date transformer was not used for this particular experiment. So we have only details on the cluster distance transformer, cluster ID transformer, et cetera, et cetera.

Then, for the final model here, the two models in the final pipeline are a LightGBM model and an XGBoost GBM model. Yeah. So that’s the difference for this particular experiment compared to the previous one. And then, here, for the validation, this is actual versus predicted for the validation and test datasets, because for this particular experiment we had a validation as well as a test dataset.

So we come to the last section. The template which we went through for the AutoDoc of Driverless AI, that’s the standard template. But you can customize it. There are multiple different things which you can customize. For instance, in the model diagnostics, you can add in graphs or tables which depict the population stability index, the prediction statistics per quantile, actual versus predicted, the GINI plot, et cetera. So you can choose whether to show them in the document or not.
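
As background on the population stability index mentioned above, a common definition sums (actual share - expected share) * ln(actual share / expected share) over the bins of a score or feature. The sketch below follows that definition; the function name, the bin counts, and the small `eps` floor for empty bins are assumptions, and the talk does not say how AutoDoc bins the data.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between two binned distributions with the same bin edges."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor the shares so empty bins do not cause log(0) or division by zero.
        expected_share = max(expected / expected_total, eps)
        actual_share = max(actual / actual_total, eps)
        psi += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return psi


# Hypothetical bin counts: training data versus a newer dataset.
same = population_stability_index([50, 30, 20], [50, 30, 20])
shifted = population_stability_index([50, 30, 20], [20, 30, 50])
print(same, shifted)
```

Identical distributions give a PSI of zero, and the value grows as the two distributions drift apart.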

For diagnostics on new datasets, you can configure whether you want to perform the model diagnostics on a list of new datasets and show them in the AutoDoc or not. For the model interpretability, you can configure the AutoDoc to show the variable importance or not to show it, and similarly for the partial dependency plots and the ICE plot.

These are the standard techniques which we use in Driverless AI for interpreting the model or understanding the model way better. So you can choose whether those should be a part of the AutoDoc or not. And how do you do that? For instance, say you add in a config override which says autodoc_population_stability_index=true; then the AutoDoc will have the population stability index table, which would look something like the one on the right side. Similarly, for the prediction statistics per quantile, if you add in autodoc_predictions_statistical=true in config overrides, it will give you this. Similarly for the variable importance. And the same goes for partial dependency plots.
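
The overrides described above are plain `key=value` settings. As a small sketch, the helper below renders a set of on/off flags into that form. The helper name `render_config_overrides` is hypothetical, and the two keys are copied from the transcript as spoken, so they may not match the exact option names in the product; check the Driverless AI configuration reference before using them.

```python
def render_config_overrides(flags):
    """Render boolean flags as one lowercase key=value line per flag."""
    lines = []
    for key, enabled in flags.items():
        lines.append(f"{key}={'true' if enabled else 'false'}")
    return "\n".join(lines)


# Key names as transcribed from the talk; treat them as unverified.
overrides = render_config_overrides({
    "autodoc_population_stability_index": True,
    "autodoc_predictions_statistical": True,
})
print(overrides)
```

Setting a flag to `False` renders `key=false`, which per the talk leaves that table or plot out of the generated document.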

This is just the partial dependency plot. Then, this is the partial dependency plot with ICE. So you can choose which one you want in the AutoDoc or which you don’t want. So the AutoDoc is highly templated and you can choose what subsections to show in there, and inside each subsection, what all plots and what all tables should be there, or the ones that you don’t want. So all of it is configurable using the config. Every time, we are just overriding a config value. Like in this case, if you want the response rate plot, you just set the AutoDoc response rate option equal to true. If you provide it as false, this plot will not be a part of the final document or the final AutoDoc which gets generated. Similarly for the GINI plots.

Yes. I think I missed the [inaudible]. But, yeah, that was more or less what I had to present. Now maybe we can take some questions, Saurabh.

Saurabh Kumar:

Absolutely. Thank you so much, Nikhil.

Nikhil Shekhar:

Yeah. Okay. I’ll stop sharing my screen. Yeah. Can you repeat the question again, sir?

Saurabh Kumar:

Yeah. How is AutoDoc different for the open source H2O-3 and the enterprise platform, Driverless AI?

Nikhil Shekhar:

Yes. For Driverless AI, we have a lot more details captured in the AutoDoc. As in for H2O-3, you won’t have the feature engineering details like what features were tried or how many models were built to create those features, et cetera. If you’re just using plain [inaudible] H2O-3 without AutoML, in the final model pipeline you will just have one final model. In the case of Driverless AI, you usually end up with a stacked ensemble or a linear blend of multiple different models.

So all those additional details are captured in the AutoDoc for Driverless AI, which won’t be there for an H2O-3 model or a Scikit-learn model. When you build an H2O-3 model, the model parameters, et cetera, will all be there. But the subsections, as I was saying, the feature engineering, the model tuning and the … ML? The explainability, the partial dependency plot. All of them will not be there in an AutoDoc which is generated for an open source H2O-3 or a Scikit-learn model, for that matter.

Does that answer the question? I hope so.

Saurabh Kumar:

Yes, absolutely.

Nikhil Shekhar:

Thank you.

Saurabh Kumar:

The next one asks, “How does the auto-generated document fulfill compliance requirements for all countries?”

Nikhil Shekhar:

Yeah. The format which we ran through today is something which the regulatory authorities in the US have proposed. And the AutoDoc is very templated. So now there are regulatory authorities in, for instance, Southeast Asia, Singapore, that are coming up. So we have a slightly different template for those. And then the AutoDoc, instead of using the template which we saw today, would be populating the details in the template which is accepted by the regulatory authority in Singapore.

Conceptually, AutoDoc is template-based, and you can choose what all details you want in the AutoDoc and what all should not be there. As we went through in the last section, toward the end of our presentation, you can choose what should be a part of the AutoDoc and what should not be a part of it. So it’s all very templated, and you can configure the AutoDoc to be in a particular format which is accepted by the regulatory authority in a particular country.

But the template which we went through is widely accepted in India and the US. We have a slightly different template for Singapore. I don’t think India right now has a template as such, but we can use one of the templates that’s there. But if there is a new template in which you want to document it, that can also be plugged in very easily.

Saurabh Kumar:

Thank you. The next question is, “How much is the scope for document classification using AI as a research topic?”

Nikhil Shekhar:

I don’t know the answer to that. In the next version of Driverless AI which should be out in the first week of July, we have a lot of images and document classification which would come pre-built with Driverless AI. So you just install Driverless AI and you can do a lot of image-related use cases, a lot of NLP document-related use cases, et cetera.

So you can definitely have a look there. We have seen a lot of customers solving a lot of problems using images of documents. For instance, say there’s an audit which you want to conduct, on salaries or on the balance sheet of a company, to check whether everything is in place. You can use different ML techniques, for instance different variants of OCR, to read the images and have your data stored in a particular format, and then you can do a lot of analysis on top of it.

So there are a lot of use cases which people are trying to solve using NLP on documents, which are in the form of images or maybe text, et cetera. So there’s a lot of active research that’s going on. Before COVID, I would say the research in NLP was at an all-time high. At the beginning of 2019, for instance, a lot of research papers and new techniques came out for images. But from mid-2019 until before COVID-19, there were a lot of breakthroughs happening in the NLP domain. So that’s what I’m saying.

Saurabh Kumar:

Thank you, Nikhil. The next few questions are on accessing Driverless AI, pricing, free trials. So for that, please go to our website. There’s a free trial available; you can take it for a spin on-prem or on our cloud environment. It’s fairly straightforward. If there are any hiccups, we’re always available. There’s another question…

Nikhil Shekhar:

Just to add to that, if there are any students here, we have a university version of Driverless AI, which you can down… If you have a university username or ID, you can download and use Driverless AI for six months, and then you can keep renewing it as long as you are a student. So that’s kind of free for students to play around on. We also have university partnerships, so feel free to reach out to Saurabh for any of it. Thank you.

Saurabh Kumar:

Awesome. Thank you, Nikhil. We are at the top of the hour, if you have any concluding comments, or else we’ll just wrap it up.

Nikhil Shekhar:

No. I would just like to say thanks a lot guys for joining in. This perhaps was the first webinar for AutoDoc which we released a couple of weeks back. So thanks a lot for joining in today. Thank you, Saurabh, for [crosstalk].

Saurabh Kumar:

Thank you, Nikhil. Thank you to all our attendees for taking the time. Like we mentioned before, a copy of the presentation and the recording will be distributed in some time. Have a good day.

Nikhil Shekhar:

Thank you. Thanks a lot, guys. Bye-bye.
