Create model documentation for supervised learning models in H2O-3 and Scikit-Learn in minutes.
The Federal Reserve’s 2011 guidelines state that without adequate documentation, model risk assessment and management would be ineffective. Many regulatory and corporate governance bodies impose similar requirements today, so model documentation is more a necessity than a choice. It is also one of the most time-consuming jobs for a data scientist: compared with building and validating machine learning models, describing in detail how a model works is tedious and takes considerable time and effort. Consistency, clarity, and collaboration are further challenges.
What if there were a way to automate the entire documentation process? This is precisely the problem that H2O AutoDoc tries to address by creating comprehensive, high-quality model documentation in minutes. H2O AutoDoc frees the user from the time-consuming task of documenting and summarizing their workflow while building machine learning models. It also increases the consistency of model documentation by applying a standard template across all models, which is essential for model governance, reproducibility, and regulatory compliance. In a way, it is using AI to explain AI.
Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, information on model performance, etc. Today, documenting the models is both necessary as a best practice and a vital requirement from the business point of view.
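As a rough illustration of what "one standard template for every model" means, the sketch below fills a fixed report template from a dictionary of model facts. It is a toy example using only the Python standard library, not how H2O AutoDoc is implemented; the field names, `REPORT_TEMPLATE`, and `render_report` are hypothetical, and the numbers are made up for the example.

```python
from string import Template

# Hypothetical report template. The sections mirror the items listed above
# and are NOT H2O AutoDoc's actual template.
REPORT_TEMPLATE = Template(
    "Model: $name\n"
    "How it was created: $creation\n"
    "Training data: $train_rows rows, $n_features features\n"
    "Test data: $test_rows rows\n"
    "Alternatives considered: $alternatives\n"
    "Evaluation: $metric = $score\n"
)


def render_report(info: dict) -> str:
    """Fill the shared template; raises KeyError if a required field is missing."""
    fields = dict(info)
    fields["alternatives"] = ", ".join(info["alternatives"]) or "none"
    fields["score"] = f"{info['score']:.3f}"
    return REPORT_TEMPLATE.substitute(fields)


# Made-up values, purely for illustration.
example = {
    "name": "credit_gbm",
    "creation": "gradient boosting, 5-fold cross-validation",
    "train_rows": 24000,
    "n_features": 23,
    "test_rows": 6000,
    "alternatives": ["GLM", "random forest"],
    "metric": "AUC",
    "score": 0.8712,
}

report = render_report(example)
print(report)
```

Because every model goes through the same template, a missing field fails loudly instead of silently producing an incomplete report, which is the kind of consistency a hand-written document cannot guarantee.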
But creating good documentation isn’t a piece of cake, and many teams struggle with it. The process is tedious, and it is costly for the business because data scientists could be using that time to build additional models and create more value. Additionally, inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.
A better idea: Automate the documentation process itself with H2O AutoDoc.
Automated Model Documentation (H2O AutoDoc) is a new time-saving ML documentation product from H2O.ai. H2O AutoDoc can automatically generate model documentation for supervised learning models created in H2O-3 and Scikit-Learn. Automated documentation is already used in production in H2O Driverless AI, and this capability is now available as a new standalone commercial module.
H2O AutoDoc is a Python package for creating automatic reports for supervised learning models.
H2O AutoDoc provides various advantages over the traditional method of manual documentation:
We know that H2O AutoDoc can automatically generate model documentation for supervised learning models created in H2O-3 and Scikit-Learn. Let’s see some of the ways by which we can generate the automatic report:
H2O-3 is a fully open-source, distributed in-memory machine learning platform with linear scalability. The speed, quality, ease of use, and model deployment options of its algorithms make H2O a highly sought-after API for big-data data science. H2O also has an industry-leading AutoML functionality that can be used for automating the machine learning workflow.
The documentation can be generated as an editable Word document or in Markdown format as follows:

    import h2o
    from h2o_autodoc import Config
    from h2o_autodoc import render_autodoc

    # get the H2O-3 model object required to create an H2O AutoDoc
    model = h2o.get_model("my_gbm_model")

    # configure and render an AutoDoc
    config = Config(output_path="full/path/AutoDoc_H2O3.docx")
    render_autodoc(h2o, config, model)
Scikit-learn is an open-source machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms. Creating automatic documentation for scikit-learn models is very similar to the process for H2O-3 models:

    from sklearn.linear_model import LogisticRegression
    from h2o_autodoc import Config
    from h2o_autodoc.scikit.autodoc import render_autodoc

    # build a logistic regression model
    # (X_train and y_train are assumed to be defined already)
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # configure and render an AutoDoc
    config = Config(output_path="full/path/AutoDoc_ScikitLearn.docx")
    render_autodoc(config, model, X_train, y_train)
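The scikit-learn snippet above assumes that `X_train` and `y_train` already exist. A minimal end-to-end sketch, using synthetic data in place of a real dataset, might look like the following. The dataset, the split sizes, and the relative output path are assumptions made for illustration; the `h2o_autodoc` calls are the same ones shown above and are skipped when the commercial package is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data stands in for a real training set.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Build the model that the report will describe.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# H2O AutoDoc is a separate commercial package, so import it defensively.
try:
    from h2o_autodoc import Config
    from h2o_autodoc.scikit.autodoc import render_autodoc
except ImportError:
    render_autodoc = None  # package not installed; skip report generation

if render_autodoc is not None:
    # Illustrative relative path; use a full path as in the snippet above.
    config = Config(output_path="AutoDoc_ScikitLearn.docx")
    render_autodoc(config, model, X_train, y_train)
```

Keeping the held-out split separate from the data passed to `render_autodoc` leaves `X_test` and `y_test` available for any additional evaluation you want to report alongside the generated document.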
The H2O AutoDoc is currently available in Steam, another H2O.ai product that allows you to launch or connect to an H2O Cluster securely. This version of the H2O AutoDoc leverages the Steam Python API. The code follows the same structure as the H2O AutoDoc Python API, and the generated report is almost identical. One difference is that the Steam version only supports H2O-3 models.
    import h2o
    import h2osteam
    from h2osteam import AutoDocConfig
    from h2osteam.clients import H2oClient

    # login to steam
    h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="token-here", verify_ssl=True)
    cluster = H2oClient.get_cluster("test-cluster")

    # get H2O-3 objects using their keys
    model = h2o.get_model("gbm_model")
    train = h2o.get_frame("CreditCard_TRAIN")

    # use default configuration settings
    config = AutoDocConfig()

    # specify the path to the output file
    output_file_path = "autodoc_report.docx"

    # download an H2O AutoDoc
    cluster.download_autodoc(model, config, train, output_file_path)
H2O AutoDoc generates an editable Word document based on an automated template that includes several features. Some of the important ones have been tabulated below:
* For supported algorithms
Try H2O AutoDoc
Do you want to get your hands dirty and experience the power that H2O AutoDoc brings to your machine learning project? We have made it easy for you. You can:
Conclusion
H2O AutoDoc automatically generates comprehensive model documentation in minutes using out-of-the-box or custom templates. H2O AutoDoc saves data science teams weeks of tedious work and increases data science productivity by allowing them to focus on model building. H2O AutoDoc increases the consistency of model documentation by applying a standard template across all models and teams, which is essential for model governance, reproducibility, and compliance with regulations.