November 19th, 2020
Automate your Model Documentation using H2O AutoDoc
Category: Data Science, H2O Driverless AI
By: Parul Pandey
Create model documentation for supervised learning models in H2O-3 and Scikit-Learn in minutes.
The Federal Reserve’s 2011 guidelines state that without adequate documentation, model risk assessment and management would be ineffective. A similar requirement is put forward today by many regulatory and corporate governance bodies. Thus model documentation today is more of a necessity than a choice. However, there is still no denying the fact that it is one of the most time-consuming jobs for a data scientist. As opposed to building and validating machine learning models, describing how a model works in detail is tedious and takes a considerable amount of time and effort. There are also issues of consistency, clarity, and collaboration.
What if there were a way to automate the entire documentation process? This is precisely the issue that H2O AutoDoc tries to address by creating comprehensive, high-quality model documentation in minutes. H2O AutoDoc frees the user from the time-consuming task of documenting and summarizing their workflow while building machine learning models. It also increases the consistency of model documentation by applying a standard template across all models, which is essential for model governance, reproducibility, and regulatory compliance. In a way, it is using AI to explain AI.
Challenges in creating robust documentation
Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, information on model performance, etc. Today, documenting the models is both necessary as a best practice and a vital requirement from the business point of view.
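One way to make that checklist concrete is to treat it as a record with one field per item. The sketch below is purely illustrative: the class name `ModelDocRecord`, its field names, and the `missing_sections` helper are assumptions made for this example and are not part of any H2O.ai product. The field values are placeholder text.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelDocRecord:
    """Hypothetical container for the documentation items listed above."""
    how_created: str = ""
    training_data: str = ""
    test_data: str = ""
    alternatives_considered: str = ""
    evaluation_method: str = ""
    performance: str = ""

    def missing_sections(self):
        """Return the names of fields that are still empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Placeholder descriptions, for illustration only.
record = ModelDocRecord(
    how_created="placeholder: algorithm and training procedure",
    training_data="placeholder: training data characteristics",
    performance="placeholder: performance summary",
)
print(record.missing_sections())
```

A record like this makes gaps visible early: any item the author has not yet written up shows up by name.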
But creating good documentation isn’t a piece of cake, and at times, many teams struggle with it. The process is often tedious and time-consuming for the business because the data scientist could be using that time to build additional models and create more value. Additionally, inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.
A better idea: Automate the documentation process itself with H2O AutoDoc.
H2O AutoDoc
Automated Model Documentation (H2O AutoDoc) is a new time-saving ML documentation product from H2O.ai. H2O AutoDoc can automatically generate model documentation for supervised learning models created in H2O-3 and Scikit-Learn. Automated documentation is already used in production in H2O Driverless AI. This industry-leading capability is now available as a new standalone commercial module.
Key Features of H2O AutoDoc
H2O AutoDoc is a Python package for creating automatic reports for supervised learning models.
- Distributed automatic document generation in Microsoft Word (.docx) and Markdown (.md) formats.
- Out-of-the-box documentation template included
- Customizable templates to fit unique business needs, internal best practices, and compliance requirements
- Support for a variety of supervised models generated in H2O-3 and Scikit-Learn
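To make the idea of a reusable template concrete, here is a minimal sketch using only Python's standard library. It does not use or reproduce H2O AutoDoc's template engine. The template text, the `render_report` function, and the field names are all hypothetical, chosen to show how one template can yield identically structured Markdown reports for different models.

```python
from string import Template

# Hypothetical Markdown template; every report built from it has the same sections.
REPORT_TEMPLATE = Template(
    "# Model report: $model_name\n"
    "\n"
    "## Algorithm\n"
    "$algorithm\n"
    "\n"
    "## Validation metric\n"
    "$metric_name: $metric_value\n"
)


def render_report(model_name, algorithm, metric_name, metric_value):
    """Fill the shared template with one model's details and return Markdown text."""
    return REPORT_TEMPLATE.substitute(
        model_name=model_name,
        algorithm=algorithm,
        metric_name=metric_name,
        metric_value=f"{metric_value:.3f}",
    )


# Placeholder values for illustration only, not real results.
report = render_report("my_gbm_model", "Gradient Boosting Machine", "AUC", 0.5)
print(report)
```

Because the section headings live in the template rather than in each author's head, two reports produced this way differ only in their values, which is the consistency property the feature list describes.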
Advantages of using H2O AutoDoc
H2O AutoDoc provides various advantages over the traditional method of manual documentation:
- H2O AutoDoc ensures compliance and provides a consistent, accurate, and thorough approach to model documentation.
- It can be shared with production teams and other data scientists, thereby improving collaboration amongst teams.
- It saves time and money by creating model documents automatically, rather than having valuable staff write and edit them by hand.
H2O AutoDoc in Action
We know that H2O AutoDoc can automatically generate model documentation for supervised learning models created in H2O-3 and Scikit-Learn. Let’s see some of the ways by which we can generate the automatic report:
1. H2O AutoDoc for models created in H2O-3
H2O-3 is a fully open-source, distributed in-memory machine learning platform with linear scalability. The speed, quality, ease of use, and model deployment for the various cutting-edge algorithms make H2O a highly sought-after API for big data science. H2O also has an industry-leading AutoML functionality that can be used for automating the machine learning workflow.
The documentation can be generated in an editable Word (.docx) or Markdown (.md) format as follows:
    import h2o
    from h2o_autodoc import Config
    from h2o_autodoc import render_autodoc

    # get the H2O-3 model object required to create an H2O AutoDoc
    # (assumes an active connection to the H2O-3 cluster that holds the model)
    model = h2o.get_model("my_gbm_model")

    # configure and render an AutoDoc
    config = Config(output_path="full/path/AutoDoc_H2O3.docx")
    render_autodoc(h2o, config, model)
2. H2O AutoDoc for models created in Scikit-learn
Scikit-learn is an open-source machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms. Creating automatic documentation for scikit-learn models is similar to the process for H2O-3 models:
    from sklearn.linear_model import LogisticRegression
    from h2o_autodoc import Config
    from h2o_autodoc.scikit.autodoc import render_autodoc

    # build a logistic regression model
    # (X_train and y_train are assumed to be your prepared training features and labels)
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # configure and render an AutoDoc
    config = Config(output_path="full/path/AutoDoc_ScikitLearn.docx")
    render_autodoc(config, model, X_train, y_train)
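A report of this kind needs facts about the fitted model, such as its class and hyperparameters. As a rough illustration of where such facts can come from (and not a description of how H2O AutoDoc gathers them), scikit-learn estimators expose their settings through `get_params()`. The helper `summarize_estimator` and the stand-in class `TinyEstimator` are hypothetical; the stand-in exists only so the sketch runs without scikit-learn installed.

```python
def summarize_estimator(model):
    """Collect the class name and hyperparameters of a scikit-learn-style estimator."""
    params = model.get_params()
    return {
        "estimator": type(model).__name__,
        "hyperparameters": dict(sorted(params.items())),
    }


class TinyEstimator:
    """Hypothetical stand-in that mimics the get_params() method of scikit-learn estimators."""

    def __init__(self, C=1.0, max_iter=100):
        self.C = C
        self.max_iter = max_iter

    def get_params(self, deep=True):
        return {"C": self.C, "max_iter": self.max_iter}


summary = summarize_estimator(TinyEstimator(C=0.5))
print(summary)
```

With scikit-learn installed, passing a `LogisticRegression()` instance in place of the stand-in should work the same way, since it provides the same `get_params()` method.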
3. Steam: H2O AutoDoc
The H2O AutoDoc is currently available in Steam, another H2O.ai product that allows you to launch or connect to an H2O Cluster securely. This version of the H2O AutoDoc leverages the Steam Python API. The code follows the same structure as the H2O AutoDoc Python API, and the generated report is almost identical. One difference is that the Steam version only supports H2O-3 models.
    import h2o
    import h2osteam
    from h2osteam import AutoDocConfig
    from h2osteam.clients import H2oClient

    # login to steam
    h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="token-here", verify_ssl=True)
    cluster = H2oClient.get_cluster("test-cluster")

    # get H2O-3 objects using their keys
    # (assumes the h2o client is already connected to this cluster)
    model = h2o.get_model("gbm_model")
    train = h2o.get_frame("CreditCard_TRAIN")

    # use default configuration settings
    config = AutoDocConfig()

    # specify the path to the output file
    output_file_path = "autodoc_report.docx"

    # download an H2O AutoDoc
    cluster.download_autodoc(model, config, train, output_file_path)
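The login call above takes a URL, a username, and a token. In shared notebooks or scripts it is often preferable to keep such values out of the source file. The following is a minimal sketch of that idea using only the standard library; the helper `steam_login_kwargs` and the environment variable names `STEAM_URL`, `STEAM_USER`, and `STEAM_TOKEN` are assumptions made for this example, not names defined by Steam.

```python
import os


def steam_login_kwargs(env=None):
    """Build keyword arguments for h2osteam.login() from environment variables.

    The variable names STEAM_URL, STEAM_USER and STEAM_TOKEN are assumptions
    made for this sketch; use whatever names your deployment defines.
    """
    env = os.environ if env is None else env
    required = ("STEAM_URL", "STEAM_USER", "STEAM_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "url": env["STEAM_URL"],
        "username": env["STEAM_USER"],
        "password": env["STEAM_TOKEN"],
        "verify_ssl": True,
    }


# Demonstration with an explicit mapping instead of the real environment.
kwargs = steam_login_kwargs({
    "STEAM_URL": "https://steam.h2o.ai:9555",
    "STEAM_USER": "user01",
    "STEAM_TOKEN": "token-here",
})
print(sorted(kwargs))
```

The resulting dictionary uses the same keyword names as the login call in the snippet above, so it could be passed along as `h2osteam.login(**kwargs)`.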
Documentation Features
H2O AutoDoc generates an editable Word document based on an automated template that includes several features. Some of the important ones have been tabulated below:
* For supported algorithms
Try H2O AutoDoc
Do you want to get your hands dirty and experience the power that H2O AutoDoc brings to your machine learning project? We have made it easy for you. You can:
- Register for the trial license here and then try H2O AutoDoc in your environment.
- Our team will reach out, provide a 30-day trial license, and help you get up and running.
- Experiment and use it with your H2O-3 and scikit-learn models.
Conclusion
H2O AutoDoc automatically generates comprehensive model documentation in minutes using out-of-the-box or custom templates. H2O AutoDoc saves data science teams weeks of tedious work and increases data science productivity by allowing them to focus on model building. H2O AutoDoc increases the consistency of model documentation by applying a standard template across all models and teams, which is essential for model governance, reproducibility, and compliance with regulations.