

Automate your Model Documentation using H2O AutoDoc


By Parul Pandey | November 19, 2020


Create model documentation for supervised learning models in H2O-3 and scikit-learn, in minutes.

The Federal Reserve’s 2011 guidelines state that without adequate documentation, model risk assessment and management would be ineffective. Many regulatory and corporate governance bodies impose similar requirements today, so model documentation is a necessity rather than a choice. Yet it remains one of the most time-consuming jobs for a data scientist. Compared with building and validating machine learning models, describing in detail how a model works is tedious and takes considerable time and effort, and it raises its own issues of consistency, clarity, and collaboration.

What if there were a way to automate the entire documentation process? That is precisely the problem H2O AutoDoc addresses: it creates comprehensive, high-quality model documentation in minutes. H2O AutoDoc frees users from the time-consuming task of documenting and summarizing their workflow while they build machine learning models. It also makes model documentation more consistent by applying a standard template across all models, which is essential for model governance, reproducibility, and regulatory compliance. In a way, it is using AI to explain AI.

Challenges in creating robust documentation

Challenges associated with manually documenting models

But creating good documentation isn’t a piece of cake, and many teams struggle with it. The process is tedious and time-consuming, and it is costly for the business because the data scientist could be using that time to build additional models and create more value. Inconsistent or inaccurate model documentation can also cause problems for model validation, governance, and regulatory compliance.

A better idea: Automate the documentation process itself with H2O AutoDoc.

H2O AutoDoc


Automated Model Documentation (H2O AutoDoc) is a new time-saving ML documentation product from H2O.ai. It automatically generates documentation for supervised learning models created in H2O-3 and scikit-learn. Automated documentation is already used in production in H2O Driverless AI, and this industry-leading capability is now available as a new standalone commercial module.

Key Features of H2O AutoDoc

  • Distributed automatic document generation in Microsoft Word (.docx) and Markdown (.md) formats
  • Out-of-the-box documentation template included
  • Customizable templates to fit unique business needs, internal best practices, and compliance requirements
  • Support for a variety of supervised models generated in H2O-3 and scikit-learn
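To make the idea of a standard template concrete, here is a minimal, standard-library-only sketch of template-driven documentation in Markdown. It is not H2O AutoDoc's API or template format: the template text, the `render_markdown_doc` helper, and the example metadata are all hypothetical. The point it illustrates is that every model passes through the same template, so every document has the same sections in the same order.

```python
from string import Template

# Hypothetical shared template: every model's documentation gets the same
# sections in the same order. This is NOT H2O AutoDoc's template format.
MODEL_DOC_TEMPLATE = Template(
    "# Model Documentation: $model_name\n"
    "\n"
    "## Overview\n"
    "- Algorithm: $algorithm\n"
    "- Target column: $target\n"
    "\n"
    "## Parameters\n"
    "$parameters\n"
)


def render_markdown_doc(model_name, algorithm, target, params):
    """Fill the shared template with one model's metadata (hypothetical helper)."""
    # Sort parameters by name so the output is deterministic across runs.
    param_lines = "\n".join(
        f"- {key}: {value}" for key, value in sorted(params.items())
    )
    return MODEL_DOC_TEMPLATE.substitute(
        model_name=model_name,
        algorithm=algorithm,
        target=target,
        parameters=param_lines,
    )


if __name__ == "__main__":
    # Example metadata is made up for illustration.
    doc = render_markdown_doc(
        model_name="my_gbm_model",
        algorithm="Gradient Boosting Machine",
        target="default_payment",
        params={"ntrees": 50, "max_depth": 5},
    )
    print(doc)
```

Because the structure lives in one place, changing the template (for example, to add a section your compliance team requires) changes the documentation for every model at once, which is the kind of consistency a shared template is meant to provide.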

Advantages of using H2O AutoDoc


H2O AutoDoc provides various advantages over the traditional method of manual documentation:

  • H2O AutoDoc supports compliance and provides a consistent, accurate, and thorough approach to model documentation.
  • The generated documents can be shared with production teams and other data scientists, improving collaboration across teams.
  • It saves time and money by creating model documents automatically instead of having skilled staff write and edit them.

H2O AutoDoc in Action

1. H2O AutoDoc for models created in H2O-3


H2O-3 is a fully open-source, distributed, in-memory machine learning platform with linear scalability. The speed, quality, ease of use, and model deployment options of its cutting-edge algorithms make H2O a highly sought-after API for data science on big data. H2O also offers industry-leading AutoML functionality for automating the machine learning workflow.

You can generate the documentation as an editable Word or Markdown file as follows:

```python
import h2o
from h2o_autodoc import Config, render_autodoc
# connect to the running H2O-3 cluster that holds the trained model
h2o.init()
# get the H2O-3 model object required to create an H2O AutoDoc
model = h2o.get_model("my_gbm_model")
# configure and render an AutoDoc
config = Config(output_path="full/path/AutoDoc_H2O3.docx")
render_autodoc(h2o, config, model)
```

2. H2O AutoDoc for models created in Scikit-learn


Scikit-learn is an open-source machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms. Creating automatic documentation for scikit-learn models works much like it does for H2O-3 models:

```python
from sklearn.linear_model import LogisticRegression
from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc
# build a logistic regression model (X_train and y_train are your training data)
model = LogisticRegression()
model.fit(X_train, y_train)
# configure and render an AutoDoc
config = Config(output_path="full/path/AutoDoc_ScikitLearn.docx")
render_autodoc(config, model, X_train, y_train)
```
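The scikit-learn snippet assumes `X_train` and `y_train` already exist. As a self-contained sketch of that preparation step, the example below builds them from a synthetic dataset using scikit-learn's `make_classification` and `train_test_split`; the dataset, its size, the split ratio, and the `max_iter` value are assumptions, not part of the original workflow. It stops before the AutoDoc call, and whether H2O AutoDoc prefers NumPy arrays or a pandas DataFrame with named columns is not covered here, so check the H2O AutoDoc documentation before passing these arrays to `render_autodoc`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data stands in for a real dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Hold out 20% of the rows; X_train and y_train match the names used
# in the render_autodoc call.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_iter is raised above the default to reduce the chance of
# convergence warnings.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")
```

Fixing `random_state` in both calls keeps the split and the fitted model reproducible, which helps when the generated documentation needs to match a specific training run.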

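3. H2O AutoDoc with H2O Enterprise Steam

If your H2O-3 clusters are managed by H2O Enterprise Steam, you can generate an H2O AutoDoc for a model on a Steam cluster and download it directly: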

```python
import h2o
import h2osteam
from h2osteam import AutoDocConfig
from h2osteam.clients import H2oClient
# log in to Steam
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="token-here", verify_ssl=True)
cluster = H2oClient.get_cluster("test-cluster")
# get H2O-3 objects using their keys (the h2o client must be connected to this cluster)
model = h2o.get_model("gbm_model")
train = h2o.get_frame("CreditCard_TRAIN")
# use default configuration settings
config = AutoDocConfig()
# specify the path to the output file
output_file_path = "autodoc_report.docx"
# download an H2O AutoDoc
cluster.download_autodoc(model, config, train, output_file_path)
```

Documentation Features

(Image: table of the documentation features H2O AutoDoc generates; features marked * are available only for supported algorithms.)

Try H2O AutoDoc

Do you want to get your hands dirty and experience what H2O AutoDoc brings to your machine learning project? It’s easy to get started:

  • Register for a trial license and try H2O AutoDoc in your own environment.
  • Our team will reach out, provide a 30-day trial license, and help you get up and running.
  • Experiment with it on your H2O-3 and scikit-learn models.

Conclusion

H2O AutoDoc automatically generates comprehensive model documentation in minutes using out-of-the-box or custom templates. It saves data science teams weeks of tedious work and boosts their productivity by letting them focus on model building. H2O AutoDoc increases the consistency of model documentation by applying a standard template across all models and teams, which is essential for model governance, reproducibility, and regulatory compliance.


Parul Pandey

Parul focuses on the intersection of H2O.ai, data science and community. She works as a Principal Data Scientist and is also a Kaggle Grandmaster in the Notebooks category.