November 19th, 2020

Automate your Model Documentation using H2O AutoDoc

Category: Data Science, H2O Driverless AI

Create model documentation for Supervised learning models in H2O-3 and Scikit-Learn — in minutes.

The Federal Reserve’s 2011 guidelines state that without adequate documentation, model risk assessment and management would be ineffective. A similar requirement is put forward today by many regulatory and corporate governance bodies. Thus model documentation today is more of a necessity than a choice. However, there is still no denying the fact that it is one of the most time-consuming jobs for a data scientist. As opposed to building and validating machine learning models, describing how a model works in detail is tedious and takes a considerable amount of time and effort. There are also issues of consistency, clarity, and collaboration.

What if there was a way to automate the entire documentation process? Well, this is precisely the issue that the H2O AutoDoc tries to address by creating comprehensive, high-quality model documentation in minutes. H2O AutoDoc frees up the user from the time-consuming task of documenting and summarizing their workflow while building machine learning models. Additionally, it also increases the consistency of model documentation by applying a standard template across all models, essential for model governance, reproducibility, and regulatory compliance. In a way, it is using AI to explain AI.

Challenges in creating robust documentation

Challenges associated with manually documenting models

But creating good documentation isn’t a piece of cake, and at times, many teams struggle with it. The process is often tedious and time-consuming for the business because the data scientist could be using that time to build additional models and create more value. Additionally, inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.

A better idea: Automate the documentation process itself with H2O AutoDoc.

H2O AutoDoc

Automated Model Documentation (H2O AutoDoc) is a new time-saving ML documentation product from H2O.ai. H2O AutoDoc can automatically generate model documentation for supervised learning models created in H2O-3 and Scikit-Learn. Automated documentation is already used in production in H2O Driverless AI, and this industry-leading capability is now available as a new standalone commercial module.

Key Features of H2O AutoDoc

  • Distributed automatic document generation in Microsoft Word (.docx) and Markdown (.md) formats.
  • Out-of-the-box documentation template included
  • Customizable templates to fit unique business needs, internal best practices, and compliance requirements
  • Support for a variety of supervised models generated in H2O-3 and Scikit-Learn
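
The idea of applying one standard template to every model can be illustrated with a small, self-contained sketch. It does not use H2O AutoDoc and does not reflect how the product is implemented: the template text, the `render_model_summary` helper, and the example model facts are all hypothetical, and Python's standard `string.Template` stands in for a real documentation template.

```python
from string import Template

# Hypothetical, minimal "documentation template": every model is described
# with the same sections in the same order.
MODEL_SUMMARY_TEMPLATE = Template(
    "Model: $name\n"
    "Algorithm: $algorithm\n"
    "Target: $target\n"
    "Validation metric ($metric_name): $metric_value\n"
)


def render_model_summary(facts: dict) -> str:
    """Fill the shared template with one model's facts.

    substitute() raises KeyError if a required field is missing, so an
    incomplete document cannot be produced silently.
    """
    return MODEL_SUMMARY_TEMPLATE.substitute(facts)


# Made-up example facts, for illustration only.
gbm_facts = {
    "name": "my_gbm_model",
    "algorithm": "Gradient Boosting Machine",
    "target": "default_payment",
    "metric_name": "AUC",
    "metric_value": "0.78",
}

print(render_model_summary(gbm_facts))
```

Because every model goes through the same template, two reports differ only in their facts, never in their structure, which is the consistency property that matters for governance reviews.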

Advantages of using H2O AutoDoc

H2O AutoDoc provides various advantages over the traditional method of manual documentation:

  • H2O AutoDoc ensures compliance and provides a consistent, accurate, and thorough approach to model documentation.
  • It can be shared with production teams and other data scientists, thereby improving collaboration amongst teams.
  • Saves time and money by automatically creating model documents instead of having valuable resources writing and editing documents

H2O AutoDoc in Action

1. H2O AutoDoc for models created in H2O-3

H2O-3 is a fully open-source, distributed in-memory machine learning platform with linear scalability. The speed, quality, ease of use, and model deployment options of its algorithms make H2O a highly sought-after API for data science on big data. H2O also has an industry-leading AutoML functionality that can be used for automating the machine learning workflow.

The documentation can be generated in an editable Word (.docx) or Markdown (.md) format as follows:

import h2o
from h2o_autodoc import Config
from h2o_autodoc import render_autodoc

# get the H2O-3 model object required to create an H2O AutoDoc
model = h2o.get_model("my_gbm_model")

# configure and render an AutoDoc
config = Config(output_path="full/path/AutoDoc_H2O3.docx")
render_autodoc(h2o, config, model)
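
When several models need documents, it can help to derive one output path per model before building each `Config`. The helper below is a sketch under stated assumptions, not part of `h2o_autodoc`: `autodoc_output_path` and the `AutoDoc_<model id>` naming scheme are hypothetical, and only the standard-library `pathlib` module is used.

```python
from pathlib import Path


def autodoc_output_path(model_id: str, out_dir: str, extension: str = "docx") -> str:
    """Build a per-model report path such as <out_dir>/AutoDoc_<model_id>.docx.

    Hypothetical helper: the naming scheme is illustrative, not an AutoDoc rule.
    """
    if extension not in ("docx", "md"):
        raise ValueError("extension must be 'docx' or 'md'")
    # Replace path separators so a model id cannot escape out_dir.
    safe_id = model_id.replace("/", "_").replace("\\", "_")
    return str(Path(out_dir) / f"AutoDoc_{safe_id}.{extension}")


model_ids = ["my_gbm_model", "my_glm_model"]  # example model keys
paths = {mid: autodoc_output_path(mid, "reports") for mid in model_ids}
for mid, path in paths.items():
    print(mid, "->", path)
```

Each resulting string could then be passed as `output_path` when constructing a `Config`. Whether the file extension alone selects the output format is not stated here, so that detail should be checked against the AutoDoc documentation.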

2. H2O AutoDoc for models created in Scikit-learn

Scikit-learn is an open-source machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms. The process for creating automatic documentation for scikit-learn models is similar to the one for H2O-3 models:

from sklearn.linear_model import LogisticRegression
from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc

# build a logistic regression model (X_train and y_train are assumed to be defined already)
model = LogisticRegression()
model.fit(X_train, y_train)

# configure and render an AutoDoc
config = Config(output_path="full/path/AutoDoc_ScikitLearn.docx")
render_autodoc(config, model, X_train, y_train)

3. Steam: H2O AutoDoc

import h2o
import h2osteam
from h2osteam import AutoDocConfig
from h2osteam.clients import H2oClient

# login to steam
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="token-here", verify_ssl=True)
cluster = H2oClient.get_cluster("test-cluster")

# get H2O-3 objects using their keys
model = h2o.get_model("gbm_model")
train = h2o.get_frame("CreditCard_TRAIN")
# use default configuration settings
config = AutoDocConfig()
# specify the path to the output file
output_file_path = "autodoc_report.docx"
# download an H2O AutoDoc
cluster.download_autodoc(model, config, train, output_file_path)
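
The login call above passes the access token as a literal string. One common alternative is to read the connection details from environment variables so that no secret lives in the script. The sketch below is only an illustration: the `steam_login_kwargs` helper and the variable names `STEAM_URL`, `STEAM_USER`, and `STEAM_TOKEN` are hypothetical, and the returned keys simply mirror the keyword arguments used in the login call above.

```python
import os


def steam_login_kwargs(env=None) -> dict:
    """Collect login arguments from environment variables.

    The variable names STEAM_URL, STEAM_USER and STEAM_TOKEN are hypothetical;
    the returned keys mirror the keyword arguments of the login call.
    """
    env = os.environ if env is None else env
    required = ("STEAM_URL", "STEAM_USER", "STEAM_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "url": env["STEAM_URL"],
        "username": env["STEAM_USER"],
        "password": env["STEAM_TOKEN"],
        "verify_ssl": True,
    }


# Usage (requires the h2osteam package and a reachable Steam server):
# import h2osteam
# h2osteam.login(**steam_login_kwargs())
```

Failing fast with a clear message when a variable is missing avoids a confusing authentication error later in the script.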

Documentation Features

[Figure: table of documentation features included in the generated report. Entries marked * apply to supported algorithms only.]

Try H2O AutoDoc

Do you want to get your hands dirty and experience the power that H2O AutoDoc brings to your machine learning project? We have made it easy for you. You can:

  • Register for the trial license here and then try H2O AutoDoc in your environment.
  • Our team will reach out, provide a 30-day trial license, and help you get up and running.
  • Experiment and use it with your H2O-3 and scikit-learn models.

Conclusion

H2O AutoDoc automatically generates comprehensive model documentation in minutes using out-of-the-box or custom templates. H2O AutoDoc saves data science teams weeks of tedious work and increases data science productivity by allowing them to focus on model building. H2O AutoDoc increases the consistency of model documentation by applying a standard template across all models and teams, which is essential for model governance, reproducibility, and compliance with regulations.

About the Author

Parul Pandey

Parul focuses on the intersection of H2O.ai, data science and community. She works as a Principal Data Scientist and is also a Kaggle Grandmaster in the Notebooks category.
