H2O.ai Announces H2O Document AI to Automate Document Processing

H2O Document AI Includes H2O.ai’s Latest Innovations in Machine Learning to Automate Processing a Vast Variety of Document Types, for Organizations of All Kinds, With Accuracy and Speed Not Previously Possible

MOUNTAIN VIEW, Calif. - December 9, 2021 - Today, H2O.ai, the leading AI cloud provider, announced the general availability of H2O Document AI, a machine learning service that understands, processes, and manages the large volume and many types of documents and unstructured text data that businesses and organizations handle every day. H2O Document AI streamlines processes, reduces costs, and discovers new information and insights contained in documents. H2O Document AI “learns as it goes,” continuously improving processing accuracy using H2O.ai’s latest innovations in machine learning and deep learning to achieve automation across business verticals and use cases not previously possible. To get started with H2O Document AI, visit

Core business processes in today’s digital world rely on documents and unstructured or semi-structured data that contain valuable information critical to business operations. The definition of a ‘document’ continues to expand, and includes PDFs, emails, scans, images, paper and web forms, faxes and e-faxes, chats from chatbots, free-form text, and more. 80% of enterprise data is unstructured or in a format that is not machine-readable or readily available. However, traditional AI document processing solutions use Optical Character Recognition (OCR) or Robotic Process Automation (RPA), which are limited by rules-based and template-driven constraints. Additionally, OCR and RPA solutions have limited capabilities to self-learn. Not surprisingly, the results from these existing document processing solutions are often lackluster. Without other options, some organizations make do with sub-optimal products that don’t scale and that add inefficiency. Other organizations have not adopted any automation and rely on people to review, process, and act on documents. The manual labor required to process documents is tedious, can be error-prone, and keeps workers from doing more meaningful, impactful, and enjoyable work. In either scenario, using existing automation or no automation, handling documents is expensive and time-consuming: companies spend an average of $20 to file and store a single document, and employees spend up to 50% of their time searching for information and can take, on average, 18 minutes to locate a single document.

H2O Document AI helps organizations quickly and accurately process documents and unstructured text data to increase productivity and find hidden insights. H2O Document AI provides automation not previously possible by combining state-of-the-art Intelligent Character Recognition (ICR), Natural Language Processing (NLP), computer vision, and layout intelligence. H2O Document AI comes pre-built with recipes created by H2O.ai’s two dozen Grandmasters (best-in-the-world data scientists); a sophisticated labeling and training workflow; and self-service capabilities to create, deploy, and manage high-accuracy AI models that classify documents and pages and extract value and meaning. The service has flexible out-of-the-box document pre- and post-processing and seamlessly integrates with customers’ existing business processes and workflows. H2O Document AI not only identifies entities and classifies them, but also understands the document and its constituent pages, sections, and layout to provide additional context that can help drive decision-making and lead to an improved end-customer experience.

Document data can easily be used to build smart searches on large archives of documents, or it can be loaded into a datastore to be queried, analyzed, audited, or used by other applications. H2O Document AI provides a workflow that goes from labeling to training to scoring to low-touch integrations and consumption options. It is customizable and extensible, and works in conjunction with H2O AI Cloud, so customers can easily scale to additional use cases. The types of documents that benefit from AI-enabled ICR and NLP are numerous, including mortgage paperwork, receipts, applications of all kinds (life insurance, jobs, loans), tax documents, employment records, and more.

“Our banking, insurance, health, audit, and public sector customers each process billions of documents every year. Documents are the fastest growing source of data in the enterprise, ranging from contracts, bank statements, invoices, payroll reports, regulatory reports, and medical referrals to customer conversations in text, chat, and email,” said Sri Ambati, CEO and founder, H2O.ai. “H2O Document AI enables customers to sieve intelligence across a wide variety of document types not possible before, with unprecedented accuracy and speed. With H2O Document AI businesses can now seamlessly integrate insights from documents to their feature stores and transactional systems to delight their customers.”

Health systems, as an example, receive millions of faxed documents annually, including patient referrals, prescription refill requests, durable medical equipment requests, lab results, and school forms. Without AI technology, the documents must be reviewed by people multiple times to determine the document type, which patient it pertains to, and what specifically must be done to further process or respond to the document’s contents.

The Center for Digital Health Innovation (CDHI) at the University of California, San Francisco (UCSF), is collaborating with H2O.ai to develop and train AI algorithms to recognize these various document types, analyze the contents, extract relevant data, and appropriately route information and requests to systems or individuals as necessary for follow-through. Once trained, these algorithms will enable CDHI’s referral automation software to significantly speed the processing of the 1.4 million faxes UCSF Health receives each year.

“When we started this journey, we were hopeful that information extraction from semi-structured documents was possible, but we weren’t sure. Some in the industry told us it couldn’t be done. Working with H2O.ai has opened up many possibilities,” said Bob Rogers, Expert in Residence for AI, UCSF Center for Digital Health Innovation. “This collaboration to create cutting-edge information extraction and workflow automation technology for faxed documents has energized both teams, and we expect it to be the template for the future of healthcare AI.”

About H2O.ai

H2O.ai is the leading AI cloud company, on a mission to democratize AI for everyone. Customers use the H2O AI Cloud to rapidly solve complex business problems and accelerate the discovery of new ideas. H2O.ai is the trusted AI provider to more than 20,000 global organizations, including AT&T, Allergan, Bon Secours Mercy Health, GlaxoSmithKline, Hitachi, Kaiser Permanente, Procter & Gamble, PayPal, PwC, Reckitt, Unilever, Walgreens, over half of the Fortune 500, and one million data scientists. Commonwealth Bank of Australia, Goldman Sachs, NVIDIA, and Wells Fargo are not only customers and partners, but also strategic investors in the company. H2O.ai’s customers have honored the company with a Net Promoter Score (NPS) of 78, the highest in the industry, based on breadth of technology and deep employee expertise. Over 20 of the world’s top Kaggle Grandmasters (the community of best-in-the-world machine learning practitioners and data scientists) are employees. A strong AI for Good ethos to make the world a better place and leadership in Responsible AI drive the company’s purpose. Please join our movement at