Fine Tuning The H2O Danube2 LLM for The Singlish Language
by Dipam Chakraborty, Kavindu Warnakulasuriya, Jordan Seow | June 03, 2024

Singlish is an informal version of English spoken in Singapore. The primary variations lie in the style and structure of the text, and the inclusion of elements of Chinese and Malay. Though Singlish is the common tongue in Singapore, it isn’t well defined or formalized. We fine-tuned H2O.ai’s Danube-2 1.8B LLM on Singlish instruction data, wi...

Announcing H2O Danube 2: The next generation of Small Language Models from H2O.ai
by Michelle Tanco, Philipp Singer, Pascal Pfeiffer, Yauhen Babakhin | April 23, 2024 Danube , Generative AI , H2O Danube-1.8b , Large Language Models , Open Source , Product Updates

A new series of Small Language Models from H2O.ai, released under Apache 2.0 and ready to be fine-tuned for your specific needs to run offline and with a smaller footprint. Why Small Language Models? Like most decisions in AI and tech, the decision of which Language Model to use for your production use cases comes down to trade-offs. ...

H2O Release 3.46
by Wendy Wong, Adam Valenta | April 15, 2024 H2O Release , H2O-3

We are excited to announce the release of H2O-3! Some of the highlights of this major release are that we added custom metric support for XGBoost, allowed grid search models to be sorted with custom metrics, and we enabled H2O MOJO and POJO to work with MLFlow. Several improvements were also made to the Uplift model (like MLI ...

Open-Weight AI Models: A Path to Responsible Innovation
by Sri Ambati | April 04, 2024 GenAI App Store , H2O-3 , Responsible AI , h2oGPT

The recent Request for Comments (RFC) issued by the National Telecommunications and Information Administration (NTIA) on open-weight AI models has sparked an important conversation about the future of AI. As we consider the potential benefits and risks associated with making AI model weights more accessible and transparent, it is clear ...

Transforming Latin American Companies with Artificial Intelligence: Strategies and Perspectives
by David Alexis Garcia Espinosa | February 16, 2024 Generative AI , LATAM

There is currently a great deal of excitement in forums and publications about the use of artificial intelligence (AI) across different areas of business. The big changes that AI brings to business processes are often discussed; however, most of these successful use cases belong to com...

Unlocking GenAI Magic: GenAI AppStudio Revolutionizes App Development with LLMs! (Part 2)
by Piraveen Sivakumar, Shivam Bansal | February 13, 2024 GenAI App Store , Generative AI

GenAI AppStudio provides a no-code way to take user sketches and generate the code for you. Introducing GenAI AppStudio: GenAI AppStudio is a no-code platform specifically crafted for non-technical users to easily transform app ideas into reality in a few simple steps. One of its key features is the ability to sea...

H2O LLM DataStudio: V4.1 Release
by Nishaanthini Gnanavel, Genevieve Richards, Tarique Hussain | January 16, 2024 Data Preparation , Generative AI

H2O LLM DataStudio is a comprehensive no-code application designed to simplify data preparation tasks for Large Language Models (LLMs). This tool comprises three key components: Curate, Prepare, and Augment. Curate: conversion of documents (PDFs, DOC, and audio/video files) into question-answer pairs and summarization pairs. Prepare ...

Introducing the H2O GenAI App Store: A Playground of Generative AI Innovation
by Michelle Tanco | November 07, 2023 Generative AI

As the world becomes increasingly interconnected and reliant on data-driven decisions, the need for powerful and innovative AI solutions has never been more critical. At H2O.ai, we've been at the forefront of AI and machine learning for the last decade, providing you with the tools and platforms to harness the power of data. Today, we're ...

Introducing the H2O GenAI App Store: A Playground of Generative AI Innovation (Portuguese edition)
by Michelle Tanco | November 06, 2023 Generative AI

This blog was originally published in English. As the world becomes increasingly interconnected and dependent on data-driven decisions, the need for powerful and innovative AI solutions has never been more critical. At H2O.ai, we have been at the forefront of AI and machine ...

Introducing the H2O GenAI App Store: A Playground of Generative AI Innovation (Spanish edition)
by Michelle Tanco | November 06, 2023 Generative AI

This blog was originally published in English. As the world becomes increasingly interconnected and dependent on data-driven decisions, the need for powerful and innovative artificial intelligence (AI) solutions has never been more critical. At H2O.ai, we have been at the ...

H2O Release 3.44
by Marek Novotny, Wendy Wong | October 20, 2023 H2O Release , H2O-3

We are excited to announce the release of H2O-3! We have added and improved many items. A few of our highlights are the implementation of AdaBoost, Shapley values support, Python 3.10 and 3.11 support, and custom metric support for Deep Learning, Uplift Distributed Random Forest (DRF), Stacked Ensemble, and AutoML. Please r...

Boosting LLMs to New Heights with Retrieval Augmented Generation

Businesses today can make leaps and bounds to revolutionize the way things are done with the use of Large Language Models (LLMs). LLMs are widely used by businesses today to automate certain tasks and create internal or customer-facing chatbots that boost efficiency. Challenges with dynamic adaption of LLMs As with any new hyped-up thi...

Training Your Own LLM Without Coding
by Favio Vazquez | October 06, 2023 Generative AI , H2O LLM Studio

This blog was originally published in English. Introduction: Generative AI, a fascinating field that promises to revolutionize how we interact with technology and generate content, has taken the world by storm. In this arti...

H2O LLM DataStudio Part II: Convert Documents to QA Pairs for fine tuning of LLMs
by Genevieve Richards, Tarique Hussain, Shivam Bansal | September 22, 2023 Generative AI , H2O LLM Studio

Convert unstructured datasets to Question-answer pairs required for LLM fine-tuning and other downstream tasks with H2O LLM Data Studio Curate. Every organization needs to own its GPT as simply as it needs to bring its data, algorithms, and models (read more here). A common problem we see in organizations is that they want to be able to...

Building a Fraud Detection Model with H2O AI Cloud

In a previous article [1], we discussed how machine learning could be harnessed to mitigate fraud. This time, we’ll delve into a step-by-step guide on leveraging H2O AI Cloud to construct efficient fraud detection models. We’ll tackle this process in three critical stages: build, operate, and detect. First, we’ll utilize Driverless AI in ...

A Look at the UniformRobust Method for Histogram Type
by Hannah Tillman, Megan Kurka | July 25, 2023 GBM , H2O-3

Tree-based algorithms, especially Gradient Boosting Machines (GBMs), are among the most popular algorithms in use. They often outperform linear models and neural networks on tabular data because they use a boosted approach in which each new tree works to correct the error of the previous tree. As the model trains, it is continuously self-corr...

Testing Large Language Model (LLM) Vulnerabilities Using Adversarial Attacks

Adversarial analysis seeks to explain a machine learning model by understanding locally what changes need to be made to the input to change a model’s outcome. Depending on the context, adversarial results could be used as attacks, in which a change is made to trick a model into reaching a different outcome. Or they could be used as an exp...

H2O LLM EvalGPT: A Comprehensive Tool for Evaluating Large Language Models
by Srinivas Neppalli, Abhay Singhal, Michal Malohlava | July 19, 2023 Generative AI , Large Language Models , h2oGPT

In an era where Large Language Models (LLMs) are rapidly gaining traction for diverse applications, the need for comprehensive evaluation and comparison of these models has never been more critical. At H2O.ai, our commitment to democratizing AI is deeply ingrained in our ethos, and in this spirit, we are thrilled to introduce our innovati...

Reducing False Positives in Financial Transactions with AutoML

In an increasingly digital world, combating financial fraud is a high-stakes game. However, the systems we deploy to safeguard ourselves are raising too many false alarms, with over 90% of fraud alerts being false positives. These false positives, not only frustrating for consumers but also costly for financial institutions, can eclipse t...

Winner's Insight: Navigating the Parkinson's Disease Prediction Challenge with AI

Parkinson’s disease, a condition affecting movement, cognition, and sleep, is escalating rapidly. By 2037, it is projected that around 1.6 million U.S. residents will be confronting this disease, resulting in significant societal and economic challenges. Studies have hinted that disruptions in proteins or peptides could be instrumental in...

H2O.ai and Snowflake Enable Developers to Train, Deploy, and Score Containerized Software Without Compromising Data Security

H2O.ai today announced its participation as a launch partner for Snowflake’s Snowpark Container Services (available in private preview), which provides our joint customers with the flexibility to train, deploy, and score models all within their Snowflake account. This further expands the ease of use for data science teams to create machin...

H2O Releases and
by Marek Novotny, Wendy Wong | June 23, 2023 GBM , GLM , H2O Release , H2O-3 , XGBoost

Our new major releases of H2O are packed with new features and fixes! Some of the major highlights of these releases are the new Decision Tree algorithm, the added ability to grid over Infogram, an upgrade to the version of XGBoost and an improvement to its speed, the completion of the maximum likelihood dispersion parameter and its expan...

Generating LLM Powered Apps using H2O LLM AppStudio – Part1: Sketch2App

sketch2app is an application that lets users instantly convert sketches to fully functional AI applications. This blog is Part 1 of the LLM AppStudio Blog Series and introduces sketch2app. The H2O.ai team is dedicated to democratizing AI and making it accessible to everyone. One of the focus areas of our team is to simplify the adoption of...

H2O LLM DataStudio: Streamlining Data Curation and Data Preparation for LLMs related tasks
by Shivam Bansal, Sanjeepan Sivapiran, Nishaanthini Gnanavel | June 14, 2023 Data , Data Preparation , H2O LLM Studio , Large Language Models , NLP , h2oGPT

A no-code application and toolkit to streamline data preparation tasks related to Large Language Models (LLMs) H2O LLM DataStudio is a no-code application designed to streamline data preparation tasks specifically for Large Language Models (LLMs). It offers a comprehensive range of preprocessing and preparation functions such as text cl...

Recap of H2O World India 2023: Advancements in AI and Insights from Industry Leaders
by Parul Pandey | May 29, 2023 AI4Good , Community , H2O World

On April 19th, the H2O World made its debut in India, marking yet another milestone in its global journey. The conference gathered an array of notable experts and enthusiasts from deep learning, artificial intelligence, and data science. A broad spectrum of topics was covered, shedding light on the strides made in AI technology and its ...

Enhancing H2O Model Validation App with h2oGPT Integration
by Parul Pandey | May 17, 2023 Deep Learning , H2O Model Validation , h2oGPT

As machine learning practitioners, we’re always on the lookout for innovative ways to streamline and enhance our processes. What if we could integrate the power of language models into our workflows, especially in the critical phase of model validation? Imagine running validation procedures, interpreting results, or even troubleshooting i...

Building a Manufacturing Product Defect Classification Model and Application using H2O Hydrogen Torch, H2O MLOps, and H2O Wave
by Shivam Bansal, Genevieve Richards, Nishaanthini Gnanavel | May 15, 2023 H2O Hydrogen Torch , H2O Wave , MLOps , Manufacturing

Primary Authors: Nishaanthini Gnanavel and Genevieve Richards Effective product quality control is of utmost importance in the manufacturing industry. The presence of defective components can have adverse effects on various aspects, including escalating production costs, compromising product quality, diminishing product longevity, and l...

Insights from AI for Good Hackathon: Using Machine Learning to Tackle Pollution
by Parul Pandey, Shivam Bansal | May 10, 2023 AI4Good , H2O World , Hackathon

At H2O.ai, we believe technology can be a force for good, and we’re committed to leveraging its power to create a positive impact in the world. As part of this commitment, we recently organized an AI for Good Hackathon during the H2O World India event, where participants had the opportunity to apply their data science skills to a real-wor...

Democratization of LLMs

Every organization needs to own its GPT as simply as we need to own our data, algorithms and models. H2O LLM Studio democratizes LLMs for everyone allowing customers, communities and individuals to fine-tune large open source LLMs like h2oGPT and others on their own private data and on their servers. Every nation, state and city needs it...

Building the World's Best Open-Source Large Language Model: H2O.ai's Journey
by Arno Candel | May 03, 2023 Large Language Models , h2oGPT

At H2O.ai, we pride ourselves on developing world-class Machine Learning, Deep Learning, and AI platforms. We released H2O, the most widely used open-source distributed and scalable machine learning platform, before XGBoost, TensorFlow and PyTorch existed. H2O.ai is home to over 25 Kaggle grandmasters, including the current #1. In 2017, w...

Effortless Fine-Tuning of Large Language Models with Open-Source H2O LLM Studio
by Parul Pandey | May 01, 2023 H2O LLM Studio , Large Language Models

While the pace at which Large Language Models (LLMs) have been driving breakthroughs is remarkable, these pre-trained models may not always be tailored to specific domains. Fine-tuning, the process of adapting a pre-trained language model to a specific task or domain, plays a critical role in NLP applications. However, fine-tuning can be ...

What's new in the latest release of H2O AI Hybrid Cloud?
by Michelle Tanco | April 25, 2023 H2O AI App Store , Hybrid Cloud , Product Updates

Check out the complete release notes here! v23.01.0 | Apr 14, 2023. Upgraded Components: Core Components: AI App Store v0.22.0. The AI App Store is a platform for accessing and operationalizing AI/ML applications and services that are built using H2O Wave. The 23.01.0 Hybrid Cloud release introduces multiple UI enhancements to make the us...

Navigating the challenges of time series forecasting
by Jon Farland | April 12, 2023 Time Series

Jon Farland is a Senior Data Scientist and Director of Solutions Engineering for North America at H2O.ai. For the last decade, Jon has worked at the intersection of research, technology and energy sectors with a focus on developing large scale and real-time hierarchical forecasting systems. The machine learning models that drive these for...

How Commonwealth Bank is transforming operations with Document AI
by Liz Pratusevich | April 11, 2023 H2O Document AI , H2O World

Sonal Surana, General Manager at Commonwealth Bank of Australia, shares recent innovative ideas at H2O World Sydney. It’s been a rollercoaster of a ride this first year of our partnership with H2O.ai, and the momentum continues to get even more exciting. We’ve heard from Matt about our AI ambition and how front and center it is for CBA s...

Introduction to H2O Document AI
by Mark Landry | April 05, 2023 H2O Document AI , H2O World

Mark Landry, Director of Data Science and Product and Kaggle Grandmaster, showcases H2O Document AI during the Technical Track Sessions at H2O World Sydney 2022. Mark Landry: I’m Mark Landry, with some different titles than you see on the screen here. I’ve got a bunch at H2O, so I’ve been at H2O for about seven and a half years...

AI in Insurance: Resolution Life's AI Journey with Rajesh Malla

Rajesh Malla, Head of Data Engineering – Data Platforms COE at Resolution Life insurance, takes the stage at H2O World Sydney 2022 to discuss AI transformation within the insurance industry. Resolution Life is the largest life insurer in Australasia. Malla discusses the use of H2O Driverless AI to predict claim triage and other insurance ...

AT&T panel: AI as a Service (AIaaS)
by Liz Pratusevich | March 22, 2023 H2O World

Mark Austin, Vice President of Data Science at AT&T joined us on stage at H2O World Dallas, along with his colleagues Mike Berry, Lead Solution Architect; Prince Paulraj, AVP of Engineering; Alan Gray, Principal-Solutions Architect; and Rob Woods, Lead Solution Architect, CDO to discuss what they’re doing today and where they see the ...

[Infographic] Healthcare providers: How to avoid AI “Pilot-Itis”
by H2O.ai Team | March 15, 2023 Healthcare

From increased clinician burnout and financial instability to delays in elective and preventative care, the pandemic created a perfect storm of conditions that have strained the healthcare system in lasting ways. This storm continues unabated and is unleashing new challenges and exacerbating old ones. Artificial intelligence (AI) technol...

Deploy a WAVE app on an AWS EC2 instance
by Michelle Tanco, Greg Fousas | March 10, 2023 H2O Wave , Make with H2O.ai

This article was originally published by Greg Fousas and Michelle Tanco on Medium and reviewed by Martin Turoci (unusualcode) This guide will demonstrate how to deploy a WAVE app on an AWS EC2 instance. WAVE can run on many different OSs (macOS, Linux, Windows) and architectures (Mac, PC). In this document, Ubuntu Linux will be used. T...

How Horse Racing Predictions with H2O.ai Saved a Local Insurance Company $8M a Year

In this Technical Track session at H2O World Sydney 2022, SimplyAI’s Chief Data Scientist Matthew Foster explains his journey with machine learning and how applying the H2O framework resulted in significant success on and off the race track. Matthew Foster: I’m Matthew Foster, the Chief Data Scientist for SimplyAI. So, I’m going t...

AI and Humans Combating Extinction Together with Dr. Tanya Berger-Wolf
by Liz Pratusevich | March 01, 2023 Artificial Intelligence , H2O World

Dr. Tanya Berger-Wolf, Co-Founder and Director of AI for conservation nonprofit Wild Me, takes the stage at H2O World Sydney 2022 to discuss AI solutions for wildlife conservation, connecting data, people, and machines. AI can turn a massive collection of images into high-resolution information databases about wildlife, enabling scienti...

Improving Search Query Accuracy: A Beginner's Guide to Text Regression with H2O Hydrogen Torch
by H2O.ai Team | February 28, 2023 Deep Learning , H2O Hydrogen Torch , Machine Learning

Although search engines are vital to our daily lives, they need help understanding complex user queries. Search engines rely on natural language processing (NLP) to understand the intent behind a user’s query and return relevant results. By formulating a well-formed question, users can provide more precise and specific information about w...

What it means—and takes—to be at AI’s edge with Dr. Tim Fountaine
by Liz Pratusevich | February 23, 2023 H2O World

Dr. Tim Fountaine, Senior Partner at McKinsey & Company joins us at H2O World Sydney 2022 to discuss why business leaders should care about AI, what mindset to adopt, and what actions to take to effectively bring AI into your organization. Dr. Fountaine discusses real-world examples and insights from McKinsey’s collaboration with the ...

10 Tips for Becoming a Successful Data Scientist
by Favio Vazquez | January 19, 2023 AutoML , Beginners , Data Science

Data science is here to stay. Data scientists use their skills to help companies make better decisions about their products and services, optimize processes, save money, and improve profitability. Becoming a successful data scientist involves many aspects and continuous study, since it is a...

Explaining models built in H2O-3 — Part 1

Machine Learning explainability refers to understanding and interpreting the decisions and predictions made by a machine learning model. Explainability is crucial for ensuring the trustworthiness and transparency of machine learning models, particularly in high-stakes situations where the consequences of incorrect predictions can be signi...

H2O.ai at NeurIPS 2022
by Marcos V. | December 06, 2022 AI4Good , Data Science , Machine Learning

H2O.ai is proud to participate in the 36th Conference on Neural Information Processing Systems (NeurIPS) 2022, one of the biggest and most prestigious international conferences in artificial intelligence. NeurIPS 2022 will be a Hybrid Conference from Monday, November 28th through Friday, December 9th, with an in-person event at the New Or...

A Brief Overview of AI Governance for Responsible Machine Learning Systems
by Navdeep Gill, Abhishek Mathur, Marcos V. | November 30, 2022 AI Governance , Machine Learning , Responsible AI

Our paper “A Brief Overview of AI Governance for Responsible Machine Learning Systems” was recently accepted to the Trustworthy and Socially Responsible Machine Learning (TSRML) workshop at NeurIPS 2022 (New Orleans). In this paper, we discuss the framework and value of AI Governance for organizations of all sizes, across all industries a...

H2O World Dallas Customer Talks
by Vinod Iyengar | November 24, 2022 H2O World

After three long years of not having an #H2OWorld, we finally held our first one in Sydney to a sold-out crowd! We then followed it up with H2O World Dallas in the same week! It was a fantastic and jam-packed event with customers, partners, colleagues, and community members sharing how they leverage H2O.ai to accelerate and transform AI l...

New in Wave 0.24.0
by Martin Turoci | November 21, 2022 H2O Hydrogen Torch , H2O Release , H2O Wave

Another Wave release has arrived with quite a few exciting new features. Let’s quickly go over the biggest ones. Wave init CLI: How many times have you wanted to build a Wave app fast, but then realized you need to start from scratch, copy over the skeleton of your app, and work up from there? For these exact reasons, we introduced a new wave...

H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise
by H2O.ai Team | November 20, 2022 Press Release

Series C round led by Wells Fargo and NVIDIA. MOUNTAIN VIEW, CA – November 30, 2017 – H2O.ai, the leading company bringing AI to enterprises, today announced it has completed a $40 million Series C round of funding led by Wells Fargo and NVIDIA with participation from New York Life, Crane Venture Partners, Nexus Venture Partners and Tra...

H2O.ai Placed Furthest in Completeness of Vision in 2021 Gartner Data Science and Machine Learning Magic Quadrant in the Visionaries Quadrant.
by Read Maloney | November 18, 2022 Business , Gartner , H2O Hydrogen Torch

At H2O.ai, our mission is to democratize AI, and we believe driving value from data is a team sport. Data needs to be organized and prepared, often by data engineers, and then models need to be built by data scientists. With models built, they need to be put into production and maintained by IT and DevOps personnel. Finally, these models...

H2O.ai Expands Market Footprint in Healthcare AI by Signing Hackensack Meridian Health and Other Key Providers
by Prashant Natarajan | November 14, 2022 Healthcare

We’re excited to attend the HLTH conference this week in Las Vegas, NV. This industry event has quickly become the go-to event for c-level executives across all parts of the healthcare industry. It’s both incredible and inspiring to see how quickly the event has grown in its five years, and that’s why we’re excited to share some news abou...

An Introduction to H2O Wave Table
by Rohan Rao | November 13, 2022 H2O Hydrogen Torch , H2O Wave

H2O Wave is a Python package for creating realtime ML/AI applications for a wide variety of data science workflows and industry use cases. Data scientists view a significant amount of data in tabular form. Running SQL queries, pivoting data in Excel or slicing a pandas dataframe are pretty much bread-and-butter tasks. With the growing u...

Saving Zebras: “Their stripes are like fingerprints. No two are alike.”
by Anthony Gomes | November 10, 2022 AI4Good

It’s been said that a picture is worth a thousand words. But to Tanya Berger-Wolf, a picture is far more valuable than that. To Berger-Wolf, photos, images and videos are key to protecting biodiversity and entire species around the world. Scientists have known for years that we are in the middle of the sixth mass extinction on our planet...

H2O Managed Cloud With AWS PrivateLink is Now Generally Available
by Ophir Zahavi | November 10, 2022 Amazon Web Services , H2O AI Cloud

An essential part of responsibly practicing machine learning is understanding how you secure your data. H2O Managed Cloud offers a single-tenant cloud environment with multiple layers of security – but how do you get your data securely into the cloud for training, and how do you score sensitive information without exposing it to the inte...

H2O.ai Receives Innovation Award for H2O Hydrogen Torch
by Anthony Gomes | October 31, 2022 H2O Hydrogen Torch

We don’t like to brag, but we do like to celebrate the work our Makers create, and more importantly, why they create it: for you. H2O.ai was proud to accept the award for “Best Deep Learning Technology” at the AI Tech awards. H2O Hydrogen Torch, a no-code deep learning training engine, was released less than a year ago in February 2022...

AI for Good: H2O.ai Levels Up Furry Matchmaking
by H2O.ai Team | October 19, 2022 AI4Good , H2O Driverless AI

Nothing tugs at the heart strings quite like a poster in your neighborhood about a missing cat or dog. For years, technology has enabled lost pets to be reunited with their families in the form of a small microchip that contains an owner’s contact information. Now some organizations are turning to emerging technology to help the millions ...

H2O Wave joins Hacktoberfest
by Martin Turoci | September 29, 2022 H2O Hydrogen Torch , H2O Wave

It’s that time of the year again. A great initiative by DigitalOcean called Hacktoberfest, which aims to bring more people to open source, is about to start. Hacktoberfest incentivizes people to make at least 4 valuable contributions (pull requests) to an open source repository and get the reward i...

Three Keys to Ethical Artificial Intelligence in Your Organization
by H2O.ai Team | September 23, 2022 AI4Good , Machine Learning

There’s certainly been no shortage of examples of AI gone bad over the past few years, enough to give everyone pause on how (and if) this technology can truly be used for good. If it’s not Facebook selling data of its users, it’s self-driving cars from Uber that can’t recognize pedestrians in time to slow down or stop. So while the uses ...

Using GraphQL, HTTPX, and asyncio in H2O Wave
by Martin Turoci | September 21, 2022 H2O Wave , Use Cases

Today, I would like to cover the most basic use case for H2O Wave, which is collecting a bunch of data and displaying them in a nice and clean way. The goal is to build a simple dashboard that shows how H2O Wave compares against its main competitors in terms of popularity and codebase metrics. The main competitors in question are: Stre...

Predicting Gender Differences in the Brain Using Machine Learning Automation Solution H2O Driverless AI
by H2O.ai Team | August 29, 2022 H2O Driverless AI , Healthcare , Solutions

Brain cognitive development in childhood affects higher cognitive functions such as memory, attention, and sociability, and leads to brain development in adolescence ...

Make with H2O.ai Recap: Validation Scheme Best Practices

Data Scientist and Kaggle Grandmaster Dmitry Gordeev presented at the Make with H2O.ai session on validation scheme best practices, our second accuracy masterclass. The session covered key concepts, different validation methods, data leaks, practical examples, and validation and ensembling. Key Concepts: While the validation topics cove...

Integrating VSCode editor into H2O Wave
by Martin Turoci | August 18, 2022 H2O Hydrogen Torch , H2O Wave , Tutorials

Let’s have a look at how to provide our users with a truly amazing experience when we need to allow them to edit pieces of code or configuration. We will use one of the most popular and well-known code editors called Monaco editor which powers VSCode. The resulting app will have the editor on the left side and a markdown card on the righ...

5 Tips for Improving Your H2O Wave Apps
by Martin Turoci | August 09, 2022 H2O Wave , Tutorials

Let’s quickly uncover a few simple tips that are quick to implement and have a big impact. Do not recreate navigation, update it: The most common error I see across Wave apps is ugly navigation that seems to be laggy. The reason for this behavior is that we want to save the clicked value and set it e...

Make with H2O.ai Recap: Getting Started with H2O Document AI
by Blair Averett | August 05, 2022 Deep Learning , H2O Document AI , Make with H2O.ai , NLP

Product Owner, Data Scientist, and Kaggle Grandmaster Mark Landry presented at the Make with H2O.ai session on getting started with H2O Document AI. The session covered an overview of H2O Document AI, a tool to extract insights and automate document processing. The session also included a product demo, looking at documents as data sets...

Advice for Those Getting Started on Their AI Journey
by Blair Averett | August 04, 2022 AI Journey , Business , Events

Innovation Day Summer ‘22 included a customer insights panel made up of Prince Paulraj, AVP, Data Insights and Chief Data Officer at AT&T; Chris Throop, Managing Director and Global Head of Data Science at Castleton Commodities International; and Sean Otto, Director of Advanced Analytics at AES. One of the questions panelists...

AES Transforms its Energy Business with AI and H2O.ai
by Read Maloney | June 20, 2022 Business , Energy

AES is a leading renewable-energy company with global operations. The business produces and distributes energy for private, public, and governmental organizations. AES was recently named one of the World’s Most Ethical Companies for the ninth straight year and won the Edison Electric Institute’s (EEI’s) Edison Award, the indus...

The Wildfire Challenge Winners Blog Series - Team Titans
by H2O.ai Team | June 14, 2022 AI4Good , Community

Note: this is a community blog post by Team Titans, one of the H2O.ai Wildfire Challenge winners. You can check out their app here. Background: Forest fires have been getting worse in recent years. According to a report by the WWF, the duration of fire seasons across the globe has increased by 19% on average. The fire season has been sta...

Read more
Improving Machine Learning Operations with and Snowflake
by Eric Gudgion | June 07, 2022 Cloud , H2O AI Cloud , MLOps , Snowflake

Operationalizing models is critical for companies to get a return on their machine learning investments, but deployment is only one part of that operationalization process. With’s latest Snowflake Integration Application, authorized Snowflake users can easily deploy models, significantly reducing deployment timelines and enabling a...

Read more
Improving Manufacturing Quality with and Snowflake

Manufacturers are rapidly expanding their machine learning use cases by leveraging the deep integration between Snowflake’s Data Cloud and the H2O AI Cloud. Many current manufacturing quality checks require that sensor data and image data be processed and analyzed separately. Standard tooling presents challenges in storing and referencin...

Read more
The Wildfire Challenge Winners Blog Series - Team PSR
by Team, Shamil Prematunga | May 31, 2022 AI4Good , Community , H2O Driverless AI , H2O Hydrogen Torch

Note : this is a community blog post by Team PSR – one of the Wildfire Challenge winners. This blog represents an experience we gained by participating in the H2O wildfire challenge. We need to mention that competing in this challenge is like a journey in a knowledge pool. For a person who is willing to get the knowledge of buildin...

Read more
Developing and Retaining Data Science Talent
by Jon Farland | May 12, 2022 Company , Makers

It’s been almost a decade since the Harvard Business Review proclaimed that “Data Scientist” is the sexiest job of the 21st century. Since then, there has been an explosion of job opportunities and university degree programs claiming to give students all of the skills they need to excel in the field of data science. Yet, the scarcity of ...

Read more
The Wildfire Challenge Winners Blog Series - Team HTB

Note : this is a community blog post by Team HTB – one of the Wildfire Challenge winners. You can check out their app here . The Challenge The purpose of the challenge was to develop an AI application to improve the forecast of bushfires and wildfires, with the main aim of reducing the human losses that these phenomena can cause...

Read more
The Wildfire Challenge Winners Blog Series - Team Too Hot Encoder
by Team | May 10, 2022 AI4Good , Community

Note : this is a community blog post by Team Too Hot Encoder – one of the Wildfire Challenge winners. You can check out their app here. The Challenge The aim of the project is to predict the probability of wildfire occurrence in Turkey for each month in 2020. As a result of these predictions, it is aimed to carry out more intensive...

Read more
Bias and Debiasing
by Kim Montgomery | April 15, 2022 Explainable AI , H2O-3

An important aspect of practicing machine learning in a responsible manner is understanding how models perform differently for different groups of people, for instance with different races, ages, or genders. Protected groups frequently have fewer instances in a training set, contributing to larger error rates for those groups. Some models...

Read more
Comprehensive Guide to Image Classification using H2O Hydrogen Torch

In this article, we will learn how to build state-of-the-art models in computer vision and natural language processing within a couple of minutes using H2O Hydrogen Torch. Introduction to H2O Hydrogen Torch H2O Hydrogen Torch (HT) aims to simplify building and deploying deep learning models for a wide range of tasks in computer vision...

Read more
Democratizing Lending through AI
by Team | March 23, 2022 Financial Services , H2O AI Cloud

According to the Federal Reserve , nearly 40% of adults in the U.S. sought credit in 2020, only slightly fewer than those who applied in the previous pre-pandemic year; among those who applied more than 1 in 10 were denied credit or were approved for less than they had sought. The reasons behind these denials are many, however, the same r...

Read more
Setting Up Your Local Machine for H2O AI Cloud Wave App Development
by Michelle Tanco | March 17, 2022 H2O AI Cloud , H2O Hydrogen Torch , H2O Wave

This article is for users who would like to build H2O Wave apps and publish them in the App Store within the H2O AI Cloud (HAIC). We will walk through how to set up your local machine for HAIC Wave App development. Instructions Developing with Wave H2O Wave is a framework for building frontends using only Python or R. In this article...

Read more
Data Science with An Introduction to Machine Learning and Predictive Modeling

Our own Jonathan Farland recently recorded a talk about machine learning and predictive modeling. In his talk, Jon also gave an overview of open source H2O and H2O AI Cloud . This video is a great resource for getting up to speed with the latest technology from H2O in half an hour. Some of you may prefer to go through the slides while l...

Read more
Vaccine NLP
by Team | March 14, 2022 H2O AI Cloud , Healthcare

A population and public health NLP solution from Health Powered by NVIDIA GPUs and NVIDIA AI Social media platforms such as Twitter and Reddit have become invaluable tools for communication between individuals or groups and are widely used globally. As messages on these platforms can instantly be accessed by all users and remain on...

Read more
Gene Mutation AI
by Team | March 14, 2022 H2O AI Cloud , Healthcare

A genomics AI solution from Health Powered by NVIDIA GPUs and NVIDIA AI As precision medicine becomes more widespread, both medical diagnosis and drug discovery are increasingly relying on and leveraging the individual’s genomic and phenotypic profiles. From the multiple types and subtypes of cancer to heart disease, to obesity or ...

Read more
Expression Biomarker AI
by Team | March 14, 2022 H2O AI Cloud , Healthcare

A drug discovery AI solution from Health Powered by NVIDIA GPUs and NVIDIA AI In a healthy individual, each cell type has its own metabolic program, carrying out specific functions. This organization is disrupted in disease, either as a cause or a result of it, or both, and this disruption is reflected in the patient’s gene exp...

Read more
Gene Mutation AI and the Future of Cancer Research
by Team | March 14, 2022 H2O AI Cloud , Healthcare

A genomics AI solution from Health Powered by NVIDIA GPUs and NVIDIA AI Cancer is a multifactorial disease with exact causes we have only recently begun to understand. While inherited germline mutations are understood to create a genetic predisposition to the disease, stochastic accumulation of somatic mutations over a person’s...

Read more
Tackling Illegal, Unreported, and Unregulated (IUU) Fishing with AI
by Ryan Chesler, Guanshuo Xu | February 28, 2022 AI4Good , Computer Vision , Deep Learning , H2O AI Cloud , Kaggle , Solutions

According to a report by the High-Level Panel for a Sustainable Ocean Economy, it is estimated that illegal, unreported, and unregulated (IUU) fishing accounts for 20 percent of the seafood and up to 50 percent in some areas. These activities not only affect the marine ecosystem but, in a way, are linked to climate change on the planet a...

Read more
Unsupervised Learning Metrics
by Adam Murphy | February 28, 2022 Machine Learning , Technical

That which is measured improves – Karl Pearson , Mathematician. Almost everyone has heard of accuracy, precision, and recall – the most common metrics for supervised learning . But not as many people know the metrics for unsupervised learning . So, in this article, we will take you through the most common methods and how to implement th...

Read more
Demand Sensing with H2O Wave : Supply Chain Intelligence and Inventory Optimization for Retail, CPG, and FMCG Industries

Demand Sensing can help optimize inventories by analyzing and modeling short-term and real-time signals The supply chains across the Consumer Packaged Goods (CPG), Fast-Moving Consumer Goods (FMCG) and Retail sectors need to continuously monitor the drivers that may impact their internal models and processes. These include systems around ...

Read more
AI Application to Demonstrate K-Means Clustering Using H2O Wave
by Shamil Prematunga | February 25, 2022 Community , H2O AI Cloud , H2O Hydrogen Torch

Note : this is a community blog post by Shamil Dilshan Prematunga . It was first published on Medium . In this blog, I am going to highlight how cool H2O Wave is, by demonstrating my application called “K means App” which was built using Wave 0.20.0 . This is a simple application I have created to demonstrate one of the unsupervised lea...

Read more
A Quick Introduction to PyTorch: Using Deep Learning for Stock Price Prediction

Torch is a scalable and efficient deep learning framework. It offers flexibility and speed to build large scale applications. It also includes a wide range of libraries for developing speech, image, and video-based applications. The basic building block of Torch is called a tensor. All the operations defined in Torch use a tensor. Ok, l...

Read more
Introducing H2O Hydrogen Torch: A No-code Deep Learning Framework
by Philipp Singer, Yauhen Babakhin | February 17, 2022 Computer Vision , H2O AI Cloud , H2O Hydrogen Torch , NLP , Product Updates

Over and over again we heard from customers, “deep learning is cool, but it’s hard and time consuming.” They kept asking “could someone just make it easier?” In typical “Maker” fashion, you ask, we deliver, H2O Hydrogen Torch . H2O Hydrogen Torch is a new product that enables data scientists and developers to train and deploy state-of-t...

Read more
How to Create Your Spotify EDA App with H2O Wave

In this article, I will show you how to build a Spotify Exploratory Data Analysis (EDA) app using H2O Wave from scratch. H2O Wave is an open-source Python development framework for interactive AI apps. You do not need to know Flask, HTML, CSS, etc. H2O Wave has ready-to-use user-interface components and charts, including dashboard templa...

Read more
releases new H2O MLOps features that improve the explainability, flexibility and configuration of machine learning workflows.
by Abhishek Mathur | February 03, 2022 H2O AI Cloud , MLOps

now provides data scientists and machine learning (ML) engineers even more powerful features that give greater control, governance, and scalability within their machine learning workflow – all available on our H2O AI Cloud. Now, H2O MLOps enables you to: Deploy model explanations in production Explainability is core to understa...

Read more
Mission Impossible: Improving Patient Care Through Automated Document Processing
by Prashant Natarajan | February 03, 2022 H2O AI Cloud , H2O Document AI , Healthcare

Don’t tell Bob Rogers’ team something can’t be done. When Rogers embarked on an ambitious project to automate the processing of the more than 1.4 million electronically faxed documents received annually by the Center for Digital Health Innovation at the University of California, San Francisco (UCSF CDHI), advisors and vendors initially t...

Read more
An Introduction to Unsupervised Machine Learning
by Adam Murphy | January 31, 2022 Machine Learning , Technical

There are three major branches of machine learning (ML): supervised, unsupervised, and reinforcement. Supervised learning makes up the bulk of the models businesses use, and reinforcement learning is behind front-page-news-AI such as AlphaGo . We believe unsupervised learning is the unsung hero of the three, and in this article, we brea...

Read more
Revisiting the Miracle of Istanbul
by Team | January 25, 2022 Data Journalism , Sports

Introduction On May 25th, 2005, the UEFA Champions League final between AC Milan and Liverpool was held at the Atatürk Olympic Stadium in Istanbul. The match is still considered one of the greatest finals in football history. AC Milan took a 3-0 lead in the first half but Liverpool made a miraculous comeback in the second half to tie the g...

Read more
Install H2O Wave on AWS Lightsail or EC2

Note : this blog post was first published on Thomas’ personal blog Neural Market Trends . I recently had to set up H2O’s Wave Server on AWS Lightsail and build a simple Wave App as a Proof of Concept. If you’ve never heard of H2O Wave then you have been missing out on a new cool app development framework. We use it at H2O to build AI-ba...

Read more
What Are Feature Stores and Why Are They Important?
by Adam Murphy | January 18, 2022 H2O AI Cloud , H2O AI Feature Store , Product Updates

Machine learning (ML) models are only as good as the data fed into them. In tabular problems, the data is a collection of rows (samples) and columns (features). So, you could say that tabular ML models are only as good as the features fed into them. But how do you manage features? Can you share them across the company? Can you easily reu...

Read more
A Beginner’s View of H2O MLOps
by Jo-Fai Chow | January 15, 2022 Community , H2O AI Cloud , MLOps

Note : this is a community blog post by Shamil Dilshan Prematunga . It was first published on Medium. When we step into the AI application world it is not one easy step. It has a series of tasks that are combined. To convert an idea to the workable stage we must fulfill the requirements in each stage. When we look at existing platforms, t...

Read more
Shapley Values - A Gentle Introduction
by Adam Murphy | January 11, 2022 Data Science , Shapley , Technical

If you can’t explain it to a six-year-old, you don’t understand it yourself. – Albert Einstein One fear caused by machine learning (ML) models is that they are blackboxes that cannot be explained. Some are so complex that no one, not even domain experts, can understand why they make certain decisions. This is of particular concern when s...

Read more
The Bond Market & AI: How MarketAxess Brings it All Together
by Ian Gomez | January 11, 2022 Customers , Financial Services

The vast majority of the equities market trades electronically while the bond market is still in its infancy by comparison, but MarketAxess is seeking to change that. Recently, we hosted a virtual event with the MarketAxess team where they explained how they were solving challenges in the world’s largest bond marketplace while leveraging ...

Read more
H2O Release 3.36 (Zorn)
by Michal Kurka | January 07, 2022 AutoML , H2O Release , H2O-3

There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release are Distributed Uplift Random Forest, an algorithm typically used in marketing and medicine to model uplift, and Infogram, a new research direction in machine learning that focuses on interpretability and fairness in...

Read more
1st Place Winner's Blog - Kaggle 2021 Data Science and Machine Learning Survey
by Shivam Bansal, KunHao Yeh | January 04, 2022 Data Journalism , Data Science , Kaggle

Kaggle, the largest global community of data scientists, conducted the 5th annual industry-wide survey that presented a truly comprehensive view of the state of data science and machine learning. A total of 25,973 responses were collected from participants from over 60 countries. Kaggle also launched the Data Science Survey Challenge in w...

Read more
Why Companies Need to Think About MLOps
by Adam Murphy | December 14, 2021 H2O AI Cloud , MLOps

For years machine learning (ML) researchers have focused on building outstanding models and figuring out how to squeeze every last drop of performance from them. But many have realized that creating top-performing models doesn’t necessarily equate to having them deliver business value. Often the best models can be very complex and costly ...

Read more
An Introduction to Time Series Modeling: Traditional Time Series Models and Their Limitations
by Adam Murphy | December 03, 2021 H2O AI Cloud , Time Series

In the first article in this series, we broke down the preprocessing and feature engineering techniques needed to build high-performing time series models. But we didn’t discuss the models themselves. In this article, we will dig into this. As a quick refresher, time series data has time on the x-axis and the value you are measuring (dema...

Read more
Announcing the Fully Managed H2O AI Cloud
by Michelle Tanco | December 01, 2021 H2O AI Cloud

The H2O AI Cloud is the leading platform to make and access your own AI models and apps. Customers have had access to the H2O AI Hybrid Cloud for the last year, where they could manage the platform themselves on their favorite cloud or on-prem infrastructure. Today, we’re excited to announce a fully managed version of the H2O AI Cloud. Y...

Read more
Tools for a Beginner

Note : this is a community blog post by Shamil Dilshan Prematunga . It was first published on Medium. Hey, this is not a deep technical blog. I’d like to share the experience I had with H2O tools when I was studying Machine Learning. As a Research Engineer, I am currently working on an area based on Telecommunication. Day by day with my e...

Read more
Amazon Redshift Integration for Model Scoring
by Eric Gudgion | November 22, 2021 Data Science , H2O AI Cloud

We consistently work with our partners on innovative ways to use models in production here at, and we are excited to demonstrate our AWS Redshift integration for model scoring. Amazon Redshift is a very popular data warehouse on AWS. We wanted to expand on the existing capacities of using data from Redshift to train a model on the ...

Read more
Building Resilient Supply Chains with AI
by Adam Murphy | November 11, 2021 H2O AI Cloud

A global pandemic, a fundamental shift in the demand for goods and services worldwide, and the recent blockage of a major international trade route have all highlighted the need to build and maintain resilient supply chains. At the foundation of resilient supply chains lie accurate and reliable forecasts. The majority of traditional softwa...

Read more
Introducing the Wildfire Challenge
by Jo-Fai Chow | November 05, 2021 AI4Good , H2O AI Cloud , H2O Hydrogen Torch

We are excited to announce our first AI competition for good – Wildfire Challenge. We’ve structured this challenge to be a global collaborative effort to do good for the world that we share. We want teams to submit their ideas and applications freely, knowing that other teams will learn from what they’ve done to improve their AI ap...

Read more
MLB Player Digital Engagement Forecasting
by Jo-Fai Chow | October 29, 2021 Kaggle , Machine Learning

Are you a baseball fan? If so, you may notice that things are heating up right now as the Major League Baseball (MLB) World Series between Houston Astros and Atlanta Braves is tied at 1-1. (Figure: MLB Postseason 2021 Results as of October 28.) This also reminded me of the MLB Player Digital Engagement Forecasting competition in which my coll...

Read more
Announcing the H2O AI Feature Store
by Vinod Iyengar | October 28, 2021 H2O AI Cloud , Product Updates

We’re really excited to announce the H2O AI Feature Store – The only intelligent feature store in the market. We’ve been working on this for many months with our co-development partner: AT&T. This enabled us to build a first-of-its-kind platform that is designed to be enterprise-grade from day 1. It is built with best-of-breed techno...

Read more
An Introduction to Time Series Modeling: Time Series Preprocessing and Feature Engineering
by Adam Murphy | October 26, 2021 Time Series

Time is the only nonrenewable resource – Sri Ambati, Founder and CEO, Prediction is very difficult, especially if it’s about the future – Niels Bohr, Nobel Prize-Winning Physicist. Despite its inherent difficulty, every business needs to make predictions. You may want to forecast sales or estimate demand or gauge future inventory ...

Read more
New Features Now Available with the Latest Release of the H2O AI Cloud 21.10
by Team | October 18, 2021 H2O AI Cloud , H2O Release

The Makers here at have been busy building new features and enhancing capabilities across our AI platform. Designed to support our core mission of democratizing AI, these additions to our platform simplify the ability to make AI you can trust, operate it efficiently and innovate with ready-made AI applications. Launched in January ...

Read more
Time Series Forecasting Best Practices
by Jo-Fai Chow | October 15, 2021 H2O AI Cloud , Technical , Time Series

Earlier this year, my colleague Vishal Sharma gave a talk about time series forecasting best practices. The talk was well-received so we decided to turn it into a blog post. Below are some of the highlights from his talk. You can also follow the two software demos and try it yourself using our H2O AI Cloud. (Note : The video links with ...

Read more
Improving NLP Model Performance with Context-Aware Feature Extraction
by Jo-Fai Chow | October 08, 2021 H2O AI Cloud , NLP , Technical

I would like to share with you a simple yet very effective trick to improve feature engineering for text analytics. After reading this article, you will be able to follow the exact steps and try it yourself using our H2O AI Cloud. First of all, let’s have a look at the off-the-shelf natural language processing (NLP) recipes in H2O Driver...

Read more
Feature Transformation with the H2O AI Cloud
by Benjamin Cox | October 07, 2021 H2O AI Cloud

It is well known throughout the data science community that data preparation, pre-processing, and feature engineering are among the most cumbersome parts of the data science workload. So as we continue to innovate here at with our end-to-end automated machine learning (autoML) capabilities, we challenged ourselves to evolve the...

Read more
Introducing DatatableTon - Python Datatable Tutorials & Exercises
by Rohan Rao | September 20, 2021 Datatable , H2O-3 , Python , Tutorials

Datatable is a Python library for manipulating tabular data. It supports out-of-memory datasets, multi-threaded data processing and has a flexible API. If this reminds you of R’s data.table, you are spot on because Python’s datatable package is closely related to and inspired by the R library. The release of v1.0.0 was done on 1st July,...

Read more
H2O Release 3.34 (Zizler)
by Michal Kurka | September 15, 2021 H2O Release , H2O-3

There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve added Extended Isolation Forest for improved results on anomaly detection problems, and we’ve implemented the Type III SS test (ANOVAGLM) and the MAXR method to GLM. For existing algorithms, we improved the pe...

Read more
From the game of Go to Kaggle: The story of a Kaggle Grandmaster from Taiwan
by Parul Pandey | September 13, 2021 Kaggle , Makers

In conversation with Kunhao Yeh: A Data Scientist and Kaggle Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand...

Read more
Visualizing Large Datasets with H2O-3
by Parul Pandey | September 09, 2021 H2O-3 , Tutorials

Exploratory data analysis is one of the essential parts of any data processing pipeline. However, when the magnitude of data is high, these visualizations become vague. If we were to plot millions of data points, it would become impossible to discern individual data points from each other. The visualized output in such a case is pleasing ...

Read more
Innovation with the H2O AI Cloud
by Team | September 02, 2021 H2O AI Cloud

Consumer expectations for responsiveness, personalization, and overall efficiency have risen dramatically over the past several years as technology has become ubiquitous across both our personal and professional lives. These rapidly growing expectations demand an expansion in focus from simply solving narrow use cases with machine learnin...

Read more
Interning with Robie Gonzales
by Team | August 31, 2021 Community

This blog post is by Robie Gonzales, who has interned with us for the last 8 months. Thank you for your awesome work, Robie! When I started my internship eight months ago, I had minimal knowledge about machine learning and artificial intelligence. Over the course of these months, my experience as a Full Stack Developer has allowed me to ...

Read more
AI-Driven Predictive Maintenance with H2O AI Cloud
by Parul Pandey, Asghar Ghorbani | August 02, 2021 AutoML , H2O AI Cloud , Machine Learning Interpretability , Manufacturing

According to a study conducted by Wall Street Journal , unplanned downtime costs industrial manufacturers an estimated $50 billion annually. Forty-two percent of this unplanned downtime can be attributed to equipment failure alone. These downtimes can cause unnecessary delays and, as a result, affect the business. A better and superior al...

Read more
What are we buying today?
by Rohan Rao | July 05, 2021 Community , H2O Hydrogen Torch

Note : this is a guest blog post by Shrinidhi Narasimhan. It’s 2021 and recommendation engines are everywhere. Be it online shopping, food, music, and even online dating, the race to provide personalized recommendations to the user has many contenders. The technology of giving users what they need based on their buying strategies or digit...

Read more
The Emergence of Automated Machine Learning in Industry
by Parul Pandey | June 30, 2021 AutoML , Company

This post was originally published by K-Tech, Centre of Excellence for Data Science and AI, powered by NASSCOM. The link of the post can be found here. The concept of Automated Machine Learning has gained much traction recently. Automated Machine Le...

Read more
What does it take to win a Kaggle competition? Let's hear it from the winner himself.
by Parul Pandey | June 14, 2021 Data Science , Kaggle , Makers

In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster. In this interview, I shall be ...

Read more
H2O Integrates with Snowflake Snowpark/Java UDFs: How to better leverage the Snowflake Data Marketplace and deploy In-Database

One of the goals of machine learning is to find unknown predictive features, even hidden from subject matter experts, in datasets that might not be apparent before, and use those 3rd party features to increase the accuracy of the model. A traditional way of doing this was to try and scrape and scour distributed, stagnant data sources on th...

Read more
Getting the best out of’s academic program
by Ana Visneski, Jo-Fai Chow | May 19, 2021 Academic Program

“ provides impressively scalable implementations of many of the important machine learning tools in a user-friendly environment. Allowing for free academic use sets a generous example for commercial software developers — it is also the way forward in the era of open-source software.” – Professor Trevor J. Hastie, John A. Overdeck ...

Read more
Sign Up for Your Free Trial and Explore H2O AI Cloud
by Ana Visneski, Jo-Fai Chow | May 17, 2021 H2O AI Cloud

We recently launched our 14-day free trial of H2O AI Cloud, giving you the opportunity to get hands-on experience with our newest machine learning platform. H2O AI Cloud is an end-to-end artificial intelligence platform that enables organizations to quickly build, share, and use...

Read more
How Much is My Property Worth?

Note : this is a guest blog post by Jaafar Almusaad. How Much is My Property Worth? This is the million-dollar question – both figuratively and literally. Traditionally, qualified property valuers are tasked to answer this question. It’s a lengthy and costly process, but more critically, it’s inconsistent and largely subjective. Mind you, ...

Read more
Safer Sailing with Artificial Intelligence
by Ana Visneski, Jo-Fai Chow, Kim Montgomery | May 10, 2021

Last month, the world watched as responders tried to free a cargo ship that had run aground in the Suez Canal. The incident blocked traffic through a waterway that is essential for commerce. Although the location was unusual, ship collisions, collisions of ships with fixed objects, and...

Read more
What it takes to become a World No 1 on Kaggle
by Parul Pandey | May 03, 2021 Data Science , Kaggle , Machine Learning , Makers

In conversation with Guanshuo Xu: A Data Scientist, Kaggle Competitions Grandmaster, and a Ph.D. in Electrical Engineering. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at , who share their journey, inspirations, and accomplishments. The intention behind these interviews...

Read more
Unwrap Deep Neural Networks Using H2O Wave and Aletheia for Interpretability and Diagnostics

The use cases and the impact of machine learning can be observed clearly in almost every industry and in applications such as drug discovery and patient data analysis, fraud detection, customer engagement, and workflow optimization. The impact of leveraging AI is clear and understood by the business; however, AI systems are also seen as b...

Read more
Shapley summary plots: the latest addition to the’s Explainability arsenal

It is impossible to deploy successful AI models without taking into account or analyzing the risk element involved. Model overfitting, perpetuating historical human bias, and data drift are some of the concerns that need to be taken care of before putting the models into production. At, explainability is an integral part of our ML ...

Read more
achieves a strong position in completeness of vision in the Visionaries quadrant of the 2021 Gartner Magic Quadrant for Data Science and Machine Learning
by Read Maloney | April 11, 2021 Business , Community , Gartner , H2O AI Cloud

At, our mission is to democratize AI, and we believe that driving value from data is a team effort. Often, data engineers must organize and prepare the data, and then data scientists must build models. Once built, the models must be put into production, and IT and DevOps personnel must m...

Read more
Safer Sailing with AI
by Ana Visneski, Jo-Fai Chow, Kim Montgomery | April 01, 2021 Customers , Data Science , H2O Hydrogen Torch , H2O-3 , Machine Learning Interpretability

In the last week, the world watched as responders tried to free a cargo ship that had gone aground in the Suez Canal. This incident blocked traffic through a waterway that is critical for commerce. While the location was an unusual one, ship collisions, allisions , and groundings are not uncommon. With all the technology that mariners hav...

Read more
H2O AI Cloud: Democratizing AI for Every Person and Every Organization

Harnessing AI’s true potential by enabling every employee, customer, and citizen with sophisticated AI technology and easy-to-use AI applications. Democratization is an essential step in the development of AI, and AutoML technologies lie at the heart of it. AutoML tools have played a pivotal role in transforming the way we consume an...

Read more
is the furthest ahead for its ability to execute in the Visionaries quadrant of the 2021 Gartner report on Data Science and Machine Learning
by Read Maloney | March 16, 2021 Business , Community , Gartner , H2O AI Cloud

*This article was originally written in English by SVP of Marketing Read Maloney and translated into Portuguese by Bruna Smith. At, our mission is to democratize Artificial Intelligence, and we believe that the added value generated from data is a team effort. Data must be organized and prepared, usually ...

Read more
Placed Furthest in Completeness of Vision in 2021 Gartner Data Science and Machine Learning Magic Quadrant in the Visionaries Quadrant.
by Read Maloney | March 09, 2021 Business , Gartner , H2O Hydrogen Torch

At H2O.ai, our mission is to democratize AI, and we believe driving value from data is a team sport. Data needs to be organized and prepared, often by data engineers, and then models need to be built by data scientists. With models built, they need to be put into production and maintained by IT and DevOps personnel. Finally, these models...

Read more
Learning from others is imperative to success on Kaggle says this Turkish GrandMaster
by Parul Pandey | February 15, 2021 Makers

In conversation with Fatih Öztürk: A Data Scientist and a Kaggle Competition Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want...

Read more
H2O-3 Improvements from Two University Projects
by Veronika Maurerova | February 08, 2021 Academic Program , H2O-3

In September 2019, H2O.ai became a silver partner of the Faculty of Informatics at Czech Technical University in Prague. The main goal of this partnership is to make connections between students and companies to prepare an environment where students can use their knowledge in practice and gain real-work experiences. In general, within th...

Read more
Data to Production Ready Models to Business Apps in Just a Few Steps
by Shivam Bansal | February 05, 2021 H2O Hydrogen Torch , Solutions

Building a Credit Scoring Model and Business App using H2O. In the journey of a successful credit scoring implementation, multiple stakeholders and different personas are involved at different steps – Business Inputs, Dataset procurement, Data Analysis, Predictive Machine Learning, Data Storytelling, and Dashboarding. H2O.AI platforms such ...

Read more
Using Python's datatable library seamlessly on Kaggle
by Parul Pandey, Rohan Rao | February 03, 2021 Data Munging , Data Science , Datatable

Managing large datasets on Kaggle without fearing the out-of-memory error. Datatable is a Python package for manipulating large dataframes. It has been created to provide big data support and enable high performance. This toolkit resembles pandas very closely but is more focused on speed. It supports out-of-memory datasets, multi-thr...

Read more
Successful AI: Which Comes First, the Data or the Question?
by Ellen Friedman | February 02, 2021 Business , H2O Driverless AI

Successful AI is a business process. Even the most sophisticated models, the latest algorithms, and highly experienced AI experts cannot make AI a practical success unless it is connected to a meaningful business goal. To make that happen, you need a good interaction between those with knowledge of the business and the AI team. But ...

Read more
Introducing H2O AI Cloud
by Benjamin Cox, Jo-Fai Chow | January 26, 2021 Cloud , H2O AI Cloud , Kubernetes

Organizations have made large investments in modernizing their data infrastructure and operations, but most still struggle to drive maximum value from their data. Many companies experimented with building large teams of expert data scientists, and while this approach did produce some valuable models, the cost was high and the timeframes ...

Read more
Using AI to unearth the unconscious bias in job descriptions
by Parul Pandey, Shivam Bansal | January 19, 2021 H2O Hydrogen Torch , Responsible AI

“Diversity is the collective strength of any successful organization.” Unconscious Bias in Job Descriptions: Unconscious bias is a term that affects us all in one way or the other. It is defined as the prejudice or unsupported judgments in favor of or against one thing, person, or group as compared to another, in a way that is usually con...

Read more
H2O Driverless AI 1.9.1: Continuing to Push the Boundaries for Responsible AI
by Benjamin Cox | January 18, 2021 H2O Driverless AI , Responsible AI

At H2O.ai, we have been busy. Not only do we have our most significant new software launch coming up (details here), but we also are thrilled to announce the latest release of our flagship enterprise platform H2O Driverless AI 1.9.1. With that said, let’s jump into what is new: Faster Python scoring pipelines with embedded MOJOs for r...

Read more
Meet the Data Scientist who just cannot stop winning on Kaggle.
by Parul Pandey | January 15, 2021 Kaggle

In conversation with Philipp Singer: A Data Scientist, Kaggle Double Grandmaster, and a Ph.D. in Computer Science. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate an...

Read more
Speeds Credit Scoring for Fair Lending with
by Eve-Anne Trehin | January 12, 2021 Customers , Financial Services , Solutions

is a technological and innovative company developing a platform for leasing equipment for small and medium enterprises. As part of its business to provide a variety of credit options for companies that want to finance capital purchases, needs to rapidly and accurately assess the credit risk and scoring of a customer in o...

Read more
New Improvements in H2O
by Veronika Maurerova | December 17, 2020 H2O Release , XGBoost

There is a new minor release of H2O that introduces two useful improvements to our XGBoost integration: interaction constraints and feature interactions. Interaction Constraints: Feature interaction constraints allow users to decide which variables are allowed to interact and which are not. Potential benefits: Better predictive performance...

Read more
Introducing H2O Wave
by Jo-Fai Chow, Benjamin Cox | December 15, 2020 H2O Hydrogen Torch , H2O-3 , Product Updates , Python

For almost a decade, H2O.ai has worked to build open source and commercial products that are on the leading edge of innovation in machine learning, from AutoML to Explainable AI. We are thrilled to announce the release of what we believe to be the future of AI Applications: H2O Wave. Wave is an open source, lightweight Python developmen...

Read more
Grandmaster Series: The inspiring journey of the ‘Beluga’ of Kaggle World 🐋
by Parul Pandey | December 14, 2020 Kaggle , Machine Learning

In conversation with Gábor Fodor: A Data Scientist at H2O.ai and a Kaggle Competitions’ Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage othe...

Read more
Automate your Model Documentation using H2O AutoDoc
by Parul Pandey | November 19, 2020 Data Science , H2O Driverless AI

Create model documentation for Supervised learning models in H2O-3 and Scikit-Learn — in minutes. The Federal Reserve’s 2011 guidelines state that without adequate documentation, model risk assessment and management would be ineffective. A similar requirement is put forward today by many regulatory and corporate governance bodies. Thus ...

Read more
Myths and Truths about AutoML
by Alan Silva, Bruna Smith | November 10, 2020 AutoML , Beginners , Business , Community , Machine Learning

All the revolutions we have had so far, both technological and industrial, share one similarity: they are tied to the way human beings deal with machines. Previously, processes were carried out very manually and, over time, they underwent a natural evolution toward automation. With machine learning...

Read more
Maximizing your Value from AI
by Eve-Anne Trehin | November 09, 2020 Business , Community , Machine Learning , Partners

Some organizations have already identified the benefits that can be gained from Artificial Intelligence and Data Science, bringing in talented resources to enable them to build AI models and solutions. But more often than not, the business doesn’t understand the capabilities and huge potential of AI well enough, nor the investments that a...

Read more
AI in the Financial Industry: 8 Key Takeaways from the + Fireside Chat
by Bruna Smith | November 05, 2020 Community , Customers , Financial Services

The current global pandemic crisis presents various challenges to businesses in all industries, including financial services institutions, who are monitoring and dealing with the effects of COVID-19 across the world. At a time of a pandemic, it is important that teams get together to share their insights and experience, with the goal of i...

Read more
The Importance of Explainable AI

This blog post was written by Nick Patience, Co-Founder & Research Director, AI Applications & Platforms at 451 Research, a part of S&P Global Market Intelligence From its inception in the mid-twentieth century, AI technology has come a long way. What was once purely the topic of science fiction and academic discussion is now...

Read more
Building an AI Aware Organization

Responsible AI is paramount when we think about models that impact humans, either directly or indirectly. All the models that are making decisions about people, be that about creditworthiness, insurance claims, HR functions, and even self-driving cars, have a huge impact on humans. We recently hosted James Orton, Parul Pandey, and Sudala...

Read more
H2O on Kubernetes using Helm
by Team | October 16, 2020 H2O-3 , Kubernetes , Technical

Deploying real-world applications using bare YAML files to Kubernetes is a rather complex task, and H2O is no exception, as demonstrated in one of the previous blog posts. Greatly simplified, a cluster of H2O open source machine learning nodes is brought up in the following manner: A headless service to make initial node discovery and ...

Read more
Making AI a Reality

This blog post focuses on the content discussed in more depth in the free ebook “Practical Advice for Making AI Part of Your Company’s Future”. Do you want to make AI a part of your company? You can’t just mandate AI. But you can lead by example. All too often, especially in companies new to AI and machine learning, team leaders may be ta...

Read more
Combining the power of KNIME and H2O.ai in a single integrated workflow
by Rafael Coss, Stefan Pacinda | October 14, 2020 AutoML , Community , H2O Driverless AI , Partners , Technical , Tutorials

KNIME and H2O.ai, the two data science pioneers known for their open source platforms, have partnered to further democratize AI. Our approaches are about being open, transparent, and pushing the leading edge of AI. We believe strongly that AI is not for the select few but for everyone. We are taking another step in democratizing AI by ...

Read more
The Challenges and Benefits of AutoML
by Eve-Anne Trehin | October 14, 2020 AutoML , H2O Driverless AI , Machine Learning , Responsible AI

Machine Learning and Artificial Intelligence have revolutionized how organizations are utilizing their data. AutoML or Automatic Machine Learning automates and improves the end-to-end data science process. This includes everything from cleaning the data, engineering features, tuning the model, explaining the model, and deploying it into p...

Read more
H2O Release 3.32 (Zermelo)
by Michal Kurka | October 14, 2020 H2O Release , H2O-3

There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve added RuleFit — an interpretable machine learning algorithm, introduced a new toolbox for model explainability, made Target Encoding work for all classes of problems, and integrated it in our AutoML framewor...

Read more
5 Key Elements to Detecting Fraud Quicker With AI
by Ashrith Barthur | October 13, 2020 Financial Services , Fraud Detection , H2O Driverless AI

The number of transactions using electronic financial instruments has been increasing by about 23% year over year. The global COVID-19 pandemic has only accelerated that process. Electronic means have become the primary vehicle of how people purchase their goods. With this sudden increase in transactions, fraud detection systems are stres...

Read more
Empowering Snowflake Users with AI using SQL
by Vinod Iyengar, Yves Laurent | October 12, 2020 Community , Machine Learning , Partners , Technical , Tutorials

At H2O.ai, we work with many enterprise customers, all the way from Fortune 500 giants to small startups. What we heard from all these customers as they embark on their data science and machine learning journey is the need to capture and manage more data cost-effectively, and the ability to share that data across their organization to mak...

Read more
3 Ways to Ensure Responsible AI Tools are Effective

Since we began our journey making tools for explainable AI (XAI) in late 2016, we’ve learned many lessons, and often the hard way. Through headlines, we’ve seen others grapple with the difficulties of deploying AI systems too. Whether it’s: a healthcare resource allocation system that likely discriminated against millions of black peop...

Read more
Accelerating AI Transformation in Healthcare

The healthcare industry is evolving rapidly with volumes of data and increasing challenges. Early adopters of AI and machine learning in the healthcare space have embraced new data-driven initiatives and are reaping the benefits not only in terms of patient care but also in their own operations. Hospitals, physicians, and laboratories can...

Read more
5 Key Considerations for Machine Learning in Fair Lending

This month, we hosted a virtual panel with industry leaders and explainable AI experts from Discover, BLDS, and H2O.ai to discuss the considerations in using machine learning to expand access to credit fairly and transparently and the challenges of governance and regulatory compliance. The event was moderated by Sri Ambati, Founder and CE...

Read more
The Benefits of Budget Allocation with AI-driven Marketing Mix Models
by Michael Proksch | September 17, 2020 AutoML , Business , Customers , GBM , GLM , Machine Learning , Solutions

Excerpt of the white paper: “The Latest in AI Technologies Reinvent Media and Marketing Analytics @ Allergan”. Authors: Akhil Sood, Associate Director @ Marketing Sciences, Allergan; Dr. Michael Proksch, Senior Director @ H2O.ai; Vijay Raghavan, Associate Vice President @ Marketing Sciences, Allergan. Introduction: The call for accountability in...

Read more
My Experience at the World’s Best AI Company
by Jo-Fai Chow | September 15, 2020 Makers

Blog post by Spencer Loggia When H2O announced that remote work would continue through the summer due to Covid-19, I was a little disappointed. I expected that it would be difficult to connect with others as a new employee, especially as an intern. My internship now comes to an end, and I realize how completely wrong I was. I’ve met and w...

Read more
What it is like to intern at H2O.ai
by Jo-Fai Chow | September 15, 2020 Makers

Blog post by Jasmine Parekh Let’s be honest, 2020 is not going to go down as a glory year in history, unless something absolutely miraculous happens in the next few months. Generations of highschoolers down the line will sit in history class learning about the pandemic that halted the world. In the face of the virus, everyone around the w...

Read more
Demystifying Artificial Intelligence and Its Role in Business Success
by Bruna Smith | September 14, 2020 Business , Makers

Artificial Intelligence is a widely used term these days, but does everyone know what it means in practice and how to benefit from this innovative technology? Like every buzzword, AI also generates many myths. Among them is the belief that machine learning will replace the work of data scientists...

Read more
NLP Models with BERT
by Badr Chentouf | September 02, 2020 H2O Driverless AI , NLP

H2O Driverless AI 1.9 has just been released, and I am offering a series of articles on the latest innovative features of this Automated Machine Learning solution, starting with the implementation of BERT for NLP tasks. BERT, or “Bidirectional Encoder Representations from Transformers”, is considered today to be the state...

Read more
Exploring the Next Frontier of Automatic Machine Learning with H2O Driverless AI
by Jo-Fai Chow | July 28, 2020 AutoML , H2O Driverless AI

At H2O.ai, it is our goal to democratize AI by bridging the gap between the State-of-the-Art (SOTA) in machine learning and a user-friendly, enterprise-ready platform. We have been working tirelessly to bring the SOTA from Kaggle competitions to our enterprise platform Driverless AI since its very first release. The growing list of Driver...

Read more
In a World Where… AI is an Everyday Part of Business

Imagine a dramatically deep voice-over saying “In a world where…” This phrase from old movie trailers conjures up all sorts of futuristic settings, from an alien “world where the sun burns cold”, a Mad Max “world without gas” to a cyborg “world of the not too distant future”. Often the epic science fiction or futuristic stories also have a...

Read more
Running Sparkling Water in Kubernetes
by Jakub Hava | July 10, 2020

Sparkling Water can now be executed inside the Kubernetes cluster. Sparkling Water provides a Beta version of Kubernetes support in the form of nightlies. Both Kubernetes deployment modes, cluster and client, are supported. Both Sparkling Water backends and all clients are also ready to be tested. Sparkling Water in Kubernetes is ...

Read more
From GLM to GBM – Part 2

How an Economics Nobel Prize could revolutionize insurance and lending. Part 2: The Business Value of a Better Model. Introduction: In Part 1, we proposed better revenue and managing regulatory requirements with machine learning (ML). We made the first part of the argument by showing how gradient boosting machines (GBM), a type of ML, can mat...

Read more
Artificial Intelligence Is Transforming and Boosting Businesses. Understand How and Why
by Daniel Garbuglio | June 26, 2020

Did you know that artificial intelligence and machine learning are not new concepts? They first emerged in 1956 at Dartmouth, in the United States, but have been changing and evolving significantly over time. Today, the amount of data a company has available for analysis is enormous, and its growth is...

Read more
On-Ramp to AI
by Rafael Coss | June 11, 2020

The path to democratize AI starts with one class. Artificial Intelligence (AI) is like a superhighway: it’s moving fast, evolving, and growing quickly. Like most things in life, data scientists are not born with AI and Machine Learning (ML) knowledge. They learn it. Learning is a journey. At H2O.ai, we are on a mission to democratize AI...

Read more
From GLM to GBM - Part 1

How an Economics Nobel Prize could revolutionize insurance and lending. Part 1: A New Solution to an Old Problem. Introduction: Insurance and credit lending are highly regulated industries that have relied heavily on mathematical modeling for decades. In order to provide explainable results for their models, data scientists and statisticians i...

Read more
Sparkling Water is out
by Jakub Hava | June 04, 2020 H2O-3 , Sparkling Water

Sparkling Water is about making machine learning simple, speedy, and scalable with Apache Spark. This blog provides an overview of the following new features: No H2O Client on Spark Driver; Speedups; Automatic String conversion to Categoricals. No H2O Client on Spark Driver: Previously, Sparkling Water always started worker nodes eith...

Read more
Are All Your AI and ML Models Wrong?
by James Orton | May 05, 2020 Machine Learning , Makers

We are living in unprecedented times. Our society and economy are experiencing shocks beyond anything we have seen in living history. Beyond the human cost, there is a data science and machine learning elephant in the room (hopefully 2 meters away): Are your predictive models still doing the job you expect them to do? The challenge here i...

Read more
Lessons of COVID-19 and Moving Forward: Key Takeaways
by Ingrid Burton | May 01, 2020 AI4Good , Community , Company , Data Science

This week, we hosted our second virtual panel focused on how AI can empower healthcare organizations to make better decisions and save lives. Improved forecasting and predictions lead to higher chances in managing and mitigating adverse events, such as the COVID-19 pandemic. I’m proud to acknowledge that H2O.ai is committed to helping cus...

Read more
Running H2O cluster on a Kubernetes cluster
by Team | April 14, 2020 H2O-3 , Kubernetes

H2O is an open-source, in-memory platform for distributed, scalable machine learning. A perfect match for deployment on a Kubernetes cluster, the very modern way of deploying, serving & scaling applications. With the major release, released in Q1 2020, H2O obtained first-class Kubernetes support. This article explains how t...

Read more
H2O Release 3.30 (Zahradnik)
by Michal Kurka | April 07, 2020 H2O Release

There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve introduced support for Generalized Additive Models, added an option to build many models in parallel on segments of your dataset, improved support for deploying on Kubernetes, upgraded XGBoost with newly added...

Read more
Brief Perspective on Key Terms and Ideas in Responsible AI

Introduction: As fields like explainable AI and ethical AI have continued to develop in academia and industry, we have seen a litany of new methodologies that can be applied to improve our ability to trust and understand our machine learning and deep learning models. As a result of this, we’ve seen several buzzwords emerge. In this short po...

Read more
Three Ways Data and AI is Helping Against COVID19
by Niki Athanasiadou | April 01, 2020 AI4Good , Data Science , Healthcare , Machine Learning

We are in the midst of a global crisis that epidemiologists have warned us about. As of today, 180 countries and sovereign regions have confirmed cases of patients infected with COVID19 (from here). Putting aside evidence that indicates the virulence of the disease could be much worse, the fast spread of the virus and the presence of hi...

Read more
Modelling Currently Infected Cases of COVID-19 Using H2O Driverless AI
by Marios Michailidis | March 30, 2020 Healthcare

In response to the wake of the pandemic called COVID-19, H2O.ai organized a panel discussion to cover AI in healthcare, and some best practices to put in place in order to achieve better outcomes. The attendees had many questions that we did not have the time to cover thoroughly throughout the course of that 1-hour discussion. We hope ...

Read more
Deploying Models to Maximise the Impact of Machine Learning — Part 1
by Stefan Pacinda, John Spooner | March 29, 2020 H2O Driverless AI , Machine Learning , ModelOps

Introduction to the 4 key pillars of considerations for model deployment (1st part of a blog series). So you have built a machine learning (ML) model which delivers a high level of accuracy and does not overfit. What value does it have now? Well, at the moment, nothing, zero, diddly squat. There is no economic value in a machine learning mo...

Read more
Igniting the AI in Healthcare Community
by David Engler | March 28, 2020 AI4Good , Community , Data Science , Healthcare

Yesterday we held our first Community Discussion on AI in Healthcare. Our CEO and founder, Sri Ambati, led the discussion between Niki Athanasiadou, Marios Michailidis, one of our Grandmasters, and myself. We had nearly 1,300 participants registered from over 45 countries, and over half of those joined live; others are viewing the replay. ...

Read more
COVID-19: Doing Good with Data + AI
by David Engler, Marios Michailidis | March 26, 2020 AI4Good , Data Science , Healthcare , Machine Learning , Time Series

During times of severe societal strain, individuals have historically shown an inclination to offer aid and assistance. Often these sacrifices have been at great cost to life or livelihood. In other cases, the efforts have been seemingly more mundane but nevertheless still essential. The efforts of the over 10,000 women code breakers of W...

Read more
Take Your Pega CRM on the Road to AI Transformation
by Yves Laurent | March 24, 2020 Business , Cloud , H2O Driverless AI , Solutions

How well does your company know its customers and prospects? Are your people empowered with relevant information when they interact with clients? What guides your employees at every step of the customer journey? Every successful company depends on how well it can address each of these questions. Investments in Customer Relationship Manage...

Read more
How H2O.ai is Reinventing Healthcare with AI
by Parul Pandey | March 23, 2020 AI4Good , Data Science , Healthcare

H2O.ai is hosting a virtual Meetup on AI and Healthcare: Best Practices for Better Outcomes. Join us on 26th March, for a community discussion to collaborate with us and leading healthcare organizations to share ideas and best practices including predicting hospital staffing needs, ICU transfers, as well as sepsis detection and more. Reg...

Read more
Summary of a Responsible Machine Learning Workflow

A paper resulting from a collaboration between H2O.AI and BLDS, LLC was recently published in a special “Machine Learning with Python” issue of the journal, Information. In “A Responsible Machine Learning Workflow with Focus on Interpretable Models, Post-hoc Explanation, and Discrimination Testing...

Read more
It is a privilege to serve the world in its hour of need – response to the COVID-19 pandemic

During the COVID-19 pandemic, our world, our nations, states, counties, cities and communities face an unprecedented challenge with an urgent need to help our citizens and ultimately our national and global economy. At highest risk are senior citizens, at-risk populations (individuals with immunodeficiency, hypertension, diabetes) and our...

Read more
Health Outcomes and the Miracle of Data
by David Engler | March 16, 2020 AI4Good , Healthcare , Machine Learning

In 1846, a physician named Ignatz Semmelweis, located at the Allgemeine Krankenhaus in Vienna, faced a dire healthcare crisis. He observed that the maternity ward in his own hospital (as well as those in other area hospitals) had a maternal mortality rate of over 15%. That is, one out of every six mothers who came to his hospital to give ...

Read more
Detecting Money Laundering Networks Using H2O Driverless AI
by Parul Pandey, Ashrith Barthur, Sandip Sharma | March 05, 2020 Anti-Money Laundering , Data Science , Financial Services , H2O Driverless AI

Note: Dr. Ashrith Barthur (Principal Security Scientist, H2O.ai) and Sandip Sharma (Director of Solution Engineering, H2O.ai) will be speaking about solving money laundering and other real-world problems using machine learning at our upcoming webinar. You can grab a spot here. Artificial Intelligence has evolved from being a buzz word t...

Read more
A Letter to the Makers at H2O.ai
by Sri Ambati | March 05, 2020 Makers

To Team: All, many of you have already seen this alert from me in different variations over the last few weeks. Some of you are already remote and following some of these precautions. Starting today please make all meetings default to virtual or remote. Use Zoom, Webex, Slack, FaceTime, WhatsApp and WeChat to keep in touch with your teammates...

Read more
Insights From the New 2020 Gartner Magic Quadrant For Cloud AI Developer Services

We are excited to be named a Visionary in the new Gartner Magic Quadrant for Cloud AI Developer Services (Feb 2020), and have been recognized for both our completeness of vision and ability to execute in the emerging market for cloud-hosted artificial intelligence (AI) services for application developers. This is the second Gartner MQ tha...

Read more
AI & ML Platforms: My Fresh Look at Technology

2020: A new year, a new decade, and with that, I’m taking a new and deeper look at the technology offers for building AI and machine learning systems. I’ve been interested in H2O.ai since its early days as a company (it was 0xdata back then) in 2014. My involvement had been only peripheral, but now I’ve begun to work with this comp...

Read more
Interview with Patrick Hall | Machine Learning, H2O.ai & Machine Learning Interpretability

In this episode of Chai Time Data Science, Sanyam Bhutani interviews Patrick Hall, Sr. Director of Product at H2O.ai. Patrick has a background in Math and has completed a MS Course in Analytics. In this interview they talk all about Patrick’s journey into ML, ML Interpretability and his journey at H2O.ai, how his work has ev...

Read more
Key Takeaways from the 2020 Gartner Magic Quadrant for Data Science and Machine Learning

We are named a Visionary in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms (Feb 2020). We have been positioned furthest to the right for completeness of vision among all the vendors evaluated in the quadrant. So let’s walk you through the key strengths of our machine learning platforms. Automatic Machine Learn...

Read more
Blink: Data to AI/ML Production Pipeline Code in Just a Few Clicks
by Karthik Guruswamy | February 11, 2020 H2O Driverless AI , Machine Learning , Python , Technical

You have the data and now want to build a really really good AI/ML model and deliver to production. There are three options available today: Write the code yourself in a Jupyter notebook/R Studio etc., for training/validation and dev-ops model handoff. You decided to do the feature engineering also. Build your own features like above,...

Read more
Speed up your Data Analysis with Python’s Datatable package
by Parul Pandey | February 05, 2020 Data Munging , Data Science , Datatable , H2O Driverless AI

A while ago, I did a write up on Python’s Datatable library. The article was an overview of the datatable package whose focus is on big data support and high performance. The article also compared datatable’s performance with the pandas library on certain parameters. This is the second article in the series with a two-fold objective: ...

Read more
Parallel Grid Search in H2O

H2O-3 is, at its core, a platform for distributed, in-memory computing. On top of the distributed computation platform, the machine learning algorithms are implemented. At H2O.ai, we design every operation, be it data transformation, training of machine learning models or even parsing to utilize the distributed computation model. In ord...

Read more
The Super Bowl and Data Science: Changing the NFL with the Power of Machine Learning
by Rafael Coss | January 31, 2020 Data Science , H2O-3 , Kaggle , Machine Learning

Super Bowl LIV came and went. The San Francisco 49ers vs the Kansas City Chiefs. Personally, being from The Bay, I was rooting for the 49ers, but you can’t always get what you want. Whoever came out on top, though, we were all looking forward to a great game full of fantastic plays and the kind of gridiron tenacity where players lay i...

Read more
Grandmaster Series: How a Passion for Numbers Turned This Mechanical Engineer into a Kaggle Grandmaster

In conversation with Sudalai Rajkumar: A Kaggle Double Grandmaster and a Data Scientist at H2O.ai. It is rightly said that one should never seek praise. Instead, let the effort speak for itself. One of the essential traits of successful people is to never brag about their success but instead keep learning along the way. In the data science ...

Read more
How H2O propels data scientists ahead of itself: enhancing Driverless AI models with advanced options, recipes and visualizations
by Gregory Kanevsky | January 06, 2020

H2O.ai engineers continually innovate and introduce new techniques by adopting the latest research, working on cutting edge use cases, and participating in and winning machine learning competitions like Kaggle. But thanks to the explosion of AI research and applications even the most advanced automated machine learning platform like H2O Drive...

Read more
H2O Release 3.28 (Yu)
by Michal Kurka | December 20, 2019 H2O Release

There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve introduced support for Hierarchical GLM, added an option to parallelize Grid Search, upgraded XGBoost with newly added features, and improved our AutoML framework. The release is named after Bin Yu. Hierarchi...

Read more
Why you should care about debugging machine learning models
by Team | December 12, 2019 Explainable AI , Machine Learning

This blog post was originally published here. Authors: Patrick Hall and Andrew Burt For all the excitement about machine learning (ML), there are serious impediments to its widespread adoption. Not least is the broadening realization that ML models can fail. And that’s why model debugging, the art and science of understanding and fixing p...

Read more
Interview with Arno Candel | AutoML | Physics | CTDS.Show
by Sanyam Bhutani | December 12, 2019 Community , Company , Data Science

In this episode, Sanyam Bhutani interviews Dr. Arno Candel, CTO at H2O.ai. They talk about Arno’s journey into the field, with comments and insights from Arno that apply across the field. They also discuss ML and Automated Machine Learning, broadly speaking. Arno’s journey from Physics to Software Engineering to Machine L...

Read more
How to Effectively Employ an AI Strategy in your Business
by Parul Pandey | December 11, 2019 Beginners , Business , Machine Learning

Artificial Intelligence has evolved from being a buzz word to a reality today. Companies with expertise in machine learning systems are looking to graduate to Artificial Intelligence-based technologies. The enterprises that do not yet have a machine learning culture are trying to devise a strategy to put one in place. Amidst t...

Read more
Scalable AutoML in H2O
by Sanyam Bhutani | November 27, 2019 AutoML , H2O World , Machine Learning , Technical

Note: I’m grateful to Dr. Erin LeDell for the suggestions and corrections to the writeup. All of the images used here are from the talks’ slides. Erin LeDell’s talk covered AutoML (Automated Machine Learning), broadly speaking, followed by an overview of H2O’s Open Source Project and the library. H2O AutoML provides an easy-to-use ...

Read more
Meet Yauhen Babakhin: The first and the only Kaggle Grandmaster from Belarus
by Parul Pandey | November 22, 2019 Makers

There is more to competitive Data Science than simply applying algorithms to get the best possible model. The main takeaway from participating in these competitions is that they provide an excellent opportunity for learning and skill-building. The learnings can then be utilized in one’s academic or professional life. Kaggle is one of th...

Read more
Climbing the AI and ML Maturity Model Curve
by Karthik Guruswamy | November 19, 2019 Data Science , Machine Learning , Technical

AI/ML Maturity Model Curve/Steps AI/ML Maturity models are published and updated periodically by a lot of vendors. The end goal is almost always about effecting transformation and automating processes in a short period and making AI the DNA/core of the business. One of the biggest challenges for businesses today is to clearly define what succ...

Read more
How to write a Transformer Recipe for Driverless AI
by Ashrith Barthur | November 18, 2019 H2O Driverless AI , Machine Learning , Recipes

What is a transformer recipe? A transformer (or feature) recipe is a collection of programmatic steps, the same steps that a data scientist would code to build a column transformation. The recipe makes it possible to engineer the transformer in training and in production. The transformer recipe, and recipes in general, provide a...

Read more
Novel Ways To Use Driverless AI

I am biased when I write that Driverless AI is amazing, but what’s more amazing is how I see customers using it. As a Sales Engineer, my job has been to help our customers and prospects use our flagship product. In return, they give us valuable feedback and talk about how they used it. Feedback is gold to us. Driverless AI has evolved in...

Read more
Image Tasks on H2O Driverless AI
by Sanyam Bhutani | November 12, 2019 H2O Driverless AI , H2O World , Makers

I’d like to thank Grandmaster Yauhen Babakhin for reviewing the drafts and the very useful corrections & suggestions. Link to the video. Introduction In this talk, Kaggle Grandmaster and Data Scientist at H2O.ai Yauhen Babakhin shows us a few prototype demos of how Driverless AI’s upcoming release will work with Image Data and the relat...

Read more
Accelerate Machine Learning workflows with Driverless AI on Red Hat OpenShift, Enterprise Kubernetes Platform
by Nicholas Png | November 12, 2019 H2O Driverless AI , Kubernetes

Organizations globally are operationalizing containers and Kubernetes to accelerate Machine Learning lifecycles as these technologies provide data scientists and software developers with much needed agility, flexibility, portability, and scalability to train, test, and deploy ML models in production. Red Hat OpenShift is the industry’s mo...

Read more
Importing, Inspecting, and Scoring With MOJO Models Inside H2O
by Team | November 08, 2019 H2O-3 , Technical

Machine-learning models created with H2O may be exported in two basic ways: binary format and Model Object, Optimized (MOJO). An H2O model can be saved in a binary format, which is tied to the specific version of H2O it was created with. There are multiple reasons for such a restriction. One of the important reasons is that...

Read more
Natural Language Processing in H2O’s Driverless AI
by Sanyam Bhutani | November 06, 2019 Community , H2O Driverless AI , H2O World , Makers , NLP

Note: I’d like to thank Grandmaster SRK for a lot of suggestions and corrections with the writeup. Note: All images used here are from the talk. Link to the slides. Link to the video. Note 2: All of the discussion here is related to NLP. Driverless AI also supports other domains that are covered in other talks and posts (releasing soon). Driv...

Read more
Highlights of H2O World New York 2019
by Team | November 02, 2019 Community , H2O World , Makers

H2O World New York happened a few days ago and we are still in awe of the conference. It is rewarding to see such a strong community and recognized industry professionals making meaningful connections and learning with each other. We are grateful for having so many makers and customers joining us – in person and via live stream – for a fu...

Read more
Takeaways from the World’s largest Kaggle Grandmaster Panel

Disclaimer: We were made aware by Kaggle of adversarial actions by one of the members of this panel. This panelist is no longer a Kaggle Grandmaster and no longer affiliated with H2O.ai as of January 10th, 2020. Personally, I’m a firm believer and fan of Kaggle and definitely look at it as the home of Data Science. ...

Read more
A Full-Time ML Role, 1 Million Blog Views, 10k Podcast Downloads: A Community Taught ML Engineer
by Sanyam Bhutani | October 17, 2019 Data Science , Machine Learning Interpretability , Makers

Content originally posted in HackerNoon and Towards Data Science. The 15th of October, 2019 marks a special milestone, actually quite a few milestones. So I considered sharing it in the form of a blog post, on a publication that has been home to all of my posts. The online community has been too kind to me and these blog posts have been a method ...

Read more
The Data Scientist who rules the "Data Science for Good" competitions on Kaggle
by Parul Pandey | October 17, 2019 Makers

In conversation with Shivam Bansal: A Data Scientist, a Kaggle Kernels Grandmaster, and three-time winner of Kaggle’s Data Science for Good Competition. Communication is an art and a useful tool in the Data Science domain. Being able to communicate the insights is necessary so that others can take the required actions based on the resu...

Read more
A Deep Dive into H2O’s AutoML
by Parul Pandey | October 16, 2019 AutoML , H2O-3 , Technical

The demand for machine learning systems has soared over the past few years. This is largely due to the success of Machine Learning techniques in a wide range of applications. AutoML is fundamentally changing the face of ML-based solutions today by enabling people from diverse backgrounds to use machine learning models to address complex ...

Read more
Make your own AI — Add Your Game to Auto-ML Models
by Karthik Guruswamy | October 15, 2019 AutoML , H2O Driverless AI , Machine Learning , Technical

When Features and Algorithms compete, your Business Use Case(s) wins! H2O Driverless AI is an Automatic Feature Engineering/Machine Learning platform to build AI/ML models on tabular data. Driverless AI can build supervised learning models for Time Series forecasts, Regression, Classification, etc. It supports a myriad of built-i...

Read more
H2O World New York: The Countdown is On!
by Team | October 14, 2019 Community , Company , Events , H2O World , Makers

Every H2O World is magical. The preparation for the conference starts many months in advance and we put a lot of effort and love into every single detail to provide our beloved community with the best experience possible. Our upcoming H2O World New York on October 22 is the third edition I have worked on as part of the marketing team at H2O.ai. My...

Read more
5 Key Takeaways On Overcoming Gender and Diversity Barriers
by Team | October 04, 2019

Overcoming gender and diversity barriers in the workplace is a challenge for many industries. Therefore, listening to women and discussing the topic is the first step towards finding out how to address gender bias and possible inequalities. Last month, H2O.ai organized a panel in New York: Breaking gender and diversity barriers in machi...

Read more
Predicting Failures from Sensor Data using AI/ML — Part 2
by Karthik Guruswamy | September 27, 2019 H2O Driverless AI , Recipes , Technical

This is Part 2 of the blog post series and continuation of the original post, Predicting Failures from Sensor Data using AI/ML — Part 1. Missing Values & Data Imbalance One of the things to note is that the hard-disk data set has a lot of missing values across its columns. Check out the Missing Data Heat Map on the training data set — ...

Read more
H2O Driverless AI: The Workbench for Data Science

This blog was written by Rohan Gupta and originally published here. 1. Introduction In today’s world, being a Data Scientist is not limited to those with technical knowledge. While it is recommended and sometimes important to know a little bit of code, you can get by with just intuitive knowledge. Especially if you’re on H2O’s Driverle...

Read more
H2O Driverless AI Acceleration with Intel DAAL
by Rafael Coss | September 25, 2019 Data Science , H2O Driverless AI , Machine Learning

This week at Strata NY 2019 we will be demoing a custom recipe that incorporates the Intel Data Analytics Acceleration Library (DAAL) algorithm into Driverless AI. This blog will provide an introduction to Intel DAAL and how the Make-Your-Own-Recipe capability extends H2O Driverless AI. If you are at Strata NY 2019, stop by the Intel bo...

Read more
Custom recipes for Driverless AI: Prophet and pmdarima cases
by Marios Michailidis | September 24, 2019 H2O Driverless AI , Recipes , Technical

Last updated: 09/23/19 H2O Driverless AI provides a great new feature called “custom recipes”. These recipes are essentially custom snippets of code which can incorporate any machine learning algorithm, any scorer/metric and any feature transformer. A user can create custom recipes using Python utilizing any external library or his/her o...

Read more
From Academia to Kaggle and How a Physicist found love in Data Science
by Parul Pandey | September 16, 2019 H2O Driverless AI , Machine Learning , Makers

Learning and taking inspiration from others is always helpful. It makes even more sense in the Data Science realm, which is continuously being bombarded with new courses, MOOCs, and recommendations with every passing day. So many choices can be not only overwhelming but also perplexing at times. With this thought in mind, we bring...

Read more
Regression Metrics' Guide
by Marios Michailidis | September 09, 2019

Introduction As part of my role within the automated machine learning space with H2O.AI and Driverless AI, I have seen that many times people struggle to find the right optimization metric for their data science problems. This process is even more challenging in regression problems where the errors are often not bounded like you norma...

Read more
Series ‘D’emocratize
by Thomas Ott | September 07, 2019 Community , H2O Driverless AI , Makers

Last month was very emotional for me and I suspect it was the same for many of my fellow Makers at H2O.ai. The news broke that H2O.ai raised its Series D funding of $72.5 million led by Goldman Sachs and Ping An. While some of my friends were ecstatic for me, I felt like a big weight had been lifted off me. The best word to describe what ...

Read more
Driverless AI can help you choose what you consume next
by Parul Pandey | September 06, 2019

Last updated: 09/06/19 Steve Jobs once said, “A lot of times, people don’t know what they want until you show it to them”. This makes sense, especially in this era of constant choice overload. Consumers today have access to a plethora of products just at the click of their mouse. These innumerable choices can sometimes turn out to be ...

Read more
Startup Aims to Democratize AI
by Ingrid Burton | September 05, 2019 Community , Company , Events , Guest Posts , Makers

Adam Janofsky at the Wall Street Journal wrote a wonderful article about our company and our eloquent and philosophical CEO and Founder, Sri Ambati. The makers at H2O.ai believe deeply in our mission to democratize AI for everyone, and we can see a future where every company can be an AI company. Read more below, and enjoy! Startup Aims ...

Read more
Predicting Failures from Sensor Data using AI/ML— Part 1
by Karthik Guruswamy | August 26, 2019 H2O Driverless AI , Machine Learning , Technical

Last updated: 08/26/19 Whether it’s healthcare, manufacturing, or anything else we depend on, personally or in business, prevention of a problem is always better than cure! Classic prevention techniques involve time-based checks to see how things are progressing, positively or negatively. Time-based chec...

Read more
New Innovations in Driverless AI

What’s new in Driverless AI We’re super excited to announce the latest release of H2O Driverless AI. This is a major release with a ton of new features and functionality. Let’s quickly dig into all of that: Make Your Own AI with Recipes for Every Use Case: In the last year, Driverless AI introduced time-series and NLP recipes to meet the...

Read more
Interns Gonna Make
by Team | August 16, 2019 Community

Blog post by Megan Chan. When I first walked through the front doors of the Mountain View office, I have to admit, thoughts of robots, cyborgs, and Arnold Schwarzenegger as The Terminator were in the back of my mind. However, my initial preconceived notions were quickly put to rest. I am a third-year college intern studying Psycholog...

Read more
A Maker Data Scientist’s journey: from Sudoku to Kaggle
by Parul Pandey | August 16, 2019 H2O Driverless AI , Machine Learning , Makers

“If you put enough smart people together in one space, good things happen.” (Erik Hersman) One of the perks of being a part of H2O.ai is that you get to work with some of the brightest minds on the planet. Here you get to closely engage with people who have a great deal of experience, as well as expertise. One such set of specialists here ar...

Read more
My Summer Internship at H2O.ai
by Priya Jain | August 10, 2019 Community

I can’t believe the summer is nearing an end. What an amazing experience I have had at H2O.ai. As I reflect back, I am so fortunate to have learned so much, formed meaningful relationships, developed people skills and applied my creativity. The whole team has been so encouraging, supportive, and inviting throughout my internship, makin...

Read more
Detecting Sarcasm is difficult, but AI may have an answer
by Parul Pandey | August 05, 2019 H2O Driverless AI , NLP , Recipes , Technical , Tutorials

Recently, while shopping for a laptop bag, I stumbled upon a pretty amusing customer review: “This is the best laptop bag ever. It is so good that within two months of use, it is worthy of being used as a grocery bag.” The innate sarcasm in the review is evident as the user isn’t happy with the quality of the bag. However, as the sentence...

Read more
Mitigating Bias in AI/ML Models with Disparate Impact Analysis

Everyone understands that the biggest plus of using AI/ML models is a better automation of day-to-day business decisions, personalized customer service, enhanced user experience, waste elimination, better ROI, etc. The common question that comes up often though is — How can we be sure that the AI/ML decisions are free from bias/discrimina...

Read more
H2O Release 3.26 (Yau)
by Michal Kurka | July 30, 2019 H2O Release

There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve introduced the ability to define a Custom Loss Function in our GBM implementation, and we’ve extended the portfolio of our machine learning algorithms with the implementation of the SVM algorithm. The release...

Read more
A Driverless Approach to Make Forecasting Easy — Part 1
by Karthik Guruswamy | July 25, 2019 H2O Driverless AI

You are from the supply chain department or in a role in charge of creating future estimates on Product Sales, Patient admission, Retail Store Staffing, Energy use, Ticket sales, etc., based on historical data. A common problem is to forecast numbers one week, 4 weeks, 6 months or 1–5 years, etc., into the future — basically short term &...

Read more
Custom Machine Learning Recipes: The ingredients for success

Last updated: 07/23/19 Machine learning is akin to cooking in several ways. A perfect dish originates from a tried-and-tested recipe, has the right combination of ingredients, and is baked at just the right temperature. Successful AI solutions work on the same principle. One needs fresh and right quality ingredients in the form of data, ...

Read more
AI for Smarter Manufacturing
by Vinod Iyengar | July 19, 2019 H2O Driverless AI , Manufacturing , Solutions

Manufacturing is a centuries-old industry and has seen significant changes dating back to the first Industrial Revolution in the late 18th century. The use of conveyor belt assembly lines to replace assembly workers, newer precision robot technologies to further reduce manufacturing time, advances in ERP, historian databases, stora...

Read more
Leads to Leases

There is such a large amount of unstructured data being produced by companies. I personally find it so interesting that there is so much meaning and hidden value in text, audio, and visual content. Until recently, much of this data would go unused. However, since the rise of machine learning and artificial intelligence, it became possibl...

Read more
Getting started with H2O using Flow
by Parul Pandey | July 16, 2019 Flow , H2O-3 , Technical

This blog was originally published on towardsdatascience: look into H2O’s open-source UI for combining code execution, text, plots, and rich media in a single document. Data collection is easy. Decision making is hard. Today, we have access to a humungous...

Read more
ArmadaHealth Uses AI to Match Patients with Specialists to Improve Health Outcomes
by Priya Jain | July 09, 2019 Customers , Data Science , Healthcare

As an intern for H2O.ai, I am amazed to see how instrumental AI has been in transforming people’s lives for the better. Especially in healthcare, AI is bringing increased efficiency and ease, and helping people lead healthier lives. In this blog, I learned about how AI is helping potential patients find the right specialist for their needs a...

Read more
Toward AutoML for Regulated Industry with H2O Driverless AI

Predictive models in financial services must comply with a complex regime of regulations including the Equal Credit Opportunity Act (ECOA), the Fair Credit Reporting Act (FCRA), and the Federal Reserve’s S.R. 11-7 Guidance on Model Risk Management. Among many other requirements, these and other applicable regulations stipulate predictive ...

Read more
Transforms Credit Risk Decision-Making Using AI

Determining credit has been done by traditional techniques for decades. The challenge with traditional credit underwriting is that it doesn’t take into account all of the various aspects or features of an individual’s credit ability., a new credit startup, saw this as an opportunity to apply machine learning and AI to impro...

Read more
The Reproductive Science Center of SF Bay Area uses AI to Treat Infertility

Having your own baby may be a dream that many people have but some cannot realize until they seek specialized help. The Reproductive Science Center of SF Bay Area is one of the pioneer organizations conducting in-vitro fertilization. They strive to produce healthy babies for their patients. However, every patient has their own set of obst...

Read more
Machine Learning on VMware: Training a Model with Tools, Inference using a REST Server and Kubernetes
by Vinod Iyengar | June 10, 2019 Community

This blog was originally posted by Justin Murray of VMware and can be accessed here. In this article, we explore the tools and process for (1) training a machine learning model on a given dataset using the H2O Driverless AI (DAI) tool, and (2) deploying a trained model, as part of a scoring pipeline, to a REST server for use by busi...

Read more
An Overview of Python’s Datatable package

This blog originally appeared on “There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days”: Eric Schmidt If you are an R user, chances are that you have already been using the data.ta...

Read more
Building an Interpretable & Deployable Propensity AI/ML Model in 7 Steps…

To start with, you may have a tabular data set with a combination of: Dates/Timestamps, Categorical Values, Text strings, Numeric Values. A business sponsor wants to build a Propensity to Buy model from historical data. How many Steps does it take? Let’s find out. We are going to use H2O’s Driverless AI instance with 1 GPU (optional...

Read more
Forrester Research recognizes H2O.ai as a leader in the New Automatic Machine Learning Wave
by Rafael Coss | May 28, 2019 Community , Customers , H2O Driverless AI

Today, The Forrester New Wave™: Automation-Focused Machine Learning Solutions, Q2 2019 was published by Forrester Research. We are thrilled that this leading analyst firm recognized us as a clear leader in their Automatic Machine Learning evaluation. We could not be prouder of our unwavering strategy and hard work that we believe is prop...

Read more
Automatic Machine Learning on Red Hat OpenShift Container Platform Delivers Data Science Ease and Flexibility at Scale
by Vinod Iyengar | May 14, 2019 Cloud , Data Science , Demos , H2O Driverless AI

Last week at Red Hat Summit in Boston, Sri Ambati, CEO and Founder, demonstrated how to use our award-winning automatic machine learning platform, H2O Driverless AI, on Red Hat OpenShift Container Platform. You can watch the replay here. What we showed not only helps data scientists achieve results, it also enables them to scale their ...

Read more
6 Tips to Having it All
by Ingrid Burton | May 12, 2019 Community , Events

I posted this blog on Medium two years ago, thought I’d share a slight rework of it with all the Mothers and Makers out there again. It’s Mother’s Day, and today is when I count my blessings. I am the mother of a wonderful blended family. I have four children of my own, and three stepchildren. Do the math… that’s 7! They are all great you...

Read more
AI/ML Projects — Don’t get stymied in the last mile

Data Scientists build AI/ML models from data, and then deploy them to production, in addition to a plethora of tasks around data insights, data cleansing, etc. Part of the Data Scientist job description/requirement is making models available for transparency, auditability, and explainability for both regulators and internal bu...

Read more
Hortifrut uses AI to Determine the Freshness of Blueberries

Who doesn’t love sweet, delicious blueberries? Providing a steady supply of beautiful, tasty berries to the market is no small effort and Hortifrut, based in Chile, has been growing and distributing berries for the last 30 years. Today, they are using AI to provide fresh berries to the world every day. Hortifrut, the largest global producer ...

Read more
Can Your Machine Learning Model Be Hacked?!

I recently published a longer piece on security vulnerabilities and potential defenses for machine learning models. Here’s a synopsis. Introduction Today it seems like there are about five major varieties of attacks against machine learning (ML) models and some general concerns and solutions of which to be aware. I’ll address them one-by-o...

Read more
H2O Driverless AI Updates
by Venkatesh Yadav | April 25, 2019 H2O Driverless AI , Product Updates

We are excited to announce the new release of H2O Driverless AI with lots of improved features. Below are some of the exciting new features we have added: Version 1.6.1 LTS (April 18, 2019) – Available here. Several improvements for MLI (partial dependence plots, Shapley values). Improved documentation for model deployment, time-series ...

Read more
H2O World Explainable Machine Learning Discussions Recap

Earlier this year, in the lead up to and during H2O World, I was lucky enough to moderate discussions around applications of explainable machine learning (ML) with industry-leading practitioners and thinkers. This post contains links to these discussions, written answers and pertinent resources for some of the most common questions asked ...

Read more
H2O-3, Sparkling Water and Enterprise Steam Updates
by Venkatesh Yadav | April 10, 2019 Community , Data Science , H2O Release , Technical

We are excited to announce the new release of H2O Core, Sparkling Water and Enterprise Steam. Below are some of the new features we have added: H2O-3 Yates ( – 3/31/2019 Download at: Bug [PUBDEV-6159] – The test suite now runs correctly on a local mach...

Read more
H2O Release 3.24 (Yates)
by Michal Kurka | April 02, 2019 H2O Release

There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve introduced cross-version support for model import, added new features for model interpretation, provided much-improved support for reading data from Apache Hive, and included various algorithm and AutoML impr...

Read more
Building AI/ML models on Lending Club Data, with H2O.ai — Part 1
by Karthik Guruswamy, Vinod Iyengar | March 28, 2019 Beginners , Community , Data Journalism , Data Science , Technical , Tutorials

Lending Club publishes its basic loan databases to the public and a full version to its customers — anonymized of course. You can find the download page from this link (screenshot below): The publicly downloadable loan data has various attributes — roughly 150+ columns that have categorical, numeric, text and date fields. It also has a ‘...

Read more
AI/ML Model Scoring - What Good Looks Like in Production
by Karthik Guruswamy | March 10, 2019 H2O Driverless AI , Machine Learning , Technical

One of the main reasons why we build AI/Machine Learning models is for them to be used in production to support expert decision making. Whether your business is deciding what creatives your customers should be getting on emails or determining a product recommendation for a web page, AI/Models provide relevance/context to customers to drive ...

Read more
Machine Learning with H2O – the Benefits of VMware
by Vinod Iyengar | March 06, 2019 Cloud , Community , H2O Driverless AI

This blog was originally posted by Justin Murray of VMware and can be accessed here. This brief article introduces a short 4.5 minute video that explains the reasons why VMware vSphere is a great platform for data scientists/engineers to use as their base operating platform. The video then demonstrates an example of this, showing a data...

Read more
How to explain a model with H2O Driverless AI

The ability to explain and trust the outcome of an AI-driven business decision is now a crucial aspect of the data science journey. There are many tools in the marketplace that claim to provide transparency and interpretability around machine learning models but how does one actually explain a model? H2O Driverless AI provides robust inte...

Read more
Boosting your ROI with AutoML & Automatic Feature Engineering
by Karthik Guruswamy | February 25, 2019 AutoML , Machine Learning

If your business has started using AI/ML tools or just started to think about it, this blog is for you. Whether you are a data scientist, VP of data science or a line-of-business owner, you are probably wondering how AI will impact your organization in various ways or why your current strategies are not working somehow. If you are not ...

Read more
What is Your AI Thinking? Part 3

In the past two posts we’ve learned a little about interpretable machine learning in general. In this post, we will focus on how to accomplish interpretable machine learning using H2O Driverless AI. To review, the past two posts discussed: Exploratory data analysis (EDA), Accurate and interpretable models, Global explanations, Local...

Read more
8 Tips to Make AI Happen Without Getting Fired
by Ingrid Burton | February 15, 2019 H2O World

“AI is the fastest growing workload on the planet,” Mike Gualtieri of Forrester Research. Last week, during H2O World San Francisco, we had the privilege to hear featured speaker Mike Gualtieri from Forrester Research offer tips on how to make AI happen without getting fired. This knowledge, he explained, was acquired by talking to enterp...

Read more
The Journey of Pi and AI: An AI conference with heart
by Thomas Ott | February 08, 2019 H2O World , Makers

I was in San Francisco this (past) week as part of H2O World 2019. I flew in the week before and took a red-eye flight back home right after the conference on Tuesday night. Like any technology conference, this one had fantastic presentations, training, and product roadmap presentations. We even live streamed it if you couldn’t be there i...

Read more
Key Takeaways from the Gartner Magic Quadrant For Data Science & Machine Learning

The Gartner Magic Quadrant for Data Science and Machine Learning Platforms (Jan 2019) is out and H2O.ai has been named a Visionary. The Gartner MQ evaluates platforms that enable expert data scientists, citizen data scientists and application developers to create, deploy and manage their own advanced analytic Key Highlights...

Read more
What is Your AI Thinking? Part 2

Explaining AI to the Business Person Welcome to part 2 of our blog series: What is Your AI Thinking? We will explore some of the most promising testing methods for enhancing trust in AI and machine learning models and systems. We will also cover the best practice of model documentation from a business and regulatory standpoint. More Techniq...

Read more
H2O New Year releases
by Team | January 18, 2019 H2O Release , H2O-3 , Python , R

There were two releases shortly after each other. First, on December 21st, there was a minor (fix) release. It was immediately followed by a more major release (but still on the 3.22 branch), codename Xu, named after mathematician Jinchao Xu, whose work is focused on deep neural networks, besides many other fields of research. Of course, th...

Read more
What is Your AI Thinking? Part 1

Explaining AI to the Business Person Explainable AI is in the news, and for good reason. Financial services companies have cited the ability to explain AI-based decisions as one of the critical roadblocks to further adoption of AI for their industry. Moreover, interpretability, fairness, and transparency of data-driven decision support sy...

Read more
Finally, You Can Plot H2O Decision Trees in R
by Gregory Kanevsky | January 15, 2019

Creating and plotting decision trees (like one below) for the models created in H2O will be the main objective of this post: Figure 1. Decision Tree Visualization in R Decision Trees with H2O With release H2O-3 (a.k.a. open source H2O or simply H2O) added to its family of tree-based algorithms (which already included DR...

Read more
What Business Leaders Need to Know About AI
by Ingrid Burton | January 11, 2019 Beginners , Community , Data Journalism , Data Science

The interest around artificial intelligence (AI) is at an all-time fevered pitch right now, and it’s important to understand why. AI can solve real business problems and address very complex situations. Organizations and business leaders should start with the idea of how AI can help by identifying a business problem or use case that they c...

Read more
Celebrating our community and wins!
by Team | January 11, 2019 Community , Machine Learning , Makers

The last year was an amazing year at H2O.ai. We organized two H2O Worlds, gathering thousands of attendees in person and online both in New York and London. Throughout the year, we garnered multiple industry awards and honors for AI and machine learning, but our customers received awards as well for the work they are doing with our techn...

Read more
Finding Clarity in the Automated Modeling Space
by Thomas Ott | December 12, 2018 H2O Driverless AI

There is an arms race happening in the Data Science and Machine Learning space. It’s the race toward automation. Granted, the questions we as Data Scientists are asked to solve for will never be automated, but many of the routine tasks will be. What are these routine tasks? They range from data ingestion to feature generation. Then we have l...

Read more
For Today’s BI Analyst - Accelerating your AI/ML efforts with Driverless AI
by Karthik Guruswamy | December 10, 2018 Data Science , H2O Driverless AI

Whether you are starting out as a novice data scientist or a veteran in AI and Machine Learning, modern tools can guide you in creating some of the best models from your data. Not to mention the ease of moving models to production. Also, don’t forget the experienced BI Analysts in your organization, who want to play with data science, only t...

Read more
The Making of H2O Driverless AI - Automatic Machine Learning
by Arno Candel | December 05, 2018 Community , H2O Driverless AI , H2O World , H2O4GPU , Makers

It is my pleasure to share with you some never before exposed nuggets and insights from the making of H2O Driverless AI, our latest automatic machine learning product on our mission to democratize AI. This has been truly a team effort, and I couldn’t be more proud of our brilliant makers who continue to relentlessly create and innovate. T...

Read more
Gratitude and thank you, makers!
by Sri Ambati | November 21, 2018 Community , Makers

Makers, Happy Thanksgiving – Hope you get to spend time with your loved ones this week. Thank them on our behalf, on your own, thank our neighbors, thank our teachers, thank our firemen, doctors, our farmers, our uber/lyft drivers, our engineers, our assistants, painters, news writers, bartenders, our chefs and a million others who play the...

Read more
New features in H2O 3.22
by Erin LeDell, Michal Kurka | November 12, 2018 H2O Release

Xia Release (H2O 3.22). There’s a new major release of H2O and it’s packed with new features and fixes! Among the big new features in this release, we introduce Isolation Forest to our portfolio of machine learning algorithms and integrate the XGBoost algorithm into our AutoML framework. The release is named after Zhihong Xia. Isolation ...

Read more
Top 5 things you should know about H2O World London
by Team | November 06, 2018 Community , Events , H2O World

We had a blast at H2O World London last week! With a record number of attendees on-site and through the live stream, it’s clear that our AI and machine learning conference was indeed a huge success and we strongly believe this achievement is a result of dedicated preparation and great love – for and from – our community and makers. So, fi...

Read more
Anomaly Detection with Isolation Forests using H2O
by Martin Barus | November 06, 2018 Data Science , H2O-3

Introduction. Anomaly detection is a common data science problem where the goal is to identify odd or suspicious observations, events, or items in our data that might be indicative of some issues in our data collection process (such as broken sensors, typos in collected forms, etc.) or unexpected events like security breaches, server failu...

Read more
Welcome's Driverless AI Community!
by Team | October 30, 2018 Beginners , Community , H2O Driverless AI , H2O-3

I am very excited to announce the formation of the inaugural community for H2O Driverless AI users. The Driverless AI Community is open for anyone looking to engage with other users as well as experts from’s Driverless AI, Driverless AI is an award-winning automatic machine learning platform that does “AI to do AI” to solve re...

Read more
Launching the Academic Program … OR ... What Made My First Four Weeks at H2O.ai so Special!

We just launched the Academic Program at our sold-out H2O World London. With nearly 1000 people in attendance, we received the first online sign-up forms submitted by professors and students alike. This program will massively democratize AI in academia, increasing the number of AI-skilled graduates – with both technical and busine...

Read more
How This AI Tool Breathes New Life Into Data Science

Ask any data scientist in your workplace. Any Data Science Supervised Learning ML/AI project will go through many steps and iterations before it can be put in production. Starting with the question of “Are we solving for a regression or classification problem?” Data Collection & Curation Are there Outliers? What is the Distribu...

Read more
What does NVIDIA’s Rapids platform mean for the Data Science community?

Today NVIDIA announced the launch of the RAPIDS suite of software libraries to enable GPU acceleration for data science workflows, and we’re excited to partner with NVIDIA to bring GPU-accelerated open source technology to the machine learning and AI community. “Machine learning is transforming businesses and NVIDIA GPUs are speeding...

Read more
Automatic Feature Engineering for Text Analytics - The Latest Addition to Our Kaggle Grandmasters' Recipes
by Jo-Fai Chow, Sudalai Rajkumar | September 12, 2018 Data Science , GPU , H2O Driverless AI , NLP

According to Kaggle’s ‘The State of Machine Learning and Data Science’ survey, text data is the second most used data type at work for data scientists. There are a lot of interesting text analytics applications like sentiment prediction, product categorization, document classification and so on. In the latest version (1.3) of our Driver...

Read more
Key Takeaways from the Forrester Notebook Wave
by Vinod Iyengar | September 07, 2018

The Forrester Wave: Notebook-Based Predictive Analytics and Machine Learning Solutions, Q3 2018 is out, and H2O.ai is a Strong Performer! The report looks at machine learning platforms centered on R and Python languages using notebooks like Jupyter and Zeppelin. Vendors are evaluated along three dimensions including market presence, curre...

Read more
H2O for Inexperienced Users
by Team | August 24, 2018 Beginners , Data Science , H2O-3 , Machine Learning

Some background: I am a rising senior in high school, and in the summer of 2018, I interned at H2O.ai. With no ML experience beyond Andrew Ng’s Introduction to Machine Learning course on Coursera and a couple of his deep learning courses, I initially found myself slightly overwhelmed by the variety of new algorithms H2O has to offer in both ...

Read more
Interpretability: The missing link between machine learning, healthcare, and the FDA?

Recent advances enable practitioners to break open machine learning’s “black box”. From machine learning algorithms guiding analytical tests in drug manufacture, to predictive models recommending courses of treatment, to sophisticated software that can read images better than doctors, machine learning has promised a new world of healthcar...

Read more
The different flavors of AutoML
by Erin LeDell | August 15, 2018 AutoML , Data Science , H2O Driverless AI , H2O-3

In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software (e.g. H2O, scikit-learn, keras). Although these tools have made it easy to train and evaluate ma...

Read more
H2O’s AutoML in Spark
by Jakub Hava | July 23, 2018 AutoML , Sparkling Water , Technical , Tutorials

This blog post demonstrates how H2O’s powerful automatic machine learning can be used together with Spark in Sparkling Water. We show the benefits of Spark & H2O integration, use Spark for data munging tasks and H2O for the modelling phase, where all these steps are wrapped inside a Spark Pipeline. The integration between Spark and...

Read more
H2O-3 on FfDL: Bringing deep learning and machine learning closer together
by Vinod Iyengar | June 25, 2018 Community , Deep Learning , H2O-3 , Technical

This post originally appeared in the IBM Developer blog here. This post is co-authored by Animesh Singh, Nicholas Png, Tommy Li, and Vinod Iyengar. Deep learning frameworks like TensorFlow, PyTorch, Caffe, MXNet, and Chainer have reduced the effort and skills needed to train and use deep learning models. But for AI developers and data ...

Read more
How to Frame Your Business Problem for Automatic Machine Learning

Over the last several years, machine learning has become an integral part of many organizations’ decision-making at various levels. With not enough data scientists to fill the increasing demand for data-driven business processes, H2O.ai has developed a product called Driverless AI that automates several time-consuming aspects of a typica...

Read more
Time is Money! Automate Your Time-Series Forecasts with Driverless AI
by Jo-Fai Chow | June 12, 2018 H2O Driverless AI

Time-series forecasting is one of the most common and important tasks in business analytics. There are many real-world applications like sales, weather, stock market, energy demand, just to name a few. We strongly believe that automation can help our users deliver business value in a timely manner. Therefore, once again we translated our ...

Read more
H2O.ai and IBM build a Strategic Partnership to bring AI innovation to the market together
by Sri Ambati | June 07, 2018

Excited to announce our strategic partnership with IBM that allows them to resell and take to market H2O Driverless AI to businesses worldwide. This partnership makes AI economical – faster, cheaper and easier to do experiments. H2O Driverless AI and IBM POWER9 GPU Systems are bringing together the best of breed AI innovation. We have b...

Read more
AI in Healthcare - Redefining Patient & Physician Experiences
by Team | May 14, 2018 Community , Data Science , Deep Learning

Register for the Meetup Here Patients, physicians, nurses, health administrators and policymakers are beneficiaries of the rapid transformations in health and life sciences. These transformations are being driven by new discoveries (etiology, therapies, and drugs/implants), market reconfiguration and consolidation, a movement to value-bas...

Read more
From Kaggle Grand Masters’ Recipes to Production Ready in a Few Clicks
by Jo-Fai Chow | May 09, 2018 H2O Driverless AI , Tutorials

Introducing Accelerated Automatic Pipelines in H2O Driverless AI. At H2O, we work really hard to make machine learning fast, accurate, and accessible to everyone. With H2O Driverless AI, users can leverage years of world-class, Kaggle Grand Masters experience and our GPU-accelerated algorithms (H2O4GPU) to produce top quality predictive ...

Read more
H2O World coming to NYC
by Team | May 08, 2018 Community

Whether you’re just starting out learning how machine learning and H2O.ai can supercharge your business or a veteran looking for more, we want to invite you to join some of the greatest minds in the field to learn how AI and H2O.ai can transform your business. Our flagship event, H2O World, is back and it’s going to be bigger than ever! We’re ...

Read more
Democratize care with AI — AI to do AI for Healthcare
by Team | April 23, 2018 Customers , Healthcare , Machine Learning

Very excited to have Prashant Natarajan (@natarpr) join us along with Sanjay Joshi on our vision to change the world of healthcare with AI. Health is wealth. And one worth saving the most. They bring invaluable domain knowledge and context to our cause. As one of our customers would like to say, Healthcare should be optimized for health...

Read more
Sparkling Water 2.3.0 is now available!
by Team | April 12, 2018 Sparkling Water

Hi Makers! We are happy to announce that Sparkling Water now fully supports Spark 2.3 and is available from our download page . If you are using an older version of Spark, that’s no problem. Even though we suggest upgrading to the latest version possible, we keep the Sparkling Water releases for Spark 2.2 and 2.1 up-to-date with the lates...

Read more
H2O + Kubeflow/Kubernetes How-To
by Team | March 29, 2018 H2O-3

Today, we are introducing a walkthrough on how to deploy H2O 3 on Kubeflow. Kubeflow is an open source project led by Google that sits on top of the Kubernetes engine. It is designed to alleviate some of the more tedious tasks associated with machine learning. Kubeflow helps orchestrate deployment of apps through the full cycle of devel...

Read more
Makers in Action: Community, Partners and Team Members at #GTC18
by Team | March 28, 2018 Events

NVIDIA’s GPU Technology Conference (GTC) has been incredible! Folks from all over the world are exploring the latest breakthroughs in self-driving cars, smart cities, healthcare, high performance computing, virtual reality, and more, all propelled by the AI movement. If you’re attending GTC and would like to see our solutions in action (r...

Read more
H2O4GPU now available in R
by Team | March 27, 2018 GPU , R

In September, H2O.ai released a new open source software project for GPU machine learning called H2O4GPU. The initial release (blog post here) included a Python module with a scikit-learn compatible API, which allows it to be used as a drop-in replacement for scikit-learn with support for GPUs on selected (and ever-growing) algorithms. ...

Read more
Come meet the Makers!
by Team | March 26, 2018 Data Science , Events , H2O Driverless AI , H2O4GPU

NVIDIA’s GPU Technology Conference (GTC) Silicon Valley, March 26-29th is the premier AI and deep learning event, providing you with training, insights, and direct access to the industry’s best and brightest. It’s where you will see the latest breakthroughs in self-driving cars, smart cities, healthcare, high-performance computing, virtu...

Read more
How Driverless AI Prevents Overfitting and Leakage
by Team | March 23, 2018 H2O Driverless AI

By Marios Michailidis, Competitive Data Scientist, H2O.ai. In this post, I’ll provide an overview of overfitting, k-fold cross-validation, and leakage. I’ll also explain how Driverless AI avoids overfitting and leakage. An Introduction to Overfitting: A common pitfall that causes machine learning models to fail when tested in a real-world e...

Read more
Sparkling Water 2.2.10 is now available!
by Team | March 22, 2018 AutoML , Sparkling Water

Hi Makers! There are several new features in the latest Sparkling Water. The major new addition is that we now publish Sparkling Water documentation as a website which is available here . This link is for Spark 2.2. We have also documented and fixed a few issues with LDAP on Sparkling Water. Exact steps are provided in the documentation...

Read more
Congratulations - H2O is a leader in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms
by Team | February 25, 2018 Community , Customers , Gartner , H2O-3

Congratulations – Thanks to the support of our customer community over the past years, H2O.ai is a leader and one with the most completeness of vision in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms. It is an ecosystem we dedicated a good part of this decade to open up and spring. This is testimony to the incr...

Read more
New features in H2O 3.18
by Team | February 22, 2018 AutoML , Ensembles , H2O Release , XGBoost

Wolpert Release (H2O 3.18). There’s a new major release of H2O and it’s packed with new features and fixes! We named this release after David Wolpert, who is famous for inventing Stacking (aka Stacked Ensembles). Stacking is a central component in H2O AutoML, so we’re very grateful for his contributions to machine learning! He is also fa...

Read more
Developing and Operationalizing Models with Azure
by Team | January 24, 2018 Machine Learning

This post originally appeared here. It was authored by Daisy Deng, Software Engineer, and Abhinav Mithal, Senior Engineering Manager, at Microsoft. The focus on machine learning and artificial intelligence has soared over the past few years, even as fast, scalable and reliable ML and AI solutions are increasingly viewed as being vital to...

Read more
Happy Holidays from H2O.ai
by Team | December 31, 2017 Deep Learning , Machine Learning

Dear Community, Your intelligence, support and love have been the strength behind an incredible year of growth, product innovation, partnerships, investments and customer wins for H2O and AI in 2017. Thank you for answering our rallying call to democratize AI with our maker culture. Our mission to make AI ubiquitous is still fresh as da...

Read more
It’s all Water (or should I say H2O) to me!
by Team | December 24, 2017 H2O World

By Krishna Visvanathan, Co-founder & Partner, Crane Venture Partners In the career of any venture capitalist, one dreads the “oh shit moment” . For those unfamiliar with this most technical of terms – it is that moment of clarity when a VC, in the immediate aftermath of closing one’s latest investment (often at the first post invest...

Read more
H2O4GPU Hands-On Lab (Video) + Updates
by Team | December 23, 2017 GPU , H2O4GPU

Aggregator DBSCAN Kalman Filters K-nearest neighbors Quantiles Sort If you’d like to learn more about H2O4GPU, I invite you to explore these helpful links: H2O4GPU README Open Source License (Apache 2.0) Happy Holidays! Rosalie ...

Read more
Driverless AI - Introduction, Hands-On Lab and Updates
by Team | December 15, 2017 H2O Driverless AI

#H2OWorld was an incredible experience. Thank you to everyone who joined us! There were so many fascinating conversations and interesting presentations. I’d love to invite you to enjoy the presentations by visiting our YouTube channel . Over the next few weeks, we’ll be highlighting many of the talks. Today I’m excited to share two prese...

Read more
New versions of H2O-3 and Sparkling Water available
by Team | December 02, 2017 H2O Release , Sparkling Water

Dear H2O Community, #H2OWorld is on Monday and we can’t wait to see you there! We’ll also be live streaming the event starting at 9:25am PST. Explore the agenda here . Today we’re excited to share that new versions of H2O-3 and Sparkling Water are available. We invite you to download them here: H2O-3.16 – MO...

Read more
H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise
by Team | November 30, 2017 Data Science , Machine Learning


Read more
Laying a Strong Foundation for Data Science Work
by Team | November 24, 2017 Data Science , IT

By William Merchan, CSO, In the past few years, data science has become the cornerstone of enterprise companies’ efforts to understand how to deliver better customer experiences. Even so, when commissioned Forrester to survey over 200 data-driven businesses last year, only 22% reported they were leverag...

Read more
H2O.ai Releases H2O4GPU, the Fastest Collection of GPU Algorithms on the Market, to Expedite Machine Learning in Python
by Team | September 26, 2017 GBM , GLM , GPU , k-Means

H2O4GPU is an open-source collection of GPU solvers created by H2O.ai. It builds on the easy-to-use scikit-learn Python API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn with support for GPUs on selected (and ever-growing) algorithms. H2O4GPU inherits all the existing scikit-learn algor...

Read more
Driverless AI Blog
by Team | July 13, 2017 AutoML , GPU , H2O Driverless AI

In today’s market, there aren’t enough data scientists to satisfy the growing demand for people in the field. With many companies moving towards automating processes across their businesses (everything from HR to Marketing), companies are forced to compete for the best data science talent to meet their needs. A report by McKinsey says th...

Read more
Scalable Automatic Machine Learning: Introducing H2O's AutoML
by Team | June 21, 2017 AutoML , Ensembles , H2O Release , Technical

Prepared by: Erin LeDell, Navdeep Gill & Ray Peck In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts and experts...

Read more
XGBoost in the H2O Machine Learning Platform
by Team | June 20, 2017 XGBoost

The new H2O release brings a shiny new feature – integration of the powerful XGBoost library algorithm into H2O Machine Learning Platform! XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that ...

Read more
H2O Platform Extensibility
by Team | June 17, 2017

The latest H2O release introduced several new concepts to improve extensibility and modularity of the H2O machine learning platform. This blog post will clarify motivation, explain design decisions we made, and demonstrate the overall approach for this release. Motivation: The H2O Machine Learning platform was designed as a mono...

Read more
Machine Learning on GPUs
by Team | May 08, 2017 GPU

With H2O GPU Edition, H2O.ai seeks to build the fastest artificial intelligence (AI) platform on GPUs. While deep learning has recently taken advantage of the tremendous performance boost provided by GPUs, many machine learning algorithms can benefit from the efficient fine-grained parallelism and high throughput of GPUs. Importantly, G...

Read more
The Race for Intelligence: How AI is Eating Hardware - Towards an AI-defined hardware world
by Team | May 08, 2017 GPU

With the AI arms race reaching a fever pitch, every data-driven company is (or at least should be) evaluating its approach to AI as a means to make their owned datasets as powerful as they can possibly be. In fact, any business that’s not currently thinking about how AI can transform its operations risks falling behind its competitors and...

Read more
H2O announces GPU Open Analytics Initiative with MapD & Continuum
by Team | May 08, 2017 Community , GPU , Technical

H2O.ai, Continuum Analytics, and MapD Technologies have announced the formation of the GPU Open Analytics Initiative (GOAI) to create common data frameworks enabling developers and statistical researchers to accelerate data science on GPUs. GOAI will foster the development of a data science ecosystem on GPUs by allowing resident applicat...

Read more
Use H2O.ai on Azure HDInsight
by Team | April 18, 2017 Cloud , Sparkling Water , Technical , Tutorials

This is a repost from this article on MSDN. We’re hosting an upcoming webinar to present you how to use H2O on HDInsight and to answer your questions. Sign up for our upcoming webinar on combining H2O and Azure HDInsight. We recently announced that H2O and Microsoft Azure HDInsight have integrated to provide Data Scientists with a Lead...

Read more
Sparkling Water on the Spark-Notebook
by Team | April 10, 2017 Guest Posts , Sparkling Water , Technical

This is a guest post from our friends at Kensu. In the space of Data Science development in enterprises, two outstanding scalable technologies are Spark and H2O. Spark is a generic distributed computing framework and H2O is a very performant scalable platform for AI. Their complementarity is best exploited with the use of Sparkling Wat...

Read more
Stacked Ensembles and Word2Vec now available in H2O!

Prepared by: Erin LeDell and Navdeep Gill. Stacked Ensembles. R: ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train, base_models = my_models) Python: ensemble = H2OStackedEnsembleEstimator(base_models=my_models) ensemble.train(x=x, y=y, training...

Read more
Artificial Intelligence Is Already Deep Inside Your Wallet – Here’s How
by Team | January 12, 2017 Financial Services , Fraud Detection

Artificial intelligence (AI) is the key for financial service companies and banks to stay ahead of the ever-shifting digital landscape, especially given competition from Google , Apple , Facebook , Amazon and others moving strategically into fintech. AI startups are building data products that not only automate the ingestion of vast amou...

Read more
Football Flowers
by Team | January 10, 2017


Read more
Start Off 2017 with Our Stanford Advisors
by Team | January 09, 2017 Community , Technical

We were very excited to meet with our advisors (Prof. Stephen Boyd, Prof. Rob Tibshirani and Prof. Trevor Hastie) at H2O.AI on Jan 6, 2017. Professors Boyd, Tibshirani & Hastie in the house! @h2oai #elementsofstatisticallearning #MachineLearning — (@h2oai) January 6, 2017 Our CEO, Sri Ambati, ma...

Read more
What is new in Sparkling Water 2.0.3 Release?
by Team | January 05, 2017 Community , H2O Release , Sparkling Water

This release has H2O core – Feature: This architectural change allows connecting to an existing H2O cluster from Sparkling Water. The benefit is that we are no longer affected by Spark killing its executors, so we should have a more stable solution in environments with lots of H2O/Spark nodes. We are working on an article on ...

Read more
Behind the scenes of CRAN
by Team | December 28, 2016 R , R-Bloggers

(Just from my point of view as a package maintainer.) New users of R might not appreciate the full benefit of CRAN and new package maintainers may not appreciate the importance of keeping their packages updated and free of warnings and errors. This is something I only came to realize myself in the last few years so I thought I would write...

Read more
What is new in H2O latest release (Tutte) ?
by Team | December 23, 2016 Community , H2O Release

Today we released H2O version (Tutte). It’s available on our Downloads page, and release notes can be found here . Photo Credit: Top enhancements in this release: GLM MOJO Support: GLM now supports our smaller, faster, more efficient MOJO (Model ObJect, Optimized) format for model pu...

Read more
Using Sentiment Analysis to Measure Election Surprise
by Team | December 01, 2016 Data Journalism

Sentiment Analysis is a powerful Natural Language Processing technique that can be used to compute and quantify the emotions associated with a body of text. One of the reasons that Sentiment Analysis is so powerful is because its results are easy to interpret and can give you a big-picture metric for your dataset. One recent event that ...

Read more
Indexing 1 Billion Time Series with H2O and ISax
by Team | November 11, 2016 Solutions , Technical , Tutorials

At H2O, we have recently debuted a new feature called ISax that works on time series data in an H2O Dataframe. ISax stands for Indexable Symbolic Aggregate ApproXimation, which means it can represent complex time series patterns using a symbolic notation and thereby reducing the dimensionality of your data. From there you can run H2O’s ML...

Read more
Why We Bought A Happy Diwali Billboard
by Team | October 21, 2016

It’s been a dark year in many ways, so we wanted to lighten things up and celebrate Diwali — the festival of lights! Diwali is a holiday that celebrates joy, hope, knowledge and all that is full of light — the perfect antidote for some of the more negative developments coming out of the Silicon Valley recently. Throw in a polarizing pre...

Read more
Creating a Binary Classifier to Sort Trump vs. Clinton Tweets Using NLP
by Team | October 17, 2016 Community , Data Journalism , Flow , Python

The problem : Can we determine if a tweet came from the Donald Trump Twitter account (@realDonaldTrump) or the Hillary Clinton Twitter account (@HillaryClinton) using text analysis and Natural Language Processing (NLP) alone? The Solution : Yes! We’ll divide this tutorial into three parts, the first on how to gather the necessary data, t...

Read more
sparklyr: R interface for Apache Spark
by Team | October 07, 2016 Community , R , Sparkling Water

This post is reposted from RStudio’s announcement on sparklyr – RStudio’s extension for Spark. Connect to Spark from R. The sparklyr package provides a complete dplyr backend. Filter and aggregate Spark datasets then bring them into R for analysis and visualization. Use Spark’s distributed machine learning library from R. Create...

Read more
When is the Best Time to Look for Apartments on Craigslist?
by Team | October 06, 2016 Data Journalism

A while ago I was looking for an apartment in San Francisco. There are a lot of problems with finding housing in San Francisco, mostly stemming from the fierce competition. I was checking Craigslist every single day. It still took me (and my girlfriend) a few months to find a place — and we had to sublet for three weeks in between. Thankf...

Read more
by Team | September 23, 2016 Community

———- Forwarded message ——— From: SriSatish Ambati Date: Thu, Sep 15, 2016 at 10:17 PM Subject: changes and all hands tomorrow. To: team Team, Our focus has changed towards larger fewer deals & deeper engagements with handful of finance and insurance customers. We took a hard look at our marketing spend, pr programs and personnel. We l...

Read more
Distracted Driving
by Team | September 16, 2016 Data Journalism

Last week, we started to examine the 7.2% increase in traffic fatalities from 2014 to 2015, the reversal of a near decade-long downward trend. We then broke out the data by various accident classifications , such as “speeding” or “driving with a positive BAC,” and identified those classifications that had the greatest increase. One label...

Read more
Introducing H2O Community & Support Portals
by Team | September 09, 2016 Community , Customers

At H2O, we enjoy serving our customers and the community, and we take pride in making them successful while using H2O products. Today, we are very excited to announce two great platforms for our customers and for the community to better communicate with H2O. Let’s start with our community first: The success of every open source project ...

Read more
Fatal Traffic Accidents Rise in 2015
by Team | September 07, 2016 Data Journalism

On Tuesday, August 30th, the National Highway Traffic Safety Administration released their annual dataset of traffic fatalities asking interested parties to use the dataset to identify the causes of an increase of 7.2% in fatalities from 2014 to 2015. As part of H2O.ai’s vision of using artificial intelligence for the betterment of soci...

Read more
IoT - Take Charge of Your Business and IT Insights Starting at the Edge
by Team | August 22, 2016 IoT , Solutions

Instead of just being hype, the Internet of Things (IoT) is now becoming a reality. Gartner forecasts that 6.4 billion connected devices will be in use worldwide, and 5.5 million new devices will get connected every day, in 2016. These devices range from wearables, to sensors in vehicles that can detect surrounding obstacles, to sensors in...

Read more
Hyperparameter Optimization in H2O: Grid Search, Random Search and the Future
by Team | June 16, 2016 R-Bloggers , Technical , Tutorials

“Good, better, best. Never let it rest. ‘Til your good is better and your better is best.” – St. Jerome tl;dr H2O now has random hyperparameter search with time- and metric-based early stopping. Bergstra and Bengio [1] write on p. 281: Compared with neural networks configured by a pure grid search, we find that random search over the s...

Read more
H2O GBM Tuning Tutorial for R
by Team | June 16, 2016

  In this tutorial, we show how to build a well-tuned H2O GBM model for a supervised classification task. We specifically don’t focus on feature engineering and use a small dataset to allow you to reproduce these results in a few minutes on a laptop. This script can be directly transferred to datasets that are hundreds of GBs large and H...

Read more
Spam Detection with Sparkling Water and Spark Machine Learning Pipelines
by Team | June 15, 2016 Sparkling Water , Technical , Tutorials

This short post presents the “ham or spam” demo, which has already been posted earlier by Michal Malohlava, using our new API in the latest Sparkling Water for Spark 1.6 and earlier versions, unifying Spark and H2O Machine Learning pipelines. It shows how to create a simple Spark Machine Learning pipeline and a model based on the fitted pipe...

Read more
Interview with Carolyn Phillips, Sr. Data Scientist, Neurensic
by Team | May 27, 2016 Community , Customers , Events

During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the second of a multipart series recapping our conversations. Be sure to keep an eye out for updates by checking our website or following us on Twitter @h2oai. How did you become a data scientist? Phillips: Until ...

Read more
Interview with Svetlana Kharlamova, Sr. Data Scientist, Grainger
by Team | May 25, 2016 Community , Customers , Events

During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the first of a multipart series recapping our conversations. Be sure to keep an eye out for updates by checking our website or following us on Twitter @h2oai. How did you become a data scientist? Kharlamova: I’m a...

Read more
H2O Day at Capital One
by Team | May 11, 2016 Community , Customers , Events

Here at one of our most important partners is Capital One, and we’re proud to have been working with them for over a year. One of the world’s leading financial services providers, Capital One has a strong reputation for being an extremely data and technology-focused organization. That’s why when the Capital One team invited us to t...

Read more
Red herring bites
by Team | May 06, 2016 Data Munging , R-Bloggers , Technical

At the Bay Area R User Group in February I presented progress in big-join in H2O which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested so they can judge if it is relevant to them or not; and ii) show how it scales with...

Read more
Fast csv writing for R
by Team | April 24, 2016 Data Munging , R , R-Bloggers , Technical

R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table package in R solved csv import convenience and speed in 2...

Read more
Apache Spark and H2O on AWS
by Team | April 20, 2016 Community , Guest Posts

This is a guest post re-published with permission from our friends at Datapipe. The original lives here. One of the advantages of public cloud is the ability to experiment and run various workloads without the need to commit to purchasing hardware. However, to meet your data processing needs, a well-defined mapping between your objecti...

Read more
Connecting to Spark & Sparkling Water from R & Rstudio
by Team | March 24, 2016

Sparkling Water offers best-of-breed machine learning for Spark users. Sparkling Water brings all of H2O’s advanced algorithms and capabilities to Spark. This means that you can continue to use H2O from RStudio or any other IDE of your choice. This post will walk you through the steps to get running on plain R or RStudio from Spark. ...

Read more
Drink in the Data with H2O at Strata SJ 2016
by Team | March 21, 2016 Community , Demos , Events

It’s about to rain data in San Jose when Strata + Hadoop World comes to town March 29 – March 31st. H2O has a waterfall of action happening at the show. Here’s a rundown of what’s on tap. Keep it handy so you have less chance of FOMO (fear of missing out). Hang out with H2O at Booth #1225 to learn more about how machine learning can hel...

Read more
Road Ahead and BTUs
by Team | March 03, 2016

Road Ahead – keynote presentation by Sri Ambati ...

Read more
Thank you, Cliff
by Team | February 24, 2016

Cliff resigned from the Company last week – He is parting on good terms and supports our success in future. Cliff and I worked closely since 2004 so this is a loss for me. It ends an era of prolific work supporting my vision as a partner. Let’s take this opportunity to congratulate Cliff on his work, in helping me build something from not...

Read more
The Top 10 Most Watched Videos From H2O World 2015
by Team | January 08, 2016 Community , Customers , Events , H2O World

Now that we’re a few months out from H2O World we wanted to share with you all what the most popular talks were by online viewership. The talks covered a variety of topics from introductions, to in-depth examinations of use cases, to wide-ranging panels. Introduction to Data Science Featuring Erin LeDell, Statistician and Machine Learnin...

Read more
Compressing Zip Codes with Generalized Low Rank Models
by Team | December 07, 2015 GLRM , R

This tutorial introduces the Generalized Low Rank Model (GLRM) [1], a new machine learning approach for reconstructing missing values and identifying important features in heterogeneous data. It demonstrates how to build a GLRM in H2O that condenses categorical information into a numeric representation, which can then be used in other mo...

Read more
Databricks and H2O Make it Rain with Sparkling Water
by Team | December 01, 2015 Demos , Sparkling Water

This blog post was first posted on the Databricks blog here. Databricks provides a cloud-based integrated workspace on top of Apache Spark for developers and data scientists. has been an early adopter of Apache Spark and has developed Sparkling Water to seamlessly integrate’s machine learning library on top of Spark. In thi...

Read more
H2O World from an Attendee's Perspective
by Team | November 18, 2015 Community , Events , Guest Posts , H2O World

Data Science is like Rome, and all roads lead to Rome. H2O WORLD is the crossroad, pulling in a confluence of math, statistics, science and computer science and incorporating all avenues of business. From the academic, research oriented models to the business and computer science analytics implementations of those ideas, H2O WORLD inform...

Read more
at ODSC SF 2015!
by Team | November 16, 2015 Events

As promised, we’re here reporting from the floor of the Open Data Science Conference (ODSC). It’s been another wild day for us, with an early start at 7:30am to set up ahead of the show. However, the long days are all worth it for a chance to see you all in the field. While we thought bringing two boxes of booklets woul...

Read more
H2O at ML Conf SF 2015
by Team | November 13, 2015 Community , Events

H2O is ubiquitous, and just like H2O, our team is everywhere! Today we attended the 2015 Machine Learning Conference in San Francisco. Located at the gorgeous Julia Morgan Ballroom, the ML Conference brought together some of the world’s foremost experts on machine learning, including the tireless Xavier Amatriain, VP of...

Read more
H2O World Third Day Wrap-Up
by Team | November 12, 2015 Events , H2O World

H2O fans, we know that distance and the twin holidays of Veterans Day and Diwali kept many of you from attending the grand finale of H2O World, but we want to at least give you a taste of all that went on at the Computer History Museum in Mountain View. Day 3 of H2O World got off to a strong start with a massive panel on creating a cultu...

Read more
H2O World Second Day Wrap-Up
by Team | November 11, 2015 Events , H2O World

H2O fans, we didn’t think that our second day could top our first, but somehow it did! Still, although we had record attendance, we know that a lot of you aren’t here. While we can’t hope to get across all that’s happened, we do want to share some of the highlights. The morning started off with CEO Sri Ambati welcoming attendees and givin...

Read more
H2O World First Day Wrap-Up
by Team | November 10, 2015 Events , H2O World

H2O fans, we wish that all of you were here, but we also know that our community is spread across the globe and not all of you could make it to H2O World. However, those of you not able to attend the conference are just as much a part of our community as those that are. While we can’t hope to convey the energy and excitement of H2O World,...

Read more
Pre-H2O World, Part 2
by Team | November 09, 2015 Community , Events , H2O World

H2O fans, we have a day of data delights in store for you tomorrow! The first day of H2O World is totally devoted to demos and walkthroughs designed to help YOU get the most out of your data. In fact, we have so many sessions planned that unless you have Hermione’s Time Turner, you won’t be able to attend them all. So choose wisely! A...

Read more
A Newbie's Guide to H2O in Python - Guest Post
by Team | November 09, 2015 Community , Guest Posts , Python

This blog was originally posted here. I created this guide to help fellow newbies get their feet wet with H2O, an open-source predictive analytics platform that is fast, powerful, and easy to use. Using a combination of extraordinary math and high-performance parallel processing, H2O allows you to quickly create models for big data. The st...

Read more
Pre-H2O World, Part 1
by Team | November 08, 2015 Community , Customers , Events , H2O World

H2O fans, the team is burning the midnight oil to get H2O World ready for you all. With an audience size twice that of last year’s event we’re going to pack the house at the Computer History Museum! This year’s event will feature 70+ speakers spread out over 41 talks, 22 training sessions and eight panels during the course of the m...

Read more
How to Build a Machine Learning App Using Sparkling Water and Apache Spark
by Team | October 03, 2015

The Sparkling Water project is nearing its one-year anniversary, which means Michal Malohlava, our main contributor, has been very busy for the better part of this past year. The Sparkling Water project combines H2O machine-learning algorithms with the execution power of Apache Spark. This means that the project is heavily dependent on tw...

Read more
How I used H2O to crunch through a bank's customer data
by Team | September 20, 2015

This entry was originally posted here. Six months back I gingerly started exploring a few data science courses. After having successfully completed some of the courses I was restless. I wanted to try my data hacking skills on some real data (read kaggle). I find that competing in hackathons helps you to benchmark yourself against your fellow ...

Read more
Fast, Scalable Machine Learning- Now with New and Improved Python API
by Team | September 04, 2015

H2O now has a new Python API, based on valuable feedback provided by our community. Newest features include: – pandas-like dataframes, but for large, distributed computing – scikit-learn integration – machine learning pipeline API. Check out the tutorial below: ...

Read more
An Introduction to Data Science: Meetup Summary Guest Post by Zen Kishimoto
by Team | August 28, 2015

Originally posted on Tek-Tips forums by Zen here. I went to two meetups at H2O, which provides an open source predictive analytics platform. The second meetup was full of participants because its theme was an introduction to data science. Data science is a new buzzword, and I feel like everyone claims to be a data scientist or somethin...

Read more
The Definitive Performance Tuning Guide for H2O Deep Learning (Ported scripts to H2O-3, results are taken from February's blog)
by Team | August 28, 2015

  Introduction This document gives guidelines for performance tuning of H2O Deep Learning, both in terms of speed and accuracy. It is intended for existing users of H2O Deep Learning (which is easy to change if you’re not), as it assumes some familiarity with the parameters and use cases. Motivation This effort was in part motivated b...

Read more
KMeans Diagnostics with H2O Cluster Models
by Team | August 05, 2015


Read more
Lending Club : Predict Bad Loans to Minimize Loss to Defaulted Accounts
by Team | August 03, 2015

As a sales engineer on the team I get asked a lot about the value add of H2O. How do you put a price tag on something that is open source? This typically revolves around the use cases; if a use case pertains to improving user experience or making apps that can improve internal operations then there’s no straightforward way of monet...

Read more
Introduction to Data Science using H2O - Chicago
by Team | August 03, 2015

Thank you to Chicago for the great meetup on 29 July 2015. Slides have been posted on GitHub. The links to the sample scripts and data are contained in the slides. If you have any further questions about H2O, please join our GoogleGroup or chat with us on Gitter. The slides are also available on the H2O Slideshare. Also, thank you t...

Read more
useR! Aalborg 2015 conference
by Team | July 16, 2015

The H2O team spent most of the useR! Aalborg 2015 conference at the booth giving demos and discussing H2O. Amy had a 16 node EC2 cluster running with 8 cores per node, making a total of 128 CPUs. The demo consisted of loading large files in parallel and then running our distributed machine learning algos in parallel. At an R conference, m...

Read more
KFold Cross Validation With H2O-3 and R
by Team | July 09, 2015

This blog also explains the solution to a Google Stream question we received. Note: KFold Cross Validation will be added to H2O-3 as an argument soon. This is a terse guide to building KFold cross-validated models with H2O using the R interface. There’s not very much R code needed to get up and running, but it’s by no means the one-magic-...

Read more
'Ask Craig'- Determining Craigslist Job Categories with Sparkling Water, Part 2
by Team | July 02, 2015

This is the second blog in a two blog series. The first blog is on turning these models into a Spark streaming application. The presentation on this application can be downloaded and viewed at Slideshare. In the last blog post we learned how to build a set of H2O and Spark models to predict categories for jobs posted on Craigslist using Spar...

Read more
Sparkling Water Tutorials Updated
by Team | July 01, 2015

This is an updated version of the Sparkling Water tutorials originally published by Amy Wang here. For the newest examples and updates, please visit the Sparkling Water GitHub page. The blog post introduces 3 tutorials: Running Sparkling Water Locally, Running Sparkling Water on Standalone Spark Cluster, Running H2O Commands from Spark Shell ...

Read more
'Ask Craig'- Determining Craigslist Job Categories with Sparkling Water
by Team | June 15, 2015

This is the first blog in a two blog series. The second blog is on turning these models into a Spark streaming application. The presentation on this application can be downloaded and viewed at Slideshare. One question we often get asked at Meetups or conferences is: “How are you guys different than other open-source machine-learning toolkits?...

Read more
Scaling R with H2O
by Team | June 10, 2015

With the advent of H2O 3.0, it seems appropriately timed to reintroduce the R API for H2O to help users better understand the differences between R dataframes and H2OFrames. Typically some of the first questions we get include: Does H2O support all R packages and functions? Is H2OFrame an extension of data.frame? Are H2O supported algo...

Read more
Using H2O for Kaggle: Guest Post by Gaston Besanson and Tim Kreienkamp
by Team | May 05, 2015

This post also appears on the GSE Data Science Blog. In this special H2O guest blog post, Gaston Besanson and Tim Kreienkamp talk about their experience using H2O for competitive data science. They are both students in the new Master of Data Science Program at the Barcelona Graduate School of Economics and used H2O in an in-class Kaggle...

Read more
PyData Dallas 2015
by Team | May 04, 2015

H2O was in attendance last week at PyData in Dallas, Texas. Our CTO, Cliff Click, spoke at PyData about driving H2O from Python to perform feature-engineering, group by, quantiles, and model building with H2O’s GBM, GLM, and Distributed Random Forest. We met a lot of great people and we are really excited to see the enthusiasm for H2O w...

Read more
Deep Learning for Public Safety
by Team | April 22, 2015

This article first appeared on KDnuggets. Contributors: Alex Tellez, Michal Malohlava, Prithvi Prabhu, Hank Roark, Amy Wang. Download full report. We’ve seen some incredible applications of Deep Learning with respect to image recognition and machine translation but this particular use case has to do with public safety; in particular, how De...

Read more
by Team | April 11, 2015

———- Forwarded message ———- From: SriSatish Ambati Date: Sun, Jun 1, 2014 at 12:29 PM Subject: Re: jirassic hierarchy. To: Kevin Cc: Tom Kraljevic, engr, team The best cultures are ones where it feels like there isn’t any. Not saying scrum won’t fit,...

Read more
Sparkling Water Certified by Cloudera
by Team | March 03, 2015

Last month before the team publicly announced Sparkling Water at Strata San Jose we made sure that the product was backed and certified by some major partners. This includes approval from Databricks itself as well as Cloudera. Integration Testing for Cloudera For Cloudera, testing was mainly geared toward deployment and sustaina...

Read more
The Definitive Performance Tuning Guide for H2O Deep Learning
by Team | February 27, 2015

This document gives guidelines for performance tuning of H2O Deep Learning, both in terms of speed and accuracy. It is intended for existing users of H2O Deep Learning (which is easy to change if you’re not), as it assumes some familiarity with the parameters and use cases. Motivation This effort was in part motivated by a Deep Learn...

Read more
Strata San Jose 2015
by Team | February 25, 2015

I had a great time at Strata SJ 2015! I had a lot of fun answering questions and talking to enthusiastic and curious H2O users at our booth. It was great seeing how many people are involved in the H2O community and I also really enjoyed drinking free margaritas at the booth crawl. The H2O team met some really great people with lots of dif...

Read more
How does Java Both Optimize Hot Loops and Allow Debugging
by Team | February 22, 2015

This blog came about because an old friend is trying to figure out how Java can do aggressive loop and inlining optimizations, while allowing the loading of new code and setting breakpoints… in that optimized code. On 2/21/2015 11:04 AM, IG wrote: Safepoint. I’m still confused because I don’t understand what is the required state at a s...

Read more
Introducing first-Fridays Hackathon with H2O
by Team | February 11, 2015

Greetings fellow ML/AI enthusiasts! This blog post serves two purposes: 1) Introduction of our First Fridays initiative 2) Recap our first 12-hour Hackathon! WHAT: The first Friday of each month, will hold a Hack-A-Thon from 1pm – 10pm (yep, you read correctly!) whereby we invite ANYONE to come hack through a data problem with the H2...

Read more
Launching H2O with Docker
by Team | January 09, 2015

Hello world, again. H2O is already relatively easy to launch; all the user needs is a compatible Java version, but now that level of difficulty is reduced to nil. Jeff, our DevOps engineer, presented me with a Docker container for H2O making shipping H2O possible regardless of your environment setup. You can now launch H2O in an isolated e...

Read more
H2O vs R - Winning KDDCup98 in 10 minutes with H2O
by Team | December 17, 2014

H2O is a scalable and open-source math and machine learning platform for big data. It can handle much bigger datasets and run a lot faster than R/SAS even on a single machine. How does the modeling experience with H2O differ from the experience using traditional tools such as R/SAS? This blog answers exactly this question. In particular, ...

Read more
H2O WORLD 2014 Machine Learning IS Fun.
by Team | December 03, 2014

Earlier this year I found myself sitting among 100 or so data scientists at a meetup, eating a taco and listening to how a former particle physicist found the Higgs Boson particle over a weekend using commodity hardware and open source software. Even more impressive was his ability to answer the unrelenting questions from the audience ...

Read more
What if the S language had been copyrighted?
by Team | December 01, 2014

At H2O World 2014, we were fortunate to have Josh Bloch give a reprise of his A Brief, Opinionated History of the API talk that he first delivered at SPLASH 2014. (For those with the time, you can watch a 47 minute 21 second recording of this talk on the YouTube channel.) This is one of those subjects that I wish I could say m...

Read more
Key Takeaways from the World's Top Kagglers
by Team | November 25, 2014

Ever wondered why data science is so competitive? After a highly successful H2O World event last week, we’re shining some light on what we’ve learned from some of the world’s best data scientists and how they go about winning these data science challenges such as Kaggle. In case you missed it, we held a Competitive Data Science Panel ...

Read more
Predictive Modeling at Scale: Cisco Modernizes Predictive Model Production with H2O (joint work with Lou Carvalheira)
by Team | November 21, 2014

Cisco’s Challenges Cisco is the global leader in networking. It is a company that has long embraced the power of predictive analytics. On a regular quarter, Cisco’s Strategic Marketing Organization builds and deploys around 60,000 predictive models to treat each of 160M+ companies it maintains in its database. These models generate predict...

Read more
Introducing Flow!
by Team | November 19, 2014

After several weeks of active development, we’re proud to unveil H2O Flow, our brand new, open-source user interface for H2O! We used it live during our H2O World keynote today, and this blog post is a brief introduction to some of the core ideas behind H2O Flow. H2O Flow is a web-based interactive computational environment where you can ...

Read more
Competitive Data Science, Kaggle, Kdd and other Sports
by Team | November 16, 2014

This panel promises to be just brilliant and full of sparks! Panelists: Guocong Song, Jose Guerrero, Mark Landry, Arno Candel

Read more
Hacking Algorithms in H2O With Cliff
by Team | November 16, 2014

Interested in Hacking Algorithms with me? I’ll be at H2O World all day Tuesday looking to join you in doing some fun hacking. Here are 3 sample starter hacks to help you get over the H2O learning curve – Hacking KMeans, Hacking Quantiles, Hacking Grep. All 3 take you step-by-step through the process of building a new algorithm into H2O’...

Read more
Hacking Algorithms into H2O: Grep
by Team | November 11, 2014

This is a presentation of hacking a simple algorithm into the new dev-friendly branch of H2O, h2o-dev. This is one of three “Hacking Algorithms into H2O” blogs. All of these blogs start out the same: getting the h2o-dev code and building it. They are the same until the section titled Building Our Algorithm: Copying from the Example, and ...

Read more
Hacking Algorithms into H2O: Quantiles
by Team | November 10, 2014

This is a presentation of hacking a simple algorithm into the new dev-friendly branch of H2O, H2O 3.0. This is one of three “Hacking Algorithms into H2O” blogs. All three blogs start out the same: getting the h2o-3 code and building it. They are the same until the section titled Building Our Algorithm: Copying from the Example, and then ...

Read more
Hacking Algorithms into H2O: KMeans
by Team | November 08, 2014

This is a presentation of hacking a simple algorithm into the new dev-friendly branch of H2O, h2o-dev. This is one of three “Hacking Algorithms into H2O” blogs. All blogs start out the same – getting the h2o-dev code and building it. They are the same until the section titled Building Our Algorithm: Copying from the Example, and then the ...

Read more
Sparkling Water on YARN Example
by Team | November 01, 2014

Follow these easy steps to get your first Sparkling Water example to run on a YARN cluster. This example uses Hortonworks HDP 2.1. 1. Assumptions Installed: Java 1.7+, YARN cluster. Note: In the current version of Sparkling Water running on YARN, the cluster formation requires multicast to work for the H2O nodes to find each oth...

Read more
Running Your First Droplet on H2O
by Team | October 28, 2014

A number of us were at Strata in New York City this October, and one of the major benefits of these events is getting lots of in-person time with people who use your product. Michal and Amy spent some time with a developer who was trying to build on top of the h2o-dev repo, and we realized that we didn’t have a really basic example yet of ...

Read more
Sparkling Water Tutorials
by Team | September 29, 2014

Please follow the updated version of tutorials here. H2O is hosting a meetup tomorrow at our office where attendees are encouraged to hack away with us as we run Deep Learning on Sparkling Water. If you haven’t already read all about H2O’s integration into Spark then get started with How Sparkling Water Brings H2O to Spark and Sparkling W...

Read more
How to use R, H2O, and Domino for a Kaggle competition
by Team | September 23, 2014

Guest post by Jo-Fai Chow. The sample project (code and data) described below is available on Domino. If you’re in a hurry, feel free to skip to: Tutorial 1: Using Domino Tutorial 2: Using H2O to Predict Soil Properties Tutorial 3: Scaling up your analysis Introduction This blog post is the sequel to TTTAR1 a.k.a. An Introduction t...

Read more
How Sparkling Water Brings H2O to Spark
by Team | September 22, 2014

This post provides a high-level introduction to the current integration plan between H2O and Spark. This is an ongoing engineering effort involving collaboration between the open source teams, and describes what is currently underway. 1. Overall Approach The first question one might ask is “Why”? What does one, as a user, gain from trying ...

Read more
Sparkling Water!
by Team | September 05, 2014

H2O & Scala & Spark Spark is an up and coming new big data technology; it’s a whole lot faster and easier than existing Hadoop-based solutions. H2O does state-of-the-art Machine Learning algorithms over Big Data – and does them Fast. We are happy to announce that H2O now has a basic integration with Spark – Sparkling Water! This is...

Read more
Introducing H2O Lagrange to R
by Team | August 26, 2014

From my perspective the most important event that happened at useR! 2014 was that I got to meet the 0xdata team and now, long story short, here I am introducing the latest version of H2O, labeled Lagrange, to the R and greater data science communities. Before joining 0xdata, I was working at a competitor on a rival project and w...

Read more
useR! 2014
by Team | July 15, 2014

Two weeks ago we attended the useR! conference hosted on the UCLA campus. I landed in Los Angeles at 8:30 P.M on Sunday June 29, and met up with Amy — another math hacker at 0xdata. After a harrowing cab ride we arrived on the UCLA campus at Sunset Village where we would be lodging for the next 3 evenings. Having just got the h2o R packag...

Read more
Learn to manage, munge, and model big data with H2O on the Hortonworks Sandbox
by Team | June 26, 2014

Working with big data might seem like a daunting task if, like me, you’ve spent the majority of your college years doing pencil and paper proofs. Big data for me was anything that took longer than 30 minutes to ingest into single threaded R. For mathematicians and statisticians looking to understand widely used data platforms like Hadoop f...

Read more
H2O - The Killer-App on Spark
by Team | June 25, 2014

object AirlinesDemo extends Demo {
  override def run(conf: DemoConf): Unit = {
    // Prepare data
    // Dataset
    val dataset = "data/allyears2k_headers.csv"
    // Row parser
    val rowParser = AirlinesParser
    // Table name for SQL
    val tableName = "airlines_table"
    // Select all flights with destination == SFO
    val query = """SELECT * FROM airlin...

Read more
A K/V Store For In-Memory Analytics, Part 2
by Team | May 23, 2014

This is a continuation of a prior blog on the H2O K/V Store, Part 1. A quick review on key bits going into this next blog: H2O supports a very high performance in-memory Distributed K/V store. The store honors the full Java Memory Model with exact consistency by default. Keys can be cached locally for both reads & writes. A typi...

Read more
SJSU Tutorial on H2O and Random Forest
by Team | April 25, 2014

Our friends over at SJSU added this post to their course website after the H2O team stopped by earlier this semester to talk about H2O. We’ve reposted it here, but you can find the original at: Oxdata (H2O) Tutorial Posted on April 24, 2014 by bigsjsu Oxdata (H2O) Tutori...

Read more
Tableau: Math Hacker Amy Talks Big Data Visualization TONIGHT
by Team | April 17, 2014

Anqi and I are back from NY, and we brought Amy with us – she's incredible, and she's giving a presentation at our meet up tonight, where she will talk about Big Data, visualization, and presenting interpretable graphics. So we're looking forward to seeing you tonight – the details are here: ...

Read more
MLConf NY - Friday, April 11: Demo of Workflow and Collective Use Case
by Team | April 07, 2014

This Friday H2O will be at MLconf to give a live demo, introduce a customer use case, and talk about the implications of model specification in production. If you don’t get a chance to stop by our booth, or come see our demo, you can find the presentation slides on the MLconf website (they will be posted on Friday, Apr...

Read more
Google-scale Machine Learning & Deep Learning gets principal platform in Apache Mahout with Spark and H2O
by Team | March 27, 2014

H2O’s vision is direct and simple: scaling machine learning for powering intelligent applications. Our focus is distributed machine learning and a fully-featured set of industrial grade algorithms. Apache Mahout is where people learn their chops in Machine Learning. Like R, it’s the “hello world” first place many new users get exposed to ...

Read more
Hang out with us tomorrow- Mar 26: H2O Math Hackers Present: Model Specification
by Team | March 26, 2014

Anqi and Irene present a hack along preview of their upcoming talk at MLConf. Come join us as we talk about the implications of model specification, and walk through how to frame models when asking different questions of the same data. ...

Read more
Meetup TONIGHT - Arno Presents: Deep Learning: Theory and Practice!
by Team | March 26, 2014

If you were unable to join us on Thursday 3/21 because of the high volume of interest, we are offering the same meeting again! In this talk, Arno Candel, Physicist & Hacker, will break down the basics of deep learning in theory & present implementation, early results from using MLP with Adaptive learning as implemented in...

Read more
In-memory Big Data: Spark + H2O
by Team | March 25, 2014

Big Data has moved in-memory. Customers using SQL in their Join & Munging efforts via SHARK and Apache Spark need to use Regressions and Deep Learning. To make their experiences great & seamlessly weave SQL workflows with Data Science and Machine Learning, we are architecting a simple RDD data import-export in H2O. This brings c...

Read more
Data Munging in H2O+R
by Team | March 24, 2014

Over the weekend we fielded a question from one of our users about the basics of data munging in H2O through R – and it was a good question, so I wanted to share the response with a wider audience – namely you guys. There are a few quick things about data munging in H2O+R: – It often looks and feels like you are manipulating data in R; we...

Read more
H2O Architecture
by Team | March 20, 2014

This is a top-level overview of the H2O architecture. H2O does in-memory analytics on clusters with distributed parallelized state-of-the-art Machine Learning algorithms. However, the platform is very generic, and very very fast. We’re building Machine Learning tools with it, because we think they’re cool and interesting, but the plat...

Read more
H2O at Code Mesh - API for in-memory Analytics - Cliff
by Team | February 25, 2014

Video link here: API for in-Memory Analytics – CodeMesh ...

Read more
Hanging out at ShareThis
by Team | February 24, 2014

We spent some time with the engineers and data scientists at ShareThis last week, and had a great time learning about their use cases, and getting H2O running on their data. It's nice to know that the ShareThis team had ...

Read more
Generate A Mandelbrot Set In H2O
by Team | February 15, 2014

Roses are red, Violets are ~ Blue, H2O is sweet, And fractals are too! $$z_n = z_{n-1}^P + c$$ where c is a “candidate” complex number. (Typically you’ll see $$P = 2$$; that’s what we’ll do too.) We set the size of the sequence to the number of iterations we want, and measure convergence by looking at the modulus of $$z_n$$ ...
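
The recurrence in the excerpt can be sketched in a few lines of plain Python. This is only an illustration and not the H2O code from the post: the starting value $$z_0 = 0$$, the escape bound of 2, and the names `escape_iteration` and `render` are assumptions added here.

```python
def escape_iteration(c, power=2, max_iter=50, bound=2.0):
    """Iterate z_n = z_{n-1}**power + c and report when |z_n| first exceeds bound.

    Returns the iteration index n at which the modulus escapes, or None if the
    sequence stays within the bound for max_iter iterations.
    """
    z = 0j  # assumed starting value z_0 = 0
    for n in range(1, max_iter + 1):
        z = z ** power + c
        if abs(z) > bound:  # modulus test: the sequence has diverged
            return n
    return None


def render(width=60, height=24, max_iter=50):
    """Draw a coarse ASCII picture: '#' marks candidates that never escaped."""
    rows = []
    for j in range(height):
        imag = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            real = -2.0 + 3.0 * i / (width - 1)
            escaped = escape_iteration(complex(real, imag), max_iter=max_iter)
            row += "#" if escaped is None else "."
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

Candidates for which `escape_iteration` returns `None` are treated as members of the set at this iteration budget; raising `max_iter` sharpens the boundary at the cost of more work per point.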

Read more
And you know, we're on each other's team - Lorde
by Team | February 15, 2014

Walking past giant anti-burner consumerist strata booths, I was struck by Lorde's recent masterpiece. The Big Data Palace needs a release. No hype, it needs product. Product is the release. The emperor has no clothes and no one seems to dare. You see the propaganda machine. Working lock-step to Strata / stage setters. Darling startups tha...

Read more
A K/V Store For In-Memory Analytics: Part 1
by Team | February 06, 2014

is building in-memory analytics (no surprise). What may be a surprise, though, is that there’s a full-fledged high-performance Key/Value store built into H2O and that is central to both our data management and our control logic. We use the K/V store in two main ways: All the Big Data is stored striped acros...

Read more
I'll let you be in my model, if I can be in yours.
by Team | February 05, 2014

Bob Dylan* said that. User-centric modeling is here to stay. Rich insights are available when we combine knowledge of the world with knowledge of your customer. Yes, one at a time. However, users tangle in a network of events and overlap & become part of each other's models. Sensor data can avoid granularity mismatch by building models...

Read more
Come visit H2O at Strata Booth 919
by Team | February 03, 2014

Greetings H2O friends and fans! Let’s do the data dance at Strata Santa Clara, Feb. 11-13 and check out our latest H2O Prediction Engine demo. We will be exhibiting at booth 919 and offering a 20% discount off registration. The show is slated to sell out, so be sure to register today and get your 20% discount with our code: 0XDATA20 ,...

Read more
Hack data with our resident data scientist, Earl
by Team | January 30, 2014

Our last-Thursday-of-every-month event: Hack data with Earl Hathaway – our resident data scientist. ...

Read more
Pathology of Data
by Team | December 31, 2013

Stephen Boyd's favorite way of summarizing a dataset at hand: “Understand the pathology of data. Sometimes it's not the pathology.” It's structure: dimensions, factors, outliers and principal components. It's very much what data scientists want from Adhoc Analytics – Scope the data from enough angles and with different tools to get real in...

Read more
All models are wrong, but some models are useful!
by Team | December 26, 2013

George Box said that. There is no best model that works for all of your data. Wolpert reiterates that as the No Free Lunch theorem. Model predictive performance is domain specific. What works in one data domain sometimes has very little consequence in another one. Predictably, the rise of Domain Science: Data science needs to get closer ...

Read more
Hack data with R + H2O (aka, the last Thursday meetup of 2013!)
by Team | December 26, 2013

Come join us and 32 other Data Scientistas to hack the airline dataset with R. This is the small, intimate open house that we held on the last Thursday of each month – and this is the season finale! And what a year it has been for H2O! Nidhi will walk you through RStudio – don't forget to bring your tool belt (& a laptop with R install...

Read more
R & Scala for fast in-memory predictions on Hadoop via H2O!
by Team | December 11, 2013

Three of our best and brightest gave a talk last night on H2O, R, Scala and Hadoop (yes, all together, and yes, highlighting the integration). If you missed the talk last night the slides are linked here, and we're doing an encore next week. Tom Kraljevic presents using H2O on Hadoop – how w...

Read more
Scalala on H2O at Typesafe
by Team | December 11, 2013

Please come catch us, catch up with us, and meet up with us next week, on the 17th. Typesafe, the makers & maintainers of Scala, is hosting us, and Adriaan Moors and the H2O team will be talking about Scala, working with data at scale, and getting the most out of your big data and domain. The meetup is in San Francisco; the details can ...

Read more
R & Scala for fast in-memory predictions on Hadoop via H2O!
by Team | December 05, 2013

Take R and Scala to Big Data using in-memory Algorithms from H2O. In this Triple Header for SF Big Data Science, Anqi Fu, our resident R wiz, will present data munging and R adhoc analytics at scale. Be prepared for fireworks with R in RStudio and not a ton of powerpoint. Scala has reached tremendous adoption amongst Machine learning &...

Read more
Machine Learning for Adtech
by Team | November 19, 2013

Characteristics of advertising data: tens of thousands of columns or more (top 100k or 1 m sites); high collinearity; factors: e.g. demographics, with a strong correlation between e.g. income and education; collinearity: sports fans follow NFL + ESPN + Bleacher Report + Fox Sports; users of Ravelry also shop Etsy. Those features are certa...

Read more
Making films is not too different from startups
by Team | November 19, 2013

Quentin Tarantino, Ang Lee and other great directors discuss making films, creative process, attention to detail and inspiring & directing one's team to do great work. ...

Read more
H2O goes to CodeMesh in London
by Team | November 18, 2013

An API for Distributed Computing: We have defined an API and built an open-source platform for dealing with in-memory distributed data. We’ve used it to build state-of-the-art predictive modeling and analytics (e.g. GLMNET, GBM, Random Forest) that’s 1000x faster than the disk-bound alternatives, and 100x faster than R (we love R but it’s...

Read more
H2O goes to qconsf
by Team | November 13, 2013

Math Algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ever growing demand for sophisticated algorithms over big data. In this talk, we get a ringside view of the making of the world's most scalable and fastest machine learning framework,...

Read more
Distributed Deep Learning with H2O in the Cloud @ Ebay
by Team | November 12, 2013

Cyprien Noel will present hand-picked algorithms that work on H2O at scale and a survey of the space. We will walk users through a couple of datasets (MNIST) and demonstrate the power of Multi-layer Neural Networks at Scale in EC2. Learn more and sign up at ...

Read more
Predictable Rise of Physicists: Domain Science
by Team | November 08, 2013

For years, I secretly suspected that a lot of our math came from Physics. Some of the greatest leaps in math were made closely alongside the greatest discoveries in Physics. Calculus. QED. Turing. The physics of our businesses is grounded in a complex systems understanding of domain. When Data science gets finally freed from time-sapping...

Read more
Pivotal hosts 0xdata - Distributed Random Forest, GBM, GLM & API for Big Data Algos
by Team | November 04, 2013

Distributed Machine Learning has come of age, just in time to meet the challenges of Big Data. We will present an API for extending and rolling your own Algorithms or use powerful contest-winning Gradient Boosting Machine, Generalized Linear Modeling and Random Forest at scale. Demo and Fireworks using big datasets from within ...

Read more
Frontier Big Data Meetup - Scalability & Availability
by Team | November 04, 2013

Come see Sri present on November 5th! 1. Sam Hamilton, Vice President of Data Technology at PayPal 2. SriSatish Ambati, Co-founder & CEO, 0xData 3. Sourav Mazumder, Technology Head of Big Data Practices, Infosys 4. Bruce Templeton, Co-founder & CEO, NephoScale At Room B3 in Mission City Ballroom, Santa Clara Convention Center...

Read more
0xdata and Yelp - Machine Learning for Relevance and Serendipity/Distributed Gradient Boosting
by Team | October 31, 2013

Join us and Yelp for a chat on Machine Learning, and make sure not to miss Sri’s lightning talk on Distributed Gradient Boosting! Main Talk: Machine Learning for Relevance and Serendipity Speaker: Aria Haghighi (Prismatic) Abstract: Careful use of well-designed machine learning systems can transform products by providing highly perso...

Read more
Our data, our math // our tools, our science!
by Team | October 30, 2013

Big data has always been with us. Our race's answer to data explosion was through math & computation. Whether it was Newton's calculus, Einstein's Relativity or Shannon's Information Theory, each generation's answer to its big data problem arose from its best and brightest. Our generation's challenge is here. Our lives are mired in d...

Read more
Building a Distributed GBM on H2O
by Team | October 29, 2013

At 0xdata we build state-of-the-art distributed algorithms – and recently we embarked on building GBM, an algorithm notorious for being impossible to parallelize, much less distribute. We built the algorithm shown in Elements of Statistical Learning II, Trevor Hastie, Robert Tibshirani, and Jerome Friedman on page 387 (shown at the bo...
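
For readers new to boosting, here is a minimal single-machine sketch of the general idea: each new tree is fit to the residuals of the current ensemble and added with a small step size. It is not the algorithm on the page cited above, nor H2O's distributed implementation. The squared-error loss, one-split stumps on a single feature, the learning rate, and the names `fit_stump`, `fit_gbm` and `predict` are all assumptions made for illustration.

```python
def fit_stump(xs, targets):
    """Find the single split on a 1-D feature that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    """Boost stumps on the residuals (the negative gradient of squared error)."""
    base = sum(ys) / len(ys)  # start from the mean response
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps)


if __name__ == "__main__":
    model = fit_gbm([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
    print(predict(model, 2.0), predict(model, 5.0))
```

The loop is inherently sequential, since each stump depends on the residuals left by all earlier ones. That dependency is one reason the excerpt calls the algorithm hard to parallelize.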

Read more
An API For Distributed Analytics
by Team | October 28, 2013

There are so many APIs to choose from… Features of the space: Lots of data – which I’ll qualify as “bigger than 1 machine” and thus needing parallel I/O, parallel memory, & parallel compute – and distributed algorithms. Ease of programming; hide details (but expose them when you want to). High level for ease-of-use, but “under the covers” ...

Read more
Strata NYC & Hadoop World: How to Stop Worrying and Start Modeling Big Data with Better Algorithms and H2O
by Team | October 25, 2013

Srisatish Ambati (0xdata Inc), Cliff Click (0xdata Inc). 5:05pm Tuesday, 10/29/2013. Data Science. Beekman Parlor – Sutton North. Data Modeling has been constrained through scale; Sampling still rules the day for Adhoc Analytics. Scale brings much needed change t...

Read more
NYC Big Data Meetup - Distributed Random Forest, GBM, GLM & API for Big Data Algos
by Team | October 22, 2013

Distributed Machine Learning has come of age. Just in time to meet the challenges of Big Data, we present an API for extending and rolling your own Algorithms or using powerful contest-winning Gradient Boosting Machine, Generalized Linear Modeling and Random Forest at scale. Demo and Fireworks using big datasets from within the familiar...

Read more
GBM on Ecology - Recreating a model made for R
by Team | October 22, 2013

In the last couple of weeks we’ve had two meetups on GBM (gradient boosted classification and regression), and hence a lot of excitement about running the algorithm as presented by Cliff, Earl and Dr. Hastie. You can find the hella cool videos of both presentations here: One of my favorite articles on GBM ...

Read more
Join Us Tomorrow at Trulia - Distributed GBM!
by Team | October 16, 2013

Hi hackers! Just a quick reminder we’ll be joining our friends at Trulia tomorrow for a meetup on machine learning discussing Distributed GBM. GBM is one of the most popular machine learning algorithms used in data mining competitions. Most of us use GBM through the R implementation. However, we have recently written a distributed version fo...

Read more
H2O & LiblineaR: A tale of L2-LR
by Team | October 10, 2013

tl;dr: H2O and LiblineaR have nearly identical predictive performance. Overview: In this blog, we examine the single-node implementations of L2-regularized logistic regression (LR) by H2O and LiblineaR. Both LibR and H2O are driven from the R console on the same hardware and evaluated on the same datasets. We compare regression coeffici...
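
As background for the comparison, the sketch below fits an L2-regularized logistic regression by plain batch gradient descent. It shows neither H2O's nor LiblineaR's solver. The objective (mean log-loss plus `lam / 2` times the squared weight norm, with an unpenalized intercept), the step size, and the function names are assumptions made here; each library parametrizes and scales the penalty in its own way.

```python
import math


def sigmoid(t):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_l2_logistic(rows, labels, lam=0.1, lr=0.1, n_steps=5000):
    """Minimise mean log-loss + (lam / 2) * ||w||^2 with batch gradient descent.

    rows is a list of feature lists, labels a list of 0/1 values.
    The intercept b is left unpenalized (an assumption of this sketch).
    """
    n = len(rows)
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(n_steps):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n  # gradient of the mean log-loss
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(model, x):
    """Probability of the positive class for one feature vector."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


if __name__ == "__main__":
    rows = [[0.0], [1.0], [2.0], [3.0]]
    labels = [0, 0, 1, 1]
    for lam in (0.01, 1.0):
        w, b = fit_l2_logistic(rows, labels, lam=lam)
        print(lam, w, b)
```

Running it with a larger `lam` shrinks the coefficient toward zero, which is why a coefficient-by-coefficient comparison between two implementations only makes sense once their penalty strengths have been put on the same scale.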

Read more
0xdata + Vendavo = Awesome
by Team | October 03, 2013

For those of you who missed our recent meetup at Vendavo: our data scientist Earl Hathaway and our CTO & Architect of Distributed Gradient Boosting, Cliff Click, gave a talk on GBM that was (without exaggeration) totally awesome! Eric, the Algorithms and Data Science guru at Vendavo, and their hacker-CEO, Neil Lustig, have been partnering with us d...

Read more
Running a GLM Model in H2O + R (notes from the hands-on meetup Sept. 26)
by Team | September 27, 2013

This is a walkthrough of running H2O through R. Before you get started you will need three things: R (a recent version), H2O (which you can get through GitHub or directly from our website), and the h2oWrapper R package, which is the tool that makes H2O talk to R, and lets you talk to ...

Read more