H2O.ai Blog

LLM DataStudio - V6.0 Release
by Nishaanthini Gnanavel, Genevieve Richards, Laksika Tharmalingam, Prathushan Inparaj | September 13, 2024 | Data Preparation, Generative AI

H2O LLM DataStudio is a no-code application created to streamline data preparation tasks for Large Language Models (LLMs). The tool features three main components: Curate, Prepare, and Custom Eval. Curate - Conversion of documents (PDFs, DOC & audio/video files) into question-answer pairs and summarization pairs Prepare - Prepar...

H2O LLM DataStudio: V4.1 Release
by Nishaanthini Gnanavel, Genevieve Richards, Tarique Hussain | January 16, 2024 | Data Preparation, Generative AI

H2O LLM DataStudio is a comprehensive no-code application designed to simplify data preparation tasks for Large Language Models (LLMs). This tool comprises three key components: Curate, Prepare, and Augment. Curate - Conversion of documents (PDFs, DOC & audio/video files) into question-answer pairs and summarization pairs Prepare ...

Reducing False Positives in Financial Transactions with AutoML

In an increasingly digital world, combating financial fraud is a high-stakes game. However, the systems we deploy to safeguard ourselves are raising too many false alarms, with over 90% of fraud alerts being false positives. These false positives, not only frustrating for consumers but also costly for financial institutions, can eclipse t...

Winner's Insight: Navigating the Parkinson's Disease Prediction Challenge with AI

Parkinson’s disease, a condition affecting movement, cognition, and sleep, is escalating rapidly. By 2037, it is projected that around 1.6 million U.S. residents will be confronting this disease, resulting in significant societal and economic challenges. Studies have hinted that disruptions in proteins or peptides could be instrumental in...

Generating LLM Powered Apps using H2O LLM AppStudio – Part1: Sketch2App

sketch2app is an application that lets users instantly convert sketches to fully functional AI applications. This blog is Part 1 of the LLM AppStudio Blog Series and introduces sketch2app. The H2O.ai team is dedicated to democratizing AI and making it accessible to everyone. One of the focus areas of our team is to simplify the adoption of...

H2O LLM DataStudio: Streamlining Data Curation and Data Preparation for LLMs related tasks
by Shivam Bansal, Sanjeepan Sivapiran, Nishaanthini Gnanavel | June 14, 2023 | Data, Data Preparation, H2O LLM Studio, Large Language Models, NLP, h2oGPT

A no-code application and toolkit to streamline data preparation tasks related to Large Language Models (LLMs). H2O LLM DataStudio is a no-code application designed to streamline data preparation tasks specifically for Large Language Models (LLMs). It offers a comprehensive range of preprocessing and preparation functions such as text cl...

Navigating the challenges of time series forecasting
by Jon Farland | April 12, 2023 | Time Series

Jon Farland is a Senior Data Scientist and Director of Solutions Engineering for North America at H2O.ai. For the last decade, Jon has worked at the intersection of research, technology and energy sectors with a focus on developing large scale and real-time hierarchical forecasting systems. The machine learning models that drive these for...

10 Tips to Become a Successful Data Scientist
by Favio Vazquez | January 19, 2023 | AutoML, Beginners, Data Science

Data science is here to stay. Data scientists use their skills to help companies make better decisions about their products and services, optimize processes, cut costs, and improve profitability. Becoming a successful data scientist involves many aspects and continuous study, since it is a...

H2O.ai at NeurIPS 2022
by Marcos V. | December 06, 2022 | AI4Good, Data Science, Machine Learning

H2O.ai is proud to participate in the 36th Conference on Neural Information Processing Systems (NeurIPS) 2022, one of the biggest and most prestigious international conferences in artificial intelligence. NeurIPS 2022 will be a Hybrid Conference from Monday, November 28th through Friday, December 9th, with an in-person event at the New Or...

Make with H2O.ai Recap: Validation Scheme Best Practices

Data Scientist and Kaggle Grandmaster, Dmitry Gordeev, presented at the Make with H2O.ai session on validation scheme best practices, our second accuracy masterclass. The session covered key concepts, different validation methods, data leaks, practical examples, and validation and ensembling. Key Concepts While the validation topics cove...

Data Science with H2O.ai: An Introduction to Machine Learning and Predictive Modeling

Our own Jonathan Farland recently recorded a talk about machine learning and predictive modeling. In his talk, Jon also gave an overview of open source H2O and H2O AI Cloud. This video is a great resource for getting up to speed with the latest technology from H2O in half an hour. Some of you may prefer to go through the slides while l...

Tackling Illegal, Unreported, and Unregulated (IUU) Fishing with AI
by Ryan Chesler, Guanshuo Xu | February 28, 2022 | AI4Good, Computer Vision, Deep Learning, H2O AI Cloud, Kaggle, Solutions

According to a report by the High-Level Panel for a Sustainable Ocean Economy, it is estimated that illegal, unreported, and unregulated (IUU) fishing accounts for 20 percent of the seafood and up to 50 percent in some areas. These activities not only affect the marine ecosystem but, in a way, are linked to climate change on the planet a...

Revisiting the Miracle of Istanbul
by H2O.ai Team | January 25, 2022 | Data Journalism, Sports

Introduction: On May 25th, 2005, the UEFA Champions League final between AC Milan and Liverpool was held at the Atatürk Olympic Stadium in Istanbul. The match is still considered one of the greatest finals in football history. AC Milan took a 3-0 lead in the first half but Liverpool made a miraculous comeback in the second half to tie the g...

Shapley Values - A Gentle Introduction
by Adam Murphy | January 11, 2022 | Data Science, Shapley, Technical

"If you can’t explain it to a six-year-old, you don’t understand it yourself." – Albert Einstein. One fear caused by machine learning (ML) models is that they are black boxes that cannot be explained. Some are so complex that no one, not even domain experts, can understand why they make certain decisions. This is of particular concern when s...

1st Place Winner's Blog - Kaggle 2021 Data Science and Machine Learning Survey
by Shivam Bansal, KunHao Yeh | January 04, 2022 | Data Journalism, Data Science, Kaggle

Kaggle, the largest global community of data scientists, conducted the 5th annual industry-wide survey that presented a truly comprehensive view of the state of data science and machine learning. A total of 25,973 responses were collected from participants from over 60 countries. Kaggle also launched the Data Science Survey Challenge in w...

An Introduction to Time Series Modeling: Traditional Time Series Models and Their Limitations
by Adam Murphy | December 03, 2021 | H2O AI Cloud, Time Series

In the first article in this series, we broke down the preprocessing and feature engineering techniques needed to build high-performing time series models. But we didn’t discuss the models themselves. In this article, we will dig into this. As a quick refresher, time series data has time on the x-axis and the value you are measuring (dema...

Amazon Redshift Integration for H2O.ai Model Scoring
by Eric Gudgion | November 22, 2021 | Data Science, H2O AI Cloud

We consistently work with our partners on innovative ways to use models in production here at H2O.ai, and we are excited to demonstrate our AWS Redshift integration for model scoring. Amazon Redshift is a very popular data warehouse on AWS. We wanted to expand on the existing capacities of using data from Redshift to train a model on the ...

MLB Player Digital Engagement Forecasting
by Jo-Fai Chow | October 29, 2021 | Kaggle, Machine Learning

Are you a baseball fan? If so, you may notice that things are heating up right now as the Major League Baseball (MLB) World Series between the Houston Astros and Atlanta Braves is tied at 1-1. (Figure: MLB Postseason 2021 results as of October 28.) This also reminded me of the MLB Player Digital Engagement Forecasting competition in which my coll...

An Introduction to Time Series Modeling: Time Series Preprocessing and Feature Engineering
by Adam Murphy | October 26, 2021 | Time Series

Time is the only nonrenewable resource – Sri Ambati, Founder and CEO, H2O.ai. Prediction is very difficult, especially if it’s about the future – Niels Bohr, Nobel Prize-Winning Physicist. Despite its inherent difficulty, every business needs to make predictions. You may want to forecast sales or estimate demand or gauge future inventory ...

Time Series Forecasting Best Practices
by Jo-Fai Chow | October 15, 2021 | H2O AI Cloud, Technical, Time Series

Earlier this year, my colleague Vishal Sharma gave a talk about time series forecasting best practices. The talk was well-received so we decided to turn it into a blog post. Below are some of the highlights from his talk. You can also follow the two software demos and try it yourself using our H2O AI Cloud. (Note: The video links with ...

From the game of Go to Kaggle: The story of a Kaggle Grandmaster from Taiwan
by Parul Pandey | September 13, 2021 | Kaggle, Makers

In conversation with Kunhao Yeh: A Data Scientist and Kaggle Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand...

What does it take to win a Kaggle competition? Let's hear it from the winner himself.
by Parul Pandey | June 14, 2021 | Data Science, Kaggle, Makers

In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster. In this interview, I shall be ...

What it takes to become a World No 1 on Kaggle
by Parul Pandey | May 03, 2021 | Data Science, Kaggle, Machine Learning, Makers

In conversation with Guanshuo Xu: A Data Scientist, Kaggle Competitions Grandmaster, and a Ph.D. in Electrical Engineering. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. The intention behind these interviews...

Safer Sailing with AI
by Ana Visneski, Jo-Fai Chow, Kim Montgomery | April 01, 2021 | Customers, Data Science, H2O Hydrogen Torch, H2O-3, Machine Learning Interpretability

In the last week, the world watched as responders tried to free a cargo ship that had gone aground in the Suez Canal. This incident blocked traffic through a waterway that is critical for commerce. While the location was an unusual one, ship collisions, allisions, and groundings are not uncommon. With all the technology that mariners hav...

H2O AI Cloud: Democratizing AI for Every Person and Every Organization

Harnessing AI’s true potential by enabling every employee, customer, and citizen with sophisticated AI technology and easy-to-use AI applications. Democratization is an essential step in the development of AI, and AutoML technologies lie at the heart of it. AutoML tools have played a pivotal role in transforming the way we consume an...

Using Python's datatable library seamlessly on Kaggle
by Parul Pandey, Rohan Rao | February 03, 2021 | Data Munging, Data Science, Datatable

Managing large datasets on Kaggle without fear of out-of-memory errors. Datatable is a Python package for manipulating large dataframes. It has been created to provide big data support and enable high performance. This toolkit resembles pandas very closely but is more focused on speed. It supports out-of-memory datasets, multi-thr...

Meet the Data Scientist who just cannot stop winning on Kaggle.
by Parul Pandey | January 15, 2021 | Kaggle

In conversation with Philipp Singer: A Data Scientist, Kaggle Double Grandmaster, and a Ph.D. in Computer Science. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate an...

Grandmaster Series: The inspiring journey of the ‘Beluga’ of Kaggle World 🐋
by Parul Pandey | December 14, 2020 | Kaggle, Machine Learning

In conversation with Gábor Fodor: A Data Scientist at H2O.ai and a Kaggle Competitions’ Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage othe...

Automate your Model Documentation using H2O AutoDoc
by Parul Pandey | November 19, 2020 | Data Science, H2O Driverless AI

Create model documentation for Supervised learning models in H2O-3 and Scikit-Learn — in minutes. The Federal Reserve’s 2011 guidelines state that without adequate documentation, model risk assessment and management would be ineffective. A similar requirement is put forward today by many regulatory and corporate governance bodies. Thus ...

From GLM to GBM – Part 2

How an Economics Nobel Prize could revolutionize insurance and lending. Part 2: The Business Value of a Better Model. Introduction: In Part 1, we proposed better revenue and managing regulatory requirements with machine learning (ML). We made the first part of the argument by showing how gradient boosting machines (GBM), a type of ML, can mat...

From GLM to GBM - Part 1

How an Economics Nobel Prize could revolutionize insurance and lending. Part 1: A New Solution to an Old Problem. Introduction: Insurance and credit lending are highly regulated industries that have relied heavily on mathematical modeling for decades. In order to provide explainable results for their models, data scientists and statisticians i...

Lessons of COVID-19 and Moving Forward: Key Takeaways
by Ingrid Burton | May 01, 2020 | AI4Good, Community, Company, Data Science

This week, we hosted our second virtual panel focused on how AI can empower healthcare organizations to make better decisions and save lives. Improved forecasting and predictions lead to higher chances in managing and mitigating adverse events, such as the COVID-19 pandemic. I’m proud to acknowledge that H2O.ai is committed to helping cus...

Brief Perspective on Key Terms and Ideas in Responsible AI

Introduction: As fields like explainable AI and ethical AI have continued to develop in academia and industry, we have seen a litany of new methodologies that can be applied to improve our ability to trust and understand our machine learning and deep learning models. As a result of this, we’ve seen several buzzwords emerge. In this short po...

Three Ways Data and AI is Helping Against COVID19
by Niki Athanasiadou | April 01, 2020 | AI4Good, Data Science, Healthcare, Machine Learning

We are in the midst of a global crisis that epidemiologists have warned us about. As of today, 180 countries and sovereign regions have confirmed cases of patients infected with COVID19 (from here). Putting aside evidence that indicates the virulence of the disease could be much worse, the fast spread of the virus and the presence of hi...

Igniting the AI in Healthcare Community
by David Engler | March 28, 2020 | AI4Good, Community, Data Science, Healthcare

Yesterday we held our first Community Discussion on AI in Healthcare. Our CEO and founder, Sri Ambati, led the discussion between Niki Athanasiadou, Marios Michailidis, one of our Grandmasters, and myself. We had nearly 1,300 participants registered from over 45 countries, and over half of those joined live; others are viewing the replay. ...

COVID-19: Doing Good with Data + AI
by David Engler, Marios Michailidis | March 26, 2020 | AI4Good, Data Science, Healthcare, Machine Learning, Time Series

During times of severe societal strain, individuals have historically shown an inclination to offer aid and assistance. Often these sacrifices have been at great cost to life or livelihood. In other cases, the efforts have been seemingly more mundane but nevertheless still essential. The efforts of the over 10,000 women code breakers of W...

How H2O.ai is Reinventing Healthcare with AI
by Parul Pandey | March 23, 2020 | AI4Good, Data Science, Healthcare

H2O.ai is hosting a virtual Meetup on AI and Healthcare: Best Practices for Better Outcomes. Join us on 26th March, for a community discussion to collaborate with us and leading healthcare organizations to share ideas and best practices including predicting hospital staffing needs, ICU transfers, as well as sepsis detection and more. Reg...

Summary of a Responsible Machine Learning Workflow

A paper resulting from a collaboration between H2O.AI and BLDS, LLC was recently published in a special “Machine Learning with Python” issue of the journal, Information (https://www.mdpi.com/2078-2489/11/3/137). In “A Responsible Machine Learning Workflow with Focus on Interpretable Models, Post-hoc Explanation, and Discrimination Testing...

It is a privilege to serve the world in its hour of need – H2O.ai response to the COVID-19 pandemic

During the COVID-19 pandemic, our world, our nations, states, counties, cities and communities face an unprecedented challenge with an urgent need to help our citizens and ultimately our national and global economy. At highest risk are senior citizens, at-risk populations (individuals with immunodeficiency, hypertension, diabetes) and our...

Detecting Money Laundering Networks Using H2O Driverless AI
by Parul Pandey, Ashrith Barthur, Sandip Sharma | March 05, 2020 | Anti-Money Laundering, Data Science, Financial Services, H2O Driverless AI

Note: Dr. Ashrith Barthur (Principal Security Scientist, H2O.ai) and Sandip Sharma (Director of Solution Engineering, H2O.ai) will be speaking about solving money laundering and other real-world problems using machine learning at our upcoming webinar. You can grab a spot here. Artificial Intelligence has evolved from being a buzzword t...

AI & ML Platforms: My Fresh Look at H2O.ai Technology

2020: A new year, a new decade, and with that, I’m taking a new and deeper look at the technology H2O.ai offers for building AI and machine learning systems. I’ve been interested in H2O.ai since its early days as a company (it was 0xdata back then) in 2014. My involvement had been only peripheral, but now I’ve begun to work with this comp...

Interview with Patrick Hall | Machine Learning, H2O.ai & Machine Learning Interpretability

In this episode of Chai Time Data Science, Sanyam Bhutani interviews Patrick Hall, Sr. Director of Product at H2O.ai. Patrick has a background in Math and has completed an MS course in Analytics. In this interview they talk all about Patrick’s journey into ML, ML Interpretability and his journey at H2O.ai, how his work has ev...

Key Takeaways from the 2020 Gartner Magic Quadrant for Data Science and Machine Learning

We are named a Visionary in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms (Feb 2020). We have been positioned furthest to the right for completeness of vision among all the vendors evaluated in the quadrant. So let’s walk you through the key strengths of our machine learning platforms. Automatic Machine Learn...

Speed up your Data Analysis with Python’s Datatable package
by Parul Pandey | February 05, 2020 | Data Munging, Data Science, Datatable, H2O Driverless AI

A while ago, I did a write-up on Python’s Datatable library. The article was an overview of the datatable package whose focus is on big data support and high performance. The article also compared datatable’s performance with the pandas library on certain parameters. This is the second article in the series with a two-fold objective: ...

Parallel Grid Search in H2O

H2O-3 is, at its core, a platform for distributed, in-memory computing. On top of the distributed computation platform, the machine learning algorithms are implemented. At H2O.ai, we design every operation, be it data transformation, training of machine learning models, or even parsing, to utilize the distributed computation model. In ord...

The Super Bowl and Data Science: Changing the NFL with the Power of Machine Learning
by Rafael Coss | January 31, 2020 | Data Science, H2O-3, Kaggle, Machine Learning

Super Bowl LIV came and went. The San Francisco 49ers vs the Kansas City Chiefs. Personally, being from the Bay, I was rooting for the 49ers, but you can’t always get what you want. Whoever came out on top, though, we were all looking forward to a great game full of fantastic plays and the kind of gridiron tenacity where players lay i...

Grandmaster Series: How a Passion for Numbers Turned This Mechanical Engineer into a Kaggle Grandmaster

In conversation with Sudalai Rajkumar: A Kaggle Double Grandmaster and a Data Scientist at H2O.ai. It is rightly said that one should never seek praise. Instead, let the effort speak for itself. One of the essential traits of successful people is to never brag about their success but instead keep learning along the way. In the data science ...

Interview with Arno Candel | AutoML | Physics | CTDS.Show
by Sanyam Bhutani | December 12, 2019 | Community, Company, Data Science

In this episode, Sanyam Bhutani interviews Dr. Arno Candel, CTO at H2O.ai. They talk about Arno’s journey into the field, with amazing comments and insights by Arno applicable to the field. They talk all about Arno’s journey and ML and Automated Machine Learning, broadly speaking. Arno’s journey from Physics to Software Engineering to Machine L...

Climbing the AI and ML Maturity Model Curve
by Karthik Guruswamy | November 19, 2019 | Data Science, Machine Learning, Technical

AI/ML Maturity Model Curve/Steps: AI/ML Maturity models are published and updated periodically by a lot of vendors. The end goal is almost always about effecting transformation and automating processes in a short period and making AI the DNA/core of the business. One of the biggest challenges for businesses today is to clearly define what succ...

How to write a Transformer Recipe for Driverless AI
by Ashrith Barthur | November 18, 2019 | H2O Driverless AI, Machine Learning, Recipes

What is a transformer recipe? A transformer (or feature) recipe is a collection of programmatic steps, the same steps that a data scientist would write in code to build a column transformation. The recipe makes it possible to engineer the transformer in training and in production. The transformer recipe, and recipes, in general, provide a...

Takeaways from the World’s largest Kaggle Grandmaster Panel

Disclaimer: We were made aware by Kaggle of adversarial actions by one of the members of this panel. This panelist is no longer a Kaggle Grandmaster and no longer affiliated with H2O.ai as of January 10th, 2020. Personally, I’m a firm believer and fan of Kaggle and definitely look at it as the home of Data Science. ...

A Full-Time ML Role, 1 Million Blog Views, 10k Podcast Downloads: A Community Taught ML Engineer
by Sanyam Bhutani | October 17, 2019 | Data Science, Machine Learning Interpretability, Makers

Content originally posted in HackerNoon and Towards Data Science. 15th of October, 2019 marks a special milestone, actually quite a few milestones. So I considered sharing it in the form of a blog post, on a publication that has been home to all of my posts. The online community has been too kind to me and these blog posts have been a method ...

Predicting Failures from Sensor Data using AI/ML — Part 2
by Karthik Guruswamy | September 27, 2019 | H2O Driverless AI, Recipes, Technical

This is Part 2 of the blog post series and continuation of the original post, Predicting Failures from Sensor Data using AI/ML — Part 1. Missing Values & Data Imbalance: One of the things to note is that the hard-disk data set has a lot of missing values across its columns. Check out the Missing Data Heat Map on the training data set — ...

H2O Driverless AI: The Workbench for Data Science

This blog was written by Rohan Gupta and originally published here. 1. Introduction: In today’s world, being a Data Scientist is not limited to those with technical knowledge. While it is recommended and sometimes important to know a little bit of code, you can get by with just intuitive knowledge. Especially if you’re on H2O’s Driverle...

H2O Driverless AI Acceleration with Intel DAAL
by Rafael Coss | September 25, 2019 | Data Science, H2O Driverless AI, Machine Learning

This week at Strata NY 2019 we will be demoing a custom recipe that incorporates the Intel Data Analytics Acceleration Library (DAAL) algorithm into Driverless AI. This blog will provide an introduction to Intel DAAL and how the Make-Your-Own-Recipe capability extends H2O Driverless AI. If you are at Strata NY 2019, stop by the Intel bo...

Custom recipes for Driverless AI: Prophet and pmdarima cases
by Marios Michailidis | September 24, 2019 | H2O Driverless AI, Recipes, Technical

Last updated: 09/23/19. H2O Driverless AI provides a great new feature called “custom recipes”. These recipes are essentially custom snippets of code which can incorporate any machine learning algorithm, any scorer/metric and any feature transformer. A user can create custom recipes using Python utilizing any external library or his/her o...

New Innovations in Driverless AI

What’s new in Driverless AI: We’re super excited to announce the latest release of H2O Driverless AI. This is a major release with a ton of new features and functionality. Let’s quickly dig into all of that: Make Your Own AI with Recipes for Every Use Case: In the last year, Driverless AI introduced time-series and NLP recipes to meet the...

Detecting Sarcasm is difficult, but AI may have an answer
by Parul Pandey | August 05, 2019 | H2O Driverless AI, NLP, Recipes, Technical, Tutorials

Recently, while shopping for a laptop bag, I stumbled upon a pretty amusing customer review: “This is the best laptop bag ever. It is so good that within two months of use, it is worthy of being used as a grocery bag.” The innate sarcasm in the review is evident as the user isn’t happy with the quality of the bag. However, as the sentence...

Custom Machine Learning Recipes: The ingredients for success

Last updated: 07/23/19. Machine learning is akin to cooking in several ways. A perfect dish originates from a tried-and-tested recipe, has the right combination of ingredients, and is baked at just the right temperature. Successful AI solutions work on the same principle. One needs fresh and right quality ingredients in the form of data, ...

Leads to Leases

There is such a large amount of unstructured data being produced by companies. I personally find it so interesting that there is so much meaning and hidden value in text, audio, and visual content. Until recently, much of this data would go unused. However, since the rise of machine learning and artificial intelligence, it became possibl...

ArmadaHealth Uses AI to Match Patients with Specialists to Improve Health Outcomes
by Priya Jain | July 09, 2019 | Customers, Data Science, Healthcare

As an intern for H2O.ai, I am amazed to see how instrumental AI has been in transforming people’s lives for the better. Especially in healthcare, AI is bringing increased efficiency, ease, and helping people lead healthier lives. In this blog, I learned about how AI is helping potential patients find the right specialist for their needs a...

Toward AutoML for Regulated Industry with H2O Driverless AI

Predictive models in financial services must comply with a complex regime of regulations including the Equal Credit Opportunity Act (ECOA), the Fair Credit Reporting Act (FCRA), and the Federal Reserve’s S.R. 11-7 Guidance on Model Risk Management. Among many other requirements, these and other applicable regulations stipulate predictive ...

Underwrite.ai Transforms Credit Risk Decision-Making Using AI

Determining credit has been done by traditional techniques for decades. The challenge with traditional credit underwriting is that it doesn’t take into account all of the various aspects or features of an individual’s credit ability. Underwrite.ai, a new credit startup, saw this as an opportunity to apply machine learning and AI to impro...

The Reproductive Science Center of SF Bay Area uses AI to Treat Infertility

Having your own baby may be a dream that many people have but some cannot realize until they seek specialized help. The Reproductive Science Center of SF Bay Area is one of the pioneer organizations conducting in-vitro fertilization. They strive to produce healthy babies for their patients. However, every patient has their own set of obst...

An Overview of Python’s Datatable package

This blog originally appeared on Towardsdatascience.com. “There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days”: Eric Schmidt. If you are an R user, chances are that you have already been using the data.ta...

Building an Interpretable & Deployable Propensity AI/ML Model in 7 Steps…

To start with, you may have a tabular data set with a combination of dates/timestamps, categorical values, text strings, and numeric values. A business sponsor wants to build a Propensity to Buy model from historical data. How many steps does it take? Let’s find out. We are going to use H2O’s Driverless AI instance with 1 GPU (optional...

H2O.ai Automatic Machine Learning on Red Hat OpenShift Container Platform Delivers Data Science Ease and Flexibility at Scale
by Vinod Iyengar | May 14, 2019 | Cloud, Data Science, Demos, H2O Driverless AI

Last week at Red Hat Summit in Boston, Sri Ambati, CEO and Founder, demonstrated how to use our award-winning automatic machine learning platform, H2O Driverless AI, on Red Hat OpenShift Container Platform. You can watch the replay here. What we showed not only helps data scientists achieve results, it also enables them to scale their ...

AI/ML Projects — Don’t get stymied in the last mile

Data Scientists build AI/ML models from data, and then deploy them to production, in addition to a plethora of tasks around data insights, data cleansing etc. Part of the Data Scientist job description/requirement is making models available for transparency, auditability as well as explainability for both regulators as well as internal bu...

Hortifrut uses AI to Determine the Freshness of Blueberries

Who doesn’t love sweet, delicious blueberries? Providing a steady supply of beautiful, tasty berries to the market is no small effort and Hortifrut, based in Chile, has been growing and distributing berries for the last 30 years. Today, they are using AI to provide fresh berries to the world every day. Hortifrut, the largest global producer ...

Read more
Can Your Machine Learning Model Be Hacked?!

I recently published a longer piece on security vulnerabilities and potential defenses for machine learning models. Here’s a synopsis. Introduction: Today it seems like there are about five major varieties of attacks against machine learning (ML) models and some general concerns and solutions of which to be aware. I’ll address them one-by-o...

Read more
H2O World Explainable Machine Learning Discussions Recap

Earlier this year, in the lead up to and during H2O World, I was lucky enough to moderate discussions around applications of explainable machine learning (ML) with industry-leading practitioners and thinkers. This post contains links to these discussions, written answers and pertinent resources for some of the most common questions asked ...

Read more
H2O-3, Sparkling Water and Enterprise Steam Updates
by Venkatesh Yadav | April 10, 2019 Community , Data Science , H2O Release , Technical

We are excited to announce the new release of H2O Core, Sparkling Water and Enterprise Steam. Below are some of the new features we have added: H2O-3 Yates (3.24.0.1) – 3/31/2019. Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html Bug [PUBDEV-6159] – The AutoMLTest.java test suite now runs correctly on a local mach...

Read more
Building AI/ML models on Lending Club Data, with H2O.ai — Part 1
by Karthik Guruswamy, Vinod Iyengar | March 28, 2019 Beginners , Community , Data Journalism , Data Science , Technical , Tutorials

Lending Club publishes its basic loan databases to the public and a full version to its customers — anonymized of course. You can find the download page from this link (screenshot below): The publicly downloadable loan data has various attributes — roughly 150+ columns that have categorical, numeric, text and date fields. It also has a ‘...

Read more
How to explain a model with H2O Driverless AI

The ability to explain and trust the outcome of an AI-driven business decision is now a crucial aspect of the data science journey. There are many tools in the marketplace that claim to provide transparency and interpretability around machine learning models but how does one actually explain a model? H2O Driverless AI provides robust inte...

Read more
What is Your AI Thinking? Part 3

In the past two posts we’ve learned a little about interpretable machine learning in general. In this post, we will focus on how to accomplish interpretable machine learning using H2O Driverless AI. To review, the past two posts discussed: exploratory data analysis (EDA), accurate and interpretable models, global explanations, local...

Read more
What is Your AI Thinking? Part 2

Explaining AI to the Business Person: Welcome to part 2 of our blog series: What is Your AI Thinking? We will explore some of the most promising testing methods for enhancing trust in AI and machine learning models and systems. We will also cover the best practice of model documentation from a business and regulatory standpoint. More Techniq...

Read more
What is Your AI Thinking? Part 1

Explaining AI to the Business Person: Explainable AI is in the news, and for good reason. Financial services companies have cited the ability to explain AI-based decisions as one of the critical roadblocks to further adoption of AI for their industry. Moreover, interpretability, fairness, and transparency of data-driven decision support sy...

Read more
What Business Leaders Need to Know About AI
by Ingrid Burton | January 11, 2019 Beginners , Community , Data Journalism , Data Science

The interest around artificial intelligence (AI) is at an all-time fevered pitch right now, and it’s important to understand why. AI can solve real business problems and address very complex situations. Organizations and business leaders should start with the idea of how AI can help by identifying a business problem or use case that they c...

Read more
For Today’s BI Analyst - Accelerating your AI/ML efforts with Driverless AI
by Karthik Guruswamy | December 10, 2018 Data Science , H2O Driverless AI

Whether you are starting out as a novice data scientist or are a veteran in AI and Machine Learning, modern tools can guide you in creating some of the best models from your data, not to mention the ease of moving models to production. Also, don’t forget the experienced BI Analysts in your organization, who want to play with data science, only t...

Read more
Anomaly Detection with Isolation Forests using H2O
by Martin Barus | November 06, 2018 Data Science , H2O-3

Introduction: Anomaly detection is a common data science problem where the goal is to identify odd or suspicious observations, events, or items in our data that might be indicative of some issues in our data collection process (such as broken sensors, typos in collected forms, etc.) or unexpected events like security breaches, server failu...

Read more
Launching the Academic Program … OR ... What Made My First Four Weeks at H2O.ai so Special!

We just launched the H2O.ai Academic Program at our sold-out H2O World London. With nearly 1000 people in attendance, we received the first online sign-up forms submitted by professors and students alike. This program will massively democratize AI in academia, increasing the number of AI-skilled graduates – with both technical and busine...

Read more
How This AI Tool Breathes New Life Into Data Science

Ask any data scientist in your workplace. Any Data Science Supervised Learning ML/AI project will go through many steps and iterations before it can be put in production. Starting with the question of “Are we solving for a regression or classification problem?” Data Collection & Curation Are there Outliers? What is the Distribu...

Read more
What does NVIDIA’s Rapids platform mean for the Data Science community?

Today NVIDIA announced the launch of the RAPIDS suite of software libraries to enable GPU acceleration for data science workflows, and we’re excited to partner with NVIDIA to bring GPU-accelerated open source technology to the machine learning and AI community. “Machine learning is transforming businesses and NVIDIA GPUs are speeding...

Read more
Automatic Feature Engineering for Text Analytics - The Latest Addition to Our Kaggle Grandmasters' Recipes
by Jo-Fai Chow, Sudalai Rajkumar | September 12, 2018 Data Science , GPU , H2O Driverless AI , NLP

According to Kaggle’s ‘The State of Machine Learning and Data Science’ survey, text data is the second most used data type at work for data scientists. There are a lot of interesting text analytics applications like sentiment prediction, product categorization, document classification and so on. In the latest version (1.3) of our Driver...

Read more
H2O for Inexperienced Users
by H2O.ai Team | August 24, 2018 Beginners , Data Science , H2O-3 , Machine Learning

Some background: I am a rising senior in high school, and in the summer of 2018 I interned at H2O.ai. With no ML experience beyond Andrew Ng’s Introduction to Machine Learning course on Coursera and a couple of his deep learning courses, I initially found myself slightly overwhelmed by the variety of new algorithms H2O has to offer in both ...

Read more
The different flavors of AutoML
by Erin LeDell | August 15, 2018 AutoML , Data Science , H2O Driverless AI , H2O-3

In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software (e.g. H2O, scikit-learn, keras). Although these tools have made it easy to train and evaluate ma...

Read more
How to Frame Your Business Problem for Automatic Machine Learning

Over the last several years, machine learning has become an integral part of many organizations’ decision-making at various levels. With not enough data scientists to fill the increasing demand for data-driven business processes, H2O.ai has developed a product called Driverless AI that automates several time-consuming aspects of a typica...

Read more
AI in Healthcare - Redefining Patient & Physician Experiences
by H2O.ai Team | May 14, 2018 Community , Data Science , Deep Learning

Register for the Meetup Here Patients, physicians, nurses, health administrators and policymakers are beneficiaries of the rapid transformations in health and life sciences. These transformations are being driven by new discoveries (etiology, therapies, and drugs/implants), market reconfiguration and consolidation, a movement to value-bas...

Read more
Come meet the Makers!
by H2O.ai Team | March 26, 2018 Data Science , Events , H2O Driverless AI , H2O4GPU

NVIDIA’s GPU Technology Conference (GTC) Silicon Valley, March 26-29th is the premier AI and deep learning event, providing you with training, insights, and direct access to the industry’s best and brightest. It’s where you will see the latest breakthroughs in self-driving cars, smart cities, healthcare, high-performance computing, virtu...

Read more
New features in H2O 3.18
by H2O.ai Team | February 22, 2018 AutoML , Ensembles , H2O Release , XGBoost

Wolpert Release (H2O 3.18): There’s a new major release of H2O and it’s packed with new features and fixes! We named this release after David Wolpert, who is famous for inventing Stacking (aka Stacked Ensembles). Stacking is a central component in H2O AutoML, so we’re very grateful for his contributions to machine learning! He is also fa...

Read more
H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise
by H2O.ai Team | November 30, 2017 Data Science , Machine Learning

Read more
Laying a Strong Foundation for Data Science Work
by H2O.ai Team | November 24, 2017 Data Science , IT

By William Merchan, CSO, DataScience.com. In the past few years, data science has become the cornerstone of enterprise companies’ efforts to understand how to deliver better customer experiences. Even so, when DataScience.com commissioned Forrester to survey over 200 data-driven businesses last year, only 22% reported they were leverag...

Read more
H2O.ai Releases H2O4GPU, the Fastest Collection of GPU Algorithms on the Market, to Expedite Machine Learning in Python
by H2O.ai Team | September 26, 2017 GBM , GLM , GPU , k-Means

H2O4GPU is an open-source collection of GPU solvers created by H2O.ai. It builds on the easy-to-use scikit-learn Python API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn with support for GPUs on selected (and ever-growing) algorithms. H2O4GPU inherits all the existing scikit-learn algor...

Read more
Scalable Automatic Machine Learning: Introducing H2O's AutoML
by H2O.ai Team | June 21, 2017 AutoML , Ensembles , H2O Release , Technical

Prepared by: Erin LeDell, Navdeep Gill & Ray Peck. In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts and experts...

Read more
Stacked Ensembles and Word2Vec now available in H2O!

Prepared by: Erin LeDell and Navdeep Gill. Stacked Ensembles. R: ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train, base_models = my_models) Python: ensemble = H2OStackedEnsembleEstimator(base_models=my_models) ensemble.train(x=x, y=y, training...

Read more
Using Sentiment Analysis to Measure Election Surprise
by H2O.ai Team | December 01, 2016 Data Journalism

Sentiment Analysis is a powerful Natural Language Processing technique that can be used to compute and quantify the emotions associated with a body of text. One of the reasons that Sentiment Analysis is so powerful is because its results are easy to interpret and can give you a big-picture metric for your dataset. One recent event that ...

Read more
Creating a Binary Classifier to Sort Trump vs. Clinton Tweets Using NLP
by H2O.ai Team | October 17, 2016 Community , Data Journalism , Flow , Python

The problem: Can we determine if a tweet came from the Donald Trump Twitter account (@realDonaldTrump) or the Hillary Clinton Twitter account (@HillaryClinton) using text analysis and Natural Language Processing (NLP) alone? The solution: Yes! We’ll divide this tutorial into three parts, the first on how to gather the necessary data, t...

Read more
When is the Best Time to Look for Apartments on Craigslist?
by H2O.ai Team | October 06, 2016 Data Journalism

A while ago I was looking for an apartment in San Francisco. There are a lot of problems with finding housing in San Francisco, mostly stemming from the fierce competition. I was checking Craigslist every single day. It still took me (and my girlfriend) a few months to find a place — and we had to sublet for three weeks in between. Thankf...

Read more
Distracted Driving
by H2O.ai Team | September 16, 2016 Data Journalism

Last week, we started to examine the 7.2% increase in traffic fatalities from 2014 to 2015, the reversal of a near decade-long downward trend. We then broke out the data by various accident classifications, such as “speeding” or “driving with a positive BAC,” and identified those classifications that had the greatest increase. One label...

Read more
Fatal Traffic Accidents Rise in 2015
by H2O.ai Team | September 07, 2016 Data Journalism

On Tuesday, August 30th, the National Highway Traffic Safety Administration released its annual dataset of traffic fatalities, asking interested parties to use the dataset to identify the causes of an increase of 7.2% in fatalities from 2014 to 2015. As part of H2O.ai’s vision of using artificial intelligence for the betterment of soci...

Read more
Red herring bites
by H2O.ai Team | May 06, 2016 Data Munging , R-Bloggers , Technical

At the Bay Area R User Group in February I presented progress in big-join in H2O which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested so they can judge if it is relevant to them or not; and ii) show how it scales with...

Read more
Fast csv writing for R
by H2O.ai Team | April 24, 2016 Data Munging , R , R-Bloggers , Technical

R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table package in R solved csv import convenience and speed in 2...

Read more