H2O.ai Blog
Filter By:
45 results Category: Year:Unsupervised Learning Metrics
That which is measured improves – Karl Pearson , Mathematician. Almost everyone has heard of accuracy, precision, and recall – the most common metrics for supervised learning . But not as many people know the metrics for unsupervised learning . So, in this article, we will take you through the most common methods and how to implement th...
Read moreA Quick Introduction to PyTorch: Using Deep Learning for Stock Price Prediction
Torch is a scalable and efficient deep learning framework. It offers flexibility and speed to build large scale applications. It also includes a wide range of libraries for developing speech, image, and video-based applications. The basic building block of Torch is called a tensor. All the operations defined in Torch use a tensor. Ok, l...
Read moreHow to Create Your Spotify EDA App with H2O Wave
In this article, I will show you how to build a Spotify Exploratory Data Analysis (EDA) app using H2O Wave from scratch.H2O Wave is an open-source Python development framework for interactive AI apps. You do not need to know Flask, HTML, CSS, etc. H2O Wave has ready-to-use user-interface components and charts, including dashboard templa...
Read moreAn Introduction to Unsupervised Machine Learning
There are three major branches of machine learning (ML): supervised, unsupervised, and reinforcement. Supervised learning makes up the bulk of the models businesses use, and reinforcement learning is behind front-page-news-AI such as AlphaGo . We believe unsupervised learning is the unsung hero of the three, and in this article, we brea...
Read moreInstall H2O Wave on AWS Lightsail or EC2
Note : this blog post was first published on Thomas’ personal blog Neural Market Trends . I recently had to set up H2O’s Wave Server on AWS Lightsail and build a simple Wave App as a Proof of Concept. If you’ve never heard of H2O Wave then you have been missing out on a new cool app development framework. We use it at H2O to build AI-ba...
Read moreShapley Values - A Gentle Introduction
If you can’t explain it to a six-year-old, you don’t understand it yourself. – Albert Einstein One fear caused by machine learning (ML) models is that they are blackboxes that cannot be explained. Some are so complex that no one, not even domain experts, can understand why they make certain decisions. This is of particular concern when s...
Read moreTime Series Forecasting Best Practices
Earlier this year, my colleague Vishal Sharma gave a talk about time series forecasting best practices. The talk was well-received so we decided to turn it into a blog post. Below are some of the highlights from his talk. You can also follow the two software demos and try it yourself using our H2O AI Cloud .(Note : The video links with ...
Read moreImproving NLP Model Performance with Context-Aware Feature Extraction
I would like to share with you a simple yet very effective trick to improve feature engineering for text analytics. After reading this article, you will be able to follow the exact steps and try it yourself using our H2O AI Cloud .First of all, let’s have a look at the off-the-shelf natural language processing (NLP) recipes in H2O Driver...
Read moreH2O Integrates with Snowflake Snowpark/Java UDFs: How to better leverage the Snowflake Data Marketplace and deploy In-Database
One of the goals of machine learning is to find unknown predictive features, even hidden from subject matter experts, in datasets that might not be apparent before, and use those 3rd party features to increase the accuracy of the model.A traditional way of doing this was to try and scrape and scour distributed, stagnant data sources on th...
Read moreH2O on Kubernetes using Helm
Deploying real-world applications using bare YAML files to Kubernetes is a rather complex task, and H2O is no exception. As demonstrated in one of the previous blog posts . Greatly simplified, a cluster of H2O open source machine learning nodes is brought up in the following manner: A headless service to make initial node discovery and ...
Read moreCombining the power of KNIME and H2O.ai in a single integrated workflow
KNIME and H2O.ai , the two data science pioneers known for their open source platforms, have partnered to further democratize AI. Our approaches are about being open, transparent, and pushing the leading edge of AI. We believe strongly that AI is not for the select few but for everyone. We are taking another step in democratizing AI by ...
Read moreEmpowering Snowflake Users with AI using SQL
At H2O.ai we work with many enterprise customers, all the way from Fortune 500 giants to small startups. What we heard from all these customers as they embark on their data science and machine learning journey is the need to capture and manage more data cost-effectively, and the ability to share that data across their organization to mak...
Read moreKey Takeaways from the 2020 Gartner Magic Quadrant for Data Science and Machine Learning
We are named a Visionary in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms (Feb 2020). We have been positioned furthest to the right for completeness of vision among all the vendors evaluated in the quadrant. So let’s walk you through the key strengths of our machine learning platforms. Automatic Machine Learn...
Read moreBlink: Data to AI/ML Production Pipeline Code in Just a Few Clicks
You have the data and now want to build a really really good AI/ML model and deliver to production. There are three options available today: Write the code yourself in a Jupyter notebook/R Studio etc., for training/validation and dev-ops model handoff. You decided to do the feature engineering also. Build your own features like above,...
Read moreParallel Grid Search in H2O
H2O-3 is, at its core, a platform for distributed, in-memory computing. On top of the distributed computation platform, the machine learning algorithms are implemented. At H2O.ai, we design every operation, be it data transformation, training of machine learning models or even parsing to utilize the distributed computation model. In ord...
Read moreScalable AutoML in H2O
Note: I’m grateful to Dr. Erin LeDell for the suggestions, corrections with the writeup. All of the images used here are from the talks’ slides. Erin Ledell’s talk was aimed at AutoML : Automated Machine Learning , broadly speaking, followed by an overview of H2O’s Open Source Project and the library. H2O AutoML provides an easy-to-use ...
Read moreClimbing the AI and ML Maturity Model Curve
AI/ML Maturity Model Curve/StepsAI/ML Maturity models are published and updated periodically by a lot of vendors. The end goal is almost always about effecting transformation and automate processes in a short period and making AI the DNA/core of the business.One of the biggest challenges for businesses today is to clearly define what succ...
Read moreImporting, Inspecting, and Scoring With MOJO Models Inside H2O
Machine-learning models created with H2O may be exported in two basic ways: Binary format, Model Object, Optimized (MOJO). An H2 O model can be saved in a binary format, which is tied to the very specific version of H2 O it has been created with. There are multiple reasons for such a restriction. One of the important reasons is that...
Read moreA Deep Dive into H2O’s AutoML
The demand for machine learning systems has soared over the past few years. This is majorly due to the success of Machine Learning techniques in a wide range of applications. AutoML is fundamentally changing the face of ML-based solutions today by enabling people from diverse backgrounds to use machine learning models to address complex ...
Read moreMake your own AI — Add Your Game to Auto-ML Models
When Features and Algorithms compete, your Business Use Case(s) wins! H2O Driverless AI is an Automatic Feature Engineering /Machine Learning platform to build AI/ML models on tabular data. Driverless AI can build supervised learning models for Time Series forecasts, Regression , Classification , etc. It supports a myriad of built-i...
Read morePredicting Failures from Sensor Data using AI/ML — Part 2
This is Part 2 of the blog post series and continuation of the original post, Predicting Failures from Sensor Data using AI/ML — Part 1 .Missing Values & Data ImbalanceOne of the things to note is that the hard-disk data set has a lot of missing values across its columns. Check out the Missing Data Heat Map on the training data set — ...
Read moreH2O Driverless AI: The Workbench for Data Science
This blog was written by Rohan Gupta and originally published here. 1. IntroductionIn today’s world, being a Data Scientist is not limited to those without technical knowledge. While it is recommended and sometimes important to know a little bit of code, you can get by with just intuitive knowledge. Especially if you’re on H2O’s Driverle...
Read moreCustom recipes for Driverless AI: Prophet and pmdarima cases
Last updated: 09/23/19 H2O Driverless AI provides a great new feature called “custom recipes”. These recipes are essentially custom snippets of code which can incorporate any machine learning algorithm , any scorer/metric and any feature transformer. A user can create custom recipes using python utilizing any external library or his/her o...
Read morePredicting Failures from Sensor Data using AI/ML— Part 1
Last updated: 08/26/19 Whether it’s healthcare, manufacturing or anything that we depend on either personal or in business, Prevention of a problem is always known to be better than cure! Classic prevention techniques involve time-based checks to see how things are progressing, positively or negatively. Time-based chec...
Read moreNew Innovations in Driverless AI
What’s new in Driverless AIWe’re super excited to announce the latest release of H2O Driverless AI . This is a major release with a ton of new features and functionality. Let’s quickly dig into all of that: Make Your Own AI with Recipes for Every Use Case: In the last year, Driverless AI introduced time-series and NLP recipes to meet the...
Read moreDetecting Sarcasm is difficult, but AI may have an answer
Recently, while shopping for a laptop bag, I stumbled upon a pretty amusing customer review: “This is the best laptop bag ever. It is so good that within two months of use, it is worthy of being used as a grocery bag.” The innate sarcasm in the review is evident as the user isn’t happy with the quality of the bag. However, as the sentence...
Read moreGetting started with H2O using Flow
This blog was originally published on towardsdatascience: https://towardsdatascience.com/getting-started-with-h2o-using-flow-b560b5d969b8A look into H2O’s open-source UI for combining code execution, text, plots, and rich media in a single document. Data collection is easy. Decision making is hard. Today, we have access to a humungous...
Read moreAn Overview of Python’s Datatable package
This blog originally appeared on Towardsdatascience.com “There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days”: Eric Schmidt If you are an R user, chances are that you have already been using the data.ta...
Read moreH2O-3, Sparkling Water and Enterprise Steam Updates
We are excited to announce the new release of H2O Core, Sparkling Water and Enterprise Steam.Below are some of the new features we have added:H2O-3 Yates (3.24.0.1) – 3/31/2019Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html Bug [PUBDEV-6159] – The AutoMLTest.java test suite now runs correctly on a local mach...
Read moreBuilding AI/ML models on Lending Club Data, with H2O.ai — Part 1
Lending Club publishes its basic loan databases to the public and a full version to its customers — anonymized of course. You can find the download page from this link (screenshot below): The publicly downloadable loan data has various attributes — roughly 150+ columns that have categorical, numeric, text and date fields. It also has a ‘...
Read moreAI/ML Model Scoring - What Good Looks Like in Production
One of the main reasons why we build AI/Machine Learning models is for it to be used in production to support expert decision making. Whether your business is deciding what creatives your customers should be getting on emails or determining a product recommendation for a web page, AI/Models provide relevance/context to customers to drive ...
Read moreHow This AI Tool Breathes New Life Into Data Science
Ask any data scientist in your workplace. Any Data Science Supervised Learning ML/AI project will go through many steps and iterations before it can be put in production. Starting with the question of “Are we solving for a regression or classification problem?” Data Collection & Curation Are there Outliers? What is the Distribu...
Read moreH2O’s AutoML in Spark
This blog post demonstrates how H2O’s powerful automatic machine learning can be used together with the Spark in Sparkling Water.We show the benefits of Spark & H2O integration, use Spark for data munging tasks and H2O for the modelling phase, where all these steps are wrapped inside a Spark Pipeline. The integration between Spark and...
Read moreH2O-3 on FfDL: Bringing deep learning and machine learning closer together
This post originally appeared in the IBM Developer blog here. This post is co-authored by Animesh Singh, Nicholas Png, Tommy Li, and Vinod Iyengar. Deep learning frameworks like TensorFlow, PyTorch, Caffe, MXNet, and Chainer have reduced the effort and skills needed to train and use deep learning models. But for AI developers and data ...
Read moreScalable Automatic Machine Learning: Introducing H2O's AutoML
Prepared by: Erin LeDell, Navdeep Gill & Ray Peck In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts and experts...
Read moreH2O announces GPU Open Analytics Initiative with MapD & Continuum
H2O.ai, Continuum Analytics, and MapD Technologies have announced the formation of the GPU Open Analytics Initiative (GOAI) to create common data frameworks enabling developers and statistical researchers to accelerate data science on GPUs. GOAI will foster the development of a data science ecosystem on GPUs by allowing resident applicat...
Read moreUse H2O.ai on Azure HDInsight
This is a repost from this article on MSDN. We’re hosting an upcoming webinar to present you how to use H2O on HDInsight and to answer your questions. Sign up for our upcoming webinar on combining H2O and Azure HDInsight. We recently announced that H2O and Microsoft Azure HDInsight have integrated to provide Data Scientists with a Lead...
Read moreSparkling Water on the Spark-Notebook
This is a guest post from our friends at Kensu. In the space of Data Science development in enterprises, two outstanding scalable technologies are Spark and H2O. Spark is a generic distributed computing framework and H2O is a very performant scalable platform for AI. Their complementarity is best exploited with the use of Sparkling Wat...
Read moreStacked Ensembles and Word2Vec now available in H2O!
Prepared by: Erin LeDell and Navdeep Gill MathJax.Hub.Config({ tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]} }); Stacked Ensembles ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train, base_models = my_models) Python:ensemble = H2OStackedEnsembleEstimator(base_models=my_models) ensemble.train(x=x, y=y, training...
Read moreStart Off 2017 with Our Stanford Advisors
We were very excited to meet with our advisors (Prof. Stephen Boyd, Prof. Rob Tibshirani and Prof. Trevor Hastie) at H2O.AI on Jan 6, 2017. Professors Boyd, Tibshirani & Hastie in the house! @h2oai #elementsofstatisticallearning #MachineLearning pic.twitter.com/FnlCNrY7Hy — H2O.ai (@h2oai) January 6, 2017 Our CEO, Sri Ambati, ma...
Read moreIndexing 1 Billion Time Series with H2O and ISax
At H2O, we have recently debuted a new feature called ISax that works on time series data in an H2O Dataframe. ISax stands for Indexable Symbolic Aggregate ApproXimation, which means it can represent complex time series patterns using a symbolic notation and thereby reducing the dimensionality of your data. From there you can run H2O’s ML...
Read moreHyperparameter Optimization in H2O: Grid Search, Random Search and the Future
“Good, better, best. Never let it rest. ‘Til your good is better and your better is best.” – St. Jerome tl;drH2O now has random hyperparameter search with time- and metric-based early stopping. Bergstra and Bengio[1] write on p. 281: Compared with neural networks configured by a pure grid search, we find that random search over the s...
Read moreSpam Detection with Sparkling Water and Spark Machine Learning Pipelines
This short post presents the “ham or spam” demo, which has already been posted earlier by Michal Malohlava , using our new API in latest Sparkling Water for Spark 1.6 and earlier versions, unifying Spark and H2O Machine Learning pipelines. It shows how to create a simple Spark Machine Learning pipeline and a model based on the fitted pipe...
Read moreRed herring bites
At the Bay Area R User Group in February I presented progress in big-join in H2O which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested so they can judge if it is relevant to them or not; and ii) show how it scales with...
Read moreFast csv writing for R
R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table package in R solved csv import convenience and speed in 2...
Read more