H2O.ai Blog
Behind the scenes of CRAN
(Just from my point of view as a package maintainer.) New users of R might not appreciate the full benefit of CRAN and new package maintainers may not appreciate the importance of keeping their packages updated and free of warnings and errors. This is something I only came to realize myself in the last few years so I thought I would write...
What is new in H2O latest release 3.10.2.1 (Tutte)?
Today we released H2O version 3.10.2.1 (Tutte). It’s available on our Downloads page, and release notes can be found here. Photo Credit: https://en.wikipedia.org/wiki/W._T._Tutte Top enhancements in this release: GLM MOJO Support: GLM now supports our smaller, faster, more efficient MOJO (Model ObJect, Optimized) format for model pu...
Using Sentiment Analysis to Measure Election Surprise
Sentiment Analysis is a powerful Natural Language Processing technique that can be used to compute and quantify the emotions associated with a body of text. One reason Sentiment Analysis is so powerful is that its results are easy to interpret and can give you a big-picture metric for your dataset. One recent event that ...
Indexing 1 Billion Time Series with H2O and ISax
At H2O, we have recently debuted a new feature called ISax that works on time series data in an H2O Dataframe. ISax stands for Indexable Symbolic Aggregate ApproXimation, which means it can represent complex time series patterns using a symbolic notation, thereby reducing the dimensionality of your data. From there you can run H2O’s ML...
Why We Bought A Happy Diwali Billboard
It’s been a dark year in many ways, so we wanted to lighten things up and celebrate Diwali, the festival of lights! Diwali is a holiday that celebrates joy, hope, knowledge and all that is full of light: the perfect antidote for some of the more negative developments coming out of Silicon Valley recently. Throw in a polarizing pre...
Creating a Binary Classifier to Sort Trump vs. Clinton Tweets Using NLP
The problem: Can we determine if a tweet came from the Donald Trump Twitter account (@realDonaldTrump) or the Hillary Clinton Twitter account (@HillaryClinton) using text analysis and Natural Language Processing (NLP) alone? The solution: Yes! We’ll divide this tutorial into three parts, the first on how to gather the necessary data, t...
sparklyr: R interface for Apache Spark
This post is reposted from RStudio’s announcement on sparklyr – RStudio’s extension for Spark. Connect to Spark from R. The sparklyr package provides a complete dplyr backend. Filter and aggregate Spark datasets then bring them into R for analysis and visualization. Use Spark’s distributed machine learning library from R. Create...
When is the Best Time to Look for Apartments on Craigslist?
A while ago I was looking for an apartment in San Francisco. There are a lot of problems with finding housing in San Francisco, mostly stemming from the fierce competition. I was checking Craigslist every single day. It still took me (and my girlfriend) a few months to find a place — and we had to sublet for three weeks in between. Thankf...
Focus
———- Forwarded message ——— From: SriSatish Ambati Date: Thu, Sep 15, 2016 at 10:17 PM Subject: changes and all hands tomorrow. To: team Team, Our focus has changed towards larger fewer deals & deeper engagements with handful of finance and insurance customers. We took a hard look at our marketing spend, pr programs and personnel. We l...
Distracted Driving
Last week, we started to examine the 7.2% increase in traffic fatalities from 2014 to 2015, the reversal of a near decade-long downward trend. We then broke out the data by various accident classifications, such as “speeding” or “driving with a positive BAC,” and identified those classifications that had the greatest increase. One label...
Introducing H2O Community & Support Portals
At H2O, we enjoy serving our customers and the community, and we take pride in making them successful while using H2O products. Today, we are very excited to announce two great platforms for our customers and for the community to better communicate with H2O. Let’s start with our community first: The success of every open source project ...
Fatal Traffic Accidents Rise in 2015
On Tuesday, August 30th, the National Highway Traffic Safety Administration released their annual dataset of traffic fatalities, asking interested parties to use the dataset to identify the causes of an increase of 7.2% in fatalities from 2014 to 2015. As part of H2O.ai’s vision of using artificial intelligence for the betterment of soci...
IoT - Take Charge of Your Business and IT Insights Starting at the Edge
Instead of just being hype, the Internet of Things (IoT) is now becoming a reality. Gartner forecasts that 6.4 billion connected devices will be in use worldwide, and 5.5 million new devices will get connected every day, in 2016. These devices range from wearables, to sensors in vehicles that can detect surrounding obstacles, to sensors in...
Hyperparameter Optimization in H2O: Grid Search, Random Search and the Future
“Good, better, best. Never let it rest. ‘Til your good is better and your better is best.” – St. Jerome tl;dr: H2O now has random hyperparameter search with time- and metric-based early stopping. Bergstra and Bengio[1] write on p. 281: Compared with neural networks configured by a pure grid search, we find that random search over the s...
H2O GBM Tuning Tutorial for R
In this tutorial, we show how to build a well-tuned H2O GBM model for a supervised classification task. We specifically don’t focus on feature engineering and use a small dataset to allow you to reproduce these results in a few minutes on a laptop. This script can be directly transferred to datasets that are hundreds of GBs large and H...
Spam Detection with Sparkling Water and Spark Machine Learning Pipelines
This short post presents the “ham or spam” demo, which was posted earlier by Michal Malohlava, using our new API in the latest Sparkling Water for Spark 1.6 and earlier versions, unifying Spark and H2O Machine Learning pipelines. It shows how to create a simple Spark Machine Learning pipeline and a model based on the fitted pipe...
Interview with Carolyn Phillips, Sr. Data Scientist, Neurensic
During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the second of a multipart series recapping our conversations. Be sure to keep an eye out for updates by checking our website or following us on Twitter @h2oai. H2O.ai: How did you become a data scientist? Phillips: Until ...
Interview with Svetlana Kharlamova, Sr. Data Scientist, Grainger
During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the first of a multipart series recapping our conversations. Be sure to keep an eye out for updates by checking our website or following us on Twitter @h2oai. H2O.ai: How did you become a data scientist? Kharlamova: I’m a...
H2O Day at Capital One
Here at H2O.ai one of our most important partners is Capital One, and we’re proud to have been working with them for over a year. One of the world’s leading financial services providers, Capital One has a strong reputation for being an extremely data and technology-focused organization. That’s why when the Capital One team invited us to t...
Red herring bites
At the Bay Area R User Group in February I presented progress in big-join in H2O which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested so they can judge if it is relevant to them or not; and ii) show how it scales with...
Fast csv writing for R
R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table package in R solved csv import convenience and speed in 2...
Apache Spark and H2O on AWS
This is a guest post re-published with permission from our friends at Datapipe. The original lives here. One of the advantages of public cloud is the ability to experiment and run various workloads without the need to commit to purchasing hardware. However, to meet your data processing needs, a well-defined mapping between your objecti...
Connecting to Spark & Sparkling Water from R & RStudio
Sparkling Water offers the best of breed machine learning for Spark users. Sparkling Water brings all of H2O’s advanced algorithms and capabilities to Spark. This means that you can continue to use H2O from RStudio or any other IDE of your choice. This post will walk you through the steps to get running on plain R or RStudio from Spark. ...
Drink in the Data with H2O at Strata SJ 2016
It’s about to rain data in San Jose when Strata + Hadoop World comes to town March 29 – March 31st. H2O has a waterfall of action happening at the show. Here’s a rundown of what’s on tap. Keep it handy so you have less chance of FOMO (fear of missing out). Hang out with H2O at Booth #1225 to learn more about how machine learning can hel...
Road Ahead and BTUs
H2O.ai – Road Ahead – keynote presentation by Sri Ambati from Sri Ambati ...
Thank you, Cliff
Cliff resigned from the Company last week – He is parting on good terms and supports our success in future. Cliff and I worked closely since 2004 so this is a loss for me. It ends an era of prolific work supporting my vision as a partner. Let’s take this opportunity to congratulate Cliff on his work, in helping me build something from not...
The Top 10 Most Watched Videos From H2O World 2015
Now that we’re a few months out from H2O World we wanted to share with you all what the most popular talks were by online viewership. The talks covered a variety of topics from introductions, to in-depth examinations of use cases, to wide-ranging panels. Introduction to Data Science Featuring Erin LeDell, Statistician and Machine Learnin...