Behind the scenes of CRAN
by Team | December 28, 2016 | R, R-Bloggers

(Just from my point of view as a package maintainer.) New users of R might not appreciate the full benefit of CRAN, and new package maintainers may not appreciate the importance of keeping their packages updated and free of warnings and errors. This is something I only came to realize myself in the last few years, so I thought I would write...

Read more
What is new in the latest H2O release (Tutte)?
by Team | December 23, 2016 | Community, H2O Release

Today we released H2O version (Tutte). It’s available on our Downloads page, and release notes can be found here. Top enhancements in this release: GLM MOJO Support: GLM now supports our smaller, faster, more efficient MOJO (Model ObJect, Optimized) format for model pu...

Read more
Using Sentiment Analysis to Measure Election Surprise
by Team | December 01, 2016 | Data Journalism

Sentiment Analysis is a powerful Natural Language Processing technique that can be used to compute and quantify the emotions associated with a body of text. One of the reasons Sentiment Analysis is so powerful is that its results are easy to interpret and can give you a big-picture metric for your dataset. One recent event that ...

Read more
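The excerpt doesn't show the post's own pipeline. As a rough illustration of how a body of text can be turned into a single sentiment number, here is a minimal lexicon-based sketch in Python. The tiny `LEXICON` and the `sentiment_score` and `corpus_sentiment` helpers are hypothetical; real analyses use curated lexicons or trained models.

```python
import re

# Hypothetical toy lexicon: word -> sentiment weight (positive = happier).
# Real analyses use curated lexicons (e.g. AFINN-style word scores) or trained models.
LEXICON = {
    "good": 2, "great": 3, "happy": 3, "hope": 2, "win": 2,
    "bad": -2, "terrible": -3, "sad": -2, "fear": -2, "lose": -2,
}


def sentiment_score(text, lexicon=LEXICON):
    """Average lexicon weight per token; words not in the lexicon count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(lexicon.get(token, 0) for token in tokens) / len(tokens)


def corpus_sentiment(texts, lexicon=LEXICON):
    """Big-picture metric: the mean per-text score over a collection of texts."""
    scores = [sentiment_score(text, lexicon) for text in texts]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(sentiment_score("great news, I feel happy"))  # (3 + 3) / 5 tokens = 1.2
    print(sentiment_score("terrible day"))              # -3 / 2 tokens = -1.5
```

Averaging per-text scores gives the kind of easy-to-interpret, dataset-level metric the excerpt describes; with a lexicon approach, the choice of lexicon dominates the results.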
Indexing 1 Billion Time Series with H2O and ISax
by Team | November 11, 2016 | Solutions, Technical, Tutorials

At H2O, we recently debuted a new feature called ISax that works on time series data in an H2O Dataframe. ISax stands for Indexable Symbolic Aggregate ApproXimation: it represents complex time series patterns using a symbolic notation, thereby reducing the dimensionality of your data. From there you can run H2O’s ML...

Read more
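To make the symbolic-notation idea concrete, here is a minimal Python sketch of plain SAX, the building block that iSAX extends with variable-cardinality symbols for indexing. It is not H2O's implementation; the function names and the `segments` and `alphabet_size` parameters are illustrative assumptions. Each series is z-normalized, reduced to `segments` means with Piecewise Aggregate Approximation (PAA), and each mean is mapped to a letter using breakpoints that split the standard normal distribution into equally likely regions.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to mean 0 and standard deviation 1 (flat series become all zeros)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame.

    Simplifying assumption for this sketch: len(series) is divisible by segments.
    """
    if len(series) % segments:
        raise ValueError("series length must be divisible by segments")
    size = len(series) // segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(segments)]


def sax_word(series, segments=4, alphabet_size=4):
    """Map a numeric series to a short string, e.g. 8 points -> 4 letters."""
    # Breakpoints cut N(0, 1) into alphabet_size regions of equal probability.
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    # bisect_right counts the breakpoints <= value, which is the letter index.
    return "".join(letters[bisect_right(breakpoints, value)]
                   for value in paa(znormalize(series), segments))


if __name__ == "__main__":
    print(sax_word([1, 2, 3, 4, 5, 6, 7, 8]))  # a steady rise -> "abcd"
```

Eight numbers become a four-letter word, and similar shapes map to the same or nearby words, which is what makes the representation cheap to compare and index.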
Why We Bought A Happy Diwali Billboard
by Team | October 21, 2016

It’s been a dark year in many ways, so we wanted to lighten things up and celebrate Diwali, the festival of lights! Diwali is a holiday that celebrates joy, hope, knowledge and all that is full of light: the perfect antidote for some of the more negative developments coming out of Silicon Valley recently. Throw in a polarizing pre...

Read more
Creating a Binary Classifier to Sort Trump vs. Clinton Tweets Using NLP
by Team | October 17, 2016 | Community, Data Journalism, Flow, Python

The problem: Can we determine if a tweet came from the Donald Trump Twitter account (@realDonaldTrump) or the Hillary Clinton Twitter account (@HillaryClinton) using text analysis and Natural Language Processing (NLP) alone? The solution: Yes! We’ll divide this tutorial into three parts, the first on how to gather the necessary data, t...

Read more
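The excerpt doesn't say which model the tutorial trained. As a stand-in, here is a minimal Python sketch of one classic approach to this kind of binary text classification: multinomial naive Bayes with add-one smoothing. It is not necessarily the post's method, and the `NaiveBayes` class, the labels and the training texts are hypothetical placeholders, not real tweets.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, keeping @handles and #hashtags intact."""
    return re.findall(r"[@#]?\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # documents per label, for class priors
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_score(self, text, label):
        """log P(label) plus the sum of log P(token | label) over the text's tokens."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokenize(text):
            # Add-one smoothing keeps an unseen token from zeroing out a class.
            score += math.log((counts[token] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        return max(self.word_counts, key=lambda label: self.log_score(text, label))


if __name__ == "__main__":
    # Hypothetical placeholder texts and labels, not real tweets.
    texts = ["great rally tonight #tag_a", "big win for jobs #tag_a",
             "join us tomorrow #tag_b", "thank you volunteers #tag_b"]
    labels = ["account_a", "account_a", "account_b", "account_b"]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("another great rally #tag_a"))  # account_a
```

Working in log space avoids underflow when multiplying many small per-token probabilities; with real data you would also hold out a test set to measure accuracy.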
sparklyr: R interface for Apache Spark
by Team | October 07, 2016 | Community, R, Sparkling Water

This post is reposted from RStudio’s announcement of sparklyr, RStudio’s extension for Spark. Connect to Spark from R. The sparklyr package provides a complete dplyr backend. Filter and aggregate Spark datasets, then bring them into R for analysis and visualization. Use Spark’s distributed machine learning library from R. Create...

Read more
When is the Best Time to Look for Apartments on Craigslist?
by Team | October 06, 2016 | Data Journalism

A while ago I was looking for an apartment in San Francisco. There are a lot of problems with finding housing in San Francisco, mostly stemming from the fierce competition. I was checking Craigslist every single day. It still took me (and my girlfriend) a few months to find a place — and we had to sublet for three weeks in between. Thankf...

Read more
by Team | September 23, 2016 | Community

---------- Forwarded message ---------
From: SriSatish Ambati
Date: Thu, Sep 15, 2016 at 10:17 PM
Subject: changes and all hands tomorrow.
To: team

Team,

Our focus has changed toward fewer, larger deals and deeper engagements with a handful of finance and insurance customers. We took a hard look at our marketing spend, PR programs and personnel. We l...

Read more
Distracted Driving
by Team | September 16, 2016 | Data Journalism

Last week, we started to examine the 7.2% increase in traffic fatalities from 2014 to 2015, the reversal of a near decade-long downward trend. We then broke out the data by various accident classifications, such as “speeding” or “driving with a positive BAC,” and identified those classifications that had the greatest increase. One label...

Read more
Introducing H2O Community & Support Portals
by Team | September 09, 2016 | Community, Customers

At H2O, we enjoy serving our customers and the community, and we take pride in making them successful while using H2O products. Today, we are very excited to announce two great platforms for our customers and for the community to better communicate with H2O. Let’s start with our community first: The success of every open source project ...

Read more
Fatal Traffic Accidents Rise in 2015
by Team | September 07, 2016 | Data Journalism

On Tuesday, August 30th, the National Highway Traffic Safety Administration released its annual dataset of traffic fatalities, asking interested parties to use the dataset to identify the causes of a 7.2% increase in fatalities from 2014 to 2015. As part of H2O’s vision of using artificial intelligence for the betterment of soci...

Read more
IoT - Take Charge of Your Business and IT Insights Starting at the Edge
by Team | August 22, 2016 | IoT, Solutions

No longer just hype, the Internet of Things (IoT) is now becoming a reality. Gartner forecasts that in 2016, 6.4 billion connected devices will be in use worldwide, and 5.5 million new devices will get connected every day. These devices range from wearables, to sensors in vehicles that can detect surrounding obstacles, to sensors in...

Read more
Hyperparameter Optimization in H2O: Grid Search, Random Search and the Future
by Team | June 16, 2016 | R-Bloggers, Technical, Tutorials

“Good, better, best. Never let it rest. ‘Til your good is better and your better is best.” – St. Jerome

tl;dr: H2O now has random hyperparameter search with time- and metric-based early stopping. Bergstra and Bengio [1] write on p. 281: Compared with neural networks configured by a pure grid search, we find that random search over the s...

Read more
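For intuition about what random search with time- and metric-based early stopping does, here is a minimal pure-Python sketch. It is not H2O's grid-search API: `random_search`, its parameter names, the search space and the toy objective are hypothetical, chosen for illustration, and the loss is assumed to be lower-is-better.

```python
import random
import time


def random_search(objective, space, max_models=50, max_runtime_secs=60.0,
                  stopping_rounds=5, stopping_tolerance=1e-3, seed=42):
    """Minimize objective(params) by sampling configurations at random from space.

    Stops at whichever comes first:
      * max_models configurations have been evaluated,
      * max_runtime_secs of wall-clock time have elapsed,
      * the best score has not improved by more than stopping_tolerance
        (relative) over the last stopping_rounds models.
    Returns (best_params, best_score, models_evaluated).
    """
    rng = random.Random(seed)
    start = time.monotonic()
    best_params, best_score = None, float("inf")
    best_history = []  # best-so-far score after each evaluated model
    for _ in range(max_models):
        if time.monotonic() - start > max_runtime_secs:
            break  # time-based stopping
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
        best_history.append(best_score)
        if len(best_history) > stopping_rounds:
            # Metric-based stopping: compare with the best score stopping_rounds models ago.
            earlier = best_history[-stopping_rounds - 1]
            if earlier - best_score <= stopping_tolerance * abs(earlier):
                break
    return best_params, best_score, len(best_history)


if __name__ == "__main__":
    # Hypothetical search space and toy loss standing in for a validation metric.
    space = {"max_depth": [2, 3, 4, 5, 6, 8, 10],
             "learn_rate": [0.01, 0.05, 0.1, 0.2]}

    def toy_loss(params):
        return (params["max_depth"] - 5) ** 2 + (params["learn_rate"] - 0.1) ** 2

    print(random_search(toy_loss, space))
```

Because the budget (models, time, or stalled improvement) bounds the work rather than the grid size, adding more candidate values to `space` doesn't by itself increase the number of models trained.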
H2O GBM Tuning Tutorial for R
by Team | June 16, 2016

In this tutorial, we show how to build a well-tuned H2O GBM model for a supervised classification task. We deliberately don’t focus on feature engineering, and we use a small dataset so you can reproduce these results in a few minutes on a laptop. This script can be directly transferred to datasets that are hundreds of GBs large and H...

Read more
Spam Detection with Sparkling Water and Spark Machine Learning Pipelines
by Team | June 15, 2016 | Sparkling Water, Technical, Tutorials

This short post presents the “ham or spam” demo, previously posted by Michal Malohlava, using our new API in the latest Sparkling Water for Spark 1.6 and earlier versions, which unifies Spark and H2O Machine Learning pipelines. It shows how to create a simple Spark Machine Learning pipeline and a model based on the fitted pipe...

Read more
Interview with Carolyn Phillips, Sr. Data Scientist, Neurensic
by Team | May 27, 2016 | Community, Customers, Events

During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the second of a multipart series recapping our conversations. Be sure to keep an eye out for updates by checking our website or following us on Twitter @h2oai. How did you become a data scientist? Phillips: Until ...

Read more
Interview with Svetlana Kharlamova, Sr. Data Scientist, Grainger
by Team | May 25, 2016 | Community, Customers, Events

During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the first of a multipart series recapping our conversations. Be sure to keep an eye out for updates by checking our website or following us on Twitter @h2oai. How did you become a data scientist? Kharlamova: I’m a...

Read more
H2O Day at Capital One
by Team | May 11, 2016 | Community, Customers, Events

Here at H2O, one of our most important partners is Capital One, and we’re proud to have been working with them for over a year. One of the world’s leading financial services providers, Capital One has a strong reputation as an extremely data- and technology-focused organization. That’s why when the Capital One team invited us to t...

Read more
Red herring bites
by Team | May 06, 2016 | Data Munging, R-Bloggers, Technical

At the Bay Area R User Group in February, I presented progress on big-join in H2O, which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested and can judge whether it is relevant to them; and ii) show how it scales with...

Read more
Fast csv writing for R
by Team | April 24, 2016 | Data Munging, R, R-Bloggers, Technical

R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table package in R solved csv import convenience and speed in 2...

Read more
Apache Spark and H2O on AWS
by Team | April 20, 2016 | Community, Guest Posts

This is a guest post re-published with permission from our friends at Datapipe. The original lives here. One of the advantages of public cloud is the ability to experiment and run various workloads without the need to commit to purchasing hardware. However, to meet your data processing needs, a well-defined mapping between your objecti...

Read more
Connecting to Spark & Sparkling Water from R & RStudio
by Team | March 24, 2016

Sparkling Water offers best-of-breed machine learning for Spark users. Sparkling Water brings all of H2O’s advanced algorithms and capabilities to Spark. This means that you can continue to use H2O from RStudio or any other IDE of your choice. This post will walk you through the steps to get running from Spark in plain R or RStudio. ...

Read more
Drink in the Data with H2O at Strata SJ 2016
by Team | March 21, 2016 | Community, Demos, Events

It’s about to rain data in San Jose when Strata + Hadoop World comes to town March 29 to 31. H2O has a waterfall of action happening at the show. Here’s a rundown of what’s on tap. Keep it handy so you have less chance of FOMO (fear of missing out). Hang out with H2O at Booth #1225 to learn more about how machine learning can hel...

Read more
Road Ahead and BTUs
by Team | March 03, 2016

Road Ahead: keynote presentation by Sri Ambati ...

Read more
Thank you, Cliff
by Team | February 24, 2016

Cliff resigned from the company last week. He is parting on good terms and supports our future success. Cliff and I have worked closely since 2004, so this is a loss for me. It ends an era of prolific work supporting my vision as a partner. Let’s take this opportunity to congratulate Cliff on his work in helping me build something from not...

Read more
The Top 10 Most Watched Videos From H2O World 2015
by Team | January 08, 2016 | Community, Customers, Events, H2O World

Now that we’re a few months out from H2O World, we wanted to share the most popular talks by online viewership. The talks covered a variety of topics, from introductions, to in-depth examinations of use cases, to wide-ranging panels. Introduction to Data Science Featuring Erin LeDell, Statistician and Machine Learnin...

Read more