I created this guide to help fellow newbies get their feet wet with H2O, an open-source predictive analytics platform that is fast, powerful, and easy to use. Using a combination of extraordinary math and high-performance parallel processing, H2O allows you to quickly create models for big data. The steps below show you how to download and start analyzing data at high speeds with H2O. After that it's up to you.
(If you don't feel like reading the long version below, just go here.)
I recommend downloading the latest release of H2O (which is "Bleeding Edge" as of this moment) because it has the most Python features, but you can also see the other releases here, as well as the software requirements. Okay, let's get started:
Do you have Java on your computer? Not sure? Here's how to check:
MacBook-Pro:~ username$ java -version
If you don't have Java, you can either click through the pop-up dialogue box and make your way to the correct downloadable version, or you can go directly to the Java downloads page here (two-for-one tip: download the Java Development Kit and get the Java Runtime Environment with it).
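If you would rather run the same Java check from Python, here is a minimal sketch. The function name java_version is my own invention for illustration, and the code assumes Python 3.7 or newer. It simply runs "java -version" for you and returns the first line of the output, or None if Java can't be found.

```python
import shutil
import subprocess


def java_version():
    """Return the first line printed by `java -version`, or None if Java is unavailable."""
    java_path = shutil.which("java")  # look for a `java` executable on the PATH
    if java_path is None:
        return None
    try:
        result = subprocess.run(
            [java_path, "-version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        # A launcher stub may exist even when no real Java is installed.
        return None
    # `java -version` usually writes to stderr, so check both streams.
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = java_version()
    print(version if version else "Java not found; install it before installing H2O.")
```

If this prints a version string, you are good to go. If it prints the "not found" message, install Java first.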
Now that you have Java (fingers crossed), you can download H2O (I'm assuming you have Python, but if you don't, consider downloading Anaconda, which gives you access to amazing Python packages for data analysis and scientific computing).
You can find the official instructions to download H2O's "Bleeding Edge" release here (click on the "Install in Python" tab), or follow below:
Fellow newbies: don't type in the "MacBook-Pro:~ username$" part; only type in what's listed after the "$" (you can get more command line help here).
MacBook-Pro:~ username$ pip install requests
MacBook-Pro:~ username$ pip install tabulate
MacBook-Pro:~ username$ pip install scikit-learn
MacBook-Pro:~ username$ pip uninstall h2o
MacBook-Pro:~ username$ pip install http://h2o-release.s3.amazonaws.com/h2o/master/3250/Python/h2o-3.7.0.3250-py2.py3-none-any.whl
As shown above, if you installed an earlier version of H2O, uninstalling and reinstalling H2O with pip will do the trick.
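To confirm that the install worked, you can run a quick check like the one below. This is just a sketch: check_h2o and its start_cluster parameter are names I made up for illustration, while h2o.init() is the standard call that starts (or connects to) a local H2O instance, which is also why you needed Java.

```python
import importlib.util


def check_h2o(start_cluster=False):
    """Return the installed H2O version string, or None if h2o is not installed."""
    if importlib.util.find_spec("h2o") is None:
        return None
    import h2o

    if start_cluster:
        # Starts a local H2O instance, or connects to one that is already running.
        h2o.init()
    return h2o.__version__


if __name__ == "__main__":
    version = check_h2o()
    if version is None:
        print("h2o is not installed; rerun the pip install command.")
    else:
        print("h2o version:", version)
```

Call check_h2o(start_cluster=True) when you also want to make sure H2O can actually launch on your machine.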
If you don't already have IPython Notebook, you can download it by following these instructions. If you downloaded Anaconda, it comes with IPython Notebook, so you're set. And here's a video tutorial on how to use IPython Notebook.
If everything goes as planned, to open IPython Notebook you "cd" to your directory of choice (I chose my Desktop folder) and enter "ipython notebook". (If you're still new to the command line, learn more about using "cd", which I like to use as a verb, here and here.)
MacBook-Pro:~ username$ cd Desktop
MacBook-Pro:Desktop username$ ipython notebook
Random Note: After I updated to OS X El Capitan, the command above didn't work. For many people, using "conda update conda" and then "conda update ipython" will solve the issue, but in my case I got an SSL error that wouldn't let me "conda update" anything. I found the solution here, using:
MacBook-Pro:~ username$ conda config --set ssl_verify False
MacBook-Pro:~ username$ conda update requests openssl
MacBook-Pro:~ username$ conda config --set ssl_verify True
Now that you have IPython Notebook, you can play around with some of H2O's demo notebooks. If you're new to GitHub, downloading the demos to your desktop can seem daunting, but don't worry, it's easy. Here's the trick:
Classifying Handwritten Digits: Enter a Kaggle Competition
A great way to get a feel for H2O is to test it out on a Kaggle data science competition. Don't know what Kaggle is? Never entered a Kaggle competition? That's totally fine, I'll give you a script to get your feet wet. If you're still nervous, here's a great article about how to get started with Kaggle given your previous experience.
Are you excited? Get excited! You are going to teach your computer to recognize HANDWRITTEN DIGITS! (I feel like if you're still reading at this point, it's time to let my enthusiasm shine through.)
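To give you a rough idea of what such a script can look like, here is a minimal sketch. It is not the official demo, just an illustration. I'm assuming you have downloaded train.csv and test.csv from Kaggle's Digit Recognizer competition into your working directory, that train.csv has a "label" column plus one column per pixel, and that you are on a reasonably recent H2O release. The function names and the network settings (hidden=[50, 50], epochs=5) are arbitrary choices of mine, not tuned values.

```python
def predictor_columns(columns, label="label"):
    """Treat every column except the label as a pixel predictor."""
    return [name for name in columns if name != label]


def train_digit_model(train_path="train.csv", test_path="test.csv"):
    """Train a small deep learning model on the digits and predict on the test set."""
    # Imported here so the helper above still works without H2O installed.
    import h2o
    from h2o.estimators.deeplearning import H2ODeepLearningEstimator

    h2o.init()  # start or connect to a local H2O instance
    train = h2o.import_file(train_path)
    test = h2o.import_file(test_path)

    # Mark the label as categorical so H2O treats this as classification.
    train["label"] = train["label"].asfactor()

    # Assumed, untuned settings: two small hidden layers and a few epochs.
    model = H2ODeepLearningEstimator(hidden=[50, 50], epochs=5)
    model.train(x=predictor_columns(train.columns), y="label", training_frame=train)
    return model.predict(test)


if __name__ == "__main__":
    print(predictor_columns(["label", "pixel0", "pixel1"]))
```

Once the data files are in place, calling train_digit_model() from a notebook cell trains the model and returns an H2O frame of predictions for the test images.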
Getting Help: Resources & Documentation