Please follow the updated version of the tutorials here.
H2O is hosting a meetup tomorrow at our office, where attendees are encouraged to hack away with us as we run Deep Learning on Sparkling Water. If you haven't already read all about H2O's integration into Spark, get started with How Sparkling Water Brings H2O to Spark and Sparkling Water!
For those who can't attend the meetup tomorrow, or for the overachievers who want to get a head start and come with an arsenal of probing questions for our speaker, Michal Malohlava, we have a zip file with the prepackaged demo ready for download.
Step 1 - Prerequisites
Step 2 - Download the zip file
Step 3 - Unzip the demo.zip file and run the example script
$ unzip demo.zip
$ cd perrier/h2o-examples
$ export MASTER="local-cluster[3,2,1024]"
$ ./run-example.sh
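The MASTER value `local-cluster[3,2,1024]` tells Spark to simulate a standalone cluster on the local machine: 3 workers, 2 cores per worker, and 1024 MB of memory per worker. As a minimal illustration of how to read that URL form (this parser is an assumption for explanation only, not Spark's actual implementation):

```scala
// Illustrative parser (an assumption, not Spark's real code) for the
// "local-cluster[numWorkers,coresPerWorker,memoryPerWorkerMB]" master URL form.
val LocalCluster = """local-cluster\[(\d+),(\d+),(\d+)\]""".r

def parseMaster(master: String): Option[(Int, Int, Int)] = master match {
  case LocalCluster(workers, cores, mem) =>
    Some((workers.toInt, cores.toInt, mem.toInt))
  case _ => None
}

// 3 workers, 2 cores per worker, 1024 MB of memory per worker
println(parseMaster("local-cluster[3,2,1024]"))  // Some((3,2,1024))
```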
Note: If your machine has multiple network addresses that H2O can launch on, look for the IP address that H2O actually launches on, which is printed under "Attempting to determine correct address". Cancel the operation, set Spark's local IP address to the address H2O was launching on, and execute the example script again.
$ export SPARK_LOCAL_IP='127.0.0.1'
$ ./run-example.sh
Step 1 - Prerequisites
Step 2 - Download the zip file
Step 3 - Unzip the demo.zip file and launch the Spark cluster
$ unzip demo.zip
$ cd perrier/sbin
$ ./launch-spark-cloud.sh
$ export MASTER="spark://localhost:7077"
Step 4 - Run the example script
$ cd ../h2o-examples
$ ./run-example.sh
For those adventurous enough to play with the source code, there is a workflow available that gives the user more flexibility, so that different datasets can be used and different algorithms can be tried.
Step 1 - Prerequisites
Step 2 - Download the zip file
Step 3 - Launch a Spark cluster (the UI can be accessed at localhost:8080)
$ unzip demo.zip
$ cd perrier/sbin
$ ./launch-spark-cloud.sh
$ export MASTER="spark://localhost:7077"
Note: If your machine has multiple network addresses that H2O can launch on, look for the IP address that H2O actually launches on, which is printed under "Attempting to determine correct address". Set Spark's local IP address to the address H2O was launching on, for example:
$ export SPARK_LOCAL_IP='127.0.0.1'
Step 4 - Start the Spark Shell
$ cd ../h2o-examples
$ ./sparkling-shell
Step 5 - Import the H2O Client App and launch H2O (the UI can be accessed at localhost:54321)

```scala
import water.H2OClientApp
H2OClientApp.start()
import water.H2O
H2O.waitForCloudSize(3, 10000)

import java.io.File
import water.fvec._
import org.apache.spark.examples.h2o._
import org.apache.spark.h2o._
```
Step 6 - Import data

```scala
val dataFile = "../h2o-examples/smalldata/allyears2k_headers.csv.gz"
val airlinesData = new DataFrame(new File(dataFile))
```
Step 7 - Move data from Spark to an H2O RDD (a new RDD type in Spark) and count the number of flights in the airlines data

```scala
val h2oContext = new H2OContext(sc)
import h2oContext._
import org.apache.spark.rdd.RDD

val airlinesTable : RDD[Airlines] = toRDD[Airlines](airlinesData)
airlinesTable.count
```
Step 8 - Do the same count in Spark

```scala
val flightsOnlyToSFO = airlinesTable.filter( _.Dest.equals(Some("SFO")) )
flightsOnlyToSFO.count
```
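Fields of the Airlines rows such as Dest are Options, since a CSV cell may be empty, so the predicate `_.Dest.equals(Some("SFO"))` only matches rows where the destination is actually present and equal to "SFO". A self-contained sketch of that semantics (the Flight case class here is a hypothetical, trimmed-down stand-in, not the demo's full Airlines type):

```scala
// Hypothetical, trimmed-down stand-in for the demo's Airlines row type.
case class Flight(Dest: Option[String])

val flights = List(
  Flight(Some("SFO")),
  Flight(Some("JFK")),
  Flight(None),          // a missing Dest cell parses to None
  Flight(Some("SFO"))
)

// Same predicate shape as the demo: matches only when Dest is present and equals "SFO".
val toSFO = flights.filter(_.Dest.equals(Some("SFO")))
println(toSFO.size)  // 2
```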
Step 9 - Run a SQL query that returns only flights flying to SFO

```scala
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext._ // import implicit conversions
airlinesTable.registerTempTable("airlinesTable")
val query = "SELECT * FROM airlinesTable WHERE Dest LIKE 'SFO'"
val result = sql(query) // Using a registered context and tables
result.count
result.count == flightsOnlyToSFO.count
```
Step 10 - Set the parameters for running a Deep Learning model and build a model

```scala
import hex.deeplearning._
import hex.deeplearning.DeepLearningModel.DeepLearningParameters

val dlParams = new DeepLearningParameters()
dlParams._training_frame = result( 'Year, 'Month, 'DayofMonth, 'DayOfWeek, 'CRSDepTime, 'CRSArrTime,
                                   'UniqueCarrier, 'FlightNum, 'TailNum, 'CRSElapsedTime, 'Origin, 'Dest,
                                   'Distance, 'IsDepDelayed)
dlParams.response_column = 'IsDepDelayed.name
val dl = new DeepLearning(dlParams)
val dlModel = dl.train.get
```
Step 11 - Score with the Deep Learning model and grab the output predictions

```scala
val predictionH2OFrame = dlModel.score(result)('predict)
val predictionsFromModel = toRDD[DoubleHolder](predictionH2OFrame).map ( _.result.getOrElse("NaN") ).collect
```
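The scored frame comes back as rows whose result field is an Option, so `getOrElse("NaN")` substitutes a placeholder wherever the model produced no prediction. A self-contained sketch of that mapping (this DoubleHolder case class is a simplified stand-in, an assumption rather than H2O's actual type):

```scala
// Simplified stand-in (an assumption) for H2O's DoubleHolder wrapper.
case class DoubleHolder(result: Option[Double])

val scored = List(DoubleHolder(Some(0.87)), DoubleHolder(None), DoubleHolder(Some(0.12)))

// Rows with no prediction fall back to the string "NaN", as in the demo.
val predictions = scored.map(_.result.getOrElse("NaN"))
println(predictions)
```

Note that mixing Double values with the "NaN" string yields a List[Any]; the demo code shares this quirk, which is fine for printing but worth converting before any numeric post-processing.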
Step 1 - Launch the Sandbox
Download VirtualBox in order to use the OVA file. Start the OVA file and log in as user ops:
user: ops
password: 0xdata
Note: the password for sudo (root) is also 0xdata
Step 2 - Run the upgrade script
$ ./upgrade.sh
Step 3 - Try the Sparkling Water REPL
Simply run sshell to start the Spark shell and run H2O operations from the shell.
$ /opt/sparkling/h2o-examples/sparkling-shell