
BLOG

Sparkling Water on YARN Example

By H2O.ai Team | November 01, 2014

Category: Uncategorized

Follow these easy steps to get your first Sparkling Water example to run on a YARN cluster. This example uses Hortonworks HDP 2.1.

1. Assumptions

Installed:

  • Java 1.7+
  • YARN cluster

Note: In the current version of Sparkling Water running on YARN, the cluster formation requires multicast to work for the H2O nodes to find each other. This requirement will be removed in a future version.
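Before continuing, it can be worth a quick sanity check that the prerequisites are in place. Assuming the Hadoop and YARN client tools are already on your PATH, all of the following should succeed:

$ java -version      # should report Java 1.7 or newer
$ hadoop version     # confirms the Hadoop client is installed
$ yarn node -list    # confirms the ResourceManager is reachable and lists the NodeManagers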

2. Download the software

  • Download the right version of Spark for your YARN cluster from the Apache Spark downloads page. The YARN cluster used in this example is HDP 2.1, so I chose the build for Hadoop 2.4.

  • Download the Sparkling Water zip file distribution (this example uses sparkling-water-0.2.1-58).
3. Unpack the software

Put the downloaded distributions of Spark and Sparkling Water in a single directory.

$ ls -l

total 192004
-rw-r--r-- 1 tomk tomk 191475104 Oct 31 15:43 spark-1.1.0-bin-hadoop2.4.tgz
-rw-rw-r-- 1 tomk tomk 5135267 Nov 2 16:21 sparkling-water-0.2.1-58.zip

 

$ tar zxf spark-1.1.0-bin-hadoop2.4.tgz
$ unzip sparkling-water-0.2.1-58.zip

Archive: sparkling-water-0.2.1-58.zip
creating: sparkling-water-0.2.1-58/
inflating: sparkling-water-0.2.1-58/README.md
[ ... lots of files ... ]

 

4. Run the example

Change to the unpacked Spark directory

$ cd spark-1.1.0-bin-hadoop2.4/

Set the HADOOP_CONF_DIR environment variable (your cluster may vary)

$ export HADOOP_CONF_DIR=/etc/hadoop/conf

Use spark-submit to launch the Sparkling Water application job on YARN

$ bin/spark-submit --class water.SparklingWaterDriver --master yarn-client --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 ../sparkling-water-0.2.1-58/assembly/build/libs/*.jar

Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/11/02 16:25:39 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:25:39 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:25:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:25:39 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/11/02 16:25:39 INFO Remoting: Starting remoting
14/11/02 16:25:40 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@mr-0xd1.0xdata.loc:36089]
14/11/02 16:25:40 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@mr-0xd1.0xdata.loc:36089]
14/11/02 16:25:40 INFO util.Utils: Successfully started service 'sparkDriver' on port 36089.
14/11/02 16:25:40 INFO spark.SparkEnv: Registering MapOutputTracker
14/11/02 16:25:40 INFO spark.SparkEnv: Registering BlockManagerMaster
14/11/02 16:25:40 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141102162540-53d8
14/11/02 16:25:40 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 40286.
14/11/02 16:25:40 INFO network.ConnectionManager: Bound socket to port 40286 with id = ConnectionManagerId(mr-0xd1.0xdata.loc,40286)
14/11/02 16:25:40 INFO storage.MemoryStore: MemoryStore started with capacity 2.1 GB
14/11/02 16:25:40 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/11/02 16:25:40 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd1.0xdata.loc:40286 with 2.1 GB RAM
14/11/02 16:25:40 INFO storage.BlockManagerMaster: Registered BlockManager
14/11/02 16:25:40 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-23e3ba13-04ba-4a34-87ac-c69147266d8d
14/11/02 16:25:40 INFO spark.HttpServer: Starting HTTP Server
14/11/02 16:25:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:25:40 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51874
14/11/02 16:25:40 INFO util.Utils: Successfully started service 'HTTP file server' on port 51874.
14/11/02 16:25:45 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:25:45 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/11/02 16:25:45 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/11/02 16:25:45 INFO ui.SparkUI: Started SparkUI at http://mr-0xd1.0xdata.loc:4040
14/11/02 16:25:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/02 16:25:45 INFO spark.SparkContext: Added JAR file:/home/tomk/tmp/spark-1.1.0-bin-hadoop2.4/../sparkling-water-0.2.1-58/assembly/build/libs/sparkling-water-assembly-0.2.1-58-all.jar at http://172.16.2.181:51874/jars/sparkling-water-assembly-0.2.1-58-all.jar with timestamp 1414974345939
--args is deprecated. Use --arg instead.
14/11/02 16:25:46 INFO client.RMProxy: Connecting to ResourceManager at mr-0xd7.0xdata.loc/172.16.2.187:8050
14/11/02 16:25:46 INFO yarn.Client: Got cluster metric info from ResourceManager, number of NodeManagers: 8
14/11/02 16:25:46 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 227328
14/11/02 16:25:46 INFO yarn.Client: Preparing Local resources
14/11/02 16:25:46 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
14/11/02 16:25:46 INFO yarn.Client: Uploading file:/home/tomk/tmp/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar to hdfs://mr-0xd6.0xdata.loc/user/tomk/.sparkStaging/application_1413598290344_0066/spark-assembly-1.1.0-hadoop2.4.0.jar
14/11/02 16:25:48 INFO yarn.Client: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "mr-0xd6.0xdata.loc" port: -1 file: "/user/tomk/.sparkStaging/application_1413598290344_0066/spark-assembly-1.1.0-hadoop2.4.0.jar" } size: 138884949 timestamp: 1414974348190 type: FILE visibility: PRIVATE)
14/11/02 16:25:48 INFO yarn.Client: Setting up the launch environment
14/11/02 16:25:48 INFO yarn.Client: Setting up container launch context
14/11/02 16:25:48 INFO yarn.Client: Yarn AM launch context:
14/11/02 16:25:48 INFO yarn.Client: class: org.apache.spark.deploy.yarn.ExecutorLauncher
14/11/02 16:25:48 INFO yarn.Client: env: Map(CLASSPATH -> $PWD:$PWD/__spark__.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$PWD/__app__.jar:$PWD/*, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 138884949, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1413598290344_0066/, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE, SPARK_USER -> tomk, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1414974348190, SPARK_YARN_CACHE_FILES -> hdfs://mr-0xd6.0xdata.loc/user/tomk/.sparkStaging/application_1413598290344_0066/spark-assembly-1.1.0-hadoop2.4.0.jar#__spark__.jar)
14/11/02 16:25:48 INFO yarn.Client: command: $JAVA_HOME/bin/java -server -Xmx4096m -Djava.io.tmpdir=$PWD/tmp '-Dspark.tachyonStore.folderName=spark-b2067b5c-d854-475b-be02-a67b66a4c8f5' '-Dspark.driver.memory=4g' '-Dspark.executor.memory=2g' '-Dspark.executor.instances=3' '-Dspark.yarn.secondary.jars=' '-Dspark.driver.host=mr-0xd1.0xdata.loc' '-Dspark.driver.appUIHistoryAddress=' '-Dspark.app.name=Sparkling Water' '-Dspark.driver.appUIAddress=mr-0xd1.0xdata.loc:4040' '-Dspark.jars=file:/home/tomk/tmp/spark-1.1.0-bin-hadoop2.4/../sparkling-water-0.2.1-58/assembly/build/libs/sparkling-water-assembly-0.2.1-58-all.jar' '-Dspark.fileserver.uri=http://172.16.2.181:51874' '-Dspark.master=yarn-client' '-Dspark.driver.port=36089' '-Dspark.executor.cores=1' org.apache.spark.deploy.yarn.ExecutorLauncher --class 'notused' --jar null --arg 'mr-0xd1.0xdata.loc:36089' --executor-memory 2048 --executor-cores 1 --num-executors 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
14/11/02 16:25:48 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:25:48 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:25:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:25:48 INFO yarn.Client: Submitting application to ResourceManager
14/11/02 16:25:48 INFO impl.YarnClientImpl: Submitted application application_1413598290344_0066
14/11/02 16:25:48 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1414974348389
yarnAppState: ACCEPTED
14/11/02 16:25:49 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1414974348389
yarnAppState: ACCEPTED
14/11/02 16:25:50 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1414974348389
yarnAppState: ACCEPTED
14/11/02 16:25:51 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, PROXY_HOST=mr-0xd7.0xdata.loc,PROXY_URI_BASE=http://mr-0xd7.0xdata.loc:8088/proxy/application_1413598290344_0066, /proxy/application_1413598290344_0066
14/11/02 16:25:51 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
14/11/02 16:25:51 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: 0
appStartTime: 1414974348389
yarnAppState: RUNNING
14/11/02 16:25:53 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd1.0xdata.loc:55555/user/Executor#-58639133] with ID 2
14/11/02 16:25:53 INFO util.RackResolver: Resolved mr-0xd1.0xdata.loc to /default-rack
14/11/02 16:25:53 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd1.0xdata.loc:44336 with 1060.3 MB RAM
14/11/02 16:25:53 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd2.0xdata.loc:48257/user/Executor#-1480833723] with ID 1
14/11/02 16:25:53 INFO util.RackResolver: Resolved mr-0xd2.0xdata.loc to /default-rack
14/11/02 16:25:54 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd2.0xdata.loc:52830 with 1060.3 MB RAM
14/11/02 16:25:54 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd10.0xdata.loc:38855/user/Executor#-1943381558] with ID 3
14/11/02 16:25:54 INFO util.RackResolver: Resolved mr-0xd10.0xdata.loc to /default-rack
14/11/02 16:25:54 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
14/11/02 16:25:54 INFO h2o.H2OContext: Starting -1 H2O nodes...
14/11/02 16:25:54 INFO spark.SparkContext: Starting job: collect at H2OContextUtils.scala:28
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Got job 0 (collect at H2OContextUtils.scala:28) with 200 output partitions (allowLocal=false)
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at H2OContextUtils.scala:28)
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Missing parents: List()
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at H2OContextUtils.scala:23), which has no missing parents
14/11/02 16:25:54 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd10.0xdata.loc:60447 with 1060.3 MB RAM
14/11/02 16:25:54 INFO storage.MemoryStore: ensureFreeSpace(1832) called with curMem=0, maxMem=2223023063
14/11/02 16:25:54 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 2.1 GB)
14/11/02 16:25:54 INFO storage.MemoryStore: ensureFreeSpace(1218) called with curMem=1832, maxMem=2223023063
14/11/02 16:25:54 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1218.0 B, free 2.1 GB)
14/11/02 16:25:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on mr-0xd1.0xdata.loc:40286 (size: 1218.0 B, free: 2.1 GB)
14/11/02 16:25:54 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Submitting 200 missing tasks from Stage 0 (MappedRDD[1] at map at H2OContextUtils.scala:23)
14/11/02 16:25:54 INFO cluster.YarnClientClusterScheduler: Adding task set 0.0 with 200 tasks
14/11/02 16:25:54 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, mr-0xd10.0xdata.loc, PROCESS_LOCAL, 1234 bytes)
[ ... lots of scheduler.TaskSetManager output deleted ... ]
14/11/02 16:32:55 INFO scheduler.DAGScheduler: Stage 1 (collect at H2OContextUtils.scala:67) finished in 1.213 s
14/11/02 16:32:55 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/11/02 16:32:55 INFO spark.SparkContext: Job finished: collect at H2OContextUtils.scala:67, took 1.284518985 s
14/11/02 16:32:55 INFO h2o.H2OContext: Sparkling H2O - H2O status: (3,true),(1,true),(2,true),(2,true),(3,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true)
11-02 16:32:55.152 172.16.2.181:54321 24915 main INFO: ----- H2O started (client) -----
11-02 16:32:55.182 172.16.2.181:54321 24915 main INFO: Build git branch: (unknown)
11-02 16:32:55.183 172.16.2.181:54321 24915 main INFO: Build git hash: (unknown)
11-02 16:32:55.183 172.16.2.181:54321 24915 main INFO: Build git describe: (unknown)
11-02 16:32:55.183 172.16.2.181:54321 24915 main INFO: Build project version: (unknown)
11-02 16:32:55.183 172.16.2.181:54321 24915 main INFO: Built by: '(unknown)'
11-02 16:32:55.183 172.16.2.181:54321 24915 main INFO: Built on: '(unknown)'
11-02 16:32:55.183 172.16.2.181:54321 24915 main INFO: Java availableProcessors: 32
11-02 16:32:55.183 172.16.2.181:54321 24915 main INFO: Java heap totalMemory: 3.83 GB
11-02 16:32:55.183 172.16.2.181:54321 24915 main INFO: Java heap maxMemory: 3.83 GB
11-02 16:32:55.183 172.16.2.181:54321 24915 main INFO: Java version: Java 1.7.0_72 (from Oracle Corporation)
11-02 16:32:55.184 172.16.2.181:54321 24915 main INFO: OS version: Linux 3.13.0-35-generic (amd64)
11-02 16:32:55.184 172.16.2.181:54321 24915 main INFO: Possible IP Address: vethJQ8838 (vethJQ8838), fe80:0:0:0:fc0f:adff:fe93:af9b%104
11-02 16:32:55.184 172.16.2.181:54321 24915 main INFO: Possible IP Address: br2 (br2), fe80:0:0:0:a236:9fff:fe35:4262%9
11-02 16:32:55.184 172.16.2.181:54321 24915 main INFO: Possible IP Address: br2 (br2), 172.16.2.181
11-02 16:32:55.184 172.16.2.181:54321 24915 main INFO: Possible IP Address: lo (lo), 0:0:0:0:0:0:0:1%1
11-02 16:32:55.184 172.16.2.181:54321 24915 main INFO: Possible IP Address: lo (lo), 127.0.0.1
11-02 16:32:55.184 172.16.2.181:54321 24915 main INFO: Internal communication uses port: 54322
11-02 16:32:55.184 172.16.2.181:54321 24915 main INFO: Listening for HTTP and REST traffic on http://172.16.2.181:54321/
11-02 16:32:55.185 172.16.2.181:54321 24915 main INFO: H2O cloud name: 'sparkling-water-42' on /172.16.2.181:54321, discovery address /235.34.3.46:60194
11-02 16:32:55.185 172.16.2.181:54321 24915 main INFO: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
11-02 16:32:55.185 172.16.2.181:54321 24915 main INFO: 1. Open a terminal and run 'ssh -L 55555:localhost:54321 tomk@172.16.2.181'
11-02 16:32:55.185 172.16.2.181:54321 24915 main INFO: 2. Point your browser to http://localhost:55555
11-02 16:32:55.286 172.16.2.181:54321 24915 main INFO: Log dir: '/tmp/h2o-tomk/h2ologs'
11-02 16:32:55.521 172.16.2.181:54321 24915 FJ-126-15 INFO: Cloud of size 1 formed [/172.16.2.183:54321]
11-02 16:32:58.348 172.16.2.181:54321 24915 FJ-126-15 INFO: Cloud of size 3 formed [/172.16.2.183:54321, /172.16.2.188:54321, /172.16.2.189:54321]

 

You’re up and running!

At this point, Sparkling Water is running and waiting for you to connect. Point your web browser to one of the addresses listed on the last line above, for example http://172.16.2.183:54321, to access the H2O Web UI.
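If you would rather verify from the command line first, a quick curl against one of the H2O node addresses from the log above should return the Web UI's HTML, for example:

$ curl -s http://172.16.2.183:54321/ | head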

5. Summary

Here is a breakdown of the command-line flags we passed to spark-submit above:

  • --class water.SparklingWaterDriver : Class name of the Scala main driver program. In this case the Sparkling Water driver is used; it launches H2O services on all Spark cluster nodes.
  • --master yarn-client : Launch using Spark on YARN in client mode. Use yarn-cluster to launch the job in cluster mode instead (a sketch of the cluster-mode command follows this list).
  • --num-executors 3 : Number of Spark executors to start (this must equal the number of H2O nodes).
  • --driver-memory 4g : Memory to give to the driver program.
  • --executor-memory 2g : Memory to give to each H2O node.
  • --executor-cores 1 : Number of cores per executor. Before Hadoop 2.2, YARN did not support cores in container resource requests, so when running against an earlier version the number of cores given on the command line cannot be passed through to YARN. Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.
  • ../sparkling-water-0.2.1-58/assembly/build/libs/*.jar : The Sparkling Water application jar file.
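For reference, here is what the same submission might look like in yarn-cluster mode, where the driver itself runs inside the YARN cluster instead of on your client machine. This is only a sketch (it was not run as part of this example), and in cluster mode the driver output, including the H2O node addresses, ends up in the YARN application logs rather than on your console:

$ bin/spark-submit --class water.SparklingWaterDriver --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 ../sparkling-water-0.2.1-58/assembly/build/libs/*.jar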

 

Note: To manage the running application on the YARN cluster, use the yarn command. For example, to list running applications: yarn application -appStates RUNNING -list
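To go a step further and check on or shut down the application submitted above (substitute your own application ID), commands along these lines should work. Keep in mind that killing the YARN application also tears down the H2O cloud, since the H2O nodes run inside the Spark executors:

$ yarn application -list -appStates RUNNING
$ yarn application -status application_1413598290344_0066
$ yarn application -kill application_1413598290344_0066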

6. Launch with Sparkling Shell

Set Spark home

$ cd spark-1.1.0-bin-hadoop2.4/
$ export SPARK_HOME=$(pwd)

Set Hadoop Yarn configuration

$ export HADOOP_CONF_DIR=/etc/hadoop/conf

Export YARN cluster reference for Spark runtime

$ export MASTER="yarn-client"

Run Sparkling Shell

$ cd ../sparkling-water-0.2.1-58/
$ bin/sparkling-shell --num-executors 3 --executor-memory 2g --executor-cores 1 --master yarn-client

Using
MASTER = "yarn-client"
SPARK_HOME = "/home/tomk/tmp/spark-1.1.0-bin-hadoop2.4"
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/11/02 16:50:44 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:50:44 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:50:44 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:50:44 INFO spark.HttpServer: Starting HTTP Server
14/11/02 16:50:44 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:50:44 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60849
14/11/02 16:50:44 INFO util.Utils: Successfully started service 'HTTP class server' on port 60849.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_72)
Type in expressions to have them evaluated.
Type :help for more information.
14/11/02 16:50:48 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:50:48 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:50:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:50:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/11/02 16:50:49 INFO Remoting: Starting remoting
14/11/02 16:50:49 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@mr-0xd1.0xdata.loc:55740]
14/11/02 16:50:49 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@mr-0xd1.0xdata.loc:55740]
14/11/02 16:50:49 INFO util.Utils: Successfully started service 'sparkDriver' on port 55740.
14/11/02 16:50:49 INFO spark.SparkEnv: Registering MapOutputTracker
14/11/02 16:50:49 INFO spark.SparkEnv: Registering BlockManagerMaster
14/11/02 16:50:49 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141102165049-070c
14/11/02 16:50:49 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 36632.
14/11/02 16:50:49 INFO network.ConnectionManager: Bound socket to port 36632 with id = ConnectionManagerId(mr-0xd1.0xdata.loc,36632)
14/11/02 16:50:49 INFO storage.MemoryStore: MemoryStore started with capacity 1589.8 MB
14/11/02 16:50:49 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/11/02 16:50:49 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd1.0xdata.loc:36632 with 1589.8 MB RAM
14/11/02 16:50:49 INFO storage.BlockManagerMaster: Registered BlockManager
14/11/02 16:50:49 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-8cbcd2b8-ebc6-406d-b69c-3d2596bc9707
14/11/02 16:50:49 INFO spark.HttpServer: Starting HTTP Server
14/11/02 16:50:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:50:49 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34458
14/11/02 16:50:49 INFO util.Utils: Successfully started service 'HTTP file server' on port 34458.
14/11/02 16:50:54 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:50:54 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/11/02 16:50:54 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/11/02 16:50:54 INFO ui.SparkUI: Started SparkUI at http://mr-0xd1.0xdata.loc:4040
14/11/02 16:50:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/02 16:50:54 INFO spark.SparkContext: Added JAR file:/home/tomk/tmp/sparkling-water-0.2.1-58/assembly/build/libs/sparkling-water-assembly-0.2.1-58-all.jar at http://172.16.2.181:34458/jars/sparkling-water-assembly-0.2.1-58-all.jar with timestamp 1414975854959
--args is deprecated. Use --arg instead.
14/11/02 16:50:55 INFO client.RMProxy: Connecting to ResourceManager at mr-0xd7.0xdata.loc/172.16.2.187:8050
14/11/02 16:50:55 INFO yarn.Client: Got cluster metric info from ResourceManager, number of NodeManagers: 8
14/11/02 16:50:55 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 227328
14/11/02 16:50:55 INFO yarn.Client: Preparing Local resources
14/11/02 16:50:55 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
14/11/02 16:50:55 INFO yarn.Client: Uploading file:/home/tomk/tmp/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar to hdfs://mr-0xd6.0xdata.loc/user/tomk/.sparkStaging/application_1413598290344_0068/spark-assembly-1.1.0-hadoop2.4.0.jar
14/11/02 16:50:57 INFO yarn.Client: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "mr-0xd6.0xdata.loc" port: -1 file: "/user/tomk/.sparkStaging/application_1413598290344_0068/spark-assembly-1.1.0-hadoop2.4.0.jar" } size: 138884949 timestamp: 1414975857136 type: FILE visibility: PRIVATE)
14/11/02 16:50:57 INFO yarn.Client: Setting up the launch environment
14/11/02 16:50:57 INFO yarn.Client: Setting up container launch context
14/11/02 16:50:57 INFO yarn.Client: Yarn AM launch context:
14/11/02 16:50:57 INFO yarn.Client: class: org.apache.spark.deploy.yarn.ExecutorLauncher
14/11/02 16:50:57 INFO yarn.Client: env: Map(CLASSPATH -> $PWD:$PWD/__spark__.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$PWD/__app__.jar:$PWD/*, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 138884949, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1413598290344_0068/, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE, SPARK_USER -> tomk, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1414975857136, SPARK_YARN_CACHE_FILES -> hdfs://mr-0xd6.0xdata.loc/user/tomk/.sparkStaging/application_1413598290344_0068/spark-assembly-1.1.0-hadoop2.4.0.jar#__spark__.jar)
14/11/02 16:50:57 INFO yarn.Client: command: $JAVA_HOME/bin/java -server -Xmx3072m -Djava.io.tmpdir=$PWD/tmp '-Dspark.tachyonStore.folderName=spark-b990a005-0b25-4775-9bb9-94792222ca20' '-Dspark.driver.memory=3G' '-Dspark.executor.memory=2g' '-Dspark.executor.instances=3' '-Dspark.yarn.secondary.jars=' '-Dspark.repl.class.uri=http://172.16.2.181:60849' '-Dspark.driver.host=mr-0xd1.0xdata.loc' '-Dspark.driver.appUIHistoryAddress=' '-Dspark.app.name=Spark shell' '-Dspark.driver.appUIAddress=mr-0xd1.0xdata.loc:4040' '-Dspark.jars=file:/home/tomk/tmp/sparkling-water-0.2.1-58/assembly/build/libs/sparkling-water-assembly-0.2.1-58-all.jar' '-Dspark.fileserver.uri=http://172.16.2.181:34458' '-Dspark.driver.port=55740' '-Dspark.master=yarn-client' '-Dspark.executor.cores=1' org.apache.spark.deploy.yarn.ExecutorLauncher --class 'notused' --jar null --arg 'mr-0xd1.0xdata.loc:55740' --executor-memory 2048 --executor-cores 1 --num-executors 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
14/11/02 16:50:57 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:50:57 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:50:57 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:50:57 INFO yarn.Client: Submitting application to ResourceManager
14/11/02 16:50:57 INFO impl.YarnClientImpl: Submitted application application_1413598290344_0068
14/11/02 16:50:57 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1414975857313
yarnAppState: ACCEPTED
14/11/02 16:50:58 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1414975857313
yarnAppState: ACCEPTED
14/11/02 16:50:59 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1414975857313
yarnAppState: ACCEPTED
14/11/02 16:51:00 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1414975857313
yarnAppState: ACCEPTED
14/11/02 16:51:00 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, PROXY_HOST=mr-0xd7.0xdata.loc,PROXY_URI_BASE=http://mr-0xd7.0xdata.loc:8088/proxy/application_1413598290344_0068, /proxy/application_1413598290344_0068
14/11/02 16:51:00 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
14/11/02 16:51:01 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: 0
appStartTime: 1414975857313
yarnAppState: RUNNING
14/11/02 16:51:03 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd3.0xdata.loc:36741/user/Executor#-407387512] with ID 3
14/11/02 16:51:03 INFO util.RackResolver: Resolved mr-0xd3.0xdata.loc to /default-rack
14/11/02 16:51:03 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd3.0xdata.loc:57269 with 1060.3 MB RAM
14/11/02 16:51:05 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd2.0xdata.loc:42829/user/Executor#-1195655046] with ID 2
14/11/02 16:51:05 INFO util.RackResolver: Resolved mr-0xd2.0xdata.loc to /default-rack
14/11/02 16:51:05 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd9.0xdata.loc:41874/user/Executor#1710426358] with ID 1
14/11/02 16:51:05 INFO util.RackResolver: Resolved mr-0xd9.0xdata.loc to /default-rack
14/11/02 16:51:05 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
14/11/02 16:51:05 INFO repl.SparkILoop: Created spark context..
14/11/02 16:51:05 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd2.0xdata.loc:38121 with 1060.3 MB RAM
14/11/02 16:51:05 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd9.0xdata.loc:34657 with 1060.3 MB RAM
Spark context available as sc.
scala>

 

Now you can launch H2O services on the Spark cluster.

scala> import org.apache.spark.h2o._

import org.apache.spark.h2o._

 

scala> val h2oContext = new H2OContext(sc).start()

14/11/02 16:54:06 INFO h2o.H2OContext: Starting -1 H2O nodes...
14/11/02 16:54:07 INFO spark.SparkContext: Starting job: collect at H2OContextUtils.scala:28
14/11/02 16:54:07 INFO scheduler.DAGScheduler: Got job 0 (collect at H2OContextUtils.scala:28) with 200 output partitions (allowLocal=false)
[ ... lots of output ... ]
11-02 15:14:21.816 172.16.2.181:54323 17176 FJ-126-15 INFO: Cloud of size 1 formed [/172.16.2.182:54321]
11-02 15:14:24.678 172.16.2.181:54323 17176 FJ-126-15 INFO: Cloud of size 3 formed [/172.16.2.181:54321, /172.16.2.182:54321, /172.16.2.184:54321]
h2oContext: org.apache.spark.h2o.H2OContext =
Sparkling H2O setup:
workers=-1
flatfile: false
basePort: 54321
incrPort: 2
drddMulFactor: 10

 

 

scala> import h2oContext._

import h2oContext._

 

Now Spark and H2O services are running and available for use.
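As an aside, sparkling-shell ultimately invokes spark-shell, so you can also feed the same session in as a script instead of typing it interactively. A minimal sketch, assuming standard input is passed through to the underlying spark-shell (the file name here is just an example):

$ cat > /tmp/sparkling-example.scala <<'EOF'
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
import h2oContext._
EOF
$ bin/sparkling-shell --num-executors 3 --executor-memory 2g --executor-cores 1 --master yarn-client < /tmp/sparkling-example.scala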

7. Helpful resources

Spark resources:

Sparkling Water:

H2O.ai Team

At H2O.ai, democratizing AI isn’t just an idea. It’s a movement. And that means that it requires action. We started out as a group of like-minded individuals in the open-source community, collectively driven by the idea that there should be freedom around the creation and use of AI.

Today we have evolved into a global company built by people from a variety of different backgrounds and skill sets, all driven to be part of something greater than ourselves. Our partnerships now extend beyond the open-source community to include business customers, academia, and non-profit organizations.