November 1st, 2014

Sparkling Water on YARN Example


Follow these easy steps to get your first Sparkling Water example to run on a YARN cluster. This example uses Hortonworks HDP 2.1.

1. Assumptions

Installed:

  • Java 1.7+
  • YARN cluster

Note: In the current version of Sparkling Water on YARN, cluster formation requires working multicast so that the H2O nodes can find each other. This requirement will be removed in a future version.

2. Download the software

  • Download the right version of Spark for your YARN cluster from the Apache Spark downloads page (https://spark.apache.org/downloads.html). The YARN cluster used in this example is HDP 2.1, so I chose the Hadoop 2.4 build.

  • Download the Sparkling Water zip file distribution.

3. Unpack the software

Put the downloaded distributions of Spark and Sparkling Water in a single directory.

$ ls -l

total 192004
-rw-r--r-- 1 tomk tomk 191475104 Oct 31 15:43 spark-1.1.0-bin-hadoop2.4.tgz
-rw-rw-r-- 1 tomk tomk   5135267 Nov  2 16:21 sparkling-water-0.2.1-58.zip

$ tar zxf spark-1.1.0-bin-hadoop2.4.tgz
$ unzip sparkling-water-0.2.1-58.zip

Archive:  sparkling-water-0.2.1-58.zip
   creating: sparkling-water-0.2.1-58/
  inflating: sparkling-water-0.2.1-58/README.md
[ ... lots of files ... ]

4. Run the example

Change to the unpacked Spark directory

$ cd spark-1.1.0-bin-hadoop2.4/

Set the HADOOP_CONF_DIR environment variable (your cluster may vary)

$ export HADOOP_CONF_DIR=/etc/hadoop/conf

Use spark-submit to launch the Sparkling Water application job on YARN

$ bin/spark-submit --class water.SparklingWaterDriver --master yarn-client --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 ../sparkling-water-0.2.1-58/assembly/build/libs/*.jar

Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/11/02 16:25:39 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:25:39 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:25:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:25:39 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/11/02 16:25:39 INFO Remoting: Starting remoting
14/11/02 16:25:40 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@mr-0xd1.0xdata.loc:36089]
14/11/02 16:25:40 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@mr-0xd1.0xdata.loc:36089]
14/11/02 16:25:40 INFO util.Utils: Successfully started service 'sparkDriver' on port 36089.
14/11/02 16:25:40 INFO spark.SparkEnv: Registering MapOutputTracker
14/11/02 16:25:40 INFO spark.SparkEnv: Registering BlockManagerMaster
14/11/02 16:25:40 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141102162540-53d8
14/11/02 16:25:40 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 40286.
14/11/02 16:25:40 INFO network.ConnectionManager: Bound socket to port 40286 with id = ConnectionManagerId(mr-0xd1.0xdata.loc,40286)
14/11/02 16:25:40 INFO storage.MemoryStore: MemoryStore started with capacity 2.1 GB
14/11/02 16:25:40 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/11/02 16:25:40 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd1.0xdata.loc:40286 with 2.1 GB RAM
14/11/02 16:25:40 INFO storage.BlockManagerMaster: Registered BlockManager
14/11/02 16:25:40 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-23e3ba13-04ba-4a34-87ac-c69147266d8d
14/11/02 16:25:40 INFO spark.HttpServer: Starting HTTP Server
14/11/02 16:25:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:25:40 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51874
14/11/02 16:25:40 INFO util.Utils: Successfully started service 'HTTP file server' on port 51874.
14/11/02 16:25:45 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:25:45 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/11/02 16:25:45 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/11/02 16:25:45 INFO ui.SparkUI: Started SparkUI at http://mr-0xd1.0xdata.loc:4040
14/11/02 16:25:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/02 16:25:45 INFO spark.SparkContext: Added JAR file:/home/tomk/tmp/spark-1.1.0-bin-hadoop2.4/../sparkling-water-0.2.1-58/assembly/build/libs/sparkling-water-assembly-0.2.1-58-all.jar at http://172.16.2.181:51874/jars/sparkling-water-assembly-0.2.1-58-all.jar with timestamp 1414974345939
--args is deprecated. Use --arg instead.
14/11/02 16:25:46 INFO client.RMProxy: Connecting to ResourceManager at mr-0xd7.0xdata.loc/172.16.2.187:8050
14/11/02 16:25:46 INFO yarn.Client: Got cluster metric info from ResourceManager, number of NodeManagers: 8
14/11/02 16:25:46 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 227328
14/11/02 16:25:46 INFO yarn.Client: Preparing Local resources
14/11/02 16:25:46 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
14/11/02 16:25:46 INFO yarn.Client: Uploading file:/home/tomk/tmp/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar to hdfs://mr-0xd6.0xdata.loc/user/tomk/.sparkStaging/application_1413598290344_0066/spark-assembly-1.1.0-hadoop2.4.0.jar
14/11/02 16:25:48 INFO yarn.Client: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "mr-0xd6.0xdata.loc" port: -1 file: "/user/tomk/.sparkStaging/application_1413598290344_0066/spark-assembly-1.1.0-hadoop2.4.0.jar" } size: 138884949 timestamp: 1414974348190 type: FILE visibility: PRIVATE)
14/11/02 16:25:48 INFO yarn.Client: Setting up the launch environment
14/11/02 16:25:48 INFO yarn.Client: Setting up container launch context
14/11/02 16:25:48 INFO yarn.Client: Yarn AM launch context:
14/11/02 16:25:48 INFO yarn.Client:   class:   org.apache.spark.deploy.yarn.ExecutorLauncher
14/11/02 16:25:48 INFO yarn.Client:   env:     Map(CLASSPATH -> $PWD:$PWD/__spark__.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$PWD/__app__.jar:$PWD/*, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 138884949, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1413598290344_0066/, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE, SPARK_USER -> tomk, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1414974348190, SPARK_YARN_CACHE_FILES -> hdfs://mr-0xd6.0xdata.loc/user/tomk/.sparkStaging/application_1413598290344_0066/spark-assembly-1.1.0-hadoop2.4.0.jar#__spark__.jar)
14/11/02 16:25:48 INFO yarn.Client:   command: $JAVA_HOME/bin/java -server -Xmx4096m -Djava.io.tmpdir=$PWD/tmp '-Dspark.tachyonStore.folderName=spark-b2067b5c-d854-475b-be02-a67b66a4c8f5' '-Dspark.driver.memory=4g' '-Dspark.executor.memory=2g' '-Dspark.executor.instances=3' '-Dspark.yarn.secondary.jars=' '-Dspark.driver.host=mr-0xd1.0xdata.loc' '-Dspark.driver.appUIHistoryAddress=' '-Dspark.app.name=Sparkling Water' '-Dspark.driver.appUIAddress=mr-0xd1.0xdata.loc:4040' '-Dspark.jars=file:/home/tomk/tmp/spark-1.1.0-bin-hadoop2.4/../sparkling-water-0.2.1-58/assembly/build/libs/sparkling-water-assembly-0.2.1-58-all.jar' '-Dspark.fileserver.uri=http://172.16.2.181:51874' '-Dspark.master=yarn-client' '-Dspark.driver.port=36089' '-Dspark.executor.cores=1' org.apache.spark.deploy.yarn.ExecutorLauncher --class 'notused' --jar  null  --arg  'mr-0xd1.0xdata.loc:36089' --executor-memory 2048 --executor-cores 1 --num-executors  3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
14/11/02 16:25:48 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:25:48 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:25:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:25:48 INFO yarn.Client: Submitting application to ResourceManager
14/11/02 16:25:48 INFO impl.YarnClientImpl: Submitted application application_1413598290344_0066
14/11/02 16:25:48 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
     appMasterRpcPort: -1
     appStartTime: 1414974348389
     yarnAppState: ACCEPTED
14/11/02 16:25:49 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
     appMasterRpcPort: -1
     appStartTime: 1414974348389
     yarnAppState: ACCEPTED
14/11/02 16:25:50 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
     appMasterRpcPort: -1
     appStartTime: 1414974348389
     yarnAppState: ACCEPTED
14/11/02 16:25:51 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, PROXY_HOST=mr-0xd7.0xdata.loc,PROXY_URI_BASE=http://mr-0xd7.0xdata.loc:8088/proxy/application_1413598290344_0066, /proxy/application_1413598290344_0066
14/11/02 16:25:51 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
14/11/02 16:25:51 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
     appMasterRpcPort: 0
     appStartTime: 1414974348389
     yarnAppState: RUNNING
14/11/02 16:25:53 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd1.0xdata.loc:55555/user/Executor#-58639133] with ID 2
14/11/02 16:25:53 INFO util.RackResolver: Resolved mr-0xd1.0xdata.loc to /default-rack
14/11/02 16:25:53 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd1.0xdata.loc:44336 with 1060.3 MB RAM
14/11/02 16:25:53 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd2.0xdata.loc:48257/user/Executor#-1480833723] with ID 1
14/11/02 16:25:53 INFO util.RackResolver: Resolved mr-0xd2.0xdata.loc to /default-rack
14/11/02 16:25:54 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd2.0xdata.loc:52830 with 1060.3 MB RAM
14/11/02 16:25:54 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd10.0xdata.loc:38855/user/Executor#-1943381558] with ID 3
14/11/02 16:25:54 INFO util.RackResolver: Resolved mr-0xd10.0xdata.loc to /default-rack
14/11/02 16:25:54 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
14/11/02 16:25:54 INFO h2o.H2OContext: Starting -1 H2O nodes...
14/11/02 16:25:54 INFO spark.SparkContext: Starting job: collect at H2OContextUtils.scala:28
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Got job 0 (collect at H2OContextUtils.scala:28) with 200 output partitions (allowLocal=false)
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at H2OContextUtils.scala:28)
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Missing parents: List()
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at H2OContextUtils.scala:23), which has no missing parents
14/11/02 16:25:54 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd10.0xdata.loc:60447 with 1060.3 MB RAM
14/11/02 16:25:54 INFO storage.MemoryStore: ensureFreeSpace(1832) called with curMem=0, maxMem=2223023063
14/11/02 16:25:54 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 2.1 GB)
14/11/02 16:25:54 INFO storage.MemoryStore: ensureFreeSpace(1218) called with curMem=1832, maxMem=2223023063
14/11/02 16:25:54 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1218.0 B, free 2.1 GB)
14/11/02 16:25:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on mr-0xd1.0xdata.loc:40286 (size: 1218.0 B, free: 2.1 GB)
14/11/02 16:25:54 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
14/11/02 16:25:54 INFO scheduler.DAGScheduler: Submitting 200 missing tasks from Stage 0 (MappedRDD[1] at map at H2OContextUtils.scala:23)
14/11/02 16:25:54 INFO cluster.YarnClientClusterScheduler: Adding task set 0.0 with 200 tasks
14/11/02 16:25:54 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, mr-0xd10.0xdata.loc, PROCESS_LOCAL, 1234 bytes)
[ ... lots of scheduler.TaskSetManager output deleted ... ]
14/11/02 16:32:55 INFO scheduler.DAGScheduler: Stage 1 (collect at H2OContextUtils.scala:67) finished in 1.213 s
14/11/02 16:32:55 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/11/02 16:32:55 INFO spark.SparkContext: Job finished: collect at H2OContextUtils.scala:67, took 1.284518985 s
14/11/02 16:32:55 INFO h2o.H2OContext: Sparkling H2O - H2O status: (3,true),(1,true),(2,true),(2,true),(3,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(3,true),(1,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true),(2,true),(1,true),(3,true)
11-02 16:32:55.152 172.16.2.181:54321    24915  main      INFO: ----- H2O started (client) -----
11-02 16:32:55.182 172.16.2.181:54321    24915  main      INFO: Build git branch: (unknown)
11-02 16:32:55.183 172.16.2.181:54321    24915  main      INFO: Build git hash: (unknown)
11-02 16:32:55.183 172.16.2.181:54321    24915  main      INFO: Build git describe: (unknown)
11-02 16:32:55.183 172.16.2.181:54321    24915  main      INFO: Build project version: (unknown)
11-02 16:32:55.183 172.16.2.181:54321    24915  main      INFO: Built by: '(unknown)'
11-02 16:32:55.183 172.16.2.181:54321    24915  main      INFO: Built on: '(unknown)'
11-02 16:32:55.183 172.16.2.181:54321    24915  main      INFO: Java availableProcessors: 32
11-02 16:32:55.183 172.16.2.181:54321    24915  main      INFO: Java heap totalMemory: 3.83 GB
11-02 16:32:55.183 172.16.2.181:54321    24915  main      INFO: Java heap maxMemory: 3.83 GB
11-02 16:32:55.183 172.16.2.181:54321    24915  main      INFO: Java version: Java 1.7.0_72 (from Oracle Corporation)
11-02 16:32:55.184 172.16.2.181:54321    24915  main      INFO: OS   version: Linux 3.13.0-35-generic (amd64)
11-02 16:32:55.184 172.16.2.181:54321    24915  main      INFO: Possible IP Address: vethJQ8838 (vethJQ8838), fe80:0:0:0:fc0f:adff:fe93:af9b%104
11-02 16:32:55.184 172.16.2.181:54321    24915  main      INFO: Possible IP Address: br2 (br2), fe80:0:0:0:a236:9fff:fe35:4262%9
11-02 16:32:55.184 172.16.2.181:54321    24915  main      INFO: Possible IP Address: br2 (br2), 172.16.2.181
11-02 16:32:55.184 172.16.2.181:54321    24915  main      INFO: Possible IP Address: lo (lo), 0:0:0:0:0:0:0:1%1
11-02 16:32:55.184 172.16.2.181:54321    24915  main      INFO: Possible IP Address: lo (lo), 127.0.0.1
11-02 16:32:55.184 172.16.2.181:54321    24915  main      INFO: Internal communication uses port: 54322
11-02 16:32:55.184 172.16.2.181:54321    24915  main      INFO: Listening for HTTP and REST traffic on  http://172.16.2.181:54321/
11-02 16:32:55.185 172.16.2.181:54321    24915  main      INFO: H2O cloud name: 'sparkling-water-42' on /172.16.2.181:54321, discovery address /235.34.3.46:60194
11-02 16:32:55.185 172.16.2.181:54321    24915  main      INFO: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
11-02 16:32:55.185 172.16.2.181:54321    24915  main      INFO:   1. Open a terminal and run 'ssh -L 55555:localhost:54321 tomk@172.16.2.181'
11-02 16:32:55.185 172.16.2.181:54321    24915  main      INFO:   2. Point your browser to http://localhost:55555
11-02 16:32:55.286 172.16.2.181:54321    24915  main      INFO: Log dir: '/tmp/h2o-tomk/h2ologs'
11-02 16:32:55.521 172.16.2.181:54321    24915  FJ-126-15 INFO: Cloud of size 1 formed [/172.16.2.183:54321]
11-02 16:32:58.348 172.16.2.181:54321    24915  FJ-126-15 INFO: Cloud of size 3 formed [/172.16.2.183:54321, /172.16.2.188:54321, /172.16.2.189:54321]
You’re up and running!

At this point, Sparkling Water is running and waiting for you to connect. Point your web browser to one of the addresses listed on the last line, for example 172.16.2.183:54321, to access the H2O Web UI.

5. Summary

Here is a breakdown of the command-line flags we passed to spark-submit above:

  • --class water.SparklingWaterDriver: Class name of the Scala main driver program. In this case the Sparkling Water driver is used; it launches H2O services on all Spark cluster nodes.
  • --master yarn-client: Launch using Spark on YARN in client mode. Use yarn-cluster to launch the job in cluster mode.
  • --num-executors 3: Number of Spark executors to start (this must equal the number of H2O nodes).
  • --driver-memory 4g: Memory to give the driver program.
  • --executor-memory 2g: Memory to give each executor, and hence each H2O node.
  • --executor-cores 1: Number of cores to request per executor. Before Hadoop 2.2, YARN did not support cores in container resource requests, so when running against an earlier version this value cannot be passed through to YARN. Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.
  • ../sparkling-water-0.2.1-58/assembly/build/libs/*.jar: The Sparkling Water application jar file.

Note: To manage the running application on the YARN cluster, use the yarn command. For example, to list running applications: yarn application -appStates RUNNING -list
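For reference, the same configuration can also be set programmatically instead of on the spark-submit command line. Below is a minimal, hypothetical Scala driver in the spirit of water.SparklingWaterDriver: the property names match those visible in the YARN AM launch context logged above (spark.executor.instances, spark.executor.memory, spark.executor.cores), but the driver class itself is an illustrative sketch, not the shipped implementation.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.h2o.H2OContext

// Hypothetical driver: configure Spark for YARN client mode and
// start H2O services on the executors, as spark-submit did above.
object MySparklingDriver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Sparkling Water")
      .setMaster("yarn-client")             // same as --master yarn-client
      .set("spark.executor.instances", "3") // same as --num-executors 3
      .set("spark.executor.memory", "2g")   // same as --executor-memory 2g
      .set("spark.executor.cores", "1")     // same as --executor-cores 1
    val sc = new SparkContext(conf)

    // Start H2O on the Spark cluster (the same call used in the shell session below).
    val h2oContext = new H2OContext(sc).start()
    println(h2oContext)
  }
}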

6. Launch with Sparkling Shell

Set Spark home

$ cd spark-1.1.0-bin-hadoop2.4/
$ export SPARK_HOME=$(pwd)

Set the Hadoop YARN configuration

$ export HADOOP_CONF_DIR=/etc/hadoop/conf

Export the YARN cluster reference for the Spark runtime

$ export MASTER="yarn-client"

Run Sparkling Shell

$ cd ../sparkling-water-0.2.1-58/
$ bin/sparkling-shell --num-executors 3 --executor-memory 2g --executor-cores 1 --master yarn-client

Using
  MASTER     = yarn-client
  SPARK_HOME = /home/tomk/tmp/spark-1.1.0-bin-hadoop2.4
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/11/02 16:50:44 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:50:44 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:50:44 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:50:44 INFO spark.HttpServer: Starting HTTP Server
14/11/02 16:50:44 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:50:44 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60849
14/11/02 16:50:44 INFO util.Utils: Successfully started service 'HTTP class server' on port 60849.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_72)
Type in expressions to have them evaluated.
Type :help for more information.
14/11/02 16:50:48 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:50:48 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:50:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:50:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/11/02 16:50:49 INFO Remoting: Starting remoting
14/11/02 16:50:49 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@mr-0xd1.0xdata.loc:55740]
14/11/02 16:50:49 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@mr-0xd1.0xdata.loc:55740]
14/11/02 16:50:49 INFO util.Utils: Successfully started service 'sparkDriver' on port 55740.
14/11/02 16:50:49 INFO spark.SparkEnv: Registering MapOutputTracker
14/11/02 16:50:49 INFO spark.SparkEnv: Registering BlockManagerMaster
14/11/02 16:50:49 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141102165049-070c
14/11/02 16:50:49 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 36632.
14/11/02 16:50:49 INFO network.ConnectionManager: Bound socket to port 36632 with id = ConnectionManagerId(mr-0xd1.0xdata.loc,36632)
14/11/02 16:50:49 INFO storage.MemoryStore: MemoryStore started with capacity 1589.8 MB
14/11/02 16:50:49 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/11/02 16:50:49 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd1.0xdata.loc:36632 with 1589.8 MB RAM
14/11/02 16:50:49 INFO storage.BlockManagerMaster: Registered BlockManager
14/11/02 16:50:49 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-8cbcd2b8-ebc6-406d-b69c-3d2596bc9707
14/11/02 16:50:49 INFO spark.HttpServer: Starting HTTP Server
14/11/02 16:50:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:50:49 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34458
14/11/02 16:50:49 INFO util.Utils: Successfully started service 'HTTP file server' on port 34458.
14/11/02 16:50:54 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/02 16:50:54 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/11/02 16:50:54 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/11/02 16:50:54 INFO ui.SparkUI: Started SparkUI at http://mr-0xd1.0xdata.loc:4040
14/11/02 16:50:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/02 16:50:54 INFO spark.SparkContext: Added JAR file:/home/tomk/tmp/sparkling-water-0.2.1-58/assembly/build/libs/sparkling-water-assembly-0.2.1-58-all.jar at http://172.16.2.181:34458/jars/sparkling-water-assembly-0.2.1-58-all.jar with timestamp 1414975854959
--args is deprecated. Use --arg instead.
14/11/02 16:50:55 INFO client.RMProxy: Connecting to ResourceManager at mr-0xd7.0xdata.loc/172.16.2.187:8050
14/11/02 16:50:55 INFO yarn.Client: Got cluster metric info from ResourceManager, number of NodeManagers: 8
14/11/02 16:50:55 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 227328
14/11/02 16:50:55 INFO yarn.Client: Preparing Local resources
14/11/02 16:50:55 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
14/11/02 16:50:55 INFO yarn.Client: Uploading file:/home/tomk/tmp/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar to hdfs://mr-0xd6.0xdata.loc/user/tomk/.sparkStaging/application_1413598290344_0068/spark-assembly-1.1.0-hadoop2.4.0.jar
14/11/02 16:50:57 INFO yarn.Client: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "mr-0xd6.0xdata.loc" port: -1 file: "/user/tomk/.sparkStaging/application_1413598290344_0068/spark-assembly-1.1.0-hadoop2.4.0.jar" } size: 138884949 timestamp: 1414975857136 type: FILE visibility: PRIVATE)
14/11/02 16:50:57 INFO yarn.Client: Setting up the launch environment
14/11/02 16:50:57 INFO yarn.Client: Setting up container launch context
14/11/02 16:50:57 INFO yarn.Client: Yarn AM launch context:
14/11/02 16:50:57 INFO yarn.Client:   class:   org.apache.spark.deploy.yarn.ExecutorLauncher
14/11/02 16:50:57 INFO yarn.Client:   env:     Map(CLASSPATH -> $PWD:$PWD/__spark__.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$PWD/__app__.jar:$PWD/*, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 138884949, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1413598290344_0068/, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE, SPARK_USER -> tomk, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1414975857136, SPARK_YARN_CACHE_FILES -> hdfs://mr-0xd6.0xdata.loc/user/tomk/.sparkStaging/application_1413598290344_0068/spark-assembly-1.1.0-hadoop2.4.0.jar#__spark__.jar)
14/11/02 16:50:57 INFO yarn.Client:   command: $JAVA_HOME/bin/java -server -Xmx3072m -Djava.io.tmpdir=$PWD/tmp '-Dspark.tachyonStore.folderName=spark-b990a005-0b25-4775-9bb9-94792222ca20' '-Dspark.driver.memory=3G' '-Dspark.executor.memory=2g' '-Dspark.executor.instances=3' '-Dspark.yarn.secondary.jars=' '-Dspark.repl.class.uri=http://172.16.2.181:60849' '-Dspark.driver.host=mr-0xd1.0xdata.loc' '-Dspark.driver.appUIHistoryAddress=' '-Dspark.app.name=Spark shell' '-Dspark.driver.appUIAddress=mr-0xd1.0xdata.loc:4040' '-Dspark.jars=file:/home/tomk/tmp/sparkling-water-0.2.1-58/assembly/build/libs/sparkling-water-assembly-0.2.1-58-all.jar' '-Dspark.fileserver.uri=http://172.16.2.181:34458' '-Dspark.driver.port=55740' '-Dspark.master=yarn-client' '-Dspark.executor.cores=1' org.apache.spark.deploy.yarn.ExecutorLauncher --class 'notused' --jar  null  --arg  'mr-0xd1.0xdata.loc:55740' --executor-memory 2048 --executor-cores 1 --num-executors  3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
14/11/02 16:50:57 INFO spark.SecurityManager: Changing view acls to: tomk,
14/11/02 16:50:57 INFO spark.SecurityManager: Changing modify acls to: tomk,
14/11/02 16:50:57 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tomk, ); users with modify permissions: Set(tomk, )
14/11/02 16:50:57 INFO yarn.Client: Submitting application to ResourceManager
14/11/02 16:50:57 INFO impl.YarnClientImpl: Submitted application application_1413598290344_0068
14/11/02 16:50:57 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
     appMasterRpcPort: -1
     appStartTime: 1414975857313
     yarnAppState: ACCEPTED
14/11/02 16:50:58 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
     appMasterRpcPort: -1
     appStartTime: 1414975857313
     yarnAppState: ACCEPTED
14/11/02 16:50:59 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
     appMasterRpcPort: -1
     appStartTime: 1414975857313
     yarnAppState: ACCEPTED
14/11/02 16:51:00 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
     appMasterRpcPort: -1
     appStartTime: 1414975857313
     yarnAppState: ACCEPTED
14/11/02 16:51:00 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, PROXY_HOST=mr-0xd7.0xdata.loc,PROXY_URI_BASE=http://mr-0xd7.0xdata.loc:8088/proxy/application_1413598290344_0068, /proxy/application_1413598290344_0068
14/11/02 16:51:00 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
14/11/02 16:51:01 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
     appMasterRpcPort: 0
     appStartTime: 1414975857313
     yarnAppState: RUNNING
14/11/02 16:51:03 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd3.0xdata.loc:36741/user/Executor#-407387512] with ID 3
14/11/02 16:51:03 INFO util.RackResolver: Resolved mr-0xd3.0xdata.loc to /default-rack
14/11/02 16:51:03 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd3.0xdata.loc:57269 with 1060.3 MB RAM
14/11/02 16:51:05 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd2.0xdata.loc:42829/user/Executor#-1195655046] with ID 2
14/11/02 16:51:05 INFO util.RackResolver: Resolved mr-0xd2.0xdata.loc to /default-rack
14/11/02 16:51:05 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@mr-0xd9.0xdata.loc:41874/user/Executor#1710426358] with ID 1
14/11/02 16:51:05 INFO util.RackResolver: Resolved mr-0xd9.0xdata.loc to /default-rack
14/11/02 16:51:05 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
14/11/02 16:51:05 INFO repl.SparkILoop: Created spark context..
14/11/02 16:51:05 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd2.0xdata.loc:38121 with 1060.3 MB RAM
14/11/02 16:51:05 INFO storage.BlockManagerMasterActor: Registering block manager mr-0xd9.0xdata.loc:34657 with 1060.3 MB RAM
Spark context available as sc.
scala>

Now you can launch H2O services on the Spark cluster.

scala> import org.apache.spark.h2o._

import org.apache.spark.h2o._

scala> val h2oContext = new H2OContext(sc).start()

14/11/02 16:54:06 INFO h2o.H2OContext: Starting -1 H2O nodes...
14/11/02 16:54:07 INFO spark.SparkContext: Starting job: collect at H2OContextUtils.scala:28
14/11/02 16:54:07 INFO scheduler.DAGScheduler: Got job 0 (collect at H2OContextUtils.scala:28) with 200 output partitions (allowLocal=false)
[ ... lots of output ... ]
11-02 15:14:21.816 172.16.2.181:54323    17176  FJ-126-15 INFO: Cloud of size 1 formed [/172.16.2.182:54321]
11-02 15:14:24.678 172.16.2.181:54323    17176  FJ-126-15 INFO: Cloud of size 3 formed [/172.16.2.181:54321, /172.16.2.182:54321, /172.16.2.184:54321]
h2oContext: org.apache.spark.h2o.H2OContext =
Sparkling H2O setup:
  workers=-1
  flatfile: false
  basePort: 54321
  incrPort: 2
  drddMulFactor: 10

scala> import h2oContext._

import h2oContext._

Now Spark and H2O services are running and available for use.
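As a quick smoke test from the shell, you can run a small job and confirm it executes on the YARN executors; the H2O Web UI at the address printed in the log is available at the same time. The snippet below uses only the core Spark API. (The implicits brought in by import h2oContext._ also let you move data between Spark RDDs and H2O frames, but their exact names varied across early Sparkling Water versions, so they are not shown here.)

scala> val rdd = sc.parallelize(1 to 1000, 8)
scala> rdd.map(_ * 2).reduce(_ + _)   // expect 1001000, computed across the executors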

7. Helpful resources

Spark resources:

  • Apache Spark project and downloads: https://spark.apache.org/
  • Running Spark on YARN: https://spark.apache.org/docs/1.1.0/running-on-yarn.html

Sparkling Water:

  • Sparkling Water on GitHub: https://github.com/h2oai/sparkling-water
