July 13th, 2016

H2O + TensorFlow on AWS GPU


TensorFlow on an AWS GPU instance
In this tutorial, we show how to set up TensorFlow on an AWS GPU instance and run the H2O TensorFlow Deep Learning demo.
Prerequisites:
To get started, request an AWS EC2 instance with GPU support. We used a single g2.2xlarge instance running Ubuntu 14.04. To set up TensorFlow with GPU support, the following software should be installed:

  1. Java 1.8
  2. Python pip
  3. Unzip utility
  4. CUDA Toolkit (>= v7.0)
  5. cuDNN (v4.0)
  6. Bazel (>= v0.2)
  7. TensorFlow (v0.9)

To run the H2O TensorFlow Deep Learning demo, the following software should be installed:

  1. IPython notebook
  2. Scala
  3. Spark
  4. Sparkling water

Software Installation:
Java:


#To install Java, follow the steps below; type ‘Y’ at the installation prompt:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
#Set JAVA_HOME in ~/.bashrc (the PPA installs Java to /usr/lib/jvm/java-8-oracle):
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
#Add JAVA_HOME to PATH:
export PATH=$PATH:$JAVA_HOME/bin
# Execute following command to update current session:
source ~/.bashrc
#Verify version and path:
java -version
echo $JAVA_HOME

Python:


#An AWS EC2 instance has Python installed by default. Verify that Python 2.7 is already installed:
python -V
#Install pip
sudo apt-get install python-pip
#Install IPython notebook
sudo pip install "ipython[notebook]"
#To run the H2O example notebooks, execute the following commands:
sudo pip install requests
sudo pip install tabulate
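
Optionally, verify that the demo dependencies import cleanly:

#Quick sanity check that requests and tabulate are importable:
python -c "import requests, tabulate; print('notebook dependencies OK')"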

Unzip utility:


#Execute the following command to install unzip:
sudo apt-get install unzip

Scala:


#Follow the steps below; type ‘Y’ at the installation prompt:
sudo apt-get install scala
#Set SCALA_HOME in ~/.bashrc (apt installs the Scala libraries under /usr/share/java):
export SCALA_HOME=/usr/share/java
#Execute the following command to update the current session:
source ~/.bashrc
#Verify version and path:
scala -version
echo $SCALA_HOME

Spark:


#Java and Scala should be installed before installing Spark.
#Get the latest version of the Spark binary:
wget http://apache.cs.utah.edu/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
#Extract the file:
tar xvzf spark-1.6.1-bin-hadoop2.6.tgz
#Set SPARK_HOME in ~/.bashrc to the extracted directory and execute the following command to update the current session:
export SPARK_HOME=/home/ubuntu/spark-1.6.1-bin-hadoop2.6
source ~/.bashrc
#Add SPARK_HOME to PATH:
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
#Verify the variables:
echo $SPARK_HOME
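
Optionally, confirm the installation by printing Spark's version:

#spark-submit ships with the binary distribution and reports the build version:
spark-submit --version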

Sparkling Water:


#The latest Spark pre-built for Hadoop should be installed, with SPARK_HOME pointing to it:
export SPARK_HOME="/path/to/spark/installation"
#To launch a local Spark cluster with 3 worker nodes, each with 2 cores and 1 GB of memory, export the MASTER variable:
export MASTER="local-cluster[3,2,1024]"
#Download and run Sparkling Water
wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/5/sparkling-water-1.6.5.zip
unzip sparkling-water-1.6.5.zip
cd sparkling-water-1.6.5
bin/sparkling-shell --conf "spark.executor.memory=1g"
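
Optionally, smoke-test that H2O starts on top of the Spark cluster. A minimal sketch, assuming the H2OContext.getOrCreate(sc) API of Sparkling Water 1.6:

#Pipe a short Scala snippet into sparkling-shell; it starts an H2O cloud, prints its status, and exits:
echo 'import org.apache.spark.h2o._; val hc = H2OContext.getOrCreate(sc); println(hc)' | bin/sparkling-shell --conf "spark.executor.memory=1g"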

CUDA Toolkit:


#To build or run TensorFlow with GPU support, both NVIDIA’s CUDA Toolkit (>= 7.0) and cuDNN (>= v2) need to be installed.
#To install the CUDA Toolkit, run:
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/x86_64/cuda-repo-ubuntu1410_7.0-28_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1410_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install cuda
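
Optionally, verify the toolkit before PATH is updated (that happens in the cuDNN section below):

#nvcc lives under /usr/local/cuda/bin until PATH is updated:
/usr/local/cuda/bin/nvcc --version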

cuDNN:


#To install cuDNN, download cudnn-7.0-linux-x64-v4.0-prod.tgz after completing NVIDIA’s developer questionnaire,
#then transfer it to your EC2 instance’s home directory.
tar -zxf cudnn-7.0-linux-x64-v4.0-prod.tgz && rm cudnn-7.0-linux-x64-v4.0-prod.tgz
#Copy the cuDNN libraries and header into the CUDA installation:
sudo cp -R ~/cuda/lib64/* /usr/local/cuda/lib64/
sudo cp ~/cuda/include/cudnn.h /usr/local/cuda
#Reboot the system
sudo reboot
#Update environment variables as shown below:
export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
export PATH=$PATH:$CUDA_ROOT/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64
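
After the reboot, optionally confirm that the driver and toolkit are visible; on a g2.2xlarge the GPU is a GRID K520:

#The NVIDIA driver should list the GRID K520:
nvidia-smi
#And nvcc should now be on the PATH:
nvcc --version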

Bazel:


#To install Bazel (>= v0.2), run:
sudo apt-get install pkg-config zip g++ zlib1g-dev
wget https://github.com/bazelbuild/bazel/releases/download/0.3.0/bazel-0.3.0-installer-linux-x86_64.sh
chmod +x bazel-0.3.0-installer-linux-x86_64.sh
./bazel-0.3.0-installer-linux-x86_64.sh --user
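
The --user flag installs Bazel to $HOME/bin, so make sure that directory is on your PATH (the sample .bashrc below includes it), then verify:

#Check the installed Bazel version:
export PATH=$PATH:$HOME/bin
bazel version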

TensorFlow:


#Download and install the pre-built TensorFlow GPU wheel:
wget https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade tensorflow-0.9.0rc0-cp27-none-linux_x86_64.whl
#To build from source, clone the TensorFlow repository and configure it with GPU support enabled:
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow
./configure

To build TensorFlow, run:


bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
#The wheel filename reflects the version of the branch you built; adjust it if yours differs:
sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-0.8.0-py2-none-any.whl
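
Finally, a quick check that the installed build imports and sees the GPU; creating a session logs any GPU devices TensorFlow finds:

#Print the version; instantiating a Session logs GPU device placement (e.g. the GRID K520):
python -c "import tensorflow as tf; print(tf.__version__); tf.Session()"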

Run the H2O TensorFlow Deep Learning demo:


#Since we want to open the IPython notebook remotely, we use the IP and port options. To start the TensorFlow notebook:
cd sparkling-water-1.6.5/
IPYTHON_OPTS="notebook --no-browser --ip='*' --port=54321" bin/pysparkling

Note that the port specified in the above command should be open on the instance. Open http://PublicIP:54321 in a browser to reach the IPython notebook console, then click on TensorFlowDeepLearning.ipynb. Refer to this video for demo details.

Sample .bashrc contents:


export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/usr/share/java
export SPARK_HOME=/home/ubuntu/spark-1.6.1-bin-hadoop2.6
export MASTER="local-cluster[3,2,1024]"
export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/jvm/java-8-oracle/bin:/home/ubuntu/spark-1.6.1-bin-hadoop2.6/bin:/home/ubuntu/spark-1.6.1-bin-hadoop2.6/sbin:/usr/local/cuda/bin:/home/ubuntu/bin
export LD_LIBRARY_PATH=:/usr/local/cuda/lib64

Troubleshooting:
1) ERROR: Getting java.net.UnknownHostException while starting spark-shell
Solution:
Make sure /etc/hosts has an entry for the hostname, e.g.:
127.0.0.1 hostname
2) ERROR: Getting a “Could not find .egg-info directory in install record” error during IPython installation
Solution:

sudo pip install --upgrade setuptools pip

3) ERROR: Can’t find swig while configuring TF
Solution:

sudo apt-get install swig

4) ERROR: “Ignoring gpu device (device: 0, name: GRID K520, pci bus id: 0000:00:03.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5”
Solution:
Specify 3.0 when ./configure prompts for the list of CUDA compute capabilities. Note that each additional compute capability significantly increases your build time and binary size.
5) ERROR: Could not insert ‘nvidia_352’: Unknown symbol in module, or unknown parameter (see dmesg)
Solution:

sudo apt-get install linux-image-extra-virtual

6) ERROR: Cannot find ‘./util/python/python_include’ while configuring TF
Solution:

sudo apt-get install python-dev

7) Find the public IP address of the system
Solution:

curl http://169.254.169.254/latest/meta-data/public-ipv4

Demo Videos
