
Benjamin Bouillé

IT consultant



Introduction

On the road to resource optimisation and cluster efficiency, Mesos is a good fit for frameworks like Hadoop or MPI (see the paper from Berkeley). But what about other technologies, such as Storm and real-time processing? It seems that Storm can run on top of Mesos with a custom Storm distribution published by Nathan Marz: storm-mesos. Let's see how to do it.

Note: another framework is under development at Yahoo, and its prototype is available on GitHub: storm-yarn.

Pre-requisites

Environment

  • OS : Debian 6.0.6 x64

Download

  • storm-0.8.2.zip (project site): distributed and fault-tolerant real-time computation framework
  • storm-mesos (from GitHub): the distribution to run Storm on top of Mesos
  • lein (from GitHub): a shell script to manage Leiningen, a tool to automate Clojure projects

Download everything into your Storm home directory:

$ wget https://dl.dropbox.com/u/133901206/storm-0.8.2.zip
$ git clone git://github.com/isnoopy/storm-mesos.git
$ wget https://raw.github.com/technomancy/leiningen/stable/bin/lein
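If you need to re-run the setup, the download step can be made idempotent so that only missing artifacts are fetched. A minimal sketch, assuming the same three URLs as above and that the Storm home directory is passed as the first argument (defaulting to $HOME); the function name fetch_sources is hypothetical:

```shell
# Hypothetical helper: idempotent version of the download step. Each artifact
# is fetched only if it is missing, so the step can be re-run safely.
# The subshell keeps the caller's working directory unchanged.
fetch_sources() {
  (
    cd "${1:-$HOME}" || exit 1
    [ -f storm-0.8.2.zip ] ||
      wget https://dl.dropbox.com/u/133901206/storm-0.8.2.zip || exit 1
    [ -d storm-mesos ] ||
      git clone git://github.com/isnoopy/storm-mesos.git || exit 1
    [ -f lein ] ||
      wget https://raw.github.com/technomancy/leiningen/stable/bin/lein || exit 1
  )
}
```

For example, `fetch_sources ~` downloads into the home directory and skips anything already there.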

Install lein

  • Place it on your $PATH (create ~/bin first if it does not exist):
$ mkdir -p ~/bin
$ mv ~/lein ~/bin/.
$ echo 'export PATH=$PATH:/home/storm/bin' >> ~/.bashrc
$ source ~/.bashrc
  • Make it executable:
$ chmod 755 ~/bin/lein
  • Install Leiningen (v2.2.0):
$ lein self-install
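Running the echo line twice appends the PATH entry twice. A minimal sketch of an idempotent variant, assuming bash reads ~/.bashrc and that lein lives in ~/bin (add_bin_to_path is a hypothetical helper name):

```shell
# Hypothetical helper: append ~/bin to PATH in an rc file only once.
# Single quotes keep $PATH and $HOME unexpanded in the rc file, so they
# are evaluated each time a new shell starts.
add_bin_to_path() {
  local rc="${1:-$HOME/.bashrc}"
  local line='export PATH="$PATH:$HOME/bin"'
  mkdir -p "$HOME/bin"
  touch "$rc"
  # -q: no output, -x: match the whole line, -F: fixed string (no regex)
  grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}
```

Usage: `add_bin_to_path && source ~/.bashrc`.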

Prepare storm-mesos

Copy mesos-0.11.0.jar and protobuf-2.4.1.jar from the Mesos build into the lib/ folder:

$ cd storm-mesos
$ mkdir lib
$ cp ../mesos-0.11.0/build/protobuf-2.4.1.jar lib/.
$ cp ../mesos-0.11.0/build/src/mesos-0.11.0.jar lib/.
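If the Mesos build did not complete, the two cp commands fail one at a time. A minimal sketch that checks both jars before copying anything, assuming it is run from the storm-mesos checkout with mesos-0.11.0 as a sibling directory (copy_mesos_jars is a hypothetical helper name):

```shell
# Hypothetical helper: copy the jars produced by the Mesos build into
# storm-mesos/lib, checking that both exist before copying anything.
copy_mesos_jars() {
  local build="${1:-../mesos-0.11.0/build}"
  local lib="${2:-lib}"
  local jar
  for jar in "$build/protobuf-2.4.1.jar" "$build/src/mesos-0.11.0.jar"; do
    if [ ! -f "$jar" ]; then
      echo "missing $jar (build Mesos first)" >&2
      return 1
    fi
  done
  mkdir -p "$lib"
  cp "$build/protobuf-2.4.1.jar" "$build/src/mesos-0.11.0.jar" "$lib/"
}
```

From the storm-mesos directory, `copy_mesos_jars` with no arguments uses the same paths as the commands above.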
  • Copy the Storm distribution into the lib/ folder:
$ cp ../storm-0.8.2.zip lib/.
  • Update the description (version number etc.) of storm-mesos in project.clj

  • Update the Storm configuration file, storm.yaml. For this post, all daemons run on localhost and the configuration is the following:

## Default configuration for standalone mode (all daemons on one node with default settings)

# Path to the native libraries of a properly built Mesos distribution
java.library.path: "native:/usr/local/mesos-0.11.0/build/src/.libs"

# hostname:port of the Mesos master node
mesos.master.url: "localhost:5050"

# In a cluster environment, change this to a globally accessible location (HDFS, NFS, etc.)
mesos.executor.uri: "/usr/local/storm-mesos-0.8.2-SNAPSHOT.tgz"

# Hostnames of the ZooKeeper nodes (the port is set by storm.zookeeper.port, default 2181)
storm.zookeeper.servers:
- "localhost"

# Hostname of the Nimbus node
nimbus.host: "localhost"

# Full path of the Storm local working directory
storm.local.dir: "/usr/local/storm-local"

Note: at this point storm-mesos-0.8.2-SNAPSHOT.tgz has not been built yet, but its location must already be set in mesos.executor.uri in the Storm configuration file. The same storm-mesos-0.8.2-SNAPSHOT.tgz is used by every node: in standalone mode a local path works, but in cluster mode the file must be at a globally accessible path (HDFS or NFS, for example).
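Only a handful of values change between standalone and cluster mode (Mesos master URL, ZooKeeper host, Nimbus host and executor URI), so one option is to generate storm.yaml from a small script. A minimal sketch, assuming a single ZooKeeper host; the function name and argument order are hypothetical and not part of storm-mesos, and the defaults reproduce the standalone configuration above:

```shell
# Hypothetical helper: render storm.yaml with the values that differ between
# standalone and cluster mode passed as arguments. java.library.path and
# storm.local.dir are kept fixed as in the standalone configuration.
write_storm_yaml() {
  local out="$1"
  local master="${2:-localhost:5050}"
  local zk_host="${3:-localhost}"
  local nimbus="${4:-localhost}"
  local executor_uri="${5:-/usr/local/storm-mesos-0.8.2-SNAPSHOT.tgz}"
  # Unquoted EOF: the $variables below are expanded when the file is written.
  cat > "$out" <<EOF
java.library.path: "native:/usr/local/mesos-0.11.0/build/src/.libs"
mesos.master.url: "$master"
mesos.executor.uri: "$executor_uri"
storm.zookeeper.servers:
- "$zk_host"
nimbus.host: "$nimbus"
storm.local.dir: "/usr/local/storm-local"
EOF
}
```

For a cluster, something like `write_storm_yaml storm.yaml mesos-master:5050 zk-host nimbus-host hdfs://namenode/path/storm-mesos-0.8.2-SNAPSHOT.tgz`, where every hostname and the HDFS path are placeholders for your own values.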

  • Update dependencies:
$ lein deps
  • Create the pom.xml:
$ lein install
  • Compile with Maven:
$ mvn clean compile install
  • Build storm-mesos:
$ ./bin/build-release.sh lib/storm-0.8.2.zip
  • Deploy the distribution to the path declared in mesos.executor.uri in storm.yaml:
$ sudo cp storm-mesos-0.8.2-SNAPSHOT.tgz /usr/local/.
  • Unpack the distribution to run Nimbus (this is done only on the master node running Nimbus). In standalone mode, unpack it in the home directory:
$ cd ~
$ tar xzvf storm-mesos/storm-mesos-0.8.2-SNAPSHOT.tgz
  • Make sure ZooKeeper is running on localhost, then start Nimbus as root:
$ cd ~/storm-mesos-0.8.2-SNAPSHOT
$ sudo ./bin/storm-mesos nimbus

  • Start the Storm UI:
$ cd ~/storm-mesos-0.8.2-SNAPSHOT
$ ./bin/storm-mesos ui
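Because mesos.executor.uri is written before the tarball exists, it can be worth checking that the file was actually deployed before starting Nimbus. A minimal sketch for standalone mode, assuming the one-line `key: "value"` layout of the storm.yaml above; the function names are hypothetical, and non-local URIs (HDFS, HTTP) are reported but not checked:

```shell
# Hypothetical pre-flight check: read mesos.executor.uri from storm.yaml
# and make sure the tarball it points to has been deployed.
executor_uri() {
  sed -n 's/^mesos\.executor\.uri:[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$1"
}

check_executor_uri() {
  local uri
  uri=$(executor_uri "$1")
  if [ -z "$uri" ]; then
    echo "mesos.executor.uri is not set in $1" >&2
    return 1
  fi
  case "$uri" in
    /*)
      # Local absolute path (standalone mode): the file must exist here.
      if [ ! -f "$uri" ]; then
        echo "executor tarball not found: $uri" >&2
        return 1
      fi
      ;;
    *)
      # hdfs://, http:// and other remote URIs: not checked by this sketch.
      echo "skipping check for non-local URI: $uri" >&2
      ;;
  esac
}
```

Usage: `check_executor_uri path/to/storm.yaml && sudo ./bin/storm-mesos nimbus`, with path/to/storm.yaml pointing at the configuration file you edited.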