Vagrant provisioning using shell - how to setup storm project

Provisioning in Vagrant

There is a lot of choices when you want to provision your virtual machine with vagrant:
1. Run scripts manually - completely insane idea
2. Run shell scripts inline or not
3. Use chef or puppet

The first one is good only for testing, trying, ..., but it is insane if you want to provision all machines like this.

The second one is looks quite optimistic, especially if you don't know what are tools from third point.

Today I will try to present how to setup storm project on vagrant machine using only shell provisioning. To simplify the operation I will use tutorial from Hortonworks: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bkinstallingmanuallybook/content/chrpm_storm.html
1000 words which describe my work looks like:
setup storm with vagrant in bash

Step .0. Choosing and preparing machine

First we need to create directory and init vagrant so let's do this:

mkdir storm-vagrant  
cd  storm-vagrant  
vagrant init  

I will use centos box which I downloaded before (you can use URL instead of file). Why Centos you asked? Because in many companies the only allowed Linux is Red Hat. Centos is closed enough. So let's add the box:

 vagrant box add "centos-6.5" file:///c:/vagrantfiles/CentOS-6.4-i386-v20131103.box

In vagrant file we need to change our box name so
config.vm.box = "base" becomes config.vm.box = "centos-6.5"

Step .1. Configure network

We will need network access to guest machine. It can be done in two ways:
1. Using port forwarding
2. Setting IP adress for guest machine.

I will add code to enable private network in machine so vagrantfile will look like below.:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|  
  # All Vagrant configuration is done here. The most common configuration
  # options are documented and commented below. For a complete reference,
  # please see the online documentation at vagrantup.com.

  # Every Vagrant virtual environment requires a box to build off of.
  config.vm.box = "centos-6.5"
  config.vm.network "private_network", ip: "192.168.33.10"

Caution: In next code listings I will only append above lines.

Now we are ready to run:

vagrant up  

After the above command is complete (don't worry about read color on console), let's explore our box. Run vagrant ssh and you are inside. Check what you want and logout from the machine.

Step .2. Installing storm rpms.

According to " Chapter 1. Getting Ready to Install" we need to configure remote repositories. For Centos 6 the line is:

wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.5.0/hdp.repo -O /etc/yum.repos.d/hdp.repo  

So let's change our vagrant file to (first two lines already exist in our file):

  config.vm.box = "centos-6.5"
  config.vm.network "private_network", ip: "192.168.33.10"
  config.vm.provision :shell, :inline  => "wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.5.0/hdp.repo -O /etc/yum.repos.d/hdp.repo"

According to Chapter 17.1 Install the Storm RPMs. we can install storm rpm with

yum install storm  

but above command will prompt as if we are sure to download storm + zookeeper (yes storm needs zookeeper to run). To avoid prompt just add -y to the command. So vagrant file will evolve to:

config.vm.box = "centos-6.5"  
  config.vm.provision :shell, :inline  => "wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.5.0/hdp.repo -O /etc/yum.repos.d/hdp.repo"
  config.vm.provision :shell, :inline  => "yum -y install storm"

To call above provisioning we have to choices:
1. Run vagrant destroy and again vagrant up - if we do too much manual changes in previous steps
2. Or just vagrant provision to run only provisioning on running box

Step .3. Running zookeeper

Zookeeper? Why? -you think now. Storm need zookeeper to comunicate between its nodes. The second line of last chapter: 5. Validate the Installation is:

You must start ZooKeeper before starting Storm.

First we need Java (for storm and zookeeper). I prefer Java 7 form Sun, so I just downloaded rpm form official page to vagrant folder. In recipe please add:

 config.vm.provision :shell, :inline  => '
    javarpm="$(ls /vagrant/ | grep "^jdk.*.rpm" | tail -1)"
    echo $javarpm
    rpm -Uvh /vagrant/$javarpm
    java -version'

Please notice that I have to change " into '. The second option will be escape " with \"

Then we need to create dirs as it is explain in Chapter 5.2: Set Directories and Permissions. I created a file directories.sh with below content. It should be placed side-by-side to Vagrantfile:

#!/bin/sh

# Directory where ZooKeeper will store data. For example, /grid1/hadoop/zookeeper/data
export ZOOKEEPER_DATA_DIR="/grid1/hadoop/zookeeper/data";  
# Directory to store the ZooKeeper configuration files.
export ZOOKEEPER_CONF_DIR="/etc/zookeeper/conf";  
# Directory to store the ZooKeeper logs.
export ZOOKEEPER_LOG_DIR="/var/log/zookeeper";  
# Directory to store the ZooKeeper process ID.
export ZOOKEEPER_PID_DIR="/var/run/zookeeper";  

Caution: If you are on Windows host, remember to change line ending to Unix style to avoid : command not found printing on console during login to guest machine

And do provisioning below with all steps mention in zookeeper chapter (creating dirs, set zookeeper node id and start it).

config.vm.provision :shell, :inline   => "  
    yes | cp /vagrant/directories.sh /etc/profile.d/directories.sh
    chmod 755 /etc/profile.d/directories.sh"

config.vm.provision :shell, :inline  => "  
    mkdir -p $ZOOKEEPER_LOG_DIR;
    chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_LOG_DIR;
    chmod -R 755 $ZOOKEEPER_LOG_DIR;

    mkdir -p $ZOOKEEPER_PID_DIR;
    chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_PID_DIR;
    chmod -R 755 $ZOOKEEPER_PID_DIR;

    mkdir -p $ZOOKEEPER_DATA_DIR;
    chmod -R 755 $ZOOKEEPER_DATA_DIR;
    chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_DATA_DIR

    echo '1' >> $ZOOKEEPER_DATA_DIR/myid

    su - zookeeper -c 'source /etc/zookeeper/conf/zookeeper-env.sh ; export ZOOCFGDIR=/etc/zookeeper/conf;/usr/lib/zookeeper/bin/zkServer.sh start >> /var/log/zookeeper/zoo.out 2>&1'"

To check if ZooKeeper is running we need to login to our box and run:

ps aux | grep zookeeper  

Is process does not exist just check:

cat /var/log/zookeeper/zoo.out  

Step .4. Configure Storm

After reading 17.2. Configure Storm a new file is needed: storm.yaml. Because everything will be on one machine I will set ZOOKEEPER_SERVERS and NIMBUS_HOSTNAME to localhost. Create a storm.yaml again side-by-side to vagrantfile with following content:

storm.zookeeper.servers:  
    - localhost

nimbus.host: localhost

drpc.servers:  
    - "localhost"

storm.local.dir: /tmp/storm/local

logviewer.port: 8081  

and copy it into /etc/storm/conf/storm.yaml. The above file is located in shared /vagrant directory.

Both above modification in vagrantfile:

[...]
  config.vm.provision :shell, :inline  => "yum -y install storm"
  config.vm.provision :shell, :inline  => "
    mkdir -p  $STORM_LOCAL_DIR
    chown -R storm:storm $STORM_LOCAL_DIR
   chmod -R 755 $STORM_LOCAL_DIR

   cp -f /vagrant/storm.yaml /etc/storm/conf/storm.yaml"

Attention: I added mkdir -p $STORM_LOCAL_DIR because this folder wasn't create on destination machine.

The last step is to add proper exports in direstories.sh file:

#storm local dir
export STORM_LOCAL_DIR="/tmp/storm/local";  

To validate install we should login to guest and run:

sudo su - storm  
storm nimbus  

The expected output is something similar to:

/usr/bin/storm: line 2: /etc/default/hadoop: No such file or directory

Running: java -server -Dstorm.options= -Dstorm.home=/usr/lib/storm   [...snip...] backtype.storm.daemon.nimbus  

Don't be afraid about first line. We won't need Hadoop at all, but Hortonworks installation files assume that we install it everywhere.

Step .5. Configure Process Controller

The optional 3 chapter I omit because we don't need it now - we didn't secure zookeeper. What is interesting we didn't do anything about zookeeper - let's back to this later. Just believe me.

In chapter 4. Configure Process Controller there is a mention about tool called supervisord. If we check it using yum search supervisord we won't find it in enabled repositories. To install it we need EPEL repo with

cd /tmp  
wget http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm  
sudo rpm -Uhv epel-release-6-8.noarch.rpm  

and then run: yum install -y supervisor. Because supervisor.conf can contain password it is suggested to mark it with 600

So in vagrant file we have to add above (remember with -y in yum command):

config.vm.provision :shell, :inline  => "  
    cd /tmp  
    wget http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm  
    sudo rpm -Uhv epel-release-6-8.noarch.rpm  
    yum install -y supervisor
     sudo chmod 600 /etc/supervisord.conf"

Let's run the machine (with vagrant provision), login to it and copy default conf file: sudo cp /etc/supervisord.conf /vagrant. In copied file we can append lines from 4. Configure Process Controller.

After this step we need to add copy back this file in provisioning:

config.vm.provision :shell, :inline  => "  
    yes | cp /vagrant/supervisord.conf /etc/supervisord.conf
    /etc/init.d/supervisord restart"

Remember: supervisord.conf, storm.yaml and vagrantfile are integral parts of our recipe

Step.6. Opening ports

The last part will be open ports for storm-ui, which is by default 8080. We should add line in /etc/sysconfig/iptables file and restart iptables. The file will look like (remember about UNIX line endings):

# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8080 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT  

And provisioning is:

  config.vm.provision :shell, :inline  => "
    yes | cp /vagrant/iptables /etc/sysconfig/iptables
    /etc/init.d/iptables restart"

Then run vagrant provision for the last time.

Step.7. Check installation

At least we check if everything is working. Just open your favorite browser on http://192.168.33.10:8080/ and you should see Storm UI web page. If there is a connection error you can:
1. Wait a while - it need some time to start :)
2. Check logs in /var/log/storm and /var/log/zookeeper
3. Recreate your machine with running vagrant destroy and then vagrant up

That all. Next time I will show how to use better provisiong than shell, because as you probably notice shell provisionig works, but is quite ugly and has a lot of harcoded stuff.