Apache Storm Installation and Configuration Tutorial
3.1 Installation and Configuration
Hello and welcome to the third lesson of the Apache Storm course offered by SimpliLearn. This lesson will help you get familiarized with the steps for installing and configuring Storm. Now, let us start with exploring the objectives of this lesson.
By the end of this lesson, you will be able to choose proper hardware for Storm installation, install Storm on an Ubuntu system, configure Storm, run Storm on an Ubuntu system, and describe the steps to set up a multi-node Storm cluster. Moving on, let us discuss the Storm versions.
3.3 Storm Versions
Storm has multiple versions. You need to first choose the latest stable version for installation. Zookeeper is a prerequisite to run Storm on the system, so you start by installing Zookeeper. Since Zookeeper comes as a part of Kafka, and you will learn about the Kafka interface to Storm later in the lesson, let us start with installing Kafka before installing Storm. Version 0.8.2.1 is the current stable version of Kafka, and the current stable version of Storm is version 0.9.5. The stable version of Kafka can be downloaded from: https://www.apache.org/dyn/closer.cgi?path=/Kafka/0.8.2.1/Kafka_2.9.1-0.8.2.1.tgz The stable version of Storm can be downloaded from: http://apache.mirrors.pair.com/storm/apache-storm-0.9.5/apache-storm-0.9.5.tar.gz Next, let us look at selecting the right operating system.
3.4 OS selection
You can choose any of the Linux operating systems for installation. Ubuntu 12.04 or later, which is installed on our virtual machine, is a good choice. You can also choose other Linux systems such as Red Hat Enterprise Linux (referred to as RHEL), CentOS (the free version of RHEL), or Debian. Let's move on to understand how to select an appropriate machine for Storm installation.
3.5 Machine Selection
Storm needs good memory and adequate processing power. The recommended machine configurations are as follows: For development systems, a minimum of 2 GB RAM, 1 CPU for Storm, and a 1 TB hard disk is recommended. For production systems, a minimum of 16 GB RAM is required, but 32 GB of RAM per machine is recommended. At least 6-core CPUs with processors running at 2 GHz or more are recommended, along with four 2 TB hard disks. For the network, 1 Gb (Gigabit) Ethernet is sufficient. Next, let us look at how to prepare the machine for installation.
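As a quick sanity check, the commands below report the CPU, memory and disk available on a Linux machine so you can compare them against the recommendations above. This is a generic sketch, not part of the Storm distribution.

```shell
# Report CPU count, total memory and root-filesystem disk space,
# for comparison against the recommended machine configurations.
cpus=$(nproc)
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
echo "CPUs: $cpus"
echo "Memory (GB): $((mem_kb / 1024 / 1024))"
df -h /
```

On a development system you would expect at least 1 CPU and about 2 GB of memory from this report.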
3.6 Preparing for Installation
Following are the prerequisite software for installing Storm:
• Java JRE 1.7 or higher (Oracle JRE is recommended, but Storm works with the open JRE from OpenJDK as well)
• Zookeeper, which can be installed from the Kafka distribution
Now, let us look at the steps to download the software.
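You can verify the Java prerequisite with a small guarded check; this is a generic sketch that only reports what is (or is not) installed.

```shell
# Check whether a Java runtime is on the PATH and print its version.
# Note that `java -version` writes to stderr, hence the 2>&1 redirection.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
else
    echo "Java not found: install JRE 1.7 or higher before proceeding"
fi
```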
3.7 Download Kafka
Kafka can be downloaded directly from an Apache Kafka mirror: wget http://mirrors.advancedhosters.com/apache/Kafka/0.8.2.1/Kafka_2.9.1-0.8.2.1.tgz You may choose a different mirror based on your location by checking: https://www.apache.org/dyn/closer.cgi?path=/Kafka/0.8.2.1/Kafka_2.9.1-0.8.2.1.tgz Do this on each machine where Zookeeper is to be installed. A file with a .tgz or .tar.gz extension is called a tarball, which is a compressed tar archive on Linux. Next, let us learn how to download Storm.
3.8 Download Storm
Storm can be downloaded directly from the Apache Storm website: wget http://apache.mirrors.pair.com/storm/apache-storm-0.9.5/apache-storm-0.9.5.tar.gz Do this on each machine where Storm has to be installed. Now, we will learn how to install Kafka.
3.9 Install Kafka Demo 01
After the download, the archive has to be unzipped and moved to the proper location.

Unzip the package using the tar utility:
tar -xzf Kafka_2.9.1-0.8.2.1.tgz
Move it to the proper directory:
sudo mv Kafka_2.9.1-0.8.2.1 /usr/local/Kafka
Note that sudo may ask for your password.

Now that Kafka is installed, let us set up the path for Kafka and Storm. Edit the .bashrc file in the home directory and add the Kafka and Storm directories to the path.

Access the home directory using the cd command:
cd
Edit the .bashrc file using the vi command:
vi .bashrc
In vi, press i to enter insert mode, then add the lines below at the end of the file:
export KAFKA_PREFIX=/usr/local/Kafka
export PATH=$PATH:$KAFKA_PREFIX/bin
export STORM_PREFIX=/usr/local/storm
export PATH=$PATH:$STORM_PREFIX/bin
Press the Escape key to leave insert mode and save the file with :wq. Note that all the above commands are case sensitive, so you need to type them exactly as shown.

Restart bash for the changes to take effect:
exec bash
This sets up the path to include the Kafka and Storm directories.
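If you prefer to avoid the interactive vi session, the same .bashrc additions can be appended non-interactively with a heredoc; this is an equivalent sketch of the edit described above (run exec bash afterwards, as before).

```shell
# Append the Kafka and Storm path exports to .bashrc without opening vi.
# The quoted 'EOF' prevents the shell from expanding $PATH at append time.
cat >> "$HOME/.bashrc" <<'EOF'
export KAFKA_PREFIX=/usr/local/Kafka
export PATH=$PATH:$KAFKA_PREFIX/bin
export STORM_PREFIX=/usr/local/storm
export PATH=$PATH:$STORM_PREFIX/bin
EOF
```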
Moving on, let us see how to configure memory settings. Some development systems have low memory, so the default heap memory settings will not work on them. The following changes are required for a development cluster with low memory.

Change to the bin directory of the Kafka installation:
cd /usr/local/Kafka/bin
Next, edit the zookeeper-server-start.sh file using the vi editor. (You can press i to enter insert mode in vi and the Escape key, normally located at the top left corner of the keyboard, to leave it.)
vi zookeeper-server-start.sh
In this file, replace the line
export KAFKA_HEAP_OPTS="-Xmx512M -Xms512M"
with
export KAFKA_HEAP_OPTS="-Xmx64M -Xms64M"
Press Escape and save the file with :wq.

Next, edit the Kafka-server-start.sh file using vi:
vi Kafka-server-start.sh
In this file, replace the line
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
with
export KAFKA_HEAP_OPTS="-Xmx128M -Xms128M"
Press Escape and save the file with :wq.

Now, let us look at how to configure zookeeper for Storm. Since Storm uses zookeeper for distributed coordination, you need to configure the zookeeper that comes with Kafka. Modify the zookeeper.properties file in the Kafka configuration directory as shown below:
cd /usr/local/Kafka/config
vi zookeeper.properties
Check the file and add the following lines if they are not already present:
initLimit=5
syncLimit=2
maxClientCnxns=0
server.1=localhost:2888:3888
Press Escape and save the file with :wq. Then, exit the editor.

Use the commands below to create a myid file for zookeeper:
echo 1 > /tmp/myid
sudo cp /tmp/myid /tmp/zookeeper/myid

Next, let us learn how to configure Kafka. The following changes are required for the Kafka configuration:
cd /usr/local/Kafka/config
vi server.properties
Replace the line
broker.id=0
with
broker.id=1
Check that the default port is set to 9092:
port=9092
Check that zookeeper is set to connect at port 2181:
zookeeper.connect=localhost:2181
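The two heap-size edits above can also be scripted with sed instead of vi. The sketch below demonstrates the substitution on a scratch copy of the start script so it can be run safely; on a real installation you would point it at the files under /usr/local/Kafka/bin.

```shell
# Demonstrate the heap-size substitution on a scratch copy of
# zookeeper-server-start.sh; KAFKA_BIN is a temporary stand-in directory.
KAFKA_BIN=$(mktemp -d)
echo 'export KAFKA_HEAP_OPTS="-Xmx512M -Xms512M"' > "$KAFKA_BIN/zookeeper-server-start.sh"
# Replace the 512M heap settings with 64M, in place.
sed -i 's/-Xmx512M -Xms512M/-Xmx64M -Xms64M/' "$KAFKA_BIN/zookeeper-server-start.sh"
cat "$KAFKA_BIN/zookeeper-server-start.sh"
# prints: export KAFKA_HEAP_OPTS="-Xmx64M -Xms64M"
```

The same one-line sed pattern applies to the 1G-to-128M edit in Kafka-server-start.sh.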
If you have multiple zookeeper instances, you can specify them in the same way, separated by commas. Now, you will learn how to modify some more Kafka properties. A few more changes are to be done in the server.properties file. Add the following two lines at the end of the file:
queued.max.requests=1000
auto.create.topics.enable=false
The last line ensures that topics are explicitly created before a message is produced for them. Press Escape to exit insert mode and save the file with :wq.

Moving on, you will learn how to start the Kafka server. First, you need to start the zookeeper server with the command below:
sudo nohup /usr/local/Kafka/bin/zookeeper-server-start.sh /usr/local/Kafka/config/zookeeper.properties > /tmp/zk.out 2>/tmp/zk.err &
Enter the SimpliLearn password if prompted. Note that sudo is used so that you have proper permissions. The & (ampersand) is added at the end so that the process runs in the background. For background processes, nohup is added at the beginning so that the process does not end even if your session is terminated. The standard output from the server is sent to the /tmp/zk.out file, and the standard error is sent to the /tmp/zk.err file with the 2> option.
Next, start the Kafka server with the command below:
sudo nohup /usr/local/Kafka/bin/Kafka-server-start.sh /usr/local/Kafka/config/server.properties > /tmp/Kafka.out 2>/tmp/Kafka.err &
sudo and nohup are used here in the same way as explained for the previous command.

Next, let us look at creating directories for Storm. Create the lib directory for Storm using the commands below:
sudo mkdir -p /var/lib/storm
sudo chmod 777 /var/lib/storm

You have now learnt how to install Storm and create the directories for Storm. Next, you will learn how to configure Storm. Given here are the changes required for the Storm configuration. Change to the Storm configuration directory using the command below:
cd /usr/local/storm/conf
Edit the storm.yaml file using the command:
vi storm.yaml
Replace these lines:
#storm.zookeeper.servers:
#  - "server1"
with:
storm.zookeeper.servers:
  - "localhost"
Specify the address of the nimbus host:
nimbus.host: "nimbus1"
Change nimbus1 to the IP address of the machine. Note that YAML requires straight double quotes, not curly ones.

Now, you will configure the Storm memory parameters. Continue modifying the same file to specify the memory for the Java processes; you can start with 128 MB for all the processes. Add the lines below. If the lines already exist, modify them to change the numbers:
nimbus.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
Specify the data directory for Storm:
storm.local.dir: "/var/lib/storm"
Press Escape and save the storm.yaml file with :wq.

Next, let us look at starting the Storm servers. Let us start the Storm nimbus and supervisor servers. You will also need to start the Storm UI to monitor the cluster through the web interface.
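Putting the storm.yaml edits together, a minimal single-node configuration would look roughly like this; the zookeeper server entry and the nimbus address are placeholders for your own machine.

```yaml
# Minimal single-node storm.yaml sketch (values are placeholders).
storm.zookeeper.servers:
  - "localhost"
nimbus.host: "127.0.0.1"   # replace with the machine's IP address
storm.local.dir: "/var/lib/storm"
nimbus.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
```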
Start the Storm nimbus server on the master node:
nohup bin/storm nimbus >/tmp/nimbus.out 2>/tmp/nimbus.err &
Start the Storm supervisor on each worker node:
nohup bin/storm supervisor > /tmp/supervisor.out 2>/tmp/supervisor.err &
Start the Storm UI for monitoring; you can check this at port 8080 using your favourite browser:
nohup storm ui >/tmp/ui.out 2>/tmp/ui.err &

Next, let us run a sample Storm program. Here, you will run a sample program created by SimpliLearn that processes a logfile:
cd /tmp
wget simplilearncdn/logfile
wget simplilearncdn/LogProcessTopology.jar
storm jar LogProcessTopology.jar storm.starter.LogProcessTopology test1
storm list
The storm list command will give the following output:
Topology_name Status Num_tasks Num_workers Uptime_secs
-------------------------------------------------------------------
test1 ACTIVE 7 1 23

Next, let us check the output of the sample Storm program. The output of the sample program is in the /tmp/stormoutput.txt file. You can check the content of this file with the command:
cat /tmp/stormoutput.txt
The output will be displayed as shown below:
INFO:1 ERROR:1 WARNING:1 ERROR:2 WARNING:2 INFO:2 ERROR:3 WARNING:3 ERROR:4 WARNING:4
Note that the actual output might be different in your case.

Next, let us check the Storm UI. You can check the Storm processes using the Storm UI at port 8080. Use your browser (Firefox or Chrome) and go to IP_address:8080, where IP_address is the IP address of your virtual machine. The diagram shows the Storm UI from the browser at port 8080. It shows the cluster summary, topology summary, and supervisor summary, as well as the Nimbus server configuration parameters. You can see that the topology test1 is currently running.

Now, you will learn how to stop the Storm topology. You can stop the running Storm topology with the following command:
storm kill test1
Verify that the topology is not running with storm list.
This will produce the following output:
Topology_name Status Num_tasks Num_workers Uptime_secs
-------------------------------------------------------------------
test1 KILLED 7 1 315
Next, let us see how to set up a multi-node Storm cluster.
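Before moving to the multi-node setup, it may help to see what the sample LogProcessTopology was computing. The running count per log level can be sketched locally with awk; the logfile contents here are made up for illustration.

```shell
# A made-up sample of log lines; the first field is the log level.
sample=$(mktemp)
printf 'INFO start\nERROR disk full\nWARNING high temp\nERROR net down\n' > "$sample"
# Print a running count per level, one line per input record,
# mirroring the INFO:n / ERROR:n / WARNING:n output shown above.
awk '{count[$1]++; print $1 ":" count[$1]}' "$sample"
# prints: INFO:1, ERROR:1, WARNING:1, ERROR:2 (one per line)
```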
To set up a multi-node cluster, let us take the example of a three-node cluster whose nodes are referred to by their IP addresses as node1, node2 and node3.

First, install Kafka and Storm on each machine as discussed earlier. That is, download the Kafka and Storm tarballs, unzip the compressed archives, and move the expanded directories to /usr/local/Kafka and /usr/local/storm respectively. This has to be done on each of the three nodes.

Moving on to the second step of setting up the multi-node Storm cluster: set up zookeeper on each node:
cd /usr/local/Kafka/config
vi zookeeper.properties
Add the following lines if they are not present:
initLimit=5
syncLimit=2
maxClientCnxns=0
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
Press Escape and save the file with :wq. Then, exit the editor. Note that node1, node2 and node3 are the IP addresses of the three servers, and that the server.N keys must be in lowercase.

The third step is to set up the myid file for zookeeper. The commands below are used to create the myid file:
• On node1:
echo 1 > /tmp/myid
sudo cp /tmp/myid /tmp/zookeeper/myid
• On node2:
echo 2 > /tmp/myid
sudo cp /tmp/myid /tmp/zookeeper/myid
• On node3:
echo 3 > /tmp/myid
sudo cp /tmp/myid /tmp/zookeeper/myid
Note that the content of the myid file is different on each server.

Moving on to the fourth step: the Storm broker properties need to be set up, for which the changes below are required in the Storm configuration on each machine:
cd /usr/local/storm/conf
vi storm.yaml
Replace the following lines:
#storm.zookeeper.servers:
#  - "server1"
with:
storm.zookeeper.servers:
  - "node1"
  - "node2"
  - "node3"
Specify the address of the nimbus host. Please note that you will have Nimbus running only on the master node, node1 in our cluster:
nimbus.host: "node1"

Moving on to the fifth step of the setup: some more changes are made to the storm.yaml file.
Specify the memory for the child JVM processes (childopts); you can start with 128 MB:
nimbus.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
Specify the data directory for Storm:
storm.local.dir: "/var/lib/storm"
Press Escape and save the storm.yaml file with :wq.
Finally, the last step of the setup is to start the servers. Start the zookeeper server on each node:
sudo nohup /usr/local/Kafka/bin/zookeeper-server-start.sh /usr/local/Kafka/config/zookeeper.properties > /tmp/zk.out 2>/tmp/zk.err &
Start the Storm Nimbus server on node1:
nohup storm nimbus > /tmp/nimbus.out 2>/tmp/nimbus.err &
Start the Storm supervisor process on each node:
nohup storm supervisor > /tmp/supervisor.out 2>/tmp/supervisor.err &
This completes the setup of the multi-node Storm cluster. We have come to the end of this lesson. Now, let's do a small quiz to test your knowledge.
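As an aside, the per-node start sequence above can be wrapped in a small helper script. This is a generic sketch with a guard so that it degrades gracefully on a node where a component is not installed; on worker nodes you would omit the nimbus line.

```shell
# Launch a command under nohup in the background, redirecting output to
# /tmp/<name>.out and /tmp/<name>.err as in the commands above.
# Skip the launch if the executable is not present on this node.
start_bg() {
    name=$1; shift
    if command -v "$1" >/dev/null 2>&1; then
        nohup "$@" > "/tmp/$name.out" 2> "/tmp/$name.err" &
        echo "started $name (pid $!)"
    else
        echo "skipping $name: $1 not found"
    fi
}

start_bg zookeeper /usr/local/Kafka/bin/zookeeper-server-start.sh /usr/local/Kafka/config/zookeeper.properties
start_bg nimbus storm nimbus        # node1 (master) only
start_bg supervisor storm supervisor
```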
3.10 Install Storm Demo 02
You need to follow the same steps for the Storm download. Unzip the package using the tar utility:
tar -xzf apache-storm-0.9.5.tar.gz
Move the extracted directory (not the tarball itself) to the proper location:
sudo mv apache-storm-0.9.5 /usr/local/storm
Next, let us set up the path for Kafka and Storm.
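The unzip-and-move step used for both the Kafka and Storm tarballs can be captured in one helper function. The sketch below demonstrates it on a throwaway archive so it can be dry-run safely; on a real system you would pass the downloaded tarball and a destination such as /usr/local/storm (with sudo).

```shell
# Unpack a tarball into a temporary directory and move its single
# top-level directory to the destination. Note that what gets moved is
# the extracted directory (apache-storm-0.9.5), not the .tar.gz file.
install_tarball() {
    tarball=$1 dest=$2
    tmp=$(mktemp -d)
    tar -xzf "$tarball" -C "$tmp"
    extracted=$(ls "$tmp")          # the single top-level directory
    mv "$tmp/$extracted" "$dest"
}

# Dry-run on a throwaway archive that mimics the Storm tarball layout:
work=$(mktemp -d)
mkdir -p "$work/apache-storm-0.9.5/bin"
tar -czf "$work/storm.tgz" -C "$work" apache-storm-0.9.5
install_tarball "$work/storm.tgz" "$work/storm-install"
ls "$work/storm-install"
# prints: bin
```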
A few questions will be presented on the following screens. Select the correct option and click submit to see the feedback.
Here are the key takeaways. • Storm has multiple versions, and the latest stable version of Storm is 0.9.5. • Choose a proper OS and machine configuration before starting the installation. • The Kafka installation is used to install zookeeper. • Storm can be installed by downloading the latest tarball. • After the installation of zookeeper and Storm, both of them need to be configured. • After the configuration changes for zookeeper and Storm, the zookeeper server has to be started before starting Storm. • The storm command can be used to submit a topology to Storm. • To set up a multi-node Storm cluster, a six-step process needs to be followed.
This concludes the lesson: Introduction to the Installation and Configuration of Storm. In the next lesson, you will learn about the advanced Storm concepts.