Cassandra Installation and Configuration Tutorial

Welcome to the fourth lesson ‘Cassandra Installation and Configuration’ of the Apache Cassandra Certification Course. This lesson will cover the steps to install and configure Cassandra.

Let us begin with the objectives of this lesson.

Objectives

After completing this lesson on Cassandra Installation and Configuration, you will be able to:

  • State the various versions of Cassandra.

  • Explain the steps to install and configure Cassandra on the Ubuntu system.

  • List the steps to install Cassandra on CentOS.

Let us look at the Cassandra versions in the next section.


Cassandra Versions

Cassandra has been released in several versions, listed below. Choose the right version before you install it.

  • Version 1.0, released on October 17, 2011, was the first production version.

  • Version 1.2 was released on January 02, 2013 with virtual nodes added.

  • Version 2.0 was released on September 04, 2013, which added lightweight transactions.

  • Version 2.1, released in September 2014, was the latest version when this lesson was written. It supports version 3 of the Cassandra Query Language (CQL).

Cassandra is an open source Apache project, and commercial support for it is available from DataStax. DataStax provides packaged installations as well as drivers for Cassandra.

Let us discuss the steps to install Cassandra on an Ubuntu system in the next section.

Steps to Install and Configure Cassandra on Ubuntu System

To install and configure Cassandra on the Ubuntu system, perform the following steps:

  1. Select the operating system.

  2. Select the machine.

  3. Prepare for installation.

  4. Set up the repository.

  5. Install Cassandra.

  6. Check the installation.

  7. Configure Cassandra.

  8. Configure the single-node cluster.

  9. Configure the multi-node cluster.

  10. Set up the property file.

  11. Configure the production cluster.

  12. Set up the gossiping property file.

  13. Start the Cassandra services.

  14. Connect to Cassandra.

Each step will be discussed in detail in the following sections.

Step 1 - Operating System Selection

You can choose any of several Linux operating systems for installation. Examples include:

  • Ubuntu 12.04 or later, for example installed on a virtual machine

  • Red Hat Enterprise Linux, referred to as RHEL

  • CentOS, a free version of RHEL, and

  • Debian systems.

You can also install Cassandra on Windows 7 or 8.

Step 2 - Machine Selection

Cassandra needs ample memory and adequate processing power. The recommended machine configurations for a Cassandra cluster are as follows:

For development systems

  • Minimum of 2 GB RAM

  • Two CPUs, and

  • 1 TB hard disk.

For production systems

  • Minimum of 16 GB to a maximum of 96 GB of RAM per machine

  • 8-core CPUs with processors running at 2 GHz or faster, and

  • Four 2 TB hard disks.
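As a rough sanity check, the sketch below compares a machine against the development minimums listed above (2 GB RAM and two CPUs). It assumes a Linux host that provides /proc/meminfo and the nproc command; the function names meets_dev_minimum and check_this_machine are hypothetical helpers, not part of Cassandra.

```shell
# Hypothetical helper: compare memory (in kB) and CPU count against the
# development minimums from this lesson (2 GB RAM, 2 CPUs).
meets_dev_minimum() {
    mem_kb=$1
    cpus=$2
    min_kb=$((2 * 1024 * 1024))   # 2 GB expressed in kB
    if [ "$mem_kb" -ge "$min_kb" ] && [ "$cpus" -ge 2 ]; then
        echo "ok"
        return 0
    fi
    echo "below development minimum"
    return 1
}

# Read the actual values on a Linux host (assumes /proc/meminfo and nproc).
check_this_machine() {
    host_mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    host_cpus=$(nproc)
    meets_dev_minimum "$host_mem_kb" "$host_cpus"
}
```

Run check_this_machine after loading the functions into your shell. MemTotal excludes memory reserved by the kernel, so a machine with exactly 2 GB installed may report slightly less and fail this check; treat the result as a guide rather than a hard rule.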

Step 3 - Preparing for Installation

The prerequisite software for installing Cassandra is:

  • Java JRE 1.7 or higher. OpenJDK works; however, the Oracle JRE is recommended.

  • Python, for some Cassandra tools.

  • Extra Packages for Enterprise Linux or EPEL, for some systems.
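The Java and Python prerequisites can be checked before installing. The sketch below parses the version string that java -version prints (for example "1.7.0_80" or "11.0.2") and reduces it to a major version; java_major_version and check_prerequisites are hypothetical names, and the Python check only confirms that some python or python3 command exists.

```shell
# Hypothetical helper: extract the Java major version from `java -version`
# output, e.g. "1.7.0_80" -> 7, "1.8.0_292" -> 8, "11.0.2" -> 11.
java_major_version() {
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;   # old "1.x" numbering
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;   # newer "x.y.z" numbering
    esac
}

# Check the prerequisites listed above: Java 1.7 or higher, and Python.
check_prerequisites() {
    if ! command -v java >/dev/null 2>&1; then
        echo "Java not found"
        return 1
    fi
    # java -version writes to stderr, so redirect it to stdout.
    major=$(java_major_version "$(java -version 2>&1)")
    if [ -z "$major" ] || [ "$major" -lt 7 ]; then
        echo "Java 1.7 or higher is required (found: ${major:-unknown})"
        return 1
    fi
    if ! command -v python >/dev/null 2>&1 && ! command -v python3 >/dev/null 2>&1; then
        echo "Python not found (used by tools such as cqlsh)"
        return 1
    fi
    echo "prerequisites ok"
}
```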

Step 4 - Setup Repository

DataStax provides packaged installations of Cassandra for many operating systems. To use them, configure the DataStax repository before installing Cassandra. The instructions below are for Ubuntu; the commands for other systems are available on the DataStax site.

There are two steps:

  • First, add the DataStax repository to the sources list with the following command:

echo "deb http://debian.DataStax.com/community stable main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

Note the pipe character or vertical bar in this command.

This command appends the repository line to the file cassandra.sources.list in the /etc/apt/sources.list.d directory. sudo tee -a is used because that file can only be written with root privileges.

  • Second, add the DataStax repository key to the trusted keys with the following command:

curl -L http://debian.DataStax.com/debian/repo_key | sudo apt-key add -

Note the pipe character (vertical bar) in this command. Also note the hyphen at the end, which tells apt-key to read the key from standard input.

This command downloads the repository key from DataStax and adds it to the trusted keys used by APT, the Ubuntu package manager.
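The two commands above can be wrapped in one function. This is a sketch assuming an Ubuntu or Debian host with curl and apt-key available (apt-key is deprecated on recent Ubuntu and Debian releases). The URLs are the ones given above; DATASTAX_LIST, datastax_source_line and setup_datastax_repo are hypothetical names.

```shell
# Target file from the command above.
DATASTAX_LIST=/etc/apt/sources.list.d/cassandra.sources.list

# The repository line that the echo command above appends.
datastax_source_line() {
    echo "deb http://debian.DataStax.com/community stable main"
}

setup_datastax_repo() {
    # Append the line only if it is not already present, so re-running
    # the function does not create duplicate entries.
    if ! grep -qsF "$(datastax_source_line)" "$DATASTAX_LIST"; then
        datastax_source_line | sudo tee -a "$DATASTAX_LIST" >/dev/null
    fi
    # Import the repository key; the trailing "-" makes apt-key read stdin.
    curl -L http://debian.DataStax.com/debian/repo_key | sudo apt-key add -
}
```

The duplicate check is the main difference from running the two commands by hand: appending the same deb line twice makes apt-get update warn about a repeated source.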

Step 5 - Install Cassandra

After the repository is setup, update the packages and install Cassandra.

To update the packages, use the command given:

sudo apt-get update

This command prints a series of messages while it refreshes the package lists.

Once the package lists are updated, install the Cassandra packages with the following command, and enter Y when asked to confirm the installation:

sudo apt-get install dsc21=2.1.1-1 cassandra=2.1.1

Note that the version being installed is 2.1.1, a stable release of the 2.1 series. Once Cassandra is installed on the system, its service starts automatically.
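Step 5 can be scripted in the same way. The sketch below pins the same package versions as the command above and uses the -y flag of apt-get to answer the confirmation prompt automatically. It also shows dpkg-query as one way to read back the installed version. DSC_VERSION, CASSANDRA_VERSION and both function names are hypothetical, and the repository from Step 4 is assumed to be configured already.

```shell
# Package versions from the install command above (hypothetical names).
DSC_VERSION=2.1.1-1
CASSANDRA_VERSION=2.1.1

install_cassandra() {
    # -y answers "yes" to the confirmation prompt automatically.
    sudo apt-get update &&
        sudo apt-get install -y "dsc21=$DSC_VERSION" "cassandra=$CASSANDRA_VERSION"
}

# Print the installed version of the cassandra package, or nothing if the
# package is not installed.
installed_cassandra_version() {
    dpkg-query -W -f='${Version}' cassandra 2>/dev/null
}
```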

Step 6 - Check the Installation

After installing Cassandra, it is important to check the installation. Go to the configuration directory of Cassandra to check the configuration files.

Go to the configuration directory /etc/cassandra using the following command:

cd /etc/cassandra

This command will change the directory.

Note that Linux is case sensitive. Therefore, all the letters in this command must be in the lower case. Note that on some systems, the configuration directory will be /etc/cassandra/conf.

Next, do a directory listing to check the configuration files, using the command given:

ls -l

This command will list the configuration files, such as cassandra-env.sh, cassandra-topology.properties, and cassandra.yaml.
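The same check can be sketched as a small script that handles both possible locations mentioned above. The base directory is a parameter (defaulting to /etc/cassandra) so the helper can also be pointed at a test directory; find_cassandra_conf_dir and check_cassandra_conf are hypothetical names.

```shell
# Hypothetical helper: print the directory that holds cassandra.yaml,
# trying <base>/conf first and then <base> itself.
find_cassandra_conf_dir() {
    base=${1:-/etc/cassandra}
    for dir in "$base/conf" "$base"; do
        if [ -f "$dir/cassandra.yaml" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Confirm that the three configuration files named above are present.
check_cassandra_conf() {
    conf_dir=$(find_cassandra_conf_dir "$1") || { echo "cassandra.yaml not found"; return 1; }
    for f in cassandra.yaml cassandra-env.sh cassandra-topology.properties; do
        [ -f "$conf_dir/$f" ] || { echo "missing $conf_dir/$f"; return 1; }
    done
    echo "ok: $conf_dir"
}
```

On an installed system, running check_cassandra_conf with no argument checks /etc/cassandra and /etc/cassandra/conf.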

Step 7 - Configuring Cassandra

The configuration files that are normally modified after the installation are as follows:

  • The first file is cassandra-env.sh - This is a Linux shell script for setting up environment properties, such as Java heap size and JVM parameters.

  • The next file is cassandra.yaml - This is the main configuration file used to customize the Cassandra cluster. You can set parameters, such as cluster name, number of virtual nodes, data file location, seed providers, listen address, and Remote Procedure Call or RPC ports in this file.

  • The last file is cassandra-topology.properties - This is the cluster topology specification file. It contains the list of nodes and their topologies, such as datacenter and rack configuration.

You can open and review the default configuration in these files with a text editor such as vi, for example: vi /etc/cassandra/cassandra.yaml
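These files are long and consist mostly of comments, so it can help to list only the active settings. The helper below is a hypothetical sketch using grep; it prints the lines of a file that are neither comments nor blank.

```shell
# Hypothetical helper: print only the active (non-comment, non-blank)
# lines of a configuration file.
show_active_settings() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}
```

For example, show_active_settings /etc/cassandra/cassandra.yaml, or the /etc/cassandra/conf path on systems that use it.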

Step 8 - Configuration for a Single-Node Cluster

For a single-node cluster, take the default configuration and modify only the cluster name. All addresses will be localhost, which is the same as 127.0.0.1.

The contents of cassandra.yaml are:

cluster_name: 'Simplilearn Cluster'
num_tokens: 256
data_file_directories:
    - /var/lib/cassandra/data
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
listen_address: localhost
native_transport_port: 9042
endpoint_snitch: SimpleSnitch

Some of the key points are:

  • The cluster name is changed to Simplilearn Cluster.

  • Other parameters, such as num_tokens and data_file_directories, keep their default values and need not be modified.

  • The seed list contains 127.0.0.1 and listen_address is set to localhost; both refer to the local machine.

  • The endpoint snitch is set to SimpleSnitch as the property file is not needed for this cluster.
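Instead of editing the file by hand, the cluster name can be changed with sed. The sketch below assumes a cassandra.yaml with a top-level cluster_name line and a name that contains no /, & or single-quote characters; set_cluster_name is a hypothetical helper. Editing the real file under /etc/cassandra requires root privileges.

```shell
# Hypothetical helper: set cluster_name in a cassandra.yaml file.
# $1 = yaml file, $2 = new cluster name
set_cluster_name() {
    yaml=$1
    name=$2
    tmp=$(mktemp) || return 1
    # Rewrite the cluster_name line into a temporary file, then copy the
    # result back over the original so it keeps its ownership and mode.
    sed "s/^cluster_name:.*/cluster_name: '$name'/" "$yaml" > "$tmp" &&
        cat "$tmp" > "$yaml"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, set_cluster_name /etc/cassandra/cassandra.yaml "Simplilearn Cluster". The cluster name is stored in the node's system data the first time it starts, which is why Step 13 removes that data before the node is restarted with a new name.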

Step 9 - Configuration for Multi-Node and Multi-Datacenter Clusters

For multi-node and multi-datacenter clusters, specify the node addresses in cassandra.yaml and the cluster topology in the cassandra-topology.properties file.

Start from the same cassandra.yaml settings as for a single-node cluster:

cluster_name: 'Simplilearn Cluster'
num_tokens: 256
data_file_directories:
    - /var/lib/cassandra/data
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
listen_address: localhost
native_transport_port: 9042
endpoint_snitch: SimpleSnitch

Then make the following changes to it:

  • Specify multiple seed nodes, as a comma-separated list of addresses.

  • Set listen_address to the node's own IP address instead of localhost, so that other nodes can reach it.

  • Set endpoint_snitch to PropertyFileSnitch, so that Cassandra reads the topology from the property file.
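The three changes above can be sketched as sed edits. The helpers below are hypothetical and assume a cassandra.yaml whose seeds line has the form - seeds: "..." and whose listen_address and endpoint_snitch lines start at the beginning of the line, with values that contain no /, & or quote characters.

```shell
# Apply a sed expression ($1) to a file ($2), then copy the result back so
# the file keeps its ownership and permissions.
rewrite_file() {
    tmp=$(mktemp) || return 1
    sed "$1" "$2" > "$tmp" && cat "$tmp" > "$2"
    status=$?
    rm -f "$tmp"
    return $status
}

# $1 = yaml file, $2 = comma-separated seed addresses
set_seeds() {
    rewrite_file "s/^\([[:space:]]*- seeds:\).*/\1 \"$2\"/" "$1"
}

# $1 = yaml file, $2 = this node's own IP address
set_listen_address() {
    rewrite_file "s/^listen_address:.*/listen_address: $2/" "$1"
}

# $1 = yaml file, $2 = snitch class name, e.g. PropertyFileSnitch
set_endpoint_snitch() {
    rewrite_file "s/^endpoint_snitch:.*/endpoint_snitch: $2/" "$1"
}
```

For example, on the node 192.168.1.100 you might run set_seeds cassandra.yaml "192.168.1.100,192.168.2.200", then set_listen_address cassandra.yaml 192.168.1.100 and set_endpoint_snitch cassandra.yaml PropertyFileSnitch. The seed list is usually the same on every node, while listen_address differs per node.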

Step 10 - Setup Property File

When PropertyFileSnitch is used as the snitch, the cassandra-topology.properties file contains the topology of the entire cluster, and the same file must be present on every node.

The contents of the sample file are:

# Cassandra Node IP=Data Center:Rack
192.168.1.100=DC1:RAC1
192.168.2.200=DC2:RAC2
10.0.0.10=DC1:RAC1
10.0.0.11=DC1:RAC1
10.0.0.12=DC1:RAC2
# default for unknown nodes
default=DC1:r1

Lines starting with a hash (#) are comments and are ignored. Each remaining line gives the datacenter and rack of one node in the cluster, in the following format:

IP address=datacenter name:rack name

For example, if a node with IP address 192.168.1.100 is located in rack RAC1 of the datacenter DC1, then the line for this node will be:

192.168.1.100=DC1:RAC1

You can also specify a default to use for nodes that are not listed in the file. To do so, use the word default in place of the IP address, as in default=DC1:r1.
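The lookup rules just described (an exact IP match first, otherwise the default entry) can be sketched with awk, which is handy for checking a file by hand. lookup_topology is a hypothetical helper, not the code that PropertyFileSnitch uses, and it assumes there are no spaces around the = sign.

```shell
# Hypothetical helper: print the datacenter:rack entry for an IP address
# ($1) from a cassandra-topology.properties file ($2), falling back to the
# "default" entry when the address is not listed.
lookup_topology() {
    awk -F= -v ip="$1" '
        /^[ \t]*#/      { next }                          # skip comment lines
        $1 == ip        { print $2; found = 1; exit }     # exact match wins
        $1 == "default" { dflt = $2 }                     # remember the default
        END             { if (!found && dflt != "") print dflt }
    ' "$2"
}
```

With the sample file above, lookup_topology 10.0.0.12 cassandra-topology.properties prints DC1:RAC2, and an unlisted address prints the default, DC1:r1.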

Step 11 - Configuration for a Production Cluster

For a production cluster, specify the node addresses, describe the topology in the cassandra-rackdc.properties file, and set the snitch to GossipingPropertyFileSnitch.

The gossip protocol is used to propagate the topology information.

The contents of cassandra.yaml are:

cluster_name: 'Simplilearn Production Cluster'
num_tokens: 256
data_file_directories:
    - /var/lib/cassandra/data
seed_provider:
    # List of seed nodes to use for gossip bootstrap
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "node1, node2, node3"
listen_address: node1    # Address of this node
native_transport_port: 9042
endpoint_snitch: GossipingPropertyFileSnitch

This configuration is similar to the multi-datacenter configuration, except that the endpoint snitch is set to GossipingPropertyFileSnitch.

Step 12 - Setup Gossiping Property File

When GossipingPropertyFileSnitch is used as the snitch, the cassandra-rackdc.properties file contains the topology information for the current node only.

The contents of the sample gossiping property file are:

# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
dc=DC1
rack=RAC1

Each node has its own copy of this file, containing the datacenter and rack information for that node only.

In each file:

  • the datacenter for the node is specified using dc= line, and

  • the rack for the node is specified using rack= line.

Each node then announces its own datacenter and rack to the rest of the cluster through the gossip protocol, so no node needs a topology file that lists the whole cluster. This completes the configuration of Cassandra.
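Because every node needs its own small copy of this file, it can be generated per node. The sketch below writes the sample content shown above, with the datacenter and rack passed as arguments. write_rackdc is a hypothetical helper; on a real node the target would be cassandra-rackdc.properties in the configuration directory from Step 6, which requires root privileges.

```shell
# Hypothetical helper: write a cassandra-rackdc.properties file.
# $1 = output file, $2 = datacenter name, $3 = rack name
write_rackdc() {
    printf '%s\n' \
        "# These properties are used with GossipingPropertyFileSnitch and will" \
        "# indicate the rack and dc for this node" \
        "dc=$2" \
        "rack=$3" > "$1"
}
```

For example, write_rackdc cassandra-rackdc.properties DC1 RAC1 reproduces the sample file above.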

Step 13 - Starting Cassandra Services

After the configuration files are set up, start the Cassandra services. Typically, the installer starts the service immediately. In such a case, stop the service, remove the data, and then restart the service.

First, check whether the service is running using the given command:

sudo service cassandra status

This will indicate if the service is running. If the status shows running, stop the service using the given command:

sudo service cassandra stop

When sudo prompts for a password, enter your user's password (on the course lab machine, this is simplilearn).

This command will stop the running Cassandra service.

Next, remove the system data that Cassandra created when the installer started it with the default configuration. This data includes the old cluster name, and if it does not match the cluster_name in cassandra.yaml, Cassandra will not start. Note that this needs to be done only once, before you put any valuable data into Cassandra. Remove the existing system data using the following command:

sudo rm -rf /var/lib/cassandra/data/system/*

Finally, start the Cassandra service using the given command:

sudo service cassandra start

This will start the Cassandra services. For a multi-node setup, this entire process has to be done on each node.
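The stop, clean and start sequence above can be collected into one function. This is a sketch only: it deletes the system data under /var/lib/cassandra/data/system, so run it only on a fresh node that holds no valuable data, and run it on each node of a multi-node cluster. The function name is hypothetical.

```shell
# Hypothetical helper: stop Cassandra if it is running, remove the system
# data created by the first start, and start the service again.
reset_and_start_cassandra() {
    # Stop the service first if the status check reports it as running.
    if sudo service cassandra status >/dev/null 2>&1; then
        sudo service cassandra stop || return 1
    fi
    # Run rm through "sh -c" so the * is expanded with root privileges,
    # even if your own user cannot list the data directory.
    sudo sh -c 'rm -rf /var/lib/cassandra/data/system/*' || return 1
    sudo service cassandra start
}
```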

Step 14 - Connecting to Cassandra

Once the Cassandra service is running, you can connect to Cassandra using cqlsh, the Cassandra command line interface.

First, set up the host to connect to, using the CQLSH_HOST environment variable. You can set this variable using the given command:

export CQLSH_HOST=localhost

Note that the commands are case sensitive.

After this, start the command line interface with the following command:

cqlsh

In the cqlsh prompt window, type help and then press the Enter key. This shows a list of commands provided by Cassandra. Next, you can type exit and press the Enter key to leave the Cassandra command line interface.
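Right after the service starts, Cassandra may need some time before it accepts connections, so cqlsh can fail if it is started too early. The sketch below polls nodetool status, which prints UN (Up/Normal) at the start of the line for each running node, until the node is up. The function names, the 5-second interval and the default of 24 attempts are hypothetical choices.

```shell
# Hypothetical helper: succeed if "nodetool status" output on stdin has a
# line starting with UN (Up/Normal).
node_is_up() {
    grep -q '^UN'
}

# Poll nodetool until the node reports UN, for at most $1 attempts
# (default 24) spaced 5 seconds apart.
wait_for_cassandra() {
    attempts=${1:-24}
    while [ "$attempts" -gt 0 ]; do
        if nodetool status 2>/dev/null | node_is_up; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 5
    done
    echo "Cassandra did not report UN in time" >&2
    return 1
}
```

For example: wait_for_cassandra && cqlsh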

This completes the process of Cassandra installation and configuration.


Installing on CentOS

In addition to Ubuntu, CentOS is another popular Linux distribution. On CentOS, use yum instead of apt-get to install Cassandra.

The instructions to install Cassandra on CentOS are:

First, install the Extra Packages for Enterprise Linux (EPEL) repository, if it is not already installed, using the following command:

sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
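Because rpm -Uvh installs the package rather than only checking for it, one option is to query for the epel-release package first, as in the hypothetical sketch below. The URL is the EPEL 5 (i386) package from the command above; adjust it to match your CentOS release and architecture.

```shell
# Hypothetical helper: install EPEL only if epel-release is not present.
ensure_epel() {
    if rpm -q epel-release >/dev/null 2>&1; then
        echo "EPEL is already installed"
    else
        sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
    fi
}
```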

Next, add the yum repository specification for DataStax to the repositories using the commands as shown:

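sudo tee /etc/yum.repos.d/DataStax.repo > /dev/null <<here
[DataStax]
name=DataStax Repo for Apache Cassandra
baseurl=http://rpm.DataStax.com/community
enabled=1
gpgcheck=0
here

Note that the commands are case sensitive. The file is written with sudo tee rather than sudo cat >, because in sudo cat > file the redirection is performed by your own, unprivileged shell and fails with a permission error.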

Next, install Cassandra. This lesson uses version 2.0, which was the recommended stable version for CentOS when the lesson was written. Use the following command to install it:

sudo yum install dsc20-2.0.11-1 cassandra20-2.0.11-1

Finally, check the configuration directory. The configuration directory on CentOS is /etc/cassandra/conf.

To change the configuration files and start the Cassandra services, follow the same instructions as for Ubuntu.

Summary

Let us summarize the topics covered in this lesson.

  • Cassandra has been released in multiple versions; at the time of writing, the latest were 2.0 and 2.1.

  • It is important to choose the right operating system and machine configurations before installing Cassandra.

  • Cassandra can be installed using the DataStax repository.

  • Cassandra configuration files are stored in /etc/cassandra or /etc/cassandra/conf.

  • cassandra.yaml is the main configuration file for Cassandra.

  • Use the SimpleSnitch, PropertyFileSnitch or GossipingPropertyFileSnitch based on the type of cluster.

  • Cassandra runs as a Linux service, which you can start, stop and check with the service command.

  • cqlsh is the command line interface used to connect to Cassandra.

Conclusion

This concludes the lesson on Installation and Configuration of Cassandra. The next lesson will focus on Cassandra Data Model.
