Cloudera provides a software platform for data analytics, data warehousing, and machine learning. Cloudera started as an open-source Apache Hadoop distribution project, commonly known as Cloudera Distribution for Hadoop or CDH. CDH contains Apache Hadoop and other related projects, and all of its components are 100% open source under the Apache License.
Cloudera provides virtual machine images of complete Apache Hadoop clusters, making it easy to get started with Cloudera CDH.
What Is Cloudera QuickStart VM?
Cloudera QuickStart VM includes everything that you would need for using CDH, Impala, Cloudera Search, and Cloudera Manager. The Cloudera QuickStart VM uses a package-based install that allows you to work with or without the Cloudera Manager. It has a sample of Cloudera’s platform for “Big Data.”
Now that you have a brief understanding of what Cloudera QuickStart VM is, let’s have a look at the prerequisites to install Cloudera QuickStart VM.
Cloudera QuickStart VM Installation - Prerequisites
- Virtualization software such as Oracle VirtualBox or VMware
- 12+ GB of RAM: 4+ GB for the host operating system and 8+ GB for Cloudera
- 80 GB of hard disk space
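To compare a machine against these numbers before downloading anything, a short script can help. The following is a minimal sketch that assumes a Linux host (it reads /proc/meminfo and uses df). The function names are hypothetical, and the thresholds are simply the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch: compare a Linux host against the prerequisites listed above.
# The thresholds come from the article; the function names are hypothetical.

REQUIRED_RAM_GB=12
REQUIRED_DISK_GB=80

# Succeeds when the first integer is at least the second.
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Total physical memory in GB, rounded to the nearest whole number because
# the kernel reports slightly less than the installed amount (Linux only).
host_ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo
}

# Free space in whole GB on the filesystem that holds the given directory.
free_disk_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Print one OK/LOW line for a resource.
report() {
  local label="$1" have="$2" need="$3"
  if meets_minimum "$have" "$need"; then
    echo "OK   $label: ${have} GB (need ${need} GB)"
  else
    echo "LOW  $label: ${have} GB (need ${need} GB)"
  fi
}

if [ -r /proc/meminfo ]; then
  report "RAM" "$(host_ram_gb)" "$REQUIRED_RAM_GB"
fi
# Checks the current directory unless another location is passed as $1.
report "Disk" "$(free_disk_gb "${1:-.}")" "$REQUIRED_DISK_GB"
```

Pass the directory where you plan to store the VM as the first argument so that the disk check looks at the right filesystem.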
Downloading the Cloudera QuickStart VM
- The Cloudera QuickStart VMs are openly available as Zip archives in VirtualBox, VMware, and KVM formats. To download the VM, go to https://www.cloudera.com/downloads.html and select the version of CDH that you require.
Fig: Download Cloudera QuickStart VM
- Click on the ‘GET IT NOW’ button, and it will prompt you to fill in your details.
- Once the file is downloaded, go to the download folder and unzip the archive. The extracted image can then be used to set up a single-node Cloudera cluster.
- Shown below are the two virtual images of Cloudera QuickStart VM.
- Now that the download is complete, let's move forward with this Cloudera QuickStart VM Installation guide and see the actual process.
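The unzip step can also be done from a terminal. The sketch below assumes the unzip tool is installed. The archive name is hypothetical, so substitute the name of the file you actually downloaded.

```shell
#!/usr/bin/env bash
# Sketch: unpack the downloaded QuickStart archive into its own folder.
# ARCHIVE is a hypothetical file name; use the name of the file you downloaded.
ARCHIVE="${1:-$HOME/Downloads/cloudera-quickstart-vm-virtualbox.zip}"

# Derive the extraction folder by dropping the .zip suffix.
extract_dir_for() {
  echo "${1%.zip}"
}

# Create the folder, extract into it, and list what was extracted.
unpack() {
  local archive="$1" target
  target="$(extract_dir_for "$archive")"
  mkdir -p "$target" &&
    unzip -o "$archive" -d "$target" &&
    ls -lh "$target"   # the virtual image files should appear here
}

if [ -f "$ARCHIVE" ]; then
  unpack "$ARCHIVE"
else
  echo "Archive not found: $ARCHIVE"
fi
```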
Cloudera QuickStart VM Installation
- Before setting up the Cloudera Virtual Machine, you need virtualization software such as VMware or Oracle VirtualBox on your system.
- In this case, we are using Oracle VirtualBox to set up the Cloudera QuickStart VM.
- In order to download and install the Oracle VirtualBox on your operating system, click on the following link: Oracle VirtualBox.
- To set up the Cloudera QuickStart VM in your Oracle VirtualBox Manager, click on ‘File’ and then select ‘Import Appliance’.
Fig: Importing the Cloudera QuickStart VM image
- Browse to your downloads folder and choose the QuickStart VM image. Click on ‘Open’ and then ‘Next’. Review the specifications that are shown, then click on ‘Import’. This starts importing the virtual disk image (.vmdk file) into VirtualBox.
- Wait a while for the import to finish. Once it is complete, you can see the Cloudera QuickStart VM in the left-side panel. The next step is to change the specifications of the machine and set up the Cloudera QuickStart VM for practice. Here, we are giving it 2 CPU cores and 5GB RAM.
Fig: Cloudera VM set up successful
- To give the VM more RAM and CPU cores, click on ‘Settings’, followed by ‘System’, and increase the RAM to 5GB. Click on the ‘Processor’ tab and assign 2 CPU cores. Then select ‘Network’. The Adapter 1 setting should be NAT by default. Click on ‘OK’.
- Now start the machine so that it brings up the Cloudera QuickStart VM with 2 CPU cores and 5GB RAM. To do so, click on the ‘Start’ symbol at the top.
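The import, resource, and start-up steps above use the VirtualBox Manager GUI. As an alternative, the VBoxManage command-line tool that ships with VirtualBox can do the same thing. The following is a rough sketch, not the procedure used in this guide. The VM name and appliance path are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: import, size, and start the QuickStart VM from the command line.
# VM_NAME and APPLIANCE are hypothetical; adjust them to your download.
VM_NAME="${VM_NAME:-cloudera-quickstart}"
APPLIANCE="${APPLIANCE:-$HOME/Downloads/cloudera-quickstart-vm-virtualbox/cloudera-quickstart-vm-virtualbox.ovf}"

# By default only print each command; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Equivalent of File > Import Appliance.
import_vm() {
  run VBoxManage import "$1" --vsys 0 --vmname "$2"
}

# Equivalent of Settings > System (memory in MB, CPU count) and
# Settings > Network (Adapter 1 attached to NAT).
configure_vm() {
  run VBoxManage modifyvm "$1" --memory "$2" --cpus "$3" --nic1 nat
}

# Equivalent of the 'Start' button.
start_vm() {
  run VBoxManage startvm "$1" --type gui
}

import_vm "$APPLIANCE" "$VM_NAME"
configure_vm "$VM_NAME" 5120 2   # 5GB RAM and 2 CPU cores, as in this guide
start_vm "$VM_NAME"
```

Review the printed commands first, then rerun with DRY_RUN=0 once the names and paths match your system.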
- Once your machine comes on, it will look like this:
- Next, we have to follow a few steps to gain admin console access. Click on the terminal icon at the top of the desktop screen, and type in the following:
hostname # Shows the hostname, which will be quickstart.cloudera
hdfs dfs -ls / # Checks that you have access and that your cluster is working. It displays what exists at your HDFS root location by default
service cloudera-scm-server status # Reports the status of the Cloudera Manager server
su - # Log in as root. The password for root is cloudera
service cloudera-scm-server status # Checks the status again, this time as root
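The checks above can also be run in one pass, with a one-line verdict for each. This is a minimal sketch meant to be run in the QuickStart terminal. The check helper is hypothetical and not part of CDH. It hides each command's output and reports only whether the command succeeded.

```shell
#!/usr/bin/env bash
# Sketch: run the checks above in one pass and report which ones succeed.
# check() is a hypothetical helper, not part of CDH.

# Run a command quietly and print OK or FAILED next to its label.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK      $label"
  else
    echo "FAILED  $label"
    return 1
  fi
}

failures=0
check "hostname is set" hostname || failures=$((failures + 1))
check "HDFS root is listable" hdfs dfs -ls / || failures=$((failures + 1))
check "Cloudera Manager server status" service cloudera-scm-server status || failures=$((failures + 1))
echo "$failures check(s) failed"
```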
- Once you see that your HDFS access is working fine, you can close the terminal. Then, click on the desktop icon labeled ‘Launch Cloudera Express’.
- Once you click on the Express icon, a screen will appear that displays a command.
- Copy that command and run it in a separate terminal. It shuts down the Cloudera services and then restarts them, after which you can access your admin console.
Fig: Restarting services on Cloudera QuickStart VM
- At this point, the deployment has been configured and the client configurations have been deployed. The command has also restarted the Cloudera Management Service, which gives access to the Cloudera QuickStart admin console with the help of a username and password.
- Open the browser inside the VM and go to port 7180 on the VM's hostname, for example http://quickstart.cloudera:7180.
- You can log in to the Cloudera Manager by providing your username and password.
Fig: Logging in to Cloudera Manager
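The services can take a while to restart, so the login page may not load right away. The sketch below polls port 7180 until something answers. It assumes curl is available inside the VM, and cm_url and wait_for_cm are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: wait until the Cloudera Manager web UI answers on port 7180.
# cm_url and wait_for_cm are hypothetical helper names.

# Build the admin console URL from a hostname and an optional port.
cm_url() {
  echo "http://${1}:${2:-7180}/"
}

# Poll the URL until curl receives any HTTP response or the attempts run out.
wait_for_cm() {
  local url="$1" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --output /dev/null --max-time 5 "$url"; then
      echo "Cloudera Manager is answering at $url"
      return 0
    fi
    sleep 10
    i=$((i + 1))
  done
  echo "No response from $url after $attempts attempts"
  return 1
}

# Example, run inside the VM:
#   wait_for_cm "$(cm_url quickstart.cloudera)"
```

The loop only confirms that the port responds. Logging in still requires the username and password.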
- Since Cloudera is CPU and memory intensive, it could slow down if you haven’t assigned enough RAM to the Cloudera cluster. So, it’s always recommended to stop or delete the services that you don’t need.
- Next, click on the drill-down button associated with each service and select delete to remove it.
Fig: Deleting unnecessary services on Cloudera QuickStart VM
- Before deleting any service, you must remove all the dependencies for that particular service. You can add services back to your cluster whenever you need them, and you can also fix configuration issues from the same place.
- In Cloudera Manager, you can fix the health issues or configuration issues within your cluster.
Fig: Solving Health and Configuration Issues on Cloudera QuickStart VM
- You can go ahead and restart the services now. This ensures that the cluster becomes accessible either through Hue, which is a web interface, or through the Cloudera QuickStart terminal, where you can write your commands.
- You can switch to the hdfs user, which is the HDFS admin user. This user usually does not have a password unless you have set one. Now, you can type any HDFS command in the terminal and see its output.
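As an illustration of that last step, the sketch below runs a few HDFS commands as the hdfs user. It uses sudo -u hdfs rather than switching users interactively, which assumes that your account in the VM is allowed to use sudo. The as_hdfs helper and the /user/cloudera/demo path are hypothetical. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: run HDFS commands as the hdfs admin user from the QuickStart terminal.
# as_hdfs is a hypothetical helper; /user/cloudera/demo is an example path.

# Print the command by default; set DRY_RUN=0 inside the VM to execute it.
as_hdfs() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "sudo -u hdfs $*"
  else
    sudo -u hdfs "$@"
  fi
}

as_hdfs hdfs dfs -ls /                           # list the HDFS root
as_hdfs hdfs dfs -mkdir -p /user/cloudera/demo   # create a working directory
as_hdfs hdfs dfsadmin -report                    # capacity and DataNode status
```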
Cloudera QuickStart VM allows you to implement and administer Hadoop related tools and services effortlessly. In this article, we looked at what Cloudera QuickStart VM is, and what the prerequisites are to install Cloudera QuickStart VM.
We also covered how to download the Cloudera QuickStart VM on Windows. Finally, we demonstrated a step-by-step process to install and configure Cloudera QuickStart VM.
To learn more about Cloudera QuickStart VM, click on the following video link: Cloudera QuickStart VM Installation