No Batteries Required
Working with Hadoop: A Practical Guide – Part 3
Ray Kahn
AUG 13, 2013 14:40

In parts 1 and 2, I briefly explained Hadoop’s ecosystem and my choice of distribution. In this part I will walk you through deploying and configuring Cloudera Manager, as well as creating a cluster of Cloudera client hosts (aka nodes). A word of caution: this is not for the faint of heart, and you will need sudo/admin privileges to properly configure the management node as well as the client hosts.

My Choice of Hardware & OS

First off, I am using physical servers. I could have gone the virtual server route, which I will eventually do and write about in this blog, but for now I have decided to test the waters with a 3-server set-up: 1 management node and 2 client hosts. My client servers are old and not very powerful, since I requisitioned old servers that were just lying about. Each client node has a 2-core Intel P4 3.2 GHz CPU, 1 Gig of RAM and a 75 Gig WDC SCSI disk. I am certain I will need to add RAM to these servers: once Cloudera’s full software suite runs on them, their throughput will drop close to nil. I installed 64-bit Ubuntu 12.04 as the OS, from CD. The Ubuntu installation is the easiest part of this process.

The management node is a much more powerful server: a quad-core Intel i5-2400 3.10 GHz CPU with 4 Gig of RAM and 225 Gig of storage.

Pre-Requisites

Cloudera’s installation requires a few important preliminary steps:

1. You need sudo/su/admin privileges: There isn’t much that can be done without them. You either have sudo access or you don’t, and if you don’t, you will need your systems admin’s assistance during this process. Since I was building the cluster myself, I had full privileges, starting with deploying Ubuntu to my servers. Among other things, sudo is needed to configure your hostname and hosts file.

2. Must have/install SSH on all your servers, the management node as well as the client nodes: This also requires sudo/admin rights. To install SSH, use the following commands:

o   sudo apt-get install openssh-client (installs the client)

o   sudo apt-get install openssh-server (installs the server)

3. Must set up passwordless SSH access to your client nodes: Since I will be running all of my deployments and configurations as a sudoer, the client servers must let that user log in over SSH without a password, and must also let that user run sudo without a password. For the sudo part, add the following line at the bottom of each client’s sudoers file, after the #includedir directive (edit the file with sudo visudo, which checks the syntax before saving):

o   $username ALL=(ALL) NOPASSWD: ALL (Replace $username with the user that will be SSHing to the client server(s) and installing and configuring Cloudera’s software bundle.) Obviously I have just created a massive security hole, which I will remedy later on.

4. Must configure your hostname & hosts file properly:

o   Client Servers: Make sure that the name you have assigned to the host also exists in the hosts file, and provide a Fully Qualified Domain Name (FQDN) for it there as well. As an example, here is the configuration of one of the clients:

§  In /etc/hostname: cloudera-node1

§  In /etc/hosts: I have several declarations (replace the x’s with your server’s IP address):

·         127.0.0.1 localhost

·         127.0.1.1 www.cloudera-node1.com cloudera-node1

·         10.32.xxx.xx www.cloudera-node1.com

o   Management Server: For Cloudera Manager to communicate correctly with the client hosts, it must know their FQDNs.

§  In /etc/hostname: cloudera-manager

§  In /etc/hosts: I have several declarations (replace the x’s with your server’s IP address):

·         127.0.0.1 localhost

·         127.0.1.1 www.cloudera-manager.com cloudera-manager

·         10.32.xxx.xx www.cloudera-node2.com

·         10.32.xxx.xx www.cloudera-node1.com
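
Items 2 to 4 can be scripted from the management node. The following is a minimal bash sketch, not Cloudera’s tooling. It assumes Ubuntu’s OpenSSH tools (ssh-keygen, ssh-copy-id) and sudo’s visudo, and all function names are hypothetical. Installing the sudoers entry as a drop-in file under /etc/sudoers.d is an alternative to editing /etc/sudoers directly, not the method described above.

```shell
#!/usr/bin/env bash
# Pre-flight sketch for prerequisites 2-4 (hypothetical helper names).
# User and host names passed in are placeholders for your own.

# Item 3: the sudoers entry for a given user (same line as above).
sudoers_entry() {
  printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1"
}

# Item 3, run on each client: install the entry as a drop-in under
# /etc/sudoers.d after visudo has checked its syntax.
# (sudo skips drop-in files whose names contain a dot.)
install_sudoers_entry() {
  local user="$1" tmp
  tmp="$(mktemp)"
  sudoers_entry "$user" > "$tmp"
  if sudo visudo -c -f "$tmp"; then
    sudo install -m 0440 "$tmp" "/etc/sudoers.d/$user"
  fi
  rm -f "$tmp"
}

# Item 3, run on the management node: create a key pair if none exists,
# then append the public key to each client's authorized_keys.
push_ssh_key() {
  local user="$1" host; shift
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  fi
  for host in "$@"; do
    ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$user@$host"
  done
}

# Items 2 and 3: BatchMode makes ssh fail instead of prompting for a
# password, and sudo -n fails instead of prompting as well.
check_login() {
  local user="$1" host="$2"
  if ssh -o BatchMode=yes "$user@$host" 'sudo -n true'; then
    echo "$host: passwordless ssh + sudo ok"
  else
    echo "$host: FAILED"; return 1
  fi
}

# Item 4: print the first non-loopback address a hosts file gives a name.
lookup_ip() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v n="$name" '$1 !~ /^#/ && $1 !~ /^127\./ {
      for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit }
    }' "$file"
}

# Item 4: every FQDN passed in must map to a non-loopback address.
check_hosts_file() {
  local file="$1" fqdn ip status=0; shift
  for fqdn in "$@"; do
    ip="$(lookup_ip "$fqdn" "$file")"
    if [ -n "$ip" ]; then
      echo "$fqdn -> $ip"
    else
      echo "MISSING: $fqdn"; status=1
    fi
  done
  return "$status"
}
```

For example, once install_sudoers_entry has been run on each client, push_ssh_key hadoop cloudera-node1 cloudera-node2 followed by check_login hadoop cloudera-node1 should report ok (hadoop stands in for your install user). Running check_hosts_file /etc/hosts www.cloudera-node1.com www.cloudera-node2.com on the management node should print one address per client. On each host, hostname -f should then print the FQDN from its 127.0.1.1 line.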

 

Installing Cloudera’s Hadoop Package

Cloudera’s installation is quite easy. Cloudera Standard can be downloaded from http://www.cloudera.com/content/support/en/downloads.html. After a short registration form, you are prompted to save the cloudera-manager-installer.bin file. You will need to change this file’s permissions to make it executable. The installer will install CDH, Impala 1.1, Cloudera Search, Sentry and Cloudera Manager. Please refer to the documentation pages for more information about these packages.
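
Preparing and launching the installer amounts to two commands. This sketch wraps them in hypothetical helper functions and assumes the file was saved to the current directory under its default name.

```shell
#!/usr/bin/env bash
# Sketch: make the downloaded installer executable, then run it as root.
# Assumes cloudera-manager-installer.bin is in the current directory.

# chmod u+x adds execute permission for the file's owner.
prepare_installer() {
  chmod u+x "$1" && [ -x "$1" ]
}

# The installer runs as root and walks you through the setup interactively.
run_installer() {
  prepare_installer "$1" && sudo "$1"
}

# Example: run_installer ./cloudera-manager-installer.bin
```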

Installation is straightforward if you have done the prerequisites correctly. If not, be prepared for a couple of days of heartache, as I was. But if your hostnames and SSH are configured correctly, everything else is a breeze, as the manager will walk you through the installation. Mind you, this is a long installation process, so make sure you have set aside ample time for it.

What’s Next?

In the coming weeks and months I will start writing my analytics filter and create a web UI as a BI dashboard. In the meantime, let me know if you are having difficulty installing and configuring the Cloudera distribution.

If you or your company are interested in more information on this and other topics, be sure to keep reading my blog.

Also, as I am with the IEEE Computer Society, I should mention that there are technology resources available 24/7, as well as specific training on custom topics. Here is the IEEE CS program link if you are interested: TechLeader Training Partner Program, http://www.computer.org/portal/web/Corporate-Programs.
