Setting up Hadoop multi-node cluster on Amazon EC2 Part 2

Let's Do Big Data...

In Part-1 we have successfully created, launched and connected to Amazon Ubuntu Instances. In Part-2 I will show how to install and setup Hadoop cluster. If you are seeing this page first time, I would strongly advise you to go over Part-1.

In this article

  • HadoopNameNode will be referred as master,
  • HadoopSecondaryNameNode will be referred as SecondaryNameNode or SNN
  • HadoopSlave1 and HadoopSlave2 will be referred as slaves (where data nodes will reside)

So, let’s begin.

1. Apache Hadoop Installation and Cluster Setup

1.1 Update the packages and dependencies.

Let’s update the packages , I will start with master , repeat this for SNN and 2 slaves.

$ sudo apt-get update

Once its complete, let’s install java

1.2 Install Java

Add following PPA and install the latest Oracle Java (JDK) 7 in Ubuntu

$ sudo add-apt-repository ppa:webupd8team/java

$ sudo apt-get update && sudo apt-get install oracle-jdk7-installer

Check if Ubuntu uses JDK 7


View original post 2,213 more words