
Hadoop Setup

Hadoop installation & configuration on Ubuntu

Prerequisites:
1. Java - OpenJDK 8
2. SSH server - OpenSSH server & client
3. Hadoop distribution - 2.9.1


$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk

$ whereis java    # prints the Java binary paths
$ whereis jvm     # prints the JVM installation directories
Typical installation directory - /usr/lib/jvm/java-8-openjdk-amd64

Adding the Java JDK 8 path to the environment variables
$ sudo gedit /etc/profile (system-wide) or $ gedit ~/.bashrc (current user only)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
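
Reload the shell configuration so the new variables take effect (assuming ~/.bashrc was edited):
$ source ~/.bashrc
$ echo $JAVA_HOME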

Verify the Java installation
$ java -version
$ javac -version



Download Apache Hadoop 2.9.1 from the Apache Hadoop site
and extract the archive to a directory, e.g.
/home/mm/softwares/
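
A minimal sketch of the download-and-extract step, assuming the release is still hosted on the Apache archive (verify the URL before use):
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
$ tar -xzf hadoop-2.9.1.tar.gz -C /home/mm/softwares/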



Adding the Hadoop paths to the environment variables
$ sudo gedit /etc/profile or $ gedit ~/.bashrc

export HADOOP_HOME=/home/mm/softwares/hadoop-2.9.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
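
Reload the configuration and confirm that the hadoop command resolves:
$ source ~/.bashrc
$ hadoop version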



Installing OpenSSH
$ sudo apt install openssh-server openssh-client -y

Create a new user, or skip to key generation to set up SSH for the current user
$ sudo adduser hdp
Switch to the new user
$ su - hdp

Key generation
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa          # passwordless RSA key pair
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize the key for local logins
$ chmod 0600 ~/.ssh/authorized_keys                 # SSH requires restrictive permissions

Test the connection
$ ssh localhost
Accept the host key when prompted; it is added to ~/.ssh/known_hosts



Edit the following configuration files in $HADOOP_HOME/etc/hadoop:

a. core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

b. hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>67108864</value> <!-- 64 MB -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/mm/softwares/hadoop-2.9.1/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/mm/softwares/hadoop-2.9.1/hdfs/datanode</value>
  </property>
</configuration>
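
Optionally pre-create the local directories referenced above so they exist with the right ownership (Hadoop also creates them on format/start):
$ mkdir -p /home/mm/softwares/hadoop-2.9.1/hdfs/namenode /home/mm/softwares/hadoop-2.9.1/hdfs/datanode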

c. mapred-site.xml
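In Hadoop 2.x this file ships only as a template; copy it first if mapred-site.xml does not exist:
$ cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
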
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

d. yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>



Set JAVA_HOME explicitly so the Hadoop daemons can locate Java

File - $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

File - $HADOOP_HOME/sbin/start-dfs.sh
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64



Format the NameNode (required once, before the first start)
$ hdfs namenode -format



Start the HDFS and YARN daemons
$ start-dfs.sh
$ start-yarn.sh
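
To stop the daemons later:
$ stop-yarn.sh
$ stop-dfs.sh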



Verify the running processes
$ jps
The following processes should be listed (PIDs will vary):
2566 SecondaryNameNode
2826 NodeManager
2700 ResourceManager
2190 NameNode
4351 Jps
2335 DataNode
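
As a quick smoke test, create an HDFS home directory and list the filesystem root (the user name hdp matches the account created earlier; adjust if different):
$ hdfs dfs -mkdir -p /user/hdp
$ hdfs dfs -ls /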



Browse the HDFS NameNode web UI at
http://localhost:50070/explorer.html
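The YARN ResourceManager web UI should likewise be reachable at http://localhost:8088 (the default port).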

Compiled on SATURDAY, 03-AUGUST-2024, 01:15:25 PM IST
