Configure Hadoop and start cluster services using Ansible Playbook.

To configure Hadoop with a playbook, we first need to install Ansible on the controller node.

Prasantmahato
Dec 6, 2020

Ansible playbooks are written in YAML.

In the Ansible playbooks, we first have to configure and start the NameNode services.

We need to update the inventory file with the IPs and authentication details of the managed nodes.

INVENTORY FILE
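The inventory screenshot is not reproduced here. As a rough sketch, an inventory for the two nodes used in this article could look like the following, written in Ansible's YAML inventory format (the original may well have used the INI format instead). The root user and the placeholder passwords are assumptions, and password-based login requires sshpass on the controller node.

```yaml
# Hypothetical inventory in YAML format; user and passwords are placeholders.
all:
  hosts:
    192.168.43.233:            # NameNode
      ansible_user: root
      ansible_password: "CHANGE_ME"
    192.168.43.76:             # DataNode
      ansible_user: root
      ansible_password: "CHANGE_ME"
```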

After updating the inventory file, we have to edit the Ansible configuration file.

vim /etc/ansible/ansible.cfg

In the Ansible configuration file, we set the location of the inventory file (the inventory setting under [defaults]) and tell Ansible not to check host keys (host_key_checking = False). Otherwise, the first remote login to each new host stops to ask for host-key confirmation.

List all hosts to confirm that the configuration is correct.

ansible all --list-hosts

CHECKING ALL HOSTS

Check that every host in the inventory responds to Ansible's ping module, so that we can continue with the next step.

ansible all -m ping

CHECKING CONNECTIVITY TO THE TARGET NODE

For this task I use 192.168.43.233 as the NameNode and 192.168.43.76 as the DataNode.

So let's start.

I created a separate variable file to keep the playbook organised, so that anyone can easily make changes in the future.

I created a file named nn.yml, also called the parameter file.

vim /root/ansible_wp/nn.yml

PARAMETER FILE
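A parameter file of this kind is a flat YAML mapping of variable names to values. The sketch below is illustrative only: every variable name and both RPM file names are hypothetical, while /nn1 and port 9001 are the values used in the steps of this article.

```yaml
# nn.yml (sketch): all variable names below are hypothetical
jdk_rpm: jdk.rpm          # placeholder file name of the JDK RPM on the controller
hadoop_rpm: hadoop.rpm    # placeholder file name of the Hadoop RPM on the controller
rpm_dest_dir: /root       # assumed directory on the managed node for the copied RPMs
nn_dir: /nn1              # NameNode folder
nn_port: 9001             # port the DataNode uses to reach the NameNode
```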

I created a file named namenode.yml to configure any server as a NameNode.

vim /root/ansible_wp/namenode.yml

PLAYBOOK TO CONFIGURE HADOOP NAME NODE

In namenode.yml, I set 192.168.43.233 as the host.

HOST

I included the parameter file (nn.yml) in this playbook.

INSERTING ATTRIBUTE FILE
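Put together, the head of such a play might look like the sketch below. It assumes the parameter file is loaded with vars_files and sits next to the playbook; the original screenshots may show a different mechanism.

```yaml
# namenode.yml (sketch): play header only
- hosts: "192.168.43.233"
  vars_files:
    - nn.yml          # parameter file, assumed to be in the same directory
  tasks: []           # the tasks for the individual steps go here
```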

Then I defined the tasks.

STEP 1

Copying the JDK RPM from the controller node to the managed node.

COPYING JDK RPM

STEP 2

Installing the JDK RPM on the managed node.

INSTALLING JDK RPM
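Steps 1 and 2 could be written roughly as the two tasks below, which belong under the tasks key of the play. The variables jdk_rpm and rpm_dest_dir are hypothetical and would have to be defined in the parameter file; the use of the dnf module assumes a dnf-based distribution and is not confirmed by the original.

```yaml
- name: Copy the JDK RPM to the managed node
  ansible.builtin.copy:
    src: "{{ jdk_rpm }}"                       # hypothetical variable
    dest: "{{ rpm_dest_dir }}/{{ jdk_rpm }}"   # hypothetical variable

- name: Install the JDK RPM from the copied file
  ansible.builtin.dnf:
    name: "{{ rpm_dest_dir }}/{{ jdk_rpm }}"
    state: present
    disable_gpg_check: true   # assumed: the RPM's signing key is not imported
```

Because the module checks whether the package is already present, running the playbook a second time does not reinstall it.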

STEP 3

Copying the Hadoop RPM to the managed node.

COPYING HADOOP RPM

STEP 4

Installing the Hadoop RPM on the managed node.

INSTALLING HADOOP RPM
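For the Hadoop RPM, a sketch that calls rpm directly is shown below. Older Hadoop RPMs can conflict with files owned by other packages, which is why --force is sometimes used; whether the original playbook did this is not shown, so treat the flag as an assumption. The variables hadoop_rpm and rpm_dest_dir are hypothetical.

```yaml
- name: Copy the Hadoop RPM to the managed node
  ansible.builtin.copy:
    src: "{{ hadoop_rpm }}"                       # hypothetical variable
    dest: "{{ rpm_dest_dir }}/{{ hadoop_rpm }}"   # hypothetical variable

- name: Install the Hadoop RPM with rpm (assumed: --force is needed)
  ansible.builtin.command:
    cmd: "rpm -i --force {{ rpm_dest_dir }}/{{ hadoop_rpm }}"
```

Unlike a package module, this command runs again on every play, so it always reports a change.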

STEP 5

Copying the configured core-site.xml and hdfs-site.xml files to the managed node, overwriting any existing copies.

COPYING CONFIGURED CORE-SITE.XML AND HDFS-SITE.XML
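One way to copy both files in a single task is a loop, as in this sketch. The destination directory /etc/hadoop is an assumption about where the installed Hadoop package reads its configuration; the source files are assumed to sit next to the playbook.

```yaml
- name: Copy the configured Hadoop files, replacing any existing ones
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "/etc/hadoop/{{ item }}"   # assumed configuration directory
  loop:
    - core-site.xml
    - hdfs-site.xml
```

The copy module overwrites a differing destination file by default, which matches the behaviour described in this step.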

STEP 6

Deleting the NameNode folder /nn1 if it exists, then creating a new one with the same name.

DELETING NAME NODE FOLDER
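Deleting and recreating the folder maps naturally onto two file tasks. This is a minimal sketch; ownership and permissions are left at their defaults, which is an assumption.

```yaml
- name: Remove the old NameNode folder if it exists
  ansible.builtin.file:
    path: /nn1
    state: absent

- name: Create a fresh, empty NameNode folder
  ansible.builtin.file:
    path: /nn1
    state: directory
```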

STEP 7

Formatting the NameNode folder, which is required before the Hadoop cluster can be started.

FORMATTING THE NAME NODE FOLDER
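The format command can ask for confirmation when the folder already exists, so a playbook has to answer it non-interactively. The sketch below pipes Y into the Hadoop 1.x style command; the exact command in the original screenshot is not known, and newer Hadoop releases use hdfs namenode -format instead.

```yaml
- name: Format the NameNode folder, answering the confirmation prompt
  ansible.builtin.shell: echo Y | hadoop namenode -format
```

The shell module is used here rather than command because the task needs a pipe.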

STEP 8

Stopping the NameNode service, in case one is already running.

STOPPING THE NAME NODE SERVICE

STEP 9

Starting the NameNode service.

STARTING THE NAME NODE SERVICE
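Steps 8 and 9 amount to a stop followed by a start. A sketch using the hadoop-daemon.sh script is shown below; it assumes the script is on the PATH of the remote user.

```yaml
- name: Stop any NameNode that is already running
  ansible.builtin.command: hadoop-daemon.sh stop namenode
  ignore_errors: true   # do not abort the play if the stop command fails

- name: Start the NameNode
  ansible.builtin.command: hadoop-daemon.sh start namenode
```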

STEP 10

Checking whether the NameNode is ready or not.

RUNNING JPS COMMAND
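Since Ansible does not print command output by default, the jps result has to be registered and shown explicitly. A minimal sketch, with jps_out as a hypothetical variable name:

```yaml
- name: List the running Java processes
  ansible.builtin.command: jps
  register: jps_out       # hypothetical variable name
  changed_when: false     # read-only check, never reports a change

- name: Show the jps output
  ansible.builtin.debug:
    var: jps_out.stdout_lines
```

If the NameNode started correctly, a NameNode entry appears in the listed processes.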

STEP 11

Creating a firewall rule for port 9001, so that the DataNode can connect to the NameNode on that port.

CREATING A FIREWALL RULE
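On a system that runs firewalld, the rule could be expressed as in the sketch below. It assumes the ansible.posix collection is installed on the controller node and that the port is TCP; the original may have used a plain firewall-cmd command instead.

```yaml
- name: Open port 9001 for DataNode connections
  ansible.posix.firewalld:
    port: 9001/tcp
    permanent: true     # keep the rule across reboots
    immediate: true     # also apply it to the running firewall
    state: enabled
```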

Now run the playbook:

ansible-playbook namenode.yml

PLAYBOOK RUNNING

I created a file named dn.yml, the parameter file for the DataNode.

ATTRIBUTE FILE FOR DATA NODE

I created a file named datanode.yml to configure any server as Data Node.

In datanode.yml, I set 192.168.43.76 as the host.

HOST

I included the parameter file (dn.yml) in this playbook.

INSERTING ATTRIBUTE FILE

Then I used vars_prompt, because I wanted anyone who runs this playbook to first check that the NameNode IP in the parameter file is correct.

USING vars_prompt.
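A play header with such a prompt might look like the following sketch. The prompt text, the variable name confirm_ip and the guard task are all hypothetical; the original screenshot may only show the prompt itself.

```yaml
# datanode.yml (sketch): play header with a confirmation prompt
- hosts: "192.168.43.76"
  vars_files:
    - dn.yml
  vars_prompt:
    - name: confirm_ip      # hypothetical variable name
      prompt: "Is the NameNode IP in dn.yml correct? (yes/no)"
      private: false        # echo the answer while typing
  tasks:
    - name: Stop if the operator did not confirm the NameNode IP
      ansible.builtin.fail:
        msg: "Update the ip variable in dn.yml and run the playbook again."
      when: confirm_ip | lower != "yes"
```

Ansible asks the question once, before any task runs, so a wrong address is caught before anything changes on the managed node.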

Tasks

STEP 1

Stopping the DataNode service, in case one is already running.

STOPPING DATA NODE
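Because this is the first task, Hadoop may not be installed yet on a fresh node, so the stop command itself can fail. The sketch below tolerates that; it assumes hadoop-daemon.sh is on the PATH once Hadoop is installed.

```yaml
- name: Stop any DataNode that is already running
  ansible.builtin.command: hadoop-daemon.sh stop datanode
  ignore_errors: true   # the script may not exist yet on a fresh node
```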

STEP 2

Copying the JDK RPM from the controller node to the managed node.

COPYING JDK RPM

STEP 3

Installing the JDK RPM on the managed node.

INSTALLING JAVA RPM

STEP 4

Copying the Hadoop RPM to the managed node.

COPYING HADOOP RPM

STEP 5

Installing the Hadoop RPM on the managed node.

INSTALLING HADOOP RPM

STEP 6

Copying the configured core-site.xml and hdfs-site.xml files to the managed node, overwriting any existing copies.

COPYING THE CONFIGURED CORE-SITE.XML AND HDFS-SITE.XML

In the core-site.xml file, I used a variable named ip, so that if the NameNode IP changes in the future we only need to update it in the parameter file, i.e. dn.yml.

CONFIGURED CORE-SITE.XML FILE
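To illustrate how the ip variable ends up in the file, the sketch below writes core-site.xml inline from a task instead of from a separate template file, which is probably what the original did. The property name fs.default.name is the Hadoop 1.x setting and /etc/hadoop is an assumed configuration directory; ip comes from dn.yml and 9001 is the port used in this article.

```yaml
- name: Write core-site.xml pointing at the NameNode
  ansible.builtin.copy:
    dest: /etc/hadoop/core-site.xml    # assumed configuration directory
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://{{ ip }}:9001</value>
        </property>
      </configuration>
```

Ansible substitutes the value of ip before the file is written, so changing the NameNode address only requires editing dn.yml.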

STEP 7

Deleting the DataNode folder /dn1 if it exists, then creating a new one with the same name.

DELETING DATA NODE FOLDER

STEP 8

Starting the DataNode service.

STARTING THE DATA NODE SERVICE

STEP 9

Checking whether the DataNode is ready or not.

USING JPS COMMAND

STEP 10

Generating the full report of the Hadoop cluster.

CHECKING HADOOP CLUSTER REPORT
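Starting the DataNode and then asking for the cluster report could be sketched as below. The commands are the Hadoop 1.x style ones and are assumed to be on the PATH; report_out is a hypothetical variable name.

```yaml
- name: Start the DataNode
  ansible.builtin.command: hadoop-daemon.sh start datanode

- name: Collect the cluster report
  ansible.builtin.command: hadoop dfsadmin -report
  register: report_out    # hypothetical variable name
  changed_when: false     # read-only query

- name: Show the cluster report
  ansible.builtin.debug:
    var: report_out.stdout_lines
```

The DataNode can take a few seconds to register with the NameNode, so a report taken immediately after the start may not list it yet.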

Now run the playbook:

ansible-playbook datanode.yml

DATANODE PLAYBOOK RUNNING

The Hadoop report command confirms that the DataNode has successfully connected to the NameNode.

Thank you.

For any suggestions or queries, contact me on LinkedIn.
