Configure Hadoop and start cluster services using Ansible Playbook.
To configure Hadoop using a playbook, we first need to install Ansible on the controller node.
Ansible playbooks are written in YAML.
With the playbooks, the first job is to configure and start the NameNode services.
Before that, we need to update the inventory file with the IPs and authentication details of the managed nodes.
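As an illustration, an inventory for these two nodes in Ansible's YAML format might look like the sketch below. The group names, login user and password are placeholders and not the values used in this setup; an INI-style inventory works just as well.

```yaml
# Hypothetical inventory in YAML form (save it with a .yml extension).
all:
  children:
    namenode:
      hosts:
        192.168.43.233:
    datanode:
      hosts:
        192.168.43.76:
  vars:
    ansible_user: root            # assumed login user
    ansible_password: "changeme"  # placeholder; SSH keys or Ansible Vault are safer
```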
After updating the inventory file, we have to edit the Ansible configuration file.
vim /etc/ansible/ansible.cfg
In the Ansible configuration file, under the [defaults] section, we need to set the location of the inventory file (inventory = ...) and tell Ansible not to check host keys (host_key_checking = False). This is needed because the first SSH login to a new host asks for host-key confirmation, which would block an unattended run.
To list all the hosts and confirm that the configuration is correct:
ansible all --list-hosts
To check whether every IP in the inventory is reachable, so that we can continue with the next step:
ansible all -m ping
For this task, I used 192.168.43.233 as the NameNode and 192.168.43.76 as the DataNode.
So let's start.
I created a separate variable file to keep the playbook organised, so that if any change is required in the future, anyone can make it easily.
I created a file named nn.yml, which serves as the parameter file.
vim /root/ansible_wp/nn.yml
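A parameter file like this could look roughly as follows. Only /nn1 and port 9001 come from this article; every variable name and both rpm file names are hypothetical and should be replaced with your own.

```yaml
# nn.yml: hypothetical variable names and values
jdk_rpm: jdk-8u171-linux-x64.rpm       # assumed JDK package file name
hadoop_rpm: hadoop-1.2.1-1.x86_64.rpm  # assumed Hadoop package file name
rpm_dest: /root                        # assumed directory on the managed node
nn_dir: /nn1                           # NameNode directory used in this article
nn_port: 9001                          # port the DataNode connects to
```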
I created a file named namenode.yml to configure any server as a NameNode.
vim /root/ansible_wp/namenode.yml
In namenode.yml:
I set 192.168.43.233 as the host.
I attached the attribute file (nn.yml) to this playbook.
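The top of such a playbook might look like this minimal sketch. It assumes nn.yml sits in the same directory as the playbook; the empty task list is only a placeholder for the steps that make up the play.

```yaml
# namenode.yml (sketch): target host plus the attached variable file
- hosts: 192.168.43.233
  vars_files:
    - nn.yml
  tasks: []   # placeholder; the individual steps go here
```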
Then I assigned the tasks.
STEP 1
Copying the JDK rpm from the controller node to the managed node.
STEP 2
Installing the JDK rpm on the managed node.
STEP 3
Copying the Hadoop rpm to the managed node.
STEP 4
Installing the Hadoop rpm on the managed node.
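Steps 1 to 4 could be written as the task list below, placed under the play's tasks: key. This is a sketch, not the exact playbook used here: the variables jdk_rpm, hadoop_rpm and rpm_dest are hypothetical names assumed to be defined in the variable file, and the --force flag for the Hadoop rpm is an assumption that can be dropped if the package installs cleanly.

```yaml
- name: Copy the JDK rpm to the managed node
  ansible.builtin.copy:
    src: "{{ jdk_rpm }}"
    dest: "{{ rpm_dest }}/{{ jdk_rpm }}"

- name: Install the JDK rpm
  ansible.builtin.command: "rpm -ivh {{ rpm_dest }}/{{ jdk_rpm }}"
  register: jdk_install
  # Do not fail the play when the package is already installed.
  failed_when: >-
    jdk_install.rc != 0 and
    'already installed' not in (jdk_install.stdout + jdk_install.stderr)

- name: Copy the Hadoop rpm to the managed node
  ansible.builtin.copy:
    src: "{{ hadoop_rpm }}"
    dest: "{{ rpm_dest }}/{{ hadoop_rpm }}"

- name: Install the Hadoop rpm
  # --force is an assumption, used here to get past file conflicts.
  ansible.builtin.command: "rpm -ivh --force {{ rpm_dest }}/{{ hadoop_rpm }}"
```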
STEP 5
Copying the configured core-site.xml and hdfs-site.xml files to the managed node, overwriting any existing copies.
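One way to express this step is a single copy task with a loop, sketched below. The destination /etc/hadoop is an assumption (it is where rpm-based Hadoop 1.x installs usually keep their configuration); adjust it to your layout.

```yaml
- name: Copy the configured Hadoop files, overwriting existing ones
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "/etc/hadoop/{{ item }}"   # assumed configuration directory
  loop:
    - core-site.xml
    - hdfs-site.xml
```

The copy module replaces the remote file whenever its content differs from the source, which gives the overwrite behaviour described above.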
STEP 6
Deleting the NameNode folder /nn1 if it exists and creating a new one with the same name.
STEP 7
Formatting the NameNode folder, which is required before the Hadoop cluster can be started.
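Steps 6 and 7 might be sketched as follows, assuming the Hadoop 1.x command line (hadoop namenode -format). Because the folder already exists when the format runs, Hadoop 1.x asks for confirmation, so the sketch pipes an uppercase Y into the command.

```yaml
- name: Remove the old NameNode directory, if present
  ansible.builtin.file:
    path: /nn1
    state: absent

- name: Recreate the NameNode directory
  ansible.builtin.file:
    path: /nn1
    state: directory

- name: Format the NameNode
  # The shell module is needed for the pipe; Y answers the re-format prompt.
  ansible.builtin.shell: echo Y | hadoop namenode -format
```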
STEP 8
Stopping the NameNode service, in case one is already running.
STEP 9
Starting the NameNode service.
STEP 10
Checking whether the NameNode is ready.
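A possible shape for steps 8 to 10 is shown below. It assumes the Hadoop 1.x hadoop-daemon.sh script and uses jps for the readiness check; the article does not say which check was used, so jps is an assumption, and it must be on the PATH of the remote user.

```yaml
- name: Stop any running NameNode
  ansible.builtin.command: hadoop-daemon.sh stop namenode
  ignore_errors: true   # nothing to stop is not a problem

- name: Start the NameNode
  ansible.builtin.command: hadoop-daemon.sh start namenode

- name: List the running Java processes
  ansible.builtin.command: jps
  register: jps_out
  changed_when: false   # read-only check

- name: Show the jps output (NameNode should be listed)
  ansible.builtin.debug:
    var: jps_out.stdout_lines
```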
STEP 11
Creating a firewall rule for port 9001, so that the DataNode can connect to the NameNode on that port.
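If the managed node runs firewalld, the rule could be created with the firewalld module from the ansible.posix collection, as in this sketch. Whether the original playbook used this module or a plain firewall-cmd command is not stated, so treat it as one option.

```yaml
- name: Allow DataNodes to reach the NameNode on port 9001
  ansible.posix.firewalld:
    port: 9001/tcp
    permanent: true    # keep the rule across reboots
    immediate: true    # also apply it to the running firewall
    state: enabled
```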
Now run the playbook:
ansible-playbook namenode.yml
I created a file named dn.yml, which serves as the parameter file.
I created a file named datanode.yml to configure any server as a DataNode.
In datanode.yml:
I set 192.168.43.76 as the host.
I attached the attribute file (dn.yml) to this playbook.
Then I used vars_prompt, because I wanted anyone who runs this playbook to first check whether the NameNode IP in the attribute file is correct.
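A minimal sketch of such a prompt is shown below. The variable name confirm_ip, the prompt wording and the fail task are all hypothetical; the point is only that vars_prompt pauses the run and stores the answer in a variable that later tasks can test.

```yaml
# datanode.yml (sketch): ask for confirmation before any task runs
- hosts: 192.168.43.76
  vars_files:
    - dn.yml
  vars_prompt:
    - name: confirm_ip          # hypothetical variable name
      prompt: "Is the NameNode IP in dn.yml correct? (yes/no)"
      private: false            # echo the typed answer
  tasks:
    - name: Stop if the IP was not confirmed
      ansible.builtin.fail:
        msg: "Update the NameNode IP in dn.yml and run the playbook again."
      when: confirm_ip != "yes"
```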
Tasks
STEP 1
Stopping the DataNode service, in case one is already running.
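Since this step runs before Hadoop is installed, the stop command may not exist yet on a fresh node. A sketch that tolerates this, assuming the Hadoop 1.x hadoop-daemon.sh script:

```yaml
- name: Stop any running DataNode
  ansible.builtin.command: hadoop-daemon.sh stop datanode
  ignore_errors: true   # the script may be missing on a fresh node
```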
STEP 2
Copying the JDK rpm from the controller node to the managed node.
STEP 3
Installing the JDK rpm on the managed node.
STEP 4
Copying the Hadoop rpm to the managed node.
STEP 5
Installing the Hadoop rpm on the managed node.
STEP 6
Copying the configured core-site.xml and hdfs-site.xml files to the managed node, overwriting any existing copies.
In the core-site.xml file, I used a variable named ip, so that if the NameNode IP changes in the future we can update it just by editing the attribute file, i.e. dn.yml.
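For the variable to be substituted, the file has to go through Ansible's templating, for example with the template module as sketched here. The destination path is an assumption, and the XML line in the comment only illustrates where the variable might appear.

```yaml
- name: Render core-site.xml with the NameNode IP from dn.yml
  ansible.builtin.template:
    # The source file would contain something like:
    #   <value>hdfs://{{ ip }}:9001</value>
    src: core-site.xml
    dest: /etc/hadoop/core-site.xml   # assumed configuration directory
```

A plain copy task would transfer the placeholder text unchanged, so templating is what makes the value in dn.yml take effect.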
STEP 7
Deleting the DataNode folder /dn1 if it exists and creating a new one with the same name.
STEP 8
Starting the DataNode service.
STEP 9
Checking whether the DataNode is ready.
STEP 10
Taking the full report of the Hadoop cluster.
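The report step could be sketched like this, assuming the Hadoop 1.x command hadoop dfsadmin -report. The registered variable name dfs_report is hypothetical.

```yaml
- name: Collect the HDFS cluster report
  ansible.builtin.command: hadoop dfsadmin -report
  register: dfs_report
  changed_when: false   # read-only command

- name: Print the report
  ansible.builtin.debug:
    var: dfs_report.stdout_lines
```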
Now run the playbook:
ansible-playbook datanode.yml
With the Hadoop report command, we can confirm that the DataNode has successfully connected to the NameNode.