Many popular articles claim that Hadoop uses parallelism to upload split data, and that this is how it addresses the Velocity problem of big data.
A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment. Such clusters run Hadoop’s open-source distributed processing software on low-cost commodity computers.
To test this claim, I used the tcpdump command throughout this practical.
What I planned to do
Step 1
Create a Hadoop cluster with two data nodes, configure everything, and make it ready for use.
Step 2
Upload a file from the client side.
At the same time, both data nodes should be running the tcpdump command so that the packets can be read later.
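As a rough illustration of the capture step, here is a minimal Python sketch that assembles a tcpdump command line for a data node. The function name, the output file name, and the port are assumptions, not details from my setup: 50010 is the default data-transfer port in Hadoop 1.x and 2.x (Hadoop 3.x defaults to 9866), so check the dfs.datanode.address setting of your own cluster.

```python
import shlex

def build_tcpdump_command(interface="any", port=50010, outfile="datanode.pcap"):
    """Return a tcpdump argument list for capturing data-transfer traffic.

    All default values here are illustrative assumptions.
    """
    return [
        "tcpdump",
        "-i", interface,           # interface to listen on
        "-n",                      # print numeric addresses, no name lookups
        "-w", outfile,             # save raw packets to a file for later reading
        "tcp", "port", str(port),  # capture filter: only the transfer port
    ]

command = build_tcpdump_command()
print(shlex.join(command))
```

Running the printed command usually needs root privileges, and a file saved with -w can be read back later with tcpdump -r.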
Next, I uploaded a file from the client side and received packets on both data nodes.
On the first data node, I received packets.
On the second data node, I received packets.
I started reading these packets with the help of the IPs below.
Datanode 2 - Public_ip: 13.233.110.102, Private_ip: 172.31.36.159
Datanode 1 - Public_ip: 65.0.74.41, Private_ip: 172.31.42.168
Namenode - Public_ip: 15.206.93.255, Private_ip: 172.31.34.233
Client - Public_ip: 13.233.73.201, Private_ip: 172.31.41.169
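Reading the capture gets easier once each address is translated into its role. The sketch below shows one way to do that in Python, assuming the capture was printed with numeric addresses (tcpdump -n) and that the packets carry the private IPs listed above. The helper name and the sample line are hypothetical; the sample is not taken from my actual capture.

```python
import re

# Private IPs from the list above, mapped to their cluster roles.
ROLES = {
    "172.31.34.233": "Namenode",
    "172.31.42.168": "Datanode 1",
    "172.31.36.159": "Datanode 2",
    "172.31.41.169": "Client",
}

# Matches the "IP src.port > dst.port:" part of a tcpdump text line.
FLOW = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):")

def label_flow(line):
    """Return (source role, destination role) for a tcpdump line, or None."""
    match = FLOW.search(line)
    if match is None:
        return None
    src_ip, _src_port, dst_ip, _dst_port = match.groups()
    # Unknown addresses are returned unchanged.
    return ROLES.get(src_ip, src_ip), ROLES.get(dst_ip, dst_ip)

# Hypothetical sample line in tcpdump's text format.
sample = ("10:15:02.123456 IP 172.31.41.169.45678 > 172.31.42.168.50010: "
          "Flags [P.], length 1448")
print(label_flow(sample))
```

Feeding every line of a capture through label_flow gives a quick view of who is talking to whom.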
Result
What I found is very interesting.
I found that the client uploads data only to the first data node, and the remaining replicas are made by the data nodes themselves. While the client is uploading data to the first data node, the first data node is already creating a replica on the second data node, and, where there is a third data node, the second data node creates a replica on it.
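The difference between this chained replication and a client that sends every replica itself can be sketched in a few lines of Python. This is only a toy model of who sends to whom, not Hadoop code, and the function and node names are made up for illustration.

```python
def pipeline_hops(client, datanodes):
    """List the (sender, receiver) hops when replicas are forwarded in a chain."""
    chain = [client] + list(datanodes)
    # Pair every node with the next one in the chain.
    return list(zip(chain, chain[1:]))

def fan_out_hops(client, datanodes):
    """List the hops if the client itself sent a copy to every data node."""
    return [(client, node) for node in datanodes]

nodes = ["Datanode 1", "Datanode 2", "Datanode 3"]
for sender, receiver in pipeline_hops("Client", nodes):
    print(f"{sender} -> {receiver}")
```

In the chained model the client appears as a sender only once, which matches what the capture showed. In the fan-out model the client would appear as a sender once per replica.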
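I also noticed that the first data node keeps pinging the name node, so that the name node knows there is a live node to which data can be transferred. Likewise, I saw the second data node sending packets back to the first, and a third data node would do the same to the second. I read these as liveness signals, but in HDFS every data node sends its heartbeat directly to the name node, so the packets travelling back along the chain are more likely acknowledgements for the replicated data.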
One more important point: Hadoop does not use parallelism to upload the replicas. The chained replication I described above is the approach Hadoop uses to address the Velocity problem.