Proving that the client goes directly to the DataNode and reads the data stored there
As we know, a Hadoop cluster consists of a NameNode, DataNodes, and a client. A common myth is that the client goes to the NameNode and reads the file on the DataNode via the NameNode. This is not true. The client does contact the NameNode for metadata such as block locations, but the question here is where the file data itself flows. I did a small experiment on this myth: does the client really read the file on the DataNode via the NameNode, or does it go to the DataNode directly and read the data?
Practical Demonstration
I have created a Hadoop cluster with 4 DataNodes, 1 NameNode, and 1 client.
All the DataNodes are now connected to the NameNode, and the client is connected too and ready to upload a file.
Next, since the instances I am using run CentOS 7, I ran tcpdump -i eth0 -n tcp port 50010 on all 4 DataNodes. This command tells tcpdump to show every TCP packet that passes through the network card eth0 to or from port 50010, which is the port the DataNode uses for data transfer (the default in Hadoop 1.x and 2.x). We filter on tcp because HDFS data transfer runs over TCP, and the whole Hadoop cluster relies on HDFS.
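To make the pieces of that command easier to see, here is a small sketch that breaks it down flag by flag. The function name capture_cmd is hypothetical (it is not part of Hadoop or tcpdump), and the defaults eth0 and 50010 are simply the values used in this post. It only prints the command so you can check it before running it as root on a DataNode.

```shell
# capture_cmd is a hypothetical helper: it only prints the tcpdump command.
#   -i <iface>  listen on this network interface
#   -n          show numeric IPs and ports instead of resolving names
#   tcp port N  keep only TCP packets whose source or destination port is N
capture_cmd() {
  local iface="${1:-eth0}"   # interface used in this post
  local port="${2:-50010}"   # DataNode data-transfer port used in this post
  echo "tcpdump -i ${iface} -n tcp port ${port}"
}

capture_cmd   # prints: tcpdump -i eth0 -n tcp port 50010
```

If your cluster uses a different interface or port (for example, Hadoop 3.x uses 9866 as the default data-transfer port), pass them as arguments: capture_cmd eth1 9866.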
Then, with tcpdump running on all 4 DataNodes, we read the file from the client side using hadoop fs -cat /filename.
Now we can see that the 2nd DataNode is sending packets, and we can confirm this by checking the IP of the DataNode that is sending packets back to our client.
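Reading the sender's IP off the screen works, but it can also be pulled out of the tcpdump text automatically. The sketch below assumes the usual one-line IPv4 format that tcpdump -n prints (time, "IP", source.port, ">", destination.port). The helper name datanode_senders is hypothetical, and the addresses in the example comment are made up for illustration.

```shell
# datanode_senders is a hypothetical helper. It reads `tcpdump -n` text on stdin
# and prints every source IP that sent packets FROM the given port, with a count.
datanode_senders() {
  local port="${1:-50010}"   # DataNode data-transfer port used in this post
  awk -v port="$port" '
    $2 == "IP" && $4 == ">" {
      n = split($3, a, ".")            # a[1..4] = IPv4 octets, a[5] = source port
      if (n == 5 && a[5] == port) {
        ip = a[1] "." a[2] "." a[3] "." a[4]
        count[ip]++
      }
    }
    END { for (ip in count) print ip, count[ip] }
  ' | sort
}

# Example (as root; -c 200 stops the capture after 200 packets):
#   tcpdump -c 200 -i eth0 -n tcp port 50010 | datanode_senders 50010
# A line such as "10.0.0.12 150" would mean 10.0.0.12 sent 150 packets from port 50010.
```

Only packets whose source port is 50010 are counted, so the client's own requests to the DataNode are ignored and what remains is the node that is actually sending data back.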
Hence, it is now clear that the client goes to the DataNode directly and reads the data stored on it.
I really appreciate you reading my post.