Is it good practice to use HDFS for small data files?
Using HDFS for numerous small files is not good practice. The NameNode holds the metadata for every file, block, and directory in RAM, and each of these objects consumes roughly 150 bytes. Storing many tiny files therefore generates a large amount of metadata, and since the NameNode typically runs on expensive, high-performance hardware, it is not advisable to fill its memory with all that extra metadata.
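For a rough sense of scale (a back-of-the-envelope estimate, not an exact figure): if each small file occupies one block, it costs about 300 bytes of NameNode heap (roughly 150 bytes for the file object plus 150 bytes for the block object), so 10 million small files would need on the order of 10,000,000 × 300 bytes ≈ 3 GB of NameNode memory for metadata alone. The same memory could track far more data stored as a smaller number of large files.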
What is Yarn?
YARN (Yet Another Resource Negotiator) is the processing framework of Hadoop. It manages the cluster's resources and provides an execution environment for the processes that run on Hadoop.
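As a quick illustration, YARN's own command-line tool can show what it is managing; the output will depend on your cluster:
$ yarn node -list          # list the NodeManagers registered with the ResourceManager
$ yarn application -list   # list the applications YARN is currently running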
What are the functions of the Active and Passive NameNodes?
The Active NameNode is the NameNode that runs in the Hadoop cluster and serves all client requests. The Passive NameNode (also called the Standby NameNode) maintains the same metadata as the Active NameNode but takes over only when the Active NameNode fails. Whenever the Active NameNode goes down, the Passive/Standby NameNode replaces it, so the Hadoop cluster is never left without a NameNode.
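Operationally, the state of each NameNode can be checked or changed with the hdfs haadmin tool; the service IDs nn1 and nn2 below are placeholders that depend on the cluster's HA configuration:
$ hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
$ hdfs haadmin -failover nn1 nn2      # manually fail over so that nn2 becomes active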
What are the distinguishing features of Hadoop 1.x and Hadoop 2.x?
In Hadoop 1.x, the NameNode is a single point of failure; in Hadoop 2.x, the Active & Passive NameNode feature was introduced to address this (a minimal configuration sketch follows below).
In Hadoop 2.x, MRv2/YARN (ResourceManager & NodeManager) replaces MRv1 (JobTracker & TaskTracker).
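A minimal sketch of how the Hadoop 2.x Active/Standby NameNode pair is declared in hdfs-site.xml; the nameservice ID mycluster, the NameNode IDs nn1/nn2, and the host names are placeholder assumptions, and a real deployment also needs shared edits (JournalNodes), fencing, and ZooKeeper settings that are omitted here:
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>namenode1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>namenode2.example.com:8020</value></property>
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>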
What steps does NameNode take when a DataNode fails?
Each DataNode in the cluster periodically sends a Heartbeat (signal) to the NameNode, indicating that the DataNode is operating normally.
Along with the heartbeat, each DataNode sends a block report listing the blocks it holds. If a DataNode fails to deliver a heartbeat for a predetermined amount of time, the NameNode marks it as dead.
The NameNode then re-replicates the blocks of the dead node onto other DataNodes, using the replicas that were created earlier.
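For reference, the timing is governed by configuration properties in hdfs-site.xml (defaults can vary between Hadoop releases): dfs.heartbeat.interval (3 seconds by default) and dfs.namenode.heartbeat.recheck-interval (300000 ms, i.e. 5 minutes, by default), which together give a dead-node timeout of roughly 10.5 minutes (2 × recheck interval + 10 × heartbeat interval). The NameNode's current view of live and dead DataNodes can be checked with:
$ hdfs dfsadmin -report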
Are an “HDFS Block” and an “Input Split” the same thing?
No. The “HDFS Block” is the physical division of the data, while the “Input Split” is the logical division. HDFS divides data into blocks for storage, whereas MapReduce divides the data into input splits and assigns each split to a mapper function.
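As a worked example (assuming the Hadoop 2.x default block size of 128 MB and the default input format behaviour), a 300 MB file is stored as three HDFS blocks of 128 MB, 128 MB, and 44 MB, and MapReduce by default creates one input split per block, so three mapper tasks process the file. The physical block layout of a file can be inspected with (the path is a placeholder):
$ hdfs fsck /path/to/file -files -blocks -locations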
What does Hadoop’s Safe Mode do?
In Hadoop, Safe Mode is the state in which the NameNode does not replicate or delete blocks and the file system is effectively read-only. While in Safe Mode, the NameNode only collects block report data from the DataNodes.
How do you enter Safe Mode and how do you exit from it?
The following command is used to enter Safe Mode manually –
$ hdfs dfsadmin -safemode enter
The following command is used to leave Safe Mode manually –
$ hdfs dfsadmin -safemode leave
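Two related subcommands of the same tool are also useful: one reports whether the NameNode is currently in Safe Mode, and the other blocks until Safe Mode is exited –
$ hdfs dfsadmin -safemode get
$ hdfs dfsadmin -safemode wait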