HDFS Federation vs High Availability

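This post compares HDFS Federation and HDFS High Availability (HA), two NameNode architectures that address different limitations of a single-NameNode cluster. Federation scales the namespace across several NameNodes, while HA removes the NameNode as a single point of failure.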

Relationship between NameNodes

In HDFS Federation, a cluster can run any number of NameNodes. They are independent of one another and need no coordination.
In HDFS High Availability, a nameservice typically has two NameNodes, one Active and one Standby, that work as a pair. Both processes run at all times, but only the Active NameNode serves client requests.

Updating Metadata

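In Federation, each NameNode manages its own namespace and its own block pool. The DataNodes are shared: every DataNode registers with all NameNodes and stores blocks for all block pools.
In HA, only one NameNode is active at a time. The Standby NameNode does not serve clients; it keeps applying the Active NameNode's edit log, read from shared storage such as a quorum of JournalNodes, so that its metadata stays current.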
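To make the Federation side concrete, here is a minimal toy model in Python. It is only a sketch: the class names (`DataNode`, `FederatedNameNode`), the pool IDs, and the round-robin placement are assumptions made for illustration and are not part of the HDFS API.

```python
from collections import defaultdict


class DataNode:
    """Toy DataNode that can hold blocks for every block pool in the cluster."""

    def __init__(self, name):
        self.name = name
        # block pool ID -> set of block IDs held for that pool
        self.blocks_by_pool = defaultdict(set)

    def store(self, pool_id, block_id):
        self.blocks_by_pool[pool_id].add(block_id)


class FederatedNameNode:
    """Toy NameNode that owns one namespace and exactly one block pool."""

    def __init__(self, nameservice, pool_id):
        self.nameservice = nameservice
        self.pool_id = pool_id
        self.namespace = {}  # path -> list of block IDs
        self._next_block = 0

    def create_file(self, path, num_blocks, datanodes):
        block_ids = []
        for _ in range(num_blocks):
            block_id = f"blk_{self._next_block}"
            self._next_block += 1
            # Hypothetical placement policy: round-robin over the shared DataNodes.
            target = datanodes[len(block_ids) % len(datanodes)]
            target.store(self.pool_id, block_id)
            block_ids.append(block_id)
        self.namespace[path] = block_ids
        return block_ids


datanodes = [DataNode("dn1"), DataNode("dn2")]
nn_users = FederatedNameNode("ns-users", "BP-users")
nn_logs = FederatedNameNode("ns-logs", "BP-logs")

nn_users.create_file("/users/alice.txt", 2, datanodes)
nn_logs.create_file("/logs/app.log", 3, datanodes)

# Each NameNode knows only its own namespace.
print(sorted(nn_users.namespace))
print(sorted(nn_logs.namespace))
# One DataNode holds blocks from both pools, kept apart by pool ID.
for pool_id, blocks in sorted(datanodes[0].blocks_by_pool.items()):
    print(pool_id, sorted(blocks))
```

Both toy NameNodes hand out the block ID `blk_0`, and the DataNode can still tell the two blocks apart because each one is filed under its own pool ID. That is the property that lets the NameNodes work without coordinating.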
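The HA side can be sketched in the same spirit. The model below assumes a shared edit log that the Active NameNode appends to and the Standby replays. `SharedJournal`, `HaNameNode`, and `tail_edits` are hypothetical names chosen for this example, not real Hadoop classes or methods.

```python
class SharedJournal:
    """Toy stand-in for the shared edit log (for example, a JournalNode quorum)."""

    def __init__(self):
        self.edits = []  # ordered list of (operation, path) tuples

    def append(self, op, path):
        self.edits.append((op, path))


class HaNameNode:
    """Toy NameNode that is either 'active' or 'standby'."""

    def __init__(self, name, journal):
        self.name = name
        self.journal = journal
        self.state = "standby"
        self.namespace = set()
        self.applied = 0  # number of journal entries already applied locally

    def _apply(self, op, path):
        if op == "create":
            self.namespace.add(path)
        elif op == "delete":
            self.namespace.discard(path)

    def write(self, op, path):
        # Only the active NameNode may accept namespace changes.
        if self.state != "active":
            raise RuntimeError(f"{self.name} is in standby and rejects writes")
        self.journal.append(op, path)
        self._apply(op, path)
        self.applied = len(self.journal.edits)

    def tail_edits(self):
        # The standby replays every journal entry it has not applied yet.
        for op, path in self.journal.edits[self.applied:]:
            self._apply(op, path)
        self.applied = len(self.journal.edits)


journal = SharedJournal()
nn1 = HaNameNode("nn1", journal)
nn2 = HaNameNode("nn2", journal)
nn1.state = "active"

nn1.write("create", "/a")
nn1.write("create", "/b")
nn1.write("delete", "/a")

nn2.tail_edits()  # in this sketch the standby catches up on demand
print(nn2.namespace == nn1.namespace)
```

Because the Standby replays the same ordered edits, it ends up with the same namespace as the Active NameNode, which is what allows it to take over quickly.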

Fault Tolerance

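HDFS Federation isolates failures: if one NameNode goes down, the namespaces managed by the other NameNodes are not affected. The failed NameNode's own namespace, however, stays unavailable until that NameNode is restored, unless HA is also configured for it.
HDFS High Availability needs two separate machines, one for the Active NameNode and one for the Standby. The first NameNode is configured and started first, and the Standby is then set up on the other machine. If the Active NameNode fails, the Standby takes over.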
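The difference in failure behaviour can be illustrated with another small Python sketch. Everything in it is an assumption made for the example: the class names, the path prefixes, and the simplified failover that just promotes the standby. Real HDFS failover also involves fencing and, for automatic failover, ZooKeeper, none of which is modelled here.

```python
class NameNodeDown(Exception):
    """Raised when the NameNode responsible for a path is unavailable."""


class FederatedCluster:
    """Toy federation: each path prefix is served by its own NameNode."""

    def __init__(self, prefixes):
        # path prefix -> True while that namespace's NameNode is alive
        self.alive = {prefix: True for prefix in prefixes}

    def fail(self, prefix):
        self.alive[prefix] = False

    def lookup(self, path):
        for prefix, up in self.alive.items():
            if path.startswith(prefix):
                if not up:
                    raise NameNodeDown(f"NameNode for {prefix} is down")
                return f"NameNode for {prefix}"
        raise KeyError(f"no namespace serves {path}")


class HaPair:
    """Toy HA pair: one namespace, two NameNodes."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def fail_active(self):
        # Simplified failover: the standby is promoted and no spare remains.
        self.active, self.standby = self.standby, None

    def lookup(self, path):
        if self.active is None:
            raise NameNodeDown("no NameNode left to serve requests")
        return self.active


federation = FederatedCluster(["/users/", "/logs/"])
federation.fail("/logs/")
print(federation.lookup("/users/alice.txt"))  # other namespace still served
try:
    federation.lookup("/logs/app.log")
except NameNodeDown as err:
    print("unavailable:", err)

ha = HaPair(active="nn1", standby="nn2")
ha.fail_active()
print(ha.lookup("/users/alice.txt"))  # same namespace, now served by the former standby
```

In the federation model the failure is contained but the affected namespace is lost until recovery. In the HA model the whole namespace stays reachable because a second NameNode was ready to serve it.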
