For a quorum, more than half the nodes must be available over the internal network. A seven-node cluster, for example, requires a four-node quorum. A 10-node cluster requires a six-node quorum. If a node is unreachable over the internal network, OneFS separates the node from the cluster, an action referred to as splitting. After a cluster is split, cluster operations continue as long as enough nodes remain connected to have a quorum.
In a split cluster, the nodes that remain in the cluster are referred to as the majority group. Nodes that are split from the cluster are referred to as the minority group.
When split nodes can reconnect with the cluster and re-synchronize with the other nodes, the nodes rejoin the cluster's majority group, an action referred to as merging.
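The majority rule behind splitting and merging comes down to simple arithmetic. The following is a minimal sketch of it, not OneFS code; the function names `quorum_size` and `has_quorum` are hypothetical and chosen only for illustration.

```python
def quorum_size(node_count: int) -> int:
    """Return the smallest node count that is more than half of node_count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer division rounds down, so adding 1 always gives a strict majority.
    return node_count // 2 + 1


def has_quorum(available: int, node_count: int) -> bool:
    """True if the available nodes form a strict majority of the cluster."""
    return available >= quorum_size(node_count)


if __name__ == "__main__":
    print(quorum_size(7))    # seven-node cluster
    print(quorum_size(10))   # 10-node cluster
    print(has_quorum(2, 4))  # two of four nodes is exactly half, not a majority
```

For the seven-node and 10-node clusters mentioned above, the sketch yields four and six nodes respectively. An even split, such as two of four nodes, does not satisfy the rule because exactly half is not more than half.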
A OneFS cluster contains two quorum properties:
- read quorum (efs.gmp.has_quorum)
- write quorum (efs.gmp.has_super_block_quorum)
By connecting to a node with SSH and running the sysctl command-line tool as root, you can view the status of both types of quorum. Here is an example for a cluster that has a quorum for both read and write operations, as the command output indicates with a 1, for true:
sysctl efs.gmp.has_quorum
efs.gmp.has_quorum: 1
sysctl efs.gmp.has_super_block_quorum
efs.gmp.has_super_block_quorum: 1
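If the quorum status needs to be checked from a script, the `name: value` lines that sysctl prints can be turned into booleans. The sketch below only parses text that has already been captured; it does not run sysctl or connect to a node, and the helper name `parse_sysctl_output` is a hypothetical one introduced for this example.

```python
def parse_sysctl_output(text: str) -> dict[str, bool]:
    """Map each 'name: value' line to True when the value is 1."""
    flags = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            # Lines without a colon (such as the echoed command) are skipped.
            continue
        flags[name.strip()] = value.strip() == "1"
    return flags


# Sample text copied from the example output above.
sample = """sysctl efs.gmp.has_quorum
efs.gmp.has_quorum: 1
sysctl efs.gmp.has_super_block_quorum
efs.gmp.has_super_block_quorum: 1"""

status = parse_sysctl_output(sample)
print(status)
```

A value of True for `efs.gmp.has_quorum` corresponds to read quorum, and a value of True for `efs.gmp.has_super_block_quorum` corresponds to write quorum.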
The degraded states of nodes (such as smartfail, read-only, and offline) affect quorum in different ways. A node in a smartfail or read-only state affects only write quorum. A node in an offline state, however, affects both read and write quorum. In a cluster, the combination of nodes in different degraded states determines whether read requests, write requests, or both work.
A cluster can lose write quorum but keep read quorum. Consider a four-node cluster in which nodes 1 and 2 are working normally. Node 3 is in a read-only state, and node 4 is in a smartfail state. In such a case, read requests to the cluster succeed because all four nodes still count toward read quorum. Write requests, however, receive an input-output error because nodes 3 and 4 no longer count toward write quorum, which leaves only two of the three nodes that a four-node cluster needs.
A cluster can also lose both its read and write quorum. If nodes 3 and 4 in a four-node cluster are in an offline state, only two of the four nodes remain connected, which is not more than half. Both write requests and read requests then receive an input-output error, and you cannot access the file system. When OneFS can reconnect with the nodes, OneFS merges them back into the cluster. Unlike a RAID system, an Isilon node can rejoin the cluster without being rebuilt and reconfigured.
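The two four-node scenarios above can be reproduced with a small model of the stated rules: smartfail and read-only nodes count toward read quorum only, and offline nodes count toward neither. This is a simplified sketch built from those rules alone and is not how OneFS itself evaluates quorum; `NodeState` and `quorum_status` are hypothetical names.

```python
from enum import Enum


class NodeState(Enum):
    HEALTHY = "healthy"
    SMARTFAIL = "smartfail"
    READ_ONLY = "read-only"
    OFFLINE = "offline"


def quorum_status(states):
    """Return (read_quorum, write_quorum) for a list of node states."""
    needed = len(states) // 2 + 1  # more than half of the cluster
    # Assumption from the text: only offline nodes are lost to read quorum.
    readers = sum(1 for s in states if s is not NodeState.OFFLINE)
    # Assumption from the text: any degraded state is lost to write quorum.
    writers = sum(1 for s in states if s is NodeState.HEALTHY)
    return readers >= needed, writers >= needed


# Nodes 1 and 2 healthy, node 3 read-only, node 4 in smartfail.
degraded = [NodeState.HEALTHY, NodeState.HEALTHY,
            NodeState.READ_ONLY, NodeState.SMARTFAIL]
# Nodes 1 and 2 healthy, nodes 3 and 4 offline.
offline = [NodeState.HEALTHY, NodeState.HEALTHY,
           NodeState.OFFLINE, NodeState.OFFLINE]

print(quorum_status(degraded))  # (True, False): reads work, writes fail
print(quorum_status(offline))   # (False, False): neither reads nor writes work
```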