ECS data protection

ECS protects data within a site by mirroring the data onto multiple nodes, and by using erasure coding to break data chunks into multiple fragments and distribute the fragments across nodes. Compared with keeping full mirrored copies, erasure coding (EC) reduces storage overhead while ensuring data durability and resilience against disk and node failures.

By default, the storage engine implements the Reed-Solomon 12 + 4 erasure coding scheme, in which an object is broken into 12 data fragments and 4 coding fragments. The resulting 16 fragments are dispersed across the nodes in the local site. When an object is erasure coded, ECS can read it directly from the 12 data fragments without any decoding or reconstruction; the coding fragments are used only to reconstruct the object when a hardware failure occurs. ECS also supports a 10 + 2 scheme for cold storage archives, which store objects that do not change frequently and do not require the more robust default EC scheme.
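The arithmetic behind the two schemes can be sketched in a few lines of Python. This is a minimal illustration, not ECS code: it relies only on the general property of a Reed-Solomon k + m code that any k of the k + m fragments are enough to rebuild the data, so up to m fragments can be lost. The name `storage_overhead` is hypothetical.

```python
# (data fragments k, coding fragments m) for the schemes described above
SCHEMES = {"default": (12, 4), "cold archive": (10, 2)}


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for a k + m erasure code."""
    return (k + m) / k


for name, (k, m) in SCHEMES.items():
    # A Reed-Solomon k + m code can rebuild the data from any k fragments,
    # so it tolerates the loss of up to m fragments.
    print(f"{name}: {k} + {m} -> overhead {storage_overhead(k, m):.2f}, "
          f"survives loss of up to {m} fragments")
```

The computed overheads, 1.33 and 1.20, are the same figures listed as EC efficiency in Table 1.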

The following table shows the requirements for the supported erasure coding schemes.

Table 1. Erasure encoding requirements for regular and cold archives
Use case          Minimum required nodes   Minimum required disks   Recommended disks   EC efficiency   EC scheme
Regular archive   4                        16                       32                  1.33            12 + 4
Cold archive      6                        12                       24                  1.2             10 + 2
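One way to read the minimum node counts is that, when fragments are spread evenly, losing any single node must not remove more fragments than the scheme has coding fragments. The sketch below checks that condition. The even-placement rule and the helper names are assumptions made for illustration; they do not describe how ECS actually places fragments.

```python
import math


def max_fragments_per_node(k: int, m: int, nodes: int) -> int:
    # Assumes the k + m fragments are spread as evenly as possible across nodes
    return math.ceil((k + m) / nodes)


def survives_single_node_loss(k: int, m: int, nodes: int) -> bool:
    # Losing the busiest node must not remove more than m fragments
    return max_fragments_per_node(k, m, nodes) <= m


for label, k, m, nodes in [
    ("Regular archive, 4 nodes", 12, 4, 4),
    ("Regular archive, 3 nodes", 12, 4, 3),
    ("Cold archive, 6 nodes", 10, 2, 6),
    ("Cold archive, 5 nodes", 10, 2, 5),
]:
    print(f"{label}: up to {max_fragments_per_node(k, m, nodes)} fragments per node, "
          f"survives one node loss: {survives_single_node_loss(k, m, nodes)}")
```

Under this assumption, 12 + 4 on four nodes puts 4 fragments on each node and 10 + 2 on six nodes puts 2 on each, exactly the number of coding fragments. One node fewer breaks the condition in both cases.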

Sites can be federated, so that data is replicated to another site to increase availability and data durability, and to ensure that ECS is resilient against site failure. For three or more sites, in addition to the erasure coding of chunks at a site, chunks that are replicated to other sites are combined using a technique called XOR to provide increased storage efficiency.
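The XOR idea can be illustrated with two equal-length byte strings. A third site can keep the XOR of chunks from two other sites instead of a full copy of each, and either original can be rebuilt from the XOR result and the surviving chunk. This is a toy sketch: the chunk contents and the function name are hypothetical, and it does not describe how ECS selects or lays out the chunks it combines.

```python
def xor_chunks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two chunks of equal length."""
    if len(a) != len(b):
        raise ValueError("chunks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


chunk_a = b"data replicated from site A"
chunk_b = b"data replicated from site B"

# Site C stores one combined chunk instead of copies of both
combined = xor_chunks(chunk_a, chunk_b)

# If site A is lost, XOR the combined chunk with site B's chunk to recover it
recovered_a = xor_chunks(combined, chunk_b)
print(recovered_a == chunk_a)  # True, because (a XOR b) XOR b == a
```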

The following table shows the storage efficiency that can be achieved by ECS where multiple sites are used.

Table 2. Storage overhead
Sites in replication group   Default (EC 12 + 4)   Cold archive (EC 10 + 2)
1                            1.33                  1.2
2                            2.67                  2.4
3                            2.00                  1.8
4                            1.77                  1.6
5                            1.67                  1.5
6                            1.60                  1.44
7                            1.55                  1.40
8                            1.52                  1.37

If you have one site, erasure coding makes the object data chunks use more space than the raw data bytes require (a storage overhead of 1.33 or 1.2). If you have two sites, the storage overhead doubles (2.67 or 2.4) because both sites store a replica of the data and the data is erasure coded at both sites. If you have three or more sites, ECS combines the replicated chunks using XOR so that, counterintuitively, the storage overhead decreases as sites are added.
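The values in Table 2 are consistent with a simple model. One site pays only the erasure-coding overhead, two sites pay it twice, and N ≥ 3 sites pay it multiplied by N/(N − 1). The sketch below encodes that model with exact fractions and compares it with the published values. The model is inferred from the table rather than taken from ECS documentation, and the function names are hypothetical.

```python
from fractions import Fraction


def ec_overhead(k: int, m: int) -> Fraction:
    # Per-site erasure-coding overhead, (k + m) / k
    return Fraction(k + m, k)


def replication_group_overhead(sites: int, k: int, m: int) -> Fraction:
    """Model fitted to Table 2 (an assumption, not ECS code)."""
    if sites < 1:
        raise ValueError("need at least one site")
    ec = ec_overhead(k, m)
    if sites == 1:
        return ec                              # erasure coding only
    if sites == 2:
        return 2 * ec                          # full replica at each site
    return ec * Fraction(sites, sites - 1)     # XOR-combined replicas


# Values copied from Table 2: sites -> (default 12 + 4, cold archive 10 + 2)
TABLE_2 = {1: (1.33, 1.2), 2: (2.67, 2.4), 3: (2.00, 1.8), 4: (1.77, 1.6),
           5: (1.67, 1.5), 6: (1.60, 1.44), 7: (1.55, 1.40), 8: (1.52, 1.37)}

for sites, (published_default, published_cold) in TABLE_2.items():
    default = float(replication_group_overhead(sites, 12, 4))
    cold = float(replication_group_overhead(sites, 10, 2))
    print(f"{sites} sites: model {default:.3f} / {cold:.3f}, "
          f"table {published_default} / {published_cold}")
```

The published figures have two decimals, and a few appear to be truncated rather than rounded (for example, 1.77 for 16/9 ≈ 1.778). Every entry still agrees with the model to within 0.01.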

When one node in a four-node system goes down, ECS prioritizes rebuilding the affected erasure-coded data to avoid data unavailability (DU). While that node is down, its EC fragments are rebuilt onto the other three nodes, so at least one of those nodes ends up holding more fragments than the number of coding fragments. If the failed node comes back, the system returns to normal. If a second node, specifically the one holding the most EC fragments, also goes down, data is unavailable for as long as that node is not available (NA). If that node does not recover, the result is data loss (DL).
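The failure sequence above can be illustrated with a simplified fragment-count model of one 12 + 4 chunk. The model assumes that fragments are placed round-robin across whichever nodes are up and that a chunk is readable while at most 4 of its 16 fragments are missing. The node names and helper functions are hypothetical, and real ECS placement and rebuild behavior may differ.

```python
K, M = 12, 4  # default 12 + 4 scheme


def place_fragments(nodes: list[str], total: int) -> dict[str, int]:
    # Hypothetical round-robin placement: fragment i goes to nodes[i % len(nodes)]
    counts = {node: 0 for node in nodes}
    for i in range(total):
        counts[nodes[i % len(nodes)]] += 1
    return counts


def readable(missing: int) -> bool:
    # Any K of the K + M fragments are enough to read the chunk
    return missing <= M


# Healthy four-node system: 4 fragments per node
placement = place_fragments(["node1", "node2", "node3", "node4"], K + M)
print("healthy:", placement)

# node4 goes down: its 4 fragments are missing, which M = 4 can cover
print("node4 down, readable:", readable(placement["node4"]))

# Model the rebuilt layout: all 16 fragments now live on the three remaining nodes
placement = place_fragments(["node1", "node2", "node3"], K + M)
print("after rebuild:", placement)

# A second failure hits the node that now holds the most fragments
busiest = max(placement, key=placement.get)
print(f"{busiest} down, readable:", readable(placement[busiest]))
```

In this model, the second failure removes 6 fragments, more than the 4 coding fragments can replace. The chunk therefore stays unreadable (DU) until that node returns and is lost (DL) if it never does.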

The EC retiring feature converts EC chunks that are at risk into chunks stored as three mirrored copies, for data safety. However, EC retiring has some limitations:

  • It increases capacity usage, raising the protection overhead from 1.33 to 3.
  • If no node is down, EC retiring introduces unnecessary I/O.
  • The feature applies to four-node systems. EC retiring is not triggered automatically; you must trigger it on demand by using an API through the service console.
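The capacity cost in the first limitation is simple arithmetic. A 12 + 4 chunk occupies 16/12 ≈ 1.33 times its user data, while three mirrored copies occupy 3 times. The sketch applies both factors to a hypothetical 100 TB of user data held in retired chunks; the amount and the function name are illustrative only.

```python
def raw_capacity_tb(user_data_tb: float, overhead: float) -> float:
    """Raw capacity consumed by user data stored with the given overhead."""
    return user_data_tb * overhead


EC_12_PLUS_4 = 16 / 12   # default erasure-coding overhead
THREE_MIRRORS = 3.0      # overhead after EC retiring

user_data_tb = 100.0     # hypothetical amount of user data in retired chunks
before = raw_capacity_tb(user_data_tb, EC_12_PLUS_4)
after = raw_capacity_tb(user_data_tb, THREE_MIRRORS)
print(f"before: {before:.1f} TB, after: {after:.1f} TB, extra: {after - before:.1f} TB")
# prints "before: 133.3 TB, after: 300.0 TB, extra: 166.7 TB"
```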

For a detailed description of the mechanism used by ECS to provide data durability, resilience, and availability, see the ECS High Availability Design White Paper.