PSO behavior

If a disaster occurs, an entire site can become unrecoverable; ECS refers to this as a permanent site outage (PSO). ECS initially treats the unrecoverable site as a temporary site failure, but only if the entire site is down or completely unreachable over the WAN. If the failure is permanent, the System Administrator must permanently fail over the site from the federation. This initiates failover processing, which resynchronizes and re-protects the objects stored on the failed site. The recovery tasks run as a background process. For more information on how to perform the failover procedure in the ECS Portal, see #GUID-042D2322-0A4B-474F-826F-E58BC3D6FBA8.

  • Before you trigger a PSO (planned or unplanned), make sure that the site is off (all nodes are shut down). Also make sure that you remove the site from the replication group and from the federation.
  • If you want to reuse the racks from a site that has undergone a PSO, the racks must be physically disconnected and the nodes should be re-imaged before you bring them online.
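
The power-off and rack-reuse conditions in the notes above can be written down as a small checklist. The following Python sketch is illustrative only: the `Node` record and both function names are hypothetical, it does not call any ECS API, and it deliberately leaves out the replication group and federation steps.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical record for one node; not an ECS API object."""
    name: str
    powered_on: bool
    reimaged: bool = False


def ready_for_pso(nodes: list[Node]) -> bool:
    """True only when every node at the site is shut down."""
    return all(not n.powered_on for n in nodes)


def ready_for_reuse(nodes: list[Node], rack_disconnected: bool) -> bool:
    """True only when the rack was physically disconnected and every node was re-imaged."""
    return rack_disconnected and all(n.reimaged for n in nodes)
```

For example, a site with one node still running fails the first check, so the PSO should not be triggered until that node is shut down.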

Before you initiate a PSO in the ECS Portal, contact your customer support representative so that the representative can validate the cluster health. Data is not accessible until failover processing is completed. You can monitor the progress of failover processing on the Monitor > Geo Replication > Failover Processing tab in the ECS Portal. After failover processing has completed, the recovery background tasks continue to run, and some data from the removed site might not be readable until those tasks are fully complete.
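
The data availability described above goes through three phases. As a minimal sketch, assuming two boolean inputs that an operator reads from the portal (the enum and function names are hypothetical and not part of ECS), the mapping looks like this:

```python
from enum import Enum


class Readability(Enum):
    """Hypothetical labels for the three phases described in the text."""
    NOT_ACCESSIBLE = "data is not accessible"
    PARTIAL = "some data from the removed site might not be readable yet"
    FULL = "recovery tasks are complete"


def readability(failover_done: bool, recovery_done: bool) -> Readability:
    """Map the two progress flags to the expected data readability."""
    if not failover_done:
        # Failover processing is still running.
        return Readability.NOT_ACCESSIBLE
    if not recovery_done:
        # Failover processing finished, recovery background tasks still running.
        return Readability.PARTIAL
    return Readability.FULL
```

The intermediate `PARTIAL` phase is the one to plan for: reads can start again once failover processing completes, but clients may still see failures for some objects from the removed site.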