How to perform disaster recovery with replicators

Once you have set up replicators for active-passive replication, you can use them to fail over to the standby cluster if the leader cluster becomes unavailable, and to restore the original replication direction when the leader comes back online.

Failover process

If the leader cluster becomes unavailable, you can manually fail over to the standby cluster.

On the standby cluster, promote the replica project to become the leader:

lxc project promote-replica <project_name>

If the leader cluster is unreachable, the promotion proceeds without validation. If the leader cluster is still reachable and you want to promote anyway (for example, during a planned takeover before demoting the leader), use --force to skip validation:

lxc project promote-replica <project_name> --force

After this command, the project on the standby cluster becomes writable. Start the instances to resume your workloads:

lxc start --all --project <project_name>
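The failover steps above can be sketched as a small shell function to run on the standby cluster. This is a minimal sketch, not an official tool: the helper names run and failover_project and the DRY_RUN variable are hypothetical, and only the lxc commands shown above are used.

```shell
#!/bin/sh
# Sketch of the manual failover sequence, run on the standby cluster.
# run, failover_project, and DRY_RUN are hypothetical names for illustration.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: failover_project <project_name> [--force]
failover_project() {
    project="${1:-}"
    force="${2:-}"

    if [ -z "$project" ]; then
        echo "usage: failover_project <project_name> [--force]" >&2
        return 2
    fi

    # Promote the replica project; pass --force only when requested.
    if [ "$force" = "--force" ]; then
        run lxc project promote-replica "$project" --force || return 1
    else
        run lxc project promote-replica "$project" || return 1
    fi

    # The project is now writable, so its instances can be started.
    run lxc start --all --project "$project"
}
```

Setting DRY_RUN=1 before calling failover_project <project_name> prints the commands without executing them, which can help when reviewing the sequence before a real failover.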

Recovering the original leader cluster

When the original leader cluster comes back online, it will be out of sync with the new leader (the former standby). Scheduled replicator runs on the original leader cluster will fail because both projects are in leader mode.

A replicator run requires the source project to be in leader mode and the target project to be in standby mode.
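The rule above can be modelled as a tiny predicate to make the failure case concrete. This is purely illustrative: can_replicate is a hypothetical helper that compares two mode strings and does not query LXD.

```shell
#!/bin/sh
# Illustrative model of the replicator run precondition.
# can_replicate is a hypothetical helper; it does not talk to LXD.

# Usage: can_replicate <source_mode> <target_mode>
# Succeeds only for a leader source and a standby target.
can_replicate() {
    source_mode="${1:-}"
    target_mode="${2:-}"
    [ "$source_mode" = "leader" ] && [ "$target_mode" = "standby" ]
}
```

After a failover, both projects are in leader mode, so can_replicate leader leader returns a non-zero status. This mirrors why scheduled runs on the original leader cluster fail until one side is demoted.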

To restore the original leader cluster and resume the original replication direction:

1. Sync from the new leader back to the original leader

On the original leader cluster, stop all running instances in the project before running the replicator in restore mode. A run with --restore is rejected if any local instance is running, which prevents partial restores:

lxc stop <instance_name> [<instance_name>...] --force

Demote the project on the original leader cluster to standby mode:

lxc project demote-replica <project_name>

If the new leader cluster is unreachable, use --force to skip validation:

lxc project demote-replica <project_name> --force

On the original leader cluster, run the replicator in restore mode to pull data from the new leader:

lxc replicator run <replicator_name> --restore

Restore mode uses the new leader’s instance list as the authoritative source. Any instances created on the new leader during the failover period are also created on the recovering cluster automatically.

The original leader cluster is now a standby replica of the new leader cluster.
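The resynchronization steps above can be sketched as one shell function to run on the original leader cluster. This is a sketch under assumptions: run, resync_original_leader, and DRY_RUN are hypothetical names, the commands are used exactly as shown above, and the shell is assumed to already be working in the relevant project. The --force variant of demote-replica is not covered.

```shell
#!/bin/sh
# Sketch of step 1, run on the original leader cluster.
# run, resync_original_leader, and DRY_RUN are hypothetical names.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: resync_original_leader <project_name> <replicator_name> [<instance_name>...]
resync_original_leader() {
    if [ "$#" -lt 2 ]; then
        echo "usage: resync_original_leader <project_name> <replicator_name> [<instance_name>...]" >&2
        return 2
    fi
    project="$1"
    replicator="$2"
    shift 2

    # Stop the named instances first; a restore run is rejected
    # while any local instance is running.
    if [ "$#" -gt 0 ]; then
        run lxc stop "$@" --force || return 1
    fi

    # Demote the local project to standby mode.
    run lxc project demote-replica "$project" || return 1

    # Pull data from the new leader.
    run lxc replicator run "$replicator" --restore
}
```

Each step stops the sequence on failure, so the restore run is only attempted after the instances are stopped and the project is demoted.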

2. Resume original replication direction

To return to the original setup, in which the original leader cluster replicates to the standby, first stop any running instances in the project on the new leader cluster (the former standby). Then demote the project on the new leader cluster back to standby mode:

lxc project demote-replica <project_name>

Finally, promote the project on the original leader cluster back to leader mode:

lxc project promote-replica <project_name>

Your original active-passive disaster recovery setup is now restored. You can restart your instances on the leader cluster and resume your scheduled replicator runs.
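The switch back can be sketched as two shell functions, one per cluster, to be run in order. This is a sketch under assumptions: run, return_to_standby, return_to_leader, and DRY_RUN are hypothetical names, and stopping instances with lxc stop --all --force --project is an assumption that mirrors the lxc start --all --project form used during failover.

```shell
#!/bin/sh
# Sketch of step 2. run, return_to_standby, return_to_leader, and DRY_RUN
# are hypothetical names for illustration.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the new leader cluster (former standby).
# Usage: return_to_standby <project_name>
return_to_standby() {
    project="${1:-}"
    [ -n "$project" ] || { echo "usage: return_to_standby <project_name>" >&2; return 2; }

    # Assumption: stop every instance in the project in one call.
    run lxc stop --all --force --project "$project" || return 1
    run lxc project demote-replica "$project"
}

# Run on the original leader cluster, after return_to_standby has completed.
# Usage: return_to_leader <project_name>
return_to_leader() {
    project="${1:-}"
    [ -n "$project" ] || { echo "usage: return_to_leader <project_name>" >&2; return 2; }

    run lxc project promote-replica "$project" || return 1
    # Start the instances again on the restored leader.
    run lxc start --all --project "$project"
}
```

Keeping the two halves separate reflects that each command runs on a different cluster. The demotion on the former standby has to finish before the promotion on the original leader.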