I’m trying to set up streaming replication between a master (T1) and a slave (T2), and to swap their roles when necessary (i.e., letting T1 become a slave of T2). So far this works as long as I can shut down T1 cleanly, using the following process:
- Shut down T1
- Promote T2
- Configure T1 as a slave of T2 via recovery.conf
- Start T1
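For concreteness, the recovery.conf from the third step could look roughly like what the sketch below generates. This is a minimal sketch rather than my exact file: the host name `t2.example`, the `replicator` user and the helper name `render_recovery_conf` are placeholders, and it assumes a PostgreSQL version that still reads recovery.conf (before version 12).

```python
def render_recovery_conf(primary_host, user, port=5432):
    """Return recovery.conf text that makes this node a streaming standby.

    Hypothetical helper: host, user and port are illustrative placeholders.
    """
    conninfo = f"host={primary_host} port={port} user={user}"
    lines = [
        # Stay in recovery and keep applying WAL instead of opening read-write.
        "standby_mode = 'on'",
        # Where to stream WAL from (the node that is currently the master).
        f"primary_conninfo = '{conninfo}'",
        # Follow the new timeline created when the other node was promoted.
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file contents that would go into T1's data directory.
    print(render_recovery_conf("t2.example", "replicator"), end="")
```

The `recovery_target_timeline = 'latest'` line is what should let the old master follow the timeline switch caused by promoting T2.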
I would also like to account for a scenario where T1 is unable to shut down cleanly (e.g., a crash). When T1 is back up, I would like to use it as the master again. Since T1 and T2 may not have been fully in sync before the crash (some WAL records may not have been sent by T1), I assume one way of getting T1 back up would be to:
- Disable writing to T2
- Create a base backup of T2 on T1
- Shut down T2 and configure it to be a slave
- Start T1
- Start T2
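The base backup in the second step above is what I would take with `pg_basebackup`, run on T1 and pointed at T2. A rough sketch of the command I have in mind follows; the host, user and data directory are placeholders, `basebackup_command` is a hypothetical helper, and the target directory is assumed to be empty before the command runs.

```python
import shlex


def basebackup_command(source_host, data_dir, user):
    """Build the pg_basebackup argument list (hypothetical helper).

    The command is only constructed here, not executed.
    """
    return [
        "pg_basebackup",
        "-h", source_host,   # node to copy from (T2 in this scenario)
        "-U", user,          # role with replication privileges
        "-D", data_dir,      # target data directory on T1, must be empty
        "-X", "stream",      # stream the WAL needed to make the backup consistent
        "-P",                # show progress
    ]


if __name__ == "__main__":
    cmd = basebackup_command("t2.example", "/var/lib/postgresql/data", "replicator")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```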
My questions about the above steps are as follows:
Would streaming replication work if I do not disable writing on T2?
Must the two clusters be completely consistent for streaming replication to start? If I make some writes to T1 before starting T2, how would T2 know which WAL segments it needs in order to catch up to T1? What if I make writes to both T1 and T2 before configuring T2 to be a slave?
Assuming T1 and T2 were in sync before T1 crashed, and assuming that WAL archiving was enabled, would I be able to place T1 in recovery mode and replay all the WAL segments generated by T2?
Is there a better way to approach this problem?
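To make the question about catching up more concrete: as far as I understand, WAL positions are reported as LSNs such as `0/3000060`, i.e. the two hexadecimal halves of a 64-bit byte position, so the gap between two nodes can be expressed in bytes. A small sketch of that arithmetic, using made-up LSN values and hypothetical helper names:

```python
def parse_lsn(text):
    """Convert an LSN string like '0/3000060' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(master_lsn, standby_lsn):
    """Bytes of WAL the standby still has to receive; negative if it is ahead."""
    return parse_lsn(master_lsn) - parse_lsn(standby_lsn)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real cluster.
    print(lag_bytes("0/3000060", "0/3000000"))
```

A negative result would mean the supposed slave holds WAL the master never saw, which is exactly the diverged case I am worried about.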