Pgpool-II and the official Postgres Docker image: automated failover and online recovery via RSA key


I’ve been following this Pgpool-II documentation: https://www.pgpool.net/docs/latest/en/html/example-cluster.html

I’m having a hard time setting up RSA keys (passwordless SSH) on my PostgreSQL streaming replication cluster, which is built on the official Docker image https://hub.docker.com/_/postgres.

I got streaming replication working, and now I’m on the part about setting up failover.

Part of the documentation says:

To use the automated failover and online recovery of Pgpool-II, the settings that allow passwordless SSH to all backend servers between Pgpool-II execution user (default root user) and postgres user and between postgres user and postgres user are necessary. Execute the following command on all servers to set up passwordless SSH. The generated key file name is id_rsa_pgpool.  
```
[all servers]# cd ~/.ssh
[all servers]# ssh-keygen -t rsa -f id_rsa_pgpool
[all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
[all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
[all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server3

[all servers]# su - postgres
[all servers]$ cd ~/.ssh
[all servers]$ ssh-keygen -t rsa -f id_rsa_pgpool
[all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
[all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
[all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server3
```
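Once the keys are distributed, one way to confirm that passwordless SSH really works (instead of silently falling back to a password prompt) is to run a trivial command with ssh's `BatchMode` option. A minimal sketch; `check_passwordless_ssh` is a hypothetical helper name, while the key name and hosts are the ones from the documentation above.

```bash
#!/bin/bash
# Sketch: verify key-based SSH to each backend as the postgres user.
set -euo pipefail

# check_passwordless_ssh KEY HOST...   (hypothetical helper)
# Returns non-zero if any host refuses key-based login.
check_passwordless_ssh() {
    local key="$1"; shift
    local host failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # so a missing or wrong key shows up as an error, not a prompt.
        if ssh -i "$key" -o BatchMode=yes -o ConnectTimeout=5 \
               "postgres@$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: `check_passwordless_ssh ~/.ssh/id_rsa_pgpool server1 server2 server3`. Note that in batch mode ssh also refuses hosts whose host key is not yet in `known_hosts`, so connect once interactively (or pre-populate `known_hosts`) first.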

Is it possible to set this up inside a container built from the official postgres image? I would like to get an idea of how to do it from some samples or existing solutions.
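As far as I know, the official postgres image ships without an SSH server, so a derived image would need at least `openssh-server` installed and `sshd` running next to postgres. Also, `ssh-copy-id` needs password login to the target, which is awkward between containers. A common alternative is to generate one key pair once on the host, mount it into every container (volume or Docker secret), and install the public key into `authorized_keys` at container start. A minimal sketch of that install step; `install_authorized_key` and the example paths are hypothetical (the postgres user's home in the Debian-based image is, I believe, `/var/lib/postgresql`).

```bash
#!/bin/bash
# Sketch: install a pre-generated, mounted public key instead of ssh-copy-id.
set -euo pipefail

# install_authorized_key PUBKEY_FILE SSH_DIR   (hypothetical helper)
#   PUBKEY_FILE e.g. /run/secrets/id_rsa_pgpool.pub      (assumption)
#   SSH_DIR     e.g. /var/lib/postgresql/.ssh            (assumption)
# Expects a single-line public key file.
install_authorized_key() {
    local pubkey_file="$1" ssh_dir="$2"
    local auth_keys="$ssh_dir/authorized_keys"
    local key

    # sshd ignores keys in group/world-writable locations, so lock down perms.
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_keys"
    chmod 600 "$auth_keys"

    # Append the key only if that exact line is not present yet, so running
    # this on every container start is idempotent.
    key="$(cat "$pubkey_file")"
    if ! grep -qxF -- "$key" "$auth_keys"; then
        printf '%s\n' "$key" >> "$auth_keys"
    fi
}
```

When this runs as root (for example from a custom entrypoint), follow it with `chown -R postgres:postgres "$ssh_dir"` so sshd accepts the files for the postgres user.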

Moreover, since I can’t get the RSA part working at the moment, I decided to write a script on my Pgpool-II server that uses psql to promote the new master:

```bash
#!/bin/bash
# This script is run by failover_command.

set -e

# Special values:
#   %d = failed node id
#   %h = failed node hostname
#   %p = failed node port number
#   %D = failed node database cluster path
#   %m = new master node id
#   %H = new master node hostname
#   %M = old master node id
#   %P = old primary node id
#   %r = new master port number
#   %R = new master database cluster path
#   %N = old primary node hostname
#   %S = old primary node port number
#   %% = '%' character

FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"
OLD_PRIMARY_NODE_HOST="${11}"
OLD_PRIMARY_NODE_PORT="${12}"

#set -o xtrace
#exec > >(logger -i -p local1.info) 2>&1

new_master_host=$NEW_MASTER_NODE_HOST

## If there's no master node anymore, skip failover.
if [ "$NEW_MASTER_NODE_ID" -lt 0 ]; then
    echo "All nodes are down. Skipping failover."
    exit 0
fi

## Promote Standby node.
echo "Primary node is down, promote standby node ${NEW_MASTER_NODE_HOST}."

PGPASSWORD=postgres psql -h "${NEW_MASTER_NODE_HOST}" -p 5432 -U postgres <<-EOSQL
select pg_promote();
EOSQL

#logger -i -p local1.info failover.sh: end: new_master_node_id=$NEW_MASTER_NODE_ID started as the primary node
#exit 0
```
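One gap in the script above: psql exits 0 even when `pg_promote()` returns `false`, so `set -e` cannot catch a failed promotion, and the port is hardcoded to 5432 although pgpool passes the new master's port as `%r`. A sketch of a stricter promotion step, assuming PostgreSQL 12+ (where `pg_promote(wait, wait_seconds)` exists) and that the password comes from `PGPASSWORD` or `~/.pgpass`; `promote_node` is a hypothetical name.

```bash
#!/bin/bash
set -euo pipefail

# promote_node HOST PORT   (hypothetical helper)
# Succeeds only if pg_promote() reports that promotion completed.
promote_node() {
    local host="$1" port="$2" result
    # pg_promote(wait => true, wait_seconds => 60) waits for the promotion
    # to finish and returns true on success (PostgreSQL 12+).
    # -tA prints the bare value, so the output is just "t" or "f".
    result="$(psql -h "$host" -p "$port" -U postgres -d postgres -tA \
        -c "SELECT pg_promote(true, 60)")"
    [ "$result" = "t" ]
}
```

In the failover script this would replace the heredoc with something like `promote_node "$NEW_MASTER_NODE_HOST" "$NEW_MASTER_NODE_PORT" || exit 1`.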

The script above works when I simulate the primary going down.

However, this is the log on my new primary:

```
2020-10-07 20:25:31.924 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:25:31.924 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:25:32.939 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:25:32.939 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
2020-10-07 20:25:32.939 UTC [1165] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:26:33.003 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:26:33.003 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:26:34.012 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:26:34.012 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:26:35.026 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:26:35.026 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
2020-10-07 20:26:35.026 UTC [1165] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:27:35.096 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:27:35.096 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:27:36.110 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:27:36.110 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:27:37.123 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:27:37.123 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
2020-10-07 20:27:37.123 UTC [1165] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:28:37.177 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:28:37.177 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:28:38.221 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:28:38.221 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
cp: cannot create regular file '/archives/00000002.history': No such file or directory
2020-10-07 20:28:39.230 UTC [1165] LOG:  archive command failed with exit code 1
2020-10-07 20:28:39.230 UTC [1165] DETAIL:  The failed archive command was: cp pg_wal/00000002.history /archives/00000002.history
2020-10-07 20:28:39.230 UTC [1165] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
```

The new primary keeps retrying the WAL archive command.
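The `cp` error means `/archives` does not exist inside the new primary's container. With `archive_mode = on`, a standby does not archive, so the problem only surfaces once it is promoted and starts archiving files such as the timeline history file `00000002.history`. The directory has to exist and be writable by postgres on every node that can become primary, for example via a volume. A sketch of an `archive_command` wrapper that also refuses to overwrite an existing archive file, as the PostgreSQL docs advise; the script name `archive_wal.sh` and its argument order are hypothetical.

```bash
#!/bin/bash
# Hypothetical archive_command wrapper, e.g. /usr/local/bin/archive_wal.sh,
# configured in postgresql.conf as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f /archives'
set -euo pipefail

# archive_wal SRC_PATH FILE_NAME ARCHIVE_DIR
archive_wal() {
    local src="$1" name="$2" dir="$3"
    # Best effort: only works if the parent directory is writable by postgres;
    # otherwise create or mount the directory when building the container.
    mkdir -p "$dir"
    if [ -e "$dir/$name" ]; then
        # Never overwrite an archived file. An identical copy (e.g. left by a
        # retry after a crash) is treated as success.
        if cmp -s "$src" "$dir/$name"; then
            return 0
        fi
        echo "archive_wal: $dir/$name exists with different contents" >&2
        return 1
    fi
    # Copy under a temporary name, then rename, so a half-written file never
    # appears under the final name.
    cp "$src" "$dir/$name.tmp"
    mv "$dir/$name.tmp" "$dir/$name"
}

# Only run when invoked with exactly the three arguments from archive_command.
if [ "$#" -eq 3 ]; then
    archive_wal "$@"
fi
```

A non-zero exit makes PostgreSQL keep the file and retry later, which is exactly the retry loop visible in the log above.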

Moreover, my other standby is still trying to connect to the old master:

```
2020-10-07 20:29:07.818 UTC [1365] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
2020-10-07 20:29:12.827 UTC [1367] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
2020-10-07 20:29:17.832 UTC [1369] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
2020-10-07 20:29:22.835 UTC [1371] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
2020-10-07 20:29:27.826 UTC [1373] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
2020-10-07 20:29:32.836 UTC [1375] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
2020-10-07 20:29:37.836 UTC [1377] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
2020-10-07 20:29:42.850 UTC [1379] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
2020-10-07 20:29:47.857 UTC [1381] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
2020-10-07 20:29:52.855 UTC [1383] FATAL:  could not connect to the primary server: could not translate host name "pg-1" to address: Name or service not known
```
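The standby keeps dialing `pg-1` because its `primary_conninfo` still names the old primary, and the stopped container's hostname no longer resolves. In Pgpool-II the hook for re-pointing the remaining standbys is `follow_master_command` (renamed `follow_primary_command` in Pgpool-II 4.2); the example script in the documentation does this over SSH. A SQL-only sketch, under these assumptions: PostgreSQL 13+ (where `primary_conninfo` takes effect on reload rather than restart), a replication user (`repl` below is hypothetical) whose password comes from `.pgpass` or a `trust` rule, and a standby that has not replayed past the point where the new primary was promoted. Since PostgreSQL 12 `recovery_target_timeline` defaults to `latest`, so the standby should then follow the new timeline.

```bash
#!/bin/bash
# Sketch: re-point a standby at the new primary using SQL only (no SSH).
# Assumes PostgreSQL 13+, where primary_conninfo takes effect on reload.
set -euo pipefail

# Wrap a value as a SQL string literal, doubling embedded single quotes.
sql_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# build_repoint_sql NEW_PRIMARY_HOST NEW_PRIMARY_PORT REPL_USER
build_repoint_sql() {
    local conninfo="host=$1 port=$2 user=$3"
    printf 'ALTER SYSTEM SET primary_conninfo = %s' "$(sql_quote "$conninfo")"
}

# repoint_standby STANDBY_HOST STANDBY_PORT NEW_PRIMARY_HOST NEW_PRIMARY_PORT REPL_USER
repoint_standby() {
    local sql
    sql="$(build_repoint_sql "$3" "$4" "$5")"
    # ALTER SYSTEM cannot run inside a transaction block; with several -c
    # options psql sends each one separately, and ON_ERROR_STOP aborts on error.
    psql -h "$1" -p "$2" -U postgres -d postgres -v ON_ERROR_STOP=1 \
        -c "$sql" -c "SELECT pg_reload_conf()"
}

# Only act when called with all five arguments (e.g. from pgpool).
if [ "$#" -eq 5 ]; then
    repoint_standby "$@"
fi
```

In `pgpool.conf` this might be wired up as, for example, `follow_master_command = '/etc/pgpool-II/follow_sql.sh %h %p %H %r repl'` (path hypothetical); check the Pgpool-II docs for exactly which nodes the command is run against, since the failed node itself should be skipped.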

Dealing with this seems more complicated to me than getting the RSA part set up, which would let me use the existing failover_command script that Pgpool-II provides.

Thanks for the response.