MongoDB cross-datacenter replication without elections or high availability

I want to replicate a MongoDB database to another node, located in another datacenter. This is to help guard against data loss in the case of a hardware failure.

We don’t want/need high availability or elections; just a ‘close to live’ read-only copy of the database, located in another DC.

Everything I read says that you need an odd number of nodes because of elections, but elections aren’t something we need or want, and I can’t find anything about running just one primary and one secondary (I might be missing it).

Is this something we can achieve with MongoDB, and if so are there any ‘gotchas’ or serious downsides we should consider?
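In case it helps clarify what I’m after, this is roughly the kind of replica set configuration I had in mind, based on the priority/votes/hidden member options I’ve seen mentioned (host names are placeholders, and I’m not sure this is the recommended approach):

rs.initiate({
  _id: "rs0",
  members: [
    // primary in our main datacenter
    { _id: 0, host: "db1.dc1.example.com:27017", priority: 1 },
    // 'close to live' copy in the other DC: never eligible to become primary
    { _id: 1, host: "db2.dc2.example.com:27017", priority: 0, votes: 0, hidden: true }
  ]
})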

SQL Server Snapshot replication – Failing to start the Publisher snapshot agent

I am trying to set up Snapshot replication using an Azure SQL Managed Instance. When I check the Snapshot Agent status, I see this error: "Failed to connect to Azure Storage '' with OS error: 53."

While configuring the Distribution wizard, I had the option to set the Snapshot folder as well as the Storage account connection string.

I got the storage account connection string from the Azure Portal and pasted that in. I am in doubt about the Snapshot folder. What value should I set there?

Is it a folder inside the Azure Storage account, or on the Distributor SQL Server instance? The Distribution wizard said that the details of the folder would also be in the Azure portal. Is there a place where I could find it?
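For reference, these are the kinds of values I’m working with (account and share names are placeholders; the snapshot folder format is the part I’m unsure about):

Storage account connection string, copied from the Azure Portal:

DefaultEndpointsProtocol=https;AccountName=<storageaccount>;AccountKey=<key>;EndpointSuffix=core.windows.net

Snapshot folder, which I am guessing should be a UNC path to a file share in that same storage account, but this is exactly what I’m trying to confirm:

\\<storageaccount>.file.core.windows.net\<fileshare>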

I get the feeling that if I have the correct setting for this, my snapshot replication will work just fine.

Can anybody guide me in finding the problem?

Applied Pi calculus: Evaluation context that distinguishes replication with different restrictions

For an exercise, I need to find an evaluation context $C[\_]$ s.t. the transition systems of $C[X]$ and $C[Y]$ are different (i.e., they are not bisimulation equivalent), where $X$ and $Y$ are the following processes:

$$X = (\nu z)\,(!\,\overline{c}\langle z \rangle.0) \qquad\text{and}\qquad Y = !\,((\nu z)\,\overline{c}\langle z \rangle.0)$$

Intuitively, the difference seems to be that in process $X$, all replications of the process output the same $z$ on channel $c$, while in process $Y$, all processes output a different $z$. Is this correct? And how could this be used to construct an evaluation context such that $C[X] \neq C[Y]$?
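One idea I have been toying with, though I am not sure it is right, is a context that reads two outputs on $c$ and then compares the received names (here $ok$ is a fresh channel name I introduce for the test):

$$C[\_] \;=\; \_ \mid c(x).c(y).\,\text{if } x = y \text{ then } \overline{ok}\langle x \rangle.0$$

The thought is that in $C[X]$ both inputs receive the same restricted name, so the comparison succeeds and an output on $ok$ becomes observable, whereas in $C[Y]$ the two outputs come from different copies of the replication and carry distinct fresh names, so the comparison fails. I would appreciate confirmation of whether this reasoning (and the intuition above) is sound.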

Why is disk IO higher on Debian 10 (MariaDB 10.3) with MySQL replication?

I have a MySQL/MariaDB master-master replication setup that has been working well for several years; the db and tables are not very large (under 200MB across 18 tables). These were on 2 servers running Debian 9 and MariaDB 10.1.44. Now I’ve spun up 2 new servers running Debian 10, and I’m in the process of moving things over to them, but I stopped half-way because I’m seeing much higher disk IO usage on the new servers (about 6x more).

So currently, one of the Debian 9 servers and one of the Debian 10 servers are in a master-master relationship, with one Debian 9 server still being a slave of the master Debian 9 server, and the same on the Debian 10 side of things.

I didn’t notice the increased disk IO until after all read/write operations were moved to the Debian 10 master. I was trying to browse tables and saw how slow it was outputting the query results, and it felt like I was on a dial-up connection watching the rows scroll across. It turned out there was some disk contention with the virtual host that was partly responsible, and that problem is now mostly gone.

Now, as you can imagine, none of this is crashing the server with such a "small" set of tables, but as things continue to grow, I’m concerned that there is some underlying misconfiguration which will rear its ugly head at an inopportune time. On the Debian 9 servers, iotop shows steady write IO at around 300-600 KB/s, but on Debian 10 it spikes as high as 6 MB/s and averages around 3 MB/s.

Here is the standard config on all 4 servers; everything else is default Debian settings (or MariaDB, as the case may be). The full config for Debian 10 is at https://pastebin.com/Lk2FR4e3:

max_connections         = 1000
query_cache_limit       = 4M
query_cache_size        = 0
query_cache_type        = 0
server-id               = 1 # different for each server
log_bin                 = /var/log/mysql/mysql-bin.log
binlog_do_db            = optimizer
replicate-do-db         = optimizer
report-host             = xyz.example.com # changed obviously
log-slave-updates       = true
innodb_log_file_size    = 32M
innodb_buffer_pool_size = 256M

Here are some other settings I’ve tried that don’t seem to make any difference (checked each one by one):

binlog_annotate_row_events    = OFF
binlog_checksum               = NONE
binlog_format                 = STATEMENT
innodb_flush_method           = O_DIRECT_NO_FSYNC
innodb_log_checksums          = OFF
log_slow_slave_statements     = OFF
replicate_annotate_row_events = OFF

I’ve gone through all the settings here that have changed from MariaDB 10.1 to 10.3, and can’t seem to find any that make a difference: https://mariadb.com/kb/en/replication-and-binary-log-system-variables/

I also did a full listing of the server variables and compared the 10.1 configuration to the 10.3 configuration, but didn’t find anything obvious. So either I’m missing something, or the problem lies with Debian 10 itself.

Results of SHOW ENGINE INNODB STATUS are here: https://pastebin.com/mJdLQv8k
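In case it helps, this is roughly how I’ve been comparing what the server itself reports writing on the old and new machines, sampling the counters a minute or so apart on each (these are just the status variables I know of; there may be others that matter):

-- sampled on both a Debian 9 and a Debian 10 server, then diffed over the interval
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Innodb_data_written', 'Innodb_os_log_written', 'Binlog_bytes_written');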

Now, how about that disk IO: what is it actually doing? I include 3 screenshots here to show what I mean by increased disk IO.

Resource graphs on the Debian 10 master

That is from the Debian 10 master, and you can see where I moved operations back to the Debian 9 server (more on that in a second). Notice the disk IO does go down slightly at that point, but not to the levels that we’ll see on the Debian 9 master. Also note that the public bandwidth chart is pretty much only replication traffic, and that the disk IO far outstrips the replication traffic. The private traffic is all the reads/writes from our application servers.

Resource graphs on Debian 9 master

This is the Debian 9 master server, and you can see where I moved all operations back to this server: the private traffic shoots up, but the write IO hovers around 500 KB/s. I didn’t have resource graphs being recorded on the old servers, thus the missing bits on the left.

Debian 10 slave server resource graphs

And lastly, for reference, here is the Debian 10 slave server (that will eventually be half of the master<->master replication). There are no direct reads/writes on this server; all disk IO is from replication.

Just to see what would happen (as I alluded to above), I reverted all direct read/write operations to the Debian 9 master server. While disk IO did fall somewhat on the Debian 10 server, it did not grow on the Debian 9 server to any noticeable extent.

Also, on the Debian 10 slave server, I did STOP SLAVE once to see what happened, and the disk IO went to almost nothing. Doing the same on the Debian 10 master server did not have the same drastic effect, though it’s possible there WAS some change that wasn’t obvious; the disk IO numbers from iostat fluctuate much more wildly on the Debian 10 servers than they do on the Debian 9 servers.

So, what is going on here? How can I figure out why MariaDB is apparently writing so much data to disk, and/or how can I stop it?

Thanks in advance!

MariaDB Replication – Replicate only specific tables and views

Note: A backend developer here with little to no experience in setting up database servers and replication.

What needs to be done

Set up DB replication of an existing database with the following constraints:

  1. Replicate only a specific list of tables/views, which have different names in the replicated database.
  2. Change the names of the tables/views in the replicated database (during the replication process).
  3. Set up a user on the replicated DB with further restrictions, so that only a specific set of tables/views can be viewed/updated/deleted.

Progress so far

I have already read the documentation here; however, I did not find anything concrete to help me move forward with all the use cases I wish to support.
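From the replication-filter options I have come across so far, the closest I can get is something like the following on the replica (database and table names are made up, and as far as I can tell this only filters and renames the database, not individual tables):

replicate-rewrite-db = sourcedb->vendordb   # renames the database, not the tables
replicate-do-table   = vendordb.orders      # I think the do-table filters are matched against the rewritten name, but I'm not certain
replicate-do-table   = vendordb.customers

This still doesn’t seem to cover renaming individual tables, which is why I’m asking.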

Use Case

Show only essential data to the external vendor.

PS: If there are any approaches other than replication, I would be happy to consider and implement those as well.

PostgreSQL Streaming Replication with Switchover

I’m trying to set up streaming replication between a master (T1) and slave (T2), swapping their roles when necessary (i.e. letting T1 be a slave to T2). So far I am able to get this working if I can shut down T1 cleanly, using the following process:

  1. Shut down T1
  2. Promote T2
  3. Configure T1 to work as a slave by configuring recovery.conf (roughly as sketched after this list)
  4. Startup T1.
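For step 3, this is roughly the recovery.conf I put on T1 (connection details are placeholders; this assumes a pre-v12 PostgreSQL, since I’m still using recovery.conf):

standby_mode = 'on'
primary_conninfo = 'host=t2.example.com port=5432 user=replicator'
recovery_target_timeline = 'latest'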

I would also like to account for a scenario where T1 is unable to shut down cleanly (e.g. a crash). When T1 is back up, I would like to use it as the master again. Since T1 and T2 may not have been in total sync before the crash (as there may have been some WAL records not yet sent by T1), I assume one way of getting T1 back up would be to:

  1. Disable writing to T2.
  2. Create a base backup of T2 on T1 (roughly as sketched after this list).
  3. Shut down T2 and configure it to be a slave.
  4. Start T1
  5. Start T2
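For step 2, I would take the base backup from T2 onto T1 roughly like this, run on T1 (replication user and data directory are placeholders):

pg_basebackup -h t2.example.com -U replicator -D /var/lib/postgresql/data -X stream -P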

My questions about the above steps are as follows:

  • Would streaming replication work if I do not disable writing on T2?

  • Must two clusters be completely consistent for streaming replication to start? If I make some writes to T1 before starting T2, how would T2 know which WAL segments it needs to catch up to T1? What if I make writes to both T1 and T2 before configuring T2 to be a slave?

  • Assuming T1 and T2 were in sync before T1 crashed, and assuming that WAL Archiving was enabled, would I be able to place T1 in recovery mode and replay all the WAL segments generated by T2?

  • Is there a better way to approach this problem?

Thanks!

MySQL replication master-slave

I’m learning master-slave replication with MySQL and I could get it to work and also use the Percona backup tool to restore a slave. I learnt from this project https://github.com/vbabak/docker-mysql-master-slave

Now I wonder whether master-slave replication requires two separate mysql instances, or if it is possible to configure master-slave replication between two databases within a single mysql instance. I think it’s not possible.

The reason I want to know is that I want to automate a failover-and-restore scenario. In my environment a new mysql instance always runs on the default port because of infrastructure automation, so it is currently not possible to start two mysql instances on the same host machine. It therefore looks like I need to create two VMs, one master and one slave, just to perform the test, which is quite overkill for a test scenario and would be slow.
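One workaround I’m considering, since the project above is Docker-based anyway, is to run two containers on a single host and publish them on different host ports, so that each instance still listens on 3306 inside its own container (image tag, ports and password are just examples):

# two containers on one host: host ports 3306 and 3307, both 3306 inside the container
docker run -d --name mysql-master -p 3306:3306 -e MYSQL_ROOT_PASSWORD=secret mysql:8.0
docker run -d --name mysql-slave  -p 3307:3306 -e MYSQL_ROOT_PASSWORD=secret mysql:8.0

But I’d still like to know whether a single instance can replicate between two of its own databases.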

Why is two-phase commit better than naive master-slave replication?

In the context of a distributed database, I’m trying to understand why 2PC (as described in e.g. https://www.cs.princeton.edu/courses/archive/fall16/cos418/docs/L6-2pc.pdf) is better than the following hypothetical protocol between a client, master, and slave:

  • client tells master to commit
  • master commits it
  • master tells client the commit succeeded
  • master tells slave to replicate the commit. If the slave fails, master keeps on trying until it succeeds and gets the slave caught up on all edits.
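To make sure I’m describing it precisely, here is the protocol above as pseudocode (my own sketch; master, slave and txn are abstract placeholders rather than any real API):

# A sketch of the naive protocol described above.
class SlaveUnavailable(Exception):
    pass

def naive_commit(master, slave, txn):
    master.durably_commit(txn)            # master durably records the decision and commits locally
    master.reply_to_client("committed")   # client is told the commit succeeded right away
    while True:                           # master keeps retrying replication until it succeeds
        try:
            slave.apply(txn)              # slave applies the edit...
            return
        except SlaveUnavailable:
            pass                          # ...or is temporarily down and will be retried later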

This seems to me to satisfy the same properties as 2PC:

  • Safety: If the master commits, the slave will also eventually commit. If the slave commits, the master must have committed first. I suppose an advantage of 2PC is that if a participant fails before starting the commit, the transaction will be failed instead of only committing on the TC. However, in the proposed protocol, the commit on the master still eventually gets to the slave.
  • Liveness: This protocol does not hang.
  • Both rely on the master / TC durably recording the decision to commit. Both assume failed slaves / participants eventually wake up and catch up with the master / TC.
  • Both fail if the master / TC goes down.
  • In both protocols, it’s possible to have temporary inconsistencies where the master / TC has finalized a decision, but the slaves / participants haven’t yet committed.

It seems to me that the key theoretical difference in 2PC is that the participant (slave) can vote “no” to the commit, as opposed to merely temporarily failing. That would break the conclusion above where the slave eventually catches up. However, I don’t see why the slave would need to vote “no” in the first place. Given the assumption that the slave / participant does not permanently fail, it seems it should either vote “yes” or fail to respond. (Unlike the bank account example, I expect the slave to blindly replicate the master.)

Distilling all this down, it seems to me that 2PC’s assumption that participants don’t permanently fail makes it unnecessary to give participants a chance to vote “no” in the “prepare” phase.

What am I missing here? Presumably there’s some advantage to 2PC over the above that I’m not understanding, since 2PC is actually used to build distributed databases.

  • Am I incorrect in concluding that a slave shouldn’t need to explicitly vote “no”, as opposed to temporarily failing? (I’m only talking about the data replication use case, rather than the bank account example.)
  • Given the same assumptions as 2PC, and assuming slaves only say “success” or “try again”, is there some guarantee 2PC offers that the naive replication above doesn’t?

For the purpose of the question, I’d like to ignore practicalities, unless they’re critical to the answer. In particular, I’d like to ignore things that could be interpreted as being disallowed by the no-permanent-failure assumption, such as disk full, slave mis-configured, slave corrupt, operator error, buggy software, etc.

PostgreSQL logical replication: subscriber size discrepancy

I have a Postgres 11 database in Amazon RDS with a little over 2 TB of data in it, whose tables have incrementing integer ids that are nearing exhaustion. I’ve been vetting logical replication as a way to have the existing db sync data to a new db whose schema has been modified to use bigint ids. Everything looks OK after a lot of due diligence, and I am about ready to make the cutover.

One thing I have noticed is that the subscriber database has a smaller overall size, even though many of its columns and indexes have been switched from int to bigint ids. Does anyone have any idea why this might be the case? This is the one question I have not been able to find a confirmed answer for, as opposed to my guesses (fragmentation, etc.).

Using the queries described here, I have the following usage:

  • total: 2.171 TB in publisher, 2.020 TB in subscriber
  • indexes: 0.737 TB in publisher, 0.581 TB in subscriber
  • toast: 0.212 TB in publisher, 0.215 TB in subscriber
  • tables: 1.223 TB in publisher, 1.223 TB in subscriber
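For reference, the kind of query I’m using to break the sizes down looks roughly like this (my own approximation of the linked queries, not a verbatim copy):

-- Rough per-category totals for ordinary tables, their indexes, and TOAST data.
SELECT
  sum(pg_relation_size(c.oid))                              AS table_bytes,
  sum(pg_indexes_size(c.oid))                               AS index_bytes,
  sum(coalesce(pg_total_relation_size(c.reltoastrelid), 0)) AS toast_bytes
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');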