I have performance issues with one particularly large table: 500+ million rows, 300 GB of data, Postgres 10.5. It is already partitioned. I am optimising it here and there, but that is not trivial and only yields small improvements. The table is constantly growing, and we expect our user base to increase significantly, so I need a way to scale out.
I want to use a multi-tenant sharding approach: X tenants per shard, with shard resolution in the application layer. Most tenants have relatively small datasets, but a few are huge, and I want to be able to place those on separate shards. To do that I need lookup tables. Cross-shard queries are not a concern at all; almost all of our queries are naturally per tenant, so all the data for a tenant will sit on the same shard.
I will be using logical sharding: 4 physical shards × 32 logical shards each (twice as many shards as I have partitions currently). Each logical shard is a separate database. In most tutorials/talks people seem to use schemas instead of databases. Why? Databases are more isolated, and when moving a single tenant or a virtual shard to another location there does not seem to be any difference. So a database looks like the better candidate to me.
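To make the setup concrete, here is a minimal sketch of the app-layer resolution I have in mind. All names, hosts, and the override table are hypothetical; the idea is that a tenant hashes to one of 128 logical shards unless a lookup table pins it elsewhere, and a separate mapping decides which physical server currently holds each logical shard:

```python
# Sketch of app-layer shard resolution (hosts and tenant names are made up).
# 128 logical shards (= databases) spread over 4 physical servers.
import hashlib

LOGICAL_SHARDS = 128
PHYSICAL_SERVERS = ["pg-a", "pg-b", "pg-c", "pg-d"]  # hypothetical hosts
LOGICAL_PER_PHYSICAL = LOGICAL_SHARDS // len(PHYSICAL_SERVERS)

# Lookup table (would live in a small central store) pinning huge tenants
# to dedicated logical shards.
tenant_overrides = {"megacorp": 127}

def logical_shard(tenant_id: str) -> int:
    """Resolve tenant -> logical shard, honouring explicit overrides."""
    if tenant_id in tenant_overrides:
        return tenant_overrides[tenant_id]
    # Stable hash; don't use built-in hash(), it is salted per process.
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % LOGICAL_SHARDS

def physical_dsn(shard: int) -> str:
    """Map a logical shard (= one database) to its current physical server."""
    host = PHYSICAL_SERVERS[shard // LOGICAL_PER_PHYSICAL]
    return f"host={host} dbname=shard_{shard:03d}"
```

Moving a logical shard (or a pinned tenant) to another server then only changes the `physical_dsn` mapping, never the tenant placement itself.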
The drawbacks look acceptable: existing code must be updated (significantly), and the app has to be shard-aware.
The question is: how do I handle migrations (schema changes)?
As a first step I will have to create 128 databases, ensuring all of them have all the tables, indexes, etc. I also want each database to have its own sequences, with IDs unique across all shards. That is not trivial to me.
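One common way (my assumption, not something from the post) to get cross-shard-unique IDs without coordination is interleaved sequences: each of the 128 databases gets a sequence with a distinct start and an increment equal to the shard count, so the value sets can never overlap. In Postgres that is one statement per database, e.g. `CREATE SEQUENCE global_id START WITH <shard + 1> INCREMENT BY 128;`. A pure-Python model of the resulting streams:

```python
# Interleaved sequences: shard k issues k+1, k+129, k+257, ...
# (start at k+1 because Postgres sequences have MINVALUE 1 by default).
SHARD_COUNT = 128

def sequence_values(shard: int, n: int) -> list:
    """First n IDs that this shard's sequence would emit."""
    return [shard + 1 + i * SHARD_COUNT for i in range(n)]

def shard_of(id_: int) -> int:
    """Bonus: any ID can be traced back to the shard that issued it."""
    return (id_ - 1) % SHARD_COUNT
```

The trade-off is that the increment bakes the shard count into the ID scheme, so pick a number you will not outgrow (which is one more argument for over-provisioning logical shards up front).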
But further changes are a problem as well. Do I just iterate over all connections and apply the changes? Is there a better (maybe async) way? And what do I do if at some point the schema in one shard differs from the others?
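For what it's worth, the fan-out approach I am considering could look like the sketch below (an assumed design, not an established tool): every shard database carries a `schema_migrations` table, and the runner applies whatever versions are missing, so a shard that was unreachable during one run simply catches up on the next instead of drifting. I am demonstrating it with in-memory sqlite3 stand-ins; against Postgres each shard would be a psycopg2 connection opened from its DSN instead:

```python
# Fan-out migration runner sketch; sqlite3 databases stand in for the
# 128 Postgres shard databases so the logic is runnable here.
import sqlite3

MIGRATIONS = [  # (version, DDL), applied strictly in order
    (1, "CREATE TABLE events (id INTEGER PRIMARY KEY, tenant TEXT)"),
    (2, "CREATE INDEX events_tenant ON events (tenant)"),
]

def migrate(conn):
    """Apply all missing migrations to one shard; return versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    done = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    applied = []
    for version, ddl in MIGRATIONS:
        if version in done:
            continue
        conn.execute(ddl)
        conn.execute("INSERT INTO schema_migrations VALUES (?)", (version,))
        applied.append(version)
    conn.commit()
    return applied

def current_version(conn):
    (v,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_migrations"
    ).fetchone()
    return v

# Fan out over all shards; comparing current_version() across shards
# afterwards is exactly the drift check the question asks about.
shards = [sqlite3.connect(":memory:") for _ in range(4)]
for conn in shards:
    migrate(conn)
assert len({current_version(c) for c in shards}) == 1  # no drift
```

Because `migrate()` is idempotent, "async" is then just running it concurrently per shard and alerting when the per-shard versions disagree for longer than a deployment window.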