Why is an Aurora PostgreSQL database using a slower query plan than a normal PostgreSQL database for an identical query?

Following the migration of an application and its database from a classical PostgreSQL database to an Amazon Aurora RDS PostgreSQL database (both on version 9.6), we have found that a specific query runs much slower, around 10 times slower, on Aurora than on PostgreSQL.

Both databases have the same configuration, both in hardware and in postgresql.conf.

The query itself is fairly simple. It is generated by our Java backend, which uses jOOQ to build the queries:

with "all_acp_ids"("acp_id") as (     select acp_id from temp_table_de3398bacb6c4e8ca8b37be227eac089 )  select distinct "public"."f1_folio_milestones"."acp_id",      coalesce("public"."sa_milestone_overrides"."team",      "public"."f1_folio_milestones"."team_responsible")  from "public"."f1_folio_milestones"  left outer join      "public"."sa_milestone_overrides" on (         "public"."f1_folio_milestones"."milestone" = "public"."sa_milestone_overrides"."milestone"          and "public"."f1_folio_milestones"."view" = "public"."sa_milestone_overrides"."view"          and "public"."f1_folio_milestones"."acp_id" = "public"."sa_milestone_overrides"."acp_id" ) where "public"."f1_folio_milestones"."acp_id" in (     select "all_acp_ids"."acp_id" from "all_acp_ids" ) 

Here, temp_table_de3398bacb6c4e8ca8b37be227eac089 is a single-column table, while f1_folio_milestones (17 million rows) and sa_milestone_overrides (around 1 million rows) are similarly designed tables with indexes on all the columns used in the LEFT OUTER JOIN.

When we run it on the normal PostgreSQL database, it generates the following query plan:

Unique  (cost=4802622.20..4868822.51 rows=8826708 width=43) (actual time=483.928..483.930 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.005 rows=1 loops=1)
  ->  Sort  (cost=4802598.60..4824665.37 rows=8826708 width=43) (actual time=483.927..483.927 rows=4 loops=1)
        Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Left Join  (cost=46051.06..3590338.34 rows=8826708 width=43) (actual time=483.905..483.917 rows=4 loops=1)
              Hash Cond: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))
              ->  Nested Loop  (cost=31.16..2572.60 rows=8826708 width=37) (actual time=0.029..0.038 rows=4 loops=1)
                    ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.009..0.010 rows=1 loops=1)
                          Group Key: all_acp_ids.acp_id
                          ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.006..0.007 rows=1 loops=1)
                    ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..12.65 rows=5 width=37) (actual time=0.018..0.025 rows=4 loops=1)
                          Index Cond: (acp_id = all_acp_ids.acp_id)
              ->  Hash  (cost=28726.78..28726.78 rows=988178 width=34) (actual time=480.423..480.423 rows=987355 loops=1)
                    Buckets: 1048576  Batches: 1  Memory Usage: 72580kB
                    ->  Seq Scan on sa_milestone_overrides  (cost=0.00..28726.78 rows=988178 width=34) (actual time=0.004..189.641 rows=987355 loops=1)
Planning time: 3.561 ms
Execution time: 489.223 ms

As one can see, it goes pretty smoothly: less than a second for the query. But on the Aurora instance, this happens:

Unique  (cost=2632927.29..2699194.83 rows=8835672 width=43) (actual time=4577.348..4577.350 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.001..0.001 rows=1 loops=1)
  ->  Sort  (cost=2632903.69..2654992.87 rows=8835672 width=43) (actual time=4577.348..4577.348 rows=4 loops=1)
        Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))
        Sort Method: quicksort  Memory: 25kB
        ->  Merge Left Join  (cost=1321097.58..1419347.08 rows=8835672 width=43) (actual time=4488.369..4577.330 rows=4 loops=1)
              Merge Cond: ((f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))
              ->  Sort  (cost=1194151.06..1216240.24 rows=8835672 width=37) (actual time=0.039..0.040 rows=4 loops=1)
                    Sort Key: f1_folio_milestones.view, f1_folio_milestones.milestone, f1_folio_milestones.acp_id
                    Sort Method: quicksort  Memory: 25kB
                    ->  Nested Loop  (cost=31.16..2166.95 rows=8835672 width=37) (actual time=0.022..0.028 rows=4 loops=1)
                          ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.006..0.006 rows=1 loops=1)
                                Group Key: all_acp_ids.acp_id
                                ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.003..0.004 rows=1 loops=1)
                          ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..10.63 rows=4 width=37) (actual time=0.011..0.015 rows=4 loops=1)
                                Index Cond: (acp_id = all_acp_ids.acp_id)
              ->  Sort  (cost=126946.52..129413.75 rows=986892 width=34) (actual time=4462.727..4526.822 rows=448136 loops=1)
                    Sort Key: sa_milestone_overrides.view, sa_milestone_overrides.milestone, sa_milestone_overrides.acp_id
                    Sort Method: quicksort  Memory: 106092kB
                    ->  Seq Scan on sa_milestone_overrides  (cost=0.00..28688.92 rows=986892 width=34) (actual time=0.003..164.348 rows=986867 loops=1)
Planning time: 1.394 ms
Execution time: 4583.295 ms

This plan does have a lower total cost, but it takes almost 10 times as long as before!

Disabling merge joins makes Aurora fall back to a hash join, which gives the expected execution time, but permanently disabling them is not an option. Curiously though, disabling nested loops gives an even better result while still using a merge join:

Unique  (cost=3610230.74..3676431.05 rows=8826708 width=43) (actual time=2.465..2.466 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.004 rows=1 loops=1)
  ->  Sort  (cost=3610207.14..3632273.91 rows=8826708 width=43) (actual time=2.464..2.464 rows=4 loops=1)
        Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))
        Sort Method: quicksort  Memory: 25kB
        ->  Merge Left Join  (cost=59.48..2397946.87 rows=8826708 width=43) (actual time=2.450..2.455 rows=4 loops=1)
              Merge Cond: (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text)
              Join Filter: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text))
              ->  Merge Join  (cost=40.81..2267461.88 rows=8826708 width=37) (actual time=2.312..2.317 rows=4 loops=1)
                    Merge Cond: (f1_folio_milestones.acp_id = all_acp_ids.acp_id)
                    ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..2223273.29 rows=17653416 width=37) (actual time=0.020..2.020 rows=1952 loops=1)
                    ->  Sort  (cost=40.24..40.74 rows=200 width=32) (actual time=0.011..0.012 rows=1 loops=1)
                          Sort Key: all_acp_ids.acp_id
                          Sort Method: quicksort  Memory: 25kB
                          ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.008..0.008 rows=1 loops=1)
                                Group Key: all_acp_ids.acp_id
                                ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.005..0.005 rows=1 loops=1)
              ->  Materialize  (cost=0.42..62167.38 rows=987968 width=34) (actual time=0.021..0.101 rows=199 loops=1)
                    ->  Index Scan using sa_milestone_overrides_acp_id_index on sa_milestone_overrides  (cost=0.42..59697.46 rows=987968 width=34) (actual time=0.019..0.078 rows=199 loops=1)
Planning time: 5.500 ms
Execution time: 2.516 ms

We have asked the AWS support team, and they are still looking into the issue, but we are wondering what could cause it. What could explain such a difference in behaviour?

While looking at some of the documentation for the database, I read that Aurora favors cost over time, and hence uses the query plan that has the lowest cost.

But as we can see, that plan is far from optimal given its response time… Is there a threshold or a setting that could make the database use a more expensive, but faster, query plan?
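For reference, the planner toggles mentioned above do not have to be set cluster-wide; standard PostgreSQL lets you scope them to a single session or transaction. A minimal sketch, assuming the problematic query can be wrapped in its own transaction:

BEGIN;
-- SET LOCAL only affects the current transaction and reverts automatically
-- at COMMIT/ROLLBACK, so merge joins stay enabled for everything else.
SET LOCAL enable_mergejoin = off;
-- ... run the problematic query here ...
COMMIT;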

Master-Master setup on PostgreSQL 10 + Ubuntu 18.04 + Pgpool II

Does anybody have experience configuring Pgpool II with PostgreSQL 10 on Ubuntu 18.04?

I am trying to set up a master-master configuration on PostgreSQL 10 + Ubuntu, using Pgpool II.

I will have two or more master DB servers running on different IPs, and my objective is to keep them synced with each other.

I am looking for open-source solutions. Your thoughts, suggestions, and experiences are kindly welcome. Cheers

Install PostgreSQL without creating an instance (for use with repmgr)

I’m trying to get repmgr set up, and I’m following the steps at https://repmgr.org/docs/current/quickstart-standby-preparation.html to prepare the standby.

I noticed it warns: “On the standby, do not create a PostgreSQL instance.” However, I believe this happens automatically with the way I installed Postgres:

wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
sudo apt update
sudo apt -y install postgresql-12 postgresql-client-12

because when I try to clone the primary onto the standby as mentioned in https://repmgr.org/docs/current/quickstart-standby-clone.html
$ repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run

I get

postgres@empty2:~$ repmgr -h 192.168.1.102 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
ERROR: specified data directory "/var/lib/postgresql/12/main" appears to contain a running PostgreSQL instance
HINT: ensure the target data directory does not contain a running PostgreSQL instance

Now I’m just ASSUMING that this is because a database instance was created when I installed Postgres on the standby. And I’m also ASSUMING that I can just delete everything in the data directory on the standby and everything will work OK…

But (assuming my assumptions are correct…) what is the correct way to install PostgreSQL 12 without creating an instance and the corresponding data files?
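For what it’s worth, on Debian/Ubuntu the PGDG packages manage clusters through postgresql-common, so removing the automatically created cluster might look like the sketch below (assuming the default 12/main cluster; pg_dropcluster deletes the data directory, so only run it on a standby you intend to clone over):

# list the clusters the Debian/Ubuntu packaging created
pg_lsclusters

# stop the auto-created cluster and remove it together with its data directory
sudo pg_dropcluster --stop 12 main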

Create a PostgreSQL function to be triggered before inserting a row

I’ve created a function to be executed before inserting rows into a table, in order to avoid the error “malformed array literal”. It still throws the error, so I deduce that the process starts earlier than my trigger. Is there another trigger I can use to intercept the event earlier?

CREATE TABLE "user" ( permissions text[] );  CREATE FUNCTION format_array() RETURNS trigger AS $  func$       BEGIN         NEW.permissions := translate(NEW.permissions::text, '[]', '{}')::text[];         RETURN NEW;     END; $  func$   LANGUAGE plpgsql;  CREATE TRIGGER format_array BEFORE INSERT ON "user"     FOR EACH ROW EXECUTE PROCEDURE format_array();  INSERT INTO "user" (permissions)     VALUES ('["test", "test2"]'); 

https://dbfiddle.uk/?rdbms=postgres_10&fiddle=ae1561db1f61f9458fabf02007f388ed
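For context, the array literal in the VALUES list is validated while the INSERT is parsed, which happens before any BEFORE INSERT trigger fires, so the trigger never gets a chance to rewrite the value. A sketch of doing the conversion in the statement itself instead:

-- The cast succeeds because translate() turns the JSON-style brackets into
-- array braces before the value is coerced to text[].
INSERT INTO "user" (permissions)
    VALUES (translate('["test", "test2"]', '[]', '{}')::text[]);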

What is the correct way of grabbing a RANDOM record from a PostgreSQL table, which isn’t painfully slow or not-random?

I always used to do:

SELECT column FROM table ORDER BY random() LIMIT 1; 

For large tables, this was unbearably, impossibly slow, to the point of being useless in practice. That’s why I started hunting for more efficient methods. People recommended:

SELECT column FROM table TABLESAMPLE BERNOULLI(1) LIMIT 1; 

While fast, it provides worthless randomness: it appears to always pick the same damn records.

I’ve also tried:

SELECT column FROM table TABLESAMPLE BERNOULLI(100) LIMIT 1; 

It gives even worse randomness. It picks the same few records every time. This is completely worthless. I need actual randomness.

Why is it apparently so difficult to just pick a random record? Why does it have to grab EVERY record and then sort them (in the first case)? And why do the “TABLESAMPLE” versions just grab the same stupid records all the time? Why aren’t they random whatsoever? Who would ever want to use this “BERNOULLI” stuff when it just picks the same few records over and over? I can’t believe I’m still, after all these years, asking about grabbing a random record… it’s one of the most basic possible queries.

What is the actual command to use for grabbing a random record from a table in PG which isn’t so slow that it takes several full seconds for a decent-sized table?
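One approach that often comes up is the tsm_system_rows contrib extension; a sketch, assuming that extension is available (note its sampling is block-based, so it is fast but not perfectly uniform across rows):

-- SYSTEM_ROWS(1) picks rows starting from a random block, avoiding the
-- full scan and sort that ORDER BY random() requires.
CREATE EXTENSION IF NOT EXISTS tsm_system_rows;

SELECT column FROM table TABLESAMPLE SYSTEM_ROWS(1);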

Automatic table-based multi-tenancy WHERE clause in PostgreSQL

Premise: I am a novice and I am trying to understand.

I come from Rails and am using PostgreSQL.

I am rewriting the whole application in Go (without the magic of Rails ActiveRecord; I am using go-pg (https://github.com/go-pg/pg) as the ORM).

So I can’t use packages like https://github.com/ErwinM/acts_as_tenant or https://github.com/influitive/apartment.

Since go-pg doesn’t natively offer what I’m looking for (https://github.com/go-pg/pg/issues/1179) I would like to understand if I can do it directly in the DB.

I would like to understand whether I can automatically add a WHERE clause to certain SELECTs, filtering on the tenant_id column based on the user executing the query.

Code example:

What I need is to go from this:

var players []*models.Player
db.Model(&players).Order("id desc").Select()
return players, nil

to this:

var players []*models.Player
user := ctx.Value(auth.CTXKeyUser).(*models.User)
db.Model(&players).Order("id desc").Where("tenant_id = ?", user.TenantID).Select()
return players, nil
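For what it’s worth, PostgreSQL can apply such a filter inside the database itself via Row Level Security. A minimal sketch, assuming a players table with a tenant_id column; the app.tenant_id setting name is a hypothetical custom GUC chosen for illustration:

-- Enable RLS and filter every query through a session-level setting.
ALTER TABLE players ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON players
    USING (tenant_id = current_setting('app.tenant_id')::bigint);

-- The application sets the tenant once per session/request:
SET app.tenant_id = '42';

-- This now returns only the current tenant's rows (for roles that are
-- neither superusers nor the table owner):
SELECT * FROM players ORDER BY id DESC;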

PostgreSQL logical replication: subscriber size discrepancy

I have a Postgres 11 database in Amazon RDS with a little over 2 TB of data in it, whose tables have incrementing integer ids that are nearing exhaustion. I’ve been vetting the use of logical replication to sync data from the existing db to a new db whose schema has been modified to use bigint ids. Everything looks OK after a lot of due diligence, and I am about ready to make the cutover.

One thing I have noticed is that the subscriber database is smaller in overall size, even though many of its columns and indexes have been switched from int to bigint ids. Does anyone have any idea why this might be the case? This is the one question I have not been able to find a confirmed answer for, versus my guesses (fragmentation, etc.).

Using the queries described here, I have the following usage:

  • total: 2.171 TB in publisher, 2.020 TB in subscriber
  • indexes: 0.737 TB in publisher, 0.581 TB in subscriber
  • toast: 0.212 TB in publisher, 0.215 TB in subscriber
  • tables: 1.223 TB in publisher, 1.223 TB in subscriber
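One way to narrow down where the difference comes from (a sketch using standard size functions; run it on both sides and diff the output) is to compare sizes per relation, since freshly replicated heaps and freshly built indexes carry none of the bloat the publisher may have accumulated:

-- Top 20 relations by total size, split into heap and index portions.
SELECT relname,
       pg_size_pretty(pg_table_size(oid))   AS table_size,
       pg_size_pretty(pg_indexes_size(oid)) AS index_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 20;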

Why does WAL on PostgreSQL take so long to replay?

Context:

I’m running PostgreSQL 11 using the official PostgreSQL Docker image, with the default configuration and no tweaks. Sometimes my Docker server crashes and kills all the containers with it. As a result, PostgreSQL gets stopped in a dirty way and has to replay the WAL. It takes more than 1 h to replay the logs, even though the server is mostly idle. I mean, when the server crashed it hardly had any writes in the logs. And when I check pg_wal I only see 65 MB of data in the directory, so it shouldn’t take so much time to replay.

Here are some of the settings:

max_wal_size: 1gb
min_wal_size: 80gb
checkpoint_timeout: 3min

So how come it takes so long to restart? I have a different server with far fewer databases but a similar configuration. Both are mostly idle all the time, yet the other one restarts quickly even though they have the same configuration.
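As a diagnostic starting point (a sketch using standard catalog functions), crash recovery replays WAL from the REDO location of the last completed checkpoint, so the distance between that point and the current end of WAL is what determines how much replay work a crash would cause:

-- How much WAL would have to be replayed if the server crashed right now.
SELECT redo_lsn,
       checkpoint_time,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), redo_lsn))
           AS wal_since_last_checkpoint
FROM pg_control_checkpoint();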

Most efficient way to query a date range in PostgreSQL

I have a table with a timestamp with time zone column. I want to perform a count / group by query for rows within a certain date range:

select count(1), url
from mytable
where viewed_at between '2019-01-01' and '2020-01-01';

viewed_at has a btree index, but when I run EXPLAIN, the query doesn’t appear to be using the index:

postgres=# explain select count(1), url from app_pageview where viewed_at < '2019-01-01' group by 2 order by 1 desc limit 10;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2165636.99..2165637.02 rows=10 width=32)
   ->  Sort  (cost=2165636.99..2165637.24 rows=101 width=32)
         Sort Key: (count(1)) DESC
         ->  Finalize GroupAggregate  (cost=2165609.22..2165634.81 rows=101 width=32)
               Group Key: url
               ->  Gather Merge  (cost=2165609.22..2165632.79 rows=202 width=32)
                     Workers Planned: 2
                     ->  Sort  (cost=2164609.20..2164609.45 rows=101 width=32)
                           Sort Key: url
                           ->  Partial HashAggregate  (cost=2164604.82..2164605.83 rows=101 width=32)
                                 Group Key: url
                                 ->  Parallel Seq Scan on app_pageview  (cost=0.00..2059295.33 rows=21061898 width=24)
                                       Filter: (viewed_at < '2019-01-01 00:00:00+00'::timestamp with time zone)
 JIT:
   Functions: 13
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(16 rows)

I have generated ~100M rows of dummy data to test this out.

How can I make it more efficient?

Would storing the viewed_at field as two separate fields (date and time) be of any use?
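One thing that might be worth testing (a sketch, not a guaranteed win): with a filter this unselective, a plain btree on viewed_at rightly loses to a parallel seq scan, but a hypothetical composite index covering both columns can allow an index-only scan that avoids touching the heap:

-- Composite index so the count/group-by can be served from the index alone,
-- provided the visibility map is reasonably current.
CREATE INDEX app_pageview_viewed_at_url_idx
    ON app_pageview (viewed_at, url);

VACUUM (ANALYZE) app_pageview;  -- refresh visibility map and planner stats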