Aurora PostgreSQL database using a slower query plan than a normal PostgreSQL for an identical query?

Following the migration of an application and its database from a classical PostgreSQL database to an Amazon Aurora RDS PostgreSQL database (both using 9.6 version), we have found that a specific query is running much slower — around 10 times slower — on Aurora than on PostgreSQL.

Both databases have the same configuration, be it for the hardware or the pg_conf.

The query itself is fairly simple. It is generated from our backend written in Java and using jOOQ for writing the queries:

with "all_acp_ids"("acp_id") as (     select acp_id from temp_table_de3398bacb6c4e8ca8b37be227eac089 )  select distinct "public"."f1_folio_milestones"."acp_id",      coalesce("public"."sa_milestone_overrides"."team",      "public"."f1_folio_milestones"."team_responsible")  from "public"."f1_folio_milestones"  left outer join      "public"."sa_milestone_overrides" on (         "public"."f1_folio_milestones"."milestone" = "public"."sa_milestone_overrides"."milestone"          and "public"."f1_folio_milestones"."view" = "public"."sa_milestone_overrides"."view"          and "public"."f1_folio_milestones"."acp_id" = "public"."sa_milestone_overrides"."acp_id" ) where "public"."f1_folio_milestones"."acp_id" in (     select "all_acp_ids"."acp_id" from "all_acp_ids" ) 

With temp_table_de3398bacb6c4e8ca8b37be227eac089 being a single-column table, f1_folio_milestones (17 million entries) and sa_milestone_overrides (Around 1 million entries) being similarly designed tables having indexes on all the columns used for the LEFT OUTER JOIN.

When we run it on the normal PostgreSQL database, it generates the following query plan:

Unique  (cost=4802622.20..4868822.51 rows=8826708 width=43) (actual time=483.928..483.930 rows=1 loops=1)   CTE all_acp_ids     ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.005 rows=1 loops=1)   ->  Sort  (cost=4802598.60..4824665.37 rows=8826708 width=43) (actual time=483.927..483.927 rows=4 loops=1)         Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))         Sort Method: quicksort  Memory: 25kB         ->  Hash Left Join  (cost=46051.06..3590338.34 rows=8826708 width=43) (actual time=483.905..483.917 rows=4 loops=1)               Hash Cond: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))               ->  Nested Loop  (cost=31.16..2572.60 rows=8826708 width=37) (actual time=0.029..0.038 rows=4 loops=1)                     ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.009..0.010 rows=1 loops=1)                           Group Key: all_acp_ids.acp_id                           ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.006..0.007 rows=1 loops=1)                     ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..12.65 rows=5 width=37) (actual time=0.018..0.025 rows=4 loops=1)                           Index Cond: (acp_id = all_acp_ids.acp_id)               ->  Hash  (cost=28726.78..28726.78 rows=988178 width=34) (actual time=480.423..480.423 rows=987355 loops=1)                     Buckets: 1048576  Batches: 1  Memory Usage: 72580kB                     ->  Seq Scan on sa_milestone_overrides  (cost=0.00..28726.78 rows=988178 width=34) (actual time=0.004..189.641 rows=987355 loops=1) Planning time: 3.561 ms Execution time: 489.223 ms 

And it goes pretty smoothly as one can see — less than a second for the query. But on the Aurora instance, this happens:

Unique  (cost=2632927.29..2699194.83 rows=8835672 width=43) (actual time=4577.348..4577.350 rows=1 loops=1)   CTE all_acp_ids     ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.001..0.001 rows=1 loops=1)   ->  Sort  (cost=2632903.69..2654992.87 rows=8835672 width=43) (actual time=4577.348..4577.348 rows=4 loops=1)         Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))         Sort Method: quicksort  Memory: 25kB         ->  Merge Left Join  (cost=1321097.58..1419347.08 rows=8835672 width=43) (actual time=4488.369..4577.330 rows=4 loops=1)               Merge Cond: ((f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))               ->  Sort  (cost=1194151.06..1216240.24 rows=8835672 width=37) (actual time=0.039..0.040 rows=4 loops=1)                     Sort Key: f1_folio_milestones.view, f1_folio_milestones.milestone, f1_folio_milestones.acp_id                     Sort Method: quicksort  Memory: 25kB                     ->  Nested Loop  (cost=31.16..2166.95 rows=8835672 width=37) (actual time=0.022..0.028 rows=4 loops=1)                           ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.006..0.006 rows=1 loops=1)                                 Group Key: all_acp_ids.acp_id                                 ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.003..0.004 rows=1 loops=1)                           ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..10.63 rows=4 width=37) (actual time=0.011..0.015 rows=4 loops=1)                                 Index Cond: (acp_id = all_acp_ids.acp_id)               ->  Sort  (cost=126946.52..129413.75 rows=986892 width=34) (actual time=4462.727..4526.822 rows=448136 loops=1)                     Sort Key: sa_milestone_overrides.view, sa_milestone_overrides.milestone, sa_milestone_overrides.acp_id                     Sort Method: quicksort  Memory: 106092kB                     ->  Seq Scan on sa_milestone_overrides  (cost=0.00..28688.92 rows=986892 width=34) (actual time=0.003..164.348 rows=986867 loops=1) Planning time: 1.394 ms Execution time: 4583.295 ms 

It effectively has a lower global cost, but takes almost 10 times as much time than before!

Disabling merge joins makes Aurora revert to a hash join, which gives the expected execution time — but permanently disabling it is not an option. Curiously though, disabling nested loops gives an even better result while still using a merge join…

Unique  (cost=3610230.74..3676431.05 rows=8826708 width=43) (actual time=2.465..2.466 rows=1 loops=1)   CTE all_acp_ids     ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.004 rows=1 loops=1)   ->  Sort  (cost=3610207.14..3632273.91 rows=8826708 width=43) (actual time=2.464..2.464 rows=4 loops=1)         Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))         Sort Method: quicksort  Memory: 25kB         ->  Merge Left Join  (cost=59.48..2397946.87 rows=8826708 width=43) (actual time=2.450..2.455 rows=4 loops=1)               Merge Cond: (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text)               Join Filter: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text))               ->  Merge Join  (cost=40.81..2267461.88 rows=8826708 width=37) (actual time=2.312..2.317 rows=4 loops=1)                     Merge Cond: (f1_folio_milestones.acp_id = all_acp_ids.acp_id)                     ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..2223273.29 rows=17653416 width=37) (actual time=0.020..2.020 rows=1952 loops=1)                     ->  Sort  (cost=40.24..40.74 rows=200 width=32) (actual time=0.011..0.012 rows=1 loops=1)                           Sort Key: all_acp_ids.acp_id                           Sort Method: quicksort  Memory: 25kB                           ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.008..0.008 rows=1 loops=1)                                 Group Key: all_acp_ids.acp_id                                 ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.005..0.005 rows=1 loops=1)               ->  Materialize  (cost=0.42..62167.38 rows=987968 width=34) (actual time=0.021..0.101 rows=199 loops=1)                     ->  Index Scan using sa_milestone_overrides_acp_id_index on sa_milestone_overrides  (cost=0.42..59697.46 rows=987968 width=34) (actual time=0.019..0.078 rows=199 loops=1) Planning time: 5.500 ms Execution time: 2.516 ms 

We have asked the AWS support team, they are still looking at the issue, but we are wondering what could cause that issue to happen. What could explain such a behaviour difference?

While looking at some of the documentation for the database, I read that Aurora favors cost over time — and hence it uses the query plan that has the lowest cost.

But as we can see, it’s far from being optimal given its response time… Is there a threshold or a setting that could make the database use a more expensive — but faster — query plan?