PostgreSQL: slow query, planner row estimates only 0.01-0.1x of actual


I'll preface this by saying I'm not well-versed in SQL at all. I mainly work with ORMs, and this recent headache has pushed me to dive into the world of raw queries, planners, etc.

A very common query is behaving weirdly on my website. I have tried various techniques to speed it up, but nothing really helps, except narrowing the release_date filter from 30 days down to 7. However, as I understand it, the tables involved aren't very big, and PostgreSQL should be able to satisfy this query in acceptable time.

Some statistics:

core_releasegroup row count: 3,240,568

core_artist row count: 287,699

core_subscription row count: 1,803,960

Relationships:

Each ReleaseGroup has an M2M relationship with Artist, and each Artist has an M2M relationship with UserProfile through Subscription. I'm using Django, which auto-creates indexes on foreign keys, etc.
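
For reference, here's roughly what I believe the two join tables look like (a sketch reconstructed from the query and the index names in the plans below, not a pg_dump; the exact column types are assumptions based on Django's conventions):

-- Sketch of the M2M join tables, inferred from the query and plans below.
CREATE TABLE core_artist_release_groups (
    id integer NOT NULL,               -- Django's implicit surrogate key
    artist_id integer NOT NULL,        -- FK -> core_artist.id
    releasegroup_id integer NOT NULL   -- FK -> core_releasegroup.id
);

CREATE TABLE core_subscription (
    id integer NOT NULL,
    profile_id integer NOT NULL,       -- FK -> the UserProfile table
    artist_id integer NOT NULL         -- FK -> core_artist.id
);

-- The plans reference core_subscription_profile_id_artist_id_a4d3d29b_uniq,
-- so (profile_id, artist_id) appears to have a unique index.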

Here’s the query:

SELECT "core_releasegroup"."id", "core_releasegroup"."title", "core_releasegroup"."type", "core_releasegroup"."release_date", "core_releasegroup"."applemusic_id", "core_releasegroup"."applemusic_image", "core_releasegroup"."geo_apple_music_link", "core_releasegroup"."amazon_aff_link", "core_releasegroup"."is_explicit", "core_releasegroup"."spotify_id", "core_releasegroup"."spotify_link"  FROM "core_releasegroup"  INNER JOIN "core_artist_release_groups"  ON ("core_releasegroup"."id" = "core_artist_release_groups"."releasegroup_id")  WHERE ("core_artist_release_groups"."artist_id"  IN  (SELECT U0."artist_id" FROM "core_subscription" U0 WHERE U0."profile_id" = 1)  AND "core_releasegroup"."type" IN ('Album', 'Single', 'EP', 'Live', 'Compilation', 'Remix', 'Other')  AND "core_releasegroup"."release_date" BETWEEN '2020-08-20'::date AND '2020-10-20'::date); 

Here’s the table schema:

CREATE TABLE public.core_releasegroup (
    id integer NOT NULL,
    created_date timestamp with time zone NOT NULL,
    modified_date timestamp with time zone NOT NULL,
    title character varying(560) NOT NULL,
    type character varying(30) NOT NULL,
    release_date date,
    applemusic_id character varying(512),
    applemusic_image character varying(512),
    applemusic_link character varying(512),
    spotify_id character varying(512),
    spotify_image character varying(512),
    spotify_link character varying(512),
    is_explicit boolean NOT NULL,
    spotify_last_refresh timestamp with time zone,
    spotify_next_refresh timestamp with time zone,
    geo_apple_music_link character varying(512),
    amazon_aff_link character varying(620)
);

I have separate indexes on type, release_date, and applemusic_id.
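
Based on the index names that show up in the plans below, I believe the relevant definitions are roughly these (a sketch; Django also creates a companion varchar_pattern_ops index for indexed char columns, which is the *_like index the planner ends up using):

-- Inferred from the index names in the execution plans; the exact
-- definitions are assumptions based on Django's naming conventions.
CREATE INDEX core_releasegroup_release_date_03a267f7
    ON core_releasegroup (release_date);

CREATE INDEX core_releasegroup_type_58b6243d
    ON core_releasegroup (type);

-- Django's companion index for LIKE lookups on an indexed varchar;
-- this is the one the planner picks for the type filter below.
CREATE INDEX core_releasegroup_type_58b6243d_like
    ON core_releasegroup (type varchar_pattern_ops);

-- (the applemusic_id index is analogous and not used by this query)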

Here’s the PostgreSQL execution plan (notice the estimated row counts vs. the actual ones):

Nested Loop  (cost=2437.52..10850.51 rows=4 width=495) (actual time=411.911..8619.311 rows=362 loops=1)
  Buffers: shared hit=252537 read=29104
  ->  Nested Loop  (cost=2437.09..10578.84 rows=569 width=499) (actual time=372.265..8446.324 rows=36314 loops=1)
        Buffers: shared hit=143252 read=29085
        ->  Bitmap Heap Scan on core_releasegroup  (cost=2436.66..4636.70 rows=567 width=495) (actual time=372.241..7707.466 rows=32679 loops=1)
              Recheck Cond: ((release_date >= '2020-08-20'::date) AND (release_date <= '2020-10-20'::date) AND ((type)::text = ANY ('{Album,Single,EP,Live,Compilation,Remix,Other}'::text[])))
              Heap Blocks: exact=29127
              Buffers: shared hit=10222 read=27872
              ->  BitmapAnd  (cost=2436.66..2436.66 rows=567 width=0) (actual time=366.750..366.750 rows=0 loops=1)
                    Buffers: shared hit=15 read=8952
                    ->  Bitmap Index Scan on core_releasegroup_release_date_03a267f7  (cost=0.00..342.46 rows=16203 width=0) (actual time=8.834..8.834 rows=32679 loops=1)
                          Index Cond: ((release_date >= '2020-08-20'::date) AND (release_date <= '2020-10-20'::date))
                          Buffers: shared read=92
                    ->  Bitmap Index Scan on core_releasegroup_type_58b6243d_like  (cost=0.00..2093.67 rows=113420 width=0) (actual time=355.071..355.071 rows=3240568 loops=1)
                          Index Cond: ((type)::text = ANY ('{Album,Single,EP,Live,Compilation,Remix,Other}'::text[]))
                          Buffers: shared hit=15 read=8860
        ->  Index Scan using core_artist_release_groups_releasegroup_id_cea5da71 on core_artist_release_groups  (cost=0.43..10.46 rows=2 width=8) (actual time=0.018..0.020 rows=1 loops=32679)
              Index Cond: (releasegroup_id = core_releasegroup.id)
              Buffers: shared hit=133030 read=1213
  ->  Index Only Scan using core_subscription_profile_id_artist_id_a4d3d29b_uniq on core_subscription u0  (cost=0.43..0.48 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=36314)
        Index Cond: ((profile_id = 1) AND (artist_id = core_artist_release_groups.artist_id))
        Heap Fetches: 362
        Buffers: shared hit=109285 read=19
Planning Time: 10.951 ms
Execution Time: 8619.564 ms

Please note that the above is a stripped-down version of the query I actually need. Because of its unbearable slowness, I've reduced it to a bare minimum and resorted to doing some of the filtering and ordering of the returned objects in Python (which I know is usually slower). As you can see, it's still very slow.

After a while, probably because the relevant pages have made it into memory/cache, the query becomes much faster:

Nested Loop  (cost=2437.52..10850.51 rows=4 width=495) (actual time=306.337..612.232 rows=362 loops=1)
  Buffers: shared hit=241776 read=39865 written=4
  ->  Nested Loop  (cost=2437.09..10578.84 rows=569 width=499) (actual time=305.216..546.749 rows=36314 loops=1)
        Buffers: shared hit=132503 read=39834 written=4
        ->  Bitmap Heap Scan on core_releasegroup  (cost=2436.66..4636.70 rows=567 width=495) (actual time=305.195..437.375 rows=32679 loops=1)
              Recheck Cond: ((release_date >= '2020-08-20'::date) AND (release_date <= '2020-10-20'::date) AND ((type)::text = ANY ('{Album,Single,EP,Live,Compilation,Remix,Other}'::text[])))
              Heap Blocks: exact=29127
              Buffers: shared hit=16 read=38078 written=4
              ->  BitmapAnd  (cost=2436.66..2436.66 rows=567 width=0) (actual time=298.382..298.382 rows=0 loops=1)
                    Buffers: shared hit=16 read=8951
                    ->  Bitmap Index Scan on core_releasegroup_release_date_03a267f7  (cost=0.00..342.46 rows=16203 width=0) (actual time=5.619..5.619 rows=32679 loops=1)
                          Index Cond: ((release_date >= '2020-08-20'::date) AND (release_date <= '2020-10-20'::date))
                          Buffers: shared read=92
                    ->  Bitmap Index Scan on core_releasegroup_type_58b6243d_like  (cost=0.00..2093.67 rows=113420 width=0) (actual time=289.917..289.917 rows=3240568 loops=1)
                          Index Cond: ((type)::text = ANY ('{Album,Single,EP,Live,Compilation,Remix,Other}'::text[]))
                          Buffers: shared hit=16 read=8859
        ->  Index Scan using core_artist_release_groups_releasegroup_id_cea5da71 on core_artist_release_groups  (cost=0.43..10.46 rows=2 width=8) (actual time=0.003..0.003 rows=1 loops=32679)
              Index Cond: (releasegroup_id = core_releasegroup.id)
              Buffers: shared hit=132487 read=1756
  ->  Index Only Scan using core_subscription_profile_id_artist_id_a4d3d29b_uniq on core_subscription u0  (cost=0.43..0.48 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=36314)
        Index Cond: ((profile_id = 1) AND (artist_id = core_artist_release_groups.artist_id))
        Heap Fetches: 362
        Buffers: shared hit=109273 read=31
Planning Time: 1.088 ms
Execution Time: 612.360 ms

This is still slow in SQL terms (I guess?), but much more acceptable. The problem is that even though this is a very common view in my web app (a frequently executed query), its data doesn't stay in RAM/cache, so I see these huge response-time spikes far too often.
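
One band-aid would be to pre-warm the cache with the pg_prewarm contrib extension (a sketch below), but that would only treat the symptom, not the bad estimates:

-- Requires the pg_prewarm contrib module.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Pull the table and its hot index into shared_buffers.
SELECT pg_prewarm('core_releasegroup');
SELECT pg_prewarm('core_releasegroup_release_date_03a267f7');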

I’ve tried every way of constructing this query that I could think of. My last idea is to check whether the planner's estimates are to blame here, and whether they're fixable. If not, I'll start considering denormalization.
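
Concretely, what I'm planning to try next is raising the statistics targets and adding extended statistics (a sketch, assuming PostgreSQL 10+ for CREATE STATISTICS; I don't know yet whether it helps):

-- Give the planner more detail on the filtered columns, then re-analyze.
ALTER TABLE core_releasegroup ALTER COLUMN type SET STATISTICS 1000;
ALTER TABLE core_releasegroup ALTER COLUMN release_date SET STATISTICS 1000;
ANALYZE core_releasegroup;

-- Extended statistics, in case type and release_date are correlated
-- (PostgreSQL 10+).
CREATE STATISTICS core_releasegroup_type_date (dependencies)
    ON type, release_date FROM core_releasegroup;
ANALYZE core_releasegroup;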

Or is there something else I’m missing?