In asking a recent question, some mysterious startup time components came up in my EXPLAIN ANALYZE output. I was playing further, and discovered that the startup time drops to near 0 if I remove the regex WHERE
clause.
I ran the following bash script as a test:
for i in $ (seq 1 10) do if (( $ RANDOM % 2 == 0 )) then echo "Doing plain count" psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2" else echo "Doing regex count" psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 WHERE page ~ 'foo'" fi done
The first query returns a count of ~30 million, and the second counts only 7 rows. They are running on a PG 12.3 read replica in RDS with minimal other activity. Both versions take roughly the same amount of time, as I’d expect. Here is some output filtered with grep
:
Doing plain count -> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.086..38622.215 rows=10114306 loops=3) Doing regex count -> Parallel Seq Scan on ui_events_v2 (cost=0.00..3091955.34 rows=897 width=0) (actual time=16856.679..41398.062 rows=2 loops=3) Doing plain count -> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.162..39454.499 rows=10114306 loops=3) Doing plain count -> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.036..39213.171 rows=10114306 loops=3) Doing regex count -> Parallel Seq Scan on ui_events_v2 (cost=0.00..3091955.34 rows=897 width=0) (actual time=12711.308..40015.734 rows=2 loops=3) Doing plain count -> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.244..39277.683 rows=10114306 loops=3) Doing regex count ^CCancel request sent
So, a few questions:
-
What goes into this startup component of "actual time" in the regex scan, and why is it so much larger? (10-20s vs 0-1s)
-
Although "cost" and "time" aren’t comparable units, the planner seems to think the startup cost should be 0 in all cases – is it being fooled?
-
Why do the strategies seem different? Both plans mention Partial Aggregate
, but the regex query says actual rows is 2
, but the plain version says actual rows is ~10 million (I guess this is some kind of average between 2 workers and 1 leader, summing to ~30 million). If I had to implement this myself, I would probably add up the results of several count(*)
operations, instead of merging rows and counting – do the plans indicate how exactly its doing that?
So I don’t hide anything, below are full versions of the query plan for each:
EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=3093171.59..3093171.60 rows=1 width=8) (actual time=39156.499..39156.499 rows=1 loops=1) -> Gather (cost=3093171.37..3093171.58 rows=2 width=8) (actual time=39156.356..39157.850 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=3092171.37..3092171.38 rows=1 width=8) (actual time=39154.405..39154.406 rows=1 loops=3) -> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060587.90 rows=12633390 width=0) (actual time=0.033..38413.690 rows=10115030 loops=3) Planning Time: 7.968 ms Execution Time: 39157.942 ms (8 rows) EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 WHERE page ~ 'foo' QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=3093173.83..3093173.84 rows=1 width=8) (actual time=39908.495..39908.495 rows=1 loops=1) -> Gather (cost=3093173.61..3093173.82 rows=2 width=8) (actual time=39908.408..39909.848 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=3092173.61..3092173.62 rows=1 width=8) (actual time=39906.317..39906.318 rows=1 loops=3) -> Parallel Seq Scan on ui_events_v2 (cost=0.00..3092171.37 rows=897 width=0) (actual time=17250.058..39906.308 rows=2 loops=3) Filter: (page ~ 'foo'::text) Rows Removed by Filter: 10115028 Planning Time: 0.803 ms Execution Time: 39909.921 ms (10 rows)