Interpreting startup time and varying plans for Postgres seq scans

In asking a recent question, some mysterious startup time components came up in my EXPLAIN ANALYZE output. I was playing further, and discovered that the startup time drops to near 0 if I remove the regex WHERE clause.

I ran the following bash script as a test:

for i in $  (seq 1 10) do     if (( $  RANDOM % 2 == 0 ))     then         echo "Doing plain count"         psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2"     else         echo "Doing regex count"         psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 WHERE page ~ 'foo'"     fi done 

The first query returns a count of ~30 million, and the second counts only 7 rows. They are running on a PG 12.3 read replica in RDS with minimal other activity. Both versions take roughly the same amount of time, as I’d expect. Here is some output filtered with grep:

Doing plain count                ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.086..38622.215 rows=10114306 loops=3) Doing regex count                ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3091955.34 rows=897 width=0) (actual time=16856.679..41398.062 rows=2 loops=3) Doing plain count                ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.162..39454.499 rows=10114306 loops=3) Doing plain count                ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.036..39213.171 rows=10114306 loops=3) Doing regex count                ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3091955.34 rows=897 width=0) (actual time=12711.308..40015.734 rows=2 loops=3) Doing plain count                ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.244..39277.683 rows=10114306 loops=3) Doing regex count ^CCancel request sent 

So, a few questions:

  1. What goes into this startup component of "actual time" in the regex scan, and why is it so much larger? (10-20s vs 0-1s)

  2. Although "cost" and "time" aren’t comparable units, the planner seems to think the startup cost should be 0 in all cases – is it being fooled?

  3. Why do the strategies seem different? Both plans mention Partial Aggregate, but the regex query says actual rows is 2, but the plain version says actual rows is ~10 million (I guess this is some kind of average between 2 workers and 1 leader, summing to ~30 million). If I had to implement this myself, I would probably add up the results of several count(*) operations, instead of merging rows and counting – do the plans indicate how exactly its doing that?

So I don’t hide anything, below are full versions of the query plan for each:

 EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2                                                                        QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------  Finalize Aggregate  (cost=3093171.59..3093171.60 rows=1 width=8) (actual time=39156.499..39156.499 rows=1 loops=1)    ->  Gather  (cost=3093171.37..3093171.58 rows=2 width=8) (actual time=39156.356..39157.850 rows=3 loops=1)          Workers Planned: 2          Workers Launched: 2          ->  Partial Aggregate  (cost=3092171.37..3092171.38 rows=1 width=8) (actual time=39154.405..39154.406 rows=1 loops=3)                ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060587.90 rows=12633390 width=0) (actual time=0.033..38413.690 rows=10115030 loops=3)  Planning Time: 7.968 ms  Execution Time: 39157.942 ms (8 rows)   EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 WHERE page ~ 'foo'                                                                    QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------  Finalize Aggregate  (cost=3093173.83..3093173.84 rows=1 width=8) (actual time=39908.495..39908.495 rows=1 loops=1)    ->  Gather  (cost=3093173.61..3093173.82 rows=2 width=8) (actual time=39908.408..39909.848 rows=3 loops=1)          Workers Planned: 2          Workers Launched: 2          ->  Partial Aggregate  (cost=3092173.61..3092173.62 rows=1 width=8) (actual time=39906.317..39906.318 rows=1 loops=3)                ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3092171.37 rows=897 width=0) (actual time=17250.058..39906.308 rows=2 loops=3)                      Filter: (page ~ 'foo'::text)                      Rows Removed by Filter: 10115028  Planning Time: 0.803 ms  Execution Time: 39909.921 ms (10 rows)