I come from a SQL Server, Oracle, and Sybase DBA background, but I am now looking after an AWS Aurora cluster running PostgreSQL 9.6.12, and I have noticed something I think is odd. Maybe it is not, which is why I am here to ask. I have looked everywhere but cannot find an answer. The autovacuum and autoanalyze settings are still at their defaults. Autovacuum does eventually get around to the application tables, but it seems to spend most of its time repeatedly vacuuming and analysing a small set of system catalogs. They are:
- pg_type
- pg_shdepend
- pg_attribute
- pg_class
- pg_depend
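As a quick sanity check on just those catalogs I run something along these lines (a minimal sketch; pg_stat_sys_tables is the statistics view covering system tables, and the bigger query I normally use follows below):

SELECT relname,
       last_autovacuum,
       last_autoanalyze,
       autovacuum_count,
       autoanalyze_count
FROM pg_stat_sys_tables
WHERE relname IN ('pg_type', 'pg_shdepend', 'pg_attribute', 'pg_class', 'pg_depend')
ORDER BY autovacuum_count DESC;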
I am seeing this both through AWS Performance Insights data and also through direct queries to the database instance using this code:
WITH rel_set AS (
    -- per-table reloptions overrides of the autovacuum defaults, where set
    SELECT
        oid,
        CASE split_part(split_part(array_to_string(reloptions, ','), 'autovacuum_analyze_threshold=', 2), ',', 1)
            WHEN '' THEN NULL
            ELSE split_part(split_part(array_to_string(reloptions, ','), 'autovacuum_analyze_threshold=', 2), ',', 1)::BIGINT
        END AS rel_av_anal_threshold,
        CASE split_part(split_part(array_to_string(reloptions, ','), 'autovacuum_vacuum_threshold=', 2), ',', 1)
            WHEN '' THEN NULL
            ELSE split_part(split_part(array_to_string(reloptions, ','), 'autovacuum_vacuum_threshold=', 2), ',', 1)::BIGINT
        END AS rel_av_vac_threshold,
        CASE split_part(split_part(array_to_string(reloptions, ','), 'autovacuum_analyze_scale_factor=', 2), ',', 1)
            WHEN '' THEN NULL
            ELSE split_part(split_part(array_to_string(reloptions, ','), 'autovacuum_analyze_scale_factor=', 2), ',', 1)::NUMERIC
        END AS rel_av_anal_scale_factor,
        CASE split_part(split_part(array_to_string(reloptions, ','), 'autovacuum_vacuum_scale_factor=', 2), ',', 1)
            WHEN '' THEN NULL
            ELSE split_part(split_part(array_to_string(reloptions, ','), 'autovacuum_vacuum_scale_factor=', 2), ',', 1)::NUMERIC
        END AS rel_av_vac_scale_factor
    FROM pg_class
)
SELECT
    PSUT.relname,
    -- to_char(PSUT.last_analyze, 'YYYY-MM-DD HH24:MI') AS last_analyze,
    to_char(PSUT.last_autoanalyze, 'YYYY-MM-DD HH24:MI') AS last_autoanalyze,
    -- to_char(PSUT.last_vacuum, 'YYYY-MM-DD HH24:MI') AS last_vacuum,
    to_char(PSUT.last_autovacuum, 'YYYY-MM-DD HH24:MI') AS last_autovacuum,
    to_char(C.reltuples, '9G999G999G999') AS n_tup,
    to_char(PSUT.n_dead_tup, '9G999G999G999') AS dead_tup,
    -- effective thresholds: base threshold + scale factor * estimated row count
    to_char(coalesce(RS.rel_av_anal_threshold, current_setting('autovacuum_analyze_threshold')::BIGINT)
            + coalesce(RS.rel_av_anal_scale_factor, current_setting('autovacuum_analyze_scale_factor')::NUMERIC) * C.reltuples,
            '9G999G999G999') AS av_analyze_threshold,
    to_char(coalesce(RS.rel_av_vac_threshold, current_setting('autovacuum_vacuum_threshold')::BIGINT)
            + coalesce(RS.rel_av_vac_scale_factor, current_setting('autovacuum_vacuum_scale_factor')::NUMERIC) * C.reltuples,
            '9G999G999G999') AS av_vacuum_threshold,
    -- flag tables whose dead tuples already exceed the effective thresholds
    CASE
        WHEN (coalesce(RS.rel_av_anal_threshold, current_setting('autovacuum_analyze_threshold')::BIGINT)
              + coalesce(RS.rel_av_anal_scale_factor, current_setting('autovacuum_analyze_scale_factor')::NUMERIC) * C.reltuples) < PSUT.n_dead_tup
        THEN '*'
        ELSE ''
    END AS expect_av_analyze,
    CASE
        WHEN (coalesce(RS.rel_av_vac_threshold, current_setting('autovacuum_vacuum_threshold')::BIGINT)
              + coalesce(RS.rel_av_vac_scale_factor, current_setting('autovacuum_vacuum_scale_factor')::NUMERIC) * C.reltuples) < PSUT.n_dead_tup
        THEN '*'
        ELSE ''
    END AS expect_av_vacuum,
    PSUT.autoanalyze_count,
    PSUT.autovacuum_count
FROM pg_stat_all_tables PSUT
JOIN pg_class C ON PSUT.relid = C.oid
JOIN rel_set RS ON PSUT.relid = RS.oid
ORDER BY PSUT.autoanalyze_count DESC;
At first I thought it might be caused by a lot of temporary tables being created and destroyed, or something similar, because I would periodically see the tuple counts in several of the tables above swing from roughly 8,000 up to 8,000,000 and back again. But I have not been able to find any evidence of temp table creation, and the offshore developers say they do not use them.
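The checks I have run for the temp-table theory look like this (sketches only; whether pg_stat_statements is installed on the cluster is an assumption on my part):

-- any temporary tables that exist right now
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relpersistence = 't';

-- statements that create temp tables, assuming pg_stat_statements is available
SELECT calls, query
FROM pg_stat_statements
WHERE query ILIKE '%create%temp%table%'
ORDER BY calls DESC;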
Is this sort of behaviour normal in stock PostgreSQL or in Aurora PostgreSQL? If it is not normal, is there anything anyone could suggest looking at to work out what is happening here? The database is about a terabyte in size, on an instance with 122 GB of RAM, 75% of which is allocated to shared_buffers (the Aurora default).
I am looking to change the autovacuum settings from the defaults to better handle this database's much larger tables (my rough plan is sketched below), but I wanted to make sure that would not be a waste of time if the system tables in question simply monopolise autovacuum's and autoanalyze's attention.
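What I have in mind is per-table storage parameters along these lines (the table name and the numbers are placeholders, not settled values):

ALTER TABLE big_app_table SET (
    autovacuum_vacuum_scale_factor  = 0.01,   -- vacuum after ~1% dead tuples instead of the default 10%
    autovacuum_vacuum_threshold     = 10000,
    autovacuum_analyze_scale_factor = 0.005
);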
Current settings (from pg_settings):
- autovacuum on
- autovacuum_analyze_scale_factor 0.05
- autovacuum_analyze_threshold 50
- autovacuum_freeze_max_age 200000000
- autovacuum_max_workers 3
- autovacuum_multixact_freeze_max_age 400000000
- autovacuum_naptime 5
- autovacuum_vacuum_cost_delay 5
- autovacuum_vacuum_cost_limit -1
- autovacuum_vacuum_scale_factor 0.1
- autovacuum_vacuum_threshold 50
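For reference, the list above came from a query along these lines:

SELECT name, setting
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;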
In summary, what I am asking is: is it normal for a small set of system catalogs to be constantly and consistently autovacuumed and autoanalysed? Any insights would be greatly appreciated.