Show more details on Postgres logical replication errors

I wonder if there is a way to add more detail (such as the column name or the database) to logical replication errors when columns are missing. I get generic log entries like this:

2021-09-16 14:47:37.149 CDT [32910] ERROR:  logical replication target relation "public.users" is missing some replicated columns 

I could not find anything related in the documentation. I am trying to detect these kinds of errors so I can trigger an alert. The only idea I have is to watch the logs for entries like the one above. Any ideas are welcome!

Collation for accent-insensitive comparison on Postgres?

The PG 13 documentation has several examples of ICU collations for specialized purposes. It also mentions that ICU locales exist that allow creating collations that ignore accents, and that they can be found at https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml

However, after reading that document, I am still unclear on which locale I should use to create an ICU collation for accent-insensitive comparisons in Spanish.

What is the name of such an ICU locale? Is there any list of ICU Spanish collations?
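For reference, a minimal sketch of what such a collation could look like, modelled on the `ignore_accents` example in the PostgreSQL documentation. The collation name `es_ignore_accents` and the `es` language prefix are assumptions for illustration. `ks-level1` asks ICU to compare at primary strength only, and `kc-true` adds case back as a difference. Letters that the Spanish rules treat as distinct at the primary level (ñ, as far as I know) would probably still compare as different from their base letter.

```sql
-- Assumes PostgreSQL 12 or later built with ICU support;
-- nondeterministic collations are not available before version 12.
-- es_ignore_accents is a hypothetical name.
CREATE COLLATION es_ignore_accents (
    provider = icu,
    locale = 'es-u-ks-level1-kc-true',
    deterministic = false
);

-- Try it on a pair that differs only by an accent,
-- and on a pair that also differs by case.
SELECT 'canción' = 'cancion' COLLATE es_ignore_accents AS accent_only,
       'Canción' = 'cancion' COLLATE es_ignore_accents AS accent_and_case;
```

Dropping `-kc-true` from the locale string should make the comparison ignore case as well.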

Postgres function not returning any rows but no errors

I have the function below, which returns no rows but raises no errors either. Am I missing a statement to return the rows? If I execute the select statement alone, I do get results.

    CREATE OR REPLACE FUNCTION public.create_new_conversation()
     RETURNS TABLE(chat_id text, members jsonb)
     LANGUAGE plpgsql
    AS $function$
    declare
       the_record record;
    begin
       select conversation_id, participants
       into the_record
       from public.conversations
       where conversation_id='123456';
       --    return conversationid;
    end;
    $function$
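As written, the function stores the row in a local variable and never hands it back. In PL/pgSQL a `RETURNS TABLE` function only emits rows through `RETURN QUERY` or `RETURN NEXT`. A minimal sketch of the `RETURN QUERY` form follows. The casts are an assumption, added because the column types of `public.conversations` are not shown and `RETURN QUERY` requires the result types to match the declared ones.

```sql
CREATE OR REPLACE FUNCTION public.create_new_conversation()
 RETURNS TABLE(chat_id text, members jsonb)
 LANGUAGE plpgsql
AS $function$
begin
   -- Append the query result to the function's result set.
   return query
   select c.conversation_id::text, c.participants::jsonb
   from public.conversations as c
   where c.conversation_id = '123456';
end;
$function$;

-- Usage: call it like a table.
select * from public.create_new_conversation();
```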

Massive slowdown after doing an ALTER to change index from int to bigint, with Postgres

I have a table like this:

    create table trades (
        instrument varchar(20)      not null,
        ts         timestamp        not null,
        price      double precision not null,
        quantity   double precision not null,
        direction  integer          not null,
        id         serial
            constraint trades_pkey
                primary key
    );

I wanted to move the id to bigint, so I did:

ALTER TABLE trades ALTER id TYPE BIGSERIAL;

then, after, I did:

ALTER SEQUENCE trades_id_seq AS BIGINT;

and now pretty much any large query that uses the id in the WHERE clause is so slow that it times out.

The database is AWS RDS Postgres.

Could it be a problem with the index itself?


Here is the query:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id, instrument, ts, price, quantity, direction
    FROM binance_trades
    WHERE id >= 119655532
    ORDER BY ts
    LIMIT 50;

and output:

50 rows retrieved starting from 1 in 1 m 4 s 605 ms (execution: 1 m 4 s 353 ms, fetching: 252 ms)

    Limit  (cost=0.57..9.86 rows=50 width=44) (actual time=86743.860..86743.878 rows=50 loops=1)
      Buffers: shared hit=20199328 read=1312119 dirtied=111632 written=109974
      I/O Timings: read=40693.524 write=335.051
      ->  Index Scan using idx_extrades_ts on binance_trades  (cost=0.57..8015921.79 rows=43144801 width=44) (actual time=86743.858..86743.871 rows=50 loops=1)
            Filter: (id >= 119655532)
            Rows Removed by Filter: 119654350
            Buffers: shared hit=20199328 read=1312119 dirtied=111632 written=109974
            I/O Timings: read=40693.524 write=335.051
    Planning Time: 0.088 ms
    Execution Time: 86743.902 ms

The activity on AWS:

[Image: AWS activity graph]

It’s a 2-core, 8 GB ARM server. Before the ALTER, the same request took under 1 second. Now small requests are slow and long ones time out.
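The plan above walks idx_extrades_ts in ts order and discards about 119.6 million rows through the filter on id before it finds 50 matches. One thing that may be worth checking, offered as an assumption rather than a diagnosis, is whether the planner still has statistics for id after the type change. A rough sketch of that check, using the table name from the query above:

```sql
-- Refresh planner statistics for the table.
ANALYZE binance_trades;

-- Confirm that statistics exist for the id column.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'binance_trades'
  AND attname = 'id';

-- Re-run the original query and compare the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, instrument, ts, price, quantity, direction
FROM binance_trades
WHERE id >= 119655532
ORDER BY ts
LIMIT 50;
```

If the new plan still scans the ts index, the row estimate for the id condition is the number to look at next.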

Left Join with multiple Depth in postgres

I have a schema that looks like this:

                      +------------+
    +------------+    + acc_email  |
    | account    |    +============+
    +============+    | id         |
    | id         |<-+-+ account_id |
    +------------+  | | email      |
                    | +------------+
                    |  +------------+
                    |  | password   |
                    |  |============|
                    |  | id         |
                    +--+ account_id |
                       | password   |
                       | iterations |
                       | salt       |
                       +------------+

Users log in via email, so I need to find the account and then find the password. I got this far:

    SELECT *
    FROM acc_email
    LEFT JOIN account
           ON acc_email.account_id = account.id

How do I join and retrieve the account email, account and latest password in a single query?
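One way this could be written is with a `LEFT JOIN LATERAL` that picks a single password row per account. The schema shows no timestamp on password, so this sketch assumes that the row with the highest id is the latest one. The email literal is a placeholder.

```sql
SELECT e.email,
       a.id AS account_id,
       p.password,
       p.iterations,
       p.salt
FROM acc_email AS e
LEFT JOIN account AS a
       ON a.id = e.account_id
LEFT JOIN LATERAL (
    -- Assumption: the highest password.id is the most recent password.
    SELECT pw.password, pw.iterations, pw.salt
    FROM password AS pw
    WHERE pw.account_id = a.id
    ORDER BY pw.id DESC
    LIMIT 1
) AS p ON true
WHERE e.email = 'user@example.com';  -- placeholder address
```

The `ON true` keeps the email and account row even when no password row exists, which matches the LEFT JOIN used in the original attempt.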

Postgres Role access privileges

I am trying to grant select privileges via a role for current and future tables, and I can’t seem to figure this out. Please advise.

    create role dev_role;
    grant usage on schema address to dev_role;
    grant select on all tables in schema address to dev_role;
    alter default privileges in schema address grant select on tables to dev_role;
    grant dev_role to test1;

Now user test2, which has been granted all privileges, creates a table in the address schema.

    \c dev test2
    create table address.t1(t integer);

    \c dev test1
    select * from address_match.t1;
    ERROR:  permission denied for table t1
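`ALTER DEFAULT PRIVILEGES` without `FOR ROLE` only applies to objects later created by the role that ran the command, so tables created by test2 are not covered by the statement above. A sketch of how the default could be attached to test2 instead, using the role and schema names from the question:

```sql
-- Run as test2, as a member of test2, or as a superuser.
ALTER DEFAULT PRIVILEGES FOR ROLE test2 IN SCHEMA address
    GRANT SELECT ON TABLES TO dev_role;

-- Default privileges only affect tables created afterwards,
-- so the existing table still needs an explicit grant.
GRANT SELECT ON address.t1 TO dev_role;
```

The same statement would have to be repeated for every role that creates tables in the schema.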

Query RDS postgres log size

I don’t have console access, only query access to a Postgres 12.5 AWS RDS database.

    show log_destination;    -- stderr
    show logging_collector;  -- on
    show log_directory;      -- /rdsdbdata/log/error
    show log_filename;       -- postgresql.log.%Y-%m-%d-%H

Is there a SQL command I can run to determine the total size of all the logs in my RDS instance?
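PostgreSQL 10 and later ship `pg_ls_logdir()`, which lists the files in the log directory with their sizes. By default it is restricted to superusers and members of `pg_monitor`, so whether it is callable here depends on the privileges granted to the querying role on RDS. That part is an assumption to verify. A minimal sketch:

```sql
-- Total number and size of files in the server's log directory.
SELECT count(*) AS log_files,
       pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_logdir();

-- Per-file breakdown, newest first.
SELECT name, pg_size_pretty(size) AS file_size, modification
FROM pg_ls_logdir()
ORDER BY modification DESC;
```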

Several Postgres partitioning questions (hierarchical partitioning, HASH, PK order)

I’m pretty new to Postgres. I have a few related questions about the performance benefits of partitioning.

Background: I am trying to fix slow batch queries on a 4 column table with over 300 million rows. The PK is on all columns and the 3 columns that aren’t at the beginning of the PK also have single column indexes.

The indexes are growing out of control and are collectively larger than all RAM, and 3x the size of the underlying data. Batches regularly read and write a couple hundred million rows and they are slow.

What I have already done/understand: I’ve analyzed the code and determined that only the first and last columns are used in SELECT WHERE clauses. Other than INSERTs, no other queries hit this table (besides a DELETE that will be replaced with partition dropping). So I already know I can drop two of the three indexes since they are unused, which will lead to reduced index size, fewer indexes, and hopefully improved INSERT/SELECT performance.

In addition to dropping the unused indexes, I am going to replace a lengthy DELETE statement (which has a WHERE on the two middle columns) with LIST partitioning on those columns, so I can DROP partitions instead.

Where I have questions: My proposed LIST partition also breaks up the table somewhat, so the partitions are no more than 1/4 the size of the original table. However, even that size is very large. I am considering adding a second hierarchical partition layer with a HASH partition on the 4th column, which could further considerably reduce partition size. My hope is that this would further reduce RAM consumption and/or improve performance in other ways.

However, I have a few questions about this:

  1. While I would be reducing partition size, my queries actually select on nearly every value in the column where I would be using HASH partitioning. I don’t know the distribution of those selected values, so there might be hot and cold partitions, but few totally cold partitions. In this case, would HASH partitioning even help, or would it be no improvement over the single level of partitioning? It’s kind of confusing to me: since hashing is random by design, I would expect this to be a problem whenever HASH is used.

     The only reason I think it might help is by reducing the height of the index trees, since the indexes exist within partitions. But all indexes would still be used. Basically, what I’m wondering is whether it is still an improvement to reduce index height even if all the indexes may be used, versus one huge index. Is there a best practice as to when HASH might help?

  2. The upper level of the partition hierarchy is a LIST on the two middle columns because that’s how my DELETE (that I am replacing) is defined. But those columns are never used in WHERE clauses. Is it a problem if the top partition hierarchy level isn’t even used in WHERE clauses? I could reverse the order so the HASH is the parent partition, since I actually select by that, but then I’d need to drop multiple partitions when I DROP. Should I just bite the bullet and switch the order?

  3. Is it best for the order of the partition hierarchy to match the order of the PK? In other words, if my parent partitioning is LIST(col2, col3) and my child partitioning is HASH(col4), should I change my PK from 1,2,3,4 to 2,3,4,1 to match, or does it not matter?
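To make the proposed hierarchy concrete, here is a rough sketch of the DDL. All table names, column types, and partition values are hypothetical. Built-in LIST partitioning accepts a single key column, so the sketch assumes LIST(col2, col3) would be expressed as two nested LIST levels, with HASH on col4 underneath.

```sql
-- Hypothetical names and types; only the 4-column PK shape comes from the question.
CREATE TABLE big_table (
    col1 bigint  NOT NULL,
    col2 integer NOT NULL,
    col3 integer NOT NULL,
    col4 bigint  NOT NULL,
    PRIMARY KEY (col1, col2, col3, col4)  -- must include every partition key column
) PARTITION BY LIST (col2);

-- Level 1: one partition per col2 value, itself partitioned by col3.
CREATE TABLE big_table_a PARTITION OF big_table
    FOR VALUES IN (1)
    PARTITION BY LIST (col3);

-- Level 2: one partition per col3 value, itself hash-partitioned on col4.
CREATE TABLE big_table_a_x PARTITION OF big_table_a
    FOR VALUES IN (10)
    PARTITION BY HASH (col4);

-- Level 3: leaf partitions that hold the rows (and the per-partition indexes).
CREATE TABLE big_table_a_x_h0 PARTITION OF big_table_a_x
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE big_table_a_x_h1 PARTITION OF big_table_a_x
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Stands in for DELETE ... WHERE col2 = 1 AND col3 = 10:
-- dropping the level-2 table also drops its hash partitions.
DROP TABLE big_table_a_x;
```

With this ordering a single DROP still removes one (col2, col3) combination, which is the property the second question is weighing against putting HASH at the top.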