NOT EXISTS with two subquery fields that match 2 fields in main query

Background: Two different document types in a document management system. Both Doc Type A and Doc Type B have a Ticket # and a Ticket Date. What we're looking for: Doc Type A docs that don't have a matching Doc Type B doc (NOT EXISTS) with the same Ticket # and the same Ticket Date. There are Doc Type B docs that have the same Ticket # but NOT the same Ticket Date; we want to ignore those. Seems simple, but I am stuck. So far what I have is something like this:

SELECT DISTINCT ki110.keyvaluechar AS "Ticket #",
       ki101.keyvaluedate AS "Ticket Date"
FROM itemdata
LEFT OUTER JOIN hsi.keyitem110 ki110 ON (itemdata.itemnum = ki110.itemnum)
LEFT OUTER JOIN hsi.keyitem101 ki101 ON (itemdata.itemnum = ki101.itemnum)
WHERE ki101.keyvaluedate BETWEEN '01-01-2021' AND '01-31-2021'
  AND (itemdata.itemtypenum = 178)  -- this is Doc Type A
  AND NOT EXISTS
      (select ki110.keyvaluechar, ki101.keyvaluedate
       from itemdata, keyitem110 ki110, keyitem101 ki101
       where --(itemdata.itemnum = ki110.itemnum) --Ticket #

-- ** the problem is here for Date: I need to say the Date in the Doc Type B doc is not the same as the Date in the Doc Type A doc, using ki101.keyvaluedate

       AND itemdata.itemtypenum = 183)  -- this is Doc Type B
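For what it's worth, one way to express "no Doc Type B doc with the same Ticket # AND the same Ticket Date" is a correlated NOT EXISTS that uses fresh aliases inside the subquery and compares both columns back to the outer row. This is only an untested sketch built from the table and alias names in the query above:

```sql
SELECT DISTINCT ki110.keyvaluechar AS "Ticket #",
       ki101.keyvaluedate AS "Ticket Date"
FROM itemdata
JOIN hsi.keyitem110 ki110 ON itemdata.itemnum = ki110.itemnum
JOIN hsi.keyitem101 ki101 ON itemdata.itemnum = ki101.itemnum
WHERE itemdata.itemtypenum = 178                       -- Doc Type A
  AND ki101.keyvaluedate BETWEEN '01-01-2021' AND '01-31-2021'
  AND NOT EXISTS (
        SELECT 1
        FROM itemdata b                                -- fresh aliases, not ki110/ki101
        JOIN hsi.keyitem110 b110 ON b.itemnum = b110.itemnum
        JOIN hsi.keyitem101 b101 ON b.itemnum = b101.itemnum
        WHERE b.itemtypenum = 183                      -- Doc Type B
          AND b110.keyvaluechar = ki110.keyvaluechar   -- same Ticket #
          AND b101.keyvaluedate = ki101.keyvaluedate   -- same Ticket Date
      );
```

The key point is that the subquery must not reuse the outer aliases: with its own aliases and both correlation predicates, a B doc only suppresses an A doc when Ticket # and Ticket Date both match, so B docs with a matching Ticket # but a different date are ignored.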

Query to find the second highest row in a subquery

The goal is to send notifications about customer updates in a ticketing system, but only for the first one if there are consecutive updates from the customer.

This is a simplified version of the query I'm using to get the data I need. There are a few more columns in the original query, and the subquery on threads is more or less required so I can also identify whether this is a new ticket or an existing one that was updated (in the case of an update, the role on the latest thread will be customer):

SELECT t.ref, m.role
FROM tickets t
LEFT JOIN threads th ON (t.id = th.ticket_id)
LEFT JOIN members m ON (th.member_id = m.id)
WHERE th.id IN (SELECT MAX(id)
                FROM threads
                WHERE ticket_id = t.id)

It will return a list of tickets so the app can send notifications based on that:

+------------+----------+
| ref        | role     |
+------------+----------+
| 210117-001 | customer |
| 210117-002 | staff    |
+------------+----------+

Now, I want to send only a single notification if there are multiple consecutive updates from the customer.

Question:

How can I pull the last row and also the one before it, to identify whether this is a consecutive reply from the customer?

I was thinking about GROUP_CONCAT and then parsing the output in the app, but tickets can have many threads, so that's not optimal; there are also a few more fields in the query, so it would violate the ONLY_FULL_GROUP_BY SQL mode.

db<>fiddle here
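If MySQL 8+ is an option, one common approach to "last and second-to-last thread per ticket" is ROW_NUMBER(). The following is only a sketch against the tables shown above (same assumed schema, untested):

```sql
SELECT ref,
       MAX(CASE WHEN rn = 1 THEN role END) AS last_role,
       MAX(CASE WHEN rn = 2 THEN role END) AS prev_role
FROM (
    SELECT t.ref, m.role,
           ROW_NUMBER() OVER (PARTITION BY th.ticket_id
                              ORDER BY th.id DESC) AS rn
    FROM tickets t
    JOIN threads th ON t.id = th.ticket_id
    JOIN members m  ON th.member_id = m.id
) x
WHERE rn <= 2
GROUP BY ref;
```

A ticket where both last_role and prev_role come back as 'customer' would then be a consecutive customer reply, and the app could skip the notification for it.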

MySQL – multiple counts on relations based on conditions – JOIN VS. SUBQUERY

I don't want to share my exact DB structure, so let's assume this analogy:

-- categories --
id
name

-- products --
id
name
cat_id

I then have SQL like this:

SELECT categories.*,
       COUNT(CASE WHEN products.column1 = something1
                   AND products.column2 = something2 THEN 1 END) AS count1,
       COUNT(CASE WHEN products.column3 = something3 THEN 1 END) AS count2
FROM categories
LEFT JOIN products ON products.cat_id = categories.id
GROUP BY categories.id

The problem here is that the GROUP BY is taking too long; it's the difference between a 0.2 s query and a 2.5 s query.
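One commonly suggested alternative is to move each count into a correlated scalar subquery, so no join plus GROUP BY over the whole products set is needed. A sketch using the analogy tables above (something1/2/3 remain placeholders from the question):

```sql
SELECT c.*,
       (SELECT COUNT(*)
        FROM products p
        WHERE p.cat_id = c.id
          AND p.column1 = something1
          AND p.column2 = something2) AS count1,
       (SELECT COUNT(*)
        FROM products p
        WHERE p.cat_id = c.id
          AND p.column3 = something3) AS count2
FROM categories c;
```

With a suitable index starting with products(cat_id, ...), each subquery can be answered per category without materializing and aggregating the full join result; whether this beats the JOIN version depends on the data distribution, so it's worth benchmarking both.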

How do you generate random row order in a subquery?

I know other answers here (and here) say to order by newid(). However, if I am selecting TOP 1 in a subquery (so as to generate a random selection per row in the outer query), using newid() yields the same result each time.

That is:

select *,
       (select top 1 [value]
        from lookupTable
        where [code] = 'TEST'
        order by newid())
from myTable

… yields the same lookupTable.value on each row returned from myTable.
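A workaround often suggested for SQL Server is to correlate the subquery with the outer row so the optimizer cannot compute it once and reuse the result. The sketch below uses a no-op predicate referencing an assumed key column myTable.id (a hypothetical name, adjust to the real schema); treat it as an outline rather than a guaranteed fix:

```sql
select m.*,
       x.[value]
from myTable m
cross apply (select top 1 [value]
             from lookupTable
             where [code] = 'TEST'
               and m.id = m.id        -- no-op correlation: forces per-row evaluation
             order by newid()) x;
```

The `m.id = m.id` predicate changes nothing logically, but because the subquery now references the outer table, the engine evaluates it once per outer row instead of caching a single random pick.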

postgresql: filter with subquery and then insert

Given the following query, my plan was to filter with the subquery first and then INSERT:

INSERT INTO cals_new (listing_id, date, available, price, timestamp)
(SELECT listing_id, date, available, price, timestamp
 FROM cals c
 WHERE NOT EXISTS (SELECT
                   FROM update
                   WHERE id = c.id));

But EXPLAIN gives the following:

QUERY PLAN

Insert on cals_new  (cost=513077.42..5338467.65 rows=109800833 width=44)
  ->  Merge Anti Join  (cost=513077.42..5338467.65 rows=109800833 width=44)
        Merge Cond: (c.id = update.id)
        ->  Index Scan using cals_pkey on cals c  (cost=0.57..3958576.01 rows=113119496 width=44)
        ->  Sort  (cost=513076.85..521373.51 rows=3318663 width=8)
              Sort Key: update.id
              ->  Seq Scan on update  (cost=0.00..62881.63 rows=3318663 width=8)

If I understand the EXPLAIN output correctly, it does the expensive INSERT first and only filters afterwards.

How do I optimise the query so that it works as expected? More specifically, so it filters with the subquery first and then does the INSERT?
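For comparison, the same anti-join can also be spelled as a LEFT JOIN ... IS NULL, which some people find easier to line up against the plan. This is logically equivalent to the NOT EXISTS above only as long as update.id is not nullable (untested sketch, same table names as the question):

```sql
INSERT INTO cals_new (listing_id, date, available, price, timestamp)
SELECT c.listing_id, c.date, c.available, c.price, c.timestamp
FROM cals c
LEFT JOIN update u ON u.id = c.id
WHERE u.id IS NULL;   -- keep only rows with no match in "update"
```

Note that EXPLAIN plans are read bottom-up: the Merge Anti Join feeding the Insert node means the filtering happens before rows reach the insert step, even though Insert is printed first.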

Thank you!

Postgres UPDATE with data from another table – index only scan used for correlated subquery but not join

Context

I’m tuning a bulk UPDATE which selects from another (large) table. My intention is to provide a covering index to support an index only scan of the source table. I realise the source table must be vacuumed to update its visibility map.

My investigations so far suggest the optimiser elects to index only scan the source table when the UPDATE uses a correlated subquery, but appears to use a standard index scan when a join is used (UPDATE...FROM). I’m asking this question to understand why.

I provide a simplified example here to illustrate the differences.

I’m using Postgres 9.6.8, but get very similar plans for 10.11 and 11.6. I have reproduced the plans on a vanilla 9.6 Postgres installation in Docker using the official image, and also on db<>fiddle here.

Setup

CREATE TABLE lookup (
    surrogate_key   BIGINT PRIMARY KEY,
    natural_key     TEXT NOT NULL UNIQUE,
    data            TEXT NOT NULL);

INSERT INTO lookup
SELECT id, 'nk'||id, random()::text
FROM generate_series(1,400000) id;

CREATE UNIQUE INDEX lookup_ix ON lookup(natural_key, surrogate_key);

VACUUM ANALYSE lookup;

CREATE TABLE target (
    target_id               BIGINT PRIMARY KEY,
    lookup_natural_key      TEXT NOT NULL,
    lookup_surrogate_key    BIGINT,
    data                    TEXT NOT NULL);

INSERT INTO target (target_id, lookup_natural_key, data)
SELECT id+1000, 'nk'||id, random()::text
FROM generate_series(1,1000) id;

ANALYSE target;

UPDATE using join

EXPLAIN (ANALYSE, VERBOSE, BUFFERS)
UPDATE target
SET lookup_surrogate_key = surrogate_key
FROM lookup
WHERE lookup_natural_key = natural_key;

Standard index scan on lookup_ix – so heap blocks are read from lookup table:

Update on public.target  (cost=0.42..7109.00 rows=1000 width=54) (actual time=76.688..76.688 rows=0 loops=1)
  Buffers: shared hit=8514 read=550 dirtied=16
  ->  Nested Loop  (cost=0.42..7109.00 rows=1000 width=54) (actual time=0.050..62.493 rows=1000 loops=1)
        Output: target.target_id, target.lookup_natural_key, lookup.surrogate_key, target.data, target.ctid, lookup.ctid
        Buffers: shared hit=3479 read=535
        ->  Seq Scan on public.target  (cost=0.00..19.00 rows=1000 width=40) (actual time=0.013..7.691 rows=1000 loops=1)
              Output: target.target_id, target.lookup_natural_key, target.data, target.ctid
              Buffers: shared hit=9
        ->  Index Scan using lookup_ix on public.lookup  (cost=0.42..7.08 rows=1 width=22) (actual time=0.020..0.027 rows=1 loops=1000)
              Output: lookup.surrogate_key, lookup.ctid, lookup.natural_key
              Index Cond: (lookup.natural_key = target.lookup_natural_key)
              Buffers: shared hit=3470 read=535
Planning time: 0.431 ms
Execution time: 76.826 ms

UPDATE using correlated subquery

EXPLAIN (ANALYSE, VERBOSE, BUFFERS)
UPDATE target
SET lookup_surrogate_key = (
    SELECT surrogate_key
    FROM lookup
    WHERE lookup_natural_key = natural_key);

Index only scan on lookup_ix as intended:

Update on public.target  (cost=0.00..4459.00 rows=1000 width=47) (actual time=52.947..52.947 rows=0 loops=1)
  Buffers: shared hit=8050 read=15 dirtied=16
  ->  Seq Scan on public.target  (cost=0.00..4459.00 rows=1000 width=47) (actual time=0.052..40.306 rows=1000 loops=1)
        Output: target.target_id, target.lookup_natural_key, (SubPlan 1), target.data, target.ctid
        Buffers: shared hit=3015
        SubPlan 1
          ->  Index Only Scan using lookup_ix on public.lookup  (cost=0.42..4.44 rows=1 width=8) (actual time=0.013..0.019 rows=1 loops=1000)
                Output: lookup.surrogate_key
                Index Cond: (lookup.natural_key = target.lookup_natural_key)
                Heap Fetches: 0
                Buffers: shared hit=3006
Planning time: 0.130 ms
Execution time: 52.987 ms

db<>fiddle here

I understand that the queries are not logically identical (different behaviour when there are no/multiple rows in lookup for a given natural_key), but I'm surprised by the different usage of lookup_ix.

Can anyone explain why the join version could not use an index only scan please?

Postgres how to use changing value in a row inside of subquery

I have a query that looks like this:

select ipam_networks.ipam_id, ipam_networks.net_name,
       ipam_networks.net_cidr, zone, count(*)
from ipam_addresses
left outer join ipam_networks on ipam_addresses.parent_id = ipam_networks.ipam_id
where ipam_addresses.ip_state != 'DHCP_FREE'
group by ipam_networks.ipam_id, ipam_networks.net_name, ipam_networks.net_cidr;

I am attempting to update the query to also output the percentage of hosts in the ipam_addresses table that have the column host_id set to 0 and whose address is within the current CIDR network range.

However, I don’t know how to compare the current value of “address” (ip address) to the corresponding CIDR network range.

I have changed the query to look like this:

select ipam_networks.ipam_id, ipam_networks.net_name,
       ipam_networks.net_cidr, zone, count(*),
       to_char(100 * (select count(*) from ipam_addresses
                      where host_id = 0
                        and address << ipam_networks.net_cidr)
                   / (select count(*) from ipam_addresses
                      where address << ipam_networks.net_cidr),
               '999D99%') a
from ipam_addresses
left outer join ipam_networks on ipam_addresses.parent_id = ipam_networks.ipam_id
where ipam_addresses.ip_state != 'DHCP_FREE'
group by ipam_networks.ipam_id, ipam_networks.net_name, ipam_networks.net_cidr;

The updated query throws ERROR: division by zero, due to the second half of the division: (select count(*) from ipam_addresses where address << ipam_networks.net_cidr).

Is there any way I can use the current value for net_cidr successfully for each row?

I am far from an expert in SQL so bear with me.
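A common way to sidestep the division-by-zero error in Postgres is NULLIF, which turns a zero denominator into NULL, so the whole percentage comes out NULL for empty networks instead of raising an error. A sketch of just the percentage expression from the query above:

```sql
to_char(100 * (select count(*) from ipam_addresses
               where host_id = 0
                 and address << ipam_networks.net_cidr)
            / NULLIF((select count(*) from ipam_addresses
                      where address << ipam_networks.net_cidr), 0),
        '999D99%')
```

NULLIF(x, 0) returns NULL when x = 0 and x otherwise, and any division by NULL yields NULL rather than an error; wrap the result in COALESCE if a literal 0 is preferred over NULL.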

PostgreSQL query with subquery doesn't work as expected

I’m trying to make a query that should return records based on a subquery. Here’s an example:

select … where tablename like ('%subquery_result%');

The subquery is generating the string as expected:

appldb=# select '''%_p' || replace(substring(CAST(current_date - INTERVAL '1 MONTH' AS text), 1, 7), '-', '_') || '%''';
   ?column?
---------------
 '%_p2019_09%'
(1 row)

Using the string above, records are returned:

select tablename from pg_tables
where tablename like '%_p2019_09%' limit 2;
      tablename
---------------------
 part_p2019_09_26
 part_p2019_09_29
(2 rows)

But when I use the complete query I have no return:

appldb=# select tablename
from pg_tables
where tablename like ( select '''%_p' || replace(substring(CAST(current_date - INTERVAL '1 MONTH' AS text), 1, 7), '-', '_') || '%''' );
 tablename
-----------
(0 rows)

I have already tried removing the parentheses around the subquery, but then the query errored out.

Did I miss any step to get the WHERE clause to interpret the subquery?
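Note that the generated string literally contains single-quote characters: the value is '%_p2019_09%' including the quotes, so LIKE is matching against a pattern that starts and ends with a quote character, which no table name contains. The quotes were only needed when pasting the pattern as a literal; when the pattern comes from a subquery they should be dropped. A sketch of the corrected query:

```sql
select tablename
from pg_tables
where tablename like (
    select '%_p' ||
           replace(substring(CAST(current_date - INTERVAL '1 MONTH' AS text), 1, 7), '-', '_') ||
           '%'
);
```

Here the subquery yields the bare pattern %_p2019_09%, which matches the same rows as the hand-written literal did.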

Postgresql using subquery select in ORDER BY clause

Is it possible to order by a selected column name?

SELECT * FROM tablex
ORDER BY quote_ident((
    select column_name from information_schema.columns
    where table_name = tablex AND column_name LIKE 'Index')) ASC;

The output of this is not ordered according to the columnname Index.

So basically my question is whether I can use a select query in the ORDER BY clause, like this simple one:

SELECT * FROM tablex ORDER BY (SELECT Index FROM table WHERE....); 

What did I overlook?
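For context: ORDER BY with a string expression sorts every row by the same constant value, so a subquery that returns a column name cannot change the sort order. Choosing the sort column dynamically generally requires building the statement, e.g. in PL/pgSQL. An untested sketch using the names from the question (tablex and the column matched by 'Index'):

```sql
DO $$
DECLARE
    col text;
BEGIN
    -- look up the column name, then splice it in safely with %I
    SELECT column_name INTO col
    FROM information_schema.columns
    WHERE table_name = 'tablex' AND column_name LIKE 'Index';

    EXECUTE format('SELECT * FROM tablex ORDER BY %I ASC', col);
END $$;
```

In practice this would live in a set-returning function (or be assembled client-side), since a DO block cannot hand the result set back to the caller; format's %I placeholder does the identifier quoting that quote_ident was being used for.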

Select splat in a subquery

Obviously, 'select *' is never a good idea.

HOWEVER, I have taken a job with an org that allowed this cancer to spread.

They have huge queries, using select * in SUB-QUERIES, when the coder only needed 3 or 4 columns.

select t.field1, t.field2, t.field3, x.field4 ...
from table t
left join (select * from table where .... ) x
    on x.field5 = t.field5
left join (select * from table where .... ) y
    on y.field6 = t.field6
left join (select * from table where .... ) z
    on z.field7 = t.field7

Performance on this beast is a dog.

The databases we pull from we don’t own, so I don’t have rights to get an estimated or actual execution plan.

Before I start rewriting these queries, is the query optimizer on the M$ SQL Server smart enough to translate the splat into just the needed columns? Or do I start targeting one query a day at lunch?

Thank you for your time and consideration.
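For reference, the column-pruned rewrite of the pattern above would look like this (field names are the placeholders from the question; each derived table lists only the columns that are projected or joined on):

```sql
select t.field1, t.field2, t.field3, x.field4
from table t
left join (select field4, field5 from table where ....) x
    on x.field5 = t.field5
left join (select field6 from table where ....) y
    on y.field6 = t.field6
left join (select field7 from table where ....) z
    on z.field7 = t.field7
```

SQL Server's optimizer can usually prune unreferenced columns out of simple derived tables on its own, but constructs such as DISTINCT, TOP, or aggregates inside the subquery can block that, so the explicit rewrite is the safer bet when the plan cannot be inspected.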