Why do two queries run faster than a combined subquery?

I’m running Postgres 11 on Azure.

If I run this query:

    select min(pricedate) + interval '2 days' from pjm.rtprices

It takes 0.153 sec and has the following explain:

    Result  (cost=2.19..2.20 rows=1 width=8)
      InitPlan 1 (returns $0)
        ->  Limit  (cost=0.56..2.19 rows=1 width=4)
              ->  Index Only Scan using rtprices_pkey on rtprices  (cost=0.56..103248504.36 rows=63502562 width=4)
                    Index Cond: (pricedate IS NOT NULL)

If I run this query:

    select pricedate, hour, last_updated, count(1) as N
    from pjm.rtprices
    where pricedate <= '2020-11-06 00:00:00'
    group by pricedate, hour, last_updated
    order by pricedate desc, hour

it takes 5 sec with the following explain:

    GroupAggregate  (cost=738576.82..747292.52 rows=374643 width=24)
      Group Key: pricedate, hour, last_updated
      ->  Sort  (cost=738576.82..739570.68 rows=397541 width=16)
            Sort Key: pricedate DESC, hour, last_updated
            ->  Index Scan using rtprices_pkey on rtprices  (cost=0.56..694807.03 rows=397541 width=16)
                  Index Cond: (pricedate <= '2020-11-06'::date)

However when I run

    select pricedate, hour, last_updated, count(1) as N
    from pjm.rtprices
    where pricedate <= (select min(pricedate) + interval '2 days' from pjm.rtprices)
    group by pricedate, hour, last_updated
    order by pricedate desc, hour

I get impatient after 2 minutes and cancel it.

The explain on the long running query is:

    Finalize GroupAggregate  (cost=3791457.04..4757475.33 rows=3158115 width=24)
      Group Key: rtprices.pricedate, rtprices.hour, rtprices.last_updated
      InitPlan 2 (returns $1)
        ->  Result  (cost=2.19..2.20 rows=1 width=8)
              InitPlan 1 (returns $0)
                ->  Limit  (cost=0.56..2.19 rows=1 width=4)
                      ->  Index Only Scan using rtprices_pkey on rtprices rtprices_1  (cost=0.56..103683459.22 rows=63730959 width=4)
                            Index Cond: (pricedate IS NOT NULL)
      ->  Gather Merge  (cost=3791454.84..4662729.67 rows=6316230 width=24)
            Workers Planned: 2
            Params Evaluated: $1
            ->  Partial GroupAggregate  (cost=3790454.81..3932679.99 rows=3158115 width=24)
                  Group Key: rtprices.pricedate, rtprices.hour, rtprices.last_updated
                  ->  Sort  (cost=3790454.81..3812583.62 rows=8851522 width=16)
                        Sort Key: rtprices.pricedate DESC, rtprices.hour, rtprices.last_updated
                        ->  Parallel Seq Scan on rtprices  (cost=0.00..2466553.08 rows=8851522 width=16)
                              Filter: (pricedate <= $1)

Clearly, the last query is doing a very expensive Gather Merge, so how can I avoid that?

I tried a different approach here:

    with lastday as (
        select distinct pricedate from pjm.rtprices order by pricedate limit 3
    )
    select rtprices.pricedate, hour, last_updated - interval '4 hours' as last_updated, count(1) as N
    from pjm.rtprices
    right join lastday on rtprices.pricedate = lastday.pricedate
    where rtprices.pricedate <= lastday.pricedate
    group by rtprices.pricedate, hour, last_updated
    order by rtprices.pricedate desc, hour

which took just 2 sec with the following explain:

    GroupAggregate  (cost=2277449.55..2285769.50 rows=332798 width=32)
      Group Key: rtprices.pricedate, rtprices.hour, rtprices.last_updated
      CTE lastday
        ->  Limit  (cost=0.56..1629038.11 rows=3 width=4)
              ->  Result  (cost=0.56..105887441.26 rows=195 width=4)
                    ->  Unique  (cost=0.56..105887441.26 rows=195 width=4)
                          ->  Index Only Scan using rtprices_pkey on rtprices rtprices_1  (cost=0.56..105725202.47 rows=64895517 width=4)
      ->  Sort  (cost=648411.43..649243.43 rows=332798 width=16)
            Sort Key: rtprices.pricedate DESC, rtprices.hour, rtprices.last_updated
            ->  Nested Loop  (cost=0.56..612199.22 rows=332798 width=16)
                  ->  CTE Scan on lastday  (cost=0.00..0.06 rows=3 width=4)
                  ->  Index Scan using rtprices_pkey on rtprices  (cost=0.56..202957.06 rows=110933 width=16)
                        Index Cond: ((pricedate <= lastday.pricedate) AND (pricedate = lastday.pricedate))

This last one is all well and good, but if my subquery weren’t extensible to this hack, is there a better way to get my subquery to perform similarly to the one-at-a-time approach?
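One pattern consistent with the plans above (a sketch, not a definitive fix): the planner cannot see the subquery’s value at plan time, so it estimates `pricedate <= $1` against the whole table and picks the parallel seq scan. Running the scalar lookup first and substituting the result as a literal, which is effectively the "two queries" from the title, lets the planner pick the cheap index-scan plan. The date literal below is hypothetical, standing in for whatever the first query returns:

```sql
-- Step 1: cheap index-only scan, per the first plan above (0.153 sec).
select min(pricedate) + interval '2 days' as cutoff from pjm.rtprices;

-- Step 2: feed the returned value back in as a literal so the planner
-- can estimate the range and reuse the fast index-scan plan.
-- '2020-11-06' here is a placeholder for the value step 1 returned.
select pricedate, hour, last_updated, count(1) as N
from pjm.rtprices
where pricedate <= '2020-11-06'
group by pricedate, hour, last_updated
order by pricedate desc, hour;
```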

How to solve this inconsistent timing of MySQL queries?


Table Details :-

SHOW CREATE TABLE foldertable

Result

    CREATE TABLE `foldertable` (
      `serverToken` bigint(1) NOT NULL,
      `folderName` varchar(255) NOT NULL,
      `folderid` varchar(255) NOT NULL,
      `RootFolderPath` longtext NOT NULL,
      `createdTime` datetime NOT NULL,
      `LastEdited` datetime NOT NULL,
      `RootFolderPreviewPath` longtext NOT NULL,
      `userFolderPathName` text NOT NULL,
      `starred` tinyint(1) NOT NULL,
      `trashed` tinyint(1) NOT NULL,
      PRIMARY KEY (`folderid`) USING BTREE,
      UNIQUE KEY `folderid` (`folderid`),
      KEY `serverToken` (`serverToken`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Point to note-

I am always executing the second query while the first transaction is running. The sleep(10) is there intentionally.

Tests

First session

First Query :-

    mysqli_begin_transaction($conn);
    $sql = "SELECT * FROM foldertable where folderid = '1234567' FOR UPDATE";
    $sql = "SELECT * FROM foldertable where serverToken = 1234567 FOR UPDATE";
    mysqli_query($conn, $sql);
    echo mysqli_error($conn);
    echo mysqli_affected_rows($conn);
    sleep(10);
    mysqli_commit($conn);

2nd Query

    INSERT INTO `foldertable` (
        `serverToken`, `folderName`, `folderid`, `RootFolderPath`,
        `createdTime`, `LastEdited`, `RootFolderPreviewPath`,
        `userFolderPathName`, `starred`, `trashed`
    ) VALUES (
        12345, '', 'ABCDE', '', '', '', '', '', '', ''
    );

Result

1 row inserted. (Query took 9.4569 seconds.)

Second session

First Query :-

    mysqli_begin_transaction($conn);
    $sql = "SELECT * FROM foldertable where folderid = '1234567' FOR UPDATE";
    $sql = "SELECT * FROM foldertable where serverToken = 1234567 FOR UPDATE";
    mysqli_query($conn, $sql);
    echo mysqli_error($conn);
    echo mysqli_affected_rows($conn);
    sleep(10);
    mysqli_commit($conn);

2nd Query

    INSERT INTO `foldertable` (
        `serverToken`, `folderName`, `folderid`, `RootFolderPath`,
        `createdTime`, `LastEdited`, `RootFolderPreviewPath`,
        `userFolderPathName`, `starred`, `trashed`
    ) VALUES (
        12345, '', 'ABCDEf', '', '', '', '', '', '', ''
    );

Result

1 row inserted. (Query took 9.4569 seconds.)

Third session

First Query :-

    mysqli_begin_transaction($conn);
    $sql = "SELECT * FROM foldertable where folderid = '1234567' FOR UPDATE";
    $sql = "SELECT * FROM foldertable where serverToken = 1234567 FOR UPDATE";
    mysqli_query($conn, $sql);
    echo mysqli_error($conn);
    echo mysqli_affected_rows($conn);
    sleep(10);
    mysqli_commit($conn);

2nd Query

    INSERT INTO `foldertable` (
        `serverToken`, `folderName`, `folderid`, `RootFolderPath`,
        `createdTime`, `LastEdited`, `RootFolderPreviewPath`,
        `userFolderPathName`, `starred`, `trashed`
    ) VALUES (
        1234567, '', 'ABCDEfg', '', '', '', '', '', '', ''
    );

Result

1 row inserted. (Query took 9.4569 seconds.)

Fourth session

First Query :-

    mysqli_begin_transaction($conn);
    $sql = "SELECT * FROM foldertable where folderid = '1234567' FOR UPDATE";
    $sql = "SELECT * FROM foldertable where serverToken = 1234567 FOR UPDATE";
    mysqli_query($conn, $sql);
    echo mysqli_error($conn);
    echo mysqli_affected_rows($conn);
    sleep(10);
    mysqli_commit($conn);

2nd Query

    INSERT INTO `foldertable` (
        `serverToken`, `folderName`, `folderid`, `RootFolderPath`,
        `createdTime`, `LastEdited`, `RootFolderPreviewPath`,
        `userFolderPathName`, `starred`, `trashed`
    ) VALUES (
        123, '', 'ABCDEfgh', '', '', '', '', '', '', ''
    );

Result

1 row inserted. (Query took 0.0073 seconds.)

Fifth session

First Query :-

    mysqli_begin_transaction($conn);
    $sql = "SELECT * FROM foldertable where folderid = '1234567' FOR UPDATE";
    $sql = "SELECT * FROM foldertable where serverToken = 1234567 FOR UPDATE";
    mysqli_query($conn, $sql);
    echo mysqli_error($conn);
    echo mysqli_affected_rows($conn);
    sleep(10);
    mysqli_commit($conn);

2nd Query

    INSERT INTO `foldertable` (
        `serverToken`, `folderName`, `folderid`, `RootFolderPath`,
        `createdTime`, `LastEdited`, `RootFolderPreviewPath`,
        `userFolderPathName`, `starred`, `trashed`
    ) VALUES (
        1235465, '', 'ABCEfgh', '', '', '', '', '', '', ''
    );

Result

1 row inserted. (Query took 9.5040 seconds.)

Sixth session

First Query :-

    mysqli_begin_transaction($conn);
    $sql = "SELECT * FROM foldertable where folderid = '1234567' FOR UPDATE";
    $sql = "SELECT * FROM foldertable where serverToken = 1234567 FOR UPDATE";
    mysqli_query($conn, $sql);
    echo mysqli_error($conn);
    echo mysqli_affected_rows($conn);
    sleep(10);
    mysqli_commit($conn);

2nd Query

    INSERT INTO `foldertable` (
        `serverToken`, `folderName`, `folderid`, `RootFolderPath`,
        `createdTime`, `LastEdited`, `RootFolderPreviewPath`,
        `userFolderPathName`, `starred`, `trashed`
    ) VALUES (
        1235465, '', 'ABCEfghi', '', '', '', '', '', '', ''
    );

Result

1 row inserted. (Query took 0.0087 seconds.)

I am quite new to MySQL. I am using XAMPP for testing, phpMyAdmin for queries, and Postman for the transaction API.

So how can I solve this strange behaviour?
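The ~9.5 s inserts roughly match the `sleep(10)`, which suggests the INSERTs are waiting on locks taken by the `SELECT ... FOR UPDATE` (note the second `$sql` assignment overwrites the first, so it is the `serverToken` query that actually runs, and a locking read on a secondary index can take gap locks that block inserts of nearby key values). A hedged way to confirm, while the first transaction is inside `sleep(10)`, is to inspect the lock wait from another session. The tables below exist in MySQL 5.7 and MariaDB; MySQL 8.0 moved this information to `performance_schema.data_locks` / `data_lock_waits`:

```sql
-- Run from a separate session while the INSERT is hanging:
-- shows which transaction the INSERT is waiting on and what it is running.
SELECT r.trx_id    AS waiting_trx,
       r.trx_query AS waiting_query,
       b.trx_id    AS blocking_trx,
       b.trx_query AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;
```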

Indexed columns in SQL Server do not appear to work for basic queries according to execution plan

Disclaimer: I’m not a DBA. I have picked up a few things from this board in the past that I’m building from.

I have a table of Google Analytics session start times, with an index on each column. I want to filter for all sessions that were started between two dates. The screenshot below shows the query and the index.

Query text and index properties

The query runs quickly, but I do not believe it’s using the index, based on the execution plan, which both says that there’s a missing index and shows a table scan rather than an index scan:

execution

Why?

Is it because of something about the way I’m searching through the datetime? If instead of looking between dates, I set it equal to a date, the execution plan shows it using the index:

Using index

But it’s not just this table or datetime. Here’s a different table with an index on a varchar column:

metadata index

And a simple query on this one also tells me I’m missing the index:

missing md index

I’m stumped.
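One common explanation (a sketch only, since the actual table and column names are only in the screenshots and the ones below are hypothetical): for a range predicate that matches many rows, a seek on a single-column index would need one key lookup per row to fetch the other selected columns, so the optimizer prefers a scan; an equality predicate matches few rows, so the seek wins. A covering index removes the lookups and makes the range seek attractive:

```sql
-- Hypothetical names; the real ones are in the screenshots.
-- INCLUDE covers every column the query selects, so the range seek
-- needs no key lookups.
CREATE NONCLUSTERED INDEX IX_sessions_start_covering
ON dbo.ga_sessions (session_start)
INCLUDE (session_id, user_id);

-- A sargable half-open range keeps the seek possible:
SELECT session_id, user_id, session_start
FROM dbo.ga_sessions
WHERE session_start >= '2020-01-01'
  AND session_start <  '2020-02-01';
```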

Joining two left join queries

I have two queries. The first one:

    Select VCRNUM_0 As 'Tranx_Number', ACTQTY_0 As 'Quanity', sto.CREDAT_0 As 'Create_Date',
           sto.PROD_DATE_0, sto.PROD_TIME_0, sto.CREUSER_0, sto.ITMREF_0,
           sto.VCRNUMORI_0 as 'Work_Order_number', itm.ITMWEI_0,
           sto.ACTQTY_0 * itm.ITMWEI_0 As Weight, itm.ITMDES1_0 as 'SKU_Description',
           a1.TEXTE_0 As 'Work_Center_Description', gope.CPLWST_0 as 'Work_Center_Number'
    From ZSTOJOU sto
    JOIN MFGOPE gope on sto.VCRNUMORI_0 = gope.MFGNUM_0
    LEFT JOIN ATEXTRA a1 ON gope.CPLWST_0 = a1.IDENT1_0 and a1.CODFIC_0 = 'WORKSTATIO' and a1.ZONE_0 = 'WSTDESAXX' AND a1.LANGUE_0 = 'ENG'
    LEFT JOIN APLSTD APL ON sto.TRSTYP_0 = APL.LANNUM_0 and APL.LAN_0 = 'ENG' and APL.LANCHP_0 = 704
    LEFT JOIN ITMMASTER itm on sto.ITMREF_0 = itm.ITMREF_0
    Left Join ATEXTRA a2 On itm.TSICOD_6 = a2.IDENT2_0 and a1.CODFIC_0 = 'ATABDIV' and a2.ZONE_0 = 'LNGDES' AND a2.LANGUE_0 = 'ENG' And a2.IDENT1_0 = 26
    WHERE sto.TRSTYP_0 = 5 and sto.VCRTYPORI_0 = 10 and sto.VCRTYPREG_0 = 0
      AND gope.CPLWST_0 NOT IN ('22500L','22600L','225C0L','612B0l','611A0L','214G0','81000L','22050')
      and gope.CPLWST_0 is not null

Then I have the second query:

    Select TOP 5 ITMMASTER.ITMREF_0, ITMMASTER.ITMDES1_0, ITMMASTER.TCLCOD_0, a3.TEXTE_0
    from ITMMASTER
    join ITMCATEG cat ON ITMMASTER.TCLCOD_0 = cat.TCLCOD_0
    LEFT Join ATEXTRA a3 ON ITMMASTER.TCLCOD_0 = a3.IDENT1_0 and a3.CODFIC_0 = 'ITMCATEG' and a3.ZONE_0 = 'TCLAXX' AND a3.LANGUE_0 = 'ENG'
    Where ITMMASTER.ITMREF_0 = '2AL00HR0'

My question is: how do I left join the two?
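One way to do this, sketched rather than definitive, is to wrap each query in a CTE (or derived table) and LEFT JOIN them on the column both sides expose, `ITMREF_0`. The column lists below are trimmed for brevity; the full select lists and remaining joins/filters from the two queries above would go where the comments indicate:

```sql
with q1 as (
    select sto.ITMREF_0, sto.VCRNUM_0, sto.ACTQTY_0   -- ...rest of query 1's columns
    from ZSTOJOU sto
    join MFGOPE gope on sto.VCRNUMORI_0 = gope.MFGNUM_0
    -- ...remaining joins and WHERE clause from query 1
),
q2 as (
    select ITMMASTER.ITMREF_0, ITMMASTER.TCLCOD_0, a3.TEXTE_0
    from ITMMASTER
    join ITMCATEG cat on ITMMASTER.TCLCOD_0 = cat.TCLCOD_0
    left join ATEXTRA a3 on ITMMASTER.TCLCOD_0 = a3.IDENT1_0
         and a3.CODFIC_0 = 'ITMCATEG' and a3.ZONE_0 = 'TCLAXX' and a3.LANGUE_0 = 'ENG'
)
select q1.*, q2.TCLCOD_0, q2.TEXTE_0
from q1
left join q2 on q1.ITMREF_0 = q2.ITMREF_0;
```

Note that the `TOP 5` in the second query would need a deterministic meaning (or an `ORDER BY`) before folding it into the join.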

MariaDB views: I want to replace repeated multi-table joins in my queries with a view – are there any issues to watch out for?

Rather than

    SELECT a.pk, b.pk, c.pk, d.name
    FROM a
    JOIN b ON b.pk = a.fk
    JOIN c ON c.pk = b.fk
    JOIN d ON d.pk = c.fk

I can do

    SELECT a_pk, b_pk, c_pk, d_name
    FROM view_a_b_c_d
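For concreteness, the view backing that second query would look something like this (a sketch; the aliases simply match the columns selected above):

```sql
CREATE VIEW view_a_b_c_d AS
SELECT a.pk AS a_pk, b.pk AS b_pk, c.pk AS c_pk, d.name AS d_name
FROM a
JOIN b ON b.pk = a.fk
JOIN c ON c.pk = b.fk
JOIN d ON d.pk = c.fk;
```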

I have a lot of this sort of thing through my code.

I’ve done a performance test and the differences seem negligible, and I feel it would greatly tidy up my codebase and remove a lot of repetition.

But before I commit to that (as it’d be a big change with a lot of work and testing), I want to check that this IS a good thing to do. I didn’t study computer science and have no formal DBA training. I’m also a sole dev working on my own closed-source product. So I don’t get much input from the outside world, unless I strike out and ask for it.

Thank you – any opinions/experience appreciated.

Postgres Combine Summed Values from 2 Queries / Tables into Single Row

Say I have the following two queries, summing values from separate tables.

I would like the sum of recorded time

    SELECT
        SUM(minutes) as recorded_minutes,
        SUM(hours) as recorded_hours
    FROM recorded_time
    WHERE project_id = 1

To be combined with the sum of budgeted time in a single row

    SELECT
        SUM(minutes) as budgeted_minutes,
        SUM(hours) as budgeted_hours
    FROM budgeted_time
    WHERE project_id = 1

Is it possible to do this in a single query?
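One common pattern (a sketch using only the tables and columns shown above): since each aggregate query returns exactly one row, a CROSS JOIN of the two derived tables produces a single combined row.

```sql
SELECT r.recorded_minutes, r.recorded_hours,
       b.budgeted_minutes, b.budgeted_hours
FROM (
    SELECT SUM(minutes) AS recorded_minutes, SUM(hours) AS recorded_hours
    FROM recorded_time WHERE project_id = 1
) r
CROSS JOIN (
    SELECT SUM(minutes) AS budgeted_minutes, SUM(hours) AS budgeted_hours
    FROM budgeted_time WHERE project_id = 1
) b;
```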

How does Google rank different queries?

I’m looking for help from a backend expert or someone who can teach me more about the process search engines use to code, value, or weigh different search queries around the same topic. I also have a few specific questions about how interlinking between different sites works. I want to know if there are resources to learn more about how search engines are generally coded and developed.

I have reviewed Google’s official guidelines but am looking for something beyond Googlebots, crawlers, etc. I’m aware of the different factors that affect a site’s rankings, but I’m looking for something more technical about the algorithm itself.

Thanks for taking the time to read this; I appreciate any answers I receive.

What can cause higher CPU time and duration for a given set of queries in trace(s) ran on two separate environments?

I’m troubleshooting a performance issue in a SQL Server DR environment for a customer. They are running queries that consistently take longer in their environment than in our QA environment. After analyzing traces performed in both environments with the same parameters/filters, the same version of SQL Server (2016 SP2), and the exact same database, we observed that both environments were picking the same execution plan(s) for the queries in question, and the number of reads/writes was close in both environments. However, the total duration of the process in question and the CPU time logged in the trace were significantly higher in the customer environment: total duration of all processes was around 18 seconds in our QA environment versus over 80 seconds for the customer, and our CPU time was close to 10 seconds while theirs was also over 80 seconds. Also worth mentioning, both environments are currently configured with MAXDOP 1.

The customer has less memory (~100GB vs 120GB) and slower disks (10k HDD vs SSD) than our QA environment, but more CPUs. Both environments are dedicated to this activity and should have little/no external load that wouldn’t match. I don’t have all the details on the CPU architecture they are using; I’m waiting for some of that information now. The customer has confirmed they have excluded SQL Server and the data/log files from their virus scanning. Obviously there could be a ton of issues in the hardware configuration.

I’m currently waiting to see a recent snapshot of their wait stats and system DMVs; the data we originally received didn’t appear to show any major CPU, memory, or disk latency pressure. I recently asked them to check whether the Windows power setting was in performance or balanced mode, though I’m not certain that would have the impact we’re seeing if the CPUs were being throttled.

My question is: what factors can affect CPU time and ultimately total duration? Is CPU time, as shown in a SQL trace, based primarily on the speed of the processors, or are there other factors I should be taking into consideration? The fact that both are generating the same query plans, and all other things are as close to equal as possible, makes me think it’s related to the hardware SQL Server is installed on.
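For the wait-stats comparison mentioned above, a simple sketch is to snapshot the top waits in each environment while the workload runs; markedly higher CPU-related waits (e.g. `SOS_SCHEDULER_YIELD`) or signal wait time in one environment but not the other would point toward processor speed or throttling. The exclusion list below is illustrative, not exhaustive:

```sql
-- Top waits since last service restart (or since the stats were cleared);
-- compare the same query's output from both environments.
SELECT TOP 10
       wait_type,
       wait_time_ms,
       signal_wait_time_ms,   -- time spent waiting for a CPU after being signaled
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TO_FLUSH')
ORDER BY wait_time_ms DESC;
```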

With respect to differential privacy, how to find the global sensitivity of queries like ‘maximum height’, ‘average height’, etc.

As much as I have understood, for any query f(x), we need to take the maximum of |f(x) − f(y)| over all pairs of neighboring databases x and y.

Please explain how to find the global sensitivity of queries like ‘average height’ or ‘maximum height’.
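Using that definition, the usual approach is to bound how much changing one record can move the query’s output. A worked sketch, assuming heights are clipped to a known range $[a, b]$ and the database has $n$ records, with neighbors defined as databases differing in one record (without such a bound, both sensitivities are unbounded):

```latex
% Global sensitivity: \Delta f = \max_{x \sim y} |f(x) - f(y)|,
% where x and y are neighboring databases differing in one record.

% Average height: replacing one record changes the sum by at most (b - a),
% so the mean changes by at most
\Delta f_{\text{avg}} = \frac{b - a}{n}

% Maximum height: replacing one record can move the maximum anywhere in
% [a, b] (e.g. the unique tallest person is swapped out), so
\Delta f_{\max} = b - a
```

Note the contrast: the average’s sensitivity shrinks with $n$, which is why it can be released with modest noise, while the maximum’s sensitivity stays at the full range regardless of database size.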