Shortcut for ‘Select Top 1000 Rows’

There has to be a better way!

When writing a subquery, I need all the rows from a table listed out so I can manipulate what I want grouped/summed/etc. The only way I have found to do this is to find that table in the SQL Object Explorer, right-click it, hit ‘Select Top 1000 Rows’, and then copy and paste the generated query into my subquery.

Isn’t there a shortcut or something so I don’t have to leave my query to do that? Why can’t I just select the table name and right-click or something?

Please someone tell me there is a better way!!
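For reference, the menu item just generates a plain query of this shape, with every column listed out (all names below are placeholders); what I want is a faster way to produce it:

SELECT TOP (1000) [Column1]
      ,[Column2]
      ,[Column3]
  FROM [MyDatabase].[dbo].[MyTable]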

Just deleted 50,000,000 rows from a PostgreSQL table and the DB is still very slow

Sorry for my poor English.

I have a Postgres DB running on Amazon RDS (db.t3.small), with Django as a backend. I made a mistake and created 50,000,000 rows. When I figured it out (because queries on that table were ultra slow), I deleted them all, but the queries I make on that table are still super slow. It only has 300 rows now.

Do I have to clear some cache? Do I have to wait for something? The RDS configuration in AWS is the default.

The Postgres engine version is 12.5, and it also has PostGIS installed.

I checked for vacuum issues and ran this command:

SELECT relname AS TableName,
       n_live_tup AS LiveTuples,
       n_dead_tup AS DeadTuples,
       last_autovacuum AS Autovacuum,
       last_autoanalyze AS Autoanalyze
FROM pg_stat_user_tables;

The table with the problem says:

'appointment_timeslot', 1417, 0, datetime.datetime(2021, 7, 21, 18, 13, 8, 193967, tzinfo=<UTC>), datetime.datetime(2021, 7, 21, 18, 13, 30, 551750, tzinfo=<UTC>) 

I checked for the indexes that Django creates automatically on that table and found 4:

[('appointment_timeslot_pkey', 'CREATE UNIQUE INDEX appointment_timeslot_pkey ON public.appointment_timeslot USING btree (id)'),
 ('appointment_timeslot_home_visit_id_62df4faf', 'CREATE INDEX appointment_timeslot_home_visit_id_62df4faf ON public.appointment_timeslot USING btree (home_visit_id)'),
 ('appointment_timeslot_office_id_6871b47b', 'CREATE INDEX appointment_timeslot_office_id_6871b47b ON public.appointment_timeslot USING btree (office_id)'),
 ('appointment_timeslot_time_range_id_715578fa', 'CREATE INDEX appointment_timeslot_time_range_id_715578fa ON public.appointment_timeslot USING btree (time_range_id)')]
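From what I’ve read, a plain DELETE marks rows dead, but the table and index files keep their size on disk even after autovacuum reclaims the space internally. A sketch of a full cleanup, assuming the table above (note that VACUUM FULL takes an exclusive lock and blocks all access to the table while it runs):

-- rewrite the table and rebuild its indexes at their minimal size
VACUUM FULL appointment_timeslot;

-- refresh the planner statistics afterwards
ANALYZE appointment_timeslot;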

Finding rows that are missing a specific value in a set

The database that I am using is MySQL. I have an options table that roughly looks like this:

id          int(10) PRIMARY
annex_id    int(10)
title       varchar(255)
sort_order  INT(10) NOT NULL

Every annex (annex_id) has a bunch of options enumerated by the sort_order column, which varies from 1 to N and is maintained by humans.

Due to a software bug from a couple of months ago, a lot of annexes do not have an option with sort_order == 1 but rather start from 2 or 3.

What I want to do is get all the annex_id values whose options do not include sort_order == 1. I’ll show an example table because I feel like I’m explaining it wrong (which could be the reason I can’t come up with a query for it, and I’ve been at it for hours).

ID   Annex ID   Title             Sort Order
12   567        Title #1          1
13   567        Title #2          2
14   567        Title #3          3
15   890        Another title #1  2
16   890        Another title #2  3

From the given table, I pretty much want to craft a query that would return annex id 890, because it doesn’t have a row with sort order == 1.

The table has hundreds of thousands of rows, so going at it by hand is insane. I’ve been dealing with databases for years, and either I’m overthinking it or I don’t know what, but I can’t come up with an efficient query for this job.

Is it even possible to have a query like this?
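A sketch of the aggregate approach that should do this, assuming the table is named options: group the rows by annex and keep only the groups whose smallest sort_order is not 1.

SELECT annex_id
FROM options
GROUP BY annex_id
HAVING MIN(sort_order) <> 1;

With an index on (annex_id, sort_order), MySQL can resolve each group’s MIN from the index alone, which should keep this fast even at hundreds of thousands of rows.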

SQL Query to Return Rows that Don’t Exist as Blanks

I have a table whose rows are numbered by a Location that can run from 1 to 100. The issue is that some of the numbers don’t exist.

When I run a query, the following results are returned:

Unit,Location,Name,Description
1,1,a,xxxx
1,2,a,xxxx
1,3,a,xxxx
1,6,a,xxxx
1,7,a,xxxx
1,9,a,xxxx
1,10,a,xxxx

How can I get the following result instead:

Unit,Location,Name,Description
1,1,a,xxxx
1,2,a,xxxx
1,3,a,xxxx
1,4,-,----
1,5,-,----
1,6,a,xxxx
1,7,a,xxxx
1,8,-,----
1,9,a,xxxx
1,10,a,xxxx
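A sketch of one way to get this, assuming a DBMS with recursive CTEs (SQL Server spells it plain WITH) and a table named units: generate the numbers 1 through 100, left-join the real rows onto them, and substitute placeholders for the gaps.

WITH RECURSIVE numbers AS (
    SELECT 1 AS Location
    UNION ALL
    SELECT Location + 1 FROM numbers WHERE Location < 100
)
SELECT 1 AS Unit,
       n.Location,
       COALESCE(u.Name, '-')           AS Name,
       COALESCE(u.Description, '----') AS Description
FROM numbers n
LEFT JOIN units u
       ON u.Location = n.Location
      AND u.Unit = 1
ORDER BY n.Location;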

How do I join rows from the same table with slightly different info?

We have a new sector of our company that is being put into our employee record feed. However, this sector is listed as contractors, as there are still parts of the sector that need access to systems requiring domain access… so…

I have:

TABLE EMP

EmpID, email, INFO1, INFO2, INFO3…

and for Tom Smith 5763, tom.smith@co1.com, blah blah, blah blah, blah blah…

and also for Tom Smith ZZ98401, tom.smith@co2.com, null, null, null…

Not everyone on the feed is like this; only about 10% are. Most are in case number 2.

What I am looking for is the easiest way to match up the two rows associated with Tom and fill in as much info on Tom as I can, because some fields come from one account and other fields come from the other. The only thing that can accurately link the two rows is the beginning of the email address – this is standardized. So there should be a few thousand people (rows of duplicates) on my feed that I can combine if it can be done right.

I would also like to capture both EmpIDs somehow in the same row and both email addresses.
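A sketch of the kind of self-join that might work, using standard SQL string functions (SUBSTRING … FROM … FOR and POSITION; SQL Server would use SUBSTRING and CHARINDEX instead). The EMP layout is from above; everything else is an assumption. It pairs rows sharing the same email local part, keeps both EmpIDs and both emails, and COALESCEs the info fields so the populated account wins:

SELECT a.EmpID                    AS EmpID_1,
       b.EmpID                    AS EmpID_2,
       a.email                    AS email_1,
       b.email                    AS email_2,
       COALESCE(a.INFO1, b.INFO1) AS INFO1,
       COALESCE(a.INFO2, b.INFO2) AS INFO2,
       COALESCE(a.INFO3, b.INFO3) AS INFO3
FROM EMP a
JOIN EMP b
  ON SUBSTRING(a.email FROM 1 FOR POSITION('@' IN a.email) - 1)
   = SUBSTRING(b.email FROM 1 FOR POSITION('@' IN b.email) - 1)
 AND a.email < b.email;  -- each pair appears once, in a stable order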

Copy PostgreSQL DB rows from one table to another depending on an id

I want to restructure my database, and therefore I want to move some columns from one table to another. The problem is that I do not want to simply insert the column data as new rows in the destination table. Both tables have an id, and I want to copy the column data by matching the ids of the two tables.

Table1:

id bigint not null
a  character varying
b  date
...

Table2:

id bigint not null
c  character varying
d  date
...

I have tried INSERT INTO, to no avail, because this would add new rows to the destination table, which is not what I want. Same with SELECT INTO.

insert into table2 as t2 (c, d)
select t1.a, t1.b
from table1 as t1
where t1.id = t2.id
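Since the target rows already exist, this looks like an update rather than an insert. PostgreSQL’s UPDATE supports a FROM clause that joins in another table; a minimal sketch with the tables above:

UPDATE table2 AS t2
SET    c = t1.a,
       d = t1.b
FROM   table1 AS t1
WHERE  t1.id = t2.id;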

Non-deterministic performance of query on select, from 1s to 60s on table with 1 billion rows

I’m trying to investigate why the performance of this query is so non-deterministic. It can take anywhere from 1 second to 60 seconds and above. The nature of the query is to select a "time window" and get all rows from within that time window.

Here’s the query in question, running on a table of approximately 1 billion rows:

SELECT CAST(extract(EPOCH from ts)*1000000 as bigint) as ts
     , ticks
     , quantity
     , side
FROM order_book
WHERE ts >= TO_TIMESTAMP(1618882633073383/1000000.0)
    AND ts < TO_TIMESTAMP(1618969033073383/1000000.0)
    AND zx_prod_id = 0
ORDER BY ts ASC, del desc

The values within TO_TIMESTAMP will keep sliding forward as I walk the whole table. Here is the EXPLAIN ANALYZE output for the same query on two different time windows:

Slow Performance

Gather Merge  (cost=105996.20..177498.48 rows=586308 width=18) (actual time=45196.559..45280.769 rows=539265 loops=1)
  Workers Planned: 6
  Workers Launched: 6
  Buffers: shared hit=116386 read=42298
  ->  Sort  (cost=104996.11..105240.40 rows=97718 width=18) (actual time=45169.717..45176.775 rows=77038 loops=7)
        Sort Key: (((date_part('epoch'::text, _hyper_16_214_chunk.ts) * '1000000'::double precision))::bigint), _hyper_16_214_chunk.del DESC
        Sort Method: quicksort  Memory: 9327kB
        Worker 0:  Sort Method: quicksort  Memory: 8967kB
        Worker 1:  Sort Method: quicksort  Memory: 9121kB
        Worker 2:  Sort Method: quicksort  Memory: 9098kB
        Worker 3:  Sort Method: quicksort  Memory: 9075kB
        Worker 4:  Sort Method: quicksort  Memory: 9019kB
        Worker 5:  Sort Method: quicksort  Memory: 9031kB
        Buffers: shared hit=116386 read=42298
        ->  Result  (cost=0.57..96897.07 rows=97718 width=18) (actual time=7.475..45131.932 rows=77038 loops=7)
              Buffers: shared hit=116296 read=42298
              ->  Parallel Index Scan using _hyper_16_214_chunk_order_book_ts_idx on _hyper_16_214_chunk  (cost=0.57..95187.01 rows=97718 width=18) (actual time=7.455..45101.670 rows=77038 loops=7)
                    Index Cond: ((ts >= '2021-04-22 01:34:31.357179+00'::timestamp with time zone) AND (ts < '2021-04-22 02:34:31.357179+00'::timestamp with time zone))
                    Filter: (zx_prod_id = 0)
                    Rows Removed by Filter: 465513
                    Buffers: shared hit=116296 read=42298
Planning Time: 1.107 ms
JIT:
  Functions: 49
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 9.273 ms, Inlining 0.000 ms, Optimization 2.008 ms, Emission 36.235 ms, Total 47.517 ms
Execution Time: 45335.178 ms

Fast Performance

Gather Merge  (cost=105095.94..170457.62 rows=535956 width=18) (actual time=172.723..240.628 rows=546367 loops=1)
  Workers Planned: 6
  Workers Launched: 6
  Buffers: shared hit=158212
  ->  Sort  (cost=104095.84..104319.16 rows=89326 width=18) (actual time=146.702..152.849 rows=78052 loops=7)
        Sort Key: (((date_part('epoch'::text, _hyper_16_214_chunk.ts) * '1000000'::double precision))::bigint), _hyper_16_214_chunk.del DESC
        Sort Method: quicksort  Memory: 11366kB
        Worker 0:  Sort Method: quicksort  Memory: 8664kB
        Worker 1:  Sort Method: quicksort  Memory: 8986kB
        Worker 2:  Sort Method: quicksort  Memory: 9116kB
        Worker 3:  Sort Method: quicksort  Memory: 8858kB
        Worker 4:  Sort Method: quicksort  Memory: 9057kB
        Worker 5:  Sort Method: quicksort  Memory: 6611kB
        Buffers: shared hit=158212
        ->  Result  (cost=0.57..96750.21 rows=89326 width=18) (actual time=6.145..127.591 rows=78052 loops=7)
              Buffers: shared hit=158122
              ->  Parallel Index Scan using _hyper_16_214_chunk_order_book_ts_idx on _hyper_16_214_chunk  (cost=0.57..95187.01 rows=89326 width=18) (actual time=6.124..114.023 rows=78052 loops=7)
                    Index Cond: ((ts >= '2021-04-22 01:34:31.357179+00'::timestamp with time zone) AND (ts < '2021-04-22 02:34:31.357179+00'::timestamp with time zone))
                    Filter: (zx_prod_id = 4)
                    Rows Removed by Filter: 464498
                    Buffers: shared hit=158122
Planning Time: 0.419 ms
JIT:
  Functions: 49
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 10.405 ms, Inlining 0.000 ms, Optimization 2.185 ms, Emission 39.188 ms, Total 51.778 ms
Execution Time: 274.413 ms

I interpreted this output as most of the blame lying on the parallel index scan. Notably, the slow run shows Buffers: shared hit=116296 read=42298, i.e. over 40,000 buffers read from disk, while the fast run is served entirely from shared buffers.

At first, I tried to raise work_mem to 1 GB and shared_buffers to 24 GB, thinking that maybe it couldn’t fit all the stuff it needed in RAM, but that didn’t seem to help.

Next, I tried creating an index on (zx_prod_id, ts), thinking that the filter on the parallel index scan might be taking a while, but that didn’t seem to do anything either.
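For reference, that index was created roughly like this (the index name is illustrative; order_book appears to be a TimescaleDB hypertable, judging by the _hyper_… chunk names, so the command propagates to each chunk):

CREATE INDEX order_book_zx_prod_id_ts_idx
    ON order_book (zx_prod_id, ts);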

I’m no database expert, and so I’ve kind of exhausted the limits of my knowledge.

Thanks in advance for any suggestions!

What is best practice for referencing data rows within stored procedures – via PK, code column or data value?

Suppose you have a table for colours with columns:

  • id = automatically incrementing integer, primary key
  • code = short code reference for the colour, unique
  • colour = human-readable name of colour, unique

Example values might be:

  • 1, BL, Blue
  • 2, GR, Green

Now imagine you have a stored procedure that, at some point, needs to reference this table. Let’s say the business logic says to obtain the colour "Green". To achieve this, you could have any of the following three WHERE clauses:

  • WHERE id = 2
  • WHERE code = 'GR'
  • WHERE colour = 'Green'

Now, if the system is designed such that it is agreed that a code value, once created, never changes, then, in my view, that is the best column to reference because:

  • It is an alternate key
  • It is human-readable for people who maintain the code
  • It will not be impacted when the business decides to change the colour value to ‘Sea Green’

However, if a legacy table lacks such code values, what, in your opinion, is best practice? To reference the id column, or the colour column?

If you reference the id column, the code is not readable unless you also add comments, and you shouldn’t have to comment simple things like this. It sucks figuring out what statements like WHERE id NOT IN (1, 7, 17, 24, 56) mean.

I’m not sure how often, in reality, the id value might change, but consider running a script during development that inserts new colours, then deletes them and inserts some more. If your stored procedure references the id values from that last set of colours inserted, but in your next environment you skip the step that inserted the colours which ended up deleted, then the id values won’t match in that environment. Bad practice, but it can happen: a developer develops their script on a dev instance without thinking that the id values will conflict with production (which, for example, may have had additional colours created manually by the business before your colour creation script runs).

If you reference the colour column, you run the risk that, if the business does ask to update the description from ‘Green’ to ‘Sea Green’, your procedure will begin to fail.

I suppose a further solution is to implement the code column when you need it, if it isn’t there already – probably the best solution?
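A sketch of that last option, with syntax and example values assumed: backfill a stable code column onto the legacy table, then have procedures reference it.

-- add the alternate key column (type/length assumed)
ALTER TABLE colours ADD code VARCHAR(10);

-- backfill a code for each existing row
UPDATE colours SET code = 'BL' WHERE colour = 'Blue';
UPDATE colours SET code = 'GR' WHERE colour = 'Green';

-- enforce uniqueness once every row has a value
ALTER TABLE colours ADD CONSTRAINT uq_colours_code UNIQUE (code);

Procedures can then use WHERE code = 'GR' and survive both renamed colour values and re-seeded id values.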

Select all rows that have all of the following column values

I have the following tables Genres, Films, Directors. They have the following schema:

CREATE TABLE GENRES(
    GID INTEGER PRIMARY KEY,
    GENRE VARCHAR(20) UNIQUE NOT NULL
);

CREATE TABLE Directors(
    DID INTEGER PRIMARY KEY,
    First_Name VARCHAR(20) NOT NULL,
    Last_Name VARCHAR(20) NOT NULL
);

CREATE TABLE Films(
    FID INTEGER PRIMARY KEY,
    Title VARCHAR(45) UNIQUE NOT NULL,
    DID INTEGER NOT NULL,
    GID INTEGER NOT NULL,
    FOREIGN KEY (DID) REFERENCES Directors(DID),
    FOREIGN KEY (GID) REFERENCES Genres(GID)
);

I want to write a query that will allow me to select all of the director information for every director who has made at least one movie in each of the same genre(s) as another director. For example, if Stanley Kubrick has made films in the genres ‘Sci-Fi’, ‘Thriller’, and ‘Crime’, I want to select all the directors who have made at least 1 sci-fi AND 1 thriller AND 1 crime film.

I’ve tried the query seen below, but this gives me directors who have made at least 1 sci-fi OR 1 thriller OR 1 crime film.

SELECT DISTINCT D.DID, D.First_Name, D.Last_Name
FROM Directors D
LEFT JOIN Films F ON F.DID = D.DID
LEFT JOIN Genres G ON G.GID = F.GID
WHERE G.Genre IN (
  SELECT DISTINCT G1.Genre
  FROM Genres G1
  LEFT JOIN Films F1 ON F1.GID = G1.GID
  LEFT JOIN Directors D1 ON D1.DID = F1.DID
  WHERE D1.First_Name = 'Stanley'
  AND D1.Last_Name = 'Kubrick'
);

Additionally, I am not able to check beforehand which genres the director in question has been involved with. The query should work with the only given information being the director’s first and last name.
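A sketch of the relational-division approach that should give the AND semantics, using only the schema above: count the named director’s distinct genres once, then keep only the directors whose own films cover that many of them.

SELECT D.DID, D.First_Name, D.Last_Name
FROM Directors D
JOIN Films F ON F.DID = D.DID
WHERE F.GID IN (
    -- the genres the named director has worked in
    SELECT F1.GID
    FROM Films F1
    JOIN Directors D1 ON D1.DID = F1.DID
    WHERE D1.First_Name = 'Stanley'
      AND D1.Last_Name = 'Kubrick'
)
GROUP BY D.DID, D.First_Name, D.Last_Name
HAVING COUNT(DISTINCT F.GID) = (
    -- how many distinct genres that is
    SELECT COUNT(DISTINCT F2.GID)
    FROM Films F2
    JOIN Directors D2 ON D2.DID = F2.DID
    WHERE D2.First_Name = 'Stanley'
      AND D2.Last_Name = 'Kubrick'
);

Note this also returns the named director themselves; add a condition on D.DID to exclude them if that’s not wanted.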

Combine duplicate rows on column1, sum column2

I have the following data.

Id      ParentId    ProductId   Quantity
215236  19297       16300319    60
215221  19297       16314611    6
215234  19297       16314670    1    <- Duplicate productid
215235  19297       16314670    2    <- Duplicate productid
215195  19297       16314697    20
215205  19297       16321820    75
215216  19297       16329252    10
215233  19297       16331834    9
215224  19297       16519280    40

Selecting unique records is easy, and grouping the data on ProductId is also possible. What I need is a result that contains one row per unique ProductId for the same ParentId, with the quantities summed up.

The result should look like this.

Id      ParentId    ProductId   Quantity
215236  19297       16300319    60
215221  19297       16314611    6
215234  19297       16314670    3
215195  19297       16314697    20
215205  19297       16321820    75
215216  19297       16329252    10
215233  19297       16331834    9
215224  19297       16519280    40

The quantities of the duplicates

215234  19297       16314670    1    <- Duplicate productid
215235  19297       16314670    2    <- Duplicate productid

should result in

215234  19297       16314670    3
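A minimal sketch of the aggregation, assuming the data lives in a table named order_lines: group on ParentId and ProductId, sum the quantities, and keep the lowest Id of each group (matching the surviving 215234 above).

SELECT MIN(Id)       AS Id,        -- the lowest Id of each duplicate group survives
       ParentId,
       ProductId,
       SUM(Quantity) AS Quantity
FROM order_lines
GROUP BY ParentId, ProductId;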