Radix sort slower than Quick sort?

I would like to demonstrate that radix sort is sometimes better than quicksort. In this example I am using the program below:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include <time.h>
#include <math.h>

int cmpfunc (const void * a, const void * b) {
   return ( *(int*)a - *(int*)b );
}

void bin_radix_sort(int *a, const long long size, int digits) {
    assert(digits % 2 == 0);

    long long count[2];
    int *b = malloc(size * sizeof(int));
    int exp = 0;

    while(digits--) {
        // Count elements
        count[0] = count[1] = 0;
        for (int i = 0; i < size; i++)
            count[(a[i] >> exp) & 0x01]++;

        // Cumulative sum
        count[1] += count[0];

        // Build output array
        for (int i = size - 1; i >= 0; i--)
            b[--count[(a[i] >> exp) & 0x01]] = a[i];

        exp++;
        int *p = a; a = b; b = p;
    };

    free(b);
}

struct timespec start;

void tic() {
    timespec_get(&start, TIME_UTC);
}

double toc() {
    struct timespec stop;
    timespec_get(&stop, TIME_UTC);
    return stop.tv_sec - start.tv_sec + (
        stop.tv_nsec - start.tv_nsec
    ) * 1e-9;
}

int main(void) {
    const long long n = 1024 * 1024 * 50;
    printf("Init memory (%lld MB)...\n", n / 1024 / 1024 * sizeof(int));

    int *data = calloc(n, sizeof(int));

    printf("Sorting n = %lld data elements...\n", n);

    long long O;
    tic();
    O = n * log(n);
    qsort(data, n, sizeof(data[0]), cmpfunc);
    printf("%lld %lf s\n", O, toc());

    int d = 6;
    tic();
    O = d * (n + 2);
    bin_radix_sort(data, n, d);
    printf("%lld %lf s\n", O, toc());
}

It performs as follows:

$ gcc bench.c -lm
$ ./a.out
Init memory (200 MB)...
Sorting n = 52428800 data elements...
931920169 1.858300 s
314572812 1.541998 s

I know that quicksort will perform in O(n log n) while radix sort will be in O(d (n + r)) ~= O(6 n). For n = 52428800, log(n) ≈ 17. I would then expect radix sort to be about 3 times faster than quicksort…
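For reference, here is the back-of-the-envelope arithmetic behind that expectation, written out as a stand-alone sketch. It only reproduces the operation counts that the benchmark above prints, and assumes log means the natural logarithm, as in the C code:

#include <stdio.h>
#include <math.h>

int main(void) {
    const long long n = 1024 * 1024 * 50;      // 52428800 elements
    const int d = 6;                           // radix passes used above

    long long quick = n * log(n);              // n * ln(n): 931920169 (ln n ~ 17.77)
    long long radix = (long long)d * (n + 2);  // 6 * (n + 2): 314572812

    // Ratio of the two operation counts: roughly 3.
    printf("quick/radix = %.2f\n", (double)quick / (double)radix);
}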

This is not what I observe.

What am I missing?

Aurora PostgreSQL database using a slower query plan than a normal PostgreSQL for an identical query?

Following the migration of an application and its database from a classical PostgreSQL database to an Amazon Aurora RDS PostgreSQL database (both on version 9.6), we have found that a specific query runs much slower (around 10 times slower) on Aurora than on PostgreSQL.

Both databases have the same configuration, for both the hardware and the pg_conf.

The query itself is fairly simple. It is generated from our backend written in Java and using jOOQ for writing the queries:

with "all_acp_ids"("acp_id") as (     select acp_id from temp_table_de3398bacb6c4e8ca8b37be227eac089 )  select distinct "public"."f1_folio_milestones"."acp_id",      coalesce("public"."sa_milestone_overrides"."team",      "public"."f1_folio_milestones"."team_responsible")  from "public"."f1_folio_milestones"  left outer join      "public"."sa_milestone_overrides" on (         "public"."f1_folio_milestones"."milestone" = "public"."sa_milestone_overrides"."milestone"          and "public"."f1_folio_milestones"."view" = "public"."sa_milestone_overrides"."view"          and "public"."f1_folio_milestones"."acp_id" = "public"."sa_milestone_overrides"."acp_id" ) where "public"."f1_folio_milestones"."acp_id" in (     select "all_acp_ids"."acp_id" from "all_acp_ids" ) 

Here temp_table_de3398bacb6c4e8ca8b37be227eac089 is a single-column table, while f1_folio_milestones (17 million entries) and sa_milestone_overrides (around 1 million entries) are similarly designed tables with indexes on all the columns used in the LEFT OUTER JOIN.

When we run it on the normal PostgreSQL database, it generates the following query plan:

Unique  (cost=4802622.20..4868822.51 rows=8826708 width=43) (actual time=483.928..483.930 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.005 rows=1 loops=1)
  ->  Sort  (cost=4802598.60..4824665.37 rows=8826708 width=43) (actual time=483.927..483.927 rows=4 loops=1)
        Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Left Join  (cost=46051.06..3590338.34 rows=8826708 width=43) (actual time=483.905..483.917 rows=4 loops=1)
              Hash Cond: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))
              ->  Nested Loop  (cost=31.16..2572.60 rows=8826708 width=37) (actual time=0.029..0.038 rows=4 loops=1)
                    ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.009..0.010 rows=1 loops=1)
                          Group Key: all_acp_ids.acp_id
                          ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.006..0.007 rows=1 loops=1)
                    ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..12.65 rows=5 width=37) (actual time=0.018..0.025 rows=4 loops=1)
                          Index Cond: (acp_id = all_acp_ids.acp_id)
              ->  Hash  (cost=28726.78..28726.78 rows=988178 width=34) (actual time=480.423..480.423 rows=987355 loops=1)
                    Buckets: 1048576  Batches: 1  Memory Usage: 72580kB
                    ->  Seq Scan on sa_milestone_overrides  (cost=0.00..28726.78 rows=988178 width=34) (actual time=0.004..189.641 rows=987355 loops=1)
Planning time: 3.561 ms
Execution time: 489.223 ms

And it goes pretty smoothly as one can see — less than a second for the query. But on the Aurora instance, this happens:

Unique  (cost=2632927.29..2699194.83 rows=8835672 width=43) (actual time=4577.348..4577.350 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.001..0.001 rows=1 loops=1)
  ->  Sort  (cost=2632903.69..2654992.87 rows=8835672 width=43) (actual time=4577.348..4577.348 rows=4 loops=1)
        Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))
        Sort Method: quicksort  Memory: 25kB
        ->  Merge Left Join  (cost=1321097.58..1419347.08 rows=8835672 width=43) (actual time=4488.369..4577.330 rows=4 loops=1)
              Merge Cond: ((f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))
              ->  Sort  (cost=1194151.06..1216240.24 rows=8835672 width=37) (actual time=0.039..0.040 rows=4 loops=1)
                    Sort Key: f1_folio_milestones.view, f1_folio_milestones.milestone, f1_folio_milestones.acp_id
                    Sort Method: quicksort  Memory: 25kB
                    ->  Nested Loop  (cost=31.16..2166.95 rows=8835672 width=37) (actual time=0.022..0.028 rows=4 loops=1)
                          ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.006..0.006 rows=1 loops=1)
                                Group Key: all_acp_ids.acp_id
                                ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.003..0.004 rows=1 loops=1)
                          ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..10.63 rows=4 width=37) (actual time=0.011..0.015 rows=4 loops=1)
                                Index Cond: (acp_id = all_acp_ids.acp_id)
              ->  Sort  (cost=126946.52..129413.75 rows=986892 width=34) (actual time=4462.727..4526.822 rows=448136 loops=1)
                    Sort Key: sa_milestone_overrides.view, sa_milestone_overrides.milestone, sa_milestone_overrides.acp_id
                    Sort Method: quicksort  Memory: 106092kB
                    ->  Seq Scan on sa_milestone_overrides  (cost=0.00..28688.92 rows=986892 width=34) (actual time=0.003..164.348 rows=986867 loops=1)
Planning time: 1.394 ms
Execution time: 4583.295 ms

It does have a lower total cost, but takes almost 10 times as long as before!

Disabling merge joins makes Aurora revert to a hash join, which gives the expected execution time — but permanently disabling it is not an option. Curiously though, disabling nested loops gives an even better result while still using a merge join…

Unique  (cost=3610230.74..3676431.05 rows=8826708 width=43) (actual time=2.465..2.466 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.004 rows=1 loops=1)
  ->  Sort  (cost=3610207.14..3632273.91 rows=8826708 width=43) (actual time=2.464..2.464 rows=4 loops=1)
        Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))
        Sort Method: quicksort  Memory: 25kB
        ->  Merge Left Join  (cost=59.48..2397946.87 rows=8826708 width=43) (actual time=2.450..2.455 rows=4 loops=1)
              Merge Cond: (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text)
              Join Filter: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text))
              ->  Merge Join  (cost=40.81..2267461.88 rows=8826708 width=37) (actual time=2.312..2.317 rows=4 loops=1)
                    Merge Cond: (f1_folio_milestones.acp_id = all_acp_ids.acp_id)
                    ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..2223273.29 rows=17653416 width=37) (actual time=0.020..2.020 rows=1952 loops=1)
                    ->  Sort  (cost=40.24..40.74 rows=200 width=32) (actual time=0.011..0.012 rows=1 loops=1)
                          Sort Key: all_acp_ids.acp_id
                          Sort Method: quicksort  Memory: 25kB
                          ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.008..0.008 rows=1 loops=1)
                                Group Key: all_acp_ids.acp_id
                                ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.005..0.005 rows=1 loops=1)
              ->  Materialize  (cost=0.42..62167.38 rows=987968 width=34) (actual time=0.021..0.101 rows=199 loops=1)
                    ->  Index Scan using sa_milestone_overrides_acp_id_index on sa_milestone_overrides  (cost=0.42..59697.46 rows=987968 width=34) (actual time=0.019..0.078 rows=199 loops=1)
Planning time: 5.500 ms
Execution time: 2.516 ms
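For reference, the merge-join and nested-loop toggling mentioned above was done with the standard PostgreSQL planner switches; a sketch of the session-level commands follows (this is an assumption about the exact invocation, and whether Aurora treats these settings identically to community PostgreSQL is part of the question):

-- Sketch: session-level planner switches used to compare the plans above
-- (each was toggled on its own, not both at once).
SET enable_mergejoin = off;   -- Aurora falls back to the hash join plan
SET enable_nestloop = off;    -- keeps the merge join but gives the fast plan above

-- SET LOCAL scopes the change to a single transaction instead of the whole session:
BEGIN;
SET LOCAL enable_mergejoin = off;
-- ... run the query ...
COMMIT;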

We have asked the AWS support team; they are still looking into the issue, but we are wondering what could cause it. What could explain such a difference in behaviour?

While looking at some of the documentation for the database, I read that Aurora favors cost over time — and hence it uses the query plan that has the lowest cost.

But as we can see, it’s far from being optimal given its response time… Is there a threshold or a setting that could make the database use a more expensive — but faster — query plan?

Would Triple DES-X with 9 keys be much slower than standard Triple DES?

Since a single hardware pass of an XOR with a 64-bit key is very fast, would Triple DES-X with nine 64-bit keys, used in the following manner, be virtually identical to 3DES in terms of code size, memory consumption, and execution speed?

XOR (Key 1) DES (Key 2) XOR (Key 3)

XOR (Key 4) DES (Key 5) XOR (Key 6)

XOR (Key 7) DES (Key 8) XOR (Key 9)
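To make the construction concrete, here is a minimal sketch in C of the composition described above; des_encrypt_block is a hypothetical placeholder for a real single-DES implementation, and the key types are simplified:

#include <stdint.h>

/* Hypothetical single-DES primitive: encrypts one 64-bit block under a DES
 * key (56 effective bits). Placeholder only, not a real implementation. */
uint64_t des_encrypt_block(uint64_t block, uint64_t des_key);

/* One DES-X stage: pre-whitening XOR, DES, post-whitening XOR. */
static uint64_t desx_stage(uint64_t block, uint64_t k_pre,
                           uint64_t k_des, uint64_t k_post) {
    return des_encrypt_block(block ^ k_pre, k_des) ^ k_post;
}

/* The 9-key construction from the question: three DES-X stages chained. */
uint64_t triple_desx_encrypt(uint64_t block, const uint64_t k[9]) {
    block = desx_stage(block, k[0], k[1], k[2]);
    block = desx_stage(block, k[3], k[4], k[5]);
    block = desx_stage(block, k[6], k[7], k[8]);
    return block;
}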

Additionally, would it be significantly stronger? Would it still suffer from the same block size based vulnerability of DES-X?

Generalization of code is slower than particular case

I wrote the following Mathematica module:

QNlalternative[NN_, l_, f_] := Module[{s, wz, w, z, j, lvec},
   s = 0;
   Do[
      wz = Table[weightsNodesQ1l@lvec@i, {i, NN}];
      w = Table[wz[[i]][[1, All]], {i, NN}];
      z = Table[wz[[i]][[2, All]], {i, NN}];
      s = s + Function[Sum[(f @@ (Table[z[[i]][[j[i]]], {i, NN}]))*(Times @@ (Table[
                w[[i]][[j[i]]], {i, NN}])), ##]] @@
               Table[{j[k], 2^lvec[k] + 1}, {k, NN}],
      ##
      ] & @@ Table[{lvec[i], l + NN - 1 - Total@Table[lvec[k], {k, i - 1}]}, {i, NN}];
   Return[s]
   ];

This module calls the following helper definitions:

sumPrime[v_List] := First[v]/2 + Total[Delete[v, 1]]

weightsNodes[NN_] := Module[{w, z},
   w = Table[4/NN*sumPrime[Table[1/(1 - n^2)*Cos[n*k*Pi/NN], {n, 0., NN, 2}]], {k, 0., NN}];
   z = Table[Cos[k*Pi/NN], {k, 0., NN}];
   Return[{w, z}]
   ];

weightsNodesQ1l[l_] := weightsNodes[2^l]

This code is related to a mathematical problem I am solving (in a modified form). When I first thought about how to write the module QNlalternative, I wrote the particular case NN = 5 in a sloppy manner, using repeated statements, as follows:

Q5l[l_, f_] :=
  Module[{s, wzl1, wzl2, wzl3, wzl4, wzl5, wl1, zl1, wl2, zl2, wl3,
    zl3, wl4, zl4, wl5, zl5},
   s = 0;
   Do[
    wzl1 = weightsNodesQ1l[l1];
    wzl2 = weightsNodesQ1l[l2];
    wzl3 = weightsNodesQ1l[l3];
    wzl4 = weightsNodesQ1l[l4];
    wzl5 = weightsNodesQ1l[l5];
    wl1 = wzl1[[1, All]]; zl1 = wzl1[[2, All]];
    wl2 = wzl2[[1, All]]; zl2 = wzl2[[2, All]];
    wl3 = wzl3[[1, All]]; zl3 = wzl3[[2, All]];
    wl4 = wzl4[[1, All]]; zl4 = wzl4[[2, All]];
    wl5 = wzl5[[1, All]]; zl5 = wzl5[[2, All]];
    s = s + Sum[f[zl1[[i1]], zl2[[i2]], zl3[[i3]], zl4[[i4]], zl5[[i5]]]*
        wl1[[i1]]*wl2[[i2]]*wl3[[i3]]*wl4[[i4]]*wl5[[i5]], {i1, 1,
        2^l1 + 1}, {i2, 1, 2^l2 + 1}, {i3, 1, 2^l3 + 1}, {i4, 1,
        2^l4 + 1}, {i5, 1, 2^l5 + 1}],
    {l1, 1, l + 5 - 1}, {l2, 1, l + 5 - 1 - l1}, {l3, 1,
     l + 5 - 1 - l1 - l2}, {l4, 1, l + 5 - 1 - l1 - l2 - l3}, {l5, 1,
     l + 5 - 1 - l1 - l2 - l3 - l4}
    ];
   Return[s]
   ];

The module Q5l is much faster than QNlalternative:

AbsoluteTiming[QNlalternative[5, 6, Sin[Plus[##]]^2 &]]
(* {19.4634, 6213.02} *)

AbsoluteTiming[Q5l[6, Sin[Plus[##]]^2 &]]
(* {6.64357, 6213.02} *)

Why is QNlalternative slower? Which step of the generalization of Q5l to an arbitrary NN is too slow?
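One factor worth ruling out first (a hedged sketch, not an identified cause of the difference): both versions recompute weightsNodesQ1l for the same small arguments many times inside their Do loops, so memoizing it with the standard Mathematica idiom makes that cost identical in both versions and removes it from the comparison:

(* Memoized variant: each result is cached the first time it is computed. *)
weightsNodesQ1l[l_] := weightsNodesQ1l[l] = weightsNodes[2^l]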

MariaDB Inserts are getting slower and slower (7x tables, ~ 2.8M and 200MB)

I have an auto-increment column on each table: one unique ID consisting of 10 digits. Each table has ~6 bigint columns (values are small, from 1 to 60k) and from 0 to 4 varchar columns (up to ~500 characters, on average 5 to 50 characters).

I have been fighting with this for months and can't get it to production stage :( Basically, it drops from ~170 inserts (from the app's perspective) to ~40 just after ~200-500k inserts.

This is nothing; I've worked with a DB that held trillions of records, with auto-increment and huge varchars (that was a paid solution, however :().

I have already tweaked the config many times, but I still reach the point where the server is using ~950% CPU and .NET Core ~25% (of all cores).

The machine has an i9 9900k (8c/16t), 64 GB RAM, and 2x 2 TB NVMe drives.

I can't even run a 5-minute API test, as it won't be able to process all the data from the queue :( (the API can accept ~20k/s).

Buffers, read IO, InnoDB commit-related tweaks, etc. were applied; nothing seems to be working.

It looks like it cannot, for some reason, handle even this little data, and I cannot figure out why (I have never had any real experience with free databases, so I only assume it should be able to insert 300k records within 60 seconds and sustain this up to ~10 TB).

Is there some theoretical verification or explanation of why KDTree gets slower when the dimensionality gets higher?

This post and this post indicate that when the dimensionality gets higher, KDTree gets slower.

Per the scikit-learn docs, the KDTree becomes inefficient as D (the number of dimensions) grows very large (say, $D > 20$): this is one manifestation of the so-called "curse of dimensionality".

Is there some theoretical verification or explanation of why KDTree gets slower when the dimensionality gets higher?
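One standard back-of-the-envelope argument (a sketch of the usual curse-of-dimensionality reasoning, not a property of any particular KDTree implementation): suppose the points are uniformly distributed in the unit hypercube $[0,1]^D$. An axis-aligned box that contains a fraction $p$ of the points must have edge length

$$e_D(p) = p^{1/D}, \qquad e_{20}(0.01) = 0.01^{1/20} \approx 0.79 .$$

So in 20 dimensions, a region holding only 1% of the data still spans about 79% of the range of every coordinate; each axis-aligned split of a KDTree therefore excludes very little of the space, and a nearest-neighbour query ends up descending into most branches instead of pruning them.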

TP-LINK WN821N made my internet slower?? Why… What am I doing wrong?

I downloaded the driver for Windows 10 on my laptop and my internet speed went from 20 to 10, basically cut in half instead of improved. I saw some posts about different drivers, kernel and BIOS info, and other possible solutions, but I am not familiar with most of it. Any help would be greatly appreciated.

How to deliberately slow internet connection to better understand UX on slower connections?

I would like to better understand UX on slower connections.

Is there a set of tools, or instructions on how to achieve this, on Mac/Windows?

FYI, my software isn't a website or app; it's desktop software that runs on Mac and Windows.

Perhaps there are tools available to help? (I don’t mind a DIY solution though)

Why is RAID 0 slower than a single NVMe drive?

I created an MD RAID 0 array using two NVMe drives. I realize that using two different drives, rather than identical drives, is not optimal, but any drive I could buy new would be faster than my 1-year-old drive. I am using a Patriot Scorch 512 GB NVMe and a Mushkin Helix 256 GB. I made a 32 GB partition on each, created the MD RAID 0 array using 16k chunks, then formatted it to ext4 to test.

kde-swiebel@T5600-kde:~$ cat /proc/mdstat
Personalities : [raid0]
unused devices: <none>
kde-swiebel@T5600-kde:~$ sudo mdadm -Cv -l0 -c16 -n2 /dev/md0 /dev/nvme0n1p3 /dev/nvme1n1p2
mdadm: /dev/nvme0n1p3 appears to contain an ext2fs file system
       size=34277376K  mtime=Wed Dec 31 17:00:00 1969
mdadm: /dev/nvme1n1p2 appears to contain an ext2fs file system
       size=34277376K  mtime=Wed Dec 31 17:00:00 1969
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
kde-swiebel@T5600-kde:~$ cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 nvme1n1p2[1] nvme0n1p3[0]
      68487168 blocks super 1.2 16k chunks

unused devices: <none>
kde-swiebel@T5600-kde:~$ sudo mkfs.ext4 -F /dev/md0
mke2fs 1.44.6 (5-Mar-2019)
Discarding device blocks: done
Creating filesystem with 17121792 4k blocks and 4284416 inodes
Filesystem UUID: f9826f0f-7ba4-40b2-9c9f-861379e82703
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

Benchmarks from Gnome-disk-utility 3.32.1 (1000 samples, 1 MB sample size):

Device            Avg read       Avg write      Avg access time
Patriot Scorch    1362.2 MB/s    228.7 MB/s     0.07 msec
Mushkin Helix     1529.5 MB/s    1003.1 MB/s    0.06 msec
/dev/md0          1751.3 MB/s    26.9 MB/s      0.05 msec

Changed to 100 samples, 100 MB sample size, read-only:

/dev/md0          2927.6 MB/s    N/A            0.04 msec

The read speed, while faster than either drive alone, is not too impressive, but when I changed the test to read-only it improved greatly. The write speed, however, was horrible: about 10% of the slowest drive.

I tried removing the RAID array and recreating it with different chunk sizes (4K, 32K, 128K), but little changed with each. Write speed was between 11 MB/s and 32 MB/s.
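One way to take the GUI benchmark out of the equation is a direct, cache-bypassing write test against the array and against each member drive separately. A rough sketch, assuming the filesystems are mounted at hypothetical mount points such as /mnt/md0, /mnt/scorch and /mnt/helix (adjust paths as needed):

# Direct 4 GiB sequential write to the RAID 0 array, bypassing the page cache
sudo dd if=/dev/zero of=/mnt/md0/ddtest.bin bs=1M count=4096 oflag=direct conv=fsync status=progress

# Same test against a filesystem on each drive individually, for comparison
sudo dd if=/dev/zero of=/mnt/scorch/ddtest.bin bs=1M count=4096 oflag=direct conv=fsync status=progress
sudo dd if=/dev/zero of=/mnt/helix/ddtest.bin bs=1M count=4096 oflag=direct conv=fsync status=progress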

According to Phoronix and other sites I’m doing this right and should see much more gain.

Using mdadm version 4.1-1ubuntu1, kernel 5.0.0-25-generic, running on a Dell T5600 with dual Xeon E5-2609 (quad-core 2.4 GHz).

Any ideas what I’ve done wrong?

Do database actions inside a transaction become slower as the transaction grows?

I have a PostgreSQL database running a transaction. Inside that transaction I process around 160 records from an Excel file. As it runs, I see the processing become slower with each record (from 0.05 s for the first to 0.20 s for the last).

Could this be because of the transaction, or should I look elsewhere?