Performance of select from a 3d list – Mathematica slower than Python

I am creating a random 3D data set in Mathematica 12.1 and then selecting all points that lie within a certain range along one axis.

I do the same in Python (same computer, Python 3.8.5, NumPy 1.19.2).

RESULT: It seems that Python can do the selection much faster (1.7 s) than Mathematica (5.2 s). What is the reason for that? For the selection in Mathematica I used the fastest solution I know of, which is the one by Carl Woll (see here, at the bottom).

SeedRandom[1];
coordinates = RandomReal[10, {100000000, 3}];

selectedCoordinates =
    Pick[coordinates,
     Unitize@Clip[coordinates[[All, 1]], {6, 7}, {0, 0}],
     1]; // AbsoluteTiming

{5.16326, Null}

Dimensions[coordinates]
{100000000, 3}

Dimensions[selectedCoordinates]
{10003201, 3}


import time
import numpy as np

np.random.seed(1)
coordinates = np.random.random_sample((100000000, 3)) * 10

start = time.time()
selectedCoordinates = coordinates[(coordinates[:, 0] > 6) & (coordinates[:, 0] < 7)]
end = time.time()

print(end - start)
print(coordinates.shape)
print(selectedCoordinates.shape)

1.6979997158050537
(100000000, 3)
(9997954, 3)
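For reference, the boolean-mask selection above can be sanity-checked at a smaller scale. The sketch below (my own, using n = 1,000,000 instead of 10^8 so it runs quickly) verifies that roughly 10% of points uniform on [0, 10) fall in the slab 6 < x < 7:

```python
import numpy as np

# Smaller n than in the question so the check runs quickly.
n = 1_000_000
np.random.seed(1)
coordinates = np.random.random_sample((n, 3)) * 10

# Boolean-mask selection on the first axis, as in the question.
mask = (coordinates[:, 0] > 6) & (coordinates[:, 0] < 7)
selected = coordinates[mask]

# Uniform on [0, 10), so the slab (6, 7) should hold about 10% of the points.
fraction = selected.shape[0] / n
print(fraction)
```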

Using the doublewrite buffer is 8x slower on an SSD (compared with 2x-3x on an HDD)

I understand that the doublewrite buffer enhances data reliability, so it makes transactions slower. But it is surprising that the slowdown is so severe on the newest Samsung 980 Pro (M.2 PCIe 4.0, about $400 for 1 TB).

Configuration (all other parameters are defaults):
CPU: AMD Ryzen 3900XT
MEM: 64GB, 3200MHz
OS: Ubuntu 20.10, all disks are ext4
MySQL: 8.0.22

Why does this happen? Did I hit a performance bug?
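For anyone reproducing the comparison: the doublewrite buffer is controlled by the `innodb_doublewrite` server variable. A minimal my.cnf fragment to benchmark with it off (safe only on test instances, since disabling it weakens crash recovery) would be:

```ini
# my.cnf (test instance only): disable the InnoDB doublewrite buffer
[mysqld]
innodb_doublewrite = 0
```

The current value can be checked from a client with `SHOW VARIABLES LIKE 'innodb_doublewrite';`.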



Cluster is causing my query to run SLOWER than before

So I have been stuck on this for 6 hours now and I have no clue what to do. I am doing university homework that requires us to create an unoptimized SQL query (it does not have to make sense), then apply indexes and see if that makes it faster (which it did for me, from 0.70 elapsed time to 0.66), and then apply clusters.

I applied clusters, and the query now takes almost twice as long to finish: from 0.70 to 1.15. Below is how I specified my cluster:

CREATE CLUSTER customer2_custid25 (custid NUMBER(8))
  SIZE 270
  TABLESPACE student_ts;

I repeated all my previous timings with INITIAL and NEXT as well, but that did not seem to make a difference. Below are the tables:

CREATE TABLE CUSTOMER18 (
    CustID       NUMBER(8) NOT NULL,
    FIRST_NAME   VARCHAR2(15),
    SURNAME      VARCHAR2(15),
    ADDRESS      VARCHAR2(20),
    PHONE_NUMBER NUMBER(12))
    CLUSTER customer2_custid25(CustID);

CREATE TABLE product18(
    ProdID NUMBER(10) NOT NULL,
    PName  Varchar2(6),
    PDesc  Varchar2(15),
    Price  Number(8),
    QOH    Number(5));

CREATE TABLE sales18(
    SaleID    NUMBER(10) NOT NULL,
    SaleDate  DATE,
    Qty       Number(5),
    SellPrice Number(10),
    CustID    NUMBER(8),
    ProdID    NUMBER(10))
    CLUSTER customer2_custid25(CustID);

CREATE INDEX customer2_custid_clusterindxqg ON CLUSTER customer2_custid25 TABLESPACE student_ts;

I also tried removing the tablespace clause from the cluster index.

I followed this formula to help calculate cluster sizes:

"Size of a cluster is the size of a parent row + (size of Child row * average number of children). "

This brought me to a size of 270. However, after testing sizes from 250 to 350 in steps of 20, I found 320 to be the fastest, at 1.15.
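As a sanity check on that formula, the arithmetic can be written out with hypothetical row sizes (the byte counts below are illustrative assumptions chosen to reach 270, not measured values):

```python
# Hypothetical sizes, chosen only to illustrate the formula:
parent_row_bytes = 30   # assumed size of one CUSTOMER18 row
child_row_bytes = 48    # assumed size of one sales18 row
avg_children = 5        # assumed average number of sales per customer

# "Size of a cluster is the size of a parent row
#  + (size of child row * average number of children)"
cluster_size = parent_row_bytes + child_row_bytes * avg_children
print(cluster_size)  # 270
```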

No matter what I try, I cannot for the life of me get it lower than my base query times.

Other students have done the same and halved their query time.

All help is really appreciated.

Ray-box (AABB) intersection test is slower than without it

At the moment I'm trying my best to write my own raytracer (and it's quite fun, actually). Over the last few days I tried adding a bounding-box algorithm to it, but I'm getting a much lower framerate with the bounding boxes turned on. I think it has something to do with checking the box for every ray, but I don't know how I could change that. Here is my code:

The intersection algorithm:

bool Intersect(Ray r, float3 lb, float3 rt)
{
    float3 dir_inv = 1 / r.direction;

    double t1 = (lb[0] - r.origin[0]) * dir_inv[0];
    double t2 = (rt[0] - r.origin[0]) * dir_inv[0];

    double tmin = min(t1, t2);
    double tmax = max(t1, t2);

    for (int i = 1; i < 3; ++i)
    {
        t1 = (lb[i] - r.origin[i]) * dir_inv[i];
        t2 = (rt[i] - r.origin[i]) * dir_inv[i];

        tmin = max(tmin, min(t1, t2));
        tmax = min(tmax, max(t1, t2));
    }

    return tmax > max(tmin, 0.0);
}
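As a sanity check, here is a direct Python transcription of that slab test (my own port, not the author's code). Axis-parallel rays are handled by mapping a zero direction component to an infinite inverse, which is what IEEE float division gives in the shader:

```python
import math

def intersect(origin, direction, lb, rt):
    """Slab-method ray/AABB test; lb = low corner, rt = high corner."""
    # 1/0 would raise in Python, so map zero components to +infinity.
    dir_inv = [1.0 / d if d != 0 else math.inf for d in direction]

    tmin = -math.inf
    tmax = math.inf
    for i in range(3):
        t1 = (lb[i] - origin[i]) * dir_inv[i]
        t2 = (rt[i] - origin[i]) * dir_inv[i]
        # Narrow the entry/exit interval axis by axis.
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))

    # The ray hits if the interval is non-empty and in front of the origin.
    return tmax > max(tmin, 0.0)

# A ray aimed at the unit box hits; one offset to the side misses.
print(intersect((-5, 0.2, 0.2), (1, 0.05, 0.05), (0, 0, 0), (1, 1, 1)))  # True
print(intersect((-5, 2.0, 0.5), (1, 0.0, 0.05), (0, 0, 0), (1, 1, 1)))   # False
```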

My trace function:

RayHit Trace(Ray ray)
{
    RayHit bestHit = CreateRayHit();
    uint count, stride, i;

    // Trace ground plane
    IntersectGroundPlane(ray, bestHit);

    // Trace spheres
    _Spheres.GetDimensions(count, stride);
    for (i = 0; i < count; i++)
    {
        if (Intersect(ray, _Spheres[i].position - (_Spheres[i].radius), _Spheres[i].position + (_Spheres[i].radius)))
            IntersectSphere(ray, bestHit, _Spheres[i]);
    }

    // Trace mesh objects
    _MeshObjects.GetDimensions(count, stride);
    for (i = 0; i < count; i++)
    {
        //if (Intersect(ray, float3(0.0f, 0.0f, 0.0f), float3(10.0f, 10.0f, 10.0f)))
            //IntersectMeshObject(ray, bestHit, _MeshObjects[i]);
    }

    return bestHit;
}

thanks in advance

Radix sort slower than Quick sort?

I would like to demonstrate that radix sort can sometimes be faster than quicksort. For this example I am using the program below:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include <time.h>
#include <math.h>

int cmpfunc (const void * a, const void * b) {
   return ( *(int*)a - *(int*)b );
}

void bin_radix_sort(int *a, const long long size, int digits) {
    assert(digits % 2 == 0);

    long long count[2];
    int *b = malloc(size * sizeof(int));
    int exp = 0;

    while(digits--) {
        // Count elements
        count[0] = count[1] = 0;
        for (int i = 0; i < size; i++)
            count[(a[i] >> exp) & 0x01]++;

        // Cumulative sum
        count[1] += count[0];

        // Build output array
        for (int i = size - 1; i >= 0; i--)
            b[--count[(a[i] >> exp) & 0x01]] = a[i];

        exp++;
        int *p = a; a = b; b = p;
    };

    free(b);
}

struct timespec start;

void tic() {
    timespec_get(&start, TIME_UTC);
}

double toc() {
    struct timespec stop;
    timespec_get(&stop, TIME_UTC);
    return stop.tv_sec - start.tv_sec + (
        stop.tv_nsec - start.tv_nsec
    ) * 1e-9;
}

int main(void) {
    const long long n = 1024 * 1024 * 50;
    printf("Init memory (%lld MB)...\n", n / 1024 / 1024 * sizeof(int));

    int *data = calloc(n, sizeof(int));

    printf("Sorting n = %lld data elements...\n", n);

    long long O;
    tic();
    O = n * log(n);
    qsort(data, n, sizeof(data[0]), cmpfunc);
    printf("%lld %lf s\n", O, toc());

    int d = 6;
    tic();
    O = d * (n + 2);
    bin_radix_sort(data, n, d);
    printf("%lld %lf s\n", O, toc());
}

It performs as follows:

$ gcc bench.c -lm
$ ./a.out
Init memory (200 MB)...
Sorting n = 52428800 data elements...
931920169 1.858300 s
314572812 1.541998 s

I know that quicksort runs in O(n log n) while radix sort runs in O(d (n + r)) ≈ O(6 n). For n = 52428800, log(n) ≈ 17.8. I would then expect radix sort to be about 3 times faster than quicksort…
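The expected factor can be worked out explicitly; a quick check of the operation counts used above (assuming the natural logarithm, as in the C code's `log(n)`):

```python
import math

n = 1024 * 1024 * 50   # 52428800 elements, as in the benchmark
d = 6                  # number of binary digits sorted

quick_ops = n * math.log(n)   # O(n log n)
radix_ops = d * (n + 2)       # O(d (n + r)) with r = 2

print(quick_ops / radix_ops)  # about 2.96, i.e. roughly 3x
```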

This is not what I observe.

What am I missing?

Aurora PostgreSQL database using a slower query plan than a normal PostgreSQL for an identical query?

Following the migration of an application and its database from a classical PostgreSQL database to an Amazon Aurora RDS PostgreSQL database (both on version 9.6), we have found that a specific query runs much slower (around 10 times slower) on Aurora than on PostgreSQL.

Both databases have the same configuration, both in hardware and in postgresql.conf.

The query itself is fairly simple. It is generated from our backend written in Java and using jOOQ for writing the queries:

with "all_acp_ids"("acp_id") as (
    select acp_id from temp_table_de3398bacb6c4e8ca8b37be227eac089
)
select distinct "public"."f1_folio_milestones"."acp_id",
    coalesce("public"."sa_milestone_overrides"."team",
    "public"."f1_folio_milestones"."team_responsible")
from "public"."f1_folio_milestones"
left outer join
    "public"."sa_milestone_overrides" on (
        "public"."f1_folio_milestones"."milestone" = "public"."sa_milestone_overrides"."milestone"
        and "public"."f1_folio_milestones"."view" = "public"."sa_milestone_overrides"."view"
        and "public"."f1_folio_milestones"."acp_id" = "public"."sa_milestone_overrides"."acp_id"
)
where "public"."f1_folio_milestones"."acp_id" in (
    select "all_acp_ids"."acp_id" from "all_acp_ids"
)

Here temp_table_de3398bacb6c4e8ca8b37be227eac089 is a single-column table, while f1_folio_milestones (17 million rows) and sa_milestone_overrides (around 1 million rows) are similarly designed tables with indexes on all the columns used in the LEFT OUTER JOIN.

When we run it on the normal PostgreSQL database, it generates the following query plan:

Unique  (cost=4802622.20..4868822.51 rows=8826708 width=43) (actual time=483.928..483.930 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.005 rows=1 loops=1)
  ->  Sort  (cost=4802598.60..4824665.37 rows=8826708 width=43) (actual time=483.927..483.927 rows=4 loops=1)
        Sort Key: f1_folio_milestones.acp_id, (COALESCE(, f1_folio_milestones.team_responsible))
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Left Join  (cost=46051.06..3590338.34 rows=8826708 width=43) (actual time=483.905..483.917 rows=4 loops=1)
              Hash Cond: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))
              ->  Nested Loop  (cost=31.16..2572.60 rows=8826708 width=37) (actual time=0.029..0.038 rows=4 loops=1)
                    ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.009..0.010 rows=1 loops=1)
                          Group Key: all_acp_ids.acp_id
                          ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.006..0.007 rows=1 loops=1)
                    ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..12.65 rows=5 width=37) (actual time=0.018..0.025 rows=4 loops=1)
                          Index Cond: (acp_id = all_acp_ids.acp_id)
              ->  Hash  (cost=28726.78..28726.78 rows=988178 width=34) (actual time=480.423..480.423 rows=987355 loops=1)
                    Buckets: 1048576  Batches: 1  Memory Usage: 72580kB
                    ->  Seq Scan on sa_milestone_overrides  (cost=0.00..28726.78 rows=988178 width=34) (actual time=0.004..189.641 rows=987355 loops=1)
Planning time: 3.561 ms
Execution time: 489.223 ms

And it goes pretty smoothly, as one can see: less than a second for the query. But on the Aurora instance, this happens:

Unique  (cost=2632927.29..2699194.83 rows=8835672 width=43) (actual time=4577.348..4577.350 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.001..0.001 rows=1 loops=1)
  ->  Sort  (cost=2632903.69..2654992.87 rows=8835672 width=43) (actual time=4577.348..4577.348 rows=4 loops=1)
        Sort Key: f1_folio_milestones.acp_id, (COALESCE(, f1_folio_milestones.team_responsible))
        Sort Method: quicksort  Memory: 25kB
        ->  Merge Left Join  (cost=1321097.58..1419347.08 rows=8835672 width=43) (actual time=4488.369..4577.330 rows=4 loops=1)
              Merge Cond: ((f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))
              ->  Sort  (cost=1194151.06..1216240.24 rows=8835672 width=37) (actual time=0.039..0.040 rows=4 loops=1)
                    Sort Key: f1_folio_milestones.view, f1_folio_milestones.milestone, f1_folio_milestones.acp_id
                    Sort Method: quicksort  Memory: 25kB
                    ->  Nested Loop  (cost=31.16..2166.95 rows=8835672 width=37) (actual time=0.022..0.028 rows=4 loops=1)
                          ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.006..0.006 rows=1 loops=1)
                                Group Key: all_acp_ids.acp_id
                                ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.003..0.004 rows=1 loops=1)
                          ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..10.63 rows=4 width=37) (actual time=0.011..0.015 rows=4 loops=1)
                                Index Cond: (acp_id = all_acp_ids.acp_id)
              ->  Sort  (cost=126946.52..129413.75 rows=986892 width=34) (actual time=4462.727..4526.822 rows=448136 loops=1)
                    Sort Key: sa_milestone_overrides.view, sa_milestone_overrides.milestone, sa_milestone_overrides.acp_id
                    Sort Method: quicksort  Memory: 106092kB
                    ->  Seq Scan on sa_milestone_overrides  (cost=0.00..28688.92 rows=986892 width=34) (actual time=0.003..164.348 rows=986867 loops=1)
Planning time: 1.394 ms
Execution time: 4583.295 ms

It effectively has a lower overall cost, but takes almost 10 times as long as before!

Disabling merge joins makes Aurora fall back to a hash join, which gives the expected execution time; but permanently disabling them is not an option. Curiously though, disabling nested loops gives an even better result while still using a merge join…
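For context, the join methods mentioned here are toggled through PostgreSQL's standard enable_* planner settings, which can be changed per session for experimentation (not recommended as a permanent fix):

```sql
-- Session-level experiments only:
SET enable_mergejoin = off;  -- steer the planner away from merge joins
SET enable_nestloop = off;   -- steer it away from nested loops
-- Restore the defaults afterwards:
RESET enable_mergejoin;
RESET enable_nestloop;
```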

Unique  (cost=3610230.74..3676431.05 rows=8826708 width=43) (actual time=2.465..2.466 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.004 rows=1 loops=1)
  ->  Sort  (cost=3610207.14..3632273.91 rows=8826708 width=43) (actual time=2.464..2.464 rows=4 loops=1)
        Sort Key: f1_folio_milestones.acp_id, (COALESCE(, f1_folio_milestones.team_responsible))
        Sort Method: quicksort  Memory: 25kB
        ->  Merge Left Join  (cost=59.48..2397946.87 rows=8826708 width=43) (actual time=2.450..2.455 rows=4 loops=1)
              Merge Cond: (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text)
              Join Filter: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text))
              ->  Merge Join  (cost=40.81..2267461.88 rows=8826708 width=37) (actual time=2.312..2.317 rows=4 loops=1)
                    Merge Cond: (f1_folio_milestones.acp_id = all_acp_ids.acp_id)
                    ->  Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones  (cost=0.56..2223273.29 rows=17653416 width=37) (actual time=0.020..2.020 rows=1952 loops=1)
                    ->  Sort  (cost=40.24..40.74 rows=200 width=32) (actual time=0.011..0.012 rows=1 loops=1)
                          Sort Key: all_acp_ids.acp_id
                          Sort Method: quicksort  Memory: 25kB
                          ->  HashAggregate  (cost=30.60..32.60 rows=200 width=32) (actual time=0.008..0.008 rows=1 loops=1)
                                Group Key: all_acp_ids.acp_id
                                ->  CTE Scan on all_acp_ids  (cost=0.00..27.20 rows=1360 width=32) (actual time=0.005..0.005 rows=1 loops=1)
              ->  Materialize  (cost=0.42..62167.38 rows=987968 width=34) (actual time=0.021..0.101 rows=199 loops=1)
                    ->  Index Scan using sa_milestone_overrides_acp_id_index on sa_milestone_overrides  (cost=0.42..59697.46 rows=987968 width=34) (actual time=0.019..0.078 rows=199 loops=1)
Planning time: 5.500 ms
Execution time: 2.516 ms

We have asked the AWS support team; they are still looking into the issue, but we are wondering what could cause it. What could explain such a difference in behaviour?

While looking at some of the documentation for the database, I read that Aurora favours cost over time, and hence uses the query plan with the lowest cost.

But as we can see, it is far from optimal given its response time… Is there a threshold or a setting that could make the database use a more expensive, but faster, query plan?

Would Triple DES-X with 9 keys be much slower than standard Triple DES?

Since a single hardware pass of an XOR with a 64-bit key is very fast, would Triple DES-X using nine 64-bit keys in the following manner be virtually identical to 3DES in code size, memory consumption, and execution speed?

XOR (Key 1) DES (Key 2) XOR (Key 3)

XOR (Key 4) DES (Key 5) XOR (Key 6)

XOR (Key 7) DES (Key 8) XOR (Key 9)
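To make the layering concrete, here is a toy sketch of the nine-key construction. The `toy_cipher` stand-in below is a hypothetical placeholder, not real DES; only the XOR pre-/post-whitening structure around each cipher call is the point:

```python
def desx_layer(block, k_pre, k_cipher, k_post, cipher):
    """One DES-X layer: pre-whitening XOR, cipher, post-whitening XOR."""
    return cipher(block ^ k_pre, k_cipher) ^ k_post

def triple_desx(block, keys, cipher):
    """Nine-key Triple DES-X: three chained DES-X layers."""
    assert len(keys) == 9
    for i in range(3):
        k_pre, k_cipher, k_post = keys[3 * i : 3 * i + 3]
        block = desx_layer(block, k_pre, k_cipher, k_post, cipher)
    return block

# Hypothetical 64-bit toy "cipher" standing in for DES -- NOT secure.
def toy_cipher(block, key):
    mask = (1 << 64) - 1
    x = (block ^ key) & mask
    return ((x << 13) | (x >> 51)) & mask  # keyed XOR plus a rotation

keys = [(i * 0x9E3779B97F4A7C15) & ((1 << 64) - 1) for i in range(1, 10)]
print(hex(triple_desx(0x0123456789ABCDEF, keys, toy_cipher)))
```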

Additionally, would it be significantly stronger? Would it still suffer from the same block-size-based vulnerability as DES-X?

Generalization of code is slower than particular case

I wrote the following Mathematica module:

QNlalternative[NN_, l_, f_] := Module[{s, wz, w, z, j, lvec},
   s = 0;
   Do[
      wz = Table[weightsNodesQ1l@lvec@i, {i, NN}];
      w = Table[wz[[i]][[1, All]], {i, NN}];
      z = Table[wz[[i]][[2, All]], {i, NN}];
      s = s + Function[
            Sum[(f @@ (Table[z[[i]][[j[i]]], {i, NN}]))*
                (Times @@ (Table[w[[i]][[j[i]]], {i, NN}])), ##]
          ] @@ Table[{j[k], 2^lvec[k] + 1}, {k, NN}],
      ##
      ] & @@ Table[{lvec[i], l + NN - 1 - Total@Table[lvec[k], {k, i - 1}]}, {i, NN}];
   Return[s]
   ];

This module calls another module:

sumPrime[v_List] := First[v]/2 + Total[Delete[v, 1]]

weightsNodes[NN_] := Module[{w, z},
   w = Table[4/NN*sumPrime[Table[1/(1 - n^2)*Cos[n*k*Pi/NN], {n, 0., NN, 2}]], {k, 0., NN}];
   z = Table[Cos[k*Pi/NN], {k, 0., NN}];
   Return[{w, z}]
   ];

weightsNodesQ1l[l_] := weightsNodes[2^l]

This code is related to a mathematical problem I am solving (it is a modification of it). When I first thought about how to write the module QNlalternative, I wrote the particular case NN = 5 in a sloppy manner, using repeated statements, as follows:

Q5l[l_, f_] :=
  Module[{s, wzl1, wzl2, wzl3, wzl4, wzl5, wl1, zl1, wl2, zl2, wl3,
    zl3, wl4, zl4, wl5, zl5},
   s = 0;
   Do[
    wzl1 = weightsNodesQ1l[l1];
    wzl2 = weightsNodesQ1l[l2];
    wzl3 = weightsNodesQ1l[l3];
    wzl4 = weightsNodesQ1l[l4];
    wzl5 = weightsNodesQ1l[l5];
    wl1 = wzl1[[1, All]]; zl1 = wzl1[[2, All]];
    wl2 = wzl2[[1, All]]; zl2 = wzl2[[2, All]];
    wl3 = wzl3[[1, All]]; zl3 = wzl3[[2, All]];
    wl4 = wzl4[[1, All]]; zl4 = wzl4[[2, All]];
    wl5 = wzl5[[1, All]]; zl5 = wzl5[[2, All]];
    s = s + Sum[f[zl1[[i1]], zl2[[i2]], zl3[[i3]], zl4[[i4]], zl5[[i5]]]*
        wl1[[i1]]*wl2[[i2]]*wl3[[i3]]*wl4[[i4]]*wl5[[i5]],
       {i1, 1, 2^l1 + 1}, {i2, 1, 2^l2 + 1}, {i3, 1, 2^l3 + 1},
       {i4, 1, 2^l4 + 1}, {i5, 1, 2^l5 + 1}],
    {l1, 1, l + 5 - 1}, {l2, 1, l + 5 - 1 - l1}, {l3, 1, l + 5 - 1 - l1 - l2},
    {l4, 1, l + 5 - 1 - l1 - l2 - l3}, {l5, 1, l + 5 - 1 - l1 - l2 - l3 - l4}
    ];
   Return[s]
   ];

The module Q5l is much faster than QNlalternative:

AbsoluteTiming[QNlalternative[5, 6, Sin[Plus[##]]^2 &]]
(* {19.4634, 6213.02} *)

AbsoluteTiming[Q5l[6, Sin[Plus[##]]^2 &]]
(* {6.64357, 6213.02} *)
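The fixed-depth versus generic-depth trade-off is not specific to Mathematica. The sketch below (my own Python analogue, purely to illustrate that a generated variable-depth iteration carries overhead compared with hard-coded nested loops; it is not a port of the modules above) computes the same sum both ways:

```python
import itertools

def sum_fixed(n):
    # Hard-coded triple loop (analogue of Q5l's explicit structure).
    s = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                s += i * j * k
    return s

def sum_general(n, depth=3):
    # Generic-depth iteration (analogue of building the loops with Table/Apply).
    s = 0
    for idx in itertools.product(range(n), repeat=depth):
        p = 1
        for v in idx:
            p *= v
        s += p
    return s

print(sum_fixed(20) == sum_general(20))  # same result, different overhead
```

Timing the two with `timeit` typically shows the generic version losing to the hard-coded loops, even though both do the same arithmetic.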

Why is QNlalternative slower? Which step of the generalization of Q5l to an arbitrary NN is too slow?