What can cause higher CPU time and duration for a given set of queries in traces run on two separate environments?

I’m troubleshooting a performance issue in a SQL Server DR environment for a customer. They are running queries that consistently take longer in their environment than in our QA environment. We analyzed traces captured in both environments with the same parameters/filters, the same version of SQL Server (2016 SP2), and the exact same database. Both environments picked the same execution plan(s) for the queries in question, and the number of reads/writes was close in both. However, the total duration of the process in question and the CPU time logged in the trace were significantly higher in the customer environment: total duration was around 18 seconds in our QA environment versus over 80 seconds for the customer, and CPU time was close to 10 seconds for us versus over 80 seconds for them. Also worth mentioning: both environments are currently configured with MAXDOP 1.

The customer has less memory (~100GB vs 120GB) and slower disks (10k HDD vs SSD) than our QA environment, but more CPUs. Both environments are dedicated to this activity and should have little/no external load that wouldn’t match. I don’t have all the details on the CPU architecture they are using; I’m waiting for some of that information now. The customer has confirmed they have excluded SQL Server and the data/log files from their virus scanning. Obviously there could be a ton of issues in the hardware configuration.

I’m currently waiting to see a recent snapshot of their wait stats and system DMVs; the data we originally received didn’t appear to show any major CPU, memory, or disk latency pressure. I recently asked them to check whether the Windows power setting is in performance or balanced mode; however, I’m not certain that CPU throttling alone would have the impact we’re seeing.

My question is: what factors can affect CPU time and ultimately total duration? Is CPU time, as shown in a SQL trace, based primarily on the speed of the processors, or are there other factors I should be taking into consideration? The fact that both environments generate the same query plans, with all other things being as close as possible to equal, makes me think it’s related to the hardware SQL Server is installed on.

With respect to differential privacy, how do I find the global sensitivity of queries like ‘maximum height’, ‘average height’, etc.?

As far as I understand, for any query f(x) we need to take the maximum of |f(x)-f(y)| over all neighboring databases x and y.

Please explain how to find the global sensitivity of queries like average height or maximum height.
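To make the definition above concrete, here is a minimal brute-force sketch in Python. It rests on assumptions that are not stated in the question: neighboring databases are taken to differ in exactly one record (replacement), the database size `n` is fixed, and heights are restricted to a small bounded, discrete domain (without a bound on the heights, the sensitivity of both queries is unbounded). The domain values and the helper name `global_sensitivity` are purely illustrative.

```python
from itertools import product


def global_sensitivity(query, domain, n):
    """Brute-force max |query(x) - query(y)| over all pairs of databases
    of size n that differ in exactly one record (assumed neighbor relation)."""
    worst = 0.0
    for x in product(domain, repeat=n):
        for i in range(n):
            for v in domain:
                # y is x with record i replaced by v
                y = x[:i] + (v,) + x[i + 1:]
                worst = max(worst, abs(query(x) - query(y)))
    return worst


def mean(db):
    return sum(db) / len(db)


# Assumed bounded domain of heights in cm (illustrative values only).
domain = [150, 160, 170, 180, 190, 200]
n = 3

gs_max = global_sensitivity(max, domain, n)    # equals hi - lo for this domain
gs_mean = global_sensitivity(mean, domain, n)  # equals (hi - lo) / n for this domain
print(gs_max, gs_mean)
```

Under these assumptions the enumeration agrees with the closed forms: the maximum query can move by the full range `hi - lo` when a single record changes, while the average can move by at most `(hi - lo) / n`.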

How to answer the following queries on a tree?

Given a tree of N nodes (each node i has been assigned a value A[i]; node 1 is the root of the tree) and a constant K, we have Q queries of the following type: [w]

(which means: find the lowest-valued node in the subtree of [w], considering only those nodes in the subtree of [w] whose depth is less than or equal to K).

Example :

Values of the nodes of the tree:

A[1] = 10, A[2] = 20, A[3] = 30, A[4] = 40, A[5] = 50, A[6] = 60

Edges of tree : [1-2],






Query-1: [w]=1. All nodes in the subtree of [w]: (1,2,3,4,5,6). All nodes in the subtree of [w] with depth less than or equal to K: (1,2). Hence minimum(A[1],A[2]) = min(10,20) = 10 is the answer.

Query-2: [w]=4. All nodes in the subtree of [w]: (4,5,6). All nodes in the subtree of [w] with depth less than or equal to K: (4,5,6). Hence minimum(A[4],A[5],A[6]) = min(40,50,60) = 40 is the answer.
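As a reference point, the query can be sketched as a naive breadth-first search in Python. Several things here are assumptions rather than facts from the problem statement: depth is measured relative to the query node [w], the constant is taken as K = 1, and the edge list is hypothetical (only the edge [1-2] is given above). The hypothetical edges were chosen so that both example answers are reproduced. This is a baseline costing time proportional to the visited part of the subtree per query, not an efficient solution.

```python
from collections import deque


def min_within_depth(children, values, w, k):
    """Minimum value among nodes in the subtree of w whose depth,
    measured from w (an assumption), is at most k."""
    best = values[w]
    queue = deque([(w, 0)])
    while queue:
        node, depth = queue.popleft()
        best = min(best, values[node])
        if depth < k:  # only descend while children stay within depth k
            for child in children.get(node, []):
                queue.append((child, depth + 1))
    return best


values = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60}

# HYPOTHETICAL rooted tree (parent -> children); only edge 1-2 is from the question.
children = {1: [2], 2: [3], 3: [4], 4: [5, 6]}
K = 1  # assumed value of the constant

answer_1 = min_within_depth(children, values, 1, K)
answer_4 = min_within_depth(children, values, 4, K)
print(answer_1, answer_4)
```

A faster approach would need some preprocessing over the tree, which this sketch deliberately leaves out.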

Reconstructing an Array via Time-Intensive Subset Queries

I am trying to design an algorithm for a problem, and the following is an auxiliary problem for which a good solution would imply a faster algorithm for the original problem.

I am given access to an array of numbers. However, I am only allowed to query it by specifying an arbitrary subset of indices, in response to which I am given the sum of the elements at those positions. These queries are quite costly: each one runs in $\tilde{O}(n^2)$ time (where $\tilde{O}(\cdot)$ hides polylogarithmic factors). I want to determine the element at each index in the array (i.e. reconstruct the array) using as little time as possible.

Of course, it is possible to do this by querying each element on its own. This algorithm makes $n$ queries and hence has total running time $\tilde{O}(n^3)$. I am wondering if there is a faster way. Adaptivity does not help with this problem, so any algorithm would have two steps: first, it executes a fixed sequence of queries, and then it reconstructs all elements using the query answers. Ideally, both steps run in time $o(n^3)$. So far, every set of $o(n)$ queries that I have looked at makes recovery impossible. This might be the case for any such set of queries (and my intuition screams that it probably is), but I cannot see a proof of this.

I’d love an answer that either shows a faster algorithm or proves that $o(n)$ queries are insufficient, but answers with partial insights would also be great.
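The two-step structure described above (a fixed sequence of queries, then reconstruction) can be sketched in Python as follows. The `SubsetSumOracle` class is a hypothetical stand-in for the costly query and only counts calls; it does not model the $\tilde{O}(n^2)$ cost. Two non-adaptive query sets are shown: the singleton baseline from the text, and prefix sets, which also use exactly $n$ queries. Neither improves on $n$ queries; the sketch only illustrates the setup.

```python
class SubsetSumOracle:
    """Hypothetical stand-in for the costly query: returns the sum of the
    hidden array over a given collection of indices and counts the calls."""

    def __init__(self, hidden):
        self._hidden = list(hidden)
        self.queries = 0

    def query(self, indices):
        self.queries += 1
        return sum(self._hidden[i] for i in indices)


def reconstruct_singletons(oracle, n):
    # Baseline from the text: one query per index.
    return [oracle.query([i]) for i in range(n)]


def reconstruct_prefixes(oracle, n):
    # Fixed, non-adaptive queries {0}, {0,1}, ..., {0,...,n-1};
    # each element is the difference of consecutive answers.
    if n == 0:
        return []
    answers = [oracle.query(range(i + 1)) for i in range(n)]
    return [answers[0]] + [answers[i] - answers[i - 1] for i in range(1, n)]


hidden = [3, -1, 4, 1, 5]  # illustrative data

oracle_a = SubsetSumOracle(hidden)
result_a = reconstruct_singletons(oracle_a, len(hidden))

oracle_b = SubsetSumOracle(hidden)
result_b = reconstruct_prefixes(oracle_b, len(hidden))

print(result_a, oracle_a.queries)
print(result_b, oracle_b.queries)
```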

Why did “terminal commands” never get a version of SQL “parameterized queries”?

I was taught horribly bad practice when I initially "learned" SQL, which baked in user-submitted input with quotes and attempted to "escape" this (in the beginning, I didn’t even escape it at all…). I then had to spend many years unlearning this, to instead do things like:

SELECT * FROM table WHERE id = $1;

The data for $1 is then sent separately to the database, not as part of the actual query string, to make it impossible for "SQL injections" to happen.

However, terminal commands frequently need to be sent untrusted user input, such as:

generate_PDF.exe --template="a path goes here" --title-of-report="arbitrary title from user" 

Every time I have to run such a command, I’m scared to death that my "terminal argument escape" function isn’t working correctly, or has some unknown bug, so that users can make a title along the lines of "; rm -rf /; to execute arbitrary code on my machine.

This becomes even more of a serious issue when the normal "OS quotes" cannot be used, such as:

pg_dump --format custom --file "a real path" --exclude-table="schema name"."table name" 

The "schema name"."table name" part has to be provided in full from the user, and thus I have to attempt to verify the syntax myself, as it cannot just be quoted in its entirety with the "terminal argument escaper" function wrapping it all. (Even if it might be possible in this specific context, I’m talking in general and just using this as an example of when it gets "hairy".)

This has made me wonder why the terminal commands, for example in PHP (since I use this myself for everything) cannot be done like this:

pg_dump --format custom --file $1 --exclude-table=$2

And then we send the actual arguments separately as an array of strings, just like with the "parameterized queries" in SQL databases?

Please note that the $1 and $2 here do not refer to PHP variables, but to "placeholders" for the "engine" which interprets this and which lives either in PHP or the OS.
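For comparison, here is a minimal Python sketch of what passing arguments "separately as an array of strings" can look like. It uses `subprocess.run` with an argument list, which starts the program directly without a shell, so shell metacharacters in an argument are not interpreted. The child program here is just the Python interpreter echoing its argument back, standing in for a real tool, and the helper name `run_with_args` is hypothetical.

```python
import subprocess
import sys


def run_with_args(program, args):
    """Run a program with arguments passed as a list (no shell involved),
    so each list element reaches the program as one argument."""
    return subprocess.run(
        [program, *args],
        capture_output=True,
        text=True,
        check=True,
    )


# Untrusted input containing shell metacharacters.
title = '"; rm -rf /; echo "'

# Stand-in child program: prints back the single argument it received.
child = "import sys; print(sys.argv[1])"
result = run_with_args(sys.executable, ["-c", child, title])

received = result.stdout.rstrip("\n")
print(received == title)
```

This only addresses shell-level injection. The called program still parses its own arguments, so a value such as the "schema name"."table name" pattern would still need validating against that program's syntax.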

Why is this not a thing? Or maybe it is, only I haven’t heard of it? I’m continuously baffled by how many things which I constantly need and use just "sit there and rot" while they keep releasing a new programming language every week which nobody uses. I feel more and more frustrated about how "stale" everything I care about seems, but this risks getting off-topic, so I’ll stick to the question I’ve just asked for now.

Complexity of approximating a function value using queries

I am looking for information on problems of the following kind.

There is a function $f: [0,1] \to \mathbb{R}$ that is continuous and monotonically increasing, with $f(0)<0$ and $f(1)>0$. You have to find the unique $x\in[0,1]$ such that $f(x)=0$. You can access $f$ only through queries of the type "what is $f(x)$?". How many such queries do you need in order to approximate $x$ to within some constant $\epsilon$?

Here, the solution is simple: using binary search, the interval in which $x$ can lie shrinks by a factor of 2 after each query, so $\log_2(1/\epsilon)$ queries are sufficient. This is also a lower bound, since an adversary can always answer in such a way that the possible interval for $x$ shrinks by at most a factor of 2 after each query.
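The binary search argument can be sketched in Python as follows. The function `f` and the tolerance `eps` below are illustrative assumptions; the sketch simply counts queries until the interval known to contain the root has length at most `eps`.

```python
def bisect_root(f, eps):
    """Binary search for the root of a continuous, increasing f on [0, 1]
    with f(0) < 0 < f(1). Returns (estimate, number_of_queries)."""
    lo, hi = 0.0, 1.0
    queries = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1
        if f(mid) < 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return (lo + hi) / 2, queries


def f(x):
    # Illustrative increasing function with f(0) < 0 and f(1) > 0.
    return x ** 3 - 0.3


eps = 1 / 1024
estimate, queries = bisect_root(f, eps)
print(estimate, queries)
```

With `eps = 1/1024` the loop makes 10 queries, matching $\log_2(1/\epsilon)$.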

However, one can think of more complicated problems of this kind, with several different functions and possibly different kinds of queries.

What is a term for this kind of computational problem, and what are some references?

What is the suitable file structure for database if queries are select (relational algebra) operations only?

A relation R(A, B, C, D) has to be accessed under the query $\sigma_{B=10}(R)$. Out of the following possible file structures, which one should be chosen and why?

  i. R is a heap file.
  ii. R has a clustered hash index on B.
  iii. R has an unclustered B+ tree index on (A, B).

Multiple queries in phpMyAdmin: distance using coordinates, slope, intercept, angle, and a few more

I have around 500 Excel sheets in .csv format with data captured for my experiment, with the following columns in place.

Now I need to calculate the following parameters from this data. I have done this in Excel, but repeating it for every sheet is tedious, so I want to write an SQL query in phpMyAdmin to save some time.

  1. Last character typed: capture the last character from the column ‘CharSq’
  2. Slope (in column J) =(B3-B2)/(A3-A2)
  3. Intercept (in column K) =B2-(A2*(J3))
  4. Angle (in degrees) =MOD(DEGREES(ATAN2((A3-A2),(B3-B2))), 360)
  5. Index of Difficulty =LOG(((E1/7.1)+1),2)
  6. Speed value length (if speed value length >3, then mark as 1, or else 0) =IF(LEN(D3) >= 3, "1","0")
  7. Wrong sequence (if I3=I2, then mark search time, else actual time) =IF(I3=I2,"Search Time","Actual Time")
  8. Mark character as (1,2,3) =IF(I2="A",1, IF(I2="B",2, IF(I2="C",3, 0)))
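As a cross-check for items 2 to 5, the Excel formulas can be sketched in Python, for example to validate the SQL output on a few rows. The column meanings are assumptions, since the screenshot is not available: columns A and B are taken to be the x and y coordinates of consecutive rows, and column E a distance. Note that Excel's ATAN2 takes the x difference first, whereas `math.atan2` takes the y difference first. All function names are illustrative.

```python
import math


def slope(x1, y1, x2, y2):
    """Item 2: (B3-B2)/(A3-A2); returns None for a vertical segment."""
    dx = x2 - x1
    return None if dx == 0 else (y2 - y1) / dx


def intercept(x1, y1, x2, y2):
    """Item 3: B2 - A2 * slope."""
    m = slope(x1, y1, x2, y2)
    return None if m is None else y1 - x1 * m


def angle_degrees(x1, y1, x2, y2):
    """Item 4: MOD(DEGREES(ATAN2(dx, dy)), 360) in Excel argument order."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360


def index_of_difficulty(distance, width=7.1):
    """Item 5: LOG(distance / 7.1 + 1, 2)."""
    return math.log2(distance / width + 1)


# Illustrative consecutive points (row 2 and row 3), not real data.
example = {
    "slope": slope(1, 3, 2, 5),
    "intercept": intercept(1, 3, 2, 5),
    "angle": angle_degrees(0, 0, -1, -1),
    "id": index_of_difficulty(7.1),
}
print(example)
```

In SQL, the row-to-row differences would need the previous row's values, for example through a self-join on consecutive ids or a window function where the server version supports one.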

I have started with this SQL query:

SELECT id, type, charSq, substr(charSq,-1,1) AS TypedChar, xCoordinate, yCoordinate, angle, distance, timestamp, speed FROM table 1 WHERE 1

I need help with the rest of the parameters. Thanks.

Note: I am going to run this as SQL in phpMyAdmin.

Batching multiple nearest surface queries: Is it faster? Are there better algorithms?

I’m working on an algorithm that computes lots of "nearest point on a triangulated surface" queries in 3D as a way to resample data sets, and I’m wondering if there is any information out there on speeding up these queries. My gut tells me that partitioning the set of query points in a voxel grid or something similar, and processing them in batches, could be a speedup, but I can’t quite see how to use that efficiently. I’m also not sure whether the time cost of partitioning would be offset by the search speedup. Is running N independent queries really the best way?

I found that there are papers and research on the all-kNN algorithm, but that is for searching within a single set. Those speedups take advantage of previously computed neighbors or structure within the single set, so I can’t use them. It feels close, though.

Any help is appreciated.