2D kernel density estimation (SmoothKernelDistribution) with automatic bandwidth estimation: what bandwidth values does Mathematica choose?

Mathematica has built-in bandwidth estimation, including the rules Scott, SheatherJones, and Silverman (the default one); they work in both 1D and multiple dimensions. Most of the statistical documentation I could find for these bandwidth rules covers the 1D case. Their implementation in 2D or higher dimensions does not seem, as far as I know, to be as well established.

I could not find Mathematica documentation on how exactly these rules are implemented in any dimension. For the Silverman case, there is a nice question that raises some very important subtleties: About Silverman's bandwidth selection in SmoothKernelDistribution.

For 2D data, my first guess was that Mathematica applies the same 1D algorithm to each axis separately, thus yielding a diagonal bandwidth matrix. Hence, I extended the code provided in the previous link to 2D as follows:

Clear[data, silvermanBandwidth];
silvermanBandwidth[data_] := silvermanBandwidth[data] = Block[
   {m, n},
   m = MapThread[Min @ {#1, #2} &,
     {
       StandardDeviation @ data,
       InterquartileRange[data, {{0, 0}, {1, 0}}]/1.349
     }
   ];
   n = Length @ data;
   0.9 m/n^(1/5)
];

(In the statistical literature I found different conventions for the constants that appear in the above code, and I do not know precisely which version Mathematica picks; in any case, the discrepancy below is larger than these small rounding differences.)
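For cross-checking outside Mathematica, the same per-axis rule can be sketched in Python with NumPy. This mirrors the 0.9 * min(sigma, IQR/1.349) * n^(-1/5) formula above, not Mathematica's actual internals, and NumPy's default quantile interpolation may not match the `{{0, 0}, {1, 0}}` parameters of InterquartileRange:

```python
import numpy as np

def silverman_bandwidth(data):
    """Per-axis Silverman rule: 0.9 * min(std, IQR/1.349) * n^(-1/5).

    data: (n, d) array; returns a length-d vector, i.e. a diagonal
    bandwidth matrix. This is only a guess at what
    SmoothKernelDistribution does internally for multivariate data.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    std = data.std(axis=0, ddof=1)  # sample standard deviation per axis
    # NumPy's default (linear) quantile method; Mathematica's
    # InterquartileRange parameters may use a different convention.
    q75, q25 = np.percentile(data, [75, 25], axis=0)
    iqr = q75 - q25
    return 0.9 * np.minimum(std, iqr / 1.349) * n ** (-1 / 5)

rng = np.random.default_rng(0)
h = silverman_bandwidth(rng.random((100, 2)))
print(h)  # one bandwidth per axis
```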

The approach above (and a few variations I tried) is quite close to what Mathematica does in 2D, but it is not identical. Here is an example:

data = RandomReal[1, {100, 2}];
silvermanWMDist = SmoothKernelDistribution @ data;
silvermanMyDist = SmoothKernelDistribution[data, silvermanBandwidth @ data, "Gaussian"];
ContourPlot[PDF[silvermanWMDist, {x, y}],
  {x, -0.1, 1.1},
  {y, -0.1, 1.1}
]
ContourPlot[PDF[silvermanMyDist, {x, y}],
  {x, -0.1, 1.1},
  {y, -0.1, 1.1}
]

[contour plots of silvermanWMDist and silvermanMyDist, showing visibly different smoothing]

My questions are: how is Silverman's rule implemented in Mathematica for 2D data? And is there a way to print out the bandwidth matrix Mathematica derives, either for Silverman or for any other rule?

Laser Plane Estimation for Laser-Camera system?

I have to set up a system consisting of a laser line/plane projector and a web camera, in order to localize the 3D position of the laser in the camera image. I've read several resources, but the idea is still not quite concrete in my head.

So my intuition is that, since we have a setup of a laser projector and a camera, and we want to find the position of the laser point in the image, we have to find the 'correct' laser plane that intersects the camera's viewing rays. I am confused about how we find the relative pose of this plane with respect to the camera, and how we can use it to find the 3D coordinates.
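For what it's worth, once the laser plane is calibrated in camera coordinates (plane n . X = d, intrinsics K), recovering the 3D point is a ray-plane intersection: the detected laser pixel back-projects to a ray from the camera center, and the 3D point is where that ray pierces the plane. A minimal sketch, with made-up calibration values:

```python
import numpy as np

def laser_pixel_to_3d(pixel, K, plane_n, plane_d):
    """Triangulate a laser pixel against a calibrated laser plane.

    pixel:   (u, v) image coordinates of the detected laser point
    K:       3x3 camera intrinsic matrix
    plane_n: laser plane normal in camera coordinates (n . X = d)
    plane_d: plane offset

    The camera center is the origin, so the back-projected ray is
    X(t) = t * K^-1 [u, v, 1]^T; substituting into n . X = d gives t.
    """
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    t = plane_d / (plane_n @ ray)  # assumes the ray is not parallel to the plane
    return t * ray                 # 3D point in camera coordinates

# Toy example (all values are assumptions, not a real calibration):
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
n = np.array([0.0, 0.0, 1.0])  # plane z = 0.5 in front of the camera
point = laser_pixel_to_3d((320, 240), K, n, 0.5)
```

Finding the plane parameters (n, d) in the first place is the calibration step: typically you observe the laser line on a checkerboard at several known poses and fit a plane to the resulting 3D points.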

Simple question about epsilon and estimation (Turing machines)

I am getting really confused by this. For an optimization problem, I got to a point where I had to calculate a fairly simple limit: $\lim_{n \rightarrow \infty} \left(3-\frac{7}{n}\right)$.

Now I used $3 - \epsilon$, and I am trying to show that there cannot be any $\epsilon>0$ such that the estimation of the algorithm is $3-\epsilon$, because there exists a "bigger estimation". This is the part I am not sure about: what is the correct direction of the inequality, $3-\frac{7}{n} > 3 - \epsilon$ or the opposite? I am trying to show that the estimation ratio gets arbitrarily close to 3.

I think that what I wrote is the correct way, but I am not sure. I would appreciate knowing what is correct in this case. Thanks.
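For reference, the direction can be checked by rearranging: subtracting $3$ from both sides gives an explicit threshold on $n$,

$3-\frac{7}{n} > 3-\epsilon \iff \frac{7}{n} < \epsilon \iff n > \frac{7}{\epsilon}$,

so for every fixed $\epsilon > 0$ the expression exceeds $3-\epsilon$ for all $n > 7/\epsilon$, which is exactly what is needed to show the value approaches $3$ from below.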

Estimation of vertex cover within a constant bound

I would really appreciate your assistance with this:

for the following function: $f(G,v) =$ the size of the smallest vertex cover of $G$ that $v$ belongs to.

The function gets an undirected graph G and a vertex v and returns a natural number, which is the size of the smallest vertex cover in G that v belongs to.

Problem: prove that if it is possible to estimate $f$ within a constant bound of 5 in polynomial time, then P = NP. That is, if it is possible to compute in polynomial time a function $g(G,v)$ that is guaranteed to satisfy $f(G,v)-5 \leq g(G,v) \leq f(G,v) + 5$, then P = NP.

I don't understand why this holds: why does the existence of a polynomial-time computable $g(G,v)$ with $f(G,v)-5 \leq g(G,v) \leq f(G,v) + 5$ imply P = NP?

Criteria based estimation of a task’s due date

Basically we have a ticket system: each day we get multiple tickets with different categorizations (Question, Change, Error) and importance levels (High, Mid, Low) for different customers (A, B, C). My job is to create some kind of system that determines the due date of these "tasks" based on:

  • The state of previous tasks (completed or not).
  • The availability of the developers.
  • The previously mentioned attributes (categorization, importance, and customer type).

How can one achieve that?
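There is no single right answer, but a common starting point is a rule-based estimate: map each attribute to a multiplier on a base turnaround time, then stretch the result by current backlog or developer availability. A minimal Python sketch, where every constant is a made-up placeholder to be tuned against your own data:

```python
from datetime import datetime, timedelta

# Hypothetical base turnaround (in days) per category, tightened or
# relaxed by importance and customer tier. All numbers are placeholder
# assumptions, not recommendations.
BASE_DAYS = {"Question": 5, "Change": 10, "Error": 2}
IMPORTANCE_FACTOR = {"High": 0.5, "Mid": 1.0, "Low": 1.5}
CUSTOMER_FACTOR = {"A": 0.75, "B": 1.0, "C": 1.25}

def due_date(created, category, importance, customer, backlog_factor=1.0):
    """Estimate a due date from ticket attributes.

    backlog_factor > 1 stretches deadlines when open tickets pile up
    or developer availability drops (both computed elsewhere, e.g.
    open_tickets / developer_capacity).
    """
    days = (BASE_DAYS[category]
            * IMPORTANCE_FACTOR[importance]
            * CUSTOMER_FACTOR[customer]
            * backlog_factor)
    return created + timedelta(days=days)

d = due_date(datetime(2020, 1, 1), "Error", "High", "A")
```

Once you have historical data on actual completion times, the same attributes can instead feed a regression model, but a transparent rule like this is easier to explain to customers and to adjust.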

Starting with SQL Server 2019, does compatibility level no longer influence cardinality estimation?

In SQL Server 2017 & prior versions, if you wanted to get cardinality estimations that matched a prior version of SQL Server, you could set a database’s compatibility level to an earlier version.

For example, in SQL Server 2017, if you wanted execution plans whose estimates matched SQL Server 2012, you could set the compatibility level to 110 (SQL 2012), and get execution plan estimates that matched SQL Server 2012.

This is reinforced by the documentation, which states:

Changes to the Cardinality Estimator released on SQL Server and Azure SQL Database are enabled only in the default compatibility level of a new Database Engine version, but not on previous compatibility levels.

For example, when SQL Server 2016 (13.x) was released, changes to the cardinality estimation process were available only for databases using SQL Server 2016 (13.x) default compatibility level (130). Previous compatibility levels retained the cardinality estimation behavior that was available before SQL Server 2016 (13.x).

Later, when SQL Server 2017 (14.x) was released, newer changes to the cardinality estimation process were available only for databases using SQL Server 2017 (14.x) default compatibility level (140). Database Compatibility Level 130 retained the SQL Server 2016 (13.x) cardinality estimation behavior.

However, in SQL Server 2019, that doesn’t seem to be the case. If I take the Stack Overflow 2010 database, and run this query:

CREATE INDEX IX_LastAccessDate_Id ON dbo.Users(LastAccessDate, Id);
GO
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140;
GO
SELECT LastAccessDate, Id, DisplayName, Age
  FROM dbo.Users
  WHERE LastAccessDate > '2018-09-02 04:00'
  ORDER BY LastAccessDate;

I get an execution plan with 1,552 rows estimated coming out of the index seek operator:

[execution plan screenshot: SQL 2017, compat 2017]

But if I run the same query against the same database on SQL Server 2019, it estimates a different number of rows coming out of the index seek. The screenshot says "SQL 2019" in the comment at right, but note that it's still compatibility level 140:

[execution plan screenshot: SQL 2019, compat 2017]

And if I set the compatibility level to 150 (SQL 2019), I get that same estimate of 1,566 rows:

[execution plan screenshot: SQL 2019, compat 2019]

So in summary, starting with SQL Server 2019, does compatibility level no longer influence cardinality estimation the way it did in SQL Server 2014-2017? Or is this a bug?

Sample complexity of mean estimation using empirical estimator and median-of-means estimator?

Given a random variable $X$ with unknown mean $\mu$ and variance $\sigma^2$, we want to produce an estimate $\hat{\mu}$ based on $n$ i.i.d. samples from $X$ such that $\lvert \hat{\mu} - \mu \rvert \leq \epsilon\sigma$ holds with probability at least $1-\delta$.

Empirical estimator: why are $O(\epsilon^{-2}\cdot\delta^{-1})$ samples sufficient? Why are $\Omega(\epsilon^{-2}\cdot\delta^{-1})$ samples necessary?

Median-of-means estimator: why are $O(\epsilon^{-2}\cdot\log\frac{1}{\delta})$ samples sufficient?
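For intuition, the median-of-means estimator splits the $n$ samples into $k = \Theta(\log\frac{1}{\delta})$ groups, averages each group, and returns the median of the group means: each group mean is within $\epsilon\sigma$ of $\mu$ with constant probability by Chebyshev, and the median fails only if half the groups fail, which a Chernoff bound makes exponentially unlikely in $k$. A minimal sketch (the choice of $k$ varies across texts; this is one common convention):

```python
import numpy as np

def median_of_means(samples, k):
    """Median-of-means: split into k groups, average each, take the median.

    Robust to heavy tails: a few wild samples corrupt only the groups
    they land in, and the median ignores those groups.
    """
    samples = np.asarray(samples, dtype=float)
    groups = np.array_split(samples, k)
    return float(np.median([g.mean() for g in groups]))

# Demo on a heavy-tailed distribution (Student t with 3 degrees of
# freedom, true mean 0); k ~ 8 * ln(1/delta) for failure probability delta.
rng = np.random.default_rng(1)
est = median_of_means(rng.standard_t(df=3, size=10_000), k=30)
```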

Big O notation – estimation of run time [migrated]

I am running very computationally intensive tasks and wish to adjust the parameters according to how long the run will take.

The program I am running is PLINK – for those who don't know, it is used for genotype data.

Its run time is said to be O(n*m^2).

I have the run times for two runs with different values of m and a constant n: 3 hours and 648 hours.

From this I wish to estimate the run time for other values of m, respecting the O(n*m^2) relationship.

Can anybody provide insight into methods for estimating run time with n held constant, and into running tests with different parameters in order to achieve an optimal trade-off between run time and accuracy of results?
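With n constant, the model reduces to t(m) ≈ c * m^2, so a single timed run pins down c and predicts the rest; your two measurements imply (m2/m1)^2 ≈ 648/3 = 216, i.e. m grew by about sqrt(216) ≈ 14.7x between them. A sketch of fitting c and sanity-checking the exponent (assuming the quadratic model actually holds; the m values below are made up):

```python
import math

def fit_quadratic_constant(m_ref, t_ref):
    """Fit c in t(m) ~= c * m**2 from one (m, runtime) measurement."""
    return t_ref / m_ref ** 2

def predict_runtime(m, c):
    """Predicted runtime under the t(m) = c * m**2 model."""
    return c * m ** 2

def implied_exponent(m1, t1, m2, t2):
    """Exponent p implied by two measurements, assuming t ~ m**p.
    Should come out near 2 if the O(n*m^2) claim holds for your data."""
    return math.log(t2 / t1) / math.log(m2 / m1)

# Example with made-up m values: if m doubles, runtime should roughly 4x.
c = fit_quadratic_constant(1000, 3.0)   # 3 hours at m = 1000 (hypothetical)
print(predict_runtime(2000, c))         # about 12 hours
```

If the exponent implied by your two real measurements lands far from 2, the constant-n assumption or the O(n*m^2) claim is off, and a few more timed runs at intermediate m would be worth collecting before extrapolating.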

[GET][NULLED] – WP Cost Estimation & Payment Forms Builder v9.681

