Nearest neighbors analysis in Red Black Trees

I use a Red Black Tree where each node contains a symbol sequence, e.g. top node BB, left child AB, right child CB.

Given a symbol sequence from the tree, is there an efficient way to retrieve its n nearest neighbors, or all neighbors within a given distance radius?

Could I recursively search the child and parent nodes, adding each visited node until I have n of them or, in the radius case, until the encountered node lies outside the radius?

I know the efficient way for neighborhood queries is e.g. a KD Tree, but I am looking for a data structure that will allow me to insert new sequences without having to rebuild the whole tree each time.
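A minimal sketch of what walking outward from a node amounts to. It assumes a sorted Python list stands in for the in-order sequence of the tree and that the distance is Hamming distance; the question fixes neither, and the helper names `order_neighbors` and `metric_neighbors` are hypothetical. It compares the n keys adjacent in sort order with the n keys that are actually closest under the assumed metric.

```python
import bisect

def hamming(s, t):
    """Number of positions where two equal-length sequences differ (assumed metric)."""
    return sum(a != b for a, b in zip(s, t))

def order_neighbors(sorted_keys, query, n):
    """The n keys closest to `query` in sort order, alternating left and right.

    Assumes `query` itself is present in `sorted_keys`.
    """
    pos = bisect.bisect_left(sorted_keys, query)
    lo, hi = pos - 1, pos + 1
    out = []
    while len(out) < n and (lo >= 0 or hi < len(sorted_keys)):
        if lo >= 0:
            out.append(sorted_keys[lo])
            lo -= 1
        if len(out) < n and hi < len(sorted_keys):
            out.append(sorted_keys[hi])
            hi += 1
    return out

def metric_neighbors(keys, query, n, dist=hamming):
    """Brute-force reference: the n keys with the smallest distance to `query`."""
    others = [k for k in keys if k != query]
    return sorted(others, key=lambda k: dist(query, k))[:n]

keys = sorted(["AB", "AZ", "BA", "BB", "CB", "ZZ"])
by_order = order_neighbors(keys, "AZ", 2)
by_metric = metric_neighbors(keys, "AZ", 2)
```

With these sample keys the two results differ: "BA" is adjacent to "AZ" in sort order but two substitutions away, while "ZZ" is one substitution away but at the far end of the order. So adjacency in the tree matches metric closeness only when the distance is consistent with the ordering.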

What is the “continuity” as a term in computable analysis?


Background

I once implemented a datatype representing arbitrary real numbers in Haskell. It represents every real number by a Cauchy sequence converging to it, which gives $ \mathbb{R}$ the usual topology. I also implemented addition, subtraction, multiplication, and division.
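To make the representation concrete, here is a minimal sketch in Python rather than Haskell. It is not the original implementation: it assumes a real number is a function that, given `n`, returns a rational within $ 2^{-n}$ of the number, and the names `const`, `add`, and `sqrt2` are hypothetical.

```python
from fractions import Fraction

def const(q):
    """Real number equal to the rational q (every approximation is exact)."""
    return lambda n: Fraction(q)

def add(x, y):
    """Sum of two reals.

    Each argument is asked for one extra bit of precision, so the two
    errors of at most 2**-(n+1) add up to at most 2**-n.
    """
    return lambda n: x(n + 1) + y(n + 1)

def sqrt2(n):
    """Rational within 2**-n of sqrt(2), found by bisection on [1, 2]."""
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo > Fraction(1, 2 ** n):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

two_sqrt2 = add(sqrt2, sqrt2)
approx = two_sqrt2(20)  # rational within 2**-20 of 2*sqrt(2)
```

In this representation any finite computation can inspect only finitely many approximations of its arguments, which is why equality of two such reals cannot be decided in general.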

But my teacher said, "This doesn’t seem to be a good idea. Since comparison is undecidable here, this doesn’t look very practical. In particular, letting division by 0 fall into an infinite loop doesn’t look good."

So I wanted my datatype to extend $ \mathbb{Q}$ . Since equality comparison of $ \mathbb{Q}$ is decidable, $ \mathbb{Q}$ is in discrete topology. That means a topology on $ \mathbb{R}$ must be finer than the discrete topology on $ \mathbb{Q}$ .

But I think I have found that, even if I could implement such a datatype, it would be impractical.

Proof, step 1

Let $ \mathbb{R}$ be finer than $ \mathbb{Q}$ in discrete topology. Then $ \{0\}$ is open in $ \mathbb{R}$ . Assume $ + : \mathbb{R}^2 \to \mathbb{R}$ is continuous. Then $ \{(x,-x): x \in \mathbb{R}\}$ is open in $ \mathbb{R}^2$ . Since $ \mathbb{R}^2$ is in product topology, $ \{(x,-x)\}$ is a basis element of $ \mathbb{R}^2$ for every $ x \in \mathbb{R}$ . It follows that $ \{x\}$ is a basis element of $ \mathbb{R}$ for every $ x \in \mathbb{R}$ . That is, $ \mathbb{R}$ is in discrete topology.

Proof, step 2

Since $ \mathbb{R}$ is in discrete topology, $ \mathbb{R}$ is computably equality comparable. This is a contradiction, so $ +$ is not continuous, and thus not computable.

Question

What is bugging me is the bolded text. It is well-known that every computable function is continuous (Weihrauch 2000, p. 6). Though the analytic definition and the topological definition of continuity coincide in functions from and to Euclidean spaces, $ \mathbb{R}$ above is not a Euclidean space. So I’m unsure whether my proof is correct. What is the definition of "continuity" in computable analysis?

Bubble Sort: Runtime complexity analysis like Cormen does

I’m trying to analyze Bubble Sort runtime in a method similar to how Cormen does in "Introduction to Algorithms 3rd Ed" for Insertion Sort (shown below). I haven’t found a line by line analysis like Cormen’s analysis of this algorithm online, but only multiplied summations of the outer and inner loops.

For each line of bubblesort(A), I have worked out the number of times it runs, as listed below. I would appreciate guidance on whether this analysis is correct and, if not, how it should be done. Also, I do not see the best case where $ T(n) = n$ , as it appears the inner loop always runs completely. Maybe this is for an "optimized" bubble sort, which is not shown here?

Times for each line with constant run time $ c_i$ , where $ i$ is the line number:

Line 1: $ c_1 n$

Line 2: $ c_2 \sum_{j=2}^n j $

Line 3: $ c_3 \sum_{j=2}^n (j - 1)$

Line 4: $ c_4 \sum_{j=2}^n (j - 1)$

Worst case:

$ T(n) = c_1 n + c_2 (n(n+1)/2 - 1) + c_3 (n(n-1)/2) + c_4 (n(n-1)/2)$

$ T(n) = c_1 n + c_2 (n^2/2) + c_2 (n/2) - c_2 + c_3 (n^2/2) - c_3 (n/2) + c_4 (n^2/2) - c_4 (n/2)$

$ T(n) = (c_2/2 + c_3/2 + c_4/2) n^2 + (c_1 + c_2/2 - c_3/2 - c_4/2) n - c_2$

$ T(n) = an^2 + bn - c$
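The per-line counts can be checked empirically. The sketch below assumes the four-line BUBBLESORT pseudocode from Cormen (outer `for i = 1 to A.length - 1`, inner `for j = A.length downto i + 1`, a comparison of `A[j]` with `A[j-1]`, then an exchange). The pseudocode itself is not reproduced in this post, and the function name `bubblesort_counts` is hypothetical. Loop tests are counted including the final failing test, as Cormen does for Insertion Sort.

```python
def bubblesort_counts(a):
    """Sort a copy of `a` and count how often each pseudocode line executes."""
    a = list(a)
    n = len(a)
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    i = 1
    while True:
        counts[1] += 1                    # line 1: outer loop test
        if i > n - 1:
            break
        j = n
        while True:
            counts[2] += 1                # line 2: inner loop test
            if j < i + 1:
                break
            counts[3] += 1                # line 3: comparison
            if a[j - 1] < a[j - 2]:       # 1-based A[j] < A[j-1]
                counts[4] += 1            # line 4: exchange
                a[j - 1], a[j - 2] = a[j - 2], a[j - 1]
            j -= 1
        i += 1
    return a, counts

n = 5
worst_sorted, worst = bubblesort_counts(list(range(n, 0, -1)))   # reversed input
best_sorted, best = bubblesort_counts(list(range(1, n + 1)))     # already sorted
```

For reversed input of length 5 the counts are 5, 14, 10, 10, matching $ n$ , $ n(n+1)/2 - 1$ , $ n(n-1)/2$ and $ n(n-1)/2$ . For sorted input only the line 4 count drops to 0, so under this assumed pseudocode the total stays quadratic; a linear best case would need a variant that stops early when a pass makes no exchange.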

Bubble Sort from Cormen

Insertion Sort from Cormen

File analysis solution

I scrolled through SA and didn’t find a better site. If there’s a better place to ask please let me know.

We’re in need of a file analysis solution. We need a way to analyze and assess files with unknown contents that in some cases may contain confidential information. Because of this risk, we need either a reputable cloud-based company with a strong privacy policy and security measures, or an on-prem solution, with on-prem being preferred.

Anyone here have any suggestions? We’re open to either static-analysis or dynamic-analysis in our situation.

What is the difference between available expressions analysis and very busy expressions analysis?

I am having trouble understanding the conceptual meaning of the two kinds of analysis. I know the equations and how to solve the problems, and I know that one is a forward data-flow analysis while the other is a backward data-flow analysis, but something is still missing at a higher level in the explanations I have seen so far.
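For reference, the equations in question are usually written in the following form. This is the common textbook presentation, not taken from this post; the labels $ \ell$ , the relation $ flow$ , and the $ gen$ and $ kill$ sets are assumed notation.

$$
\begin{aligned}
AE_{entry}(\ell) &= \begin{cases} \emptyset & \text{if } \ell \text{ is the initial label} \\ \bigcap \{ AE_{exit}(\ell') \mid (\ell', \ell) \in flow \} & \text{otherwise} \end{cases} \\
AE_{exit}(\ell) &= (AE_{entry}(\ell) \setminus kill_{AE}(\ell)) \cup gen_{AE}(\ell) \\
VB_{exit}(\ell) &= \begin{cases} \emptyset & \text{if } \ell \text{ is a final label} \\ \bigcap \{ VB_{entry}(\ell') \mid (\ell, \ell') \in flow \} & \text{otherwise} \end{cases} \\
VB_{entry}(\ell) &= (VB_{exit}(\ell) \setminus kill_{VB}(\ell)) \cup gen_{VB}(\ell)
\end{aligned}
$$

Read this way, an expression is available at a point if it has been computed on every path reaching that point and none of its variables has been redefined since. It is very busy at a point if it will be evaluated on every path leaving that point before any of its variables is redefined.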

I manage to connect to Azure Analysis Services from SSMS, but not from SSIS

I’m new to the Microsoft Server Suite.

I’ve downloaded SSMS and connected to Azure Analysis Services from it. I’m able to query my data using MDX without any problems.

However, I actually intend to build an ETL pipeline with the AAS cube as one of the sources. So I installed SSIS and have been trying to connect it to the AAS cube.

I first added an "Analysis Services Processing Task" to the package. The connection looks OK (when I click on "Test connection" the result is positive). But when I click on "Add", it doesn’t detect any cubes, although there are two on the AAS server specified.

I assumed it worked anyway, but I can’t query the cube no matter how I try to do that. I added an "Execute SQL task", but when I run it, it gives me an error.

The error message is:

An OLE DB record is available. Source: "Microsoft OLE DB Driver for SQL Server" Hresult: 0x80004005 Description: "Login timeout expired".
An OLE DB record is available. Source: "Microsoft OLE DB Driver for SQL Server" Hresult: 0x80004005 Description: "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".
An OLE DB record is available. Source: "Microsoft OLE DB Driver for SQL Server" Hresult: 0x80004005 Description: "Named Pipes Provider: Could not open a connection to SQL Server [53]. ".
Error: 0xC00291EC at Execute SQL Task, Execute SQL Task: Failed to acquire connection "asazure://northeurope.asazure.windows.net/xxxx". Connection may not be configured correctly or you may not have the right permissions on this connection.
Task failed: Execute SQL Task
Warning: 0x80019002 at Package: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. The Execution method succeeded, but the number of errors raised (1) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS package "C:\Users176\source\repos\Integration Services Project1\Integration Services Project1\Package.dtsx" finished: Failure.
The program ‘[18664] DtsDebugHost.exe: DTS’ has exited with code 0 (0x0).

Any ideas?

Trivial clarification with the analysis of the Dijkstra Algorithm as dealt with in Kenneth Rosen’s “Discrete Mathematics and Its Applications”

I was going through the text “Discrete Mathematics and Its Applications” by Kenneth Rosen, where I came across the analysis of the Dijkstra Algorithm and felt that some of the values in the analysis are not quite right. My question is not about the analysis of the Dijkstra Algorithm in general (a better and clearer version exists in the CLRS text). My aim is to analyze the algorithm accurately as far as the mathematics is concerned, treating the algorithm below as just an unknown algorithm whose analysis is required. I want to check whether the thing I point out as being odd actually is odd.

Let’s move on to the question. Below is the algorithm in the text.

ALGORITHM: Dijkstra’s Algorithm.

procedure Dijkstra(G: weighted connected simple graph, with all weights positive)
{G has vertices a = v[1], ..., v[n] = z and weights w(v[i], v[j])
 where w(v[i], v[j]) = ∞ if {v[i], v[j]} is not an edge in G}
for i := 1 to n
    L(v[i]) := ∞
L(a) := 0
S := ∅
{the labels are now initialized so that the label of a is 0 and all
 other labels are ∞, and S is the empty set}
while z ∉ S
    u := a vertex not in S with L(u) minimal
    S := S ∪ {u}
    for all vertices v not in S
        if L(u) + w(u, v) < L(v) then
            L(v) := L(u) + w(u, v)
    {this adds a vertex to S with minimal label and updates the labels of vertices not in S}
return L(z)  {L(z) = length of a shortest path from a to z}

The following is the analysis which they used:

We can now estimate the computational complexity of Dijkstra’s algorithm (in terms of additions and comparisons). The algorithm uses no more than $ n − 1$ iterations where $ n$ is the number of vertices in the graph, because one vertex is added to the distinguished set at each iteration. We are done if we can estimate the number of operations used for each iteration. We can identify the vertex not in S in the $ k$ th iteration with the smallest label using no more than $ n − 1$ comparisons. Then we use an addition and a comparison to update the label of each vertex not in S in the $ k$ th iteration. It follows that no more than $ 2(n − 1)$ operations are used at each iteration, because there are no more than $ n − 1$ labels to update at each iteration.

My doubt concerns the statement "The algorithm uses no more than $ n − 1$ iterations where $ n$ is the number of vertices in the graph, because one vertex is added to the distinguished set at each iteration." I feel it should be $ n$ iterations and not $ n − 1$ , because in the very first iteration the vertex $ a$ is added to the set $ S$ , and the process continues until $ z$ is inserted into $ S$ , and $ z$ may be the last vertex to be added, i.e. $ v_n$ .

I hope the remaining statements are fine.
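The iteration count can be checked by transcribing the pseudocode directly. The sketch below is a hedged transcription, not an efficient implementation: the function name `dijkstra_iterations` and the dict-of-dicts weight representation are assumptions, and a missing entry stands for weight ∞. It counts how many times the body of the `while` loop runs on a path graph where $ z$ is the farthest vertex from $ a$ .

```python
import math

def dijkstra_iterations(w, a, z):
    """Run the textbook procedure and return (L(z), number of while-loop iterations).

    `w[u][v]` is the weight of edge {u, v}; a missing entry means no edge.
    """
    vertices = list(w)
    L = {v: math.inf for v in vertices}
    L[a] = 0
    S = set()
    iterations = 0
    while z not in S:
        iterations += 1
        # u := a vertex not in S with L(u) minimal
        u = min((v for v in vertices if v not in S), key=lambda v: L[v])
        S.add(u)
        # update the labels of all vertices not in S
        for v in vertices:
            if v not in S:
                weight = w[u].get(v, math.inf)
                if L[u] + weight < L[v]:
                    L[v] = L[u] + weight
    return L[z], iterations

# Path graph 1 - 2 - 3 - 4 - 5 with unit weights, a = 1 and z = 5.
n = 5
w = {i: {} for i in range(1, n + 1)}
for i in range(1, n):
    w[i][i + 1] = 1
    w[i + 1][i] = 1

length, iterations = dijkstra_iterations(w, 1, n)
```

On this example the loop body runs 5 times for 5 vertices, which agrees with the count of $ n$ iterations argued above. In the final iteration $ z$ is added to $ S$ and no vertices remain outside $ S$ , so that iteration updates no labels.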