I have a large text files containing some strings, and I was trying to see if there is way to convert those strings into binary

Example: 0 0 0 0 0 0 0 0 0 Tokyo Kanto Japan

I want the words Tokyo, Kanto, and Japan to be converted

If insertion/deletion from a binary tree is efficient, while maintaining ability to get item by index

I can’t quite figure this out. Say you have a binary tree where the left or right nodes correspond to 0 or 1 and a group of levels form a chain which is the index of the node. So you have 10010 which is 18 in decimal, so index 18 (say we count from 1 instead of 0).

      1     0   1    0 1 0 1         0 1         ... 

We build up some binary tree/trie. I’m trying to figure out if you can delete and add nodes to the trie without having to rewrite the whole subbranch to the right of where you insert or delete, or if there is a simple few-step operation to somehow sort of rotate the tree branch on insert/delete such that you don’t have to rewrite a big chunk of the tree. The reason is, you want to maintain the position of the nodes, so if you have nodes at position 6, 7, and 8, if you remove node 4, then they become positions 5, 6, 7. Do you get this for free somehow, or do you have to rewrite all of there positions in the tree? I can’t quite see how it would look, wondering if one could explain how it would work.

Randomly built binary search trees

In Introduction to Algorithms (CLRS) 3rd Edition, page 299, the section attempts to prove:

The expected height of a randomly built binary search tree on $$n$$ distinct keys is $$O(\lg n)$$.

We define “randomly built binary search tree on $$n$$ keys” as:

a binary search tree that arises from inserting the keys in random order into an initially empty tree, where each of the $$n!$$ permutations of the input keys is equally likely.

In the proof, we defined some random variables:

Let $$X_n$$ denotes the height of a randomly built binary search tree on $$n$$ keys and the exponential height $$Y_n = 2^{X_n}$$. Of the $$n$$ keys, we choose one key as the root of the tree, and we let $$R_n$$ denotes the random variable that holds the position that this key would occupy if the set of keys were sorted (also know as ‘rank’ in the text). If we know that $$R_n=i$$, it follows that $$Y_n=2\cdot max(Y_{i-1},Y_{n-1})$$.

We also define indicator random variables $$Z_{n,1}, Z_{n,2}, …, Z_{n,n}$$, $$Z_{n,i} = I\{R_n=i\}$$.

$$$$\begin{split} E[Y_n] & = E\Big[\sum_{i=1}^{n} Z_{n,i} (2\cdot max(Y_{i-1}, Y_{n-i}))\Big] \ & = \sum_{i=1}^{n} E[Z_{n,i} (2\cdot max(Y_{i-1}, Y_{n-i}))] && \text{(by linearity of expectation)}\ & = \sum_{i=1}^{n} E[Z_{n,i}] E[(2\cdot max(Y_{i-1}, Y_{n-i}))] && \text{(by independence)}\ \end{split}$$$$

I would like to ask why is the last equality correct? In other words, why can we assume independence of $$Z_{n,i}$$ and $$max(Y_{i-1},Y_{n-i})$$?

In the proof, there was a brief explanation:

Having chosen $$R_n = i$$, the left subtree (whose ranks are less than $$i$$. This subtree is just like any other randomly built binary search tree on $$i-1$$ keys. Other than the number of keys it contains, this subtree’s structure is not affected at all by the choice of $$R_n=i$$, and hence the random variables $$Y_{i-1}$$ and $$Z_{n,i}$$ are independent. Likewise, the right subtree, whose exponential height is $$Y_{n-i}$$, is randomly built on the $$n-i$$ keys whose ranks are greater than $$i$$.

However, since $$R_n$$ does affect the number of keys $$Y_{i-1}$$ and $$Y_{n-i}$$ contain (as acknowledged by the explanation above), wouldn’t it mean that $$Y_{i-1}$$ and $$Y_{n-i}$$ are dependent on $$Z_{n,i}$$?

A note regarding similar questions posted on CS stackexchange

I have read the answers for the following questions which are very similar to mine:

1. Proof that a randomly built binary search tree has logarithmic height
2. Average height of a BST with n Nodes.
3. Randomized BST height analysis : How $Z_{n,i}$ and $Y_{k-1}$ are independent?

For 1 & 2, the question was a more general one which asked for an explanation for the entire proof. Also, the answer for both questions did not attempt to justify the independence and simply used the result.

For 3, the question was based on a MIT lecture, https://www.youtube.com/watch?v=vgELyZ9LXX4 and it lacked some of the details which I have included above. Also, the answer was also not clear.

Returning sorted lowest k elements in a binary heap

Given a binary heap of size $$n$$ and a number $$k\le n$$. How can I return an array with size $$k$$, which contains the $$k$$ lowest elements in the binary heap, so that it will be sorted in the end?

The problem is that the time-complexity needs to be $$\Theta(k \log(k))$$.

Maximum Width binary tree

Here is my code for the LeetCode problem

For a given binary tree, find the maximum width of the binary tree. The width of one level is defined as the length between the end-nodes even if there are None nodes in between

My code (in PyCharm) passess all the given tests but does not seem to pass on the LeetCode website. I am not sure why this is, so please don’t try plugging it into the website as I assume the way I have built a binary tree is different to their method.

class Node:      def __init__(self, data):         self.data = data         self.left = None         self.right = None      level_width = {0: [0, 0]}      def get_width(self, root, level=0, pointer=1):         if root.left:             if level + 1 not in self.level_width:                 self.level_width[level + 1] = [pointer * 2, 0]             self.level_width[level + 1][0] = min(self.level_width[level + 1][0], pointer * 2)             self.get_width(root.left, level + 1, pointer * 2)         if root.right:             if level + 1 not in self.level_width:                 self.level_width[level + 1] = [9999, pointer * 2 + 1]             self.level_width[level + 1][1] = max(self.level_width[level + 1][1], level + 1, pointer * 2 + 1)             self.get_width(root.right, level + 1, pointer * 2 + 1)         return      def widthOfBinaryTree(self, root):         """           :type root: TreeNode           :rtype: int           """         self.get_width(root)         max_distance = 1         for k, v in self.level_width.items():             max_distance = max(max_distance, v[1] - v[0] + 1)         return max_distance   root = Node(1) root.left = Node(3) root.left.left = Node(5) root.left.left.left = Node(6) root.right = Node(2) root.right.right = Node(9) root.right.right.right = Node(7)  print(root.widthOfBinaryTree(root)) 

Nodes in a binary search tree that span a range

I have a binary search tree of height $$h$$ with an integer in each leaf. I also have a range $$[\ell,u]$$. I want to find a set of nodes that span the range $$[\ell,u]$$, i.e., a set $$S$$ of nodes such that the leaves under $$S$$ form all (and only) those leaves containing integers in the range $$[\ell,u]$$. How large does $$S$$ need to be, in the worst case, as a function of the height $$h$$? How do I find such a set explicitly?

Compute binary sum with Python 3

I tried to make a function able to compute the binary sum. Binary numbers are passed as list composed of 1 and 0, like [1,1,0,1], [0,0,1], etc. It seems to work. I tried to improve the code, but I don’t think this is the best way to make it work. How can I improve this?

def check(value):     if value%2 == 0: #check if the number can be divided by 2         amount = int(value / 2) #if so, set the value to 0 and the amount as the number divided by 2         value = 0     elif value%2 != 0: #if not so, set the value to 0 and the amount as the previous even number divided by 2         amount = int((value-1)/2)         value = 1     return value, amount #returns the 2 value.  def binary_sum(*args): """The program compute the binary sum of 2+ binary lists, like [1,1,1,1] + [0,0,0,1]"""     i = -1 #control variable     result = [] #list that will be filled by results' binary numbers     value = 0      amount = 0 #amount carried over     binaryLength = [] #list of all args length     for x in args:  binaryLength.append(len(x)) #get the length of each args elements     maxLength = max(binaryLength) #get the max length among args elements     for x in args:         if len(set(binaryLength)) != 1: #checks if all elements have the same length             delta = maxLength - len(x) - 1 #if not so, it adds as many zeros as delta value indicates             while (delta) >= 0:                 x[:0] = [0]                 delta -= 1     while i >= -maxLength: #starts the sum from the end.         for j in args: value += j[i] #get the sum of all the last list element( for each arg)         value += amount          value, amount = check(value) #uses binary sum properties: it returns a value between 0 and 1, and the amount, an integer          result.insert(i, value) #inserts the value at the beginning of the list         i-=1 # needed to iterate the process         value = 0 #resets the value     if amount != 0: #At the end, if there's amount it inserts it at the beginning of result list:          while amount not in [0,1]: #if is not equal to 0 or 1 converts the int amount in binary number              value = amount             value, amount = check(value)             result.insert(i, value)             i-=1         result.insert(i, amount) #if is equal to 1 inserts it at the start of the list.     return result #returns the result as list. 

A function that can translate DNA sequence to binary code

I am designing a function that can translate DNA sequence to binary code in four dimension vector. e.g “A”-(1,0,0,0)| “G-(0,1,0,0)”…

we also find the () in for loop can actually influence the result. we hope to find the reason behind this. e.g. 4-1:7-1 & (4-1):7-1 is totally different, we want to find the knowledge behind this    NC1 <- function(data){    for(i in 1:length(data) ){     if(i==1){        DCfirst <- unlist(as.vector(strsplit(data[1],"",fixed = TRUE)))       DCsecond <- matrix(0,nrow = length(data),ncol = length(DCfirst))       DCsecond[1,] <-  DCfirst      }else{       DCsecond[i,] <- unlist(as.vector(strsplit(data[i],"",fixed = TRUE)))     }   }   return(DCsecond) }  binary<- function(data){   sequence_X<-NC1(data)   N=ncol(sequence_X)   X2<-matrix(NA,nrow=length(data),ncol=4*N)   for (i in 1 : N){     L1<-which(sequence_X[,i]=="A")     L2<-which(sequence_X[,i]=="G")     L3<-which(sequence_X[,i]=="C")     L4<-which(sequence_X[,i]=="U")     for (j in L1){       X2[j, (4i-3):4i-1]<-unlist(c(1,0,0,0))     }     for (j in L2){       X2[j, (4i-3):4i-1]<-unlist(c(1,0,0,0))     }     for (j in L3){       X2[j, (4i-3):4i-1]<-unlist(c(1,0,0,0))     }     for (j in L4){       X2[j, (4i-3):4i-1]<-unlist(c(1,0,0,0))     }   }     return (X2) }  TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG") binary(TEST) 

The final result is showed us below:
I want the first four elements can translate the DNA sequence as well, not like in this matrix.

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [1,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1 [2,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1 [3,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1 [4,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1 [5,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1      [,18] [,19] [,20] [1,]     0     0     0 [2,]     0     0     0 [3,]     0     0     0 [4,]     0     0     0 [5,]     0     0     0 

Calculate checksum and write to file using binary writer in c# visualstudio

I’m trying to calculate checksum and write it to file using binarywriter. I need to calculate {1^2^3^(2 value from textbox)}. I have created texbox that takes numbers eg 45 and write as 34 35 to my file as 4 and 5 byte, now I need to calculate checksum and get 6 and 7 byte to my file. Please provide any suggestions or code to get those bytes to my file….

First 3 bytes are fixed 4 and 5 bytes from my textbox value and I need to get 6th and 7th byte..

I tried var AB = 1^2^3^ textbox, but I doesn’t work as operand “^” cannot be able to string or byte[].

Any help is appreciated.Thank you

True Error of a binary classifier

For a given classifier h, How is the true error over a distribution D defined? \begin{align*} L_D(h) &= \sideset{\mathbb{E}}{}{}_{x,y \sim D} \Pr[h(x) \neq y] \ &= \sideset{\mathbb{E}}{}{}_{x,y \sim D} \begin{cases} \Pr[y \neq 0|x] & \text{if } h(x) = 0, \ \Pr[y \neq 1|x] & \text{if } h(x) = 1. \end{cases} \end{align*}

I saw these two formulae here Showing that Bayes classifier is optimal Are these two equivalent?