Need help with Trie insertions, I am currently working on a DSA specialization on Coursera, Strings course

I am trying the insertion operation in a Trie and a read operation for the below implementation I am having trouble with insertion.

import java.util.*; class node{ public int val; public node ptrs[]; node(){     this.val =0;     ptrs = new node[26];     for (node ptr : ptrs) {         ptr = null;     }   }     } class Tree{ public node root = new node(); public int pass =0; void insert(String s) {     node trv = root;     for (int i = 0; i < s.length(); i++) {         if (trv.ptrs[s.charAt(i) - 'A'] == null) {             trv.ptrs[s.charAt(i) - 'A'] = new node();             trv.val = ++pass;           //  System.out.println(s.charAt(i)+" val : "+trv.val);         }          trv = trv.ptrs[s.charAt(i) - 'A'];     } } private void visit(node trv){     for(int i =0;i<26;i++){         if(trv.ptrs[i]!=null){             System.out.println((char)(i+'A')+" : "+trv.val);             visit(trv.ptrs[i]);         }     } } void call(){     this.visit(root);  }  } public class trie {   public static void main(String[] args) {     Scanner sc = new Scanner(System.in);     int n = sc.nextInt();     Tree t = new Tree();     while (n-- > 0) {         String s = sc.next();         t.insert(s);     }     t.call();     sc.close();  } } 

my output :

3 ATAGA ATC GAT A : 7 T : 2 A : 6 G : 4 A : 5 C : 6 G : 7 A : 8 T : 9 

expected output :

3 ATAGA ATC GAT A : 1 T : 2 A : 3 G : 4 A : 5 C : 6 G : 7 A : 8 T : 9 

Compressed Trie Run-time(s)

Consider a set $ X$ $ =$ $ \{x_1, x_2, …, x_n\}$ where every $ x_i$ is a positive integer and $ x_i \leq n^f$ for some constant $ f$ . We represent $ x_i$ as strings over an alphabet $ 0, 1, …, 2^t – 1$ (in other words, representing each $ x_i$ in base $ 2^t$ ) and store all $ x_i$ from $ X$ in a compressed trie $ T$ . For every node $ u$ of $ T$ we keep an array $ A(u)$ of size $ 2^t + 1$ . $ A(u)[i]$ contains a pointer to the child $ u_i$ of $ u$ such that the edge from $ u$ to $ u_i$ is marked with $ i$ ; $ A(u)[i] = NULL$ if there is no such $ u_i$ in $ T$ . Specify all of the following in big-$ O$ in terms of $ f$ , $ n$ , and $ t$ (as necessary). What is the maximum height of $ T$ ? What is the total space usage of $ T$ and all $ A(u)$ ? What time is needed to search for a key $ k$ in $ T$ ?

What is the point of the “check” array in a triple-array trie?

I am currently reading the definition of a triple-array trie as described here.

More precisely, I am trying to understand the nature of the three arrays:

  • base. Each element in base corresponds to a node of the trie. For a trie node s, base[s] is the starting index within the next and check pool (to be explained later) for the row of the node s in the transition table.
  • next. This array, in coordination with check, provides a pool for the allocation of the sparse vectors for the rows in the trie transition table. The vector data, that is, the vector of transitions from every node, would be stored in this array.
  • check. This array works in parallel to next. It marks the owner of every cell in next. This allows the cells next to one another to be allocated to different trie nodes. That means the sparse vectors of transitions from more than one node are allowed to be overlapped.

I understand how those three arrays are linked together, but I fail to see the point of the “check” array. It seems to me like you could walk the trie without using the “check” array at all.

What am I missing?

Space-efficent storage of a trie as array of integers

I’m trying to efficiently store a list of strings in an array with the following constraints:

  • All strings consist of 8-bit characters (0..255).
  • The final trie is static, i.e. once it is built, no strings have to be inserted or removed.
  • Looking up a string of length $ m$ must be done in $ O(m)$ with a constant factor as low as possible.
  • The only available memory structure to store the data is an array of integers. In particular, there are no pointers or dynamic memory allocation.
  • Once an array is allocated, it cannot be resized and its memory cannot be released anymore.
  • Memory is rare, so the final data structure should be as compact as possible and no unnecessarily long arrays should be allocated.
  • Computation time is not important for the building phase, but for memory usage the consraints above apply.

Preface

My current approach is a trie that is stored in the array with the following structure:

$ $ \fbox{$ \vphantom{^M_M} \;i_0 \;\ldots\;i_{255}\;$ }\, \fbox{$ \vphantom{^M_M} \;i^*_0 \;\ldots\;i^*_{255}\;$ }\, \fbox{$ \vphantom{^M_M} \;w\;$ }\, \fbox{$ \vphantom{^M_M} \;\mathit{last}\;$ }\, \fbox{$ \vphantom{^M_M} \;B_0\;$ }\,\fbox{$ \vphantom{^M_M} \;B_1\;$ }\,\ldots $ $

where $ i_k$ is a mapping from each unique input character $ k$ to an integer $ 1 \leq i_k(c) \leq w$ with $ i^*$ being the corresponding reverse mapping of $ i$ . Each node in the trie is stored as a block $ B$ of size $ w+1$ . The mapping $ i$ is used to reduce the size of each block, because not the whole character range has to be stored but only the number of characters actually used. This comes at the expense of having one more indirection when looking up words. (The field $ \mathit{last}$ here is used as a pointer to the field after the last block in the array, used to find the next allocation point.)

Each block looks like this:

$ $ \fbox{$ \vphantom{^M_M} \;b\;$ }\, \fbox{$ \vphantom{^M_M} \;c_1 \;\ldots\;c_w\;$ } $ $

$ b$ is either 1 if the word represented by that block is in the trie, and 0 otherwise. $ c_i$ represent all unique input characters (after the $ i$ mapping). If the value of $ c_i$ is equal to 0, there is no entry for this character. Otherwise $ c_i$ is the index into the array at which the block to the following letter starts.

To build the trie, the first step is calculate the bijection $ i$ /$ i^*$ and $ w$ . Then new blocks are added with each prefix that isn’t already present in the trie.

Problem

While this approach works so far, my main problem is memory usage. The current approach is extremly memory expensive when only few words share longer prefixes (which is usally the case). Some tests show that the typical number of non-empty fields is only about 2-3% of the whole array. Another problem is that the final number of needed array fields is only available after the trie has already been built, i.e. I have to be conservative when allocating the memory to not get out of memory while adding new blocks.

My idea now was to use a compressed trie/radix trie instead with two types of blocks: 1) the ones above that represent nodes with several children, and 2) compressed blocks (similar to C char arrays) that represent suffixes in the trie. For example, when the words apple juice and apple tree should be stored in the tree, there would be seven normal blocks for the common prefix apple and a compressed block for each juice and tree. (Perhaps that would also allow to merge common suffixes for words with different prefixes.)

The problem with this is that is may lead to gaps in the array while building the trie. Consider the situation in the above example, in which only apply juice is stored as a compressed block in the trie. Now apple tree is inserted, which would lead to a removal of the apple juice block and addition of juice and tree blocks instead, which will not fit into the left hole in general.

Under the given constraints, can anyone see an algorithm to store strings most efficiently in the array while keeping the linear lookup time?

Boggle using Trie and DFS

I already mentioned in my previous code review, there are two solutions for this problem.

https://www.geeksforgeeks.org/boggle-find-possible-words-board-characters/

this is the first Boggle (Find all possible words in a board of characters) C#

and here comes the second Original question is

Given a dictionary, a method to do a lookup in the dictionary and a M x N board where every cell has one character. Find all possible words that can be formed by a sequence of adjacent characters. Note that we can move to any of 8 adjacent characters, but a word should not have multiple instances of the same cell.

Example:  Input: dictionary[] = {"GEEKS", "FOR", "QUIZ", "GO"};        boggle[][]   = {{'G','I','Z'},                        {'U','E','K'},                        {'Q','S','E'}};       isWord(str): returns true if str is present in dictionary                    else false.  Output:  Following words of the dictionary are present          GEEKS          QUIZ 

Please review for performance and any other comments thanks.

using System; using System.Text; using Microsoft.VisualStudio.TestTools.UnitTesting;  namespace TrieQuestions {     [TestClass]     public class BoggleTrie     {         [TestMethod]         public void BoggleTrieTest()         {             string[] dictionary = { "GEEKS", "FOR", "QUIZ", "GEE" };             char[,] boggle = {{'G','I','Z'},                               {'U','E','K'},                               {'Q','S','E'}             };              Trie tree = new Trie();             foreach (var word in dictionary)             {                 tree.Insert(word);             }              FindWords(boggle, tree);         }          private void FindWords(char[,] boggle, Trie root)         {             int M = boggle.GetLength(0);             int N = boggle.GetLength(1);             bool[,] visited = new bool[M,N];             StringBuilder str = new StringBuilder();              for (int i = 0; i < M; i++)             {                 for (int j = 0; j < N; j++)                 {                     //all the words start with one of the letters in the head of the Trie                     if (root.Head.Edges.ContainsKey(boggle[i, j]))                     {                         str.Append(boggle[i, j]);                         SearchWord(root.Head.Edges[boggle[i,j]], boggle, i, j, visited, str);                     }                     str.Clear();                 }             }         }          private void SearchWord(TrieNode child, char[,] boggle, int i, int j, bool[,] visited, StringBuilder str)         {             if (child.IsTerminal)             {                 Console.WriteLine(str.ToString());             }              int M = boggle.GetLength(0);             int N = boggle.GetLength(1);              if (IsSafe(M, N, i, j, visited))             {                 visited[i, j] = true;                  foreach (var edge in child.Edges)                 {                     for (int row = i - 1; row <= i + 1; row++)                     {                         for (int col = j - 1; col <= j + 1; col++)                         {                             if (IsSafe(M, N, row, col, visited) && boggle[row,col] == edge.Key)                             {                                 SearchWord(edge.Value, boggle, row, col, visited, str.Append(edge.Key));                             }                         }                     }                  }                  visited[i, j] = false;             }         }          private bool IsSafe(int M, int N, int i, int j, bool[,] visited)         {             return i < M && i >= 0 && j < N && j >= 0 && !visited[i, j];         }     } } 

Trie vs. sorted array

For a prefix match of a large set of strings with a read-heavy workload, what would be the trade-offs between using a sorted array and a trie? When is it more appropriate to use one and when the other?

It seems to me that a sorted array has most of the benefits that a trie would, but it’s way simpler. Also feels like a sorted array might be more cache efficient (but maybe not, there’s a lot of jumping around during binary search as well), and if the data set grows really large the sorted array approach is more easily modified to paging out (à la B-trees) or distributing the workload across machines (using prefix sharding), both of which are a lot harder to do on tries.

If we take the length of the strings as a constant instead of a variable (so no O(M)), complexities would be:

Single lookup: sorted array O(log N), trie O(1) (“effectively constant” vs “constant”)

Prefix lookup: both O(N) since we potentially retrieve the entire dictionary.

Insertion: sorted array O(N), trie O(1) (but we said the workload was read-heavy so I don’t care so much).

In what cases do tries have definite advantages?

Boggle solver – Updated (with Trie)

Here is an update to my previous boggle solver. The new version has an almost instantaneous output, removing the need to limit the length of words being searched for.

Any comments on my general coding implementation or layout would be appreciated.

Python v3.7

"""Boggle game solver"""   class Trie:     """Trie class to speed up programme by stopping exploration of non-existent prefixes"""      def __init__(self):         """Initialise the trie"""         self.child = {}      def insert(self, word):         """Insert word into trie"""         current = self.child         for char in word:             if char not in current:                 current[char] = {}             current = current[char]      def search(self, word):         """Search the trie"""         current = self.child         for char in word:             if char not in current:                 return False             current = current[char]         return True   def words_from(board, row, column, running_string, list_words):     """Calculate all possible words from a given starting position [row, column]"""     if row in (4, -1) or column in (4, -1):         return     # Search the Trie     if not trie.search(running_string):         return     if board[row][column] != "-":         new_string = running_string + board[row][column]         board[row][column] = "-"         # Add new word         if len(new_string) >= 3:             list_words.append(new_string.lower())         # Find next word         next_move = [             (1, 1),             (-1, -1),             (1, -1),             (-1, 1),             (1, 0),             (0, 1),             (-1, 0),             (0, -1),         ]         for dx, dy in next_move:             words_from(board, row + dx, column + dy, new_string, list_words)         board[row][column] = new_string[-1]         return list_words   def get_permutations(board):     """Get all permutations """     set_permutations = set()     # Build Trie     global trie     trie = Trie()     with open("en-dict.txt", "r", encoding="utf8") as file:         for line in file:             trie.insert(line)     #Search for words from each starting position     for row in range(4):         for column in range(4):             # Search for words             words = words_from(board, row, column, running_string="", list_words=[])             if words:                 for word in words:                     set_permutations.add(word)             words = None     return sorted(list(set_permutations))  def dictionary_check(set_permuations):     """Check set_permutations for valid English words"""     dictionary = {}     with open("en-dict.txt", "r", encoding="utf8") as file:         for line in file:             dictionary[line.strip()] = 0      counter = 0     for word in set_permuations:         if word.lower() in dictionary:             counter += 1             print(word)     print(f"======\n{counter} words")   def find_words(board):     """Find words on the boggle board"""     set_permutations = get_permutations(board)     print("\nPerforming dictionary check....")     dictionary_check(set_permutations)   def build_board(string):     """Build board from string"""     if len(string) != 16:         print("Error. Must enter 4*4 grid (16 characters)")         return     board = [[*string[0:4]], [*string[4:8]], [*string[8:12]], [*string[12:16]]]     find_words(board)   if __name__ == "__main__":     string_board = "playthiswordgame"     build_board(string_board) 

Non-recursive version of Trie deletion

So,I was looking for the non-recursive version of Trie data structures deletion in the internet. I couldn’t find one. The best I could manage to find was a recursive implementation of trie in this website. Looking at the code for sometime, I thought that I could make the recursive version of the deletion to non-recursive one. Here is my take on it. Remember, I am worried whether I have done effective memory cleanup. Any insight on the overall code structure will also be helpful. Thanks.

#include<iostream> using namespace std; struct node{     bool isWord=false;     node* previous;     node* children[27]; };  void insert(node*& root, string key){     node *temp = root;     temp->previous = NULL;     int keyLen = key.length();     for(int i=0; i<keyLen; ++i){         int index = key[i]-'a';         //If i is the last element         if(i==keyLen-1){             //then take notice of it             //and change isWord to true             temp->isWord=true;             temp->children[index] = NULL;         }         //If there is no node at a given index         if(!temp->children[index]){             //Form a new node             temp->children[index] = new node;             //Take notice of the node that it is coming from             (temp->children[index])->previous = temp;         }         //Traverse along the children node         temp = temp->children[index];     } }   bool search(node*&root, string key){     node*temp = root;     int keyLen = key.length();     for(int i=0; i<keyLen; ++i){         int index = key[i] -'a';         //If i is at string length         //and the end the end it finds         //itself to be true         if(i==keyLen-1 && temp->isWord==true)         {             return true;         }          //If a character does not exist         //in the traversal          if(!temp->children[index]){             return false;         }         temp = temp->children[index];     }     //If the string is longer then expected     return false; } bool hasNoChildren(node* root){     for(int i=0; i<27; ++i){         //If the root has at least one child         //then return false         if(root->children[i])return false;     }     //else return true     return true; } void remove(node*& root, string key){     if(!search(root,key)) return ;     node*temp = root;     int keyLen = key.length();     for(int i=0; i<keyLen; ++i){         int index = key[i]-'a';         /*If i reaches the length of the string         temp is turned to false is turned to false         Which means if day word 'foo' is part of longer         word 'football', 'foo' does not exist as a         seperate word.         And if only 'foo' exist in the dictionary,         it also signals for it get removed in the next         for loop.*/         if(i==keyLen-1){             temp->isWord = false;         }         temp = temp->children[index];     }      /*The pointer temp in the above for loop     manages to reach to the end of the string     Since, the previous pointer is being tracked     it is easy to retract , if it is so required.*/     for(int i=keyLen-1; i>=0; --i){          /*If the temp isWord variable is false          and it happens to have not children         then it is being removed. Not this removal         happens in the bottom up fashion, which         allows effective memory deletion.*/         if(temp->isWord == false && hasNoChildren(temp)){             node*p = temp;             temp = temp->previous;             delete p;             }             else temp= temp->previous;         } }   int main(){     node* a = new node;     string keys[] = { "the", "a", "there",                       "answer", "any", "by",                       "bye", "their", "hero", "heroplane" };     for (int i = 0; i < 10; i++)         insert(a, keys[i]);     search(a, "the") ? cout << "Yes\n" : cout << "No\n";     search(a, "these") ? cout << "Yes\n" : cout << "No\n";     remove(a,"heroplane");     search(a, "hero") ? cout << "Yes\n" : cout << "No\n";     return 0; } 

LeetCode: Replace Words Trie C#

I am trying to solve this leet code problem
https://leetcode.com/problems/replace-words/

Please comment about performance.

In English, we have a concept called root, which can be followed by some other words to form another longer word – let’s call this word successor. For example, the root an, followed by other, which can form another word another.

Now, given a dictionary consisting of many roots and a sentence. You need to replace all the successor in the sentence with the root forming it. If a successor has many roots can form it, replace it with the root with the shortest length.

You need to output the sentence after the replacement.

Example 1:  Input: dict = ["cat", "bat", "rat"] sentence = "the cattle was rattled by the battery" Output: "the cat was rat by the bat" 

Note:

The input will only have lower-case letters. 1 <= dict words number <= 1000 1 <= sentence words number <= 1000 1 <= root length <= 100 1 <= sentence words length <= 1000

using System; using System.Collections.Generic; using System.Collections.Generic; using System.Security.Policy; using System.Text; using Microsoft.VisualStudio.TestTools.UnitTesting;  namespace TrieQuestions {     [TestClass]     public class TrieReplaceWords     {         [TestMethod]         public void ReplaceWordsTest()         {             List<string> dict = new List<string> { "cat", "bat", "rat" };             const string sentence = "the cattle was rattled by the battery";             const string output = "the cat was rat by the bat";             Assert.AreEqual(output, ReplaceWords(dict, sentence));         }          [TestMethod]         public void LeetCodeLongTest()         {             List<string> dict = new List<string> {                 "e", "k", "c", "harqp", "h", "gsafc", "vn", "lqp", "soy", "mr", "x", "iitgm", "sb", "oo", "spj", "gwmly", "iu", "z", "f", "ha", "vds", "v", "vpx", "fir", "t", "xo", "apifm", "tlznm", "kkv", "nxyud", "j", "qp", "omn", "zoxp", "mutu", "i", "nxth", "dwuer", "sadl", "pv", "w", "mding", "mubem", "xsmwc", "vl", "farov", "twfmq", "ljhmr", "q", "bbzs", "kd", "kwc", "a", "buq", "sm", "yi", "nypa", "xwz", "si", "amqx", "iy", "eb", "qvgt", "twy", "rf", "dc", "utt", "mxjfu", "hm", "trz", "lzh", "lref", "qbx", "fmemr", "gil", "go", "qggh", "uud", "trnhf", "gels", "dfdq", "qzkx", "qxw"};             const string sentence = "ikkbp miszkays wqjferqoxjwvbieyk gvcfldkiavww vhokchxz dvypwyb bxahfzcfanteibiltins ueebf lqhflvwxksi dco kddxmckhvqifbuzkhstp wc ytzzlm gximjuhzfdjuamhsu gdkbmhpnvy ifvifheoxqlbosfww mengfdydekwttkhbzenk wjhmmyltmeufqvcpcxg hthcuovils ldipovluo aiprogn nusquzpmnogtjkklfhta klxvvlvyh nxzgnrveghc mpppfhzjkbucv cqcft uwmahhqradjtf iaaasabqqzmbcig zcpvpyypsmodtoiif qjuiqtfhzcpnmtk yzfragcextvx ivnvgkaqs iplazv jurtsyh gzixfeugj rnukjgtjpim hscyhgoru aledyrmzwhsz xbahcwfwm hzd ygelddphxnbh rvjxtlqfnlmwdoezh zawfkko iwhkcddxgpqtdrjrcv bbfj mhs nenrqfkbf spfpazr wrkjiwyf cw dtd cqibzmuuhukwylrnld dtaxhddidfwqs bgnnoxgyynol hg dijhrrpnwjlju muzzrrsypzgwvblf zbugltrnyzbg hktdviastoireyiqf qvufxgcixvhrjqtna ipfzhuvgo daee r nlipyfszvxlwqw yoq dewpgtcrzausqwhh qzsaobsghgm ichlpsjlsrwzhbyfhm ksenb bqprarpgnyemzwifqzz oai pnqottd nygesjtlpala qmxixtooxtbrzyorn gyvukjpc s mxhlkdaycskj uvwmerplaibeknltuvd ocnn frotscysdyclrc ckcttaceuuxzcghw pxbd oklwhcppuziixpvihihp";             const string output = "i miszkays w gvcfldkiavww v dvypwyb bxahfzcfanteibiltins ueebf lqhflvwxksi dc k w ytzzlm gximjuhzfdjuamhsu gdkbmhpnvy i mengfdydekwttkhbzenk w h ldipovluo a nusquzpmnogtjkklfhta k nxzgnrveghc mpppfhzjkbucv c uwmahhqradjtf i z q yzfragcextvx i i j gzixfeugj rnukjgtjpim h a x h ygelddphxnbh rvjxtlqfnlmwdoezh z i bbfj mhs nenrqfkbf spfpazr w c dtd c dtaxhddidfwqs bgnnoxgyynol h dijhrrpnwjlju muzzrrsypzgwvblf z h q i daee r nlipyfszvxlwqw yoq dewpgtcrzausqwhh q i k bqprarpgnyemzwifqzz oai pnqottd nygesjtlpala q gyvukjpc s mxhlkdaycskj uvwmerplaibeknltuvd ocnn f c pxbd oklwhcppuziixpvihihp";             Assert.AreEqual(output, ReplaceWords(dict, sentence));         }         [TestMethod]         public void LeetCodeTest()         {             List<string> dict = new List<string> {"a", "aa", "aaa", "aaaa"             };             const string sentence = "a aa a aaaa aaa aaa aaa aaaaaa bbb baba ababa";             const string output = "a a a a a a a a bbb baba a";             Assert.AreEqual(output, ReplaceWords(dict, sentence));         }          public string ReplaceWords(IList<string> dict, string sentence)         {             TrieNode root = new TrieNode();             foreach (var word in dict)             {                 var current = root;                 foreach (var letter in word)                 {                     if (!current.Edges.TryGetValue(letter, out var output))                     {                         output = current.Edges[letter] = new TrieNode();                     }                     current = output;                 }                  current.IsTerminal = true;             }              string[] words = sentence.Split(' ');             StringBuilder str = new StringBuilder();              foreach (var word in words)             {                 var current = root;                 StringBuilder tempWord = new StringBuilder();                 for (var index = 0; index < word.Length; index++)                 {                     var letter = word[index];                     //if there is no word starting with those letters                     if (!current.Edges.TryGetValue(letter, out var output))                     {                         if (current.IsTerminal)                         {                             str.Append(tempWord + " ");                             break;                         }                          str.Append(word + " ");                         break;                     }                      tempWord.Append(letter);                     //output is terminal for the case we have "a" as a word                     if (current.IsTerminal || output.IsTerminal )                     {                         str.Append(tempWord + " ");                         break;                     }                     if (index == word.Length-1)                     {                         str.Append(word + " ");                     }                      current = output;                 }             }              str.Length--;             return str.ToString();         }     } }