Space-efficient storage of a trie as an array of integers

I’m trying to efficiently store a list of strings in an array with the following constraints:

  • All strings consist of 8-bit characters (0..255).
  • The final trie is static, i.e. once it is built, no strings have to be inserted or removed.
  • Looking up a string of length $ m$ must be done in $ O(m)$ with a constant factor as low as possible.
  • The only available memory structure to store the data is an array of integers. In particular, there are no pointers or dynamic memory allocation.
  • Once an array is allocated, it cannot be resized and its memory cannot be released anymore.
  • Memory is scarce, so the final data structure should be as compact as possible and no unnecessarily long arrays should be allocated.
  • Computation time is not important for the building phase, but the memory constraints above still apply to it.

Preface

My current approach is a trie that is stored in the array with the following structure:

$$ \fbox{$ \vphantom{^M_M} \;i_0 \;\ldots\;i_{255}\;$}\, \fbox{$ \vphantom{^M_M} \;i^*_0 \;\ldots\;i^*_{255}\;$}\, \fbox{$ \vphantom{^M_M} \;w\;$}\, \fbox{$ \vphantom{^M_M} \;\mathit{last}\;$}\, \fbox{$ \vphantom{^M_M} \;B_0\;$}\,\fbox{$ \vphantom{^M_M} \;B_1\;$}\,\ldots $$

where $ i$ maps each unique input character $ k$ to an integer $ 1 \leq i_k \leq w$ , and $ i^*$ is the corresponding reverse mapping. Each node in the trie is stored as a block $ B$ of size $ w+1$ . The mapping $ i$ reduces the size of each block, because a block needs one field per character actually used rather than one per possible character value. This comes at the expense of one more indirection when looking up words. (The field $ \mathit{last}$ is a pointer to the field after the last block in the array and is used to find the next allocation point.)

Each block looks like this:

$$ \fbox{$ \vphantom{^M_M} \;b\;$}\, \fbox{$ \vphantom{^M_M} \;c_1 \;\ldots\;c_w\;$} $$

$ b$ is 1 if the word represented by that block is in the trie, and 0 otherwise. The fields $ c_1, \ldots, c_w$ correspond to the unique input characters (after the $ i$ mapping). If the value of $ c_i$ is 0, there is no entry for this character. Otherwise $ c_i$ is the index into the array at which the block for the following letter starts.

To build the trie, the first step is to calculate the bijection $ i$ /$ i^*$ and $ w$ . Then a new block is added for each prefix that isn’t already present in the trie.
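The layout and build steps described above can be sketched roughly as follows. This is only an illustration under assumptions: the fixed offsets (`kMap`, `kRev`, `kW`, `kLast`, `kRoot`), the function names `buildTrie` and `contains`, and the use of `std::vector<int>` as the "array of integers" are all hypothetical, and the reverse mapping is stored 0-based so that it fits into 256 fields.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fixed offsets for the layout: i | i* | w | last | B_0 B_1 ...
constexpr int kMap = 0, kRev = 256, kW = 512, kLast = 513, kRoot = 514;

// Builds the trie inside a fixed-size array; throws if the array is too small.
std::vector<int> buildTrie(const std::vector<std::string>& words, std::size_t capacity) {
    if (capacity < static_cast<std::size_t>(kRoot))
        throw std::length_error("array too small for the header");
    std::vector<int> a(capacity, 0);
    int w = 0;
    // Step 1: compute the mapping i (character -> 1..w) and its reverse i*.
    for (const std::string& s : words)
        for (unsigned char ch : s)
            if (a[kMap + ch] == 0) {
                a[kMap + ch] = ++w;
                a[kRev + w - 1] = ch;  // reverse mapping, stored 0-based
            }
    const int block = w + 1;  // one flag b plus w child fields
    if (static_cast<std::size_t>(kRoot + block) > capacity)
        throw std::length_error("array too small for the root block");
    a[kW] = w;
    a[kLast] = kRoot + block;  // the root block B_0 is allocated up front
    // Step 2: add a block for every prefix that is not present yet.
    for (const std::string& s : words) {
        int node = kRoot;
        for (unsigned char ch : s) {
            const int slot = node + a[kMap + ch];
            if (a[slot] == 0) {
                if (static_cast<std::size_t>(a[kLast] + block) > capacity)
                    throw std::length_error("array too small for all blocks");
                a[slot] = a[kLast];
                a[kLast] += block;
            }
            node = a[slot];
        }
        a[node] = 1;  // b = 1: a word ends at this block
    }
    return a;
}

// Looks up s with one mapping access and one block access per character.
bool contains(const std::vector<int>& a, const std::string& s) {
    int node = kRoot;
    for (unsigned char ch : s) {
        const int k = a[kMap + ch];
        if (k == 0) return false;  // character never occurs in any word
        node = a[node + k];
        if (node == 0) return false;  // no child for this character
    }
    return a[node] == 1;
}
```

In this sketch, the value stored at `kLast` after building is the number of array fields actually used, which is only known once all words have been inserted.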

Problem

While this approach works so far, my main problem is memory usage. The current approach is extremely memory-expensive when only a few words share longer prefixes (which is usually the case). Some tests show that typically only about 2-3% of the fields in the whole array are non-empty. Another problem is that the final number of array fields needed is only known after the trie has been built, i.e. I have to allocate conservatively so that I don’t run out of memory while adding new blocks.

My idea now was to use a compressed trie/radix trie instead, with two types of blocks: 1) the ones above, which represent nodes with several children, and 2) compressed blocks (similar to C char arrays) that represent suffixes in the trie. For example, when the words apple juice and apple tree are stored in the trie, there would be seven normal blocks for the common prefix apple and one compressed block each for juice and tree. (Perhaps that would also allow merging common suffixes of words with different prefixes.)

The problem with this is that it may lead to gaps in the array while building the trie. Consider the situation in the above example in which only apple juice is stored as a compressed block in the trie. Now apple tree is inserted, which leads to the removal of the apple juice block and the addition of juice and tree blocks instead, which in general will not fit into the hole left behind.

Under the given constraints, can anyone see an algorithm to store strings most efficiently in the array while keeping the linear lookup time?

Array to string conversion on array_map

I’m using array_map to sort out an array with all of my _octopus_id’s.

var_dump($offices) returns the following:

    array (size=500)
      0 =>
        array (size=1)
          'id' => string '1382' (length=4)
      1 =>
        array (size=1)
          'id' => string '1330' (length=4)

I need to insert the IDs from that array into the ’employees/’ path, but I can’t figure out how. If I hardcode employees/6, I get the following result:

    object(stdClass)[14592]
      public 'id' => int 6

What could I be doing wrong? I keep getting the Notice: Array to string conversion error on the $results = line.

    /* Return an array of _octopus_ids */
    $offices = array_map(
        function($post) {
            return array(
                'id' => get_post_meta($post->ID, '_octopus_id', true),
            );
        },
        $query->posts
    );

    var_dump($offices);

    $results = $octopus->get_all('employee/' . implode($offices));
    var_dump($results);

What constitutes a minimal-sum section of an integer array

I’m having trouble understanding what constitutes a “minimal sum section” of an integer array. My book defines it as the following:

Let $ a[0],\dots, a[n-1]$ be the integer values of an array $ a$ .
A section of $ a$ is a continuous piece $ a[i],\dots,a[j]$ , where $ 0\le i \le j < n$ . We write $ S_{i,j}$ for the sum of that section: $ a[i] + a[i+1]+\dots+a[j]$ .
A minimal-sum section is a section $ a[i],\dots,a[j]$ of $ a$ such that the sum $ S_{i,j}$ is less than or equal to the sum $ S_{i',j'}$ of any other section $ a[i'],\dots,a[j']$ of $ a$ .

My confusion comes with one of the examples that follow this definition:

The array [1,-1,3,-1,1] has two minimal-sum sections [1,-1] and [-1,1] with minimal sum 0.

But, wouldn’t the minimal sum section be $ [-1]$ ?

In a later example they give:

Array $ [-1,3,-2]$ : minimal-sum section $ [-2]$

So, in the last example they definitely counted one element as the minimal sum section, but not in the first one. Any clarification on why this is so would be greatly appreciated.
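To check an example against the quoted definition mechanically, the smallest section sum can be computed with a running minimum over sections ending at each index. The sketch below is only an illustration; the function name `minSectionSum` is made up here. Because the definition allows $ i = j$ , single-element sections are included, so for [1,-1,3,-1,1] the section [-1] with sum -1 is a candidate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest sum S_{i,j} over all sections a[i..j] with 0 <= i <= j < n.
// Assumes a is non-empty, since the definition needs at least one element.
long long minSectionSum(const std::vector<long long>& a) {
    assert(!a.empty());
    long long endingHere = a[0];  // smallest sum of a section ending at index j
    long long best = a[0];        // smallest sum of any section seen so far
    for (std::size_t j = 1; j < a.size(); ++j) {
        // Either start a new section at j or extend the best one ending at j-1.
        endingHere = std::min(a[j], endingHere + a[j]);
        best = std::min(best, endingHere);
    }
    return best;
}
```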

Minimum increment/decrement to change an array into non-decreasing sequence

I was trying to solve a Codeforces problem: https://codeforces.com/contest/713/problem/C.

The solution I thought of was naive (I am a beginner in competitive programming), so I read the editorial. I understood the dynamic programming solution, but then I found a blog post, https://codeforces.com/blog/entry/47821, that describes an O(n log n) solution to the problem. I tried hard to understand the details, but I do not understand why they consider slopes at all, given that they already have a recurrence relation at the start. After looking at the implementation, I think what they basically do is take the largest element found so far in the sequence and keep it as the key point. Any other element that is less than that number is changed to that number, rather than decreasing the larger number, because this reduces the chance of further modifications to the array. Is there something I am missing that was the point of the whole mathematical slope analysis? I would appreciate it if anybody could shed some light, in simple terms, on what the blog is trying to say, or explain why this approach is correct. Here is the implementation:

    #include <stdio.h>
    #include <queue>

    int main()
    {
        int n, t;
        long long ans = 0;
        std::priority_queue<int> Q;
        scanf("%d%d", &n, &t);
        Q.push(t);
        for (int i = 1; i < n; i++)
        {
            scanf("%d", &t);
            t -= i;
            Q.push(t);
            if (Q.top() > t)
            {
                ans += Q.top() - t;
                Q.pop();
                Q.push(t);
            }
        }
        printf("%lld", ans);
        return 0;
    }
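For comparison, a quadratic dynamic programming version can serve as a reference to test the heap implementation against on small inputs. The sketch below is my own reading of that approach, not code from the editorial or the blog: it subtracts the index from each element (as the `t-=i` step above does) so that "strictly increasing" becomes "non-decreasing", and it relies on the usual observation that some optimal answer only uses values that already occur in the shifted input. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

// Minimum total |change| needed to make a strictly increasing, O(n^2) time.
long long minCostStrictlyIncreasing(std::vector<long long> a) {
    if (a.empty()) return 0;
    // Shift by index: strictly increasing a  <=>  non-decreasing (a[i] - i).
    for (std::size_t i = 0; i < a.size(); ++i) a[i] -= static_cast<long long>(i);
    // Candidate target values: the distinct shifted inputs, in sorted order.
    std::vector<long long> vals(a);
    std::sort(vals.begin(), vals.end());
    vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
    // dp[j]: cheapest cost for the prefix so far with its last element set to vals[j].
    std::vector<long long> dp(vals.size(), 0);
    for (long long x : a) {
        long long best = std::numeric_limits<long long>::max();
        for (std::size_t j = 0; j < vals.size(); ++j) {
            best = std::min(best, dp[j]);  // previous element ended at a value <= vals[j]
            dp[j] = best + std::llabs(x - vals[j]);
        }
    }
    assert(!dp.empty());
    return *std::min_element(dp.begin(), dp.end());
}
```

For the input 5 4 3 2 1 this returns 12, which matches what the heap version computes for the same input.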

Find two numbers in sorted array whose product is close to N

Given a sorted array of numbers, is there a good algorithm for finding two numbers in that array whose product is as close as possible to a given number N?

I know that there is a good O(n) (?) algorithm for the related problem of finding two numbers that sum to a given number N. You begin with ptr1 at the start of the array and ptr2 at the end, and if *ptr1 + *ptr2 > N you decrement ptr2, else increment ptr1.

But for multiplication, it feels possible this method could “miss” the optimal solution by overshooting it somehow? Is there a good way to solve the multiplication variant in something that isn’t just O(n^2)?
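As a point of comparison, here is a sketch of the same two-pointer walk applied to products. It rests on an assumption that is not stated in the question: all values are non-negative, so the product grows when either factor grows, which is the property the sum version relies on. With negative values that property fails and this sketch should not be trusted. The function name is made up, and the products are assumed to fit into `long long`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Pair from a sorted array of non-negative values whose product is closest to target.
std::pair<long long, long long> closestProductPair(const std::vector<long long>& a,
                                                   long long target) {
    assert(a.size() >= 2);
    std::size_t lo = 0, hi = a.size() - 1;
    std::pair<long long, long long> best{a[lo], a[hi]};
    long long bestDiff = std::llabs(a[lo] * a[hi] - target);
    while (lo < hi) {
        const long long p = a[lo] * a[hi];
        const long long diff = std::llabs(p - target);
        if (diff < bestDiff) {
            bestDiff = diff;
            best = {a[lo], a[hi]};
        }
        if (p > target) {
            --hi;  // every remaining partner of a[hi] gives a product >= p
        } else if (p < target) {
            ++lo;  // every remaining partner of a[lo] gives a product <= p
        } else {
            break;  // exact match
        }
    }
    return best;
}
```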

Given an array of non-negative integers, find the number of subarrays of each size (1, 2, ..., n) with sum less than k

My approach: arr is the input array, ans is the final array, and ans[i] denotes the number of subarrays of size i+1.

    int s = 0, e = 0, count = 0, sum = arr[0];
    int ans[n];
    memset(ans, 0, sizeof(ans));
    while (s < n && e < n)
    {
        if (sum <= k)
        {
            e++;
            if (e >= s)
            {
                ans[e - s - 1] += 1;
            }
            if (e < n)
                sum += arr[e];
        }
        else
        {
            sum -= arr[s];
            s++;
        }
    }

    for (int i = n - 2; i >= 0; i--)
        ans[i] += ans[i + 1];

    for (int i = 0; i < n; i++)
        cout << ans[i] << " ";

I don’t understand which cases I may be missing here.
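One way to look for missed cases is to compare the output against a separate implementation on small inputs. The sketch below is such a reference under my own assumptions, not a corrected version of the code above: for every start index it finds the longest subarray whose sum is strictly less than k (the title's condition, whereas the code above tests `sum<=k`), credits every length up to that one through a difference array, and relies on the elements being non-negative so that the end pointer never moves backwards. The name `countBySize` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ans[i] = number of subarrays of size i+1 whose sum is strictly less than k.
std::vector<long long> countBySize(const std::vector<long long>& arr, long long k) {
    const std::size_t n = arr.size();
    std::vector<long long> diff(n + 1, 0);  // difference array over sizes
    long long sum = 0;                      // sum of the window [s, e)
    std::size_t e = 0;
    for (std::size_t s = 0; s < n; ++s) {
        assert(arr[s] >= 0);  // precondition taken from the question
        if (e < s) {          // window was empty: restart it at s
            e = s;
            sum = 0;
        }
        while (e < n && sum + arr[e] < k) {
            sum += arr[e];
            ++e;
        }
        const std::size_t len = e - s;  // sizes 1..len starting at s all qualify
        if (len > 0) {
            diff[0] += 1;
            diff[len] -= 1;
            sum -= arr[s];  // drop arr[s] before moving the start
        }
    }
    std::vector<long long> ans(n, 0);
    long long running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        running += diff[i];
        ans[i] = running;
    }
    return ans;
}
```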

How to get an array of custom blocks by block name

Scenario: I have created a custom block that outputs a list of posts, with control over number of posts, and the taxonomies where the posts can be selected from.

The custom block is nested inside a custom “row” block, and further inside a core “column” block. The registered id of the custom block is e.g. ‘xx/dyno-list’

Need: I need to extract a list of the posts that have been assigned to the custom block via grabbing the data attached to each block, e.g. an array of posts.

Perhaps using something like: wp.data.select('core/blocks').getBlockTypes('tr/dynamic-list') which does not work…

Each custom block has a unique “name” attribute, e.g. “block_one”, so I need to be able to grab the list of custom blocks, i.e. ‘xx/dyno-list’ and then grab the lists of posts within that specific block.

I need this so as to be able to “de-dupe” the posts list between custom post list blocks.

Question: How can I grab a list of the custom blocks by their registered name and then by the attribute name?

Remove x last elements of an array and reinsert them before position y

I am looking for an algorithm that moves a subarray to the position before a given element (which is not part of that subarray) in the same array, where the last element of the subarray is the last element of the array, using O(1) extra space and O(n) runtime.

e.g. where *p1 = 5 and *p2 = 3:

1 2 5 6 7 3 4

becomes

1 2 3 4 5 6 7

This is what I have so far (written in C). Trouble arises when p1 reaches p2.

    void swap(long* p1, long* p2, long* array_end) {
      long* p2_i = p2;
      while (p1 < array_end) {
        if (p2_i > array_end) {
          p2_i = p2;
        }

        // swap *p1 and *p2_i
        long temp = *p1;
        *p1 = *p2_i;
        *p2_i = temp;

        ++p1;
        ++p2_i;
      }
    }
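A different technique that meets the O(1) space and O(n) time requirements is rotation by three reversals, sketched below in C++. This swaps in another method rather than repairing the loop above. The function name is made up, and the sketch assumes that `end` points one past the last element, which may differ from how `array_end` is meant above.

```cpp
#include <algorithm>
#include <cassert>

// Moves the tail [p2, end) in front of p1; both parts keep their internal order.
// Assumption: end points one past the last element (half-open range).
void moveTailBefore(long* p1, long* p2, long* end) {
    assert(p1 <= p2 && p2 <= end);
    std::reverse(p1, p2);   // reverse the part that has to move back
    std::reverse(p2, end);  // reverse the tail that has to move forward
    std::reverse(p1, end);  // reversing the whole range restores both orders
}
```

With the example above, `p1` points at 5 and `p2` at 3, and the call turns 1 2 5 6 7 3 4 into 1 2 3 4 5 6 7. The standard library call `std::rotate(p1, p2, end)` has the same effect.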