Compressed Trie Run-time(s)

Consider a set $X = \{x_1, x_2, \ldots, x_n\}$ where every $x_i$ is a positive integer and $x_i \leq n^f$ for some constant $f$. We represent each $x_i$ as a string over the alphabet $\{0, 1, \ldots, 2^t - 1\}$ (in other words, we write each $x_i$ in base $2^t$) and store all $x_i$ from $X$ in a compressed trie $T$. For every node $u$ of $T$ we keep an array $A(u)$ of size $2^t + 1$. $A(u)[i]$ contains a pointer to the child $u_i$ of $u$ such that the edge from $u$ to $u_i$ is marked with $i$; $A(u)[i] = NULL$ if there is no such $u_i$ in $T$. Specify all of the following in big-$O$ in terms of $f$, $n$, and $t$ (as necessary). What is the maximum height of $T$? What is the total space usage of $T$ and all $A(u)$? What time is needed to search for a key $k$ in $T$?
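To make the setup concrete, here is a minimal sketch of the base-$2^t$ representation and the per-node child arrays. It is a plain trie without path compression, which is a simplification of the structure in the question, and the names `TrieNode`, `to_digits`, `insert`, and `search` are hypothetical. It assumes every key is padded to the same number of digits.

```python
class TrieNode:
    def __init__(self, t):
        # A(u): one slot per digit 0 .. 2^t - 1. The question asks for size
        # 2^t + 1; the extra slot is allocated but left unused in this sketch.
        self.children = [None] * (2 ** t + 1)
        self.is_key = False


def to_digits(x, t, length):
    """Return x in base 2^t as exactly `length` digits, most significant first."""
    mask = (1 << t) - 1
    digits = [(x >> (t * i)) & mask for i in range(length)]
    return digits[::-1]


def insert(root, x, t, length):
    node = root
    for d in to_digits(x, t, length):
        if node.children[d] is None:
            node.children[d] = TrieNode(t)
        node = node.children[d]
    node.is_key = True


def search(root, k, t, length):
    # One array lookup per digit, so the work is proportional to `length`.
    node = root
    for d in to_digits(k, t, length):
        node = node.children[d]
        if node is None:
            return False
    return node.is_key
```

Since $x_i \leq n^f$, each key has at most about $f \log_2 n$ bits, so `length` would be roughly $f \log_2 n / t$ digits in this representation.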

Yandex not crawling compressed sitemap index

I have submitted a sitemap index file (one that links to other sitemaps that contain the actual URLs search engines are instructed to crawl). It is GZip compressed.

Using the Yandex sitemap validation tool it tells me it is valid and has 202 links and no errors.

However, in Yandex Webmaster it shows up with a small, grey sign in the status column. When clicked it says ‘Not indexed’.

Yandex is not indexing the URLs provided in the file, which are all new. Though it states it has consulted the sitemap.

Any ideas what may be wrong?
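As a local cross-check alongside the Yandex validator, the compressed index can be decompressed and parsed to confirm how many child sitemaps it lists. This is only a sketch: the function name `list_child_sitemaps` is hypothetical, and it assumes the file follows the standard sitemaps.org index format with the usual namespace.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol, in ElementTree's {uri}tag form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def list_child_sitemaps(gz_bytes):
    """Decompress a gzipped sitemap index and return the <loc> of each child sitemap."""
    xml_bytes = gzip.decompress(gz_bytes)
    root = ET.fromstring(xml_bytes)
    if root.tag != SITEMAP_NS + "sitemapindex":
        raise ValueError("root element is not <sitemapindex>: %r" % root.tag)
    locs = root.findall(SITEMAP_NS + "sitemap/" + SITEMAP_NS + "loc")
    return [loc.text.strip() for loc in locs if loc.text]
```

Calling it on the bytes of the submitted file and comparing `len(...)` of the result with the 202 links reported by the validator would at least rule out a problem with the compression or the XML itself.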

Is there an algorithm to compress a string array represented as pointers to a long string to pointers with a compressed version of the long string?

In a program I am writing, I represent an array of strings as a long string and have pointers point at the various substrings to present my array. E.g.

    str_array = struct string_array
        long_str = "abcdefab"
        pointer_array = [(start = 0, len = 3), (start = 3, len = 3), (start = 6, len = 2)]
    end

So str_array = ["abc", "def", "ab"], but notice that I can actually compress the long string by getting rid of the "ab" at the end. E.g.

    str_array2 = struct string_array
        long_str = "abcdef"
        pointer_array = [(start = 0, len = 3), (start = 3, len = 3), (start = 0, len = 2)]
    end

and note that str_array2 is also ["abc", "def", "ab"] === str_array.

What’s this type of compression called in computer science? I assume there’s already literature on this type of algorithm?
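The idea in the example can be sketched with a simple greedy pass: before appending a string to the long string, check whether it already occurs there and, if so, point at the existing occurrence. This is only an illustration, not an optimal algorithm. The function names are hypothetical, and the result depends on the order of the input strings.

```python
def compress_string_array(strings):
    """Greedily build (long_str, pointers), reusing existing substrings where possible."""
    long_str = ""
    pointers = []
    for s in strings:
        start = long_str.find(s)  # -1 if s is not yet a substring of long_str
        if start == -1:
            start = len(long_str)
            long_str += s
        pointers.append((start, len(s)))
    return long_str, pointers


def decode_string_array(long_str, pointers):
    """Rebuild the original list of strings from (start, len) pointers."""
    return [long_str[start:start + length] for start, length in pointers]


long_str, pointers = compress_string_array(["abc", "def", "ab"])
# long_str == "abcdef", pointers == [(0, 3), (3, 3), (0, 2)]
```

This greedy version only reuses strings that are fully contained in what has been written so far. Exploiting partial overlaps between neighbouring strings as well leads toward the shortest common superstring problem, which is NP-hard in general.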

Offering a compressed file for download that’s around 1.2 GB, should we split it into smaller parts?

The file contains thousands of PDFs and will be posted on a government website. The primary audience is researchers and the media, on desktop rather than mobile. We plan to indicate the file size. Given that 1.2 GB is pretty big, are there any reasons we should split the file into smaller parts? Asking because I’m just not sure.

Can data be compressed through this hash function technique?

I’d like to know if this data compression scheme would work or not, and why:

Suppose we have a file. If we treat the bits that make up the file as the binary representation of a number n, we have n (of course, if the first bit is zero we flip every bit so that n is unique). Now we have the number n, and a boolean that informs us whether to flip all the bits of the binary representation of n or not.

My idea was to approximate n from below (e.g. by finding a relatively large number raised to a relatively large power, such as 17^6038) and then compute hashes for all numbers from this approximation up to the real n, counting the number of collisions. When we finally get to n, we have the “collision state” of the hashes, and we output the compressed file, which basically contains information about how to get to the approximation of n (e.g. 17^6038) and the “collision state” for n (note that this “collision state” must also occupy very few bits, so I’m not sure this would be possible).

The decompression procedure would follow a very similar process: it would approximate n (e.g. compute ~n as 17^6038) and then hash (i.e. apply a function and check the result) every single number (we could also check every 5th number, or use another divisor of n – ~n) until the “collision state” is the same as the one specified in the compressed file. Once everything matches, we have n. Then it would just be a matter of flipping every bit or not (as specified in the compressed file) and writing the result to a file.

Could this work? The only problem I can think of (besides the time required for processing) is the number of collisions being extremely large.
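One way to reason about whether the “collision state” can stay small is a plain counting argument, independent of the hash function chosen. The sketch below (function names are hypothetical) counts how many distinct files of exactly N bits exist and how many distinct outputs strictly shorter than N bits exist.

```python
def count_inputs(n_bits):
    """Number of distinct files that are exactly n_bits long."""
    return 2 ** n_bits


def count_shorter_outputs(n_bits):
    """Number of distinct bit strings with length 0 .. n_bits - 1."""
    return sum(2 ** k for k in range(n_bits))


for n_bits in (1, 8, 16):
    # There is always exactly one fewer short output than there are inputs.
    assert count_shorter_outputs(n_bits) == count_inputs(n_bits) - 1
```

Because there are fewer short outputs than inputs, no lossless scheme can map every N-bit file to something shorter; at least two files would have to share a compressed form. For the scheme above, this suggests that for most values of n the approximation plus the “collision state” would need about as many bits as n itself.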

What is the difference between Memory, Real Mem, and Compressed Mem?

I’ve already seen this question:

  • What's the difference between Real, Virtual, Shared, and Private Memory?

but I think it might be outdated. Specifically, there is a Memory column, as well as Real Mem and Compressed Mem columns. What is the difference, and why would Real Mem ever be smaller than Compressed Mem?

(I’m using macOS Sierra 10.12, but I think I’ve seen this in slightly older versions as well.)

Activity monitor screenshot

Skimage Cannot find _tiffile module – Loading of some compressed images will be very slow

I am trying to run faster-RCNN on an Nvidia Xavier. I have followed this guide here and the process went fine. However, whenever attempting to run the demo I get this error:

/usr/local/lib/python2.7/dist-packages/skimage/external/tifffile/  UserWarning: ImportError: No module named '_tifffile'.  Loading of some compressed images will be very slow. Tifffile.c can be obtained at "ImportError: No module named '_tifffile'. " 

I have run pip install -U scikit-image and pip install -U tifffile to make sure they’re up to date. In that path there is a but it is not being imported. In the directory above it I attempted to run python install but it fails saying it’s missing tifffile.c. When I follow the link from the warning I cannot find Tifffile.c.

In, this snippet is causing the issue:

    try:
        if __package__:
            from . import _tifffile
        else:
            import _tifffile
    except ImportError:
        warnings.warn(
            "ImportError: No module named '_tifffile'. "
            "Loading of some compressed images will be very slow. "
            "Tifffile.c can be obtained at")
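To check outside of the demo whether the compiled extension can be imported at all, a small probe like the following may help. It is only a sketch: the helper name `can_import` is hypothetical, and the module names to try (for example `_tifffile`, or the dotted path under `skimage.external.tifffile`) are assumptions based on the warning text.

```python
def can_import(module_name):
    """Return True if `module_name` imports cleanly, False on ImportError."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


# Example probes; the second name is an assumption about the package layout.
for name in ("_tifffile", "skimage.external.tifffile._tifffile"):
    print(name, can_import(name))
```

If both probes report `False`, the C extension was most likely never built for this interpreter, which matches the warning: the pure-Python fallback is used and only the speed of reading some compressed images is affected.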

Any help is appreciated!