## Can’t enter BIOS after drive compression

So I decided to compress everything on my main drive, which was a horrible idea.

Now I can’t enter my BIOS anymore. I tried:

• Pressing keys on startup
• Running `shutdown /s /fw` in cmd, but I get this error: `Boot to firmware UI is not supported by this system's firmware.(1)`
• Unchecking the “Compress this drive to save space” checkbox (which took about 24 hours, and most of my files still have the blue compressed icon on them, so maybe the decompression failed?)

Any idea what is wrong and how I can fix this so that I can enter my BIOS on startup? Before the compression I could get into the BIOS by pressing keys just fine; afterwards I can’t. The compression is definitely the problem (I think).

## An interesting compression algorithm idea

I stumbled on a paper about compressing images using a somewhat odd but interesting idea. It seems pretty easy to implement, and I was wondering if there is something wrong with it that I haven’t picked up on.

The main idea is that every possible number combination can be represented by an index in an imaginary list of all possible combinations.

So if we have 4 numbers that range from 0 to 255, e.g. [45, 129, 12, 32], we can treat the sequence as a base-256 number, in the same way that [0, 1, 0, 1] is the base-2 number 0101 with decimal value 5. If we find the index of our original sequence in the list of all combinations, we could use that index to represent it and thus achieve compression.

An example is provided where we have four pixels, each able to hold one of 3 possible color values; we end up representing 4 integers with just 1 (their index in the list of all possible combinations).
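The indexing step is just fixed-radix positional notation, so it is easy to prototype. A minimal sketch (the helper names are mine) that maps a sequence to its index in the imaginary list of all combinations and back:

```python
def combination_index(values, base):
    # The index of `values` in the lexicographically ordered list of all
    # combinations is simply its value read as a base-`base` integer.
    index = 0
    for v in values:
        index = index * base + v
    return index

def index_to_combination(index, base, length):
    # Inverse: recover the original sequence from its index.
    values = []
    for _ in range(length):
        index, v = divmod(index, base)
        values.append(v)
    return values[::-1]

# Four pixels, each one of 3 possible colour values:
pixels = [2, 0, 1, 2]
idx = combination_index(pixels, 3)           # 2*27 + 0*9 + 1*3 + 2 = 59
print(idx, index_to_combination(idx, 3, 4))  # 59 [2, 0, 1, 2]
```

One thing worth checking when testing: the index can take any of `base**length` values, so in the worst case writing it down needs ⌈length·log2(base)⌉ bits, the same as the original sequence.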

This is an old paper, though, with no citations that I could find, but the idea seems appealing enough that I would like to test it. Is there anything wrong with the idea in general? Anything I missed that makes it not actually implementable as a real compression algorithm?

Note: One would obviously have to split the image into windows and work on those segments individually, using variable-precision arithmetic, since the index for a combination that falls far down the list would be huge.

## Can I increase the strength of ECB by decorrelating input data with compression?

I know that ECB is the weakest block cipher mode of encryption, because repetitions in the input data aligned to the block size lead to repetitions in the output data.

However, it seems to me that eliminating this repetitiveness should give a stronger result, as in other modes (CBC, etc.). In addition, I know that compression algorithms are designed specifically to find and remove certain types of correlation.

Let’s say the input data are compressed using the DEFLATE algorithm via zlib 1.2.11 at the highest compression level available (9). How much does such an approach increase the strength of ECB?

Update: My goal is to increase encryption speed through parallelization. I receive UTF-8 texts already deflated as described above, encrypt them with a user-supplied key, and store them in a cache in a user folder.
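A quick way to see the effect in question, without any crypto library, is to count repeated block-sized chunks before and after deflation; under ECB, every repeated block-aligned plaintext chunk yields an identical ciphertext block. A rough sketch (the helper function is mine):

```python
import zlib
from collections import Counter

def repeated_blocks(data, block_size=16):
    # Count block-aligned chunks that occur more than once; under ECB each
    # such repeat would show up verbatim in the ciphertext.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    counts = Counter(blocks)
    return sum(c for c in counts.values() if c > 1)

plaintext = b"attack at dawn. " * 256      # exactly 16 bytes, repeated
compressed = zlib.compress(plaintext, 9)   # DEFLATE, highest level

print(repeated_blocks(plaintext))   # 256 identical blocks
print(repeated_blocks(compressed))  # far fewer, typically none
```

This only measures repetition, of course; it says nothing about the other structural properties an attacker might exploit in ECB.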

## How does predictive coding aid in lossless compression?

I’m working on a lab where we need to apply lossless predictive coding to an image before compressing it (with Huffman coding, or some other lossless compression algorithm).

From the example seen below, it’s pretty clear that by pre-processing the image with predictive coding, we’ve modified its histogram and concentrated all of its grey levels around 0. But why exactly does this aid compression?

Is there maybe a formula for the compression rate of Huffman coding, given the standard deviation and entropy of the original image? Otherwise, why would the compression ratio be any different? It’s not like the range of values has changed between the original image and the pre-processed one.
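The effect can be measured on a toy signal: differencing a smooth ramp concentrates the histogram around 0, and the zeroth-order entropy (a lower bound for a symbol-by-symbol Huffman code, which needs at most entropy + 1 bits per symbol) drops accordingly. A sketch, with made-up data:

```python
import math
from collections import Counter

def entropy(values):
    # Zeroth-order Shannon entropy (bits/symbol) of the empirical distribution.
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

row = [i // 4 for i in range(1000)]  # a smooth grey-level ramp, 0..249
# Simple previous-pixel predictor: keep the first value, then store residuals.
residuals = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

print(round(entropy(row), 2))        # ~7.97 bits/symbol (250 near-equal levels)
print(round(entropy(residuals), 2))  # ~0.81 bits/symbol (mostly 0s and 1s)
```

The range of possible values hasn’t shrunk, but the distribution over that range has become very skewed, which is exactly what a variable-length code exploits.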

Liam.

## Is it possible to achieve greater than perfect compression using machine learning and big data?

Imagine Google wanted to make their Chrome browser faster. Let “database” be all the machines that serve content from Google’s servers, including Search and Google Cloud services. Google begins using machine learning to analyze thousands of petabytes of data in this database to extract a dictionary of the most redundant file sequences. Let’s say the most redundant sequence found was the hexadecimal:

“64 65 63 69 6d 61 6c 20 66 6f 72 6d 61 74 2e 20 43 6f 6d 70 75 74 65 72 73 20 73 74 6f 72 65 20 74 65 78 74 20 61 73 20 6e 75 6d 62 65 72 73 2c 20 61 6e 64 20 77 69 74 68 20 68 65 78 20 79 6f 75 20 64 69 73 70 6c 61 79 20 74 68 65 20 6e 75 6d 62 65 72 73 20 6e 6f 74 20 61 73 20 61 20 64 65 63 69 6d 61 6c 20 6e 75 6d 62 65 72 2c 20 62 75 74 20 69 6e 20 62 61 73 65 20 31 36 2e 20 48 65 78 20 6f 72 20 62 61 73 65 20 31 36 20 6f 72 20 68 65 78 61 64 65 63 69 6d 61 6c 20 69 73 20 61 20 6e 75 6d 65 72 61 6c 20 73 79 73 74 65 6d”

They store this sequence in their dictionary at index 1 and include the dictionary file in a Chrome update. For every user on the latest Chrome update, if they request a file containing this sequence, Google’s server returns a flag and the index 1 (e.g. “D 1”), so that the Chrome browser can look up the bytes locally without transferring them over the network. In essence, the file being transferred becomes something like “data data data D 1 data data”.

Furthermore, they do this for gigabytes of sequences, resulting in a larger Chrome update but faster file transfer times.

And to make things more interesting, they use machine learning to analyze end-user behavior and predict which sequences are most likely to be requested.

After adding this functionality they achieve faster load times, decreased network congestion, and possibly lower electricity costs, by removing redundancy across their entire network.
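Mechanically, the substitution scheme described above is easy to sketch. Everything here is hypothetical (the dictionary contents, the one-byte flag, the two-byte index), and a real protocol would also need to escape literal flag bytes that occur in the payload:

```python
# Hypothetical shared dictionary, pre-distributed with the browser update.
SHARED = {1: b"hexadecimal is a numeral system"}
FLAG = b"\x00"  # marks "next two bytes are a dictionary index"

def server_encode(data):
    # Replace known sequences with FLAG + 2-byte index before sending.
    for idx, seq in SHARED.items():
        data = data.replace(seq, FLAG + idx.to_bytes(2, "big"))
    return data

def client_decode(data):
    # Expand FLAG + index back to the locally stored bytes.
    for idx, seq in SHARED.items():
        data = data.replace(FLAG + idx.to_bytes(2, "big"), seq)
    return data

msg = b"as you know, hexadecimal is a numeral system used everywhere"
wire = server_encode(msg)
assert client_decode(wire) == msg and len(wire) < len(msg)
```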

Questions:

What would this be called? Distributed cached file transfer? Would it be practical or desirable? I imagine it could save money in terms of electricity for a large company like Google.

## Apache: handle compression for large static XML files

My site (hosted by Apache 2.4 on Ubuntu 14.04) must provide some large XML files (more than 200 MB). I chose to compress them to speed up the download process (.tar.gz), but recently my users have started to need the flat version (no compression). Would it be safe to enable gzip compression for XML files and leave them uncompressed on disk? I mean, for small XML files Apache’s effort should be insignificant, but what about large ones?
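If the goal is to store the XML flat and let Apache gzip it on the wire, mod_deflate can be enabled per content type. A minimal sketch, assuming mod_deflate is loaded; note that with on-the-fly compression Apache re-compresses a 200 MB file on every request, so CPU cost per download is the thing to measure:

```apache
<IfModule mod_deflate.c>
    # Compress XML responses on the fly; files stay uncompressed on disk.
    AddOutputFilterByType DEFLATE application/xml text/xml
    # Lower levels trade compression ratio for less CPU per request.
    DeflateCompressionLevel 6
</IfModule>
```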

## Compact transfer of partially empty blockdevice images for further compression

I’d like to back up a number of Linux block devices on remote machines. Currently I transfer the complete block device, albeit almost empty, to the backup machine and compress or deduplicate it afterwards (multiple methods are in use for that).

Now I’m encountering images too large for the backup machine, and transferring gigabytes of zeros is not useful either.

Compressing the whole data stream isn’t useful either, as it would prevent deduplication and useful incremental indexing in the backup compression process.

So how can I compact the block device data, preferably using nothing beyond standard Linux/GNU utilities, without compressing the non-zero data?
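With only GNU coreutils, `dd conv=sparse` skips writing all-zero input blocks, so a mostly-empty device lands as a sparse file that still compresses and deduplicates normally afterwards. A sketch with hypothetical host and device names, plus a local demonstration of the effect using a file standing in for the device:

```shell
# Pull a remote block device, writing zero runs as holes on the backup side
# (hypothetical host/device names):
# ssh root@remote 'dd if=/dev/sdb bs=1M status=none' \
#   | dd of=backup.img bs=1M conv=sparse status=none

# Local demonstration: a 10 MiB all-zero file copied with conv=sparse
# keeps its apparent size but occupies (almost) no disk space.
truncate -s 10M demo.img
dd if=demo.img of=copy.img bs=1M conv=sparse status=none
du --apparent-size --block-size=1 copy.img   # apparent size: 10485760
du --block-size=1 copy.img                   # actual usage: near zero
```

This only helps for zero runs, not for deleted-but-nonzero data; the filesystem receiving the image must support sparse files.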

## RFC 1951 compression LZ77 re-hashing approach

I have written a C# class to perform LZ77 compression ( per RFC 1951 ), to incorporate in my RFC 1951 deflate implementation ( posted on Jan 2 ).

The usual method (per RFC 1951) is to use a 3-byte hash to get started, then iterate through a chain of earlier 3-byte matches. However, this can be slow if the chains become long. ZLib has heuristics (which I don’t fully understand!) to shorten the search; RFC 1951 suggests: “To avoid a worst-case situation, very long hash chains are arbitrarily truncated at a certain length, determined by a run-time parameter.”

That all seems a little messy, so I decided to see if a “re-hashing” approach would work: suppose the string ABC is already recorded and we encounter a new ABC. The new position of ABC will from now on be used in preference to the old position for coding ABC (as it’s “closer”), so we re-hash the old position as a 4-byte hash instead. If there is already a matching 4-byte entry, the lesser position becomes a 5-byte hash, and so on.

Example: given the input string ABCABCDABC, the hash dictionary updates as follows (each section delimited by ‘;’ corresponds to the input position advancing by one byte):

0(ABC); 1(BCA); 2(CAB); 0(ABCA), 3(ABC); 4(BCD); 5(CDA); 6(DAB); 3(ABCD), 7(ABC);

Here’s the code:

```csharp
class Matcher
{
  public Matcher( int n )
  {
    Hash = new Entry[ n * 5 ]; // A larger table reduces the number of hash collisions, but uses more memory.
  }

  private const int MinMatch = 3, MaxMatch = 258, MaxDistance = 32768; // RFC 1951 limits.

  public int Match( byte [] input, int position, uint hash, out int matchDistance, bool look )
  // Record the specified input string starting at position for future matching, and (if look=true)
  // look for the longest prior match. hash is a function of MinMatch bytes of input starting at
  // position ( can simply be the concatenated bytes ).
  {
    int matchPosition = InsertLook( input, position, hash, MinMatch );
    matchDistance = position - matchPosition;

    if ( ! look || matchPosition < 0 || matchDistance > MaxDistance ) return 0;

    int match = MinMatch;
    int bestMatch = ExtraMatch( input, position, matchPosition, match );

    // Look for a match longer than MinMatch.
    while ( bestMatch < MaxMatch && position + bestMatch < input.Length )
    {
      hash += hash * 4 + input[ position + match ];
      match += 1;
      matchPosition = Look( input, position, hash, match );
      if ( matchPosition < 0 || position - matchPosition > MaxDistance ) break;
      int newMatch = ExtraMatch( input, position, matchPosition, match );
      if ( newMatch > bestMatch )
      {
        matchDistance = position - matchPosition;
        bestMatch = newMatch;
      }
    }
    return bestMatch;
  }

  private Entry [] Hash; // The Hash table used to find a matching string.

  private class Entry // Alternatively make Hash an array of positions and also store Length and Next in arrays.
  {
    public int Position, Length; // Position and Length of input substring.
    public Entry Next; // To handle hash collisions.

    public Entry( int position, int length, Entry next ){ Position = position; Length = length; Next = next; }
  }

  private int Look( byte[] input, int position, uint hash, int length )
  // Look in the hash table for a match of specified length.
  {
    uint hashindex = hash % (uint) Hash.Length;
    for ( Entry e = Hash[ hashindex ]; e != null; e = e.Next )
    {
      if ( e.Length == length && Equal( input, position, e.Position, length ) ) return e.Position;
    }
    return -1;
  }

  private int InsertLook( byte[] input, int position, uint hash, int length )
  // Look in the hash table for the specified string, and also insert the specified string into the table.
  // If there is a match with an existing string, its position is returned, and the existing string is re-hashed.
  {
    uint hashindex = hash % (uint) Hash.Length;
    for ( Entry e = Hash[ hashindex ]; e != null; e = e.Next )
    {
      if ( e.Length == length && Equal( input, position, e.Position, length ) )
      {
        int result = e.Position;
        e.Position = position;
        hash += hash * 4 + input[ result + length ];
        Rehash( input, result, hash, length + 1 );
        return result;
      }
    }
    Hash[ hashindex ] = new Entry( position, length, Hash[ hashindex ] );
    return -1;
  }

  private void Rehash( byte[] input, int position, uint hash, int length )
  {
    while ( true )
    {
      uint hashIndex = hash % (uint) Hash.Length;
      Entry e = Hash[ hashIndex ];
      while ( true )
      {
        if ( e == null )
        {
          Hash[ hashIndex ] = new Entry( position, length, Hash[ hashIndex ] );
          return;
        }
        else if ( e.Length == length && Equal( input, position, e.Position, length ) )
        {
          int eposition = e.Position;
          if ( eposition < position ) // The smaller position is rehashed.
          {
            e.Position = position;
            hash += hash * 4 + input[ eposition + length ];
            position = eposition;
          }
          else
          {
            hash += hash * 4 + input[ position + length ];
          }
          length += 1;
          break;
        }
        e = e.Next;
      }
    }
  }

  private static int ExtraMatch( byte [] input, int p, int q, int match )
  {
    int limit = input.Length;
    if ( limit - p > MaxMatch ) limit = p + MaxMatch;
    while ( p + match < limit && input[ p + match ] == input[ q + match ] ) match += 1;
    return match;
  }

  private static bool Equal( byte [] input, int p, int q, int length )
  {
    for ( int i = 0; i < length; i += 1 ) if ( input[ p + i ] != input[ q + i ] ) return false;
    return true;
  }
}
```

ZLib (on default settings) seems to be faster (although I have not tried to fully optimise my code yet), but at the expense of less compression, since truncating the search means it doesn’t always find the longest LZ77 match.

For example, my code compresses the file FreeSans.ttf from 264,072 bytes to 146,542 bytes, the ZLib compressed length is 148,324 bytes.

Questions:

(1) What do you think of this approach compared to the ‘standard’ approach? Are there any hidden issues I haven’t foreseen?

(2) Re-hashing seems a fairly obvious and natural idea, have you seen it before?

(3) Would using arrays instead of the Entry record be a good idea?

(4) I am thinking of using two hash tables, “sliding” every 16 KB of input, to reduce hash collisions when the input is more than 32 KB long (and also to reduce memory usage). Would that be a good idea?

(5) Any other suggestions?

## Do lossless compression algorithms reduce entropy?

According to Wikipedia:

> Shannon’s entropy measures the information contained in a message as opposed to the portion of the message that is determined (or predictable). Examples of the latter include redundancy in language structure or statistical properties relating to the occurrence frequencies of letter or word pairs, triplets etc.

So entropy is a measure of the amount of information contained in a message. Entropy coders are used to losslessly compress such a message to the minimum number of bits needed to represent it (its entropy). To me this looks like a perfect entropy coder would be all that is needed to losslessly compress a message as much as possible.

Many compression algorithms however use steps before entropy coding to supposedly reduce the entropy of the message.

According to the German Wikipedia:

> Entropiekodierer werden häufig mit anderen Kodierern kombiniert. Dabei dienen vorgeschaltete Verfahren dazu, die Entropie der Daten zu verringern.

In english:

> Entropy coders are frequently combined with other encoders. Previous steps serve to reduce the entropy of the data.

For example, bzip2 uses the Burrows–Wheeler transform followed by a move-to-front transform before applying entropy coding (Huffman coding in this case).
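The move-to-front step is small enough to try directly: it discards nothing (it is invertible), but it changes the symbol statistics the entropy coder sees. A sketch (helper names are mine):

```python
import math
from collections import Counter

def entropy(symbols):
    # Zeroth-order entropy (bits/symbol) of the empirical distribution.
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

def move_to_front(data):
    # Each byte is replaced by its index in a recency list, then moved to
    # the front; runs of equal bytes become runs of zeros.
    alphabet = list(range(256))
    out = []
    for b in data:
        i = alphabet.index(b)
        out.append(i)
        alphabet.insert(0, alphabet.pop(i))
    return out

data = b"aaaabbbbcccc"  # clustered symbols, as BWT output tends to be
mtf = move_to_front(data)
# Zeroth-order entropy drops from ~1.585 to ~1.208 bits/symbol here.
print(round(entropy(list(data)), 3), round(entropy(mtf), 3))
```

Note that what drops is the zeroth-order entropy of the representation; since the transform is invertible, the information content of the message itself is unchanged.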

Do these steps really reduce the entropy of the message, which would imply reducing the amount of information contained in the message? This seems contradictory to me, since that would mean that information was lost during compression, preventing lossless decompression. Or do they merely transform the message to improve the efficiency of the entropy coding algorithm? Or does entropy not correspond directly to the amount of information in the message?

## String compression function in python code

I need to create a function called compress that compresses a string by replacing any repeated letters with the letter and a count. Can someone suggest a better way to do this?

```python
s = input("Enter the string:")
temp = {}
result = ""
for x in s:
    if x in temp:
        temp[x] = temp[x] + 1
    else:
        temp[x] = 1
for key, value in temp.items():
    result += str(key) + str(value)
print(result)
```
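For what it’s worth, if the intended behaviour is run-length encoding (counting adjacent repeats in order, rather than total letter frequencies as the code above does), `itertools.groupby` gives a compact version; a sketch:

```python
from itertools import groupby

def compress(s):
    # Each run of a repeated letter becomes the letter followed by its count.
    return "".join(ch + str(sum(1 for _ in group)) for ch, group in groupby(s))

print(compress("aaabccdddd"))  # a3b1c2d4
```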