Space complexity of using a pairwise independent hash family

I’m trying to analyze the space complexity of using the coloring function $f$ that appears in "Colorful Triangle Counting and a MapReduce Implementation" by Pagh and Tsourakakis (2011).

As far as I understand, $f:[n] \rightarrow [N]$ is a hash function that should be picked uniformly at random from a pairwise independent hash family $H$. I have a few general questions:

  1. Is the space complexity required by $f$ affected by the fact that $H$ is $k$-wise independent? Why? (And if it is, how?)
  2. What do we know about $|H|$? What if $H$ is $k$-wise independent?
  3. Is there a more space-efficient way to store $f$ than storing an $N \times m$ matrix that maps each vertex to its color, using $O(Nm)$ storage words?
  4. Is the total space complexity required to use $f$ as described in the paper $|H| \cdot O(\text{space complexity of } f)$?
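For illustration only, here is one standard construction of a $k$-wise independent family (random polynomials of degree $k-1$ over $\mathbb{Z}_p$); the paper may use a different one, and the prime $p = 2^{31}-1$ is an assumption (it must exceed $n$). Under this construction a member $f \in H$ is fully described by its $k$ coefficients, so storing $f$ takes $O(k \log p)$ bits rather than a table, and $|H| = p^k$. The final reduction mod $N$ makes the output only approximately uniform on $[N]$ unless $N$ divides $p$.

```python
import random

# Assumption: a prime larger than the number of vertices n.
P = 2**31 - 1  # Mersenne prime 2147483647

def sample_kwise_hash(k, N, p=P, rng=None):
    """Draw h uniformly from {x -> (c_0 x^(k-1) + ... + c_(k-1)) mod p mod N}.
    Returns (h, coeffs); storing h means storing only the k coefficients."""
    rng = rng or random.Random()
    coeffs = [rng.randrange(p) for _ in range(k)]

    def h(x):
        acc = 0
        for c in coeffs:  # Horner's rule, all arithmetic mod p
            acc = (acc * x + c) % p
        return acc % N  # map Z_p onto [N] (approximately uniform)

    return h, coeffs

def family_size(k, p=P):
    # One function per choice of k coefficients in Z_p.
    return p ** k

def storage_bits(k, p=P):
    # k coefficients, each an integer in [0, p).
    return k * (p - 1).bit_length()
```

For the pairwise case ($k = 2$) this is $f(x) = ((ax + b) \bmod p) \bmod N$: two machine words, independent of $n$.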

Best regards

Why can’t Hash Suite see any username/hash pairs in my SAM file?

I recently started experimenting with Hash Suite 3.5.1 – a Windows program that tests the security of password hashes.

A problem I’m already running into is that Hash Suite can see the usernames and hashes on my Windows 10 laptop but not on my Windows 10 desktop. The main difference I can see between the two PCs is that my laptop has BitLocker enabled. There must be something else I’m missing here, perhaps related to the SAM file version and its behaviour.

I can see my usernames in Hash Suite when using the "Import: Local accounts" option.

I haven’t been able to test this against an offline copy of my laptop’s SAM file, because BitLocker makes extracting the SAM file more complicated (Windows locks it while booted), but I will try to test this scenario soon.


An offline version of the SAM file reveals no username/hash pairs.

When attempting to import local accounts from within Windows (something that works on the laptop), I get the following error:

[Screenshot of the error message; image not available]

LM and NTLM are both greyed out when selecting the offline copy of my SAM file:

[Screenshot of the greyed-out LM and NTLM options; image not available]

Does anyone have any ideas why these two different Windows 10 systems are behaving differently?

Is this method of 32 char hash generation secure enough for online-based attacks?

A fellow developer and I have been discussing how vulnerable a few different methods of generating a hash are, and I’ve come here to see if people smarter than I (us?) can shed some light.

In PHP, I feel the code below is secure ENOUGH to generate a 32-character value that could not reasonably be broken via an online attack. There are some other mitigating circumstances (for example, in our specific case the attacker would also need to already have some compromised credentials), but I’d like to look only at the "attackability" of the hash.


The suggested more secure way of generating a 32 character hash is:


I acknowledge the first hash generation method is not ABSOLUTELY SECURE, but for an online attack I think the chance of an attacker guessing the microtime (or needing only a low number of guesses), knowing that the MD5 was shuffled, and/or finding a vulnerability in the Mersenne Twister (MT) that str_shuffle is based on is so low as to make it practically secure.

But I would love to hear why I’m a fool. Seriously.

EDIT — This is being used as a password reset token, and does not have an expiry (although it is cleared once used, and is only set when requested).
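For comparison, here is a minimal Python sketch (not either of the PHP methods under discussion, whose code is not shown here) of a 32-character reset token drawn directly from the operating system's cryptographically secure generator. It carries 128 bits of unpredictability and does not depend on the clock or on Mersenne Twister state; the PHP counterpart of the generation step would be `bin2hex(random_bytes(16))`. Storing only a SHA-256 digest of the token, and comparing in constant time, are optional design choices assumed here, not something the question specifies.

```python
import hashlib
import hmac
import secrets

def new_reset_token():
    """Return (token, digest): the token goes to the user, only the digest is stored."""
    token = secrets.token_hex(16)  # 16 bytes from the OS CSPRNG -> 32 hex characters
    digest = hashlib.sha256(token.encode("ascii")).hexdigest()
    return token, digest

def check_reset_token(presented, stored_digest):
    """Recompute the digest of the presented token and compare in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)
```

With 128 random bits, an online attacker limited to, say, thousands of guesses per second has no realistic chance of hitting a live token, regardless of how long it stays valid.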

Are there any security implications to changing the base of a hash?

File this under "gosh, I’d hope not," but I’ve been surprised before.

I have a hash (SHA-1 specifically). I need to store the full hash, but due to constraints outside my control I can only use a limited number of characters to store it. I chose to convert it to base 36 to shorten it.

Converting to base 36 makes the number of characters vary (between 27 and 31 instead of the original fixed 40). This made me wonder whether changing the base has any security implications. Does it?
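A minimal sketch of the conversion, assuming a lowercase `0-9a-z` alphabet: the 160-bit SHA-1 value is treated as one integer and re-encoded in base 36. Because $36^{31} > 2^{160} > 36^{30}$, 31 base-36 digits always suffice; the variable length comes only from dropping leading zeros, so left-padding to 31 gives a fixed width. Decoding recovers the original 40-character hex digest exactly, which shows the change of base is a lossless re-encoding of the same value.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z
WIDTH = 31  # 36**31 > 2**160, so 31 base-36 digits hold any SHA-1 value

def to_base36(n, width=WIDTH):
    """Encode a non-negative integer in base 36, left-padded with '0' to `width`."""
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)).rjust(width, "0")

def sha1_base36(data):
    """SHA-1 of `data` as a fixed-width base-36 string."""
    return to_base36(int.from_bytes(hashlib.sha1(data).digest(), "big"))

def base36_to_sha1_hex(s):
    """Invert the encoding back to the usual 40-character hex digest."""
    return format(int(s, 36), "040x")
```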

John the Ripper: Cannot extract hash from PDF because Python keeps opening?

I’m having a really strange issue. I’m attempting to extract a hash from a user-password encrypted .pdf with John the Ripper’s pdf2john tool, but every time I run the command:

c:\...\run\ mypdf.pdf 

My Python IDE (Visual Studio Code) opens up the file and the following appears in the command line:

[main 2020-06-18T10:02:06.775Z] update#setState idle
(node:15044) Electron: Loading non context-aware native modules in the renderer process is deprecated and will stop working at some point in the future, please see for more information
[main 2020-06-18T10:02:36.776Z] update#setState checking for updates
[main 2020-06-18T10:02:36.934Z] update#setState downloading

Any ideas on how to stop my IDE from opening and get the command to work as expected? The latest version of Perl is installed on my machine.

Using the hash of the user’s password for encrypting/decrypting an E2E encryption/decryption key: is this good practice?

I am developing a zero-knowledge app, meaning the data is encrypted in the client before it’s transmitted (over SSL) and decrypted after the data is received. If the database is ever compromised, the attacker learns nothing without the user’s decryption keys.

Of course, when the app is hosted on a web server, an attacker could still inject malicious scripts, but that’s always a risk. The idea is that the user data is encrypted by default. As long as no malicious code was added to the client code, the server should not be able to obtain the user data.

The title summarises how I intended to do this, but actually it’s a bit more convoluted:

  • On account registration, a secure random string is generated as an (AES) encryption key (I guess this could also be a private/public key pair). Let’s call this key K1.
    • All data will be encrypted/decrypted (e.g. using AES) with this key.
  • The plaintext password is hashed to create another key. Let’s call this K2 = hash(plain password) (for example, using SHA256).
    • K2 is used to encrypt K1 for secure storage of the key in the remote database in the user profile.
    • If the user changes his password, all that needs to be done is re-encrypting K1 with K2 = hash(new password), so not all the data has to be decrypted and re-encrypted.
    • K2 is stored in localStorage as long as the user is authenticated: this is used to decrypt K1 at bootstrap.
  • K2 is hashed again to generate the password that is sent to the API: P = hash(K2) (also using SHA256 for example)
    • This prevents the decryption key K2 (and therefore K1) from being deduced from the password that the API/database receives.
  • In the API, the password P that is received is hashed again before it is compared/stored in the database (this time with a stronger function such as bcrypt).
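The key-derivation chain in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the app's code: the function names are hypothetical, PBKDF2-HMAC-SHA256 from the standard library stands in for bcrypt (which is not in the standard library), and the AES encryption of K1 under K2 is omitted because Python's standard library has no AES.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor for the PBKDF2 stand-in

def client_derive(password):
    """Client side, as described: K2 = SHA-256(password) stays on the client,
    P = SHA-256(K2) is what gets sent to the API."""
    k2 = hashlib.sha256(password.encode("utf-8")).digest()
    p = hashlib.sha256(k2).hexdigest()
    return k2, p

def server_store(p):
    """Server side: hash P again with a slow, salted function before storing it.
    PBKDF2-HMAC-SHA256 stands in for bcrypt (hypothetical substitution)."""
    salt = secrets.token_bytes(16)
    return salt, hashlib.pbkdf2_hmac("sha256", p.encode("ascii"), salt, ITERATIONS)

def server_verify(p, salt, stored):
    candidate = hashlib.pbkdf2_hmac("sha256", p.encode("ascii"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

# K1 would be generated once at registration, e.g. secrets.token_bytes(32),
# and stored only after being encrypted under K2 (AES in the question; omitted here).
```

As in the question, K2 and P here are unsalted single SHA-256 hashes; the sketch mirrors that design rather than endorsing it.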

My question is: does this mechanism make sense or are there any gaping security holes that I missed?

The only downsides that I see are inherent to zero-knowledge, E2E encrypted apps:

  • Forgotten password = all data is lost (cannot be decrypted). This is why users are advised to write down the encryption key K1 after creating the account: then the data can always be recovered.
  • Searching, indexing, manipulating, analysing the data is limited because everything has to be done client-side.

How will Apple and Google provide 5-minute data on Covid exposures using 10-minute interval numbers in the hash?

The goal of COVID-19 exposure notification is to notify people that they were exposed to someone who later tested positive for the virus. Protecting privacy in this process requires some cryptography, as well as avoiding excessively granular detail about user locations. But providing data useful for disease prevention requires adequate detail in measuring the length of exposures.

There is a new API for such exposure notification from Apple and Google, but it has a tension between 5- and 10-minute numbers that I don’t see how to resolve.

The cryptography specification, v1.2.1, specifies 10-minute intervals as inputs to the hash: “in this protocol, the time is discretized in 10 minute intervals that are enumerated starting from Unix Epoch Time. ENIntervalNumber allows conversion of the current time to a number representing the interval it’s in.”
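Following the quoted definition, ENIntervalNumber is the Unix time in seconds divided by 600 (10 minutes) and truncated. A minimal sketch (the function names are illustrative, not part of the Apple/Google API) makes the tension concrete: any two moments inside the same 10-minute window map to the same interval number.

```python
INTERVAL_SECONDS = 10 * 60  # the specification's 10-minute discretization

def en_interval_number(unix_seconds):
    """Interval index counted from the Unix epoch: floor(seconds / 600)."""
    return unix_seconds // INTERVAL_SECONDS

def interval_start(interval_number):
    """Unix time (seconds) at which the given interval begins."""
    return interval_number * INTERVAL_SECONDS
```

For example, times 126 seconds and 480 seconds after the start of an interval both yield the same number, so the interval number alone cannot distinguish a 5-minute difference.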

Meanwhile the FAQ, v1.1, specifies 5-minute increments in the output: “Public health authorities will set a minimum threshold for time spent together, such that a user needs to be within Bluetooth range for at least 5 minutes to register a match. If the contact is longer than 5 minutes, the system will report time in increments of 5 minutes up to a maximum of 30 minutes to ensure privacy.”
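The FAQ's reporting rule can be read as the following sketch. The FAQ does not say whether durations are rounded down or up, so rounding down is an assumption here, and the function name is hypothetical. Note that the sketch takes the contact duration as given; how that duration is measured at 5-minute resolution is exactly what the question asks.

```python
def reported_minutes(contact_seconds):
    """Hypothetical reading of the FAQ: no match below 5 minutes; otherwise
    round down to a multiple of 5 minutes and cap the result at 30."""
    minutes = contact_seconds // 60
    if minutes < 5:
        return 0  # below the minimum threshold: no match registered
    return min(minutes - minutes % 5, 30)
```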

How will the system report times in 5-minute increments when the interval numbers are only updated for the hash once every 10 minutes?

Why is a hash sent with data secure?

Hash values are also useful for verifying the integrity of data sent through insecure channels. The hash value of received data can be compared to the hash value of data as it was sent to determine whether the data was altered. Source.

A common approach used in data transmission is for the sender to create a unique fingerprint of the data using a one-way hashing algorithm. The hash is sent to the receiver along with the data. The data’s hash is recalculated and compared to the original by the receiver to ensure the data wasn’t lost or modified in transit. Source.

If the hash of some data is computed and sent along with the data, can’t an attacker alter the data and recompute the hash, leaving the receiver none the wiser?
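The reasoning in the question can be checked with a short Python sketch. With an unkeyed SHA-256 digest, an attacker who can modify the message can simply recompute the digest, and the receiver's check passes. For contrast, the second half uses a keyed MAC (HMAC-SHA256 from Python's standard library): recomputing a valid tag requires a secret key, so an attacker without it cannot forge one. How that key would be shared is outside this sketch, and the messages are made-up examples.

```python
import hashlib
import hmac
import secrets

def plain_digest(data):
    return hashlib.sha256(data).hexdigest()

def keyed_tag(key, data):
    return hmac.new(key, data, hashlib.sha256).hexdigest()

original = b"transfer 10 to Alice"
forged = b"transfer 1000 to Mallory"

# Unkeyed hash: the attacker replaces the data and recomputes the digest.
forged_digest = plain_digest(forged)
receiver_ok = hmac.compare_digest(plain_digest(forged), forged_digest)  # True: receiver is fooled

# Keyed tag: assumed shared only by sender and receiver.
shared_key = secrets.token_bytes(32)
attacker_key = secrets.token_bytes(32)  # the attacker can only guess a key
attacker_tag = keyed_tag(attacker_key, forged)
keyed_ok = hmac.compare_digest(keyed_tag(shared_key, forged), attacker_tag)  # False with overwhelming probability
```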