Find the flaw in my architecture: Shamir’s Secret implementation for data encryption and recovery


This will be a long one.

Here’s the thing: I want to build a privacy-preserving system where the user data is not even accessible to the database administrator.

Intuitively, I immediately thought of simply using AES to encrypt user data with their own password and hashing their username so that an attacker with access to my database would need to brute-force the password for the encrypted data to get the info and then brute-force the username to maybe get an idea of who the decrypted data is about.

This would be great but leads to the problem of forgotten passwords. If one forgets their password they could reset it by providing the correct username or recovery email (also hashed), but they could not get their data back. ProtonMail, for instance, claims your data is safe even from them, but you cannot recover your emails if you forget your password.

I then started looking at secret sharing and came across Shamir’s secret. My question therefore is: Is the system I propose below worse than simply storing data in plaintext with obfuscated (hashed) usernames?

I understand that:

  1. Security does not come with complexity
  2. This system will not be entirely flawless

However, I just want to know if it is any better than a much simpler solution. Because as long as it is equally easy/hard for a hacker but harder for the database admin to gather any info from the data, it would be worth it for me.

It is “complex” because it is the only system my mind has currently come up with that allows for data encryption + somewhat simple recovery protecting data from hackers and admins. I would also happily take suggestions for other implementations.

So here we go.

The proposed system would use Shamir’s secret to encrypt the user data with k=6 and n=11 so that 6/11 parts are needed to decrypt the data. User information would then be given a “weight” and utilized to store a proportional number of parts in an encrypted manner. Something like this:

Weights

  • Username: 2
  • Password: 4
  • Email: 2
  • Security Question 1: 1
  • Security Question 2: 1
  • Name + Date of Birth: 1

Based on those weights, the following is done to the user’s private data (pseudocode):

SHAMIR(user_data, k=6, n=11)

This will produce something like a uint8 array with length=11. Let’s call that array parts.

The database would then use symmetric encryption (let’s say AES) to store these parts as follows (only the resulting ciphertext is stored):

{   username: AES(key=username, message=parts[0:2])   password: AES(key=password, message=parts[2:6])   email: AES(key=email, message=parts[6:8])   seq1: AES(key=answer, message=parts[8:9])   seq2: AES(key=answer, message=parts[9:10])   id: AES(key=name+dob, message=parts[10:11]) }  

Login would then happen with the traditional username+password or email+password, such that the user will be authenticated/logged in if the data is decrypted correctly. Both combinations give access to enough parts (6) to decrypt the data. From the user perspective, it’s all the same as everywhere else.

Then, user forgets their password. Well, now they need to find an alternative way to gather the 4 “points” provided by the password. So they would click “Forgot Password”, and a form would pop up with all the possible fields to fill in. They must then fill enough to gather 4 more parts (in addition to username or email) in order to decrypt their data. For example:

username (2) + email (2) + seq1 (1) + namedob (1) = 6

(Email verification could also be implemented)

So now the user has 6/11. Server decrypts the data, user sets a new password, data is re-encrypted, and all the fields are updated with the new parts. By definition, a user who forgot their password will have accumulated a minimum of 10 out of 11 “points” after password reset is complete (The 6 points they provided + the 4 from the new password). Therefore, 1 point could be missing. Given that the user cannot provide that last point, they can be prompted to add a new security question, at which point all is back to normal.

So, in conclusion:

I know all parts of the secret being in the same place is not great, nor is it great to use AES with low-entropy secrets.

However, this should add some security, no? To get the data, an attacker would have to brute force at least a password and a username, or, to not brute-force the password, would have to brute-force quite a bit of other data. It isn’t perfect by any means, but it’s better for data privacy than the standard, no? What am I missing? Assuming it’s implemented perfectly and it works as intended, is it possibly worse than how companies treat our data today? For most, a database breach means the data is already out there, only the password has to be brute-forced, right?

Lastly, could these objectives be achieved in any other way?

That’s it. If you’ve read until now, thank you. Please go easy on me.

Cheers.

EDIT: I’m also thinking somewhat about UX here. The entropy of the data used to store the parts is definitely low, but giving users a higher-entropy “random recovery code” or something would be problematic from a UX perspective.