Android: How safe is PBKDF2 with a 4 digit pin?

Our Product Manager wants a 4 digit pin for login in our app, obviously for UX reasons, so user don’t have to remember their password each time when they login.

A refresh token can be retrieved from backend to obtain a session token, which have access to the API. On our app, we encrypt the refresh token with AES and PBKDF2. A random salt and IV are generated plus the 4 digit used as password for PBKDF2.

After the encryption, I store the salt, IV and the cipher text base64 encoded in private shared preference.

The encryption code looks like this:

const val CPR_TRANSFORMATION = "AES/CBC/PKCS7Padding" const val ALGORITHM_TYPE = "PBKDF2WithHmacSHA1" const val ITERATION_AMOUNT = 12000 const val KEY_SIZE = 256  private fun encrypt(passCode: String, data: ByteArray): Encrypted { //e.g.: passCode = "0000"     val salt = ByteArray(256)     SecureRandom().nextBytes(salt)      val iv = ByteArray(16)     SecureRandom().nextBytes(iv)      val cipher = Cipher.getInstance(CPR_TRANSFORMATION)     cipher.init(Cipher.ENCRYPT_MODE, getSecretKey(passCode, salt), IvParameterSpec(iv))     val raw = cipher.doFinal(data)     return Encrypted(salt.encodeBase64(), iv.encodeBase64(), raw.encodeBase64()) }  private fun getSecretKey(passCode: String, salt: ByteArray): Key {     val pbKeySpec = PBEKeySpec(passCode.toCharArray(), salt, ITERATION_AMOUNT, KEY_SIZE)     val keyBytes = SecretKeyFactory.getInstance(ALGORITHM_TYPE).generateSecret(pbKeySpec).encoded     return SecretKeySpec(keyBytes, KeyProperties.KEY_ALGORITHM_AES) } 

Now my question is: How secure is this implementation?

  • How could an attacker retrieve the refresh token from shared preference and decrypt it?
  • Is the symmetric key inside secure element?
  • How safe is this implementation against malware or root?
  • How easy can the key be brute forced? (except that user tries 10k times manually to insert the correct pin)

How does Amazon secure its 6 digit one-time password?

I noticed that Amazon’s password reset relies on a 6 digit numeric PIN. Doesn’t this reduce every user’s account to a 1 in 10^5~ chance of being accessed through brute force guess factoring in a few retries (requesting OTP resend)?

It seems that they put a captcha ahead of this and probably have some timeout where the OTP expires or unspecified limit when too many attempts will lock the account from further retries. But nevertheless this doesn’t seem like a very good idea to me. I think Google Apps uses 8 characters with multiple character sets (lowercase, uppercase, numeric, symbol), which seems like how I would implement something like this.

What are good best practices for implementing a similar password reset mechanism with 6 digit numeric PIN on my own web app? Or is this a bad idea?



Could we define the decimal notation of a natural number as an expression of multiplication and addition of single digit numbers?

We recognize that every natural number can be expressed as 10 times a natural number plus a number from 0 to 9. Take any natural number and express it that way in the form $ (10 \times x) + y$ where $ y$ is between 0 and 9. Again for $ x$ , express that number that way and keep going until the second term of the expression is 0. This shows that every natural number can be gotten by starting from 0 and repeatedly applying operations each of which is of the form of multiplying by 10 then adding a number from 0 to 9. It seems so intuitive to define the decimal notation of a number by the method of getting it in that way where you start from 0 then multiply by 10 and add the first digit then multiply by 10 and add the second digit and so on. For example, we can define the notation 122 to literally mean $ (10 \times ((10 \times ((10 \times 0) + 1)) + 2)) + 2$ .

I know that’s probably not actually the way it was defined but it turns out to be correct. Since it is correct anyway, maybe it is better to define it that way as an instruction to submit into computers. It’s not that hard to show that that definition agrees with the conventional definition. Some people might have a demand to understand why proven math results such as the statement that long division to get a quotient and remainder works but have so many things to learn and don’t want to bother understanding why that definition agrees with the conventional definition. Despite that, could we still define it that way without creating a problem for those people? They might figure out a proof that long division to get a quotient and remainder works using that definition of decimal notation works and then be like “That’s fine, I accept it only as a proof that long division works using that definition of the decimal notation of a natural number and not as proof that long division works using the conventional definition of the decimal notation of a number.”

Also, could we treat English like Python and and say something means something because we defined it to mean that, and then use Polish notation to describe expressions with symbols we just invented a meaning of? Let’s say we already have a meaning for 0 which is ∅ and can also express the successor operation $ S$ and the addition operation + in Polish notation so $ 2 + S(2 + 2)$ would be written $ +SS∅S+SS∅SS∅$ . Next we define $ 0 = ∅, 1 = S∅, 2 = SS∅, 3 = SSS∅, 4 = SSSS∅, 5 = SSSSS∅, 6 = SSSSSS∅, 7 = SSSSSSS∅, 8 = SSSSSSSS∅, 9 = SSSSSSSSS∅, X = SSSSSSSSSS∅$ . Now, the number 122 can be described in Polish notation as $ +×X+×X+×X∅122$ . If you decide to replace the operations of right addition with left addition of the same number, then the Polish notation is $ +2×X+2×X+1×X∅$ . If you have a Python like program you can then again redefine $ 0 = +0×X, 1 = +1×X, 2 = +2×X, 3 = +3×X, 4 = +4×X, 5 = +5×X, 6 = +6×X, 7 = +7×X, 8 = +8×X, 9 = +9×X$ so now the digits are defined as operations so you can now type in $ 221∅$ to mean 122. Also, $ ×2∅8001∅$ could be the way to write 2 × 1008. Although some people use the symbol ∅ in place of the symbol 0, in this case 0 and ∅ don’t mean the same thing it at all. I originally defined 0 to mean the same thing as ∅, the number zero. Later, I redefined the meaning of all the digits to be operations where as ∅ still kept its original meaning, and the new notation for a natural number does not end until you get to the character ∅ making the multiplication expression unambiguous. Then those picky mathematicians will be satisfied with knowing that computer programs programmed that way can use the following law to compute the quotient and remainder of a division problem of natural numbers using the fact that the quotient and remainder of division of a one number by another number can be determined from the quotient and remainder of the floor function of a tenth of the former by the latter. There might be a lot of them who are picky because other mistakes in computer programs have happened such as the Chess Titans glitch in the YouTube video Windows Vista Chess Titans Castling Bug. They will consider long division of natural numbers to get a quotient and remainder to be entirely a function from ordered pairs of natural numbers to ordered pairs of natural number, and will not accept that as proof of how to compute the quotient as a rational number given in mixed fraction notation.

Extending prime numbers digit by digit while retaining primality

I looked at a table of primes and observed the following:

If we choose $ 7$ can we concatenate one digit to the left so as to form a new prime number? Yes, concatenate $ 1$ to obtain $ 17$ . Can we do the same with $ 17$ ? Yes, concatenate $ 6$ to obtain $ 617$ . And with $ 617$ ? Yes, concatenate $ 2$ to obtain $ 2617$ . Then we can form $ 62617$ . And I could not continue since table gives primes with the last entry $ 104729$ .

Now some terminology. Call a prime number $ a_1…a_k$ a survivor of order $ m$ if there exist $ m$ digits $ b_1,…,b_m$ (all different from zero) so that the numbers $ b_1a_1…a_k$ and $ b_2b_1a_1..a_k$ and… and $ b_mb_{m-1}…b_1a_1…a_k$ are all prime numbers.

Call a prime number $ a_1…a_k$ a survivor of order $ + \infty$ if $ a_1…a_k$ is a survivor of order $ m$ for every $ m \in \mathbb N$ .

I would like to know:

Does there exist a survivor of order $ + \infty$ ?

(This question, with exactly the same title and content, was asked on MSE about an hour ago, and I think that I should apologize for asking here and there a same question in so a little time-interval, but, as I thought that somebody will come up very fast with an argument with which question would be decided, and that did not happen, I decided to ask it here also, so that this question receives an attention here also. Yes, it has a recreational flavor, but I hope that you like it.)

Generating fake number for a 25 digit PII number in a file containing millions of rows

I have to expose some sensitive data containing a PII column that has a 25 digit number. Rest of the columns aren’t PII data. This is done such that the data can be safely shared to the larger audience without the original PII column’s data. but if required I need to check the original value, hence a look up file is needed which maps the PII with its pseudo number.

How do I generate a pseudo unique number such that we can later map back to original data if required ?

Currently there are about 22 million rows. There could be at max 50 million such rows of data later on as the data keeps coming in.

I was thinking of the UUIDs but they aren’t really human friendly & UUIDs would be bad at indexing later on if we move to a database (Over thinking much?). Also joining two dataframes based on indexing could be slow I reckon.

My current thought process using pandas (for the first file containing 22 million rows)

  1. shuffle the lines with pandas (assuming it fits in memory)
  2. add a column with a auto incrementing field (say Psuedo_number)
  3. add another column with uuids (UUID4)
  4. Create a lookup file with our new pseudo_number, UUID, original PII number

now when new PII data comes in

  1. read the highest value of the pseudo_number from the lookup file
  2. use that (+1) as the starting number for the above process on the new data

tldr I need to generate unique random numbers for PII column in a file containing 22 millions rows & maintaining a look up file. Later would need to import into a database once system grows.

some intial code:

<!-- language: lang-py -->  # dummy list >>> l = [('C0000005', 'RB', 'C0036775', '')] * 27000000 # create a sample dataframe to represents our data of 22+ million rows >>> df = pd.DataFrame(l, columns=list('abcd')) # let the following 'sensitive_col' column represent our 25 digit number for now >>> df['sensitive_col'] = df.index + 123456789  >>> df.head()   a          b             c        d         sensitive_col 0   C0000005    RB  C0036775    D185368     123456789 1   C0000005    RB  C0036775    D185368     123456790 2   C0000005    RB  C0036775    D185368     123456791 3   C0000005    RB  C0036775    D185368     123456792 4   C0000005    RB  C0036775    D185368     123456793   #actual code!! # shuffle the rows >>> df = df.sample(frac=1).reset_index(drop=True) >>> df['New_ID'] =  df.index + 123  # create the UUIDs >>> df['uuid'] = [uuid.uuid4() for _ in range(len(df.index))]  >>> df.head()             a    b         c       d     sensitive_col   New_ID  uuid 0   C0000005    RB  C0036775    D185368     132571068   123     8c1974cf-49ff-4b87-bfac-b791156d1b1b 1   C0000005    RB  C0036775    D185368     130859684   124     2a170f08-43a9-4a1d-acf5-b537a229c7e9 2   C0000005    RB  C0036775    D185368     135318849   125     5b265c8e-35ea-4100-bac0-c77f4d3f85ea 3   C0000005    RB  C0036775    D185368     145963082   126     77e2e78c-c72a-4738-907a-9e4851a328d2 4   C0000005    RB  C0036775    D185368     141664707   127     de73b056-6c5e-4276-8b93-db44cd9990ba 

Any suggestions ?

Can we find every 216 digit number combination? [on hold]

This question popped to my head when I watched Pi where the mathematican told the Jews “Didnt you calculate every 216 digit number allready?”

So I want to find every possible 216 digit number possible.

Here is an example


What I want.

A computer that finds 216 digit numbers and puts them on hard storage.

How long would it take with todays computers like IBM Summit? How much storage would be required to store all these numbers? And also how much data storage would be needed to store all these numbers with an index system? Like Index No 547 is