First of all, my understanding of the p parameter in scrypt is that it multiplies the amount of work to do, but in such a way that the additional workloads are independent of each other and can be run in parallel. With the interpretation of p out of the way, why is the recommended value still 1? More generally, why is it a good thing that key stretching algorithms are not parallelizable?
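To make the parameter concrete, here is a minimal sketch using Python's standard `hashlib.scrypt` (which needs an OpenSSL build with scrypt support). The password, salt, and deliberately small cost settings are placeholders I picked for illustration, not recommended values.

```python
import hashlib

# Placeholder inputs, chosen only for this illustration.
password = b"correct horse battery staple"
salt = b"example-salt-16b"

def derive(p):
    # n and r are kept small so the example runs quickly;
    # p is the parallelization parameter being asked about.
    return hashlib.scrypt(password, salt=salt, n=2**12, r=8, p=p, dklen=32)

key_p1 = derive(1)  # a single lane
key_p4 = derive(4)  # four independent lanes, mixed into one output key

print(key_p1.hex())
print(key_p4.hex())
```

Changing p changes the derived key, so it has to be stored alongside the salt and the other cost parameters. Whether the lanes actually run on separate cores depends on the implementation.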
From the point of view of an attacker trying to crack a password, it doesn’t matter whether an algorithm is parallelizable. After all, even if the entire algorithm is sequential, the attacker can just crack several different passwords in parallel.
I understand that scrypt being memory-hard makes it difficult to use GPUs for cracking. A GPU has much greater combined computational power across its many weak cores than a CPU does, but its memory bus is about the same speed, so memory-hardness levels the playing field between legitimate users on a CPU and attackers on a GPU.
However, subdividing an scrypt workload that accesses 256 MB of RAM into 4 parallel scrypt workloads accessing 64 MB each would still consume the same amount of memory bandwidth for an attacker, and therefore run at the same throughput, while running 4 times faster for a legitimate user on a CPU with four or more cores.
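The arithmetic behind this comparison can be sketched as follows. I am assuming the usual estimate of roughly 128 * N * r bytes of memory per scrypt lane and r = 8; the specific N values are my own choice to reproduce the 256 MB and 64 MB figures, and the function names are made up for this example.

```python
MIB = 1024 * 1024

def lane_memory_bytes(n, r):
    # Approximate size of the large buffer used by one scrypt lane.
    return 128 * n * r

def describe(n, r, p):
    per_lane = lane_memory_bytes(n, r)
    return {
        "per_lane_mib": per_lane // MIB,
        # Memory in use if all p lanes run at the same time.
        "all_lanes_parallel_mib": per_lane * p // MIB,
        # Total work is proportional to N * r * p.
        "relative_work": n * r * p,
    }

single = describe(n=2**18, r=8, p=1)  # one 256 MiB workload
split = describe(n=2**16, r=8, p=4)   # four 64 MiB workloads

print(single)
print(split)
```

Under these assumptions both configurations do the same total work and touch the same total amount of memory when all four lanes run concurrently, which is the equivalence the question relies on.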
Is there any fundamental flaw in my logic? Why is the recommended value p = 1? Is there any downside I can’t see to increasing