Why General-Purpose Hash Functions Fail for Passwords

The fundamental issue with MD5, SHA-1, and even SHA-2 for password storage is their speed. These algorithms were designed to hash large amounts of data quickly, which makes them excellent for file integrity checking or digital signatures. For password hashing, that same speed is a liability: the defender hashes one password per login, while an attacker with a stolen database wants to test billions of guesses. A single high-end GPU can compute on the order of 100 billion MD5 hashes per second, so even complex passwords are vulnerable to brute-force attacks.

import hashlib
import time

# Demonstrating the speed of general-purpose hash functions
def benchmark_hash_speed(hash_func, iterations=1_000_000):
    """Return how many short inputs per second hash_func digests on this CPU."""
    # Build the inputs up front so string formatting is not counted in the timing
    candidates = [f"password{i}".encode() for i in range(iterations)]

    start_time = time.perf_counter()  # monotonic, high-resolution timer
    for candidate in candidates:
        hash_func(candidate).hexdigest()
    elapsed = time.perf_counter() - start_time

    return iterations / elapsed

# Test different hash functions (single CPU core; GPUs are far faster)
print(f"MD5 speed: {benchmark_hash_speed(hashlib.md5):,.0f} hashes/second")
print(f"SHA-1 speed: {benchmark_hash_speed(hashlib.sha1):,.0f} hashes/second")
print(f"SHA-256 speed: {benchmark_hash_speed(hashlib.sha256):,.0f} hashes/second")

Hardware acceleration exacerbates the speed problem. Specialized hardware such as ASICs (application-specific integrated circuits) can compute a specific hash function at extraordinary rates. Bitcoin mining hardware, which is essentially a SHA-256 ASIC, demonstrates this capability: modern mining rigs perform trillions of hashes per second. Password cracking rarely involves purpose-built ASICs, but commodity GPUs give attackers a similar advantage.
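To put these rates in perspective, the following sketch estimates the worst-case time to try every password of a given length at the GPU rate quoted earlier for MD5. The 95-character alphabet (printable ASCII) and the helper names `keyspace` and `seconds_to_exhaust` are assumptions chosen for illustration, and real attacks usually finish far sooner because they try likely passwords first.

```python
# Illustrative arithmetic only: the guess rate is the MD5 figure quoted above,
# and the 95-character alphabet (printable ASCII) is an assumption.

def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length

def seconds_to_exhaust(candidates: int, hashes_per_second: float) -> float:
    """Worst-case time to try every candidate at a fixed guess rate."""
    return candidates / hashes_per_second

GPU_MD5_RATE = 100e9  # 100 billion hashes/second

for length in (6, 8, 10):
    total = keyspace(95, length)
    hours = seconds_to_exhaust(total, GPU_MD5_RATE) / 3600
    print(f"{length} chars: {total:.2e} candidates, about {hours:,.1f} hours worst case")
```

Under these assumptions, the full space of 8-character passwords falls in roughly 18 hours on one GPU, which is why raw speed is the core problem.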

Traditional hash functions need almost no memory, which enables massive parallelization. Attackers can run thousands of simultaneous hash computations on a GPU, each testing a different password candidate. This parallelism multiplies the effective cracking speed, leaving even seemingly complex passwords vulnerable. Memory-hard password hashing functions such as scrypt and Argon2 address this by requiring significant memory per computation, which limits how many guesses can run in parallel.
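As a minimal sketch of how a memory requirement limits parallelism, the example below uses `hashlib.scrypt` from the Python standard library (available when Python is built against OpenSSL 1.1 or later). The cost parameters, the 24 GiB device size, and the helper names are assumptions chosen for illustration, not tuned recommendations.

```python
import hashlib
import os

# Assumed, illustrative cost parameters (not a recommendation from the text)
N = 2**14  # CPU/memory cost; must be a power of two
R = 8      # block size
P = 1      # parallelization factor

def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate working memory scrypt needs per hash: 128 * n * r bytes."""
    return 128 * n * r

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the password with scrypt."""
    return hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)

def max_parallel_guesses(device_memory_bytes: int, n: int, r: int) -> int:
    """Upper bound on simultaneous hashes that fit in a memory budget."""
    return device_memory_bytes // scrypt_memory_bytes(n, r)

salt = os.urandom(16)  # unique random salt per password
digest = hash_password("correct horse battery staple", salt)

print(f"Derived key: {digest.hex()}")
print(f"Memory per hash: {scrypt_memory_bytes(N, R) / 2**20:.0f} MiB")
print(f"Hashes fitting in a hypothetical 24 GiB device: "
      f"{max_parallel_guesses(24 * 2**30, N, R)}")
```

With these parameters each guess needs about 16 MiB, so memory, rather than raw compute, bounds how many candidates a device can test at once.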