With the consensus aiming towards an educated public on digital privacy, it’s no surprise to see an increasing interest in encryption algorithms and cybersecurity. MD5 algorithm was one of the first hashing algorithms to take the global stage as a successor to the MD4 algorithm. Despite the security vulnerabilities encountered in the future, MD5 remains a crucial part of data infrastructure in a multitude of environments.
Before diving headfirst into the main topic, it is best to go through the basic concept of hashing first.
What is Hashing?
Hashing consists of converting a general string of information into an intricate piece of data. This is done to scramble the data so that it completely transforms the original value, making the hashed value utterly different from the original.
Hashing uses a hash function to convert standard data into an unrecognizable format. These hash functions are a set of mathematical calculations that transform the original information into their hashed values, known as the hash digest or digest in general. The digest size is always the same for a particular hash function like MD5 or SHA1, irrespective of input size.
Also Read: Top Data Structures and Algorithms Every Data Science Professional Should Know
Hashing has two primary use cases:
It is common to store user credentials of websites in a hashed format to prevent third parties from reading the passwords. Since hash functions always provide the same output for the same input, comparing password hashes is much more private.
The entire process is as follows:
- User signs up to the website with a new password
- It passes the password through a hash function and stores the digest on the server
- When a user tries to log in, they enter the password again
- It passes the entered password through the hash function again to generate a digest
- If the newly developed digest matches the one on the server, the login is verified
Some files can be checked for data corruption using hash functions. Like the above scenario, hash functions will always give the same output for similar input, irrespective of iteration parameters.
The entire process follows this order:
- A user uploads a file on the internet
- It also uploads the hash digest along with the file
- When a user downloads the file, they recalculate the hash digest
- If the digest matches the original hash value, file integrity is maintained
Now that you have a base foundation set in hashing, you can look at the focus for this tutorial, the MD5 algorithm.
What is the MD5 Algorithm?
MD5 (Message Digest Method 5) is a cryptographic hash algorithm used to generate a 128-bit digest from a string of any length. It represents the digests as 32 digit hexadecimal numbers.
Ronald Rivest designed this algorithm in 1991 to provide the means for digital signature verification. Eventually, it was integrated into multiple other frameworks to bolster security indexes.
The digest size is always 128 bits, and thanks to hashing function guidelines, a minor change in the input string generate a drastically different digest. This is essential to prevent similar hash generation as much as possible, also known as a hash collision.
You will now learn the steps that constitute the working of the MD5 algorithm.
Steps in MD5 Algorithm
There are four major sections of the algorithm:
When you receive the input string, you have to make sure the size is 64 bits short of a multiple of 512. When it comes to padding the bits, you must add one(1) first, followed by zeroes to round out the extra characters.
You need to add a few more characters to make your final string a multiple of 512. To do so, take the length of the initial input and express it in the form of 64 bits. On combining the two, the final string is ready to be hashed.
Initialize MD Buffer
The entire string is converted into multiple blocks of 512 bits each. You also need to initialize four different buffers, namely A, B, C, and D. These buffers are 32 bits each and are initialized as follows:
A = 01 23 45 67
B = 89 ab cd ef
C = fe dc ba 98
D = 76 54 32 10
Process Each Block
Each 512-bit block gets broken down further into 16 sub-blocks of 32 bits each. There are four rounds of operations, with each round utilizing all the sub-blocks, the buffers, and a constant array value.
This constant array can be denoted as T -> T.
Each of the sub-blocks are denoted as M -> M.
According to the image above, you see the values being run for a single buffer A. The correct order is as follows:
- It passes B, C, and D onto a non-linear process.
- The result is added with the value present at A.
- It adds the sub-block value to the result above.
- Then, it adds the constant value for that particular iteration.
- There is a circular shift applied to the string.
- As a final step, it adds the value of B to the string and is stored in buffer A.
The steps mentioned above are run for every buffer and every sub-block. When the last block’s final buffer is complete, you will receive the MD5 digest.
The non-linear process above is different for each round of the sub-block.
Round 1: (b AND c) OR ((NOT b) AND (d))
Round 2: (b AND d) OR (c AND (NOT d))
Round 3: b XOR c XOR d
Round 4: c XOR (b OR (NOT d))
With this, you conclude the working of the MD5 algorithm. You will now see the advantages procured when using this particular hash algorithm.
Advantages of MD5
- Easy to Compare: Unlike the latest hash algorithm families, a 32 digit digest is relatively easier to compare when verifying the digests.
- Storing Passwords: Passwords need not be stored in plaintext format, making them accessible for hackers and malicious actors. When using digests, the database also gets a boost since the size of all hash values will be the same.
- Low Resource: A relatively low memory footprint is necessary to integrate multiple services into the same framework without a CPU overhead.
- Integrity Check: You can monitor file corruption by comparing hash values before and after transit. Once the hashes match, file integrity checks are valid, and it avoids data corruption.
How Can Simplilearn Help You?
The message digest family of algorithms has been a staple in many hashing systems across the globe. They have their flaws, but they can still be considered an excellent beginner algorithm for newer cryptographic enthusiasts. Apart from this particular subject, there are multiple sections in cybersecurity that need to be practiced before one starts a career in this line of work.
Both novices and seasoned professionals can benefit from Simplilearn's Advanced Executive Program In Cyber Security course. The course is loaded with activities, live classes, and a solid base to start your career in this lucrative industry, from addressing the basics of cybersecurity to teaching its most complex aspects.
In this lesson regarding the MD5 algorithm, you took a small recap into hashing and its use in today’s industry. You understood the origin of MD5 and learned its process of scrambling.
Are there any queries regarding the topic you just learned? If yes, please comment below, and we will be happy to answer them for you.