Let's take a look at hashing as we move towards the end of our crypto module. Hashing is a one-way cryptographic function. Encryption, as we've discussed so far, is reversible: you encrypt and then you can decrypt. Hashing is non-reversible. You put in plaintext and it gives you a small, fixed-length output, which we can think of as a unique fingerprint of the input. In the top left there, you see the input is fox, and it gives you that fingerprint, which we call the digest of the input. You see it starts DFCD and ends 2D17. Now, this digest or fingerprint will change radically even if there is only a small change to the input. Look at rows 3, 4, and 5 in the illustration: a very small change to the input produces a completely different digest, and the digest is always fixed length. So different inputs will produce different digests. But if you put the word fox into the hashing algorithm, it will always give the same output. If I did the same tomorrow, it would still give the same output. If you did the same a week or a month from now, it would still give the same output. Hashing is deterministic: the same input always gives the same output. This is why it's used as a form of integrity control. Say we create a digest of a message and send both the message and the digest to a recipient. They can process the message through the same hashing algorithm. If the digest they calculate matches the digest we sent, the two fingerprints match, and the message cannot have changed. If the two digests don't match, then the message must have been altered in some way.
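The two properties just described, determinism and the avalanche effect, are easy to see for yourself. Here is a minimal sketch using Python's standard hashlib module; it uses SHA-256 rather than whichever algorithm the illustration used, and the sample inputs are my own:

```python
import hashlib

# Deterministic: hashing the same input always yields the same digest.
d1 = hashlib.sha256(b"fox").hexdigest()
d2 = hashlib.sha256(b"fox").hexdigest()
assert d1 == d2

# Avalanche effect: a one-character change gives a radically different,
# but still fixed-length, digest.
d3 = hashlib.sha256(b"Fox").hexdigest()
assert d1 != d3
assert len(d1) == len(d3)  # always 64 hex characters for SHA-256

# Integrity check: the recipient recomputes the digest and compares.
message = b"meet at noon"
sent_digest = hashlib.sha256(message).hexdigest()
received = b"meet at noon"  # unchanged in transit
assert hashlib.sha256(received).hexdigest() == sent_digest
```

If even one bit of the received message differed, the recomputed digest would fail the comparison, which is the whole basis of the integrity control described above.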
Whether deliberate or accidental, we know that the message that created the digest value is not the same as the one we have received. Here we have some common hashing algorithms. Probably the most well-known is MD5. It's still very popular and produces a 128-bit digest, which we can think of as a fingerprint. You can put a message of any size in and it will give you that fixed-length output: the digest will always be 128 bits. If you put a one-bit message in, you get 128 bits out. If you put 10 gigabytes in, you get 128 bits out. A fixed-length output is common to all hashing algorithms. MD5 was created by Ron Rivest and is no longer considered secure. Attackers can falsify or manipulate that fingerprint, so we don't use it to protect against deliberate alteration, but it's still commonly used because it can help detect corruption, for example in a downloaded file. MD5 is very widely available and very widely implemented in operating systems, but we no longer use it to detect deliberate falsification. SHA, the Secure Hash Algorithm, is NIST's approach to hashing. There are a number of different variations of SHA, and depending on the version you get different digest sizes. Bear in mind that the larger the digest value, the stronger the hashing algorithm typically is. Just to give you an idea of what a fingerprint or digest actually looks like: we've input International Information System Security Certification Consortium into the MD5 algorithm, and you see there what you get. That's what that 128-bit fingerprint or digest looks like. I have done the same for SHA-1 as well. These digest values should be unique: one input creates a unique hash value.
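You can reproduce that comparison yourself. This sketch, again using Python's hashlib, hashes the same input with MD5, SHA-1, and SHA-256 to show the different fixed digest sizes (the exact hex values shown on the slide aren't reproduced here):

```python
import hashlib

text = b"International Information System Security Certification Consortium"

md5_digest = hashlib.md5(text).hexdigest()        # 128 bits = 32 hex chars
sha1_digest = hashlib.sha1(text).hexdigest()      # 160 bits = 40 hex chars
sha256_digest = hashlib.sha256(text).hexdigest()  # 256 bits = 64 hex chars

print("MD5:    ", md5_digest)
print("SHA-1:  ", sha1_digest)
print("SHA-256:", sha256_digest)

# The digest length is fixed regardless of input size.
assert len(hashlib.md5(b"x").hexdigest()) == len(hashlib.md5(text * 1000).hexdigest())
```

Whatever the input size, each algorithm's output length never changes; only the digest size differs between algorithms, and a larger digest generally means a stronger algorithm.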
If you get collisions, it's a problem, because if two inputs create the same fingerprint, you don't know which of the two inputs created the hash value. You could think of this almost as two people having the same fingerprints: if fingerprints were left at the scene of a crime, which of the two people committed it? We don't know, because there are two possible sources. So hashing should create a unique output. Accidental collisions can happen, but with MD5 it's also possible to manufacture a collision deliberately: you could take whatever text you want as an input and manipulate it to produce the fingerprint value you want. If we have this situation, which is exactly what we have with MD5 at this point, then we can't tell whether integrity has been compromised. We lose the integrity control and the confidence it gave us. Was the message changed accidentally? Was it deliberately falsified? We don't know. With MD5, a team of researchers at Eindhoven University of Technology demonstrated this some years ago, falsifying documents using a Sony PlayStation 3. So we want to make sure we're using a strong encryption algorithm with an appropriate key size, and a strong hashing algorithm with an appropriate digest size. That gives us the assurance we want around confidentiality, integrity, availability, and privacy.