One of the most direct applications of mathematics to software is cryptography. The mathematics of cryptography give us confidence that messages are authentic and confidential. How does that work?

## Information Theory

Modern digital encryption is founded on the work of mathematician Claude E. Shannon. He taught us how to measure how much information is in a message.

This allows us to correct errors in transmission, compress files, and encrypt messages.

He measured information in bits. A bit, just as in a digital computer, is a unit that can be in one of two states. If your language only allows you to say two things (say it only contains the words “car” and “derrigible”) then a statement only has one bit of information in it. You chose one or the other. It doesn’t matter how long the words are.

If you can choose one of four statements, then your language has 2 bits. One of 8, it has 3 bits. So what if you could choose from a possible set of 10 things to say? In other words, how many bits of information are in a decimal digit? Simple: log_{2} 10 = 3.32 bits.

Languages that are highly redundant have fewer bits of information than the actual text used to express them. For example, in English, a letter “q” is usually followed by a “u”. That means that the “u” is largely redundant. Even though it takes another log_{2} 26 = 4.7 bits of information to write the additional letter, it doesn’t contribute much more information to the message.

The ratio of information in a message as compared to the length of the message is its relative entropy. 1 minus entropy is redundancy. The English language contains about 50% entropy, 50% redundancy. That’s why English text can be compressed to about 50% of its original size.

When we compress, we squeeze out redundancy. The message gets close to having a 1:1 entropy ratio. And when we transmit a message over a noisy channel, we can add redundancy to make it possible to reconstruct the message on the other side if parts of it get scrambled.

But what about encryption? Well, to encrypt a message, you want to combine the information from two different channels. You need to do this in such a way that it is difficult to separate the information once its combined.

The two channels that we’ll combine are the key and the plaintext. They mix together to form the ciphertext. Information theory tells us how difficult it will be to separate the two again, assuming that you don’t already have one or the other (the key or the plaintext).

To securely encrypt a message, we have to spread the information in both the key and the message evenly throughout the ciphertext. So a small change in the key leads to a large change in the ciphertext. And a small change in the plaintext also leads to a large change in the ciphertext. Spreading the key is called **confusion**. Spreading the plaintext is called **diffusion**.

The algorithms that we’ve designed for symmetric encryption do a good job of both confusion and diffusion. AES, the Advanced Encryption Standard, is a good example.

## Inverse Functions

Symmetric keys have to be exchanged to be useful. They have to be moved. And in so moving, they may be compromised. So how can we exchange symmetric keys with someone over a potentially unsecure network? And how can we be sure of the identity of the recipient? What about the identity of the sender?

To solve these problems, we can use inverse functions. Suppose that I make up two functions, f(x) and f^{-1}(x). If I give you the first, but keep the inverse for myself, then you can challenge me. You make up a random number, and run it through f(x). They you give me the answer and ask me what was your random number. Only someone who knows f^{-1}(x) could tell you. And I haven’t shared f^{-1}(x) with anybody, so if I get it right it must be me.

You can do the same thing to send me a secret message. Just encrypt it with a symmetric key, then run the symmetric key through f(x) and send me the answer. It doesn’t matter if an eavesdropper intercepts it. They don’t know f^{-1}(x). So only I can get back to the key for the message.

Ah, you say. But what if I replaced the correct message with a fake? Well, I’ve got you there, too. You see, the *sender* also has a pair of inverse functions, g(x) and g^{-1}(x). I know g(x). He used g^{-1}(x) on a hash of the message and sent that, too. That’s called the signature. So I can hash the message I receive, and run his signature through g(x). If I come up with the same number, then I know the message must have come from him.

You can find out more about the mathematics and mechanics of modern digital encryption at http://cryptofundamentals.com.