
To Cipher and to Decipher

July 27, 2016

Swappin’ today and tomorrow’s themes around, because for some reason I felt like talking about some math stuff today. So, for today’s topic, we are going to go over the basics of cryptography. This is going to be a super fast tour of a bunch of stuff, because I want to cover the basic gist of a number of different ways cryptography can be done, and how one can go about cracking each of them.

To start, the type of cryptography we are going to talk about is called a cipher. A cipher hides information by changing the individual letters, numbers, and spaces of a message into other ones. Codes, meanwhile, replace things at the word level. So if you replace the word dog with jazz, that is a code, but if the word dog becomes agp because d goes to a, o goes to g, and g goes to p, then that is a cipher. Ciphers are generally a lot simpler to learn than codes, which can require entire books to translate, and they never run into the problem of being unable to talk about something because you never thought of a code word for it. Also, we are only covering the really big ciphers here. There are a lot of different ones out there, and it can be fun learning about all the different kinds.

Anyways, the first cipher to cover is called the Caesar Cipher, so called because Julius Caesar used it when he wanted to send secret messages to his generals. It's pretty much the simplest cipher you could imagine. Basically you pick a number between one and twenty-five, and for every letter of your message, you simply go forward that many letters in the alphabet. If you get to the end, then you start back over at the beginning. As an example, if your number was one, then a would change to b, l would change to m, r would change to s, and z would change to a. If your number is three, then a goes to d, l to o, etc. You change each letter of your message to another letter depending on the number that both you and the recipient know. That number is called the key. When the recipient gets the message, they use the same method, just going backwards instead of forwards. If the key was one, then they would change b to a, m to l, a to z, etc. Julius Caesar reportedly used three as his number: the Roman historian Suetonius says he wrote D, the fourth letter of the alphabet, in place of A.
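If it helps to see the shifting spelled out, here is a minimal Python sketch of the Caesar Cipher as described above. It assumes lowercase English letters only and simply passes spaces and punctuation through; the function names `caesar` and `caesar_decrypt` are just illustrative.

```python
import string

ALPHABET = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'

def caesar(text: str, key: int) -> str:
    """Move each letter `key` places forward, wrapping from z back to a."""
    out = []
    for ch in text.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)

def caesar_decrypt(text: str, key: int) -> str:
    """Deciphering is the same walk, just backwards."""
    return caesar(text, -key)

print(caesar("alrz", 1))           # bmsa
print(caesar("al", 3))             # do
print(caesar_decrypt("bmsa", 1))   # alrz
```

The `% 26` is the "start back over at the beginning" step; it also handles the negative shifts used when deciphering.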

If someone knows nothing about cryptograms, then this method works pretty well. They see nothing but seemingly random letters, and don’t know what to do with them. If you have some idea someone is using this cipher, however, it is one of the easiest to crack. Since there are only twenty-five different keys, you can just test out each of them and look for a message that makes sense. Twenty-four of those will turn random letters into more random letters, but one of them will give you the message. The small number of possible keys makes this cipher almost useless against someone who knows anything about ciphers, unless you only need it to hold up for a couple of minutes, in which case the time it takes them to test twenty-five different keys, versus the one your intended recipient has to try, could make it useful.
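The brute-force attack is short enough to write out. This sketch (again assuming lowercase English letters; the names `shift` and `all_shifts` are made up for illustration) undoes all twenty-five keys and leaves it to you to spot the one that reads as real words.

```python
import string

ALPHABET = string.ascii_lowercase

def shift(text: str, key: int) -> str:
    """Caesar-shift lowercase letters forward by `key` (a negative key goes backwards)."""
    k = key % 26
    return text.lower().translate(str.maketrans(ALPHABET, ALPHABET[k:] + ALPHABET[:k]))

def all_shifts(ciphertext: str) -> dict[int, str]:
    """Undo every possible key from 1 to 25; only one should make sense."""
    return {key: shift(ciphertext, -key) for key in range(1, 26)}

for key, guess in all_shifts("khoor zruog").items():
    print(key, guess)  # the line for key 3 reads "hello world"
```

`str.maketrans` builds the whole shifted alphabet at once, which is a compact way of writing the same letter-by-letter walk.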

The next step up in complexity is still relatively straightforward, but much harder to brute force. Instead of shifting every letter by the same amount in the alphabet, you can map each letter to any other letter. You take the alphabet and scramble it up, and then map from the original alphabet to the scrambled one. If we pretend the alphabet is only four letters long, abcd, then the scrambled alphabet might be bdca. In that case, every time we see an a, we map it to b; when we see b, we map it to d; when we see c, we map it to c; and when we see d, we map it to a. So the word “bad” would become “dba” in our new alphabet. To reverse it, we just map from the new alphabet back to the old, changing b to a, d to b, etc. Our key is now the scrambled alphabet, which is a fair bit harder to memorize than one number, but is also a lot harder to brute force, as there are 26! (26 factorial), or about 403 septillion, different ways to scramble the alphabet. As long as both you and your intended receiver have a copy of the scrambled alphabet, you can use this cipher quite quickly.
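Here is a rough sketch of the substitution idea in Python, reusing the four-letter "abcd to bdca" example from above. The helper names are hypothetical, and the key generator uses Python's ordinary `random` module, which is fine for a puzzle but not for real secrets.

```python
import math
import random
import string

ALPHABET = string.ascii_lowercase

def make_key() -> str:
    """A scrambled copy of the alphabet (puzzle-grade randomness only)."""
    return "".join(random.sample(ALPHABET, len(ALPHABET)))

def substitute(text: str, source: str, target: str) -> str:
    """Map each letter of `source` to the letter in the same position of `target`."""
    return text.lower().translate(str.maketrans(source, target))

# The four-letter example: a->b, b->d, c->c, d->a
print(substitute("bad", "abcd", "bdca"))   # dba
print(substitute("dba", "bdca", "abcd"))   # bad

key = make_key()
hidden = substitute("attack at dawn", ALPHABET, key)
print(substitute(hidden, key, ALPHABET))   # attack at dawn

print(math.factorial(26))  # 403291461126605635584000000 ways to scramble 26 letters
```

Deciphering is just calling the same function with the two alphabets swapped.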

If we can’t just try all the combinations and look for one that works, how can we go about cracking this cipher? Well, anyone who has done the cryptogram puzzles you sometimes see in newspapers is familiar with one method. Almost all of those puzzles are messages enciphered with this sort of cipher. The thing that makes them crackable as a hobby is that words are not just random series of letters; there are patterns. Especially if the spaces between words are preserved, it is pretty easy to deduce that some letters have to translate to certain other ones, usually starting with the one-letter words translating to I or A, then using that info to figure out more of the words, ultimately working your way to the whole thing.
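One common aid for this kind of deduction (not something the post itself names) is a word's "letter pattern": because a substitution cipher always maps the same letter to the same letter, an enciphered word keeps the repeat structure of the original. This sketch assumes a tiny hypothetical word list standing in for a real dictionary.

```python
def word_pattern(word: str) -> str:
    """Number each letter by the order it first appears: 'that' -> '0.1.2.0'."""
    seen: dict[str, int] = {}
    return ".".join(str(seen.setdefault(ch, len(seen))) for ch in word.lower())

# Hypothetical tiny word list standing in for a real dictionary.
WORDS = ["that", "this", "high", "noon", "seen"]

cipher_word = "xqvx"
matches = [w for w in WORDS if word_pattern(w) == word_pattern(cipher_word)]
print(matches)  # ['that', 'high']
```

With a full dictionary, each enciphered word narrows down to a short list of candidates, and letters shared between words narrow it further.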

Even when you take the spaces out, it is still possible for a skilled cryptographer to crack the cipher using statistics. By knowing how often each letter is used in a given language, and comparing that with how often each letter appears in the ciphertext, you can generally figure out some of the letters with a pretty high degree of accuracy, especially if the message is long. If that fails for some reason, an even more powerful tool is the statistics of which letters follow which other letters: you look up how common each letter pair is and figure it out from there. While it is not quite as straightforward as cracking a Caesar Cipher, it is still pretty methodical, and skilled cryptographers can solve cryptograms enciphered this way in just a few minutes. In terms of using this in your everyday life, this system is probably strong enough that most people are not going to crack it, but it is not particularly secure, as any random person interested in patterns and puzzles can likely crack it.
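The counting part of frequency analysis is easy to automate; the judgment part (lining the counts up against the language) is still up to you. A minimal sketch using `collections.Counter`, with spaces stripped as in the paragraph above; the function names are illustrative.

```python
from collections import Counter

def letters_only(text: str) -> str:
    return "".join(c for c in text.lower() if "a" <= c <= "z")

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Share of each letter in the text, most common first."""
    letters = letters_only(text)
    if not letters:
        return []
    return [(ch, n / len(letters)) for ch, n in Counter(letters).most_common()]

def pair_counts(text: str) -> Counter:
    """How often each two-letter pair appears once spaces are removed."""
    letters = letters_only(text)
    return Counter(letters[i:i + 2] for i in range(len(letters) - 1))

print(letter_frequencies("aab"))          # [('a', 0.666...), ('b', 0.333...)]
print(pair_counts("the then")["th"])      # 2
```

On a long ciphertext, you would compare the top of `letter_frequencies` with the most common English letters (e, t, a, ...) and the top of `pair_counts` with common pairs like th and he.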

For a long time, the cipher I just described was the main one in use, but as people got better and better at solving it, it became less and less safe to use. People added a lot of little tricks to try and make it harder to figure out: adding extra characters that had no meaning at all, making certain letter pairs transform into a single letter, and other tricky things. Generally, though, all of these methods fell prey to the fact that languages are filled with patterns, and when one letter always translates to the same letter, those patterns are preserved. The next step, then, is to make a letter not always map to the same one, but change as the message goes along. The problem, of course, is keeping it easy for the person you want to receive the message to decipher it, without making them carry around some kind of book that could be stolen. The solution that was eventually devised involved an interesting reuse of the Caesar Cipher.

This next style starts with a keyword. This is a set of letters, preferably a little long, that both parties have memorized. For this example, we are going to use the word “key” as our key. Let’s try to encipher the word “secret” using it. The first step is adding the first letter of your key to the first letter of your message. How do you go about adding letters together, you might ask? Well, you can turn them into numbers. If we pretend a is 1, b is 2, and so on all the way to z is 26, then we can easily add two of them together. To make sure we get a letter back out, we wrap back around to the beginning if we go past 26. So in our example, the first letter of our message is s, which translates to the number 19. The first letter of our key is k, which translates to 11. We add 19 and 11 together to get 30, which is bigger than 26. So we subtract 26 from it, wrapping back around to the beginning, and get 4. 4 translates to d, so the first letter of our ciphertext is d. Another way of looking at it is that since k translates to 11, we are using the Caesar Cipher with a key of 11 for our first letter. Next, we combine the second letter of our key with the second letter of our message: e is 5, and e is 5, which add together to 10, so the second letter of our ciphertext is j. Then we do third with third: c is 3 and y is 25, and 3 + 25 = 28, which wraps around to 2, so the third letter is b. Now we get to the fourth letter of our message, but since we are at the end of our keyword, we jump back to its beginning and continue. So we combine r (18) with k (11) to get 29, which wraps around to 3, or c. We continue through the whole message, cycling through the keyword as many times as we need to. When we are finished, our message is translated to djbcjs.

To decipher it, we do the same process in reverse, subtracting the key letters from the ciphertext letters, and adding 26 if the resulting number is less than 1, thus wrapping back around in the other direction. This disguises the frequency of the letters, because the same letter means different things in different parts of the message.
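Here is a sketch of the keyword cipher in Python that follows the post's numbering (a = 1 through z = 26), so it reproduces djbcjs. One caveat worth knowing if you look this up: the textbook Vigenère numbers letters from a = 0, so k shifts by 10 instead of 11 and "secret" with key "key" comes out as ciabir instead. Both versions work the same way and are equally strong; the function name and the choice to skip non-letters are assumptions for illustration.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_lowercase

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Keyword cipher using the post's numbering (a=1 ... z=26), so k means a shift of 11."""
    shifts = cycle(ALPHABET.index(k) + 1 for k in key.lower())
    sign = -1 if decrypt else 1
    out = []
    for ch in text.lower():
        if ch in ALPHABET:
            # Each message letter gets its own Caesar shift from the next key letter.
            out.append(ALPHABET[(ALPHABET.index(ch) + sign * next(shifts)) % 26])
        else:
            out.append(ch)  # spaces/punctuation pass through and do not use up a key letter
    return "".join(out)

print(vigenere("secret", "key"))                # djbcjs
print(vigenere("djbcjs", "key", decrypt=True))  # secret
```

`itertools.cycle` handles the "jump back to the beginning of the keyword" step, and Python's `%` already wraps negative numbers back into 0 to 25 when deciphering.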

You might think this sounds basically unbreakable, and for centuries people thought that it was, but the old foes of ciphers, patterns and statistics, eventually provided a way to crack even this cipher. There is actually one version of it where the message is provably impossible to crack. If your key is as long as your message, is chosen completely at random, and is only ever used to encipher one message, then it is impossible to figure out what your message says without the key. This is called a one-time pad, and it is both really cool and extremely difficult to use if you are sending a lot of information. If you ever think you will need to communicate discreetly with someone, and want to make absolutely sure no one will ever be able to figure out what you are communicating, then create a very long, random sequence of letters that both of you have copies of. As long as no one else sees this sequence of letters, you only use each part of it once, and the letters were generated randomly, this is perfectly secure communication, which no other method in this post can claim.
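A one-time pad is really just the keyword cipher with a random key at least as long as the message. This rough sketch draws the pad from Python's `secrets` module (meant for security-sensitive randomness) and, as an assumption for simplicity, numbers letters from a = 0 and drops spaces; any numbering works as long as both sides agree. The names `make_pad` and `one_time_pad` are illustrative.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase

def make_pad(length: int) -> str:
    """Random letters from the operating system's secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def one_time_pad(text: str, pad: str, decrypt: bool = False) -> str:
    letters = [c for c in text.lower() if c in ALPHABET]  # spaces are dropped
    if len(pad) < len(letters):
        raise ValueError("the pad must be at least as long as the message")
    sign = -1 if decrypt else 1
    return "".join(
        ALPHABET[(ALPHABET.index(c) + sign * ALPHABET.index(p)) % 26]
        for c, p in zip(letters, pad)
    )

pad = make_pad(100)  # share it privately, use it for one message, then destroy it
ciphertext = one_time_pad("meet at noon", pad)
print(one_time_pad(ciphertext, pad, decrypt=True))  # meetatnoon
```

The security comes entirely from the pad: reuse any part of it for a second message and the repeated patterns the rest of this post relies on start to come back.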

Most of the time, however, the key will be shorter than the message, and you will use the same key for multiple messages. In that case, it’s possible to use statistics again to break the keyword cipher. As a general principle, the longer and more random the key, the better, but the key also gets harder and harder to memorize as it gets longer and more random, and sometimes having to write it down can do more to compromise your security than using a less secure key. Anyways, on to actually breaking it.

The principle that drives this method is the fact that patterns are much more common in real words than they are in gibberish. T and h come together much more frequently in real words than they would if letters were placed randomly, and so do most consonants followed by most vowels and vice versa. To take advantage of this, we go through the ciphertext very carefully and note down any series of three or more letters in a row that is repeated anywhere in it. This is a long process, and people generally use computers for it today, though it was done by hand before computers existed. Each time you find one of these repeated sequences, you count the distance from the first letter of the first copy to the first letter of the second copy. You keep going through the message, finding repeated sequences and noting how far apart they are. When you are done, you will have a list of sequences with a number next to each. Now you look at the numbers and look for a number that is a factor of almost all of them. So if you had 64, 32, 64, 28, 16, 32, 48, 80, you would note that sixteen divides into all of these numbers except one. (Smaller numbers like 2, 4, and 8 also divide most of them, but only because they divide sixteen, so you want the largest number that still fits almost all of them.) What that means is that the keyword is most likely sixteen letters long. Because sequences of letters are much more likely to repeat due to the way words are constructed than by chance, most of the repeats you find are ones that ended up enciphered the same way because they lined up with the same part of the keyword. So maybe the word “the” was enciphered in one place, and then 32 characters later the word “they” was enciphered. Because “the” landed on the same position in the keyword both times, it got translated to the same set of letters in the ciphertext. The 28 we see is an example of a repeat that happened purely by chance, which will exist but will be much less frequent than the others. Anyways, after we do this, we have a pretty good idea of how long the keyword is.
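This step is usually called the Kasiski examination, and it is exactly the kind of tedious searching a computer is good at. Below is a minimal sketch: it finds repeated three-letter sequences, records the gaps between successive copies, and picks the largest candidate length that divides most gaps. The 80% threshold and the cap of 20 on key length are arbitrary assumptions, not rules from the post.

```python
from collections import defaultdict

def repeated_distances(ciphertext: str, n: int = 3) -> list[int]:
    """Gaps between successive copies of every repeated n-letter sequence."""
    text = "".join(c for c in ciphertext.lower() if "a" <= c <= "z")
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    distances = []
    for starts in positions.values():
        # first letter of one copy to first letter of the next copy
        distances.extend(b - a for a, b in zip(starts, starts[1:]))
    return distances

def guess_key_length(distances: list[int], max_len: int = 20, threshold: float = 0.8):
    """Largest length from 2..max_len that divides at least `threshold` of the gaps."""
    if not distances:
        return None
    votes = {k: sum(d % k == 0 for d in distances) for k in range(2, max_len + 1)}
    good = [k for k, v in votes.items() if v >= threshold * len(distances)]
    return max(good, default=None)

print(repeated_distances("abcxyzabc"))                         # [6]
print(guess_key_length([64, 32, 64, 28, 16, 32, 48, 80]))     # 16
```

Taking the largest qualifying length is what skips over 2, 4, and 8 in the example, since they only qualify because they divide sixteen.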

Alright, we know how long the keyword is, but we still don’t know what the keyword is. How does this help us? Well, that leads us back to the wonderful world of letter frequency analysis. We take our ciphertext and cut it into a number of pieces equal to what we think the length of the keyword is; in our example, we would break it into sixteen chunks. The first chunk has the first letter, the seventeenth letter, the thirty-third letter, etc. The second chunk has the second, the eighteenth, the thirty-fourth, etc., and so on. Now, every letter in a given chunk was enciphered with the same Caesar Cipher, so we can do a separate frequency analysis on each chunk, figuring out which letters likely translate into which other letters. Because each chunk uses a single offset rather than a different one for every letter, this is actually even easier than the frequency analysis from before: once you are pretty confident about any single letter in a chunk, every other letter in that chunk is figured out too. As long as we got the key length right, we can generally use the frequencies to figure everything out. Even if some of our chunks resist analysis, we can use the other ones to deduce the letters in those chunks.
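Splitting into chunks and guessing each chunk's offset can be sketched like this. As a deliberately crude assumption, `guess_shift` just treats the most common letter in a chunk as e; real attacks compare the whole frequency profile, but this shows the "one confident letter solves the whole chunk" idea. Under the post's a = 1 numbering, a shift of 11 corresponds to key letter k.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def split_into_chunks(ciphertext: str, key_length: int) -> list[str]:
    """Chunk i holds letters i, i + key_length, i + 2*key_length, ..."""
    letters = [c for c in ciphertext.lower() if c in ALPHABET]
    return ["".join(letters[i::key_length]) for i in range(key_length)]

def guess_shift(chunk: str, assumed_top: str = "e"):
    """Caesar offset for a chunk, assuming its most common letter stands for `assumed_top`."""
    if not chunk:
        return None
    top, _ = Counter(chunk).most_common(1)[0]
    return (ALPHABET.index(top) - ALPHABET.index(assumed_top)) % 26

print(split_into_chunks("abcdefgh", 3))  # ['adg', 'beh', 'cf']
print(guess_shift("hhhxy"))              # 3, since h sits three letters after e
```

Running `guess_shift` on every chunk gives one offset per keyword position, which together spell out a first guess at the keyword.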

The reason longer keywords make deciphering harder is twofold. First, the longer the keyword, the fewer times it repeats, so there is less chance of it creating the repeated sequences the first part of our cracking method relies on, and the ones it does create are more likely to be drowned out by chance repeats. Second, even if you figure out the length, if the keyword is long enough, each individual chunk may have so few letters that it’s hard to get a frequency analysis to work. Generally, if someone wants it badly enough and is willing to spend enough time, any version except a one-time pad will eventually get cracked, but with a really long keyword, it can take an incredibly long time. It’s that kind of thinking that eventually led to the more modern cipher methods that are used today.

I am afraid I am out of time for tonight, so that’s it for today; I will try to do another post about modern ciphers on another day. If you want to learn more about any of these ciphers, or try to use or break them, do some research online. The first one is called the Caesar Cipher, the next is called the Substitution Cipher, and the last one is called the Vigenère Cipher. As mentioned before, the Substitution Cipher is one of the most common recreational ciphers to break. The Vigenère Cipher generally requires a lot more math and more time, so it’s not very commonly cracked recreationally, but the thrill of actually doing so is worth experiencing at least once, so I would recommend trying your hand at one of those as well. Here are three messages, the first enciphered with Caesar, the second with Substitution, and the last with Vigenère, if you want to try to crack them.

ro fjbqrwpcxw jrwc pxrwp cx urbcnw cx mrblryurwnm mrbbrmnwcb cqrb rb cqn mroonanwln cqrb trm rb xdc

zitkt vql ziol aofu lozzofu of iol uqkrtf qss qsgft vitf iol wkgzitk of iol tqk hgxktr q sozzst woz gy hgolgf. lzgst iol wkgzitkl ekgvf qfr iol dgftn qfr iol vorgv lg zit rtqr aofu vqsatr qfr ugz iol lgf qfr lqor itn solztf aorrg. oct wttf aosstr qfr ozl ngxk rxzn zg zqat ktctfut gf esqxroxl aoss iod jxoea qfr estqf qfr ztss zit fqzogf viqz q ykqxr it vql. zit aor lqor kouiz oss rg oz wxz oss fttr zg rg oz ekqyzn. lg ziqz fg gft voss lxlhtez dt oss stz gf ziqz od q rqyzn.