Subsection 3.3 Frequency analysis

The 9th-century Iraqi Arab polymath Abu Yūsuf Ya‘qūb ibn ’Isḥāq aṣ-Ṣabbāḥ al-Kindī, seeking to better understand what he saw as God’s messages in the Qur’an, developed a technique that later became known as frequency analysis. The same technique can be used to determine the amount of the shift for an unknown shift cipher. The following passage from his Manuscript on Deciphering Cryptographic Messages describes this technique:
One way to solve an encrypted message, if we know its language, is to find a different plaintext of the same language long enough to fill one sheet or so, and then we count the occurrences of each letter. We call the most frequently occurring letter the ‘first’, the next most occurring letter the ‘second’, the following most occurring the ‘third’, and so on, until we account for all the different letters in the plaintext sample.
Then we look at the cipher text we want to solve, and we also classify its symbols. We find the most occurring symbol and change it to the form of the ‘first’ letter of the plaintext sample, the next most common symbol is changed to the form of the ‘second’ letter, and so on, until we account for all symbols of the cryptogram we want to solve.
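The procedure in the passage can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original manuscript or of the module referenced in this project: the function names `rank_letters` and `rank_match` are hypothetical, only the letters A to Z are counted, and ties in frequency are broken alphabetically so the output is repeatable.

```python
from collections import Counter

def rank_letters(text):
    """Return the letters A-Z that occur in text, most frequent first."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    # Sort by descending count; break ties alphabetically (an assumption
    # made here so that the ranking is deterministic).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [letter for letter, _ in ordered]

def rank_match(ciphertext, sample):
    """Replace the k-th most frequent ciphertext letter with the
    k-th most frequent letter of the plaintext sample."""
    cipher_rank = rank_letters(ciphertext)   # 'first', 'second', ... symbols
    sample_rank = rank_letters(sample)       # 'first', 'second', ... letters
    mapping = dict(zip(cipher_rank, sample_rank))
    # Characters without a mapping (spaces, punctuation) pass through.
    return "".join(mapping.get(ch, ch) for ch in ciphertext.upper())
```

On short texts the rankings rarely line up exactly, so the result of `rank_match` is best treated as a first guess to be corrected by hand, which is part of the skill students build in this portion of the project.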
In this portion of the project, have students develop their own “frequency analysis muscle” and hone it into a mental algorithm for cryptanalyzing shift ciphers.
4. The mnemonic “ETAOIN SHRDLU” is useful for remembering the twelve most common English letters in decreasing order of frequency, although the exact ordering does not hold for all written English and shifts as the living language changes.
  1. (Optional for coding-focused courses) Use the step-by-step module at [cross-reference to target(s) "inceResourcesFirstYearCollege2022" missing or not unique] entitled “Frequency Analysis” to write a Python frequency analysis tool. Then input several English texts of at least, say, twenty pages each, and verify that most of the twelve most frequent letters in each text appear in the list ETAOIN SHRDLU.
  2. Take three or four English texts, each at least a couple of pages long (public-domain books found online are a good source of plaintext here). Encrypt each with a shift of your choosing, then have students use their Python program, or one of the many frequency analysis applets available online, to determine the shift used to encrypt each message. For additional practice, have them decrypt the first sentence to verify that their guessed shift is correct. Note: for some texts, especially shorter ones, students will need to try several candidate shifts and test each one to see which yields legible plaintext.
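For instructors preparing the ciphertexts in activity 2, the following is a minimal Python sketch of one way to encrypt a text with a shift and to list candidate shifts from letter frequencies. It is not the tool built in the referenced module; the names `shift_encrypt`, `candidate_shifts`, and `tries` are hypothetical, and the guess assumes the most frequent ciphertext letter stands for one of the first few letters of ETAOIN SHRDLU.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase
COMMON = "ETAOINSHRDLU"  # the mnemonic, without the space

def shift_encrypt(text, shift):
    """Shift each letter A-Z forward by `shift`; leave other characters alone.
    A negative shift decrypts."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

def candidate_shifts(ciphertext, tries=3):
    """Guess shifts by assuming the most frequent ciphertext letter
    encrypts E, then T, then A, and so on (up to `tries` guesses)."""
    counts = Counter(ch for ch in ciphertext.upper() if ch in ALPHABET)
    if not counts:
        return []
    # Most frequent letter; ties go to the alphabetically first letter.
    top = max(sorted(counts), key=counts.get)
    return [(ALPHABET.index(top) - ALPHABET.index(p)) % 26
            for p in COMMON[:tries]]

cipher = shift_encrypt("MEET ME HERE AT SEVEN", 7)
for s in candidate_shifts(cipher):
    # Decrypt with each candidate; the legible line reveals the shift.
    print(s, shift_encrypt(cipher, -s))
```

Printing every candidate decryption mirrors the note above: on short texts the most frequent letter may not be E, so students compare the outputs and keep the one that reads as English.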