Saturday, January 5, 2013

New English Letter Frequency Tables...

Many older ciphers (and some less sophisticated current ciphers) are based on letter substitution (instead of “a”, write “m”; instead of “b”, write “j”; and so on).  A basic tool for cryptanalysts trying to break such a cipher is a table of “letter frequencies”.  These tables show how often each letter appears in ordinary writing in a particular language.  Even more importantly, the tables usually include how often each “digram” (pair of letters) and “trigram” (triplet of letters) appears.  With these tables, enough encrypted messages, and a little additional cleverness for polyalphabetic substitution ciphers, you can break the code.
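For the curious, here is a rough sketch of how such a table could be built from a piece of text. It is only an illustration, not the method used for any of the published tables: the function name `ngram_frequencies`, the choice to count sequences only within words (not across spaces), and the tiny sample sentence are all my own assumptions.

```python
from collections import Counter
import re


def ngram_frequencies(text, n):
    """Return the relative frequency of each n-letter sequence in text.

    Assumption for this sketch: only the letters a-z count, and
    sequences never span a word boundary.
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# A made-up sample; a real table needs a far larger body of text.
sample = "the theory of the thing"
letters = ngram_frequencies(sample, 1)   # single letters
digrams = ngram_frequencies(sample, 2)   # pairs of letters
trigrams = ngram_frequencies(sample, 3)  # triplets of letters

# Show the three most common digrams in the sample.
for gram, freq in sorted(digrams.items(), key=lambda kv: -kv[1])[:3]:
    print(gram, round(freq, 3))
```

To attack a simple substitution cipher, you would run the same count over the ciphertext and match its most common letters, digrams, and trigrams against the table for the language you suspect.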

Letter frequency tables have been readily available for a long time.  I remember first seeing them in the 1970s (when I was studying cryptography in the U.S. Navy), and the tables I saw then were quite old – WWII-era.  The English letter frequency tables most often used today were generated by a fellow named Mark Mayzner in the early 1960s.  He analyzed (the hard way!) some 20,000 English words to generate his tables.  

It turns out that Mr. Mayzner is still alive, and just last month he wrote Peter Norvig to ask if he might update those tables using today's much easier methods and vastly larger accessible texts.  Peter did exactly that, and the results are here – and freely downloadable!
