Monday, June 2, 2014

Positional distribution of letters in English usage...

It's amazing how many different ways there are to analyze data.  I've done rather a lot of analysis of English writing, mostly in the '90s when I got interested in identifying an author by their writing style.  I also did some, just for fun, when working on decryption (for some kinds of ciphers, knowing letter frequencies, letter-pair frequencies, etc. can be very useful).  This is one kind of analysis I'd never thought of!

For any given letter, how often does it appear in various positions within words (at the beginning, somewhere in the middle, or at the end)?  The distribution for the letter “z” is shown at right.  This is one of the letters that surprised me – I wouldn't have guessed that it appeared more often in the middle of a word than anywhere else.
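For anyone who wants to try this kind of count themselves, here's a minimal sketch in Python.  It is not the method used to produce the graphs; it just sorts each occurrence of a letter into three coarse buckets (start, middle, end).  The function names, the sample sentence, and the choice to skip one-letter words are all my own assumptions.

```python
import re


def positional_counts(text, letter):
    """Count occurrences of `letter` at the start, middle, or end of words."""
    letter = letter.lower()
    counts = {"start": 0, "middle": 0, "end": 0}
    # Treat any run of a-z as a word; everything else is a separator.
    for word in re.findall(r"[a-z]+", text.lower()):
        if len(word) < 2:
            continue  # assumption: a one-letter word has no distinct start or end
        last = len(word) - 1
        for i, ch in enumerate(word):
            if ch != letter:
                continue
            if i == 0:
                counts["start"] += 1
            elif i == last:
                counts["end"] += 1
            else:
                counts["middle"] += 1
    return counts


def positional_shares(text, letter):
    """Convert the raw counts into fractions of all occurrences."""
    counts = positional_counts(text, letter)
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Hypothetical sample text, made up for illustration only.
sample = "The lazy zebra dozed near a fizzy quiz prize"
print(positional_counts(sample, "z"))
print(positional_shares(sample, "z"))
```

Run against a large body of English text instead of one made-up sentence, something like this should give a rough version of the start/middle/end picture, though a finer-grained breakdown by position would need more buckets.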

The first thought I had upon seeing these graphs was this: it's very useful information for a Scrabble player!
