Wednesday, March 4, 2015

Geek: the evolution of character codes...

For an aging geek like me, this paper was a trip down memory lane.  In the '70s, while writing software for Univac computers that were part of the U.S. Navy's NTDS system, I often wrote programs to convert textual data from one character encoding to another.  This was a common problem, as there was no “one standard to rule them all” as there is today with Unicode.  Instead we used a mix of different character encodings, and if we wanted one system or program to communicate with another, we had to write a conversion program to do it.
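These days that whole chore is a library call.  Just as an illustration (certainly not the Univac-era code!), Python ships codecs for several EBCDIC code pages, so a round trip between EBCDIC and ASCII text collapses to a couple of lines – the choice of cp037 here is only one of the common code pages:

    # "Conversion program," modern edition: Python's built-in cp037 codec
    # handles one common EBCDIC code page.
    ebcdic = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])  # "HELLO" in EBCDIC (cp037)
    text = ebcdic.decode("cp037")
    print(text)                   # HELLO
    print(text.encode("cp037"))  # back to the original five bytes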

The character encodings that I worked with included several widely used ones: Baudot, FIELDATA, ASCII, and EBCDIC, all discussed in the linked paper.  We also used some special-purpose, typically application-specific encodings that were basically primitive compression schemes – these were especially common in what we'd call log data today.  For instance, one system I worked on kept a log (on magnetic tape!) of all the targets we had identified and tracked.  Space on that tape was at a premium, so many simple tricks were used to conserve characters.  One that I recall: in an ASCII character stream (7 bits per character), we had a special “numbers-only” mode that was initiated by a control character.  Once in that mode, codes from 0x00 to 0x63 represented decimal digit pairs (00 to 99), and 0x64 dropped us out of that mode.  This was useful because a high percentage of a log consisted of numbers – so why “waste” an entire ASCII character for just one digit?  If you had a number with 8 or more sequential digits (and we had many of these), this character encoding would save bits.
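Here's a minimal Python sketch of that digit-pair trick, just to make it concrete.  The pair codes (0x00 to 0x63) and the exit code (0x64) are as described above; the mode-entry control character (I've used ASCII SO, 0x0E) is only a placeholder:

    # Sketch of the tape log's "numbers-only" mode.  Pair codes 0x00-0x63
    # and the exit code 0x64 are as described above; the entry code is a
    # placeholder.
    ENTER_NUMERIC = 0x0E   # placeholder "shift into numbers-only mode"
    EXIT_NUMERIC  = 0x64   # drops the decoder back into ordinary ASCII

    def pack_digits(digits):
        """Encode a run of decimal digits as digit-pair codes."""
        if len(digits) % 2:
            digits = "0" + digits              # pad odd runs with a leading zero
        codes = [ENTER_NUMERIC]
        for i in range(0, len(digits), 2):
            codes.append(int(digits[i:i+2]))   # "75" -> 75 == 0x4B
        codes.append(EXIT_NUMERIC)
        return codes

    def unpack_digits(codes):
        """Decode a packed run back into a digit string."""
        assert codes[0] == ENTER_NUMERIC and codes[-1] == EXIT_NUMERIC
        return "".join("%02d" % c for c in codes[1:-1])

    packed = pack_digits("19750304")
    print([hex(c) for c in packed])  # ['0xe', '0x13', '0x4b', '0x3', '0x4', '0x64']
    print(unpack_digits(packed))     # 19750304

An 8-digit run packs into six characters (four pairs plus the two mode characters), saving two 7-bit characters – exactly the kind of run that filled those target logs.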

What a different world with Unicode today!
