Sunday, September 29, 2013

UTF-8: the most beautiful hack.


If you're an ancient enough programmer (and I am more than ancient enough!), you probably remember – and not fondly – the horribly bad old days of incompatible character encodings and the notoriously evil “code pages”.

Unicode came along and cleaned up part of this problem, but it wasn't until implementations of UTF-8 (the standard 8-bit encoding of Unicode) became common, starting in late '93, that developers began converging on it as a truly universal character encoding.  Now it's ubiquitous, having won the war much like TCP/IP did in networking.

In the video at right, Tom Scott explains the origins of UTF-8 (some of which I'd inferred, but never heard before) in a short, engaging presentation. I found it in this post, which has even more of the story, and here's an email with even more details.
