Sunday, February 12, 2017

And visions of unums danced in his head...

Over the years, I've posted here on several occasions about my various battles with numeric representation in computers.  There are lots of specific problems I've run into, but I believe they all fall into one of just four general categories:
  • Programmer ignorance.  Up to about 15 years ago, this generally meant ignorance of how floating point numbers actually worked in a computer.  More recently, I'd have to add widespread ignorance about how even integers are actually represented (most especially two's complement representation and underflow/overflow).
  • Trying to represent fractional decimal numbers in binary floating point, especially for money.
  • Performance issues, especially with conversions to and from strings.  To a much lesser extent, actual math operation performance, especially with divides.
  • Insufficient precision, usually the result of a poor choice in early design phases.  You might be thinking I mean single-precision floating point vs. double-precision floating point, but actually the most common issue I've run into is use of a low-precision integer (say, 16 bits instead of 32 or 64), especially with overflow on a multiply.  (The little Java program just after this list shows a few of these pitfalls in action.)
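To make a few of those concrete, here's a tiny demonstration in Java (Java only because that's the language I'd be most likely to pitch in on; any language with fixed-width integers and IEEE 754 doubles will misbehave the same way):

    public class NumericPitfalls {
        public static void main(String[] args) {
            // Two's complement wraparound: one past the maximum int is the
            // minimum int -- no exception, no warning, just a silent wrap.
            int big = Integer.MAX_VALUE;
            System.out.println(big + 1);           // -2147483648

            // Binary floating point can't represent most decimal fractions
            // exactly, which is why it's such a poor fit for money.
            System.out.println(0.1 + 0.2);         // 0.30000000000000004
            System.out.println(0.1 + 0.2 == 0.3);  // false

            // A 16-bit integer overflows on a multiply that "obviously"
            // fits: 300 * 300 = 90000, but a short only holds -32768..32767.
            short price = 300, qty = 300;
            short total = (short) (price * qty);
            System.out.println(total);             // 24464
        }
    }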
I have spent hundreds of hours over the years learning for myself about these issues, implementing solutions and workarounds for them, fixing existing issues, and (more than anything else) teaching other engineers about them.  It's a topic near and dear to my geekly heart.  So, when I read an intriguing little headline the other day, I just had to go check it out.  There's a fellow named Dr. John Gustafson who has proposed an entirely new floating point numeric representation with the objective of eliminating at least some of these problems.  As best I can tell from the bit of reading I've done so far, it would deal handily with the first and last bullets above.  I'm reading more, in the hopes that he's also going to tackle the other two.

The core proposal he's got is for something he calls unums (short for Universal Number) – a new way to represent numbers (real or integer) in a computer.  If that name's not sufficiently grand for you, his book about the first version of unums is called “The End of Error” – that ought to do it!  I've purchased that book and I'm about fifty pages in at the moment.  My first comment has nothing to do with the subject matter, but rather with the style: this is one of the most readable and understandable books on a computer science topic that I've ever read (and I've read quite a few :).  Hats off and a bow to Dr. Gustafson for that!
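Since I'm only partway through the book, take this with a grain of salt, but the basic shape of a unum 1.0 is easy to sketch.  It's an IEEE-like float with a trailing “utag” that makes the number self-descriptive: the exponent and fraction widths can vary from one number to the next, and an extra bit records whether the value is exact or an open interval.  A rough Java rendering (the field names are mine, not Dr. Gustafson's):

    // Sketch of the fields a unum 1.0 carries, as the book describes them.
    public class Unum {
        boolean sign;      // sign bit, same role as in IEEE 754
        long    exponent;  // biased exponent, es bits wide (width varies!)
        long    fraction;  // fraction, fs bits wide (width varies!)
        // The utag -- the self-descriptive part:
        boolean ubit;      // false = exact value; true = the OPEN interval
                           //   between this value and the next representable one
        int     esMinus1;  // es - 1: how many exponent bits are in use
        int     fsMinus1;  // fs - 1: how many fraction bits are in use
    }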

I know from the web reading that I've done that Dr. Gustafson has proposed a quite different “unum 2.0” from what's described in his book (and what he now calls “unum 1.0”).  I'm going to read the book first, and get a good understanding of unum 1.0 before I tackle 2.0.

If unums really are as revolutionary as Dr. Gustafson would have them, rather a lot of work will need to be done before they can be widely used.  For many applications (though not, I think, for business applications), hardware implementations will be a must.  He's already working with Intel on this idea, and there's no better way to make hardware happen.  There will also need to be implementations in language compilers/interpreters.  That will be a major challenge!  Just as with IEEE floating point, the compilers will have to work with or without hardware support.  There will be performance issues to worry about.  There will be legions of programmers to educate.  And so on.

If I get done with my reading and conclude that Dr. Gustafson has something worthwhile here, I'm going to consider volunteering to help with the Java implementations.  That could be a lot of fun!