## Tuesday, April 25, 2017

### Binary to decimal conversion performance...

Binary to decimal conversion performance...  Now here's an obscure topic that I've worked on a great many times over my career.  The original inspiration for me to work on this came from programming the Z-80, way back in the '70s.  That little 8 bit microprocessor could do 16 bit addition, but it had no multiply or divide instructions at all.  Often the application I was working on needed higher precision math (usually 32 bits), so I had a library of assembly language subroutines that could do the basic four 32 bit operations: add, subtract, multiply, and divided.  Multiply and divide were maddeningly slow.  Eventually I came up with a way to speed up multiply considerably, but divide was intractably slow.

This slow divide performance was on full display anytime I wanted to convert a 32 bit binary integer to its decimal equivalent.  The canonical algorithm uses repeated division-with-remainder by ten.  For instance, to convert the binary number 11001001 to its decimal equivalent, you would do this:
11001001 / 1010 = 10100 remainder 1
10100 / 1010 = 10 remainder 0
10 / 1010 = 0 remainder 10
The result is then the remainders, in decimal, in reverse order: 201.  You need one division operation for each digit of the result.  A 32 bit number might have as many as ten digits, and on that low-horsepower little Z-80, those divisions were really expensive.  So I worked hard to come up with an alternative means of conversion that didn't require division.

What I came up with has been independently invented by many people; I wasn't the first, I'm sure, and I don't know who was.  The basic technique depends on a simple observation: there aren't all that many possible decimal digits whose value fits in a given length binary number.  For instance, in a signed 32 bit binary number, the largest positive decimal number that can be represented is 2,147,483,647.  That's 10 decimal digits, any of which can have the non-zero values [1..9], except the most significant one can only be [1..2].  That's just 83 values.  If you pre-compute all those 83 values and put them in a table, then you can convert a binary number to decimal with this algorithm:
while remaining value is nonzero
find the largest value of a decimal digit < the remaining value
record the digit
subtract that decimal digit value from the remaining value
This produces the decimal digits in order, from the most significant digit to the least significant digit.  Most importantly, it does so with no division required.

That was vital to getting decent performance out of that Z-80.  But what about a modern processor?  Well, it turns out that in relative terms, division is still the red-haired stepchild of CPU performance.  On a modern Intel or ARM processor, 64 bit integer division can take over 100 CPU clock cycles.  Multiply operations, by comparison, take just a handful of clock cycles, while addition and subtraction are generally just 1 or 2 clock cycles.  Even when using a high-level language like Java, algorithms that avoid using division repetitively can still produce large performance gains.  I've done this five or six times now, most recently to get a 6:1 performance gain when converting floating point values to XML.

One line of the above algorithm I sort of glossed over: the bit about finding the largest value of a decimal digit that is less than the remaining value.  The naive way to do this would be to use a linear search, or possibly a binary search of the table of decimal digit values.  This would require a number of iterations over the search algorithm, at best an average of 4 iterations for a binary search over a table big enough for 32 bit values.  For 64 bit values we'd have 171 decimal digit values, and we'd need 5 iterations for that binary search.

We can do better, though, by taking advantage of the fact that there's a close relationship between the binary values and the largest fitting decimal digit.  I just finished coding up a nice solution for finding the largest decimal digit that fits into a signed 64 bit integer.  It works by using a table with 512 entries, about three times as many as we have possible decimal digit values.  Given a 64 bit positive binary number, the Java code that computes the index into that table is:
```int lz = Long.numberOfLeadingZeros( value );
int index = ((63 - lz) << 3) + (0xf & (int)Long.rotateRight( value, 60 - lz));```
Pretty simple code, really. It's also very fast.  Best of all, with a properly constructed table the index will point to an entry that will either be the correct one (that is, the largest decimal digit that will fit into the value) or it will be too big.  If it is too big, then the preceding entry in the table is guaranteed to be the correct one.  Much better than the iterative searches!

With those two techniques in combination, I'm doing conversions from binary to decimal with no divisions and no iterative searching.  It's very challenging to benchmark performance in Java, but my simple testing shows about 5:1 improvement in performance for random numbers in the 64 bit positive integer range.  Smaller numbers (with fewer decimal digits) show smaller performance gains, but always at least 2:1.  I'll call that a win!