Saturday, March 7, 2015

Geek: integer overflow/underflow in C/C++...

My first reaction on reading this paper was: geez, guys, it's 2015.  This problem has been around since about 1945.  Is this really still a mystery and a source of problems?  I've run into this issue (and other numerical representation issues) over my entire career, and I have no sense that things have actually gotten better over time.

With respect to integer overflow/underflow, the fundamental issue is very easy to understand: computers generally represent integers in a way that limits the range of integers that can be represented.  For example, in today's computers a single byte can represent all the integers between 0 and 255.  If you try to add, say, 200 + 100, the result (300) cannot be represented in a single byte – that's an example of integer overflow.
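Here's a minimal sketch of exactly that case in C (the variable names are mine, just for illustration).  The true sum, 300, won't fit in one byte, so what actually gets stored is 300 modulo 256:

#include <stdio.h>

int main(void) {
    unsigned char a = 200;   /* a single unsigned byte: 0..255 */
    unsigned char b = 100;

    /* The operands are promoted to int, the true result 300 is computed,
       and converting it back into a one-byte unsigned type reduces it
       modulo 256, so the stored value is 44.  No warning, no error. */
    unsigned char sum = (unsigned char)(a + b);

    printf("200 + 100 stored in one byte = %u\n", (unsigned)sum);
    return 0;
}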

We humans don't generally run into integer overflow or underflow, as we don't have any particular limits on the size of integers we can represent.  Computers, on the other hand, do have this limitation – and therefore computer programmers have to deal with it.  This is where the problems start, as there are many possible ways one could deal with it – including ignoring the possibility altogether, which is in practice what most programmers actually do.  This is the source of much trouble.

The problem is complicated by at least two things in modern computers: multiple sizes of integers are available (generally, but not always, some even number of bytes), and both signed and unsigned integers can be represented.  For instance, a single unsigned byte can represent integers in the range 0..255, whereas a single signed byte can represent integers in the range -128..127.  Older computers (and a few still in production) throw a different variable into the mix: they represented integers by schemes other than the now-ubiquitous two's complement encoding.  I've worked extensively with computers that used one's complement integer encoding and signed binary coded decimal (where the sign is held in a special bit), and to a lesser extent with zoned decimal encodings.  Each of these has its own special variations of the overflow/underflow problem, though of course, being fixed-size encodings, they all have the problem.
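For the curious, the actual ranges on any particular machine are spelled out in <limits.h>.  A quick sketch (the values printed are the usual ones on a two's-complement machine with 8-bit bytes and 32-bit ints; the C standard itself only guarantees minimum ranges):

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* Report the integer ranges on this platform. */
    printf("unsigned char: 0 .. %u\n", (unsigned)UCHAR_MAX);
    printf("signed char:   %d .. %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned int:  0 .. %u\n", UINT_MAX);
    printf("int:           %d .. %d\n", INT_MIN, INT_MAX);
    return 0;
}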

Anyway, it just seems kind of silly that at this stage of software development we still don't have a standard way of handling integer overflow/underflow, and we still haven't managed to educate programmers about the problems they cause.  Actually, it's worse than that – even within the past few years I've run into programmers in very responsible positions, sometimes with years of experience, who don't actually understand the source of overflow and underflow, and who believe the problem can be safely ignored most of the time.  They're wrong and they don't even know it...
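Detecting the problem isn't hard; it's just tedious, which is probably why it gets skipped.  Here's one portable way to sketch it (the helper name is mine): check a signed addition before performing it, since in C and C++ the signed overflow itself is undefined behavior.  Newer versions of GCC and Clang also offer intrinsics like __builtin_add_overflow, though those aren't part of the standard.

#include <stdio.h>
#include <stdbool.h>
#include <limits.h>

/* Returns true if a + b would overflow a signed int. */
static bool add_would_overflow(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return true;   /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return true;   /* would fall below INT_MIN */
    return false;
}

int main(void) {
    int a = INT_MAX - 5;
    int b = 10;

    if (add_would_overflow(a, b))
        printf("refusing to add: the result would overflow\n");
    else
        printf("sum = %d\n", a + b);
    return 0;
}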
