Sunday, November 2, 2014

Some non-deterministic software makes me uncomfortable...

Some non-deterministic software makes me uncomfortable...  Reading this article about flaws in neural networks got me thinking about some of my own experiences with non-deterministic software.

There are two big areas of software I'm aware of whose outputs are not (necessarily) determined by its inputs: many kinds of AI, and genetic programming.

Some kinds of (so called) artificial intelligence (or AI) use non-deterministic “learning” mechanisms that result in a system that produces outputs from its inputs in a way that no human understands.  I'm not saying that these systems aren't useful, because they demonstrably are useful.  I first ran into such a system when interviewing with a company that made software to evaluate the risk in credit applications.  In the interviewing process I was left alone for a few hours with a computer running a development version of their software.  This let me play around with various inputs and see how they changed the output – and I very quickly spotted some things that looked like big problems to me.  For instance, I noted that if I varied the applicant's annual income up in $100 increments, the output would vary from “recommend approval”  to “recommend disapproval” in an apparently random way.  It's hard to imagine any real world justification for that!  When I inquired about that response, I was told that nobody knew why that happened, exactly; they just knew that the “network” had learned that somehow.  As a software developer, this makes me uncomfortable, and distrustful of the results.  This is the kind of software application that (it seems to me) should be deterministic – but it's not.

Genetic programming attempts to find solutions for problems through the same sort of mechanisms through which life has evolved.  The basic method is to try various approaches, randomly selected, and let the winners survive through some sort of Darwinian filtering.  This is a concept I find fascinating, as it has the potential to solve problems that no human software engineer has ever found a solution for.  Implementing genetic programming is a tricky thing to do, and I've only played around with it a little bit.  Only twice did I manage to “evolve” a program that actually accomplished something (that is, produced the desired results) – and in one of these cases, the result was better (meaning faster, in this case) than what I did by hand.  This better result was for a sorting program written in Java.  The test that controlled the evolution of my genetic program took 10 sets of data as inputs, and sorted them all correctly.  I ran the evolution part of the program for about a month on a Linux server, where it consumed one core for that entire time – that's a lot of computing!  The result was three times faster than my hand implementation of Quicksort, and four times faster than the standard Java library's sort method.  The code itself was complete gobbledegook, a couple thousand lines of incomprehensible lunacy that I spent a week trying to reverse-engineer but totally failed.  Somewhere during that week it occurred to me to try other datasets on the genetic algorithm, and here I found some fascinating results.  On some datasets, the algorithm never terminated – bad.  On some datasets it sorted correctly, but slower than Quicksort – moderately bad.  On most datasets, it sorted very quickly – but incorrectly.  Very bad!  How could one ever trust the result of a genetically derived sort algorithm?  I don't think you could.  My conclusion to that is that genetic programming is really only useful when the entire set of possible inputs can be tested – and that's a very limited set of problems, indeed.

I'm much more comfortable with deterministic software.  Some would argue, though, that all software is non-deterministic – because of the bugs lurking within.  They may be right :) 

No comments:

Post a Comment