Generating Quality Code – Part 1

I often joke with other testers that every bug comes from the system doing what we said instead of what we wanted.

“And if someone would just come up with a way to get a computer to do what we actually meant, instead of what we told it to do, that person would be a trillionaire.”

It usually elicits a few laughs.

But you know what… I wonder if we could pull this off today? And if so, how would that look?

The Theory Behind the Madness

All programs, no matter how complex, eventually reduce down to small pieces of logic. We usually don’t deal at those low levels, because it’d take too much time to build code from scratch like that.

But systems are made of applications. Applications are made of classes. Classes contain methods, and methods contain the nuts and bolts holding everything together.

And the more complex the systems get, the more we use tools to speed up the process. Nobody realistically expects us to handwrite every piece of code to get a huge system up and running.

Also, most of the reason why code looks the way it does today is for our benefit–the humans. All the architecture, naming conventions, inheritance, libraries–all of it–is there for us so we can get in there and read it, understand it, debug, maintain and extend it.

But a computer doesn’t really need the benefits these concepts provide us. It’s perfectly fine for a program to be a single billion-line method with 200 parameters. We’d just never write it like that, because everybody would laugh at us.

But although all of this abstraction helps us construct code faster, it presents some problems:

  • Some of that lower level stuff acts weird, and we don’t know it,
  • Some of the higher level stuff isn’t playing nice with other higher level stuff,
  • Some people aren’t aware of all the code upstream and downstream of theirs.

And these result in bugs.

So: what if, instead of us trying to deal with increasingly complex systems and piles of lower level logic, we just… didn’t? What if instead, we let the computer itself figure out what the code is supposed to be, for us?

What if it did what we wanted instead of what we told it to do? 

How It Could Work

Let’s say we wanted to make a really simple program that adds two numbers together.

But instead of saying, “I’m going to make a program that adds numbers,” we specify a set of tests that should pass once the program is run.

We already do this with unit level tests. What we don’t do is try to generate the code that would make those tests pass.

Simple Programs

So let’s specify a test: If given integers 2, 2, the result should be another integer: 4.

Next, we try and generate some code. There are a few strategies we could try:

  • Generate random chunks of text–it’s possible but not likely to get working code, much less code that does what we want.
  • Use random tokens that work for the language being used–for this example we know we can use some combination of arithmetic operators, return, and variable names.
  • Use random tokens like before, but apply them to a template. If we’re writing a method, for example, supply enough of it to act as a skeleton, and let the generator fill in the rest.
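
To make the third strategy a bit more concrete, here’s a rough sketch of what a template-based generator might look like. Everything in it is my own assumption for illustration: the token pools, the lambda skeleton, and the helper names candidate_source and passes? are made up, and a real generator would need a much richer grammar.

```ruby
# Hypothetical token pools for a two-argument arithmetic method.
OPERATORS = %w[+ - * / **].freeze
OPERANDS  = %w[x y].freeze

# Fill a fixed skeleton with randomly chosen tokens and
# return the candidate's source text.
def candidate_source(rng)
  left  = OPERANDS.sample(random: rng)
  op    = OPERATORS.sample(random: rng)
  right = OPERANDS.sample(random: rng)
  "->(x, y) { #{left} #{op} #{right} }"
end

# Run one test against a candidate. Candidates that blow up
# or fail to parse simply count as failures.
def passes?(source, inputs, expected)
  eval(source).call(*inputs) == expected
rescue StandardError, SyntaxError
  false
end

rng = Random.new(42)
candidates = Array.new(200) { candidate_source(rng) }.uniq
survivors  = candidates.select { |src| passes?(src, [2, 2], 4) }
puts survivors
```

With a template this small there are only 20 distinct candidates, so the search is trivial. The point is the shape of the loop: generate, run the test, keep what passes.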

As candidates are generated, there’s a chance that some of them would yield code that would pass the test we gave. Here are two examples:

def s1(x,y) return x+y end
def s2(x,y) return x**y end

Both of these would return 4 if given 2 and 2. The problem is that the second method raises x to the power of y instead of adding the two numbers.

But the solution isn’t to debug and fix the code. It’s to supply an additional test: if given 2 and 3, return 5. The second candidate returns 8 for those inputs, so it gets filtered out.
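
Here’s a small sketch of that filtering step. The two lambdas mirror s1 and s2 from above; the survivors helper and the test format (inputs paired with an expected result) are just assumptions I picked for the example.

```ruby
# The two candidates from above, kept as lambdas.
candidates = {
  s1: ->(x, y) { x + y },
  s2: ->(x, y) { x**y }
}

# Each test is [inputs, expected result].
tests = [
  [[2, 2], 4],
  [[2, 3], 5] # the additional test
]

# Keep only the candidates that pass every test.
def survivors(candidates, tests)
  candidates.select do |_name, fn|
    tests.all? { |inputs, expected| fn.call(*inputs) == expected }
  end
end

puts survivors(candidates, tests.first(1)).keys.inspect # [:s1, :s2]
puts survivors(candidates, tests).keys.inspect          # [:s1]
```

Nothing got debugged here. The wrong candidate just stopped qualifying once the tests described what we wanted more precisely.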

Complex Programs

Complex programs are just made of simpler parts. Instead of using exclusively low-level operators, as in a simple program, complex programs would be combinations of smaller methods.

An example would be if we wanted to generate code to convert a Fahrenheit temperature to Celsius. The formula for that is: (5/9)*(F – 32). But again, this is a complex program made of simple parts. We have a division, a multiplication, a subtraction and parenthetical groups. All very simple things that could have been generated previously.

Next we create a few tests:

32 should return 0
212 should return 100
113 should return 45

A possible program might look like this:

def f(x) return m((d(5,9)),(x-32)).to_i end
# for this example: m = multiply, d = divide. to_i turns the float result back into an integer.

Is it very readable? No. But does it work? Yes. Here’s what it looks like next to some other methods that could’ve been generated before:

def m(x,y) return x*y end
def d(x,y) return x.to_f/y.to_f end
def f(x) return m((d(5,9)),(x-32)).to_i end

# this is us using the code:
puts f(32)  # 0
puts f(212) # 100
puts f(113) # 45
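
And here’s a sketch of how a generator might stumble onto that program by combining existing pieces. The m and d blocks come from the code above. The subtraction block s, the constant pool, and the fixed template outer(inner1(c1, c2), inner2(x, c3)) are assumptions I made to keep the search space tiny (729 combinations).

```ruby
# Previously generated building blocks (s is an assumed extra block).
BLOCKS = {
  m: ->(a, b) { a * b },
  d: ->(a, b) { a.to_f / b.to_f },
  s: ->(a, b) { a - b }
}.freeze

CONSTANTS = [5, 9, 32].freeze                     # assumed constant pool
TESTS = { 32 => 0, 212 => 100, 113 => 45 }.freeze # the tests from above

# Template: outer(inner1(c1, c2), inner2(x, c3)).to_i
def build(outer, inner1, inner2, c1, c2, c3)
  lambda do |x|
    BLOCKS[outer].call(BLOCKS[inner1].call(c1, c2),
                       BLOCKS[inner2].call(x, c3)).to_i
  end
end

# A candidate survives only if it passes every test.
# Infinity or NaN results raise on to_i and count as failures.
def passes_all?(fn)
  TESTS.all? { |input, expected| fn.call(input) == expected }
rescue ZeroDivisionError, FloatDomainError
  false
end

names = BLOCKS.keys
survivors = names.product(names, names, CONSTANTS, CONSTANTS, CONSTANTS)
                 .select { |combo| passes_all?(build(*combo)) }

# The combination matching f above is among the survivors.
puts survivors.include?([:m, :d, :s, 5, 9, 32]) # true
```

Other combinations may survive too. As before, the fix for that is more tests, not more debugging.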

Why It Could Work

We have technologies today that weren’t viable even a few years ago:

  • Really cheap distributed/cloud computing,
  • Human-readable tests (e.g.: Cucumber),
  • Very low cost computing horsepower, with regard to CPU and memory.

This means we’re in a position where:

  • A bunch of computers can try a bunch of crap that doesn’t work, to find the stuff that does, much quicker than we can,
  • Anybody could describe tests that guide how the candidate is supposed to work,
  • Space and speed are no longer limiting factors.

The focus then shifts to telling the computer what we consider correct, instead of doing that and also giving it step-by-step instructions on how to get there.

Fritz, You Idiot, You’ll Ruin Us All

No I won’t. But it’s a reasonable concern, and I hear you.

There’s a lot of money in development. There’s a lot of money in QA too. There’s a heck of a lot more good money being spent on fixing and maintaining bad code. That’s what this is aimed at.

While I agree that generating code like this can be a concern, I don’t think it’s something to be scared of. I just saw a post from Capgemini about this very thing, and I intend to address the concerns in part 2 of this series.
