(Part 1 is here)
When I wrote the previous post, I didn’t realize how many people would feel so strongly that what I was proposing was impossible.
I’d thought that this concept was largely considered an ideal, or a pipe dream, but many people turned out to be vehemently opposed to what I was saying. Many of the arguments came from my breakdown of the examples in the original article.
For this pass, I’d like to go a level deeper, focusing more on the statement I took issue with and less on the examples.
The Original Statement
The original statement was that software can have a huge number of input combinations–so many, in fact, that it would be impossible to test them all in a reasonable amount of time. This appears to be a core statement about why software can never be considered defect-free.
Let me start by saying: I agree with this statement. I agree that it would take an unrealistic amount of time to test through all the combinations.
What I disagree with are the extrapolations that come from this statement–ones such as:
- There are an infinite number of combinations,
- All input combinations need to be tested before software can be considered defect-free,
- Software has to be proven defect-free in order to be defect-free.
When describing the number of input combinations, we’re talking about discrete numbers.
Let’s say you had a form with 100 million dropdown lists, each with 100 million choices. There would be a set number of combinations you could get from that data set: 100 million raised to the power of 100 million.
It’s a huge number. But it’s not infinite.
When people use “infinity” to describe a real value, what they mean is “a number so big that it might as well be infinite”. And in this case, yes, that would be true. The number of combinations above would be astronomical, and even running tests at an obscene rate, it would take a really really really really long time to complete them all.
But it’s not infinite. It’s a certain value. An integer.
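To put a rough size on that, here’s a quick back-of-the-envelope sketch in Python. The form is the hypothetical one from above; since the full number is far too large to print, the sketch just counts its digits:

```python
import math

# The hypothetical form from above: 100 million dropdowns,
# each offering 100 million choices.
dropdowns = 10**8
choices = 10**8

# Total combinations = choices ** dropdowns. Rather than materialize
# that number, count its digits: log10(c ** d) = d * log10(c).
digits = dropdowns * math.log10(choices)
print(f"the total number of combinations has about {digits:,.0f} digits")
# -> the total number of combinations has about 800,000,000 digits
```

A number with roughly 800 million digits–astronomical, but still a definite integer.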
The problem with classifying combinations as infinite is that it immediately puts the software in a box.
It’s the kind of box where people think, “This software is too complex to test because of all the data combinations, so I’ll just do the best I can.”
That kind of box is dangerous. Here’s why.
I would argue that you don’t have to test all possible combinations to prove that all defects have been found.
I would agree that this were the case if every combination yielded a unique defect.
But they don’t. There are wide swaths of data sets that:
- work just fine,
- are invalid by their very nature (e.g. a huge input length when the program clips lengths), or,
- all trigger the same kind of defect.
Instead, what I was trying to describe in part 1 was that, given the way code is structured, you can find defects of a certain type using certain inputs–there’s an example there that will help explain, so I won’t repeat it here.
Basically, you only need one combination of input to detect a certain defect. Another combination of input that exhibits the same characteristics as the first will expose the same defect.
Therefore, you only need to run one combination of that kind to find that kind of defect.
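To make that concrete, here’s a minimal sketch in Python. The function and its defect are hypothetical (not the example from part 1), but they show the idea: every input sharing the defect-triggering characteristic fails the same way, so one representative is enough.

```python
def average(values: list[float]) -> float:
    # Hypothetical function with one defect: no guard against an empty
    # list, so EVERY empty input triggers the same ZeroDivisionError.
    return sum(values) / len(values)

# All non-empty lists exercise the same happy path, so one
# representative combination is enough to check it.
assert average([1.0, 2.0, 3.0]) == 2.0

# And one input with the defect-triggering characteristic is enough
# to find the defect; a thousand variations would add nothing.
try:
    average([])
except ZeroDivisionError:
    print("found the empty-list defect with a single representative input")
```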
Finally, a trend that I notice is a more philosophical one–software has to be proven defect-free in order to be defect-free.
This, I disagree with. The state of something either is, or isn’t, without our proving it to be the case. Something isn’t true because we’ve proven it to be so.
The same is true of software. Of all the many pieces of software out there, there could be one that is defect-free already. Maybe more.
But it still goes back to what we’re actually proving: if you were to run through all possible combinations of inputs, an accidental result would be that you’d find all the defects.
But I don’t think that should be a side effect. I think it should be an intentional goal. Testers should seek out and find defects.
Side Effects of This Thinking
I think this approach to software testing will have some interesting side effects. Particularly, I think the way we’d test would look a little different:
- Testers would be a lot more familiar with the code they’re testing–I’m a heavy proponent of white box testing, and insist on being able to see the code being tested. If done right, white box testing won’t taint your understanding of the code, as long as you approach it with a healthy amount of “O RLY?” when verifying that the code actually does what it says it does.
- Fewer redundant tests would be written–much less testing of combinations for the sake of thoroughness would be done. Why run the same kind of test to find the same kind of defect? Stop that.
- More targeted tests would be written–if you could come up with one combination of inputs that would trigger a failure in the presence of one bug, wouldn’t you write it? I would (there’s a sketch of what I mean after this list). A large number of these kinds of tests would make a valuable regression suite.
- Each test would tell you something new and unique about the health of the system–the more information the better.
- More questioning about what you’re actually testing for, when writing a test–I see many testers (and I’m including myself) get stuck in the trap of slamming out tests without considering what it is that’s being tested for. Stop. Take a second, and ask yourself, “what kind of bug am I hoping to find with this test?” Take some time and classify the bug (aside: I wonder what all the classifications of bugs even are? Intriguing…). And then write one test to find that bug in that part of the code.
- More variety of tests–there are many flavors of bugs out there. Not all of them are related to data. Some are security related, or load and performance related. The less time you spend writing redundant tests, the more time you have to write a wider variety of them.
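Here’s a sketch of the kind of targeted regression test I mean, using Python’s unittest. The parser and its past bug are hypothetical:

```python
import unittest

def parse_version(text: str) -> tuple[int, int]:
    """Hypothetical parser that once crashed on versions without a
    minor part (e.g. "2"); the fix defaults the minor version to 0."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0  # the fix for the old bug
    return (major, minor)

class TargetedRegressionTests(unittest.TestCase):
    def test_version_without_minor_part(self):
        # One input aimed at one past bug. "2", "7", and "10" all share
        # the defect-triggering characteristic (no dot), so a single
        # representative keeps the whole class covered.
        self.assertEqual(parse_version("2"), (2, 0))

    def test_normal_version(self):
        # A different class of input tells us something new.
        self.assertEqual(parse_version("1.5"), (1, 5))

if __name__ == "__main__":
    unittest.main()
```

Each test targets one class of bug; adding a third “normal” version string would tell us nothing new.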
I often compare testing to setting tripwires. Either that or a spider web would be a good analogy.
If placed properly, a tripwire will warn you when something enters a room. If built properly, a spider web will catch flying insects (bugs, lol).
If placed improperly, a bunch of tripwires will fire off when one thing enters a room. Too much. Why have multiple failures of the same kind when only one will give you the same information? Are the other tripwires adding value? If not, get rid of them. Better yet, place them elsewhere.
If built improperly, a spider web will make for a starving spider. Instead of the “coverage” a good web provides, I imagine it would look like a really thick single strand of spider silk. A lot of work to catch bugs along a single path is not as good as a large web that covers more area. A good web is delicate, but each strand has capturing power, and all it takes is one…
As I said at the beginning, I hadn’t realized how strongly many people felt about what I said last time.
However, I know that ideas can propagate in different ways. I know that very few people have done the arduous work of researching, documenting and then publishing their findings about whether software can be defect-free.
A larger, second group of people will read and agree with the researchers. An even larger, third group of people will listen to suggestions made by the second group, and agree because it sounds reasonable.
But this doesn’t mean the first group was right to begin with.
More than in any other profession, it seems, testers are asked to “think outside the box”. Whether or not that’s true, I’m sure you would agree that “software can never be defect-free” is a type of box. It’s a constraint that we have to (or maybe don’t have to) operate within.
Even if I’m wrong, I think we can safely think outside the box on this, and come up with radical new ways to test.
It’s my hope that parts 1 and 2 have challenged you to think critically about the testing craft, and also encouraged you to try out some crazy stuff once in a while.
Thanks again for reading,