Here's another case where 100% testing is possible: Suppose we want to test the COORDINATES package, and the input values are given as one byte integers. That means the values can range from -128 to +127 feet. Figure 42 shows a short test program to check the COORDINATES package transformations for all possible inputs. It uses the inverse function technique to test both Transform functions by transforming a rectangular point into a polar point, and then transforming it back again. The program does this for all rectangular positions included in the area from 128 feet west to 127 feet east, and 128 feet south to 127 feet north. It keeps track of the largest errors, and prints them when finished. It also tells us how long it took the test program to run.
The two previous examples are exceptions to the general rule. It is more likely that the COORDINATES package has to work for all locations 100 miles from the origin. In theory, we could 100% test this package by simply changing the loop limits to +/- 528,000 feet to get full 100 mile testing coverage. In practice we can't do this. It took 177 seconds for the test program shown in Figure 42 to run on a 10 MHz AT Clone. The number of points tested was 256**2 = 65,536 points. That's 370 points per second. To test (2*528,000 + 1)**2 points would take 753,473,124 seconds. That's almost 24 years. By the time we finished testing it, the weapon it was being designed for would be obsolete.
The coverage wouldn't be as dense as 100% testing because we would be checking it every 115 feet instead of every foot, but this kind of sparse uniform testing is good because it quickly covers the entire range of inputs and is likely to uncover overflow conditions. (In fact, 46,341 squared overflows on a 32-bit integer machine, so the transformations fail at 8.78 miles.)
We said it would take almost 24 years to test all the points in a square +/- 100 miles from the origin (to 1 foot resolution), but if you add a third dimension and want to test all those points at all altitudes from 0 to 8 miles in 1 foot increments, the schedule out stretches out 971,520 years. Nobody is going to fund a project that long!
We've already calculated that the test program can test 83,916,000 points over the weekend. If we uniformly distribute those test points over the area 8 miles high, 100 miles each direction from the origin, then the points fall on grid lines 825 feet apart. That's sparse, but it gives gives good, uniform coverage over the entire area.
The disadvantage of uniform testing is that the inputs are regular multiples of the index. Therefore certain ratios of input conditions occur over and over, and other ratios never happen. Differences between two variables tend to be multiples of certain values. This nice, regular input pattern sometimes misses small differences of large numbers, or division by numbers near zero.
Another problem with uniform testing is that it is too uniform. It spends just as much time testing the coordinate transformations of targets 90 miles away as it does testing coordinate transforms 90 feet away. If you are testing an early warning system you want to test lots of cases at the outer boundary, and don't care about short ranges. If you are testing a short-range gun system, you care more about the performance at short range than long range. In these cases you don't want a uniform distribution. Monte Carlo testing might be more appropriate.
Let's suppose we decided to test the COORDINATES package using Monte Carlo techniques for picking the X and Y coordinates, instead of using a uniform distribution. We still have all weekend, so we want to generate about 84 million coordinate pairs and test them. If we try to do that, we will be in for several surprises.
First, we will find the program is still running monday morning when we get to work. That's because it takes more time to generate a random number than it does to simply bump the index in a loop. You won't be able to test as many points in a given time period as you will if you use a uniform distribution.
Second, you may not be testing as much as you think. Suppose you generated X and Y by doing this:
If you read the fine print in the RANDOM_NUMBERS package specification, you will see the Rnd function generates a random sequence that repeats after 2048 numbers. After you have generated 1024 X,Y pairs, you will begin to repeat the same pairs. So the algorithm above doesn't really test 84 million random positions. It tests 1024 random positions 82,000 times. The last 81,999 times don't tell you anything you didn't already learn the first time.
Furthermore, every random number generated by Rnd can be expressed as N/2048 where N is 0 through 2047. If you multiply by 2*528000, then every value can be expressed as 515.63 * N, where N is an integer from 0 through 2047. So even though it appeared to be testing 84 million different points where X and Y could take on any integer value of feet, it was really only testing 1024 sparsely distributed points.
The moral of the story is, "You better know the characteristics of your random distribution, or you could badly mislead yourself."
If you read the whole RANDOM_NUMBERS package specification, you will see a procedure called Random_Digit, that returns a random integer in the range 0..9. The comments in the package body describe how it works. I don't think the sequence repeats, but I haven't proved it. (Proving a random sequence doesn't repeat, and is completely uncorrelated is quite a job.) If you need a sequence longer than 2048 numbers, or need a very dense sequence, then use the Random_Digit function to build random numbers one digit at a time. Be patient, though. The Random_Digit function is slow.
Let me be the first to admit this isn't a very good random number package. Remember, this is a book about how to write Ada programs, not about how to design the ultimate random number generator. I didn't want to get waste a lot of time confusing readers with things that don't really have anything to do with Ada. There must be people who have made it their life's work to figure out the fastest, longest, densest random sequence. If you need a really good random number generator, it would be a good idea for you to search the literature and rewrite the package body of RANDOM_NUMBERS using a better technique. I just needed something that would generate a random sequence of 52 numbers so I could shuffle the poker deck, so I used this simple, well known random number generator. It's plenty good for that purpose.
Despite the limitations of the simple-minded random number generator, it teaches a valuable lesson: "Simple approaches can be taken in a package body when you need to do a quick feasibility study, then upgraded later." Maybe the RANDOM_NUMBERS package isn't good enough for a real Draw_Poker program. That doesn't matter when I'm first testing the logic of the program. Who cares if somebody notices the 157th card after the three of clubs is always the eight of hearts? Before I start selling the machine, I can give the RANDOM_NUMBERS package specification to somebody who really enjoys writing random number generators. When they come back with an improved body, I can just compile it and link it, and I will not have to worry about making any other changes in the Draw_Poker program. The important thing is that I can test out the Draw_Poker concept now, using a mediocre random-number generator.
Starting with the simplest approach helps me define the requirements for the final solution. For example, I learned that I have to throw in 1 second delays just to slow the Draw_Poker program down. This means I can tell a designer to come up with a random number generator without any detectable correlation (that would give a gambler an edge) and no speed constraints. On the other hand, if I need the random number generator for testing the COORDINATES package, and find out the simple package body only lets me test 10 points a second, speed will be important to me. I'd accept something that quickly generates a long string of dense random numbers derived from the CPU clock, even though it has a mild (but noticeable) 60 Hertz correlation in it.
Testing the ASCII_UTILITIES.Upper_Case function that works on a whole string is an example of a situation where all you can use is good judgment when selecting test cases. You can't test 100% of all possible input strings, simply because you can't even list 100% of all possible input strings. Even a sparse uniform distribution of all possible input strings boggles my mind. It would be possible to use Monte Carlo techniques for generating random input strings, but how many would I need to generate to assure myself it works? If there are thousands of these strings generated, how do I check to see if they were properly converted by the Upper_Case function? This is a case where, like it or not, there is no substitute for intelligence. You have to use some good judgment and come up with a few, well-chosen test cases.
It's hard to teach good judgment. Techniques that work one time don't make any sense at other times. I'll just give you an example, and then I'm afraid you're pretty much on your own.
The string Upper_Case function has three basic parts: (1) There is a loop that indexes through all the characters in the string, (2) the assignment statement that gives values to each character in the string, and (3) the return statement. I need to convince myself that each of these three parts work, then I have some assurance that the whole thing works.
To test the loop index, I will need to try the function on strings of different lengths. I certainly want to try one-character strings, and some moderate length strings, but what do I do about bizarre string lengths? How do I test it for a string -2 characters long? How do I declare the input test string? Will Ada let me declare TEST : string(2..1);? What value will she let me assign to TEST? What exception do I expect Upper_Case to raise if I do succeed in giving it an invalid string? What do I define the proper response to be?
Of course I have to ask the same questions about strings that are longer than the maximum length. Those questions are a little more plausible because it isn't unreasonable to anticipate a line like X := Upper_Case(Y & Z); where the lengths of Y and Z exceed the maximum string length. But even in that case I have to declare X to be longer than the maximum length, and Ada probably won't let me compile my test case. If she does, she should raise CONSTRAINT_ERROR when she tries to elaborate X, and the program will terminate before I get to my test case.
So here is my stand: Despite all my best efforts, I can't consciously create a situation that tries to process an illegal string, so I probably won't create one by accident. If I do create one by accident, it is the result of an error, and I should detect that error before I call the Upper_Case function. If I fail to detect the error, the part of the program I should fix is the part that caused the error or failed to detect it, not the innocent Upper_Case function. Judgment tells me that I don't need to test strings less than one character long, or longer than the maximum allowed by the implementation.
Therefore, to assure myself that the loop in the Upper_Case function works, I will test it with some minimum length strings, some maximum length strings, and a few intermediate length strings, and I will be satisfied if all those cases work.
Testing the assignment statement is relatively easy. I've already done 100% testing on the character Upper_Case function, so I know it works. Therefore, the few tests cases that checked the loop will also suffice to check the assignment. I'll want some of those cases to include lower case letters, and some to include upper case letters, numbers, and punctuation marks, to show they aren't changed.
It isn't hard to convince myself that the return statement works. The few test cases I've already planned won't work unless the return statement words, so they provide valuable information about the return statement. The only feature I need to test that hasn't already been tested is to see what happens when I try to Upper_Case a string N characters long and assign it to a string M characters long, when N is not equal to M. It should raise CONSTRAINT_ERROR. If N and M are known at compiler time, Ada may warn me about the CONSTRAINT_ERROR before I run the test.
Therefore, good judgment tells me that a few test cases satisfying the requirements in the preceding paragraphs are sufficient to test the string Upper_Case function.
The Fixed_Image function is one of those routines that is impossible to test for all possible inputs. There are an infinite number of floating point values. The computer can only represent a finite number of them, but even so, that finite number is far too large to test. Besides, the Fixed_Image function has three independent options (the number of digits before the decimal point, number after the decimal point, and leading characters). You can't test all input values for all combinations of options.
I had no choice but to use good judgment when selecting the test cases. I tested Fixed_Image with a bunch of different values. I used big values, small values, positive values, negative values, fixed length, variable length, and so on. It passed.
Then I wrote the some routines to test the logarithm functions in the TRIG package using the inverse function method. I was taking logs of powers of two, and then taking the antilog of the result to see if I got the original value back again. For example, Ln(0.5) should be -0.69315, and Exp(-0.69315) should be 0.5 (or very close to it). Ln(2.0) should be +0.69315, and Exp(+0.69315) should be 2.0. The test program was checking the results automatically, so there really wasn't any need to print them out, but I did anyway.
My test routine told me everything was fine, but I happened to notice on the printout that Ln(0.5) was +0.69315, and Exp(+0.69315) was 0.5. That's clearly an error in both the Ln and Exp functions. I thought they didn't work for values less than one. But then I noticed Ln(0.25) was -1.38629, and Exp(-1.38629) was 0.25, so the functions appeared to work for numbers less than one after all. Then I noticed Ln(2.0) was also +0.69315, and Exp(+0.69315) was 2.0! How could Exp(+0.69315) evaluate to two different answers and always the right one? It was very confusing.
It turned out that the bug was in Fixed_Image. Numbers in the interval from -1.0 to 0.0 were printed as positive values. Even though I had tested Fixed_Image with many different values, I didn't happen to pick a value in that small interval. I just discovered it through dumb luck.
After you have tried one or more of the previous input selection methods, dumb luck will finish the job. Just put a fully-debugged, error-free module to normal use and you will almost always find errors you didn't find before, no matter how much you tested it.
Don't think that dumb luck is a substitute for the other kinds of testing. It is the final line of defense, and should just find minor coding errors that can be fixed without changing any documentation. If normal usage finds an error that requires a major program revision, it is too late in the development cycle for that. You have to do everything you can to find major errors early.
Routines that involve user interfaces are very difficult to test, because it is so difficult to predict what a user will do. These routines need lots of operational testing because dumb luck is the only way of finding strange responses to illegal inputs.