This post is part of the F# Advent Calendar in English 2014 project. Check out all the other great posts there! And special thanks to Sergey Tihon for organizing this.
UPDATE: I did a talk on property-based testing based on these posts. Slides and video here.
Also, there is now a post on how to choose properties for property-based testing
Let’s start with a discussion that I might have had once (topic changed to protect the guilty):
Me to co-worker: “We need a function that will add two numbers together, would you mind implementing it?”
(a short time later)
Co-worker: “I just finished implementing the ‘add’ function”
Me: “Great, have you written unit tests for it?”
Co-worker: “You want tests as well?” (rolls eyes) “Ok.”
(a short time later)
Co-worker: “I just wrote a test. Look! ‘Given 1 + 2, I expect output is 3’. "
Co-worker: “So can we call it done now?”
Me: “Well that’s only one test. How do you know that it doesn’t fail for other inputs?”
Co-worker: “Ok, let me do another one.”
(a short time later)
Co-worker: “I just wrote a another awesome test. ‘Given 2 + 2, I expect output is 4’”
Me: “Yes, but you’re still only testing for special cases. How do you know that it doesn’t fail for other inputs you haven’t thought of?”
Co-worker: “You want even more tests?” (mutters “slave driver” under breath and walks away)
But seriously, my imaginary co-worker’s complaint has some validity: How many tests are enough?
So now imagine that rather than being a developer, you are a test engineer who is responsible for testing that the “add” function is implemented correctly.
Unfortunately for you, the implementation is being written by a burned-out, always lazy and often malicious programmer who I will call The Enterprise Developer From Hell, or “EDFH”. (The EDFH has a cousin who you might have heard of).
You are practicing test-driven-development, enterprise-style, which means that you write a test, and then the EDFH implements code that passes the test. Unfortunately for you, the EDFH loves to practice malicious compliance, as we will see.
So you start with a test like this (using vanilla NUnit style):
[<Test>] let ``When I add 1 + 2, I expect 3``() = let result = add 1 2 Assert.AreEqual(3,result)
The EDFH then implements the
add function like this:
let add x y = if x=1 && y=2 then 3 else 0
And your test passes!
When you complain to the EDFH, they say that they are doing TDD properly, and only writing the minimal code that will make the test pass.
Fair enough. So you write another test:
[<Test>] let ``When I add 2 + 2, I expect 4``() = let result = add 2 2 Assert.AreEqual(4,result)
The EDFH then changes the implementation of the
add function to this:
let add x y = if x=1 && y=2 then 3 else if x=2 && y=2 then 4 else 0 // all other cases
When you again complain to the EDFH, they point out that this approach is actually a best practice. Apparently it’s called “The Transformation Priority Premise”.
At this point, you start thinking that the EDFH is being malicious, and that this back-and-forth could go on forever!
So the question is, what kind of test could you write so that a malicious programmer could not create an incorrect implementation, even if they wanted to?
Well, you could start with a much larger list of known results, and mix them up a bit.
[<Test>] let ``Add two numbers, expect their sum``() = let testData = [ (1,2,3); (2,2,4); (3,5,8); (27,15,42) ] for (x,y,expected) in testData do let actual = add x y Assert.AreEqual(expected,actual)
But the EDFH is tireless, and will update the implementation to include all of these cases as well.
let add x y = match x,y with | 1,2 -> 3 | 2,2 -> 4 | 3,5 -> 8 | 27,15 -> 42 | _ -> 0 // all other cases
A much better approach is to generate random numbers and use those for inputs, so that a malicious programmer could not possibly know what to do in advance.
let rand = System.Random() let randInt() = rand.Next() [<Test>] let ``Add two random numbers, expect their sum``() = let x = randInt() let y = randInt() let expected = x + y let actual = add x y Assert.AreEqual(expected,actual)
If the test looks like this, then the EDFH will be forced to implement the
add function correctly!
One final improvement – the EDFH might just get lucky and have picked numbers that work by chance, so let’s repeat the random number test a number of times, say 100 times.
[<Test>] let ``Add two random numbers 100 times, expect their sum``() = for _ in [1..100] do let x = randInt() let y = randInt() let expected = x + y let actual = add x y Assert.AreEqual(expected,actual)
So now we’re done!
Or are we?
There’s just one problem. In order to test the
add function, you’re making use of the
+ function. In other words, you are using one implementation to test another.
In some cases that is acceptable (see the use of “test oracles” in a following post), but in general, it’s a bad idea to have your tests duplicate the code that you are testing! It’s a waste of time and effort, and now you have two implementations to build and keep up to date.
So if you can’t test by using
+, how can you test?
The answer is to create tests that focus on the properties of the function – the “requirements”. These properties should be things that are true for any correct implementation.
So let’s think about what the properties of an
add function are.
One way of getting started is to think about how
add differs from other similar functions.
So for example, what is the difference between
subtract? Well, for
subtract, the order of the parameters makes a difference, while for
add it doesn’t.
So there’s a good property to start with. It doesn’t depend on addition itself, but it does eliminate a whole class of incorrect implementations.
[<Test>] let addDoesNotDependOnParameterOrder() = for _ in [1..100] do let x = randInt() let y = randInt() let result1 = add x y let result2 = add y x // reversed params Assert.AreEqual(result1,result2)
That’s a good start, but it doesn’t stop the EDFH. The EDFH could still implement
x * y and this test would pass!
So now what about the difference between
multiply? What does addition really mean?
We could start by testing with something like this, which says that
x + x should be the same as
x * 2:
let result1 = add x x let result2 = x * 2 Assert.AreEqual(result1,result2)
But now we are assuming the existence of multiplication! Can we define a property that only depends on
One very useful approach is to see what happens when the function is repeated more than once. That is, what if you
add and then
add to the result of that?
That leads to the idea that two
add 1s is the same as one
add 2. Here’s the test:
[<Test>] let addOneTwiceIsSameAsAddTwo() = for _ in [1..100] do let x = randInt() let y = randInt() let result1 = x |> add 1 |> add 1 let result2 = x |> add 2 Assert.AreEqual(result1,result2)
add works perfectly with this test, while
However, note that the EDFH could still implement
add using subtraction and this test would pass!
Luckily, we have the “parameter order” test above as well. Combining both the “parameter order” and “add 1 twice” tests should narrow it down so that there is only one correct implementation, surely?
After submitting this test suite we find out the EDFH has written an implementation that passes both these tests. Let’s have a look:
let add x y = 0 // malicious implementation
Aarrghh! What happened? Where did our approach go wrong?
Well, we forgot to force the implementation to actually use the random numbers we were generating!
So we need to ensure that the implementation does indeed do something with the parameters that are passed into it. We’re going to have to check that the result is somehow connected to the input in a specific way.
Is there a trivial property of
add that we know the answer to without reimplementing our own version?
What happens when you add zero to a number? You always get the same number back.
[<Test>] let addZeroIsSameAsDoingNothing() = for _ in [1..100] do let x = randInt() let result1 = x |> add 0 let result2 = x Assert.AreEqual(result1,result2)
So now we have a set of properties that can be used to test any implementation of
add, and that force the EDFH to create a correct implementation:
There’s quite a bit of duplicated code in these three tests. Let’s do some refactoring.
First, we’ll write a function called
propertyCheck that does the work of generating 100 pairs of random ints.
propertyCheck will also need a parameter for the property to test. In this example, the
property parameter will be a function that takes two ints and returns a bool:
let propertyCheck property = // property has type: int -> int -> bool for _ in [1..100] do let x = randInt() let y = randInt() let result = property x y Assert.IsTrue(result)
With this in place, we can redefine one of the tests by pulling out the property into a separate function, like this:
let commutativeProperty x y = let result1 = add x y let result2 = add y x // reversed params result1 = result2 [<Test>] let addDoesNotDependOnParameterOrder() = propertyCheck commutativeProperty
We can also do the same thing for the other two properties.
After the refactoring, the complete code looks like this:
let rand = System.Random() let randInt() = rand.Next() let add x y = x + y // correct implementation let propertyCheck property = // property has type: int -> int -> bool for _ in [1..100] do let x = randInt() let y = randInt() let result = property x y Assert.IsTrue(result) let commutativeProperty x y = let result1 = add x y let result2 = add y x // reversed params result1 = result2 [<Test>] let addDoesNotDependOnParameterOrder() = propertyCheck commutativeProperty let add1TwiceIsAdd2Property x _ = let result1 = x |> add 1 |> add 1 let result2 = x |> add 2 result1 = result2 [<Test>] let addOneTwiceIsSameAsAddTwo() = propertyCheck add1TwiceIsAdd2Property let identityProperty x _ = let result1 = x |> add 0 result1 = x [<Test>] let addZeroIsSameAsDoingNothing() = propertyCheck identityProperty
We have defined a set of properties that any implementation of
add should satisfy:
- The parameter order doesn’t matter (“commutativity” property)
addtwice with 1 is the same as doing
addonce with 2
- Adding zero does nothing (“identity” property)
What’s nice about these properties is that they work with all inputs, not just special magic numbers. But more importantly, they show us the core essence of addition.
In fact, you can take this approach to the logical conclusion and actually define addition as anything that has these properties.
This is exactly what mathematicians do. If you look up addition on Wikipedia, you’ll see that it is defined entirely in terms of commutativity, associativity, identity, and so on.
You’ll note that in our experiment, we missed defining “associativity”, but instead created a weaker property (
x+1+1 = x+2).
We’ll see later that the EDFH can indeed write a malicious implementation that satisfies this property, and that associativity is better.
Alas, it’s hard to get properties perfect on the first attempt, but even so, by using the three properties we came up with, we have got a much higher confidence that the implementation is correct, and in fact, we have learned something too – we have understood the requirements in a deeper way.
A collection of properties like this can be considered a specification.
Historically, unit tests, as well as being functional tests, have been used as a sort of specification as well. But an approach to specification using properties instead of tests with “magic” data is an alternative which I think is often shorter and less ambiguous.
You might be thinking that only mathematical kinds of functions can be specified this way, but in future posts, we’ll see how this approach can be used to test web services and databases too.
Of course, not every business requirement can be expressed as properties like this, and we must not neglect the social component of software development. Specification by example and domain driven design can play a valuable role when working with non-technical customers.
You also might be thinking that designing all these properties is a lot of work – and you’d be right! It is the hardest part. In a follow-up post, I’ll present some tips for coming up with properties which might reduce the effort somewhat.
But even with the extra effort involved upfront (the technical term for this activity is called “thinking about the problem”, by the way) the overall time saved by having automated tests and unambiguous specifications will more than pay for the upfront cost later.
In fact, the arguments that are used to promote the benefits of unit testing can equally well be applied to property-based testing! So if a TDD fan tells you that they don’t have the time to come up with property-based tests, then they might not be looking at the big picture.
We have implemented our own property checking system, but there are quite a few problems with it:
- It only works with integer functions. It would be nice if we could use the same approach for functions that had string parameters, or in fact any type of parameter, including ones we defined ourselves.
- It only works with two parameter functions (and we had to ignore one of them for the
identityproperties). It would be nice if we could use the same approach for functions with any number of parameters.
- When there is a counter-example to the property, we don’t know what it is! Not very helpful when the tests fail!
- There’s no logging of the random numbers that we generated, and there’s no way to set the seed, which means that we can’t debug and reproduce errors easily.
- It’s not configurable. For example, we can’t easily change the number of loops from 100 to something else.
It would be nice if there was a framework that did all that for us!
Thankfully there is! The “QuickCheck” library was originally developed for Haskell by Koen Claessen and John Hughes, and has been ported to many other languages.
The version of QuickCheck used in F# (and C# too) is the excellent “FsCheck” library created by Kurt Schelfthout. Although based on the Haskell QuickCheck, it has some nice additional features, including integration with test frameworks such as NUnit and xUnit.
So let’s look at how FsCheck would do the same thing as our homemade property-testing system.
First, you need to install FsCheck and load the DLL
If you are using F# 5 or newer, you can reference the package directly in a script, like this:
#r "nuget:NUnit" open FsCheck
For older versions of F#, you should download the nuget package manually, and then reference the DLL in your script:
// 1) use "nuget install FsCheck" or similar to download // 2) include your nuget path here #I "/Users/%USER%/.nuget/packages/fscheck/2.14.4/lib/netstandard2.0" // 3) reference the DLL #r "FsCheck.dll" open FsCheck
Once FsCheck is loaded, you can use
Check.Quick and pass in any “property” function. For now, let’s just say that a “property” function is any function (with any parameters) that returns a boolean.
Here’s an example using a property function called
let add x y = x + y // correct implementation let commutativeProperty (x,y) = let result1 = add x y let result2 = add y x // reversed params result1 = result2 // check the property interactively Check.Quick commutativeProperty
And here’s a check of a property function called
let add1TwiceIsAdd2Property x = let result1 = x |> add 1 |> add 1 let result2 = x |> add 2 result1 = result2 // check the property interactively Check.Quick add1TwiceIsAdd2Property
And the identity property
let identityProperty x = let result1 = x |> add 0 result1 = x // check the property interactively Check.Quick identityProperty
If you check one of the properties interactively, say with
Check.Quick commutativeProperty, you’ll see the message:
Ok, passed 100 tests.
Let’s see what happens when we have a malicious implementation of
add. In the code below, the EDFH implements
add as multiplication!
That implementation will satisfy the commutative property, but what about the
let add x y = x * y // malicious implementation let add1TwiceIsAdd2Property x = let result1 = x |> add 1 |> add 1 let result2 = x |> add 2 result1 = result2 // check the property interactively Check.Quick add1TwiceIsAdd2Property
The result from FsCheck is:
Falsifiable, after 1 test (1 shrink) (StdGen (1657127138,295941511)): 1
That means that using
1 as the input to
adding1TwiceIsAdding2OnceProperty will result in
false, which you can easily see that it does.
By using random testing, we have made it harder for a malicious implementer. They will have to change tactics now!
The EDFH notes that we are still using some magic numbers in the
adding1TwiceIsAdding2OnceProperty – namely 1 and 2,
and decides to create an implementation that exploits this. They’ll use a correct implementation for low input values and an incorrect implementation for high input values:
let add x y = if (x < 10) || (y < 10) then x + y // correct for low values else x * y // incorrect for high values
Oh no! If we retest all our properties, they all pass now!
That’ll teach us to use magic numbers in our tests!
What’s the alternative? Well, let’s steal from the mathematicians and create an associative property test.
let associativeProperty x y z = let result1 = add x (add y z) // x + (y + z) let result2 = add (add x y) z // (x + y) + z result1 = result2 // check the property interactively Check.Quick associativeProperty
Aha! Now we get a falsification:
Falsifiable, after 38 tests (4 shrinks) (StdGen (127898154,295941554)): 8 2 10
That means that using
(8+2)+10 is not the same as
Note that not only has FsCheck found some inputs that break the property, but it has found a lowest example.
It knows that the inputs
8,2,9 pass but going one higher (
8,2,10) fails. That’s very nice!
In this post I’ve introduced you to the basics of property-based testing and how it differs from the familiar example-based testing.
I’ve also introduced the notion of the EDFH, an evil malicious programmer. You might think that such a malicious programmer is unrealistic and over-the-top.
But in many cases you act like an unintentionally malicious programmer. You happily create a implementation that works for some special cases, but doesn’t work more generally, not out of evil intent, but out of unawareness and blindness.
Like fish unaware of water, we are often unaware of the assumptions we make. Property-based testing can force us to become aware of them.
But how does a property-based testing library like FsCheck actually work in detail? That’s the topic of the next post.
Source code used in this post is available here.