Part of the "Property Based Testing" series (link)

Choosing properties in practice, part 2

Properties for roman numeral conversion

In the previous post, we tested some list functions using various properties. Let’s keep going and test some more code in the same way. For this post, our challenge will be to test roman numeral conversion logic.

Generating Roman numerals in two different ways

In my post “Commentary on ‘Roman Numerals Kata with Commentary’" I came up with two completely different algorithms for generating Roman Numerals.

The first algorithm was based on understanding that Roman numerals were based on tallying

In other words, replace five strokes with a “V”, replace two Vs with an X and so on, leading to this simple implementation:

module TallyImpl =
  let arabicToRoman arabic =
    (String.replicate arabic "I")
      // optional substitutions

If we test this interactively, we get what seems like correct behavior.

TallyImpl.arabicToRoman 1    //=> "I"
TallyImpl.arabicToRoman 9    //=> "IX"
TallyImpl.arabicToRoman 24   //=> "XXIV"
TallyImpl.arabicToRoman 999  //=> "CMXCIX"
TallyImpl.arabicToRoman 1493 //=> "MCDXCIII"

Bi-quinary implementation

Another way to think about Roman numerals is to imagine an abacus. Each wire has four “unit” beads and one “five” bead.

This leads to the so-called “bi-quinary” approach:

module BiQuinaryImpl =
  let biQuinaryDigits place (unit,five,ten) arabic =
    let digit =  arabic % (10*place) / place
    match digit with
    | 0 -> ""
    | 1 -> unit
    | 2 -> unit + unit
    | 3 -> unit + unit + unit
    | 4 -> unit + five // changed to be one less than five
    | 5 -> five
    | 6 -> five + unit
    | 7 -> five + unit + unit
    | 8 -> five + unit + unit + unit
    | 9 -> unit + ten  // changed to be one less than ten
    | _ -> failwith "Expected 0-9 only"

  let arabicToRoman arabic =
    let units = biQuinaryDigits 1 ("I","V","X") arabic
    let tens = biQuinaryDigits 10 ("X","L","C") arabic
    let hundreds = biQuinaryDigits 100 ("C","D","M") arabic
    let thousands = biQuinaryDigits 1000 ("M","?","?") arabic
    thousands + hundreds + tens + units

Again, if we test interactively, the results look good.

BiQuinaryImpl.arabicToRoman 1    //=> "I"
BiQuinaryImpl.arabicToRoman 9    //=> "IX"
BiQuinaryImpl.arabicToRoman 24   //=> "XXIV"
BiQuinaryImpl.arabicToRoman 999  //=> "CMXCIX"
BiQuinaryImpl.arabicToRoman 1493 //=> "MCDXCIII"

But how can we be sure these implementations are correct for all numbers, not just the ones we tested?

The test oracle

One way to gain confidence is to use the test oracle approach – compare them to each other. It’s a great way to cross-check two different implementations when you’re not sure that either implementation is right!

let biquinary_eq_tally number =
  let tallyResult = TallyImpl.arabicToRoman number
  let biquinaryResult = BiQuinaryImpl.arabicToRoman number
  tallyResult = biquinaryResult

But if we try running this code, we get a ArgumentException: The input must be non-negative due to the String.replicate call.

Check.Quick biquinary_eq_tally
// ArgumentException: The input must be non-negative.

So we need to only include inputs that are positive. We also need to exclude numbers that are greater than 4000, say, since the algorithms break down there too.

How can we implement this filter?

We saw in an earlier post that we could use preconditions. But for this example, we’ll try something different and change the generator.

First we’ll define a integer generator (an “arbitrary”) called arabicNumber which is filtered as we want (if you recall, an “arbitrary” is a combination of a generator algorithm and a shrinker algorithm, as described earlier). We’ll only include numbers from 1 to 4000.

let arabicNumber =
  |> Arb.filter (fun i -> i > 0 && i <= 4000)

Next, we create a new property which is constrained to only use the arabicNumber generator by using the Prop.forAll helper.

let biquinary_eq_tally_withinRange =
  Prop.forAll arabicNumber biquinary_eq_tally

Now finally, we can do the cross-check test again:

Check.Quick biquinary_eq_tally_withinRange
// Ok, passed 100 tests.

And we’re good! Both algorithms work correctly, it seems.

Checking the entire domain

How many roman numbers do we have in total? 4000 we said. So why not test them all?

Let’s run all 4000 numbers through our property, and filter out the ones that succeed, leaving only the ones that fail, like this:

[1..4000] |> List.choose (fun i ->
  if biquinary_eq_tally i then None else Some i
// output => [4000]

We would expect there to be no numbers that failed, and the output list to be empty, but actually there is one number in the list: 4000!

If we check the two conversions for 4000 we can see how they differ. The biquinary implementation didn’t know how to handle it. The tally implementation is less brittle and would work for higher numbers without breaking.

TallyImpl.arabicToRoman 4000     //=> "MMMM"
BiQuinaryImpl.arabicToRoman 4000 //=> "M?"

Is this something we care about? Maybe not. We might want to restrict the inputs to the implementations to be less than 4000 though. For example, we could alter the tally implementation to return Some or None like this:

let arabicToRoman arabic =
  if (arabic <= 0 || arabic >= 4000) then
    (String.replicate arabic "I")
      // etc
    |> Some

Tip: If your domain is small enough, why not check all the values in it?

For an example from a different domain, see the post “There are Only Four Billion Floats – So Test Them All!". It uses a “test oracle” approach to check an implementation against all four billion floats.

Don’t forget to check the boundaries!

This little hiccup is a reminder then, that property-based checking is not a golden hammer. It generates random data, but it is not necessarily the best way of probing a domain at the boundaries. If you do have well known boundaries, it’s best to create some specific tests for them, either using a custom generator for your PBT tool, or by simply doing some explicit example-based tests for the edge cases, like this:

for i in [3999..4001] do
  if not (biquinary_eq_tally i) then
    failwithf "test failed for %i" i

There and back again

Another approach to testing these roman numeral implementations would be to create an inverse function, such that applying the original and then applying the inverse gets you back to the original state.

If we think of the roman numeral conversion as an “encoding”, then we will need to write a corresponding “decoder”. Here’s a very simple one using the same “tally” based approach that we used originally:

let romanToArabic (str:string) =

and here’s some ad-hoc tests

TallyDecode.romanToArabic "I"       //=> 1
TallyDecode.romanToArabic "IX"      //=> 9
TallyDecode.romanToArabic "XXIV"    //=> 24
TallyDecode.romanToArabic "CMXCIX"  //=> 999
TallyDecode.romanToArabic "MCDXCIII"//=> 1493

With the inverse function now available, we can write a property based test. Note that we’re using the same arabicNumber generator to limit the inputs:

/// encoding then decoding should return
/// the original number
let encodeThenDecode_eq_original =

  // define an inner property
  let innerProp arabic1 =
    let arabic2 =
      |> TallyImpl.arabicToRoman // encode
      |> TallyDecode.romanToArabic // decode
    // should be same
    arabic1 = arabic2

  Prop.forAll arabicNumber innerProp

And if we run the test, it passes.

Check.Quick encodeThenDecode_eq_original
// Ok, passed 100 tests.

We can also check the biquinary encoding against the tally-based decoding

/// encoding then decoding should return
/// the original number
let encodeThenDecode_eq_original2 =

  // define an inner property
  let innerProp arabic1 =
    let arabic2 =
      |> BiQuinaryImpl.arabicToRoman // encode
      |> TallyDecode.romanToArabic // decode
    // should be same
    arabic1 = arabic2

  Prop.forAll arabicNumber innerProp

And if we run the test, it passes again.

Check.Quick encodeThenDecode_eq_original2
// Ok, passed 100 tests.

Solving a smaller problem

Yet another way of checking that the implementation is correct is to break it down into smaller components and check that the smaller components are correct.

For example, if the we break the Arabic number into 1000s, 100s, 10s, and units, and then encode them separately, the concatenation of these components should be the same as the original Arabic number encoded directly.

let recursive_prop =

  // define an inner property
  let innerProp arabic =
    let thousands =
      (arabic / 1000 % 10) * 1000
      |> BiQuinaryImpl.arabicToRoman
    let hundreds =
      (arabic / 100 % 10) * 100
      |> BiQuinaryImpl.arabicToRoman
    let tens =
      (arabic / 10 % 10) * 10
      |> BiQuinaryImpl.arabicToRoman
    let units =
      arabic % 10
      |> BiQuinaryImpl.arabicToRoman

    let direct =
      |> BiQuinaryImpl.arabicToRoman

    // should be same
    direct = thousands+hundreds+tens+units

  Prop.forAll arabicNumber innerProp

And, it works great!

Check.Quick recursive_prop
// Ok, passed 100 tests.

Some things never change

As we mentioned last time, invariants are often a great way to beat the EDFH. On their own, they can not prove that an implementation is correct, but in conjunction with other properties they are a very powerful tool.

So, what invariants are there for a roman numeral encoding?

Well, there are no obvious things which are preserved between the input and the output of the encoding, such as the length of a string, or the contents of a collection.

However, there are some properties which are preserved when a roman numeral is created from another roman numeral using arithmetic. For example, there are never more than three “I"s, or one “V”, etc. We can test these invariants easily.

First, we need a function to count the number of occurrences of a pattern:

let matchesFor pattern input =

"MMMCXCVIII" |> matchesFor "I"   //=> 3
"MMMCXCVIII" |> matchesFor "XC"  //=> 1
"MMMCXCVIII" |> matchesFor "C"   //=> 2
"MMMCXCVIII" |> matchesFor "M"   //=> 3

and then we can define our property using those invariants:

let invariant_prop =

  let maxMatchesFor pattern n input =
    (matchesFor pattern input) <= n

  // define an inner property
  let innerProp arabic =
    let roman = arabic |> TallyImpl.arabicToRoman
    (roman |> maxMatchesFor "I" 3)
    && (roman |> maxMatchesFor "V" 1)
    && (roman |> maxMatchesFor "X" 4)
    && (roman |> maxMatchesFor "L" 1)
    && (roman |> maxMatchesFor "C" 4)
    // etc

  Prop.forAll arabicNumber innerProp

And if we test it, it works as expected.

Check.Quick invariant_prop
// Ok, passed 100 tests.

Different paths, same destination

How far can we go with these patterns? We haven’t done the commutive diagram approach yet – can we apply that to the roman numeral encoding as well?

Yes, we can. It’s not really the most appropriate way to test this, but it’s a fun game to see what you can come up with!

How about this for the two paths:

  • Path 1
    • given a number less than 400
    • encode it first
    • then replace “C” with “M”, “X” with “C”, and so on, to create a new roman numeral.
  • Path 2
    • given the same number
    • multiply it by 10 first
    • then encode it

The two paths should give the same result.

/// Encoding a number less than 400 and then replacing
/// all the characters with the corresponding 10x higher one
/// should be the same as encoding the 10x number directly.
let commutative_prop1 =

  // define an inner property
  let innerProp arabic =
    // take the part < 1000
    let arabic = arabic % 1000
    // encode it
    let result1 =
      (TallyImpl.arabicToRoman arabic)
    // encode the 10x number
    let result2 =
      TallyImpl.arabicToRoman (arabic * 10)

    // should be same
    result1 = result2

  Prop.forAll arabicNumber innerProp

And, amazingly, it works!

Check.Quick commutative_prop1
// Ok, passed 100 tests.

What about the converse:

  • Path 1
    • given a number less than 4000
    • encode it first
    • then replace “C” with “X”, “M” with “C”, and so on, to create a new roman numeral.
  • Path 2
    • given the same number
    • divide it by 10 first
    • then encode it
/// Encoding a number and then replacing all the characters with
/// the corresponding 10x lower one should be the same as
/// encoding the 10x lower number directly.
let commutative_prop2 =

  // define an inner property
  let innerProp arabic =
    // encode it
    let result1 =
      (TallyImpl.arabicToRoman arabic)
    // encode the 10x lower number
    let result2 =
      TallyImpl.arabicToRoman (arabic / 10)

    // should be same
    result1 = result2

  Prop.forAll arabicNumber innerProp

This time, it fails, with “9” as a counter-example. Why is this?

Check.Quick commutative_prop2
// Falsifiable, after 9 tests
// 9

That can be an exercise for the reader!

Summary so far

I hope you get the idea now. By trying out the various approaches (oracles, inverses, invariants, commutative diagrams, etc.) we can almost always come up with useful properties for our design.

Interlude: A game based on finding properties

With that, we have come to the end of the various property categories. We’ll go over them one more time in a minute – but first, an interlude.

If you sometimes feel that trying to find properties is a mental challenge, you’re not alone. Would it help to pretend that it is a game?

As it happens, there is a game based on finding properties.

It’s called Zendo and it involves placing sets of objects (such as plastic pyramids) on a table, such that each layout conforms to a pattern – a rule – or as we would now say, a property!.

The other players then have to guess what the rule (property) is, based on what they can see.

Here’s a picture of a Zendo game in progress:


The white stones mean the property has been satisfied, while black stones mean failure. Can you guess the rule here? I’m going to guess that it’s something like “a set must have a yellow pyramid that’s not touching the ground”.

Alright, I suppose Zendo wasn’t really inspired by property-based testing, but it is a fun game, and it has even been known to make an appearance at programming conferences.

If you want to learn more about Zendo, the rules are here.

To be continued

In the next post we will keep going and use the same techniques to test a classic TDD example.

Source code used in this post is available here.


blog comments powered by Disqus