Part of the "The Return of the EDFH" series (link)

The Return of the Enterprise Developer From Hell

More malicious compliance, more property-based testing

In a previous series of posts, I introduced you to the burned-out and lazy programmer known as the Enterprise Developer From Hell, or the EDFH for short. As we saw, the EDFH loves to practice malicious compliance.

Recently, the EDFH’s influence was apparent with this viral answer to an interview problem.

Write a function that converts the input to the output.

  • Input: “aaaabbbcca”
  • Output: [(‘a’,4), (‘b’,3), (‘c’,2), (‘a’,1)]

Of course the EDFH answer is simple:

let func inputStr =
  // hard code the answer
  [('a',4); ('b',3); ('c',2); ('a',1)]

Since this is the only specification we are given, this is a perfectly fine implementation!

It’s funny though, because the interviewer was obviously asking for something a bit more complex.

But this raises two very important questions: what exactly was the interviewer asking for? And how would they know if they got it?

With only one input/output pair given, there are lots of potential specs that could work. However, the consensus of twitter was that this was meant to be a run-length encoding (RLE) problem. 😀

So now we have two specific challenges:

  • What should the specification for RLE be? How can we define it in an unambiguous way?
  • How can we check that a particular RLE implementation meets that spec?

So what does a spec for RLE actually look like? Interestingly, when I searched the internet, I didn’t find much. Wikipedia has a RLE page with an example but no spec. Rosetta Stone has a RLE page with an informal spec.

Testing in the presence of the EDFH

Let’s put the spec on hold for a minute and turn our attention to testing. How can we check that an RLE implementation works?

One approach would be to do example-based testing:

  • expect that the output of rle("") is []
  • expect that the output of rle("a") is [(a,1)]
  • expect that the output of rle("aab") is [(a,2); (b,1)]

and so on.

But, if we recall our previous experience with the EDFH, they will surely find an implementation that passes all the tests, but is still wrong. For example, the EDFH’s implementation for the examples above could look like this:

let rle inputStr =
  match inputStr with
  | "" ->
    []
  | "a" ->
    [('a',1)]
  | "aab" ->
    [('a',2); ('b',1)]
  | "aaaabbbcca" ->
    [('a',4); ('b',3); ('c',2); ('a',1)]
  // everything else
  | _ -> []

And if we check this implementation, it looks pretty good!

rle "a"           //=> [('a',1);]
rle "aab"         //=> [('a',2); ('b',1)]
rle "aaaabbbcca"  //=> [('a',4); ('b',3); ('c',2); ('a',1)]

The best way to beat the EFDH is to use random inputs, and in particular, property-based testing.

A nice thing about property-based testing is that, by doing it, you can often discover the specification. In a previous post, I discussed how we might test an implementation of addition. Eventually, we discovered the properties of commutativity, associativity, and identity. These not only defined the tests we needed, they also pretty much define what “addition” actually is.

Let’s see if we can do the same for RLE.

Using EDFH implementations to help us think of properties

Remember that in property based testing, we’re not allowed to reimplement the logic, but instead we have to come up with general properties that work for all inputs.

But this is the hard part – thinking of properties. However, we can use the EDFH to guide us! For each implementation that the EDFH creates, we figure out why it’s wrong, and then create a property to capture that requirement.

For example, the EDFH might implement the RLE function as an empty list, no matter what the input was:

let rle_empty (inputStr:string) : (char*int) list =
  []

Why would that be wrong? Because the output must have some connection to the input. It should contain every character from the input, in fact.

Ok then, the EDFH will retaliate by returning each character with a count of one.

let rle_allChars inputStr =
  inputStr
  |> Seq.toList
  |> List.map (fun ch -> (ch,1))

If we run this, we get

rle_allChars ""      //=> []
rle_allChars "a"     //=> [('a',1)]
rle_allChars "abc"   //=> [('a',1); ('b',1); ('c',1)]
rle_allChars "aab"   //=> [('a',1); ('a',1); ('b',1)]

These outputs do indeed contain every character from the corresponding input.

Why is that wrong? Well, we want to collect “runs”, which means that we should not have two ‘a’s together. Each character in the output list must be different from the adjacent characters.

That’s an easy fix for the EFDH, just add distinct into the pipeline!

let rle_distinct inputStr =
  inputStr
  |> Seq.distinct // added
  |> Seq.toList
  |> List.map (fun ch -> (ch,1))

And now the output satisfies the “runs” property – the duplicates have disappeared.

rle_distinct "a"     //=> [('a',1)]
rle_distinct "aab"   //=> [('a',1); ('b',1))]
rle_distinct "aaabb" //=> [('a',1); ('b',1))]

Why is this wrong? Well what about those counts? They’re all just 1. What should they be?

Without reimplementing the algorithm, we don’t know what the individual counts should be, but we do know what they should add up to: the number of characters in the string. If there are 5 characters in the source string, the sum of the run lengths should be 5 as well.

Unfortunately, the EDFH has an answer for this as well. Their implementation can just use groupBy or countBy to get the counts.

let rle_groupedCount inputStr =
  inputStr
  |> Seq.countBy id
  |> Seq.toList

The output looks good at first glance

rle_groupedCount "aab"         //=> [('a',2); ('b',1))]
rle_groupedCount "aaabb"       //=> [('a',3); ('b',3))]
rle_groupedCount "aaaabbbcca"  //=> [('a',5); ('b',3); ('c',2))]

But there’s a subtle problem. In the third example, there are two distinct runs of 'a' but the rle_groupedCount implementation merges them together.

What we wanted:

[('a',4); ('b',3); ('c',2); ('a',1)]

What we got:

[('a',5); ('b',3); ('c',2)]
//    ^ wrong number      ^ another entry needed here

The problem with the groupedCount approach is that it doesn’t take the order of the characters into account. What property could we come up with that would catch that?

The simplest way to check for ordering is just to reverse something! In this case we could have a property: “A reversed input should give a reversed output”. The rle_groupedCount implementation would fail this – just what we want.

So, already, with just a few minutes thinking (and some help from the EDFH) we have some properties that can be used to check an RLE implementation:

  • The output must contain all the characters from the input
  • No two adjacent characters in the output can be the same
  • The sum of the run lengths in the output must equal the total length of the input
  • If the input is reversed, the output must also be reversed
Is that enough to correctly check a RLE implementation? Can you think of any malicious EDFH implementations that would satisfy these properties and yet be wrong? We’ll revisit this in a later post.

Property checking in practice

Let’s put these concepts into practice. We’ll use FsCheck, the F# library, to test these properties against both bad and good implementations.

As of F# 5, it’s easy to load FsCheck into the interactive workspace. You can just reference it directly like this:

#r "nuget:FsCheck"

NOTE: For the code used in these examples, see the link at the bottom of this post

Now we can write our first property: “the result must contain all the characters from the input”

// An RLE implementation has this signature
type RleImpl = string -> (char*int) list

let propUsesAllCharacters (impl:RleImpl) inputStr =
  let output = impl inputStr
  let expected =
    inputStr
    |> Seq.distinct
    |> Seq.toList
  let actual =
    output
    |> Seq.map fst
    |> Seq.distinct
    |> Seq.toList
  expected = actual

Normally the only parameters for a property would be the ones under test, but in this case we will also pass in an implementation parameter so we can test with the EDFH implementations, as well as our (hopefully) correct ones

Checking the rle_empty implementation

Let’s try it out with the first EDFH implementation, the one that always returned an empty list:

let impl = rle_empty
let prop = propUsesAllCharacters impl
FsCheck.Check.Quick prop

The response from FsCheck is:

Falsifiable, after 1 test (1 shrink) (StdGen (777291017, 296855223)):
Original:
"#"
Shrunk:
"a"

In other words, simply using the minimal string “a” as input will break the property.

Checking the rle_allChars implementation

If we try with the rle_allChars implementation…

let impl = rle_allChars
let prop = propUsesAllCharacters impl
FsCheck.Check.Quick prop

…we immediately get a ArgumentNullException because we completely forgot to handle null inputs in the implementation! Thank you, property based testing!

Let’s fix up the implementation to handle nulls…

let rle_allChars inputStr =
  // add null check
  if System.String.IsNullOrEmpty inputStr then
    []
  else
    inputStr
    |> Seq.toList
    |> List.map (fun ch -> (ch,1))

… and then try again – oops – we get another null issue, this time in our property. Let’s fix that up too.

let propUsesAllCharacters (impl:RleImpl) inputStr =
  let output = impl inputStr
  let expected =
    if System.String.IsNullOrEmpty inputStr then
      []
    else
      inputStr
      |> Seq.distinct
      |> Seq.toList
  let actual =
    output
    |> Seq.map fst
    |> Seq.distinct
    |> Seq.toList
  expected = actual

Now if we try one more time, the property passes.

Ok, passed 100 tests.

So the incorrect rle_allChars implementation does pass, as we expected. Which is why we need the next property: “adjacent characters in the output cannot be the same”

The “adjacent characters are not the same” property

In order to define this property, we will first define a helper function removeDupAdjacentChars that strips duplicates.

/// Given a list of elements, remove elements that have the
/// same char as the preceding element.
/// Example:
///   removeDupAdjacentChars ['a';'a';'b';'b';'a'] => ['a'; 'b'; 'a']
let removeDupAdjacentChars charList =
  let folder stack element =
    match stack with
    | [] ->
      // First time? Create the stack
      [element]
    | top::_ ->
      // New element? add it to the stack
      if top <> element then
        element::stack
      // else leave stack alone
      else
        stack

  // Loop over the input, generating a list of non-dup items.
  // These are in reverse order. so reverse the result
  charList |> List.fold folder [] |> List.rev

With this in hand, our property will get the characters from the output and then remove duplicates. If the implementation is correct, removing duplicates should not have any effect.

/// Property: "Adjacent characters in the output cannot be the same"
let propAdjacentCharactersAreNotSame (impl:RleImpl) inputStr =
  let output = impl inputStr
  let actual =
    output
    |> Seq.map fst
    |> Seq.toList
  let expected =
    actual
    |> removeDupAdjacentChars // should have no effect
  expected = actual // should be the same

Now let’s check this new property against the EDFH’s rle_allChars implementation:

let impl = rle_allChars
let prop = propAdjacentCharactersAreNotSame impl
FsCheck.Check.Quick prop

And…

Ok, passed 100 tests.

That was unexpected. Maybe we were just unlucky? Let’s change the default configuration to be 10000 runs rather than 100.

let impl = rle_allChars
let prop = propAdjacentCharactersAreNotSame impl
let config = {FsCheck.Config.Default with MaxTest = 10000}
FsCheck.Check.One(config,prop)

And…

Ok, passed 10000 tests.

…it still passes. That’s not good.

Hmmm. Let’s add a quick printf to print the strings being generated by FsCheck, so we can see what is going on.

let propAdjacentCharactersAreNotSame (impl:RleImpl) inputStr =
  let output = impl inputStr
  printfn "%s" inputStr
  // etc

Here’s what the input strings being generated by FsCheck look like:

v$D
%q6,NDUwm9~ 8I?a-ruc(@6Gi_+pT;1SdZ|H
E`Vxc(1daN
t/vLH$".5m8RjMrlCUb1J1'
Y[Q?zh^#ELn:0u

We can see that the strings are very random, and almost never have a series of repeated characters. From the point of view of testing an RLE algorithm these inputs are completely useless!

The moral of this story is that, just as for regular TDD, make sure you start with failing tests. Only then can you be sure that your correct implementation is passing for the right reasons.

So what we need to do now is generate interesting inputs rather than random strings.

How can we do that? And then how can we monitor what the inputs are without resorting to crude print debugging?

That will be the topic of the next installment!

Source code used in this post is available here.

Comments

blog comments powered by Disqus