2019-02-11
  • In part 5 of this discussion of "generative testing", we began to see areas where randomly generated data didn't quite fit into the unit testing way of doing things.
    • Tests need to be extensible, so it often isn't sufficient to merely provide one set of generators for use in all tests. We will look at ways of selecting arbitrary generators to use for particular tests.
    • It is easy to generate a batch of random tests, but when you try to re-run a specific test, that example may not be generated again, and the testing framework can't find it because different tests are generated each time (both a help and a hindrance). How do generative testing frameworks help?

The DNA of property-based testing

As "DNA" is composed of 4 nucleotides, adenine (A), thymine (T), guanine (G) and cytosine (C), property-based testing frameworks do 4 things for you.

The testing libraries have functions that register the property-based tests with the testing framework, so different randomly generated tests may be run each time.

  • A - Add test to framework
    • Register property tests for execution with an identifiable test name.
  • T - Test properties you define
    • These functions often take the input and return true when the test passes and false otherwise.
  • G - Generate reproducible data
    • Specify which generators will provide the random data for the property tests.
    • Provide a way to reproduce a test when a failing test is discovered, to see if an attempted code fix had the desired effect. (This is usually by way of a random seed that can reproduce the same sequence of random values consistently over time and on different machines.)
  • C - Collapse to the minimal failing case using shrinking.
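
To make the four parts concrete, here is a minimal, library-free sketch in Python (chosen only as neutral ground between the Clojure and C# examples in this post). Everything in it is an assumption made for illustration: the PROPERTIES registry, add_property, run_property, the generator, and the deliberately buggy is_sheep_bleat are hypothetical names, and the greedy shrinker is far simpler than what real libraries do.

```python
import random
import re

PROPERTIES = {}  # hypothetical registry: test name -> (generator, property)

def add_property(name, generator, prop):
    """A: add a named property test to the (hypothetical) framework."""
    PROPERTIES[name] = (generator, prop)

def gen_sheepish(rng):
    """G: generate a string that is often close to a sheep bleat."""
    return "b" * rng.randint(0, 2) + "a" * rng.randint(0, 6)

def is_sheep_bleat(text):
    """Deliberately buggy code under test: it accepts "b" and "ba"."""
    return text.startswith("b") and set(text[1:]) <= {"a"}

def matches_oracle(text):
    """T: the property returns True when the test passes."""
    return (re.fullmatch(r"baa+", text) is not None) == is_sheep_bleat(text)

def shrink_candidates(text):
    """C: candidate simplifications, here every single-character deletion."""
    return [text[:i] + text[i + 1:] for i in range(len(text))]

def run_property(name, seed, num_tests=100):
    generator, prop = PROPERTIES[name]
    rng = random.Random(seed)  # G: same seed, same sequence of cases
    for n in range(1, num_tests + 1):
        case = generator(rng)
        if prop(case):
            continue
        smallest, progressed = case, True
        while progressed:  # C: keep moving to a smaller case that still fails
            progressed = False
            for candidate in shrink_candidates(smallest):
                if not prop(candidate):
                    smallest, progressed = candidate, True
                    break
        return {"result": False, "seed": seed, "num_tests": n,
                "fail": case, "smallest": smallest}
    return {"result": True, "seed": seed, "num_tests": num_tests}

add_property("sheep_bleat_matches_oracle", gen_sheepish, matches_oracle)
print(run_property("sheep_bleat_matches_oracle", seed=1549979449257))
```

Calling run_property twice with the same seed returns the same report, which is the whole point of the G step. Real frameworks differ in how they thread the seed through and how they shrink, but the same four roles show up.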

The specific ways in which programmers may plug into the different testing frameworks really start to diverge here in this post, but by looking for these 4 parts, you'll see the commonalities between the different libraries. If you choose a different library for your platform, you can still look for these and use them in a similar manner.

Clojure.test & test.check

The same principles apply in Clojure's test.check as in many other QuickCheck-inspired libraries. Clojure's easy-to-visualize data results will help us see what is going on. Watch for the DNA of the property-based tests as we go over each part of this test definition.

(defspec sheep-bleat?_matches_oracle                         ; `A`dd named test                         
  (prop/for-all [text (s/gen ::f/sheepish-like-string 4)]    ; Specify `G`enerator(s)
    (= (some? (re-find #"^baa+$" text))                      ; Define property to `T`est
       (sheep-bleat? text))))

user=> (sheep-bleat?_matches_oracle)
{:result         false,
 :seed           1549979449257,
 :failing-size   12,
 :num-tests      13,
 :fail           ["baaaaaa"],                                ; `C`ollapsed from "baaaaaa" to
 :shrunk                                                     ; shrunken test: "baa"
                 {:total-nodes-visited 8, :depth 2, :result false, :smallest ["baa"]}}
  • *A*dd test – It registers a test with clojure.test with a descriptive name, so the tests are runnable with commands like lein test or (clojure.test/run-tests).
  • *G*enerate reproducible data – clojure.test.check.properties/for-all is like the clojure.core/for list comprehension, in that it evaluates its expression body for each value generated by the generator.
  • *T*est properties – The property is modeled as an expression that compares the "actual" value generated by (sheep-bleat? text) to the "expected" value produced by our oracle.
  • *C*ollapse – shrink to a smaller failing case to make it easier to troubleshoot.

Let's analyze the output, key by key.

  • :result false – The test failed
  • :seed 1549979449257 – The random seed that generated the cases; it can be used to reproduce the test.
  • :failing-size 12 – The size parameter (roughly, how large the generated values may be) in effect when the failing case was generated.
  • :num-tests 13 – The number of tests executed.
  • :fail ["baaaaaa"] – The initial value that caused the test to fail. This is where the shrinking starts.
  • :shrunk – Details of the shrinking process.
    • :total-nodes-visited 8 – The count of distinct variation nodes visited while shrinking.
    • :depth 2 – The number of shrinking operations performed before getting to the smallest. Depth is always less than or equal to total-nodes-visited.
    • :result false – false means the test failed.
    • :smallest ["baa"] – The smallest case that failed. This is where the shrinking stops.
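
The relationship between :depth and :total-nodes-visited is easier to see with a toy shrinker that keeps both counters. The sketch below is Python, not test.check, and it makes several assumptions: the bug (a pattern that expects an uppercase "B") is hypothetical, the shrink strategy is a plain "delete one character" search, and the counter names merely mirror the keys above. Because the strategy differs from test.check's, the numbers it prints differ from the output above.

```python
import re

def buggy_is_sheep_bleat(text):
    # Hypothetical bug: the pattern expects an uppercase "B", so every
    # lowercase bleat is rejected.
    return re.fullmatch(r"Baa+", text) is not None

def matches_oracle(text):
    # The property: the implementation must agree with the regex oracle.
    return (re.fullmatch(r"baa+", text) is not None) == buggy_is_sheep_bleat(text)

def shrink_candidates(text):
    # Distinct strings obtained by deleting one character, in order.
    return list(dict.fromkeys(text[:i] + text[i + 1:] for i in range(len(text))))

def shrink(failing):
    smallest, depth, visited = failing, 0, 0
    progressed = True
    while progressed:
        progressed = False
        for candidate in shrink_candidates(smallest):
            visited += 1  # every candidate we test counts as a visited node
            if not matches_oracle(candidate):
                # One successful shrink step: continue from the smaller case.
                smallest, depth, progressed = candidate, depth + 1, True
                break
    return {"total_nodes_visited": visited, "depth": depth, "smallest": smallest}

print(shrink("baaaaaa"))
# {'total_nodes_visited': 10, 'depth': 4, 'smallest': 'baa'}
```

Depth only grows when a candidate still fails, while every candidate tried adds to the visited count, which is why depth can never exceed it.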

Reproducing a test with :seed

After fixing the bug, we may want to re-run with the same random number generator to reproduce the formerly failing scenario. We can pass the seed, along with the number of tests to execute, and the framework will again try to find an example that disproves the assertion.

user=> (sheep-bleat?_matches_oracle 100 :seed 1549979449257)
{:result false, :seed 1549979449257, :failing-size 12, :num-tests 13, :fail ["baaaaaa"], :shrunk {:total-nodes-visited 8, :depth 2, :result false, :smallest ["baa"]}}
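
Notice that asking for 100 tests with the old seed still stops at test 13 with the same failure. A rough way to picture why, assuming a simple model in which every case is drawn from one seeded generator (real libraries thread randomness and size differently), is this Python sketch; generate_cases is a hypothetical helper, not part of any library.

```python
import random

def generate_cases(seed, num_tests):
    rng = random.Random(seed)  # all randomness flows from this one seed
    return ["b" * rng.randint(0, 2) + "a" * rng.randint(0, 6)
            for _ in range(num_tests)]

seed = 1549979449257

# Same seed, same cases, on every run.
assert generate_cases(seed, 13) == generate_cases(seed, 13)

# Asking for more tests does not change the earlier ones, so a failure
# found at test 13 is found again at test 13.
assert generate_cases(seed, 100)[:13] == generate_cases(seed, 13)
```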

FsCheck & NUnit

For this example, I'm using NUnit and FsCheck.NUnit (3.0.0-alpha4).

Registering properties to test and generators to customize tests

In this part, we'll use FsCheck.NUnit.PropertyAttribute to indicate that a method should be treated as a property-based test. In one example, the method returns true/false; in the other, it uses an assertion.

        [Property(Arbitrary = new[] { typeof(A_class_that_defines_a_generator) })]
        public bool AnExpression(SheepishTestCase nearlySheepish) => false == "a boolean expression";

        [Property(Arbitrary = new[] { typeof(A_class_that_defines_a_generator) })]
        public void AnNUnitAssertion(SheepishTestCase nearlySheepish) =>
            Assert.AreEqual(true, false, "helpful description");

        // * FsCheck uses reflection to find all the public Arbitrary functions that this class exposes.
        public static Arbitrary<SheepishTestCase> NearlySheepish() =>
            someGenerator.ToArbitrary();

Here it is in context. Watch for the DNA of the property-based tests as we go over each part of this test definition.

    using PropertyAttribute = FsCheck.NUnit.PropertyAttribute;

    [TestFixture]
    public class G_Property_Tests_With_Oracle
    {
        [Property( //`A`dd named test
           Arbitrary = new[] { typeof(G_Property_Tests_With_Oracle) })] // * Specify `G`enerator(s) and shrinkers
        public bool SheepBleatExpression(SheepishTestCase nearlySheepish) => // Define property to `T`est
            Regex.IsMatch(nearlySheepish.Text, @"^baa+$")
            == Sheepish.IsSheepBleat(nearlySheepish.Text);


        [Property(Arbitrary = new[] { typeof(G_Property_Tests_With_Oracle) })] // * 
        public void SheepBleatAssertion(SheepishTestCase nearlySheepish) =>
            Assert.AreEqual(
                Regex.IsMatch(nearlySheepish.Text, @"^baa+$"),
                Sheepish.IsSheepBleat(nearlySheepish.Text));

        // * FsCheck uses reflection to find all the public Arbitrary functions that this class exposes.
        public static Arbitrary<SheepishTestCase> NearlySheepish() =>
            Gen.zip(
                Gen.zip3(UsuallyEmptyString,
                         StringOfB,
                         UsuallyEmptyString),
                Gen.zip3(StringOfA,
                         UsuallyEmptyString,
                         StringOfA))
            .Select(t => new SheepishTestCase
            {
                Text = t.Item1.Item1 + t.Item1.Item2 + t.Item1.Item3
                     + t.Item2.Item1 + t.Item2.Item2 + t.Item2.Item3,
            })
            .ToArbitrary();

        static Gen<string> UsuallyEmptyString =>
            Gen.Frequency(
                Tuple.Create(9, Gen.Constant("")),
                Tuple.Create(1, Arb.Default.NonEmptyString().Generator.Select(s => s.Item)));

        static Gen<string> StringOfA =>
            Gen.Choose(-3, 3).Select(n => new string('a', Math.Max(n, 0)));
        static Gen<string> StringOfB =>
            Gen.Choose(0, 3).Select(n => new string('b', Math.Max(n, 0)));

        public class SheepishTestCase
        {
            public string Text;
            public override string ToString() => Text;
        }
    }
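
To get a feel for what the weights in these generators do, here is a rough Python rendering of the same idea. It is only a sketch under assumptions: it uses Python's random module instead of FsCheck, the function names are hypothetical, and the "noise" alphabet is a stand-in for whatever FsCheck's NonEmptyString generator would produce.

```python
import random

def usually_empty_string(rng):
    # Weights 9:1, as in the Gen.Frequency call above.
    if rng.choices(["empty", "noise"], weights=[9, 1])[0] == "empty":
        return ""
    # Stand-in for FsCheck's NonEmptyString generator (an assumption).
    return "".join(rng.choice("abxyz ") for _ in range(rng.randint(1, 3)))

def string_of_a(rng):
    # Like Gen.Choose(-3, 3) clamped at 0: four of the seven outcomes give "".
    return "a" * max(rng.randint(-3, 3), 0)

def string_of_b(rng):
    # Zero to three "b" characters, each count equally likely.
    return "b" * rng.randint(0, 3)

def nearly_sheepish(rng):
    # Same order of parts as the zipped tuples in NearlySheepish().
    parts = [usually_empty_string(rng), string_of_b(rng), usually_empty_string(rng),
             string_of_a(rng), usually_empty_string(rng), string_of_a(rng)]
    return "".join(parts)

rng = random.Random(2019)
print([nearly_sheepish(rng) for _ in range(10)])
```

The design choice worth noticing is the bias: most samples are plain runs of 'b' and 'a', so the tests spend their time near the boundary between valid and invalid bleats, with only occasional noise.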

We will test with an implementation that is mostly right. (Hint: It is wrong for short strings!)

    public class Sheepish
    {
        public static bool IsSheepBleat(string text) =>
            text[0] == 'b' && text.Substring(1).All('a'.Equals);
    }

NUnit test output

The first time I tested this faulty implementation, it found one of the mistakes (too few 'a' characters).


This run randomly generated ba and didn't need to shrink: Falsifiable, after 8 tests (0 shrinks).

Test Name:  SheepBleatExpression
Test FullName: Sheepish.CSharp.G_Property_Tests_With_Oracle.SheepBleatExpression
Test Source:   C:\path\to\no-new-legacy\src\Sheepish.net\Sheepish.CSharp\G_Property_Tests_With_Oracle.cs : line 15
Test Outcome:  Failed
Test Duration: 0:00:00.123

Result Message:   
Falsifiable, after 8 tests (0 shrinks) (8688893743095023968,1720744240272185509)
Last step was invoked with size of 9 and seed of (870643180446620632,17206338264105658397):
Original:
ba
with exception:
System.Exception: Expected true, got false.

This didn't need to shrink either, since it was already too short. Falsifiable, after 1 test (0 shrinks).

Test Name:  SheepBleatAssertion
Test FullName: Sheepish.CSharp.G_Property_Tests_With_Oracle.SheepBleatAssertion
Test Source:   C:\path\to\no-new-legacy\src\Sheepish.net\Sheepish.CSharp\G_Property_Tests_With_Oracle.cs : line 20
Test Outcome:  Failed
Test Duration: 0:00:01.887

Result Message:   
Falsifiable, after 1 test (0 shrinks)                 (3398291736501447271,14834039787887665769)
Last step was invoked with size of 2 and seed of (12998797121066964081,9859987255001039891):
Original:
b
with exception:
NUnit.Framework.AssertionException:   Expected: False
  But was:  True

   at NUnit.Framework.Assert.ReportFailure(String message) in C:\src\nunit\nunit\src\NUnitFramework\framework\Assert.cs:line 394
   ... // skipping some stack frames for brevity
   at Sheepish.CSharp.G_Property_Tests_With_Oracle.SheepBleatAssertion(SheepishTestCase nearlySheepish) in C:\path\to\no-new-legacy\src\Sheepish.net\Sheepish.CSharp\G_Property_Tests_With_Oracle.cs:line 21
--- End of stack trace from previous location where exception was thrown ---
   ... // skipping some stack frames for brevity

I ran it a couple more times without changing anything, and it found the other bug (empty strings blow up).

Test Name:  SheepBleatExpression
Test FullName:  Sheepish.CSharp.G_Property_Tests_With_Oracle.SheepBleatExpression
Test Source:    C:\path\to\no-new-legacy\src\Sheepish.net\Sheepish.CSharp\G_Property_Tests_With_Oracle.cs : line 15
Test Outcome:   Failed
Test Duration:  0:00:00.031

Result Message: 
Falsifiable, after 17 tests (0 shrinks) (3733660618497006796,9742222590813313245)
Last step was invoked with size of 18 and seed of (15948249697227250317,12068873245047634363):
Original:

with exception:
System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.String.get_Chars(Int32 index)
   at Sheepish.CSharp.Sheepish.IsSheepBleat(String text) in C:\path\to\no-new-legacy\src\Sheepish.net\Sheepish.CSharp\Sheepish.cs:line 9
   at Sheepish.CSharp.G_Property_Tests_With_Oracle.SheepBleatExpression(SheepishTestCase nearlySheepish) in C:\path\to\no-new-legacy\src\Sheepish.net\Sheepish.CSharp\G_Property_Tests_With_Oracle.cs:line 16
   ... // skipping some stack frames for brevity

Reproducing a test with Replay

Say we want to reproduce that exact test case in the future. We don't have to turn it into data and new up the data bit by bit; we can simply take the random seed tuple from the failed test above and use it again using the Replay property of the attribute.

        // Reproduce a test case that used to fail.
        [Property(Arbitrary = new[] { typeof(G_Property_Tests_With_Oracle) },
                  Replay = "(3733660618497006796,9742222590813313245)")]
        public bool SheepBleatExpressionThatFailedBefore(SheepishTestCase nearlySheepish) =>
            SheepBleatExpression(nearlySheepish);

It produces the same message as before (but I am not going to make you read the whole thing twice. You're welcome ;-)

Message: Falsifiable, after 17 tests (0 shrinks) (3733660618497006796,9742222590813313245)
Last step was invoked with size of 18 and seed of (15948249697227250317,12068873245047634363):

After fixing the implementation with this:

        public static bool IsSheepBleat(string text) =>
            text.Length >= 3  // This should fix it!
            && text[0] == 'b' && text.Substring(1).All('a'.Equals);

It passes with:

Ok, passed 100 tests.
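
Because the interesting inputs here are so short, the fix can also be cross-checked exhaustively. The following is a Python port written only for illustration (the original is C#, and the function names and the two-letter alphabet are assumptions); it compares the ported fix against the regex oracle for every string over "a" and "b" up to length 5.

```python
import re
from itertools import product

def is_sheep_bleat_buggy(text):
    # Port of the original: raises IndexError on "", accepts "b" and "ba".
    return text[0] == "b" and all(c == "a" for c in text[1:])

def is_sheep_bleat_fixed(text):
    # Port of the fix: the length check short-circuits before indexing.
    return len(text) >= 3 and text[0] == "b" and all(c == "a" for c in text[1:])

def oracle(text):
    return re.fullmatch(r"baa+", text) is not None

def short_strings(alphabet="ab", max_len=5):
    # Every string over the alphabet, from length 0 up to max_len.
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

mismatches = [s for s in short_strings() if is_sheep_bleat_fixed(s) != oracle(s)]
print(mismatches)  # []
```

An exhaustive check like this only covers a tiny alphabet, so it complements the random tests rather than replacing them.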


Finally, we can refactor to a more optimal implementation without fear of breaking anything, but that's an exercise for the reader.

Conclusion

This concludes the sheepish examples, but there are still more generative testing concepts to discuss. In a future post, we'll come back and discuss more things we can do to make sure the shrinkers can reduce tests to a minimal case.

Source code

If you want to follow along, the source code is here: