Styles of unit testing



The topic described in this article is part of my Unit Testing Pluralsight course.

In this post, I’ll describe different styles of unit testing and compare them in terms of their value proposition.

Styles of unit testing and their value proposition

There are 3 major styles of unit testing. The first one is functional, where you feed an input to the system under test (SUT) and check what output it produces:

Functional style

Obviously, this style only works for SUTs that don’t generate side effects. The yellow slash here depicts the point at which the examination is done.

The second one is state verification:

State verification

With this style, you verify the state the SUT ends up in after the operation completes.

And finally, the third one is collaboration verification:

Collaboration verification

Here, you focus on the collaborations between the SUT and its neighbors: you check that all collaborators were invoked in the correct order and with the correct parameters.

Now, let’s look at these styles from a value proposition perspective. In the last post, we defined what a valuable test is. It is a test that:

  • Has a high chance of catching a regression bug.
  • Has a low chance of producing a false positive.
  • Provides fast feedback.

Do the styles of unit testing I mentioned above have the same value proposition? They don’t. Although all three provide fast feedback and, as long as they aren’t used to cover trivial functionality, have a high chance of catching a regression error, they differ significantly on the second component.

The functional style has the best protection against false positives. Of all the parts of the SUT, its input and output tend to change the least during refactorings. You can completely alter the internal implementation of the SUT, and tests written in the functional style will not raise a false alarm as long as the signature of the method under test stays in place.

Note that there’s a difference between an input in the functional sense and a collaborator. While we pass both to the SUT the same way, an input value is immutable, whereas a collaborator either maintains its own internal state, which can change over time, or refers to an external dependency, for example a database. Here is an example of the former:

public double Calculate(double x, double y)
{
    return x * x + y * y + Math.Sqrt(Math.Abs(x + y));
}

And here’s an example of a collaborator:

public double Calculate(ICalculator calculator, double x, double y)
{
    calculator.Push(x);
    calculator.Push(y);
    return calculator.CalculateFormula();
}

In the first sample, both the inputs and the output are immutable. The instance of ICalculator, on the other hand, does contain mutable state.
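To make the contrast concrete, here is what a functional-style test of the first Calculate method could look like. This is a minimal sketch: I’m assuming xUnit and a hypothetical FormulaCalculator class that hosts the method.

[Fact]
public void Calculate_returns_sum_of_squares_plus_square_root()
{
    // FormulaCalculator is a hypothetical host for the pure Calculate method above
    var sut = new FormulaCalculator();

    double result = sut.Calculate(3, 4);

    // 3*3 + 4*4 + sqrt(|3 + 4|) = 25 + sqrt(7)
    Assert.Equal(25 + Math.Sqrt(7), result, 10);
}

The test depends only on the method’s signature and return value, so it survives any behavior-preserving change to the method’s internals.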

The second style of unit testing – state verification – is more prone to false positives but is still good enough as long as we verify the SUT’s state via its public API and don’t try to inspect its internals via reflection. This, of course, requires you as a developer to think carefully about which parts of the SUT you expose publicly. You shouldn’t reveal its implementation details, as that would introduce tight coupling between the tests and the SUT.

As long as the SUT’s encapsulation is not violated, state verification is a good approximation to the functional style of unit testing. It’s still unlikely to produce false positives because the SUT’s public API tends to stay in place in the face of refactoring or addition of new functionality.

The following is an example of such a SUT:

public class Customer
{
    public string Name { get; private set; }
    public CustomerStatus Status { get; private set; }

    public Customer(string name)
    {
        Name = name;
        Status = CustomerStatus.Pending;
    }
}

There’s a valuable piece of domain knowledge to test here. That is, new customers must reside in a pending status. The verification can be performed via the Customer’s public API – its Status property:

[Fact]
public void New_customer_is_in_pending_state()
{
    var customer = new Customer("John Doe");

    Assert.Equal(CustomerStatus.Pending, customer.Status);
}

What about the third style – collaboration verification? This is where the value of unit tests starts to degrade.

The collaborations the SUT goes through in order to achieve its goal are not part of its public API. Therefore, binding unit tests to the SUT’s collaborations introduces coupling between the tests and the SUT’s implementation details. Such coupling makes the tests produce a lot of false positives because the collaboration pattern tends to change often during refactorings. And that, in turn, diminishes the value of such tests and the overall return on investment in them.

The problem with mocks and with the mockist approach in general is that they aim at verifying collaborations and often do that in a way that ties unit tests to the implementation details of the SUT.
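To illustrate, here is what a mock-based test of the second Calculate method might look like. This sketch assumes the Moq library; the FormulaCalculator host class is, again, hypothetical. Note how the verifications restate the method’s implementation rather than its observable outcome:

[Fact]
public void Calculate_pushes_operands_and_calculates_formula()
{
    // Moq-based test double standing in for the ICalculator collaborator
    var mock = new Mock<ICalculator>();
    mock.Setup(c => c.CalculateFormula()).Returns(25d);
    var sut = new FormulaCalculator();

    double result = sut.Calculate(mock.Object, 3, 4);

    // These verifications mirror the implementation line by line: rearrange
    // the collaboration and the test fails, even if the result stays correct.
    mock.Verify(c => c.Push(3), Times.Once());
    mock.Verify(c => c.Push(4), Times.Once());
    mock.Verify(c => c.CalculateFormula(), Times.Once());
    Assert.Equal(25d, result);
}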

Another problem with the mockist approach is that it encourages destructive decoupling – the situation in which you decouple parts of your code base so much that they lose their intrinsic cohesion. That is what I believe DHH referred to when he wrote about test-induced design damage.

With mocks, your class diagram often starts looking like this:

Test-induced design damage

Lots of connections – often introduced without necessity – result in an architecture with a large number of cyclic dependencies, which makes it hard to grasp and understand the bigger picture.

The reason here is that classes are too fine-grained to consider them separate agents. Treating classes as collaborators makes the overall architecture too verbose and fragile.

Overall, unit tests with mocks have the worst value proposition because of their poor signal-to-noise ratio. They do help catch regressions, but they do so at the expense of producing lots of false positives. That defeats the whole purpose of unit testing: having a solid test suite you can trust and rely upon.

The use of mocks is just a sign of a problem, of course, not the problem itself. However, it’s a very strong sign, as it almost always signals an issue with your approach to unit testing. If you use mocks, you most likely couple your unit tests to the SUT’s implementation details.

There are a few legitimate use cases for mocks, though not in the context of unit testing. They can be useful in integration testing when you want to substitute a volatile dependency you don’t control. Mocking frameworks can also be quite beneficial if you use them to create stubs. More details on that in the next post.

By the way, there is another style of unit testing: property-based testing, which you can view as the functional style on steroids. It has essentially the same traits but does its job even better because it checks many inputs and outputs at once.
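As a sketch of the idea (assuming the FsCheck.Xunit package and the same hypothetical FormulaCalculator), a property-based test asserts a property that must hold for all generated inputs instead of a single hard-coded output:

[Property]
public bool Calculate_is_symmetric_in_its_arguments(double x, double y)
{
    var sut = new FormulaCalculator();

    // Both terms of the formula are symmetric in x and y, so swapping
    // the arguments must not change the result. Equals is used so that
    // NaN inputs compare as equal to themselves.
    return sut.Calculate(x, y).Equals(sut.Calculate(y, x));
}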

Summary

It seems it’s time to end this post, and I didn’t even get to the architectural changes I promised in the last article. I’ll cover them in the next post, along with the stuff I teased here (legitimate use cases for mocks).

Let’s summarize this one with the following:

  • There are 3 styles of unit testing: functional (output verification), state verification, and collaboration verification.
  • Functional style is the best one in terms of its value proposition as it has the lowest chance of producing false positives.
  • State verification is the second best choice. As long as you encapsulate your domain classes well and unit test them against their public API, you should be good.
  • Collaboration verification has the worst value proposition because such tests produce a lot of false positives.
  • Treating your classes as collaborators also encourages you to introduce lots of unnecessary dependencies which makes the overall architecture more complex and harder to grasp.

The first 2 styles are also known as the classicist approach. The third one is the mockist approach.


  • cmllamacho

    Very nice article. I couldn’t agree more with you, mocks tend to degrade the value of a test suite very fast. And the amount of false positives that they produce is astonishing.

    The thing is that most of the work in a system is done via collaboration. The more I read and think about it, the more value I see in clearly defining the architecture of the code in its infancy to avoid these problems. I guess that comes in your next post, can’t wait.

  • Guillaume L

    I don’t think mocking couples you to the SUT’s implementation details. It couples you to its collaborator’s interface, which is a good thing to consider at design time when you’re fleshing out the contours of a dependency that doesn’t exist yet or discovering how to use an external library.

    You don’t need to keep all your mock-based tests in place for regression testing, some of them can be thrown away once the design is dry. But at the borders of your application, the only way to verify that something happens (an email is sent, a message is emitted on a bus, etc.) is sometimes to replace the outside world with a faster, simpler version that records the conversations that took place.

    When you think about it, the extra work you have to do to fix a unit test that fails because the dependency’s interface changed is not that huge. You already have to change all production code that uses that dependency anyway. You just need to do exactly the same in extra places.

    I’m not recommending to use mocks everywhere, but they work well for me in those situations.

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      I agree with the use of mocks/test doubles on the boundaries of the system to integrate with the outside world. I think I didn’t emphasize it in the article: I wrote specifically about using mocks to unit test the domain model which is almost always a bad idea.

      The problem with the mockist approach is that it doesn’t differentiate these two areas: it encourages using mocks both for the insides of a domain model and for communications with the external world.

      Throwing away mocks once the design is established can indeed be helpful but that’s not something I see often. And it’s definitely not something the authors of the mockist school do, according to the GOOS book.

      • Guillaume L

        Agreed. “Mockist” isn’t a term I like anyway, since mocks are a tool, not a goal 😉 But I’m not sure it’s an adjective Freeman and Pryce fully endorse either.

        I don’t delete mock-based tests often, and they probably don’t do it in GOOS, because it isn’t all that painful to maintain in my experience.

        I totally agree about mocking inside the domain model. I never do it.

    • Matthias

      @Guillaume L I think what Vladimir is saying is that since the SUT orchestrates its collaborators and all interactions between them, with mock objects you prepare and replay this orchestration precisely in the unit test itself (with varying degrees of liberty you get out of the different mocking libraries and modes). This then means that an implementation detail, namely the specific ways in which the SUT’s collaborators interact, leaks into the test case. If you were to rearrange them without changing the observable behavior of the SUT, a mock-based test would fail. It’s furthermore an implementation detail since on the API surface of the SUT no interaction plan is visible or should be assumed.

      • http://enterprisecraftsmanship.com/ Vladimir Khorikov

        Great explanation! I think I will adopt the wording you just used, it’s incredibly precise and clear.

      • Guillaume L

        @Matthias, consider these few points:

        – A SUT that orchestrates calls to more than 2-3 collaborators is probably poorly designed. I would say that in 80% of my code, there is usually only one dependency, if any, used in a given method. Not exactly the orchestration hell you seem to imply.

        – As a general rule, I don’t assert on more than one mock in a given test. This is not new stuff (see “only verify one thing per test” best practice). If there is a chain of collaborators, intermediate ones are usually stubs not mocks.

        Example:

        Test “When a Reservation is confirmed, an email is sent to the user”

        SUT calls ReservationRepository (stub) and UserRepository (stub) in any order. SUT sets Reservation to confirmed. Then it calls NotificationService (mock).

        “If you were to rearrange them without changing the observable behavior of the SUT, a mock based test would fail.” – no it wouldn’t, because I don’t test the order in which dependencies are called, nor does the result of my test depend on it (see above).

        – I can’t think of a single occasion where I had to rearrange calls to dependencies without affecting the SUT’s external behavior. Do you have a concrete example? Even if we set aside the reservations above, it doesn’t look to me like a frequent enough reason to discard a whole testing technique.

        “It’s furthermore an implementation detail since on the API surface of the SUT no interaction plan is visible or should be assumed”. – don’t get me wrong. I think that we can leverage implementation details in order to test crucial outcomes of an object interaction. I’m not saying that testing implementation details is the ultimate goal.

        • http://enterprisecraftsmanship.com/ Vladimir Khorikov

          @Guillaume L I think the discussion above implied communication between domain classes. You’ve provided an example with interactions between separate applications which is a valid example of the use of mocks.

  • Michael G.

    In which of the three styles would you put domain event testing?

    Consider a system wherein you automatically charge a payment from a client’s credit card. Depending on the outcome of the charge, you would want to trigger several other events, such as sending an email to the client, or logging some financial information.

    public void Charge(Payment payment)
    {
        // Logic
        _eventPublisher.Publish(new PaymentSucceededEvent(payment.Id));
    }

    public void SendPaymentSucceededEmail(int paymentId)
    {
        // Logic
    }

    public void LogPaymentSucceeded(int paymentId)
    {
        // Logic
    }

    Unit tests verifying that SendPaymentSucceededEmail and LogPaymentSucceeded are invoked after a successful Charge would seem like a good idea in this case, but which style would they fit into?

  • Matthias

    Great article!

    I do have a few questions:

    While not arguing for or against mockist-style testing, one perceived benefit of mock-based testing is that you only ever test an object graph of depth 1, thus cutting out complexity that does not directly pertain to the SUT (this all assumes your interactions follow the Law of Demeter).

    I have been a mockist for the entirety of my career but am actively exploring “the other side of the fence”. Given the increased depth of an object graph when testing without mocks, how do you suggest testing / accounting for effects N layers down? More specifically:

    1 – Assuming dependency inversion is applied, when setting up the SUT I would have to pass the direct collaborators first, but those won’t be mocks, so I have to pass their collaborators, which aren’t mocks so I’d have to pass their collaborators, but… you get the point. I suppose this necessitates using a DI framework to inject your test cases?

    2 – Objects will collaborate. You can’t argue that point away. If I call a method on the SUT that requires collaboration, at which level would you assert the effect? Moreover, in order to test the effect, wouldn’t this likely imply e.g. exposing state getters on collaborators which might not have been added otherwise? (I find it a code smell to widen a public API solely for satisfying a test.)

    Thanks for the write up!

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      Thank you for the comment, great questions.

      I agree that one of the benefits of the mock-based approach is the ability to “cut” the object graph. At the same time, I think it’s also one of its weaknesses as it encourages you to create deep object graphs. Or, rather, it doesn’t stop you from doing that. Without mocks, you are forced to create fewer layers of indirection, “flatten” the class structure, so to speak. And it is a good thing as it generally results in a simpler architecture. An example of the difference between the two styles is in the GOOS book review I posted recently ( http://enterprisecraftsmanship.com/2016/07/05/growing-object-oriented-software-guided-by-tests-without-mocks/ ). The original sample project implementation contained about twice as many classes compared to the refactored version, and most of those classes were simply not needed to solve the task the book posed.

      With this approach, your class graph often starts resembling a wide and short tree: http://i.imgur.com/R6smISN.png . It means that, to create an object, you need to bring up just a few of its children.

      In cases where the tree is still too deep even after removing all unnecessary layers of indirection, the Test Data Builder pattern can be used. That is, you can extract the Arrange logic into separate utility classes and reuse them across the test suite. That would address your first question. I wouldn’t recommend using a DI framework for anything other than the Application Services layer (view models in an MVVM application; controllers in a web application).
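      A minimal sketch of such a builder, using the Customer class from the article (the CustomerBuilder name is just illustrative):

      public class CustomerBuilder
      {
          private string _name = "John Doe";

          public CustomerBuilder WithName(string name)
          {
              _name = name;
              return this;
          }

          public Customer Build()
          {
              return new Customer(_name);
          }
      }

      // In a test: var customer = new CustomerBuilder().WithName("Alice").Build();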

      2.

      I find it a code smell to widen a public API solely for satisfying a test.

      It is a code smell.

      To deal with this problem, you need to treat the whole set of communicating objects as a single SUT (I’m assuming the effect you want to test is spread across all of them). This is a common trap I see some Londoners face: they tend to equate a single class to the SUT. A better way of putting it is to treat an aggregate (in the DDD sense) as such. When you start treating the whole subset of classes as a single unit of behavior, the need to introduce state getters vanishes. You can examine the objects those inputs ended up in using their natural public APIs.
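      For example (a sketch with a hypothetical Order aggregate that exposes AddLine and TotalAmount), instead of adding a state getter to an inner class just for the test, you exercise the aggregate through its root and assert on what it already exposes:

      [Fact]
      public void Adding_a_line_increases_the_order_total()
      {
          // Order and its order lines form one aggregate; the test treats them
          // as a single unit of behavior and only touches the root's public API.
          var order = new Order();

          order.AddLine(productName: "Book", price: 20m, quantity: 2);

          Assert.Equal(40m, order.TotalAmount);
      }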

      • Matthias

        Thanks for the in-depth reply. Great points there. I find it refreshing to look at the problem from a new angle. Will definitely try to incorporate your suggestions to develop a better feel for it. I find it surprisingly difficult sometimes to stray from practices that are somehow ingrained in your development routine. Writing tests falls into that category I suppose.

        • http://enterprisecraftsmanship.com/ Vladimir Khorikov

          Yeah, coding habits are hard to overcome. I can tell that from my own experience too.