Unit tests value proposition



The topic described in this article is part of my Unit Testing Pluralsight course.

I’m starting a new series which will be devoted to the topic of unit testing. In it, I’ll try to define what a valuable test is and show how the use of mocks fits into this picture. I’ll also describe the approach that I think has the best return on investment in terms of the value it provides.


There is an opinion that unit testing leads to a better design. I personally don’t think that unit testing in and of itself leads to anything.

It’s true that your unit test suite can become a good litmus test which tells you there’s something wrong with the code base. If the code is hard to unit test, then it probably requires improvement. However, the mere existence of a unit test suite doesn’t provide any guarantees. I’ve seen many code bases which were a mess despite good test coverage.

In my opinion, the single most important benefit of unit testing is confidence. A good regression test suite enables you to refactor or add new features to your application without a constant fear of breaking existing functionality. I find this feeling liberating. Aside from the psychological benefit, such a test suite increases the speed of development and decreases the number of bugs. All of these are invaluable benefits for any software development project.

Not all unit tests are made equal, however, and not all of them will automatically entail such gains. It’s important to differentiate various kinds of tests in terms of their value proposition.

So, what is a valuable unit test? It is a test which:

  • Has a high chance of catching a regression bug.
  • Has a low chance of producing a false positive.
  • Provides fast feedback.

High chance of catching a regression

The first point relates to the amount of code that gets exercised during a test: the number and, more importantly, the significance of the lines of code traversed during the test execution.

In that sense, trivial code is not worth testing because it’s short and doesn’t contain any business logic. Tests that cover trivial code just don’t provide a meaningful chance of finding a regression. An example of such code is simple one-line properties:

public class User
{
    public string Name { get; set; }
    public string Email { get; set; }
}
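To make the point concrete, here is what a test of such a property typically looks like. It is a minimal sketch: the test method name is hypothetical, and a plain exception stands in for an xUnit `[Fact]` with `Assert.Equal` so the snippet compiles without a test framework. The only code it exercises is the compiler-generated getter and setter, so its chance of catching a regression is close to zero.

```csharp
using System;

public class User
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class UserTests
{
    // Hypothetical low-value test: it only exercises the auto-property's
    // compiler-generated getter and setter, which contain no business logic.
    public static void Name_returns_the_assigned_value()
    {
        var user = new User { Name = "Bob" };

        if (user.Name != "Bob")
            throw new InvalidOperationException("Name was not stored");
    }
}
```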

Low chance of producing a false positive

The second point relates to the way the test verifies the correctness of the system under test (SUT). The more the test is tied to the SUT’s implementation details, the more false positives it produces. A false positive is a situation where your test suite raises a false alarm: it indicates an error when, in reality, everything works fine.

False positives can have a devastating effect on the health of your test suite. Just like non-determinism in tests, they dilute your ability to quickly spot the problem when something does go wrong. Once you get accustomed to tests failing with every bit of refactoring, you stop paying attention to such failures, and legitimate failures get ignored along with them.

The only way to reduce the chance of false positives is to decouple your tests from the SUT’s implementation details as much as possible. Basically, you need to make sure you verify the end result your code generates, not the actual steps it takes to get there. Without such decoupling, you inevitably end up with red tests after every refactoring, regardless of whether you broke something or not.
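To contrast the two ways of verifying the same code, here is a hypothetical example: the `PriceCalculator` class, its `Steps` log, and the bulk discount rule are all invented for this sketch, and plain boolean checks stand in for test methods so the snippet compiles without a test framework.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical SUT: computes an order total. It records the steps it takes
// only so that we can contrast the two styles of verification below.
public class PriceCalculator
{
    public List<string> Steps { get; } = new List<string>();

    public decimal CalculateTotal(decimal unitPrice, int quantity)
    {
        Steps.Add("Multiply");
        decimal subtotal = unitPrice * quantity;

        Steps.Add("ApplyBulkDiscount");
        return quantity >= 10 ? subtotal * 0.9m : subtotal;
    }
}

public static class PriceCalculatorChecks
{
    // Brittle: verifies HOW the result was produced. Merging or reordering
    // the internal steps turns this check red even if the total is unchanged.
    public static bool StepsMatch()
    {
        var calculator = new PriceCalculator();
        calculator.CalculateTotal(2.00m, 10);
        return calculator.Steps.SequenceEqual(new[] { "Multiply", "ApplyBulkDiscount" });
    }

    // Robust: verifies only WHAT the caller observes, the final total.
    public static bool TotalIsCorrect()
    {
        var calculator = new PriceCalculator();
        return calculator.CalculateTotal(2.00m, 10) == 18.00m;
    }
}
```

If `CalculateTotal` is later refactored into a single expression that no longer logs its steps, `StepsMatch` starts failing while `TotalIsCorrect` keeps passing; that failure is a false positive in the sense described above.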

Fast feedback

The final point is how quickly you get feedback. It is important because the shorter your feedback loop, the faster you can adjust your course, and the less effort you waste going in the wrong direction. Quick feedback can only be provided by a fast test suite.

Unit tests value proposition: examples

These three attributes are mutually dependent: it’s impossible to maximize one of them without damaging the others. To illustrate this point, let’s consider end-to-end tests. They usually provide the best protection against regressions, as they exercise all layers of your code base and thus have a high chance of catching a bug.

They are also mostly immune to false positives. A refactoring, if done right, doesn’t change the externally observable behavior of your system and thus doesn’t affect end-to-end tests. The only thing such tests care about is how a feature behaves from the end user’s point of view; they don’t impose a concrete way to implement that feature.

The main drawback of end-to-end tests is their slowness. Any system that relies solely on such tests would have a hard time getting rapid feedback, and that is a deal-breaker for many development teams.

Similarly, it’s pretty easy to write a test that has a good chance of catching a regression but does it with a lot of false positives. An example here would be the following:

public class UserRepository
{
    public User GetById(int id)
    {
        /* … */
    }

    public string LastExecutedSqlStatement { get; private set; }
}

[Fact]
public void GetById_executes_correct_SQL_code()
{
    var repository = new UserRepository();

    User user = repository.GetById(5);

    Assert.Equal(
        "SELECT * FROM dbo.[User] WHERE UserID = 5",
        repository.LastExecutedSqlStatement);
}

As you can see, the test simply duplicates the actual implementation of the GetById method in terms of the SQL code it generates.

Will this test catch a bug if one sneaks through? Sure. A developer can mess up the SQL code generation and accidentally use ID instead of UserID, and the test will point that out.

Does this test have a low chance of producing a false positive? Absolutely not. Here are different variations of the SQL statement which lead to the same result:

SELECT * FROM dbo.[User] WHERE UserID = 5
SELECT * FROM dbo.User WHERE UserID = 5
SELECT UserID, Name, Email FROM dbo.[User] WHERE UserID = 5
SELECT * FROM dbo.[User] WHERE UserID = @UserID

The test will fail if you change the SQL script to any of the other variants, because it is tightly coupled to the repository’s implementation details. There are several ways the repository can do its job, but the test insists on a particular one of them:

Figure: Testing an implementation detail

The fix is pretty simple here. We just need to shift our focus from the hows of the SUT to its whats and verify the end result instead:

Figure: Testing the end result
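A minimal sketch of such an end-result test follows, under loud assumptions: a `Dictionary` stands in for the database (so no SQL is generated at all), the `User` class gains a hypothetical `Id` property, and a plain exception replaces xUnit's `Assert` so the snippet runs without a test framework. In a real setup, the repository would query a seeded test database. What matters is the shape of the assertion: it checks the user that comes back, not the SQL text.

```csharp
using System;
using System.Collections.Generic;

public class User
{
    public int Id { get; set; }          // hypothetical, added for this sketch
    public string Name { get; set; }
    public string Email { get; set; }
}

// Hypothetical stand-in for the database: in a real integration test the
// repository would query a seeded test database instead.
public class UserRepository
{
    private readonly Dictionary<int, User> _rows;

    public UserRepository(Dictionary<int, User> rows) => _rows = rows;

    // Returns null when no user with the given id exists.
    public User GetById(int id) => _rows.TryGetValue(id, out User user) ? user : null;
}

public static class UserRepositoryTests
{
    // Verifies WHAT the repository returns, not WHICH query it ran.
    public static void GetById_returns_the_stored_user()
    {
        var repository = new UserRepository(new Dictionary<int, User>
        {
            [5] = new User { Id = 5, Name = "Alice", Email = "alice@example.com" }
        });

        User user = repository.GetById(5);

        if (user == null || user.Id != 5 || user.Name != "Alice" || user.Email != "alice@example.com")
            throw new InvalidOperationException("Unexpected user returned");
    }
}
```

Because the assertion only looks at the returned `User`, any of the equivalent SQL statements listed earlier would keep this test green.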

In many cases, however, it’s hard to keep the balance between the three components. Anything less than end-to-end tests will be prone to false positives during refactoring to some degree. Also, you cannot achieve full confidence unless you exercise the majority of your code base, which, in turn, is almost impossible to do without knowing at least some implementation details of the SUT.

However, finding a good balance, despite being a hard task, is possible. In many cases, it requires architectural changes. What changes, you ask? That is something I’m going to write about in the next post.

Summary

Let’s summarize the article with the following:

  • Unit tests in and of themselves don’t lead to a better design.
  • The main purpose of having a test suite is achieving confidence.
  • Not all tests are valuable. Valuable tests are tests that:
    • Have a high chance of catching regressions,
    • Have a low chance of producing false positives,
    • Provide fast feedback.
  • You cannot maximize one of these attributes without damaging the others.





  • Matt Searles

    Awesome article! I look forward to reading you every week!

    In regards to confidence, what value do you place on coverage, and is 100% coverage worth it? We’re achieving this on our projects, and despite it seeming a little tedious and inefficient at times, and some tests seemingly having little value with a low chance of catching a regression error, the confidence that comes with 100% coverage and the ability to know at a glance if tests are missing (coverage is not 100%) *seems* to outweigh the drawbacks…

    • Johannes Norrbacka

      I suggest u read this wonderful blog post about tdd from vladimir:
      http://enterprisecraftsmanship.com/2015/07/06/how-to-do-painless-tdd/

      It explains the 100% test coverage, and how u can think about it.

      To Vladimir: Great work, I love your contributions in this blog and on Pluralsight. It has surely made me a better coder.

      • Vladimir Khorikov

        Thanks Johannes, glad to hear you find it valuable!

      • Matt Searles

        Thanks Johannes, like a goldfish I’d read that article and forgotten the question already has an answer.

    • Vladimir Khorikov

      Thanks Matt!

      I personally don’t consider 100% test coverage in and of itself a goal we should absolutely strive for. Moreover, the metric itself isn’t quite representative because it usually doesn’t take into account the code which resides in the underlying framework libraries. What I would do instead is unit test the domain logic of my application heavily (with 100% or close to 100% coverage) but omit unit testing all other code. To verify that the parts of the system work together correctly, I would employ integration testing with a smaller number of integration tests.

      In my experience, investing in low-value tests isn’t worth it because they don’t contribute much to the overall level of confidence on their own. Johannes has shared the link to the article I wanted to point to myself: http://enterprisecraftsmanship.com/2015/07/06/how-to-do-painless-tdd/

      • Matt Searles

        Ah.. hm, I wonder if my confidence is misplaced in that case… Given the example in the link

        [HttpPost]
        public HttpResponseMessage CreateCustomer([FromBody] string name)
        {
            Customer customer = new Customer(name);

            _repository.Save(customer);
            _emailGateway.SendGreetings(customer);

            return Ok();
        }

        My instinct would simply be to mock the interface and spy on _repository.Save() and _emailGateway.SendGreetings() to check they’re called, and check the return value (though I’m the first to admit my instincts aren’t always reliable).

        The referenced article says that writing mocks clutters up the Arrange section, therefore we probably should write as few mocks as possible.

        But… if I make those dependencies public, and do a [TestInitialize] to initialize the controller as

        [TestInitialize]
        public void Setup()
        {
            _controller = new CustomerController(new MockCustomerRepository(), new MockEmailGateway());
            _customerName = "Bob";
        }

        Then the tests are just

        [TestMethod]
        public void Should_save_customer_to_repository()
        {
            _controller.CreateCustomer(_customerName);

            // Assert that _controller.CustomerRepository.Save() was called
        }

        [TestMethod]
        public void Should_send_greeting_to_customer()
        {
            _controller.CreateCustomer(_customerName);

            // Assert that _controller.EmailGateway.SendGreetings() was called
        }

        [TestMethod]
        public void Should_return_ok_for_valid_customer()
        {
            var response = _controller.CreateCustomer(_customerName);

            // Assert response is Ok()
        }

        I’m not cluttering up the Arrange section, it’s a bit tedious, but not really time consuming, and it does mean that I’m confident that when someone makes that [Post], so long as Customer is tested, CustomerRepository is tested, and EmailGateway is tested, which they are, I can always be confident that the Post to CreateCustomer is working. If I haven’t tested it, I can’t.

        I feel as though that Post request could cause a regression error if the requirements for CreateCustomer, its class, or its callers change, and I would argue the time it takes to write those tests is pretty low and the value is actually quite high, because the value of the Post request working as expected is high, i.e., it’s at least as important that the customer is saved to the repository when the request is posted as it is that the customer’s pending state is correct, and probably more so.

        • Vladimir Khorikov

          Unit testing this controller method with mocks doesn’t contribute much to the overall confidence. The amount of real code that gets exercised in such tests in addition to the existing unit tests is minimal (only the last 3 lines), and the cyclomatic complexity of that code is 1. It means that here we essentially test trivial code which is not much more complicated than a one-line property. That means such tests don’t have a significant chance of catching a regression (the first component of a valuable test) and thus are not valuable themselves.

          You are right that the CreateCustomer method should be tested, though. But I argue that we shouldn’t use mocks for that purpose. Instead, I recommend performing integration testing that would involve a real database and a stub/spy for the email gateway. An integration test, unlike a unit test using mocks, does traverse a significant amount of real code and thus has a good chance of catching a regression error. Here is an example: http://enterprisecraftsmanship.com/2015/07/13/integration-testing-or-how-to-sleep-well-at-nights/
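The spy for the email gateway could look roughly like the following minimal sketch. It assumes the controller depends on the gateway through an interface; `IEmailGateway`, `EmailGatewaySpy`, and this cut-down `Customer` class are hypothetical names for illustration, not code from the linked article. The spy sends nothing; it only records whom the code under test tried to greet, so an integration test can check that alongside the database content.

```csharp
using System.Collections.Generic;

public class Customer
{
    public Customer(string name) => Name = name;

    public string Name { get; }
}

// Hypothetical port for sending emails; the production implementation
// would talk to a real SMTP server or email service.
public interface IEmailGateway
{
    void SendGreetings(Customer customer);
}

// Spy used in integration tests instead of the real gateway: it sends
// nothing and just records who would have been greeted.
public class EmailGatewaySpy : IEmailGateway
{
    private readonly List<string> _greetedNames = new List<string>();

    public IReadOnlyList<string> GreetedNames => _greetedNames;

    public void SendGreetings(Customer customer) => _greetedNames.Add(customer.Name);
}
```

In such a test, the controller would receive the real repository (backed by the test database) together with this spy, and the assertions would cover both the saved row and `GreetedNames`.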

          • Matt Searles

            I think you have me sold… But… the biggest pain point for the projects I’m working on is setting up and maintaining a real, separate database. Many of the databases I develop are relational with 50 or more tables. Automatically keeping the structure synchronized, truncating everything, then setting up test data before each test is a non-trivial task, whereas mocking the repository is entirely trivial. Yet the main value of using a real database is testing the database structure and SQL, both of which have fairly limited potential for regression errors. But, as I said, I *think* you have me sold, mostly because I don’t trust my SQL any more than I trust any of my other methods. Do you have any tips for quickly and easily synchronizing database structure for integration tests? I’ve done it before, and it was painful and hard to maintain, which I think is partly why I baulk a little at using integration tests for databases and repositories rather than mocking repositories and trusting the SQL.

          • Vladimir Khorikov

            Regarding the value: integration tests also have better protection against false positives, which makes them an even more attractive alternative to unit tests with mocks. With them, you mostly don’t care how a controller uses the repository as long as the data saved to the DB is the same.

            Maintaining a database for integration tests may indeed become troublesome. There are some best practices, though. First, you need to use the migration-based approach to database delivery; it will help you keep the production and test databases in sync in terms of their structure. I wrote about it here: http://enterprisecraftsmanship.com/2015/08/18/state-vs-migration-driven-database-delivery/ and I also just finished a Pluralsight course on this exact topic which is going to go live next week (hopefully).

            In integration tests, don’t recreate the test database, only wipe out its content (everything except reference data).

            Regarding the test data setup, this indeed requires some investment. You will need to set it up in each test, but this code can be extracted to a single place using the Data Builder pattern (example: https://gist.github.com/vkhorikov/e52e7366a8b3eac98c5b ).
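For readers who can't open the gist, here is a hedged sketch of the general Data Builder idea; it is not the gist's code. `CustomerBuilder`, its methods, and this `Customer` shape are hypothetical. The builder below only constructs an in-memory object, whereas a builder used in integration tests would likely also save the entity to the test database.

```csharp
public class Customer
{
    public Customer(string name, string email, bool isPreferred)
    {
        Name = name;
        Email = email;
        IsPreferred = isPreferred;
    }

    public string Name { get; }
    public string Email { get; }
    public bool IsPreferred { get; }
}

// Hypothetical test data builder: sensible defaults keep each test's Arrange
// section short; a test overrides only the values it actually cares about.
public class CustomerBuilder
{
    private string _name = "John Doe";
    private string _email = "john@example.com";
    private bool _isPreferred;

    public CustomerBuilder WithName(string name) { _name = name; return this; }
    public CustomerBuilder WithEmail(string email) { _email = email; return this; }
    public CustomerBuilder Preferred() { _isPreferred = true; return this; }

    public Customer Build() => new Customer(_name, _email, _isPreferred);
}
```

A test then reads as `new CustomerBuilder().WithName("Bob").Preferred().Build()`, and every other field keeps its default.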

            Here’s a fully fledged example of the approach I use for integration testing: https://github.com/vkhorikov/FuntionalPrinciplesCsharp/tree/master/New/CustomerManagement.Tests/Integration . I didn’t use the Builder pattern there, though, because the tests are not too complex.

          • Matt Searles

            *grins* I love fluent interfaces, probably too much lol.

            The migration pattern is exactly what I did when I last attempted to solve this problem, though I didn’t have a name for it then.

            In retrospect, it probably would have been easier to loop through the tables, grab the “SHOW CREATE TABLE” output, and push it into the test database (though some magic would have to happen somewhere about the ordering), but you’d advise against this? On the surface, it would seem that it would both keep the databases synced and truncate the data. Last time I did something like this, I just truncated everything and had a function to recreate the reference data.

            Btw, you probably already know, but you can probably use “TRUNCATE” instead of “DELETE FROM” in Tests.cs -> ClearCustomers(), it’s got better performance for emptying a table, at least in my experience.

            I’m going to put it out there… Someone needs to write a simple-to-use, in-memory database that can read a schema and respond to SQL queries as if it were a particular database engine. Hm, I should probably google it…

            Thanks Vladimir, I love your work, you’ve got me convinced on the subject. I got some free Pluralsight (which was awesome) with my last MSDN subscription, I think it was… actually, I think I might ask management to work a subscription into our training budget this year, hopefully I can catch you on there!

          • Vladimir Khorikov

            I personally find it’s easier to manually write which tables I need to truncate. That solves the problems with ordering. Also, creating/changing reference data should be a part of the database delivery process, I wouldn’t recommend touching reference data in tests, only master data.

            you can probably use “TRUNCATE” instead of “DELETE FROM”

            TRUNCATE doesn’t work if there are any foreign key constraints pointing at the table being truncated, unfortunately, even if there are no conflicts in terms of consistency. In terms of performance, there’s usually not much data in the test database anyway, so the performance shouldn’t suffer here.
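The clean-up approach described in this reply (an explicit, hand-ordered list of master-data tables, emptied with DELETE FROM) might be sketched like this. The table names are hypothetical, and the helper only builds the script; running it against the test database would be left to whatever data-access code the test project already uses (for example, ADO.NET).

```csharp
using System;
using System.Linq;

public static class TestDatabase
{
    // Hypothetical master-data tables, listed by hand so that child tables come
    // before the parent tables they reference. Reference-data tables are left
    // out on purpose: they are managed by the database delivery process.
    private static readonly string[] MasterDataTables =
    {
        "dbo.OrderLine",
        "dbo.[Order]",
        "dbo.Customer"
    };

    // DELETE FROM is used rather than TRUNCATE TABLE because SQL Server does not
    // allow truncating a table that is referenced by a foreign key constraint.
    public static string BuildClearScript() =>
        string.Join(Environment.NewLine, MasterDataTables.Select(table => $"DELETE FROM {table};"));
}
```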

            I’m glad I’ve convinced you to try to change your approach to unit testing. Pluralsight is definitely worth its money (I’m saying that not because I’m one of the authors 🙂 ), there are plenty of great teachers there.

          • Matt Searles

            Ah… fair point with TRUNCATE, and I’ll certainly think more about reference data, maybe I’m just making life harder for myself by recreating it for each TestClass.

            I started implementing your recommended strategy today, Builder Pattern, no mocks or spies, with a real db on localhost and parsing the http response from the controllers. It’s totally win, I can barely remember why I baulked so hard at the idea *sheepish grin*

  • Guillaume L

    I know I’m going to sound like a grammar nazi, but I’d rather refer to this type of results as “false negatives” than “false positives”. False negatives are not nearly as bad as false positives as far as correctness is concerned, because you’re (wrongly) warned as opposed to not being warned at all and risking letting a bug slip into production. If the time and effort needed to shut the culprit test’s mouth is small enough and the value of true positives high enough, it might still be worth the tradeoff. Obviously, the SQL query case you gave is a perfect counterexample since the degree of fragility of a 50 character string value is extremely high and basically nobody wants to base their tests on that, but there are times when things are more balanced.

    • Vladimir Khorikov

      Well, according to wiki ( https://en.wikipedia.org/wiki/False_positives_and_false_negatives ), both terms have a close meaning, depending on how you pose the question (“Does the system have bugs?” vs “Does the system work correctly?”). I personally like to think about it the same way I think of flu tests. A flu test is positive when the patient is infected, although there’s nothing positive in this fact.

      What you refer to as “false positive” is a matter of test coverage. A test suite wrongly assumes everything is good when it doesn’t cover some important functionality. This goes under the first attribute of the taxonomy from the post: “Has a high chance of catching a bug”.

      I agree with you on the balance. No unit tests are fully protected against false positives/negatives. When the SUT’s signature changes and you need to adjust the tests, it’s also a kind of a false positive. The challenging trade-off we face as developers is maximizing all 3 attributes without incurring substantial damage to any of them.