Is Equality Comparison a Code Pollution?

July 13, 2020

This post is about where to put the equality comparison logic: under the test or production code.

Equality comparison

First off, what is equality comparison and how is it related to unit testing?

Equality comparison (or equality comparison logic) is a set of methods that allow you to compare one object to another. The primary use case here is checking whether these two objects are the same.

Here’s how it relates to unit testing. Sometimes, you need to verify a group of an object’s properties, like this:

[Fact]
public void Changing_tour_starting_point_changes_gathering_address()
{
    var busTour = new TourRoute();

    busTour.SetStartingPoint(TourStartingPoint.ThomasJeffersonMemorial);

    Assert.Equal("16 E Basin Dr SW", busTour.GatheringAddress.Street);
    Assert.Equal("Washington", busTour.GatheringAddress.City);
    Assert.Equal("DC", busTour.GatheringAddress.State);
    Assert.Equal("20242", busTour.GatheringAddress.ZipCode);
}

It’s quite tedious to write all these assertions in each and every test that verifies address-related info.

Instead, you could introduce an equality comparison logic that would allow you to compare address instances using just one assertion.

And that’s where this question comes up: where should you put this logic?

You could put it under the test code as a separate equality comparer:

public class AddressComparer : IEqualityComparer<Address>
{
    public bool Equals(Address x, Address y)
    {
        if (x is null && y is null)
            return true;

        if (x is null || y is null)
            return false;

        return x.Street == y.Street
            && x.City == y.City
            && x.State == y.State
            && x.ZipCode == y.ZipCode;
    }

    public int GetHashCode(Address obj)
    {
        unchecked
        {
            int hashCode = (obj.Street != null ? obj.Street.GetHashCode() : 0);
            hashCode = (hashCode * 397) ^ (obj.City != null ? obj.City.GetHashCode() : 0);
            hashCode = (hashCode * 397) ^ (obj.State != null ? obj.State.GetHashCode() : 0);
            hashCode = (hashCode * 397) ^ (obj.ZipCode != null ? obj.ZipCode.GetHashCode() : 0);
            return hashCode;
        }
    }
}

and then use it in the test like this:

[Fact]
public void Changing_tour_starting_point_changes_gathering_address()
{
    var busTour = new TourRoute();

    busTour.SetStartingPoint(TourStartingPoint.ThomasJeffersonMemorial);

    Assert.Equal(
        new Address("16 E Basin Dr SW", "Washington", "DC", "20242"), // expected
        busTour.GatheringAddress, // actual
        new AddressComparer()); // custom comparer
}

Or you could add this comparison logic to the Address class directly and simplify the assertion even further (because you’d be able to omit the custom comparer).

So, which approach is better?

Missing abstraction vs YAGNI

In the book, I wrote that when the test’s assertion section grows too large, it could be a sign of a missing abstraction. And that’s exactly the case for our sample code: the Address class is a Value Object, meaning that it doesn’t have an identity and we should compare its instances not by reference (which is the default way in most OOP languages), but using structural equality.

This obviously speaks in favor of putting the equality comparison logic directly to Address. But there’s a counter-argument: wouldn’t that logic be a code pollution?

Code pollution is an anti-pattern that takes place when you add production code that’s only needed for testing. Normally, it takes the form of various types of switches that make the code’s behavior differ in production and test environments:

public class Logger
{
    private readonly bool _isTestEnvironment;

    public Logger(bool isTestEnvironment) // the switch
    {
        _isTestEnvironment = isTestEnvironment;
    }

    public void Log(string text)
    {
        if (_isTestEnvironment)
            return; /* Don't log if it's a test */

        /* Log otherwise */
    }
}

Assuming that the production code doesn’t need the equality comparison logic in Address, isn’t that logic a code pollution too?

There’s also a counter-argument from a YAGNI perspective. YAGNI stands for "You aren’t gonna need it" and advocates against writing code that you don’t need at this particular moment. Since Address's equality comparison logic is not used by the production code, doesn’t it violate YAGNI?

Both are good questions. Indeed, it looks that the appropriate approach would be to extract the equality comparison out of the production code and keep it under the test code.

In reality, though, such comparison logic is not a code pollution, and it also doesn’t violate YAGNI. It would be a YAGNI violation if we didn’t need the Address class itself and then introduced it for any reason other than to implement a currently-existing business requirement.

But Address is already there; it’s part of the application’s domain model. And the fact that it is a value object and not an entity is also part of that domain model — it’s how this particular bounded context perceives the concept of addresses. In other words, the categorization into value objects and entities is an intrinsic property of any domain class. The addition of the equality comparison code to the Address class just finalizes that class as a value object.

It’s the same situation as with properties in domain classes. You should hide all such properties' setters from the client code by default, and only expose them once you have a convincing use case for why you should do so:

public string MyProperty { get; private set; }

Notice, though, that the version with the exposed setter is shorter:

public string MyProperty { get; set; }

Does it mean that putting additional code (the private modifier) violates YAGNI? After all, one could argue that you shouldn’t put in the modifier unless you already have an invariant to protect.

Of course not. The additional modifier removes functionality, not adds it. The fact that we need to write more code to remove that functionality is accidental and entirely due to how the framework’s authors decided to implement this feature.

Similarly, the introduction of the equality comparison code in Address doesn’t add functionality; it modifies the already existing behavior. We already can compare two Address instances — the default by-reference comparison logic is built into the framework. With the equality comparison code, we are changing that logic to reflect the nature of addresses in our application.

The accidentalness of the default comparison behavior becomes obvious if you look at other languages. For example, F# adds structural comparison code automatically, and thus treats all types as value objects by default. In F#, you’d have to write custom equality comparison code for entities, not value objects as in C#.

Summary

This article is an answer to a reader question. You can send me a question too (comments below also work). I read and reply to all questions. Sometimes, I also write in-depth responses with articles like this.

Let’s summarize:

When the test’s assertion section grows too large, it could be a sign of a missing abstraction (that a class in your domain model should be represented as a value object).
Such a value object requires custom equality comparison logic. The addition of such logic in order to satisfy unit testing seemingly violates YAGNI and can be categorized as code pollution.
In fact, such logic isn’t a code pollution and doesn’t violate YAGNI.