Partially initialized entities anti-pattern

The topic described in this article is a part of my Domain-Driven Design in Practice Pluralsight course.

In this post, I’d like to talk about partially initialized entities: an anti-pattern that often shows up in conjunction with repositories.

Partially initialized entities

Partially initialized entities are entities that an operation returns without fully constructing them. Usually, that operation is fetching them from the database.

Let’s take an example. Let’s say we have the following entity in our domain model:

public class Customer
{
    public int Id { get; private set; }
    public string Name { get; private set; }
    public IReadOnlyList<ContactPerson> Contacts { get; private set; } // <= 5 of them
 
    public void AddContact(ContactPerson contact) { /* ... */ }
    /* Other methods */
}

This class has an invariant stating that no customer can have more than 5 contacts attached.

Let’s also say there are three scenarios in our application. Each of them processes a number of customers but requires a different set of data from them. The first one needs all data associated with the customers, the second scenario requires all data except contacts, and the third one works with their identifiers only.

Because the requirements for the 3 scenarios differ, we decide to create 3 separate methods in our repository, like this:

public class CustomerRepository
{
    public IReadOnlyList<Customer> GetAll()
    {
        /* Returns a list of fully initialized customers */
    }
 
    public IReadOnlyList<Customer> GetAllWithoutContacts()
    {
        /* Returns a list of customers without contacts */
    }
 
    public IReadOnlyList<Customer> GetOnlyIds()
    {
        /* Returns a list of customers with identifiers only */
    }
}

This allows us to improve the performance of the last two methods (GetAllWithoutContacts and GetOnlyIds) by reducing the amount of data retrieved from the database.

At first glance, this approach looks perfectly justified: every scenario gets only the bare minimum of the data it works with, nothing more. The problem, however, is that the repository now returns partially initialized entities. With such entities, we, as software developers, can no longer guarantee their invariants.

As we stated earlier, each customer must have no more than 5 contacts. By not returning the contacts along with the customers themselves, we leave a hole in our domain model which allows us to add a 6th contact and thus break this invariant.
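To make the hole concrete, here is a minimal sketch of how the check inside AddContact can be bypassed. The constructor signature, the ContactPerson shape, and the PartialInitializationDemo helper are assumptions made for illustration; they are not taken from the original domain model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal contact type, for illustration only.
public class ContactPerson
{
    public string Name { get; }
    public ContactPerson(string name) { Name = name; }
}

public class Customer
{
    private const int MaxContacts = 5;
    private readonly List<ContactPerson> _contacts;

    public int Id { get; }
    public string Name { get; }
    public IReadOnlyList<ContactPerson> Contacts => _contacts;

    // Assumed constructor: the repository passes in whatever it loaded.
    public Customer(int id, string name, IEnumerable<ContactPerson> contacts)
    {
        Id = id;
        Name = name;
        _contacts = new List<ContactPerson>(contacts);
    }

    public void AddContact(ContactPerson contact)
    {
        // The check only sees the contacts that were actually loaded.
        if (_contacts.Count >= MaxContacts)
            throw new InvalidOperationException("A customer cannot have more than 5 contacts.");

        _contacts.Add(contact);
    }
}

public static class PartialInitializationDemo
{
    // Returns true if a 6th contact was accepted, false if the invariant check rejected it.
    public static bool TryAddSixthContact(bool loadContacts)
    {
        // Pretend the database already holds 5 contacts for this customer.
        var stored = new List<ContactPerson>();
        for (int i = 1; i <= 5; i++)
            stored.Add(new ContactPerson("Contact " + i));

        // A GetAllWithoutContacts-style load leaves the collection empty.
        var customer = new Customer(1, "Acme", loadContacts ? stored : new List<ContactPerson>());

        try
        {
            customer.AddContact(new ContactPerson("Contact 6"));
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }
}
```

With the contacts loaded, the sixth contact is rejected. With the partially initialized customer, the same call succeeds, because the entity has no way of knowing that five contacts already exist in the database.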

Because of that, partial initialization should be avoided. If your repository returns a list of domain entities (or just a single domain entity), make sure the entities are fully initialized, meaning that all their properties are filled out.

Performance gains without partially initialized entities

But what should we do in case we really need the performance benefits partial initialization provides?

The solution is simple: don’t use domain entities as return types in such situations. Instead, return a DTO class from GetAllWithoutContacts and plain integers from GetOnlyIds:

public class CustomerRepository
{
    public IReadOnlyList<Customer> GetAll()
    {
        /* Returns a list of fully initialized customers */
    }
 
    public IReadOnlyList<CustomerDto> GetAllWithoutContacts()
    {
        /* Returns a list of DTOs */
    }
 
    public IReadOnlyList<int> GetOnlyIds()
    {
        /* Returns a list of identifiers */
    }
}
 
public class CustomerDto
{
    public int Id { get; private set; }
    public string Name { get; private set; }
}

The benefit here is twofold. First and foremost, we avoid partially initialized entities and are thus able to preserve their invariants. Second, we show explicitly what data the methods return. The users of these methods no longer need to dive into the implementation details to find out what data is actually attached to the returned objects. That is now clear from the signatures alone.
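As a rough sketch of how the two lighter methods might be implemented, the version below projects from an in-memory list. The CustomerRow class, the repository constructor, and the CustomerDto constructor are assumptions standing in for a real data source and mapping layer; with an ORM, the same projection would typically be expressed in the query itself so that only the needed columns are fetched.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database row; not part of the domain model.
public class CustomerRow
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class CustomerDto
{
    public int Id { get; }
    public string Name { get; }

    // Assumed constructor so the repository can populate the DTO.
    public CustomerDto(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

public class CustomerRepository
{
    private readonly IReadOnlyList<CustomerRow> _rows;

    // Assumed constructor: the rows play the role of the customers table.
    public CustomerRepository(IReadOnlyList<CustomerRow> rows)
    {
        _rows = rows;
    }

    // Returns plain data holders; no Customer entity is ever half-built.
    public IReadOnlyList<CustomerDto> GetAllWithoutContacts()
    {
        return _rows.Select(r => new CustomerDto(r.Id, r.Name)).ToList();
    }

    // Returns identifiers only.
    public IReadOnlyList<int> GetOnlyIds()
    {
        return _rows.Select(r => r.Id).ToList();
    }
}
```

Since CustomerDto has no AddContact method and no invariants of its own, there is nothing for the calling code to break.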

Partially initialized entities and factories

While this anti-pattern mostly appears in conjunction with repositories, it isn’t exclusive to them. It can also show up in factories.

Factory is a well-known design pattern that moves the responsibility for creating a domain entity out of the entity itself. It helps keep entities simple when the creation logic is rather complex.

The main guideline for working with factories is that they should always return a fully-fledged entity with all its invariants fulfilled. For example, if there is a business rule stating that a Customer must have at least one contact, the factory should return the customer with that contact already attached. It’s unacceptable to leave any of the invariants broken in the hope that the client code will complete the job. And of course, if some piece of data isn’t part of an entity’s invariant, the factory can very well skip it.
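A minimal sketch of such a factory is shown below, assuming the "at least one contact" rule from the example. The CustomerFactory class, its Create signature, and the simplified Customer and ContactPerson types are hypothetical and only illustrate the idea.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal contact type, for illustration only.
public class ContactPerson
{
    public string Name { get; }
    public ContactPerson(string name) { Name = name; }
}

public class Customer
{
    private readonly List<ContactPerson> _contacts;

    public string Name { get; }
    public IReadOnlyList<ContactPerson> Contacts => _contacts;

    // Internal so that, in this sketch, creation goes through the factory.
    internal Customer(string name, List<ContactPerson> contacts)
    {
        Name = name;
        _contacts = contacts;
    }
}

public static class CustomerFactory
{
    // Requires the first contact up front, so the returned customer
    // already satisfies the "at least one contact" rule.
    public static Customer Create(string name, ContactPerson primaryContact)
    {
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("Name is required.", nameof(name));
        if (primaryContact == null)
            throw new ArgumentNullException(nameof(primaryContact));

        return new Customer(name, new List<ContactPerson> { primaryContact });
    }
}
```

Because the contact is a required parameter, the calling code cannot obtain a customer that still needs "finishing" afterwards.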

I guess you can see a pattern here. All classes in your domain model should remain in a valid state throughout their lifetime.

Summary

Partial initialization makes it impossible to maintain the invariants of your entities. It deceives the users of the APIs that return such entities (first and foremost, yourself) and may also lead to inconsistencies where the client code relies on data that wasn’t returned with the domain objects.

Because of that, partial initialization should be avoided. Whenever you return a domain entity as a result of an operation, make sure it is fully initialized.
