Link to an aggregate: reference or Id?

March 8, 2016

In this post, I write about two ways of representing a link to an aggregate.

A relationship between aggregates

When it comes to encoding a relationship that goes from one aggregate to another, you basically have two choices: either store the identifier of the target aggregate, or keep a reference to the aggregate instance itself. For example, if you have two classes, Order and Customer, you may depict the relationship between them as follows:

public class Order // Root of its own aggregate
{
    public int Id { get; private set; }
    public int CustomerId { get; private set; } // Reference to a customer
    public DateTime DateOrdered { get; private set; }
}
 
public class Customer // Root of its own aggregate
{
    public int Id { get; private set; }
    public string Name { get; private set; }
}

Here, the relation between Order and Customer is represented using the CustomerId property.

You can go a step further and apply the Identity pattern, representing each identifier with a custom type, like this:

public class CustomerId
{
    public int Id { get; private set; }
 
    public CustomerId(int id) // public so that callers can actually create an Id
    {
        Id = id;
    }
}
 
public class Order
{
    public int Id { get; private set; }
    public CustomerId CustomerId { get; private set; } // Reference to a customer
    public DateTime DateOrdered { get; private set; }
}
 
public class Customer
{
    public CustomerId Id { get; private set; }
    public string Name { get; private set; }
}

This gives you a strongly typed notation, which is a good thing because it helps you avoid primitive obsession. That is the approach Vaughn Vernon proposes in his article Modeling Aggregates with DDD and Entity Framework.
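For a strongly typed identifier to be useful, two instances wrapping the same value must compare as equal, a detail the CustomerId class above leaves out. The following is a minimal sketch, assuming C# 9 or later, that uses positional records, for which the compiler generates value-based Equals, GetHashCode, == and !=. The `OrderId` type and the `CustomerLookup.Describe` helper are hypothetical, added only for illustration.

```csharp
using System;

// Hypothetical strongly typed identifiers. Records compare by value,
// so two CustomerId(42) instances are equal.
public sealed record CustomerId(int Value);
public sealed record OrderId(int Value);

public class Order
{
    public OrderId Id { get; }
    public CustomerId CustomerId { get; } // Reference to a customer, by Id
    public DateTime DateOrdered { get; }

    public Order(OrderId id, CustomerId customerId, DateTime dateOrdered)
    {
        Id = id;
        CustomerId = customerId;
        DateOrdered = dateOrdered;
    }
}

public static class CustomerLookup
{
    // Accepts only a CustomerId. Passing an OrderId here is a compile-time
    // error, which is the protection a bare int cannot give.
    public static string Describe(CustomerId id) => $"Customer #{id.Value}";
}
```

A call such as `CustomerLookup.Describe(new OrderId(42))` does not compile, whereas with plain `int` Ids the same mix-up would go unnoticed.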

The other way of representing a connection from one entity to another is to keep a link to the whole aggregate:

public class Order
{
    public int Id { get; private set; }
    public Customer Customer { get; private set; }
    public DateTime DateOrdered { get; private set; }
}
 
public class Customer
{
    public int Id { get; private set; }
    public string Name { get; private set; }
}

In this version, the entities don’t contain any identifiers except their own Ids.

Link to an aggregate: reference or Id?

So, which approach should you choose? Should entities hold a reference to the aggregates they are related to, or should they just keep an identifier instead? To answer this question, we need to turn to the principle of Persistence Ignorance.

Persistence Ignorance holds that you should separate your domain model from the underlying persistence storage. Classes that you use for modeling your business domain shouldn’t be impacted by how they are stored in the database. Ideally, their design should be as close as possible to the design needed to solve the problem at hand, without taking into consideration persistence concerns.

The best way to check whether you follow this principle is to compare your current design with the design you would have ended up with if saving and retrieving your domain objects were not an issue. In other words, think about how you would solve exactly the same problem if persistence were not a concern at all.

Would you still use identifiers? Probably not. Indeed, why would you, if all the objects you work with reside in memory?
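Here is a minimal in-memory sketch of that contrast. With an Id, getting from an order to its customer needs some lookup structure (a `Dictionary<int, Customer>` stands in here for whatever would hold the customers). With a reference, you simply follow it. The `OrderWithId`, `OrderWithReference` and `Navigation` names are hypothetical, introduced only to put both versions side by side.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public int Id { get; }
    public string Name { get; }
    public Customer(int id, string name) { Id = id; Name = name; }
}

// Version 1: the order only knows the customer's identifier.
public class OrderWithId
{
    public int CustomerId { get; }
    public OrderWithId(int customerId) { CustomerId = customerId; }
}

// Version 2: the order holds the customer object itself.
public class OrderWithReference
{
    public Customer Customer { get; }
    public OrderWithReference(Customer customer) { Customer = customer; }
}

public static class Navigation
{
    // With an Id, the caller needs an extra lookup table to resolve the customer.
    public static string CustomerName(OrderWithId order, IReadOnlyDictionary<int, Customer> customers)
        => customers[order.CustomerId].Name;

    // With a reference, navigation is just a property access.
    public static string CustomerName(OrderWithReference order)
        => order.Customer.Name;
}
```

The first overload exists only because something has to map numbers back to objects. When everything lives in memory, that mapping is pure overhead.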

That is exactly why, by default, you should prefer a reference to the related entity itself over its identifier. Ids are a leaky abstraction that we must deal with only because entities need to be persisted in a backing store. A direct link from one entity to another is exactly how we would represent the relationship between them in an OOP language with no database involved, and that is what we should aim for if we want a highly isolated, persistence-ignorant domain model. Fortunately, ORMs such as NHibernate and Entity Framework make this easy to implement, thanks to lazy loading.
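Lazy loading is what lets the reference version stay persistence ignorant. The ORM hands your code a subclass of the entity whose navigation property fetches the related object the first time it is read. Below is a hand-rolled illustration of that idea, not the proxy code NHibernate or Entity Framework actually generate. The `OrderProxy` class and its `Func<int, Customer>` loader are hypothetical stand-ins for the ORM's generated proxy and its session or context. Both ORMs do require navigation properties to be virtual for proxy-based lazy loading, which is why `Customer` is declared `virtual` here.

```csharp
using System;

public class Customer
{
    public int Id { get; }
    public string Name { get; }
    public Customer(int id, string name) { Id = id; Name = name; }
}

public class Order
{
    // Virtual so that a persistence-layer subclass can intercept access.
    public virtual Customer Customer { get; protected set; }
    public DateTime DateOrdered { get; protected set; }

    public Order(Customer customer, DateTime dateOrdered)
    {
        Customer = customer;
        DateOrdered = dateOrdered;
    }

    protected Order() { } // used by the proxy below
}

// Hypothetical stand-in for an ORM-generated proxy. It remembers only the
// foreign key and loads the Customer the first time the property is read.
public class OrderProxy : Order
{
    private readonly int _customerId;
    private readonly Func<int, Customer> _loadCustomer;
    private Customer _customer;

    public int LoadCount { get; private set; } // for demonstration only

    public OrderProxy(int customerId, DateTime dateOrdered, Func<int, Customer> loadCustomer)
    {
        _customerId = customerId;
        _loadCustomer = loadCustomer;
        DateOrdered = dateOrdered;
    }

    public override Customer Customer
    {
        get
        {
            if (_customer == null)
            {
                _customer = _loadCustomer(_customerId);
                LoadCount++;
            }
            return _customer;
        }
        protected set => _customer = value;
    }
}
```

Domain code that reads `order.Customer.Name` cannot tell whether it holds a plain `Order` or a proxy. The foreign key lives only in the persistence-side subclass.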

As a general guideline, keep the use of identifiers in your domain model to the bare minimum. Ideally, each entity should have only a single Id: its own. You can also extract it into a base class so that your domain entities don’t have to deal with it at all. If you need to portray a relationship between two entities, just use a direct link from one to the other.
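Here is a minimal sketch of that guideline, assuming a hypothetical `Entity` base class. The base class owns the single surrogate Id, so `Order` and `Customer` declare no identifier properties of their own, and the relationship between them is a plain reference.

```csharp
using System;

// Hypothetical base class that owns the one and only surrogate Id.
// The persistence layer is assumed to assign it when the entity is saved.
public abstract class Entity
{
    public int Id { get; protected set; }
}

public class Customer : Entity
{
    public string Name { get; private set; }
    public Customer(string name) { Name = name; }
}

public class Order : Entity
{
    public Customer Customer { get; private set; } // direct link, no CustomerId
    public DateTime DateOrdered { get; private set; }

    public Order(Customer customer, DateTime dateOrdered)
    {
        Customer = customer ?? throw new ArgumentNullException(nameof(customer));
        DateOrdered = dateOrdered;
    }
}
```

A freshly created order keeps `Id == 0` until it is persisted, yet its link to the customer already works, because that link never depended on an Id.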

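Persistence Ignorance is not the only reason to refrain from using identifiers. In some cases, using them also violates encapsulation: comparing entities by their Ids, for instance, exposes how an entity’s identity is implemented instead of letting the entity itself decide what equality means. Instead of writing code like this: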

if (order1.Id == order2.Id)
{
    /* ... */
}

Compare the objects using the Equals method:

if (order1.Equals(order2))
{
    /* ... */
}

Or, even better, using the equality operator:

if (order1 == order2)
{
    /* ... */
}

Of course, to be able to do that, you need to define equality members (Equals, GetHashCode, and the == and != operators) for your entities.
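Below is a minimal sketch of such equality members, placed in a hypothetical `Entity` base class. It assumes that an Id of 0 means "not persisted yet", and that two transient entities are equal only if they are the same instance. This is one reasonable policy, not the only one.

```csharp
using System;

// Hypothetical base class providing identity-based equality for entities.
public abstract class Entity
{
    public int Id { get; protected set; }

    // Assumption: an Id of 0 means the entity has not been persisted yet.
    private bool IsTransient => Id == 0;

    public override bool Equals(object obj)
    {
        if (obj is not Entity other) return false;
        if (ReferenceEquals(this, other)) return true;
        if (GetType() != other.GetType()) return false;
        if (IsTransient || other.IsTransient) return false;
        return Id == other.Id;
    }

    public override int GetHashCode() => HashCode.Combine(GetType(), Id);

    public static bool operator ==(Entity a, Entity b)
    {
        if (a is null && b is null) return true;
        if (a is null || b is null) return false;
        return a.Equals(b);
    }

    public static bool operator !=(Entity a, Entity b) => !(a == b);
}

public class Order : Entity
{
    public Order(int id) { Id = id; }
}
```

Two caveats apply to this sketch. First, the hash code changes when a transient entity receives its Id, so avoid putting unsaved entities into hash-based collections. Second, with NHibernate or Entity Framework proxies, `GetType()` returns the proxy type, so a production version would need to compare the underlying entity types instead.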

A counter-argument I sometimes hear in favor of Ids is that they make it easier to work with entities in detached mode: if your related entities are lazy-loaded, it is harder to send them over the wire, for example as part of a domain event.

This argument, however, misses the point of having an isolated domain model. Domain entities are not something you should send via a message bus or display on the screen. They shouldn’t leave the boundaries of the model they belong to. If you do need to transfer the information about them outside of the model, that information should be represented using primitive types or DTOs.
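Here is a minimal sketch of that boundary, assuming a hypothetical `OrderPlaced` message. The domain classes keep their references. Ids appear only when the information is flattened into a DTO of primitive types on its way out of the model.

```csharp
using System;

public class Customer
{
    public int Id { get; }
    public string Name { get; }
    public Customer(int id, string name) { Id = id; Name = name; }
}

public class Order
{
    public int Id { get; }
    public Customer Customer { get; } // inside the model: a reference, not an Id
    public DateTime DateOrdered { get; }

    public Order(int id, Customer customer, DateTime dateOrdered)
    {
        Id = id;
        Customer = customer;
        DateOrdered = dateOrdered;
    }
}

// Hypothetical message that leaves the model: primitives only, no entities.
public sealed record OrderPlaced(int OrderId, int CustomerId, string CustomerName, DateTime DateOrdered);

public static class OrderMessages
{
    // Mapping happens at the boundary. The DTO is fully populated here,
    // so nothing has to be lazy-loaded after it is sent.
    public static OrderPlaced ToMessage(Order order) =>
        new OrderPlaced(order.Id, order.Customer.Id, order.Customer.Name, order.DateOrdered);
}
```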

Note that if you don’t use a full-featured ORM and rely on hand-written SQL instead (or use a micro-ORM like Dapper, or a NoSQL database), there’s no way around Ids, and you will have to introduce them into your classes. The guideline above applies only when your ORM supports lazy loading, which provides a good abstraction layer over the backing SQL store.

Update: the Ids described in this article are surrogate Ids: artificial keys you assign to your entities in order to make them unique. There is also the concept of natural Ids, which are an inherent part of the domain model. Natural Ids are not a leaky abstraction, of course. Also, the guideline about refraining from Ids applies only when you model your domain using Entities, Value Objects and Aggregates. When you work with other concepts, such as Repositories, or write code for outer layers, such as the Application Services layer, Ids are the only way to reference your entities and aggregates.
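Here is a minimal sketch of that split. An application service receives Ids from the outside world, resolves them through repositories, and then lets the domain model work with references only. The `IRepository<T>`, `InMemoryRepository<T>`, `OrderService` and `Order.ChangeCustomer` names are hypothetical, chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public int Id { get; }
    public string Name { get; }
    public Customer(int id, string name) { Id = id; Name = name; }
}

public class Order
{
    public int Id { get; }
    public Customer Customer { get; private set; }
    public Order(int id, Customer customer) { Id = id; Customer = customer; }

    // Hypothetical domain operation, expressed in terms of objects rather than Ids.
    public void ChangeCustomer(Customer newCustomer) =>
        Customer = newCustomer ?? throw new ArgumentNullException(nameof(newCustomer));
}

// Hypothetical repository abstraction: Ids are its natural currency.
public interface IRepository<T>
{
    T GetById(int id);
}

public class InMemoryRepository<T> : IRepository<T>
{
    private readonly Dictionary<int, T> _items = new Dictionary<int, T>();
    public void Add(int id, T item) => _items[id] = item;
    public T GetById(int id) => _items[id];
}

// Hypothetical application service: it translates Ids into references,
// then delegates to the domain model.
public class OrderService
{
    private readonly IRepository<Order> _orders;
    private readonly IRepository<Customer> _customers;

    public OrderService(IRepository<Order> orders, IRepository<Customer> customers)
    {
        _orders = orders;
        _customers = customers;
    }

    public void ChangeOrderCustomer(int orderId, int newCustomerId)
    {
        Order order = _orders.GetById(orderId);
        Customer customer = _customers.GetById(newCustomerId);
        order.ChangeCustomer(customer);
    }
}
```

Ids stop at `OrderService`. Everything below it, `Order.ChangeCustomer` included, deals with objects.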

Summary

We are so used to identifiers that we don’t see any harm in using them. However, in a domain model, Ids are a leaky abstraction, and it’s a good idea to keep in mind that the only reason we use them is that we need to persist our domain model in a database. In a pure OOP world, objects don’t relate to each other through identifiers; they just keep a direct reference to the related instance.

Try to represent a relationship between aggregates in a pure way whenever possible, and keep the use of Ids to the bare minimum.