Link to an aggregate: reference or Id?

By Vladimir Khorikov

In this post, I write about 2 ways of representing a link to an aggregate.

A relationship between aggregates

When it comes to encoding a relationship that goes from one aggregate to another, you basically have two choices: either use the identifier of the target aggregate or a direct reference to the aggregate itself. So, for example, if you have two classes – Order and Customer – you can depict the relation between them in the following way:

public class Order // Root of its own aggregate
{
    public int Id { get; private set; }
    public int CustomerId { get; private set; } // Reference to a customer
    public DateTime DateOrdered { get; private set; }
}

public class Customer // Root of its own aggregate
{
    public int Id { get; private set; }
    public string Name { get; private set; }
}

Here, the relation between Order and Customer is represented using the CustomerId property.

You can go a step further and apply the Identity pattern – represent each identifier with a custom type, like this:

public class CustomerId
{
    public int Id { get; private set; }

    public CustomerId(int id)
    {
        Id = id;
    }
}

public class Order
{
    public int Id { get; private set; }
    public CustomerId CustomerId { get; private set; } // Reference to a customer
    public DateTime DateOrdered { get; private set; }
}

public class Customer
{
    public CustomerId Id { get; private set; }
    public string Name { get; private set; }
}

This gives you a nice, strongly typed notation, which is a good thing as it helps you avoid primitive obsession. That is the approach Vaughn Vernon proposes in his Modeling Aggregates with DDD and Entity Framework article.
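
To illustrate, here is a hypothetical example (OrderId and ShippingService are made up purely for this sketch): with plain int parameters, accidentally swapping an order id and a customer id still compiles, whereas with typed identifiers it becomes a compile-time error.

public class OrderId // hypothetical counterpart of CustomerId
{
    public int Id { get; private set; }

    public OrderId(int id)
    {
        Id = id;
    }
}

public class ShippingService // hypothetical service, for illustration only
{
    // Calling Ship(customerId, orderId) with the arguments swapped no longer compiles.
    public void Ship(OrderId orderId, CustomerId customerId)
    {
        /* … */
    }
}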

The other way of representing a connection from one entity to another is to keep a link to the whole aggregate:

public class Order
{
    public int Id { get; private set; }
    public Customer Customer { get; private set; }
    public DateTime DateOrdered { get; private set; }
}

public class Customer
{
    public int Id { get; private set; }
    public string Name { get; private set; }
}

In this version, the entities don’t contain any identifiers except their own Ids.

Link to an aggregate: reference or Id?

So, what approach to choose? Should entities have a reference to the aggregates they are related to or should they just keep an identifier instead? To answer this question, we need to refer to the principle of Persistence Ignorance.

Persistence Ignorance holds that you should separate your domain model from the underlying persistence storage. Classes that you use for modeling your business domain shouldn’t be impacted by how they are stored in the database. Ideally, their design should be as close as possible to the design needed to solve the problem at hand, without taking into consideration persistence concerns.

The best way to check whether you follow this principle is to compare your current design with the design you would have ended up with if saving and retrieving your domain objects were not an issue. In other words, think about how you would solve the exact same problem if you could forgo persistence concerns altogether.

Would you still use identifiers? Probably not. Indeed, why would you if all the objects you work with reside in memory?

And that is exactly why, by default, you should prefer a reference to the related entity itself over its identifier. Ids are a leaking abstraction we have to deal with only because of the need to persist entities in a backing store. A direct link from one entity to another represents the relationship exactly the way we would represent it in an OOP language with no database involved, and that is what we should aim for if we want to create a highly isolated, persistence-ignorant domain model. Fortunately, with ORMs such as NHibernate and Entity Framework, this is easy to implement thanks to lazy loading.
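
As a rough sketch of what this can look like with NHibernate or Entity Framework lazy-loading proxies (the virtual members and the parameterless constructor are infrastructure requirements of the proxy mechanism, not domain concerns; the Customer class from the listing above would need the same treatment, and the mapping configuration itself is omitted):

using System;

public class Order // Root of its own aggregate
{
    // Virtual members let the ORM generate a runtime proxy that loads
    // the related Customer lazily, on first access.
    public virtual int Id { get; protected set; }
    public virtual Customer Customer { get; protected set; }
    public virtual DateTime DateOrdered { get; protected set; }

    protected Order() { } // parameterless constructor for the ORM

    public Order(Customer customer, DateTime dateOrdered)
    {
        Customer = customer;
        DateOrdered = dateOrdered;
    }
}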

As a general guideline, try to limit the use of identifiers in your domain model to the bare minimum. Ideally, every entity should have only a single Id – its own. You can extract it into a base class so that your domain entities don’t have to deal with it at all. If you need to portray a relationship between two entities, just use a direct link from one to the other.

Persistence Ignorance is not the only reason why you should refrain from using identifiers. In some cases, their usage also violates encapsulation. Instead of writing code like this:

if (order1.Id == order2.Id)
{
    /* … */
}

Compare the objects using the Equals method:

if (order1.Equals(order2))
{
    /* … */
}

Or, even better, using the equality operator:

if (order1 == order2)
{
    /* … */
}

Of course, you will need to define equality members for your entities in order to be able to do that.
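
For example, a minimal sketch of such equality members on a common Entity base class could look like this (it assumes an integer surrogate Id; if your ORM wraps entities in proxies, you would also compare the underlying, unproxied types rather than the proxy types):

public abstract class Entity
{
    public virtual int Id { get; protected set; }

    public override bool Equals(object obj)
    {
        var other = obj as Entity;

        if (ReferenceEquals(other, null))
            return false;
        if (ReferenceEquals(this, other))
            return true;
        if (GetType() != other.GetType()) // with ORM proxies, compare the unproxied types instead
            return false;
        if (Id == 0 || other.Id == 0) // transient (not yet persisted) entities are never equal
            return false;

        return Id == other.Id;
    }

    public override int GetHashCode()
    {
        return (GetType().ToString() + Id).GetHashCode();
    }

    public static bool operator ==(Entity a, Entity b)
    {
        if (ReferenceEquals(a, null) && ReferenceEquals(b, null))
            return true;
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null))
            return false;

        return a.Equals(b);
    }

    public static bool operator !=(Entity a, Entity b)
    {
        return !(a == b);
    }
}

With Order and Customer deriving from this base class, the comparisons above work by identity rather than by reference.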

A counter-argument I sometimes hear in favor of using Ids instead of references is that Ids make it easier to work with entities in a detached mode: if all your related entities are lazy-loaded, it is harder to send them over the wire, for example as part of a domain event.

This argument, however, misses the point of having an isolated domain model. Domain entities are not something you should send via a message bus or display on the screen. They shouldn’t leave the boundaries of the model they belong to. If you do need to transfer the information about them outside of the model, that information should be represented using primitive types or DTOs.
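
For example, a minimal sketch of such a DTO and the mapping at the model boundary (OrderDto and OrderMapper are names introduced purely for illustration):

using System;

public class OrderDto
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; } // primitives are fine outside the domain model
    public string CustomerName { get; set; }
    public DateTime DateOrdered { get; set; }
}

public static class OrderMapper
{
    // Flattens the aggregate into primitives at the boundary of the model.
    public static OrderDto ToDto(Order order)
    {
        return new OrderDto
        {
            OrderId = order.Id,
            CustomerId = order.Customer.Id,
            CustomerName = order.Customer.Name,
            DateOrdered = order.DateOrdered
        };
    }
}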

Note that if you don’t use a big ORM and rely on hand-written SQL instead (or use a micro-ORM like Dapper, or employ a NoSQL DB), there’s no way around using Ids, and you will have to introduce them to your classes. The guideline described above applies only when your ORM supports lazy loading, which gives a nice abstraction layer over the backing SQL store.
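
For illustration, here is a rough sketch of what that case looks like with Dapper; the Orders table and its columns are assumptions made up for the example, and Order here is the Id-carrying variant from the beginning of the post:

using System;
using System.Data;
using Dapper;

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; } // the foreign key is back: there is no proxy to hide it
    public DateTime DateOrdered { get; set; }
}

public class OrderRepository
{
    private readonly IDbConnection _connection;

    public OrderRepository(IDbConnection connection)
    {
        _connection = connection;
    }

    public Order GetById(int id)
    {
        // Assumed schema: Orders(Id, CustomerId, DateOrdered)
        return _connection.QuerySingleOrDefault<Order>(
            "SELECT Id, CustomerId, DateOrdered FROM Orders WHERE Id = @id",
            new { id });
    }
}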

Update: The Ids described in this article are surrogate Ids: artificial keys you assign to your entities in order to make them unique. There is also the concept of natural Ids, which are an inherent part of the domain model. Natural Ids are not a leaky abstraction, of course. Also, the guideline about refraining from using Ids applies only when you model your domain using Entities, Value Objects, and Aggregates. When you work with other concepts, such as Repositories, or when you write code for outer layers, such as the Application Services layer, Ids are the only way you can reference your entities and aggregates.

Summary

We are so used to identifiers that we don’t see any harm in using them. However, when employed in a domain model, Ids are a leaking abstraction, and it’s a good idea to keep in mind that the only reason we use them is that we need to persist our domain model in a database. In a pure OOP world, objects don’t relate to each other with identifiers; they just keep a direct reference to the related instance.

Try to represent relationships between aggregates in a pure way whenever possible; keep the use of Ids to the bare minimum.

  • John Smith

    In my opinion, references are also leaking abstractions. And it is more error-prone because it is difficult to see problems like “select N + 1”.

    In the _real world_ an `order` is related to a `customer` in an indirect way: the customer’s name (or even their code) is in the `order`.

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      You are right, we need to take the database concerns into account anyway. However, I don’t think references are more error-prone in that sense. Such issues as the N+1 problem can be handled by repositories; domain entities themselves shouldn’t be responsible for dealing with them.

      • John Smith

        When you have lazy load, you are opening the door to N+1. If the repository handles this problem, it should not be possible to lazy load outside the repository.

        So, depending on the need, your repository could return orders with Customers and orders without Customers (partially initialised entities).

        • http://enterprisecraftsmanship.com/ Vladimir Khorikov

          So, depending on the need, your repository could return orders with Customers and orders without Customers (partially initialised entities).

          You don’t have to do that in order to solve the problem. Your repository just needs to have a method that returns all Orders with their Customers pre-fetched; there’s no need to fall back on partially initialised entities here.

          So basically you need to choose which fetching strategy to use in each particular situation. If a business operation requires batch order processing, then load everything in one shot using pre-fetching. Something like this:


          public IReadOnlyList<Order> GetAll()
          {
              return _session.Query<Order>()
                  .Fetch(x => x.Customer)
                  .ToList();
          }

          If you need to update a single order, then rely on lazy load and let the ORM make 2 requests to the database: one for the order itself, and the other one for its related customer.

          Using lazy load doesn’t mean we fall into the partially initialized entities anti-pattern, because lazy load either loads everything or doesn’t load anything at all.

  • Naeem Sarfraz

    I get the point about Persistence Ignorance but can’t get past the fact that if I am hydrating the Order aggregate, I also need to hydrate the Customer aggregate. Using the Ids approach, I could get a Customer only if I needed it.

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      Not at all, with lazy loading you basically combine the benefits of both approaches: the corresponding Customer is loaded only when you need it. By default, only the CustomerId is loaded along with the Order aggregate (you can access it via order.Customer.Id, and that won’t trigger fetching of the full Customer aggregate).
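
      Roughly like this (a sketch only: sessionFactory and orderId are placeholders, and Order/Customer are assumed to be mapped with lazy, virtual members):

      using NHibernate;

      public static class LazyLoadingExample
      {
          public static string GetCustomerName(ISessionFactory sessionFactory, int orderId)
          {
              using (ISession session = sessionFactory.OpenSession())
              {
                  Order order = session.Get<Order>(orderId); // one SELECT, for the order row only

                  int customerId = order.Customer.Id;  // read from the lazy proxy, no extra query
                  string name = order.Customer.Name;   // first access to a non-Id member triggers the SELECT for the customer

                  return name;
              }
          }
      }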

      Of course, this is applicable only if you use an ORM that supports lazy loading. There’s no way around using Ids in other cases.

      • Naeem Sarfraz

        Yes I see. Are we not running the risk of having many partially initialised entities? A topic you wrote about recently.

        • http://enterprisecraftsmanship.com/ Vladimir Khorikov

          A good part about lazy loading is that it works as an “all or nothing” mechanism. You either load the full entity or you don’t load it at all, so you don’t get partially initialized entities with this approach.

          • http://mdbs99.com Marcos Douglas Santos

            If we are using lazy loading, can we say that we are not using immutable objects?

          • http://enterprisecraftsmanship.com/ Vladimir Khorikov

            Technically, objects do change internally when they are being loaded lazily, but that change cannot be observed from the outside. So, if a class itself is immutable, we can say that even if its objects are loaded lazily, they are still immutable for us. This kind of immutability is called observable immutability.

            Jon Skeet gave a nice talk about different kinds of immutability here:

  • http://alexzaytsev.me hazzik

    Identifiers are not leaky abstractions. There are two types of identifiers: natural and surrogate IDs. A natural ID is used to identify your entities from the outside world, and a surrogate ID is used to identify your entities internally. People often tend to use surrogate IDs as natural IDs and expose them to the public.

    What you are talking about here are surrogate keys. Yes, they can be omitted in the domain model, and then your points become valid. But that does not make them a leaky abstraction.

    So, I do use IDs without databases, to reference my entities from the outside.

    Also, it is useful to use IDs when the domain is so huge that it sits in separate assemblies, or when we use very slim domains, or in cross-cutting or modular applications.

    For example, any application with users.

    You will not write `user.Orders` to get all orders of a user, because if you do, the User class would be flooded with irrelevant properties and, worse, it would be highly coupled with low cohesion.

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      Good points. Indeed, I was talking about surrogate Ids specifically; natural Ids are part of the domain itself. Also, the article applies only to the core part of the domain layer. When you leave the boundaries of that part, for example when you load an entity in an Application Services layer (a controller in ASP.NET MVC, a View Model in WPF, etc.), Ids are the only way you can refer to it.

      However, when you work with entities, value objects, and aggregates, you shouldn’t operate on surrogate Ids at all.

  • Alexander Yevseyev

    @vladimirkhorikov:disqus,
    The topic you describe is very important. All DDD speakers say that we should have a separate domain model and that it should be indifferent to the persistence storage. Anyway, as you have mentioned, that is not always possible.
    But something is still somewhat obscure to me. If we use ORMs with lazy loading, then we have to apply them directly to our domain model and add ORM attributes to classes (as in EF, NHibernate) or use other mapping configuration approaches.
    But as you say, our domain model doesn’t always correspond to the DB model (and shouldn’t). In that case we need to separate the domain and DB models in code. We reference the Domain assemblies from our DAL and implement repositories that fill in domain objects using the ORM and its DB objects (which correspond to tables). We are not leaking the ORM into the domain layer, but what’s the deal with lazy loading in that case? We need to construct complete domain entities, not “partially-initialized” ones, so we need to load all data at once for our aggregates. Lazy loading won’t help. Or am I incorrect?

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      Hi Alexander,

      The main problem that lazy loading solves is that when your entities are connected to each other directly (not by Ids), the depth of the object graph you need to load differs for each business transaction. For example, if an operation that promotes a user needs to know the user’s company email, you will need to load both the user and the company instances at the same time. For another operation, such as a user update, you don’t need to know the company, because, say, you only need to update the user’s name, so you load just the user itself. Such differences lead to a situation where your repositories need to know the depth of the object graph upfront, which, in turn, usually leads to the creation of several very similar methods in the repository, like LoadById(), LoadByIdWithCompany(), etc. With lazy loading, you don’t have to do that, because related entities are loaded just in time, when you actually need them.
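
      To illustrate (the User/Company classes and the repository names below are just a sketch):

      public class Company
      {
          public virtual int Id { get; protected set; }
          public virtual string Email { get; protected set; }
      }

      public class User
      {
          public virtual int Id { get; protected set; }
          public virtual string Name { get; protected set; }
          public virtual Company Company { get; protected set; } // a lazy proxy when loaded by the ORM
      }

      // Without lazy loading, the repository has to know the graph depth upfront,
      // so near-duplicate methods start to pile up:
      public interface IUserRepositoryWithoutLazyLoading
      {
          User LoadById(int id);            // user only
          User LoadByIdWithCompany(int id); // user + company, eagerly fetched
      }

      // With lazy loading, a single method is enough; user.Company is fetched
      // transparently the first time a business operation touches it:
      public interface IUserRepository
      {
          User GetById(int id);
      }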

      Lazy loading doesn’t lead to partially initialized entities because, while the entities technically aren’t loaded fully, you can’t observe that: whenever you access a property of such an entity, it acts as if it were already in memory.

      You touched upon a very interesting topic: whether or not to create a separate DB model (DAL) when working with an ORM. It has been in my backlog for a while, and I’m going to write a separate post on this subject in the near future. In short, I personally think you shouldn’t separate the domain model from the DAL. An ORM such as NH or EF already gives us a nice abstraction layer over the DB structure (NH does a better job here, admittedly), so further isolating your business logic from DB concerns doesn’t pay off: it requires quite a lot of work and doesn’t provide much benefit in return.

      It would be nice to hear why you made that decision. Did your ORM not fulfil some of your requirements? What was the ORM, by the way?

      • Alexander Yevseyev

        Our current project concerns engineering with lots of calculations, so we have, I would say, “fat” entities with many necessary value fields, and in order to do the calculations we usually need most of them. Of course, there are connections between entities, and loading the whole object graph is expensive. So we have gradually moved to a combination of NHibernate and Dapper. The former is used for complex user actions, and the latter is used in conjunction with stored procedures for data showcases. SPs allow us to have faster queries, and Dapper transforms the results into DTOs which actually correspond to ViewModels.
        So my thought was to move completely to a micro-ORM (Dapper, SqlFu) for the DAL, where I can fill Domain Aggregates for commands. Anyway, as DDD states, the domain model doesn’t always correspond to the persistence model and table structure. And in our case that is so.

        • http://enterprisecraftsmanship.com/ Vladimir Khorikov

          Ah, I see. So right now you are using NHibernate for the write model and Dapper for the read model. That, BTW, is a quite popular combination, and a powerful one too.

          I don’t think you will benefit from moving your write model to a micro-ORM. There is a trade-off between performance and ease of programming against your database, and unless the performance requirements in your system are really tough, I would recommend choosing the latter by default. A micro-ORM (or hand-written SQL) gives a lot of benefits in terms of performance, but that usually goes against ease of use. Also, the benefits a micro-ORM provides are usually not as big in write models compared to read ones.

          I’m not familiar with your code base, so this might not apply to you, but with a micro-ORM, people often end up writing a lot of functionality on their own that already exists in NHibernate. I personally find that the mapping capabilities of NH cover most needs, even if the DB structure and the domain model do not correspond. But again, that might not be the case in your situation.

          Here I wrote on this topic in more detail: http://enterprisecraftsmanship.com/2015/11/30/do-you-need-an-orm/

          • Alexander Yevseyev

            Yes, it’s a kind of CQRS we have. I don’t really want to completely rewrite our “write” model using a micro-ORM.
            Returning to my original question, let’s assume I have an IRepository declared in the domain, and domain logic classes use it to retrieve entities (aggregates). The Repository (implementing IRepository) is declared in the DAL and uses its own persistence model to fill in the domain entity (aggregate). Will lazy loading help? Or will calls from the domain logic accessing aggregate sub-entities propagate calls to the DAL to lazily load them?

          • http://enterprisecraftsmanship.com/ Vladimir Khorikov

            In this case, the ORM will handle the calls to the aggregate’s sub-entities and will load them the first time you try to access them. You don’t need to do it yourself, so the DAL becomes much simpler because of that.

            Not sure if this answers your question, though. Let me know if I understood it incorrectly.

          • Alexander Yevseyev

            Thank you. I think that’s exactly the answer I expected to get.

  • Venkat Raj

    What do you mean by:

    Of course, you will need to define equality members for your entities in order to be able to do that.

    Please expand.

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      I mean that in order for this code to work

      if (order1 == order2)

      you need to define a custom “operator==” method, because otherwise the “==” comparison will compare the objects by reference. I wrote about how to do that here: http://enterprisecraftsmanship.com/2014/11/08/domain-object-base-class/