Entity Identity vs Database Primary Key



Today, we’ll discuss the difference between identity in the DDD sense and database primary keys. We often conflate the two, but are they really the same thing?

Entity Identity

In the context of DDD, identity is something inherent to an entity. Only entities have it; it’s something that uniquely identifies them among all other entities.

For example, we consider two people different regardless of the “attributes” they possess. It could be, by sheer coincidence, that they have the same name, age, address, etc. We still wouldn’t treat them as the same person because of that. Each human has an inherent identity that differentiates them from other people.

But what is this identity, exactly?

The thing is, you can’t pinpoint it. You cannot tell what comprises someone’s identity. And that’s by design. The reason stems from two characteristics of the concept of identity: each identity must be immutable and globally unique.

That’s how we, humans, operate with objects we need to keep track of. We assign an intangible, unique label to each of them and then use that label to identify that object among all others.

That’s how you keep track of people too. A person could undergo a huge transformation (for example, grow up from a kid into a fully developed adult) and you still know that it’s the same individual. That’s because when you first meet that person, you assign them a unique identity, which doesn’t change even when the person transforms drastically.

Because of these two characteristics – immutability and uniqueness – you can’t really use any of the entity’s natural attributes as its identity. Those attributes tend to change over time. And they tend not to be unique either.

Want to use the person’s full name as their identity? Nope, names are not unique. Social security number (SSN)? Nope, those can change.

Of course, when it comes to domain modeling, you need to remember that any model is just that – a model. A simplified representation of the real world, sophisticated just enough to be useful. You don’t need to reflect the full complexity of the problem domain in your model.

And so, it could be that within some particular domain model, it’s just fine to use a person’s SSN (or email) as their identity. But those tend to be quite simple models. If you work on anything even moderately complex, you will inevitably run into issues trying to reconcile a change in the supposedly immutable identity. And all complex systems experience such changes. Consider the case of an incorrectly entered email: correcting it should not change who the person is.

So, in general, natural attributes (attributes that come from the real world) are not a good way to represent an entity’s identity. The identity must be something intangible: something you create artificially and assign to the entity, so that you can ensure it is both unique and immutable.
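To make the contrast concrete, here is a minimal Python sketch. The classes `PersonKeyedByEmail` and `Person` are hypothetical names used only for illustration: the first treats a natural attribute (the email) as the identity, the second carries an artificial identifier, assumed here to be a UUID generated at creation time, and compares entities by that identifier alone.

```python
import uuid
from typing import Optional


class PersonKeyedByEmail:
    """Hypothetical entity that uses a natural attribute (email) as its identity."""

    def __init__(self, email: str, name: str) -> None:
        self.email = email
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, PersonKeyedByEmail) and self.email == other.email

    def __hash__(self) -> int:
        return hash(self.email)


class Person:
    """Hypothetical entity whose identity is an artificial, immutable ID."""

    def __init__(self, email: str, name: str, person_id: Optional[uuid.UUID] = None) -> None:
        # The ID is assigned once, at creation (or passed in when restoring),
        # and is never reassigned afterwards.
        self._id = person_id if person_id is not None else uuid.uuid4()
        self.email = email
        self.name = name

    @property
    def id(self) -> uuid.UUID:
        return self._id

    def __eq__(self, other: object) -> bool:
        # Attributes are ignored entirely: only the identity matters.
        return isinstance(other, Person) and self._id == other._id

    def __hash__(self) -> int:
        return hash(self._id)


# Two snapshots of the same person, before and after fixing a typo in the email.
before = PersonKeyedByEmail("jon@exmaple.com", "Jon")
after = PersonKeyedByEmail("jon@example.com", "Jon")
print(before == after)  # False: the typo fix "created" a different person

jon = Person("jon@exmaple.com", "Jon")
jon_fixed = Person("jon@example.com", "Jon", person_id=jon.id)
print(jon == jon_fixed)  # True: same identity despite the changed email

# Two different people who happen to share every attribute.
print(Person("jon@example.com", "Jon") == Person("jon@example.com", "Jon"))  # False
```

Generating the ID in the application is just one option for this sketch; what matters is that the identifier is created once, owned by the entity, and independent of its attributes.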

Database Primary Key

What about the database primary key? How is it related to the concept of identity?

In no way. It is a completely separate concept, not related to DDD or domain modeling.

But here’s the thing. You need to persist your entities somehow. And not only persist but also restore them later and be able to keep track of their identity. Which means that the identity must not change during this persist-restore cycle.

And what database feature allows you to do that best? That’s right, primary keys.

It turns out that the database primary key is a good approximation of the concept of identity from Domain-Driven Design. Databases provide durability, which ensures the identity doesn’t change after you persist it. And primary keys in particular help achieve uniqueness across all identities in your system. The row’s table name plus its primary key make for a great implementation of the concept of identity.
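A minimal sketch of that persist-restore cycle, using Python’s built-in `sqlite3` module and an in-memory database. The `person` table, its columns, and the `save`/`load` helpers are assumptions made for illustration, not a prescribed schema; the point is that the pair (table name, primary key) survives the round trip unchanged, and that two rows with identical attributes still get distinct identities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)


def save(name: str, email: str) -> tuple:
    """Persist a new person; return its identity as (table name, primary key)."""
    cur = conn.execute("INSERT INTO person (name, email) VALUES (?, ?)", (name, email))
    conn.commit()
    return ("person", cur.lastrowid)


def load(identity: tuple) -> dict:
    """Restore a person from its (table name, primary key) identity."""
    table, pk = identity
    if table != "person":
        raise ValueError(f"unknown table: {table}")  # this sketch knows one table only
    row = conn.execute("SELECT id, name, email FROM person WHERE id = ?", (pk,)).fetchone()
    return {"id": row[0], "name": row[1], "email": row[2]}


alice = save("Alice", "alice@example.com")
twin = save("Alice", "alice@example.com")  # identical attributes, different person
print(alice, twin)        # ('person', 1) ('person', 2)
print(load(alice)["id"])  # 1: the identity survived the persist-restore cycle
```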

You might have heard the guideline to never use natural primary keys when designing a database. Martin Fowler wrote about it in his Patterns of Enterprise Application Architecture. Now you know where this guideline comes from.

Natural primary keys are not a good fit for representing an identity precisely because they tend to change over time (if only by mistake) and might not be that unique after all. Surrogate primary keys deal with these issues much better. You can assign them once when creating a record and then keep them immutable, even after modifying such foundational properties as email or SSN. It’s also very easy to ensure their uniqueness, as there are no external constraints on what those keys should look like.
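The sketch below, again using Python’s `sqlite3` with a hypothetical `customer`/`orders` schema, shows what a surrogate key buys you: correcting a mistyped email is a single-column update, and the order that references the customer by its surrogate `id` is unaffected. Uniqueness of the email can still be enforced with a separate `UNIQUE` constraint; it just isn’t the identity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,   -- surrogate key: assigned once, never changes
        email TEXT NOT NULL UNIQUE   -- natural attribute: unique, but mutable
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );
    """
)

cur = conn.execute("INSERT INTO customer (email) VALUES (?)", ("jon@exmaple.com",))
customer_id = cur.lastrowid
conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))

# Fix the typo: only the attribute changes; the identity (id) stays put.
conn.execute("UPDATE customer SET email = ? WHERE id = ?", ("jon@example.com", customer_id))

# The order still points at the same customer, now with the corrected email.
row = conn.execute(
    "SELECT c.id, c.email FROM orders o JOIN customer c ON c.id = o.customer_id"
).fetchone()
print(row)  # (1, 'jon@example.com')
```

Had `email` been the primary key, the same fix would also have required updating the referencing column in `orders` (or declaring the foreign key with `ON UPDATE CASCADE`).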

Value Objects and Identity

Let’s also mention value objects here. Now, I hear you saying: don’t value objects have no identity? That’s true, but there’s another way to view this: the identity of a value object spans all its attributes.

But isn’t there a contradiction? Didn’t I just tell you that natural properties don’t make for a good identity?

There’s no contradiction here because we don’t need to keep track of value objects. They are immutable. As soon as you need to introduce a change in a value object, you create a new one and replace the old one with it.
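In Python, a frozen dataclass is one convenient way to get this behavior. The `Address` value object and the `Customer` entity below are hypothetical; the sketch shows that the value object cannot be modified in place, so a change means building a new instance (here with `dataclasses.replace`) and swapping it into the entity.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Hypothetical value object: immutable, no identity field of its own."""
    street: str
    city: str


class Customer:
    """Hypothetical entity that holds a value object."""

    def __init__(self, customer_id: int, address: Address) -> None:
        self.id = customer_id
        self.address = address

    def move_to_street(self, street: str) -> None:
        # Don't mutate the value object; create a new one and replace the old one.
        self.address = dataclasses.replace(self.address, street=street)


customer = Customer(1, Address("1 Main St", "Springfield"))
old_address = customer.address

try:
    old_address.street = "2 Elm St"  # frozen dataclass: raises FrozenInstanceError
except dataclasses.FrozenInstanceError:
    print("value objects are immutable")

customer.move_to_street("2 Elm St")
print(customer.address)  # Address(street='2 Elm St', city='Springfield')
print(old_address)       # Address(street='1 Main St', city='Springfield')
```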

Because value objects are immutable, we automatically fulfill the first requirement of Identity: immutability.

What about uniqueness? How do we achieve it?

Well, remember that value objects are interchangeable. We can freely replace one with another as long as their attributes match. And there’s also another way to view this: if two value objects are interchangeable, it means they are the same. They have the same identity. Otherwise, you wouldn’t be able to use one in place of the other (just as you can’t use one John in place of another John unless they are the same person).

So, each value object is unique by definition. Which gives us the other component of Identity.

Therefore: Value Object == Its Properties == Identity
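The same idea in code: with a frozen dataclass, Python generates `__eq__` and `__hash__` from all fields, so two instances with matching attributes are equal and interchangeable, for example as dictionary keys. `Money` is a hypothetical value object used only for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Hypothetical value object: its attributes *are* its identity."""
    amount: Decimal
    currency: str


a = Money(Decimal("10.00"), "USD")
b = Money(Decimal("10.00"), "USD")
c = Money(Decimal("10.00"), "EUR")

print(a == b, a is b)  # True False: distinct objects, same identity
print(a == c)          # False: one differing attribute means a different value

# Interchangeable: b can stand in for a, e.g. when looking up a dict key.
prices = {a: "ten dollars"}
print(prices[b])       # ten dollars
```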

Entity Identity vs Database Primary Key: Summary

  • An entity’s Identity is intangible
  • Identity is immutable and globally unique
  • Don’t use natural attributes as the entity’s Identity. They change over time and they might not be unique
  • Database primary keys are a good approximation for the concept of Identity
  • Don’t use natural primary keys. Surrogate primary keys are the best fit for entities’ identities
  • The attributes of a Value Object are its own identity

  • Michael Hodgson

    What are your thoughts on using database-generated IDs (auto-increment) as domain identities, vs. identities generated as part of the application (e.g., GUIDs or systems like Twitter’s Snowflake)?

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      I personally prefer HiLo (built into NHibernate; the EF Core developers said EF should have it by now too). It’s cleaner as it doesn’t require a database roundtrip to get a new ID for an entity (well, it does, but only once every 10 IDs or whatever number you configure, and the call happens independently of saving the object itself, which is a good thing too). For simple scenarios, auto-increment is fine too.

      I don’t quite like GUIDs. They are too clunky to work with (just in terms of manual manipulation), but they might be necessary if you need to rule out enumeration attacks. Raw GUIDs are also slower in high-load scenarios because they cause page splits and primary key rebalancing (you need sequential, or COMB, GUIDs to avoid that).
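The HiLo scheme mentioned in the reply works roughly as follows. This is a simplified Python sketch of the general idea, not NHibernate’s or EF Core’s actual implementation: the application reserves a block of IDs by fetching the next “hi” value from the database once, then hands out “lo” values from that block in memory. The `HiLoGenerator` and `FakeHiTable` classes and the block size of 10 (taken from the reply’s example) are assumptions; in a real setup, fetching the next “hi” would be an atomic read-and-increment in the database. The sketch is not thread-safe.

```python
from typing import Callable


class FakeHiTable:
    """Hypothetical stand-in for a database table holding the next 'hi' value."""

    def __init__(self) -> None:
        self.next_hi = 0
        self.roundtrips = 0

    def fetch_next_hi(self) -> int:
        # A real database would perform this read-and-increment atomically.
        self.roundtrips += 1
        hi = self.next_hi
        self.next_hi += 1
        return hi


class HiLoGenerator:
    """Simplified HiLo ID generator (illustrative sketch only)."""

    def __init__(self, fetch_next_hi: Callable[[], int], block_size: int = 10) -> None:
        self._fetch_next_hi = fetch_next_hi  # one roundtrip per block of IDs
        self._block_size = block_size
        self._hi = 0
        self._lo = block_size  # forces a fetch on the first call

    def next_id(self) -> int:
        if self._lo >= self._block_size:
            # Block exhausted: reserve a new one. Only this call hits the database.
            self._hi = self._fetch_next_hi()
            self._lo = 0
        new_id = self._hi * self._block_size + self._lo + 1
        self._lo += 1
        return new_id


table = FakeHiTable()
gen = HiLoGenerator(table.fetch_next_hi, block_size=10)
ids = [gen.next_id() for _ in range(25)]
print(ids[:3], ids[-3:])  # [1, 2, 3] [23, 24, 25]
print(table.roundtrips)   # 3

# A second generator sharing the same table gets a disjoint block.
other = HiLoGenerator(table.fetch_next_hi, block_size=10)
first_other = other.next_id()
print(first_other)        # 31
```

Because the ID is known as soon as `next_id()` returns, an entity can receive its identity at creation time, before it is ever saved.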