In Defense of Lazy Loading



In this article, we’ll talk about ORMs and lazy loading.

I don’t know how this happened but for the last couple years (at least), whenever I read an author who writes about ORMs, I often see a sentiment like this: “ORMs are fine, just make sure you disable this pesky feature called Lazy Loading”.

It’s like this feature is not even needed and only brings confusion and performance issues to everyone who chooses to use it. Well, as you may guess from the title of this article, I disagree with this point of view completely.

But let’s take an example and review the common criticism first.

Let’s say we have the following domain model:
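The model itself is not reproduced in this copy of the article. Going by the entity names used in the rest of the text (Student, Enrollment, Course, SportsActivity, Sports), a minimal C# sketch might look like the one below. The Id and Name properties are assumptions added for illustration, and the members are virtual because proxy-based lazy loading in ORMs such as NHibernate works by overriding them.

```csharp
using System.Collections.Generic;

// Hypothetical reconstruction: only the entity and navigation names come from the text.
public class Student
{
    public virtual long Id { get; set; }
    public virtual string Name { get; set; }

    // Collections an ORM would normally populate lazily, on first access.
    public virtual IList<Enrollment> Enrollments { get; set; } = new List<Enrollment>();
    public virtual IList<SportsActivity> SportsActivities { get; set; } = new List<SportsActivity>();
}

public class Enrollment
{
    public virtual long Id { get; set; }
    public virtual Course Course { get; set; }
}

public class Course
{
    public virtual long Id { get; set; }
    public virtual string Name { get; set; }
}

public class SportsActivity
{
    public virtual long Id { get; set; }
    public virtual Sports Sports { get; set; }
}

public class Sports
{
    public virtual long Id { get; set; }
    public virtual string Name { get; set; }
}
```

The object graph therefore has three levels: the student, its two collections, and the Course or Sports behind each collection element.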

So, what happens when you need to display a student on a web page and you are not careful about it? With lazy loading enabled, you might get an N+1 problem: you first select the student into memory, then all their enrollments, then the course of each enrollment, and so on until you have traversed the full object graph.

With a student who has 5 enrollments and 5 sports activities, you will get a total of 13 database roundtrips:

  • 1 for Student
  • 1 for Student.Enrollments
  • 5 for Course in each element of Student.Enrollments
  • 1 for Student.SportsActivities
  • 5 for Sports in each element of Student.SportsActivities
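The arithmetic above (1 + 1 + 5 + 1 + 5 = 13) can be checked with a small simulation. This is only a sketch: CountingDb is a hypothetical stand-in for an ORM session, and System.Lazy<T> plays the role that proxies play in a real ORM.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database session: every Query call is one roundtrip.
public sealed class CountingDb
{
    public int Roundtrips { get; private set; }

    public T Query<T>(Func<T> load)
    {
        Roundtrips++;
        return load();
    }
}

public sealed class Course { public string Name = ""; }
public sealed class Sports { public string Name = ""; }

public sealed class Enrollment
{
    private readonly Lazy<Course> _course;
    public Enrollment(CountingDb db, int n) =>
        _course = new Lazy<Course>(() => db.Query(() => new Course { Name = "Course " + n }));
    public Course Course => _course.Value;   // first access costs one roundtrip
}

public sealed class SportsActivity
{
    private readonly Lazy<Sports> _sports;
    public SportsActivity(CountingDb db, int n) =>
        _sports = new Lazy<Sports>(() => db.Query(() => new Sports { Name = "Sports " + n }));
    public Sports Sports => _sports.Value;   // first access costs one roundtrip
}

public sealed class Student
{
    private readonly Lazy<List<Enrollment>> _enrollments;
    private readonly Lazy<List<SportsActivity>> _sportsActivities;

    public Student(CountingDb db, int enrollments, int activities)
    {
        _enrollments = new Lazy<List<Enrollment>>(() => db.Query(() =>
            Enumerable.Range(1, enrollments).Select(n => new Enrollment(db, n)).ToList()));
        _sportsActivities = new Lazy<List<SportsActivity>>(() => db.Query(() =>
            Enumerable.Range(1, activities).Select(n => new SportsActivity(db, n)).ToList()));
    }

    public IReadOnlyList<Enrollment> Enrollments => _enrollments.Value;
    public IReadOnlyList<SportsActivity> SportsActivities => _sportsActivities.Value;
}

public static class RoundtripDemo
{
    // Walks the whole graph the way a careless view would and returns the roundtrip count.
    public static int RenderStudentPage(int enrollments, int activities)
    {
        var db = new CountingDb();
        var student = db.Query(() => new Student(db, enrollments, activities)); // Student
        foreach (var e in student.Enrollments) _ = e.Course.Name;               // 1 + N
        foreach (var a in student.SportsActivities) _ = a.Sports.Name;          // 1 + M
        return db.Roundtrips;
    }
}
```

Calling RoundtripDemo.RenderStudentPage(5, 5) counts one roundtrip for the student, one per collection, and one per collection element, which gives the 13 listed above.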

This is obviously terrible for performance. For a given page view, there should be only one database roundtrip. And so one way to make sure N+1 never happens to you is to disable the Lazy Loading entirely.

This will force you to load all related entities eagerly. If you forget to set up the eager load for, say, the Enrollments collection, you will get a null reference exception when referring to it in the view/controller. With the Lazy Loading disabled, no one will load this collection for you behind the scenes.

So, the problem is solved. Right?

As always, the reality is more nuanced than that. In fact, in some cases, the Lazy Loading provides a performance boost, not performance degradation.

To find out which one that is for your specific situation, you need to analyze the object graph traversal patterns. You need to see what percentage of the graph needs to be loaded in each particular use case, and how often that happens.

Let’s take an example. Let’s say that you have a use case of disenrolling a student from a course. Here’s the code (input validation and domain model encapsulation are omitted for brevity):
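The original listing is not reproduced in this copy. A minimal sketch of what such a handler might look like follows; FakeSession, StudentController, and the CourseId-based lookup are all hypothetical names chosen for illustration, and the fake session only counts roundtrips instead of talking to a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Enrollment
{
    public long CourseId { get; }
    public Enrollment(long courseId) => CourseId = courseId;
}

public sealed class Student
{
    private readonly Lazy<List<Enrollment>> _enrollments;

    public Student(Func<List<Enrollment>> loadEnrollments) =>
        _enrollments = new Lazy<List<Enrollment>>(loadEnrollments);

    // Touches only the Enrollments collection; courses and sports activities
    // are never asked for, so a lazy-loading ORM would never fetch them.
    public bool Disenroll(long courseId)
    {
        var enrollment = _enrollments.Value.FirstOrDefault(e => e.CourseId == courseId);
        return enrollment != null && _enrollments.Value.Remove(enrollment);
    }
}

// Hypothetical in-memory stand-in for an ORM session that counts roundtrips.
public sealed class FakeSession
{
    public int Roundtrips { get; private set; }

    public Student GetStudent(long studentId)
    {
        Roundtrips++;       // SELECT for the Student row
        return new Student(() =>
        {
            Roundtrips++;   // SELECT for the Enrollments, issued on first access only
            return new List<Enrollment> { new Enrollment(1), new Enrollment(2) };
        });
    }
}

public static class StudentController
{
    public static bool Disenroll(FakeSession session, long studentId, long courseId)
    {
        Student student = session.GetStudent(studentId);   // roundtrip 1
        return student.Disenroll(courseId);                // roundtrip 2 (lazy)
    }
}
```

In this sketch, StudentController.Disenroll(session, 1, 2) leaves session.Roundtrips at 2.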

Which parts of the object graph get affected by this code? Only the student itself and its enrollments:

[Figure: Disenrolling a student from a course]

So, with lazy loading, we have a total of 2 database roundtrips here: one for the student and one for its enrollments. That’s more than we would get without lazy loading (just 1), but remember that here we transfer only as much data between the database and the application server as this particular request needs. Without lazy loading, we would need to select everything about the student, including the data from the Course, SportsActivity, and Sports tables, which this use case doesn’t need.

To translate it into SQL, here’s what we’d get with the lazy loading:

And here’s what we would get without one:
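Neither SQL listing survives in this copy. As a rough sketch of the two shapes, here they are as C# string constants; the dbo schema and the StudentID column follow the naming used in the comments below, while every other table and column name is an assumption. A real ORM may well split the eager load into several statements instead of one join.

```csharp
public static class DisenrollQueries
{
    // With lazy loading: two small statements, each issued only when the data is touched.
    public static readonly string[] Lazy =
    {
        "SELECT * FROM dbo.Student WHERE StudentID = @StudentID",
        "SELECT * FROM dbo.Enrollment WHERE StudentID = @StudentID",
    };

    // Without lazy loading: one statement that pulls the whole graph up front.
    // Joining the two sibling collections multiplies rows: the example student
    // with 5 enrollments and 5 sports activities comes back as 5 x 5 = 25 wide rows.
    public const string Eager = @"
SELECT *
FROM dbo.Student s
LEFT JOIN dbo.Enrollment e ON e.StudentID = s.StudentID
LEFT JOIN dbo.Course c ON c.CourseID = e.CourseID
LEFT JOIN dbo.SportsActivity sa ON sa.StudentID = s.StudentID
LEFT JOIN dbo.Sports sp ON sp.SportsID = sa.SportsID
WHERE s.StudentID = @StudentID";
}
```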

So, in spite of doing only 1 database roundtrip, the option without the lazy loading transfers an excessive amount of data between the application and the database.

It’s hard to say which one is better for performance but it’s probably either a wash or the option without the lazy loading is only slightly ahead here. It’s definitely not as decisive of a situation as when choosing between 13 and 1 database roundtrips.

Let’s now take a more sophisticated scenario. Let’s say that you need to enroll a student in a sports activity, but you can only do that if their total debt doesn’t exceed $10,000.

This is the code:
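This listing is also missing from this copy. The sketch below shows one way the rule could be written, assuming the total debt is stored on the student row itself (which is what the single-roundtrip claim further down implies). TotalDebt, EnrollInSports, and FakeSession are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class SportsActivity
{
    public long SportsId { get; }
    public SportsActivity(long sportsId) => SportsId = sportsId;
}

public sealed class Student
{
    private const decimal DebtLimit = 10_000m;
    private readonly Lazy<List<SportsActivity>> _sportsActivities;

    public decimal TotalDebt { get; }   // assumed to live on the Student row

    public Student(decimal totalDebt, Func<List<SportsActivity>> loadSportsActivities)
    {
        TotalDebt = totalDebt;
        _sportsActivities = new Lazy<List<SportsActivity>>(loadSportsActivities);
    }

    public bool EnrollInSports(long sportsId)
    {
        if (TotalDebt > DebtLimit)
            return false;   // rejected: the collection is never touched, so never loaded

        _sportsActivities.Value.Add(new SportsActivity(sportsId));   // triggers the lazy load
        return true;
    }
}

// Hypothetical in-memory stand-in for an ORM session that counts roundtrips.
public sealed class FakeSession
{
    public int Roundtrips { get; private set; }

    public Student GetStudent(decimal totalDebt)
    {
        Roundtrips++;       // SELECT for the Student row, TotalDebt included
        return new Student(totalDebt, () =>
        {
            Roundtrips++;   // SELECT for SportsActivities, only if the debt check passes
            return new List<SportsActivity>();
        });
    }
}
```

In this sketch a rejected student costs one roundtrip and an accepted one costs two.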

Once again, without the lazy loading, the full object graph with every single detail about it will be loaded: the student, their enrollments, courses, and sports activities.

However, with lazy loading, the net balance will depend on how many users get rejected due to the debt restriction:

[Figure: Enrolling in a sports activity]

If students who can afford sports activities are rare, then performance with lazy loading will be better than without it. In the vast majority of cases, there will be only one database call, to retrieve the student itself. That is the same number of roundtrips as without lazy loading, but the amount of data transferred from the database will be much smaller, which by itself provides a significant performance boost.

Separating read and write models

In both use cases above, the option with lazy loading shows either the same or better performance than the option without it.

Can you see what is common between them? They both modify the application’s state. They are commands in the CQRS taxonomy.

And it turns out that this is a common trait among all writes to the application. They don’t produce as many database roundtrips as reads, and in some of them the code with lazy loading performs even better than the code without it.

You could fine-tune your code without the lazy loading to retrieve only the data needed for a particular use case but that usually leads to a disaster in terms of code complexity. What you end up with is code like this:
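The listing is missing from this copy. As an illustration of the kind of code meant here, consider a repository that grows one hand-tuned loading method per use case. Every name below is hypothetical.

```csharp
// Stub domain class; the real one would carry the collections and business rules.
public class Student
{
    public void Disenroll(long courseId) { /* domain logic omitted */ }
}

// Hypothetical repository without lazy loading: each use case needs its own
// method that eagerly loads exactly the slice of the graph it will touch.
public interface IStudentRepository
{
    Student GetById(long id);
    Student GetByIdWithEnrollments(long id);
    Student GetByIdWithEnrollmentsAndCourses(long id);
    Student GetByIdWithSportsActivities(long id);
    Student GetByIdWithSportsActivitiesAndSports(long id);
    Student GetByIdWithEverything(long id);
}

public static class DisenrollHandler
{
    // The caller has to know which part of the graph the domain method uses.
    // Picking a method that loads too little fails at runtime, and picking
    // one that loads too much wastes bandwidth.
    public static void Handle(IStudentRepository repository, long studentId, long courseId)
    {
        Student student = repository.GetByIdWithEnrollments(studentId);
        student.Disenroll(courseId);
    }
}
```

With lazy loading, the interface could shrink to the single GetById method, because the ORM fetches the rest on demand.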

When you turn lazy loading on, the code becomes much simpler all of a sudden.

Alright, but what about reads? The criticism of the lazy loading is still valid there, right? We would still get those 13 database roundtrips just to display a single student.

That’s correct. But I rarely see this distinction drawn by other authors. And that’s a crucial distinction: the situation with the lazy loading is not the same for reads and writes in the application. It is harmful in reads. But it’s beneficial in writes (if you consider both performance and code simplicity).

And it’s very easy to have the best of both worlds: just don’t use domain classes in the read model. Write the SQL queries yourself and materialize the data directly into DTOs, either manually or using a lightweight ORM like Dapper.
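A minimal sketch of the manual variant, using only System.Data from the base class library. The DTO shape, the query, and all table and column names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Flat read-side DTO: no domain classes, no navigation properties, nothing to lazy-load.
public sealed class StudentCourseDto
{
    public long StudentId { get; set; }
    public string StudentName { get; set; } = "";
    public string CourseName { get; set; } = "";
}

public static class StudentReadModel
{
    // Handwritten query for one page: a single roundtrip that selects only
    // the columns the UI needs. Table and column names are assumptions.
    public const string Sql = @"
SELECT s.StudentID, s.Name AS StudentName, c.Name AS CourseName
FROM dbo.Student s
INNER JOIN dbo.Enrollment e ON e.StudentID = s.StudentID
INNER JOIN dbo.Course c ON c.CourseID = e.CourseID
WHERE s.StudentID = @StudentID";

    // Manual materialization: one pass over the result set, one DTO per row.
    public static List<StudentCourseDto> Materialize(IDataReader reader)
    {
        var rows = new List<StudentCourseDto>();
        while (reader.Read())
        {
            rows.Add(new StudentCourseDto
            {
                StudentId = Convert.ToInt64(reader["StudentID"]),
                StudentName = (string)reader["StudentName"],
                CourseName = (string)reader["CourseName"],
            });
        }
        return rows;
    }
}
```

The reader would come from executing Sql on an open connection. With Dapper, the loop should collapse to a single call along the lines of connection.Query<StudentCourseDto>(StudentReadModel.Sql, new { StudentID = id }), since Dapper maps columns to properties by name.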

That’s, by the way, the essence of CQRS: the separation of read and write models. And as you can see, even a slight adherence to it will do the job. There is no need for explicit commands and queries, just a simple rule: don’t use domain classes when selecting data for the UI, and write that SQL manually.

Of course, your mileage may vary, and if you have a simple application, you can still use the ORM for both reads and writes. But the above approach is how I structure my projects by default: NHibernate with the lazy loading enabled in writes and handwritten SQL queries with Dapper in reads.

Summary

The criticism of lazy loading lacks nuance. Performance-wise, it’s a balance between the amount of data you load upfront and the number of database roundtrips you’ll need to perform later during the operation. Too many database roundtrips are not good, but too large an upfront cost is not good either.

Let’s summarize:

  • All criticism of the lazy loading boils down to performance issues and the N+1 problem
  • Lazy loading is beneficial in writes (in terms of performance and simplicity)
  • Lazy loading is harmful only in reads
  • The drawbacks of the lazy loading can be overcome by the adherence to CQRS: use lazy loading only in writes, handwrite SQL queries in reads





  • Luís Barbosa

    Hi Vladimir,

    one common approach to avoid multiple GetBy[Something] methods in the repositories, is the Specification Pattern. What do you think about it?

  • Normand Bédard

    Maybe your Student aggregate is too big? Looks like Enrollments and Sports Activities are two distinct models / graphs. Personally, I try to avoid lazy loading as much as possible, and splitting aggregates typically helps reduce the problem of loading object graphs for no reason. However, it sometimes means adding eventual consistency between the new aggregates (maybe for the TotalDept in this case, which would be required for the Sports Activities but “impacted” by the Enrollments)

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      It could very well be, but it depends on the use cases. It’s mainly a trade-off between complexity and software performance/throughput (the larger the aggregates, the easier it is to work with the code, but the fewer requests per second the software can handle). With smaller aggregates, the problems with the code that Lazy Loading helps tackle are not as big.

  • Joseph N. Musser II

    Interesting point. I have one more criticism of lazy-loading that I didn’t see mentioned: it blocks a thread for a potentially arbitrary amount of time rather than awaiting (where there is no thread), so it doesn’t scale as well.

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      It was true for a long time (and still is for EF I believe). Not anymore, though. Here’s what the latest version of NHibernate allows you to do:

      var orders = await customer.Orders
          .AsQueryable()
          .Where(x => x.Amount > 10)
          .ToListAsync();

      More info: https://enterprisecraftsmanship.com/2017/12/11/nhibernate-async-support/

      • Joseph N. Musser II

        EF (and finally NHibernate) both have async eager loading, but this isn’t lazy-loading though. Lazy-loading blocks the thread by design since there’s no await in navigation property traversal.

      • Joseph N. Musser II

        This is more like eager loading than lazy loading, isn’t it? It’s not significantly different from:

        var orders = await dbContext.Orders
            .Where(x => x.CustomerId == customer.Id && x.Amount > 10)
            .ToListAsync();

        • http://enterprisecraftsmanship.com/ Vladimir Khorikov

          Good point. I think it depends on how this code would behave:

          var orders = await customer.Orders
              .AsQueryable()
              .ToListAsync();

          If the collection of orders stays in memory and you can later access it synchronously without an additional load, then it’s true lazy loading. If not, then it’s just syntactic sugar for the query you brought up. I’ll need to check on that. And now that I think of it, it’s most likely the latter.

  • Yuriy

    I’m a bit confused by the example of disenrolling a student from a course without lazy loading. Do we really need selects in this case at all? Couldn’t it be as simple as

    DELETE FROM dbo.SportsActivity WHERE StudentID = @StudentID

    The only transfer here would be an integer representing the number of deleted rows.

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      This is an anemic domain model/transaction script, and it works only to a certain degree: as long as the application is small enough. In complex applications, you usually want a rich, encapsulated domain model where such operations go through an aggregate root (Student in this example).