In Defense of Lazy Loading

In this article, we’ll talk about ORMs and lazy loading.

I don’t know how this happened, but for the last couple of years (at least), whenever I read an author who writes about ORMs, I often see a sentiment like this: "ORMs are fine, just make sure you disable this pesky feature called Lazy Loading".

It’s like this feature is not even needed and only brings confusion and performance issues to everyone who chooses to use it. Well, as you may guess from the title of this article, I disagree with this point of view completely.

But let’s take an example and review the common criticism first.

Let’s say we have the following domain model:

public class Student : Entity
{
    public virtual string Name { get; set; }
    public virtual decimal TotalDebt { get; set; }

    public virtual IList<Enrollment> Enrollments { get; set; }
    public virtual IList<SportsActivity> SportsActivities { get; set; }
}

public class Enrollment : Entity
{
    public virtual Student Student { get; set; }
    public virtual Course Course { get; set; }
    public virtual Grade Grade { get; set; }
}

public enum Grade
{
    A = 1, B = 2, C = 3, D = 4, F = 5
}

public class Course : Entity
{
    public virtual string Name { get; set; }
    public virtual int Credits { get; set; }
}

public class SportsActivity : Entity
{
    public virtual Sports Sports { get; set; }
    public virtual DateTime PlayingSince { get; set; }
}

public class Sports : Entity
{
    public virtual string Name { get; set; }
}

So, what happens when you need to display a student on a web page and you are not careful about it? With the lazy loading, you might get an N+1 problem: you first select a student into memory, then all their enrollments, then the courses of those enrollments and so on until you traverse the full object graph.

With a student who has 5 enrollments and 5 sports activities, you will get a total of 13 database roundtrips:

  • 1 for Student

  • 1 for Student.Enrollments

  • 5 for Course in each element of Student.Enrollments

  • 1 for Student.SportsActivities

  • 5 for Sports in each element of Student.SportsActivities

This is obviously terrible for performance. For a given page view, there should be only one database roundtrip. And so one way to make sure N+1 never happens to you is to disable the Lazy Loading entirely.
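The breakdown of 13 roundtrips can be reproduced with a small in-memory simulation. This is only a sketch: `CountingDb`, `DemoChild`, `DemoStudent`, and `NPlusOneDemo` are hypothetical names that do not come from any ORM, and `Lazy<T>` stands in for the proxies an ORM would generate. Each simulated query simply increments a counter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an ORM session; it only counts simulated roundtrips.
public sealed class CountingDb
{
    public int Roundtrips { get; private set; }

    public T Query<T>(Func<T> produceRows)
    {
        Roundtrips++;              // every query costs one roundtrip
        return produceRows();
    }
}

// A child entity (Enrollment or SportsActivity) whose single reference
// (Course or Sports) is loaded lazily on first access.
public sealed class DemoChild
{
    private readonly Lazy<string> _referenceName;

    public DemoChild(CountingDb db, string name)
    {
        _referenceName = new Lazy<string>(() => db.Query(() => name));
    }

    public string ReferenceName { get { return _referenceName.Value; } }
}

public sealed class DemoStudent
{
    private readonly Lazy<List<DemoChild>> _enrollments;
    private readonly Lazy<List<DemoChild>> _sportsActivities;

    public DemoStudent(CountingDb db, int enrollments, int activities)
    {
        // Each collection is fetched with one query, but only when first touched.
        _enrollments = new Lazy<List<DemoChild>>(() => db.Query(() =>
            Enumerable.Range(1, enrollments)
                .Select(i => new DemoChild(db, "Course " + i)).ToList()));
        _sportsActivities = new Lazy<List<DemoChild>>(() => db.Query(() =>
            Enumerable.Range(1, activities)
                .Select(i => new DemoChild(db, "Sports " + i)).ToList()));
    }

    public List<DemoChild> Enrollments { get { return _enrollments.Value; } }
    public List<DemoChild> SportsActivities { get { return _sportsActivities.Value; } }
}

public static class NPlusOneDemo
{
    // Walks the whole graph the way a careless "display student" page would
    // and returns the number of simulated roundtrips.
    public static int CountRoundtrips(int enrollments, int activities)
    {
        var db = new CountingDb();
        DemoStudent student = db.Query(() => new DemoStudent(db, enrollments, activities));

        var names = new List<string>();
        foreach (DemoChild enrollment in student.Enrollments)
            names.Add(enrollment.ReferenceName);
        foreach (DemoChild activity in student.SportsActivities)
            names.Add(activity.ReferenceName);

        return db.Roundtrips;
    }
}
```

Calling `NPlusOneDemo.CountRoundtrips(5, 5)` returns 13: one query for the student, one per collection, and one per element of each collection.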

This will force you to load all related entities eagerly. If you forget to set up the eager load for, say, the Enrollments collection, you will get a null reference exception when referring to it in the view/controller. With the Lazy Loading disabled, no one will load this collection for you behind the scenes.

So, the problem is solved. Right?

As always, the reality is more nuanced than that. In fact, in some cases, the Lazy Loading provides a performance boost, not performance degradation.

To find out which one that is for your specific situation, you need to analyze the object graph traversal patterns. You need to see what percentage of the graph needs to be loaded in each particular use case, and how often that happens.

Let’s take an example. Let’s say that you have a use case of disenrolling a student from a course. Here’s the code (input validation and domain model encapsulation are omitted for brevity):

public IActionResult Disenroll(long studentId, int enrollmentNumber)
{
    Student student = _studentRepository.GetById(studentId);
    Enrollment enrollment = student.Enrollments[enrollmentNumber];
    student.Enrollments.Remove(enrollment);

    return Ok();
}

Which parts of the object graph get affected by this code? Only the student itself and its enrollments:

Disenrolling a student from a course

So, with the lazy loading, we have a total of 2 database roundtrips here. That’s more than we would get without the lazy loading (which would be just 1), but remember that here we are transferring just enough data between the database and the application server to fulfill this particular request. Without the lazy loading, we would need to select everything about the student. And that includes the information from the Course, SportsActivity, and Sports tables, which we don’t need in this use case.

To translate it into SQL, here’s what we’d get with the lazy loading:

-- Roundtrip 1
SELECT *
FROM dbo.Student s
WHERE s.StudentID = @StudentID

-- Roundtrip 2
SELECT *
FROM dbo.Enrollment e
WHERE e.StudentID = @StudentID

And here’s what we would get without it:

-- Roundtrip 1 (both statements are sent in a single batch)
SELECT *
FROM dbo.Student s
LEFT JOIN dbo.Enrollment e ON s.StudentID = e.StudentID
LEFT JOIN dbo.Course c ON e.CourseID = c.CourseID
WHERE s.StudentID = @StudentID

SELECT *
FROM dbo.SportsActivity a
INNER JOIN dbo.Sports s ON a.SportsID = s.SportsID
WHERE a.StudentID = @StudentID

So, in spite of doing only 1 database roundtrip, the option without the lazy loading transfers an excessive amount of data between the application and the database.

It’s hard to say which one is better for performance, but it’s probably either a wash or the option without the lazy loading is only slightly ahead here. It’s definitely not as decisive a situation as choosing between 13 and 1 database roundtrips.

Let’s now take a more sophisticated scenario. Let’s say that you need to enroll a student in a sport, but you can only do that if their total debt doesn’t exceed $10,000.

This is the code:

public IActionResult Enroll(long studentId, string sportsName)
{
    Sports sports = _sportsRepository.GetByName(sportsName);
    Student student = _studentRepository.GetById(studentId);

    if (student.TotalDebt > 10000M)
        return Error($"Cannot enroll into {sportsName}, too much debt");

    student.SportsActivities.Add(new SportsActivity
    {
        PlayingSince = DateTime.Now,
        Sports = sports
    });

    return Ok();
}

Once again, without the lazy loading, the full object graph with every single detail about it will be loaded: the student, their enrollments, courses, and sports activities.

However, with the lazy loading, the net balance will depend on how many students are getting rejected due to the debt restriction:

Enrolling in a sports

If students who can afford sports activities are rare, the performance will be better than it would be without the lazy loading. In the vast majority of cases, loading the student takes only one database call, to retrieve the student itself, and the SportsActivities collection is never touched. The amount of data transferred from the database is also much smaller, which by itself provides a significant performance boost.
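To make the rejected path concrete, here is a minimal sketch of the same debt check against a lazily loaded collection. `DebtCheckedStudent` and `EnrollDemo` are hypothetical names, `Lazy<T>` again plays the role of the ORM proxy, and the sketch assumes that adding to the collection triggers its load, which in a real ORM depends on the mapping.

```csharp
using System;
using System.Collections.Generic;

public sealed class DebtCheckedStudent
{
    private readonly Lazy<List<string>> _sportsActivities;

    public decimal TotalDebt { get; private set; }

    // Number of times the simulated collection query has run.
    public int CollectionLoads { get; private set; }

    public DebtCheckedStudent(decimal totalDebt)
    {
        TotalDebt = totalDebt;

        // The collection query runs only on first access.
        _sportsActivities = new Lazy<List<string>>(() =>
        {
            CollectionLoads++;   // stands in for a SELECT against SportsActivity
            return new List<string>();
        });
    }

    public List<string> SportsActivities { get { return _sportsActivities.Value; } }
}

public static class EnrollDemo
{
    // Returns true when the student was enrolled, false when rejected.
    public static bool TryEnroll(DebtCheckedStudent student, string sportsName)
    {
        if (student.TotalDebt > 10000M)
            return false;   // rejected: SportsActivities was never touched

        student.SportsActivities.Add(sportsName);   // first access triggers the load
        return true;
    }
}
```

A student with a debt of 15,000 is rejected with `CollectionLoads` still at 0, while a student with a debt of 500 is enrolled at the cost of exactly one collection load.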

Separating read and write models

In both use cases I brought up above, the option with lazy loading shows either the same or better performance than the option without it.

Can you see what is common between them? They both modify the application’s state. They are commands in the CQRS taxonomy.

And it turns out that it’s a common trait among writes to the application. They don’t produce as many database roundtrips as reads, and with some of them, the code with the lazy loading results in even better performance than the code without it.

You could fine-tune your code without the lazy loading to retrieve only the data needed for a particular use case but that usually leads to a disaster in terms of code complexity. What you end up with is code like this:

public sealed class StudentRepository
{
    public Student GetByIdWithEnrollments(long studentId)
    {
        /* Load only enrollments */
    }

    public Student GetByIdWithSportActivities(long studentId)
    {
        /* Load only sport activities */
    }

    public Student GetByIdWithEnrollmentsAndSportActivities(long studentId)
    {
        /* Load everything */
    }
}

When you turn the lazy loading on, the code becomes much simpler all of a sudden.

Alright, but what about reads? The criticism of the lazy loading is still valid there, right? We would still get those 13 database roundtrips just to display a single student.

That’s correct. But I rarely see this distinction drawn by other authors. And that’s a crucial distinction: the situation with the lazy loading is not the same for reads and writes in the application. It is harmful in reads. But it’s beneficial in writes (if you consider both performance and code simplicity).

And it’s very easy to have the best of both worlds: just don’t use domain classes in the read model. Write SQL queries yourself, and materialize data directly to DTOs either manually or using a lightweight ORM like Dapper.
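As an illustration of the manual option, here is a sketch of a read model that maps a flat result set straight into a DTO through the standard `IDataReader` interface. `StudentDto`, `StudentReadModel`, and the `StudentName` and `CourseName` column aliases are assumptions made for this example, and so is the `Name` column on both tables (it mirrors the `Name` properties of the domain classes). Executing the query is left out; with Dapper, the mapping loop would be replaced by the library’s own materialization.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical read-model DTO; it is not part of the domain model.
public sealed class StudentDto
{
    public string Name { get; set; } = "";
    public List<string> Courses { get; } = new List<string>();
}

public static class StudentReadModel
{
    // Handwritten query that selects only what the page shows, in one roundtrip.
    public const string Sql = @"
SELECT s.Name AS StudentName, c.Name AS CourseName
FROM dbo.Student s
LEFT JOIN dbo.Enrollment e ON e.StudentID = s.StudentID
LEFT JOIN dbo.Course c ON c.CourseID = e.CourseID
WHERE s.StudentID = @StudentID";

    // Materializes the flat rows into a single DTO, bypassing domain classes.
    public static StudentDto Materialize(IDataReader reader)
    {
        var dto = new StudentDto();
        bool found = false;
        int studentName = -1;
        int courseName = -1;

        while (reader.Read())
        {
            if (!found)
            {
                // Resolve column positions once, on the first row.
                studentName = reader.GetOrdinal("StudentName");
                courseName = reader.GetOrdinal("CourseName");
                dto.Name = reader.GetString(studentName);
                found = true;
            }

            // A student without enrollments yields one row with a NULL course.
            if (!reader.IsDBNull(courseName))
                dto.Courses.Add(reader.GetString(courseName));
        }

        if (!found)
            throw new InvalidOperationException("Student not found");

        return dto;
    }
}
```

The DTO has no lazy-loaded members, so whatever view renders it cannot trigger additional roundtrips.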

That’s, by the way, the essence of CQRS: the separation of read and write models. And as you can see, even a slight adherence to it will do the job. No need for explicit commands and queries, just a simple rule of not using domain classes and writing the SQL manually when selecting data for the UI.

Of course, your mileage may vary, and if you have a simple application, you can still use the ORM for both reads and writes. But the above approach is how I structure my projects by default: NHibernate with the lazy loading enabled in writes and handwritten SQL queries with Dapper in reads.

Summary

The criticism of the lazy loading lacks nuance. Performance-wise, it’s a balance between the amount of data you load upfront versus the number of database roundtrips you’ll need to perform later during the operation. Too many database roundtrips are not good, but too large upfront costs are not good either.

Let’s summarize:

  • All criticism of the lazy loading boils down to performance issues and the N+1 problem

  • Lazy loading is beneficial in writes (in terms of performance and simplicity)

  • Lazy loading is harmful only in reads

  • The drawbacks of the lazy loading can be overcome by the adherence to CQRS: use lazy loading only in writes, handwrite SQL queries in reads
