Reference data as code

March 17, 2016

In this article, I’d like to write about a powerful technique that can potentially save you a lot of work and make your code much more concise: representing reference data as code.

Referring to reference data in code: the classic way

Reference data is any data that is required for your application to run properly. It might be a list of country codes, a list of user types, and so on. Anything your code base can’t live without. Because of that, the reference data your software relies on should be treated the same way as the database schema, and every change should be reflected in migration scripts. Check out this post for more details.

Let’s take an example. Let’s say that there’s an industry entity in our domain model with the following simple structure:

public class Industry : Entity
{
    public string Name { get; private set; }
}

Let’s also say that it’s part of the reference data our application relies upon, and a set of predefined industries should exist in the database. A common way to define those industries is to use constants, like this:

public class Industry : Entity
{
    public const string CarsIndustry = "Cars";
    public const string PharmacyIndustry = "Pharmacy";
    public const string MediaIndustry = "Media";
 
    public string Name { get; private set; }
}

Now, if we have a customer that works in one of those industries, we can represent it with the following code:

public class Customer : Entity
{
    public Industry Industry { get; private set; }
    public EmailCampaign EmailCampaign { get; private set; }
 
    public Customer(Industry industry)
    {
        UpdateIndustry(industry);
    }
 
    public void UpdateIndustry(Industry industry)
    {
        EmailCampaign = GetEmailCampaign(industry);
        Industry = industry;
    }
 
    private EmailCampaign GetEmailCampaign(Industry industry)
    {
        if (industry.Name == Industry.CarsIndustry)
            return EmailCampaign.LatestCarModels;
 
        if (industry.Name == Industry.PharmacyIndustry)
            return EmailCampaign.PharmacyTips;
 
        if (industry.Name == Industry.MediaIndustry)
            return EmailCampaign.MediaNews;
 
        throw new ArgumentException();
    }
}
 
public enum EmailCampaign
{
    LatestCarModels,
    PharmacyTips,
    MediaNews
}

As you can see, the customer also contains an EmailCampaign property whose value depends on the industry the customer is assigned to. For example, if it’s the cars industry, the customer will get a newsletter about the latest car models. If it’s pharmacy, the customer will get pharmacy tips, and so on. Whenever we update the customer’s industry, the email campaign should be updated as well.

Representing reference data as code

There are several issues with this classic way of working with reference data. The first and most important one is that we damage the cohesion between the name of the industry and the industry itself. Look at this code again:

public class Industry : Entity
{
    public const string CarsIndustry = "Cars";
    public const string PharmacyIndustry = "Pharmacy";
    public const string MediaIndustry = "Media";
 
    public string Name { get; private set; }
}

We already have the Name property defined in the industry class. Specifying separate name constants means we distribute the same responsibility between two places in the code.

Another drawback here is that even though the list of industries doesn’t change while the application is running, we are forced to fetch them from the database whenever we need one. Here’s an example of creating a customer in the pharmacy industry:

public class CustomerController
{
    public void CreatePharmacy(string name)
    {
        Industry industry = _industryRepository.GetByName(Industry.PharmacyIndustry);
        var customer = new Customer(industry);
        _customerRepository.Save(customer);
    }
}

It would be much better to keep the list of industries in memory instead of going to the data store every time.

We can solve both of these issues by representing reference data as code. As such data can’t change at runtime, we can rely on the existence of those industries in the database and define them explicitly in our code base. Here’s how we can do that:

public class Industry : Entity
{
    public static readonly Industry Cars = new Industry(1, "Cars");
    public static readonly Industry Pharmacy = new Industry(2, "Pharmacy");
    public static readonly Industry Media = new Industry(3, "Media");
 
    public string Name { get; private set; }
 
    private Industry(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

As you can see, the constants were removed and static read-only fields were added instead. Note that the constructor is private, so client code has no way to create a new industry. That is a good thing, as industries are not something that can be added programmatically. All possible industries are predefined and ready to be used by the client code.

Here’s what the Customer class looks like after that:

public class Customer : Entity
{
    public Industry Industry { get; private set; }
    public EmailCampaign EmailCampaign { get; private set; }
 
    public Customer(Industry industry)
    {
        UpdateIndustry(industry);
    }
 
    public void UpdateIndustry(Industry industry)
    {
        EmailCampaign = GetEmailCampaign(industry);
        Industry = industry;
    }
 
    private EmailCampaign GetEmailCampaign(Industry industry)
    {
        if (industry == Industry.Cars)
            return EmailCampaign.LatestCarModels;
 
        if (industry == Industry.Pharmacy)
            return EmailCampaign.PharmacyTips;
 
        if (industry == Industry.Media)
            return EmailCampaign.MediaNews;
 
        throw new ArgumentException();
    }
}

Instead of dotting into the industry instance and comparing its name with the constants, we now just compare the industry itself with the predefined objects. Of course, you will need to override the equality members for your entities to make it work. The best way to do that is to factor this logic out to the base Entity class.
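
A minimal sketch of what such a base class might look like is shown below. It assumes that entities have an integer Id, that an Id of 0 means the entity has not been saved yet, and that two entities of the same type with the same Id are equal. The LoadFromDatabase method is a hypothetical stand-in for an ORM materializing a row; it is not part of the original example.

```csharp
public abstract class Entity
{
    public int Id { get; protected set; }

    public override bool Equals(object obj)
    {
        var other = obj as Entity;

        if (ReferenceEquals(other, null))
            return false;

        if (ReferenceEquals(this, other))
            return true;

        // Entities of different types are never equal, even with the same Id.
        if (GetType() != other.GetType())
            return false;

        // Assumption: Id == 0 marks an entity that has not been persisted yet,
        // so two such entities are never considered equal.
        if (Id == 0 || other.Id == 0)
            return false;

        return Id == other.Id;
    }

    public static bool operator ==(Entity a, Entity b)
    {
        if (ReferenceEquals(a, null) && ReferenceEquals(b, null))
            return true;

        if (ReferenceEquals(a, null) || ReferenceEquals(b, null))
            return false;

        return a.Equals(b);
    }

    public static bool operator !=(Entity a, Entity b)
    {
        return !(a == b);
    }

    public override int GetHashCode()
    {
        return (GetType().ToString() + Id).GetHashCode();
    }
}

public class Industry : Entity
{
    public static readonly Industry Cars = new Industry(1, "Cars");

    public string Name { get; private set; }

    private Industry(int id, string name)
    {
        Id = id;
        Name = name;
    }

    // Hypothetical stand-in for an ORM creating an instance from a database row.
    public static Industry LoadFromDatabase(int id, string name)
    {
        return new Industry(id, name);
    }
}
```

With this in place, Industry.LoadFromDatabase(1, "Cars") == Industry.Cars evaluates to true even though the two are distinct object instances, which is what the comparisons in the Customer class rely on.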

The customer controller also gets simpler. Instead of fetching the pharmacy industry from the data store, we can just use the one we’ve defined:

public class CustomerController
{
    public void CreatePharmacy(string name)
    {
        var customer = new Customer(Industry.Pharmacy);
        _customerRepository.Save(customer);
    }
}

So basically, whenever you need an industry, you can refer to the static read-only fields of the Industry class, and that’s it. This is often helpful in unit tests as it allows you to make them more concise:

[Fact]
public void Updating_industry_updates_email_campaign()
{
    var customer = new Customer(Industry.Pharmacy);
 
    customer.UpdateIndustry(Industry.Cars);
 
    Assert.Equal(EmailCampaign.LatestCarModels, customer.EmailCampaign);
}

Reference data and integration testing

Defining reference data in code in a declarative manner helps simplify your application’s code base, but you need to be careful with it. Cover all predefined values with integration tests to make sure the Id and the name you set up in the domain model are the same as in the database.

Here’s what it can look like:

public class IndustryTests
{
    [Fact]
    public void All_required_values_are_in_DB()
    {
        Verify(1, Industry.Cars);
        Verify(2, Industry.Pharmacy);
        Verify(3, Industry.Media);
    }
 
    private void Verify(int industryId, Industry hardCodedIndustry)
    {
        using (var unitOfWork = new UnitOfWork())
        {
            var repository = new IndustryRepository(unitOfWork);
            Industry industry = repository.GetById(industryId);
 
            Assert.Equal(industry.Id, hardCodedIndustry.Id);
            Assert.Equal(industry.Name, hardCodedIndustry.Name);
        }
    }
}

Note that you need to not only check the existence of the reference data in the DB but also verify that all values in the domain model and in the data store match. So if the Industry class gets a new Description field, you need to add a check for it to the integration test as well.
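
To reduce the risk of forgetting a newly added industry in the test, the predefined instances could be collected via reflection instead of being listed by hand. The sketch below is one possible way to do that. The ReferenceData helper and its GetAll method are hypothetical names, and the Industry class is trimmed down (no Entity base class) to keep the example self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public class Industry
{
    public static readonly Industry Cars = new Industry(1, "Cars");
    public static readonly Industry Pharmacy = new Industry(2, "Pharmacy");
    public static readonly Industry Media = new Industry(3, "Media");

    public int Id { get; private set; }
    public string Name { get; private set; }

    private Industry(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

// Hypothetical helper, not part of the original example.
public static class ReferenceData
{
    // Returns the values of all public static fields of type T declared on T.
    public static List<T> GetAll<T>() where T : class
    {
        return typeof(T)
            .GetFields(BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly)
            .Where(field => field.FieldType == typeof(T))
            .Select(field => (T)field.GetValue(null))
            .ToList();
    }
}
```

An integration test could then loop over ReferenceData.GetAll<Industry>() and run the Verify check for each instance, using the instance’s own Id to look up the database row. Note that this only checks the code against the database: rows that exist only in the database would still go unnoticed.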

Summary

The declared values essentially act like the constants we started off with, but at the same time they are part of your domain model. They are fully-fledged instances of a given type which are indistinguishable from those fetched from the database. By explicitly defining reference data as code, you are able to:

Avoid damaging cohesion in your application,
Cache this data and refer to it without additional roundtrips to the database,
Simplify the work with this data as it is already predefined and ready to be used.

Don’t forget to cover these values with integration tests, though, because they can easily go out of sync with the actual data from the DB.