Reference data as code

By Vladimir Khorikov

In this article, I’d like to write about a powerful technique that can potentially save you a lot of work and make your code much more concise: representing reference data as code.

Referring to reference data in code: the classic way

Reference data is any data that is required for your application to run properly. It might be a list of country codes, a list of user types, and so on. Anything your code base can’t live without. Because of that, the reference data your software relies on should be treated the same way as the database schema, and every change should be reflected in migration scripts. Check out this post for more details.

Let’s take an example. Say there’s an Industry entity in our domain model with the following simple structure:

public class Industry : Entity
{
    public string Name { get; private set; }
}

Let’s also say that it’s part of the reference data our application relies upon, and a set of predefined industries should exist in the database. A common way to define those industries is to use constants, like this:

public class Industry : Entity
{
    public const string CarsIndustry = "Cars";
    public const string PharmacyIndustry = "Pharmacy";
    public const string MediaIndustry = "Media";

    public string Name { get; private set; }
}

Now, if we have a customer that works in one of those industries, we can represent it with the following code:

public class Customer : Entity
{
    public Industry Industry { get; private set; }
    public EmailCampaign EmailCampaign { get; private set; }

    public Customer(Industry industry)
    {
        UpdateIndustry(industry);
    }

    public void UpdateIndustry(Industry industry)
    {
        EmailCampaign = GetEmailCampaign(industry);
        Industry = industry;
    }

    private EmailCampaign GetEmailCampaign(Industry industry)
    {
        if (industry.Name == Industry.CarsIndustry)
            return EmailCampaign.LatestCarModels;

        if (industry.Name == Industry.PharmacyIndustry)
            return EmailCampaign.PharmacyTips;

        if (industry.Name == Industry.MediaIndustry)
            return EmailCampaign.MediaNews;

        throw new ArgumentException();
    }
}

public enum EmailCampaign
{
    LatestCarModels,
    PharmacyTips,
    MediaNews
}

As you can see, the customer also has an EmailCampaign property, which depends on the industry the customer is assigned to. For example, if it’s the cars industry, the customer gets a newsletter about the latest car models; if it’s pharmacy, pharmacy tips; and so on. Whenever we update the customer’s industry, the email campaign must be updated as well.

Representing reference data as code

There are several issues with this classic way of working with reference data. The first and most important one is that we damage the cohesion between the name of the industry and the industry itself. Look at this code again:

public class Industry : Entity
{
    public const string CarsIndustry = "Cars";
    public const string PharmacyIndustry = "Pharmacy";
    public const string MediaIndustry = "Media";

    public string Name { get; private set; }
}

We already have the Name property defined in the Industry class. Introducing separate name constants splits the same responsibility between two places in the code.

Another drawback here is that even though the list of industries doesn’t change while the application is running, we are forced to fetch them from the database whenever we need one. Here’s an example of creating a customer in the pharmacy industry:

public class CustomerController
{
    public void CreatePharmacy(string name)
    {
        Industry industry = _industryRepository.GetByName(Industry.PharmacyIndustry);
        var customer = new Customer(industry);
        _customerRepository.Save(customer);
    }
}

It would be much better to just keep the list of industries in memory instead of going to the data store every time.

We can solve both of these issues by representing reference data as code. As such data can’t change at runtime, we can rely on the existence of those industries in the database and define them explicitly in our code base. Here’s how we can do that:

public class Industry : Entity
{
    public static readonly Industry Cars = new Industry(1, "Cars");
    public static readonly Industry Pharmacy = new Industry(2, "Pharmacy");
    public static readonly Industry Media = new Industry(3, "Media");

    public string Name { get; private set; }

    private Industry(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

As you can see, the constants were removed and static read-only fields were added instead. Note that the constructor is private, so no code outside the class can create a new industry. That’s a good thing, as industries are not something that can be added programmatically. All possible values are predefined and ready to be used by the client code.
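Because the predefined instances live in memory, the Industry class can also hand out the full list or resolve a stored Id without querying the Industry table. The sketch below assumes a minimal Entity base class with an int Id and a protected setter; the All list and the FromId method are hypothetical helpers, not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the article's Entity base class (assumption): it only
// exposes an Id with a protected setter so Industry's constructor can assign it.
public abstract class Entity
{
    public int Id { get; protected set; }
}

public class Industry : Entity
{
    public static readonly Industry Cars = new Industry(1, "Cars");
    public static readonly Industry Pharmacy = new Industry(2, "Pharmacy");
    public static readonly Industry Media = new Industry(3, "Media");

    // Hypothetical helper: all predefined industries in one read-only list.
    // Static field initializers run in textual order, so this line must come
    // after Cars, Pharmacy, and Media; declared earlier, it would hold nulls.
    public static readonly IReadOnlyList<Industry> All =
        Array.AsReadOnly(new[] { Cars, Pharmacy, Media });

    public string Name { get; private set; }

    private Industry(int id, string name)
    {
        Id = id;
        Name = name;
    }

    // Hypothetical helper: resolves a stored Id (for example, an industry Id
    // column on a customer row) to the in-memory instance, no database call.
    public static Industry FromId(int id)
    {
        Industry industry = All.SingleOrDefault(x => x.Id == id);
        if (industry == null)
            throw new ArgumentException($"Unknown industry id: {id}", nameof(id));
        return industry;
    }
}
```

With something like this in place, code that loads a customer could map the stored industry Id through Industry.FromId instead of fetching the industry row separately.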

Here’s how the Customer class looks after this change:

public class Customer : Entity
{
    public Industry Industry { get; private set; }
    public EmailCampaign EmailCampaign { get; private set; }

    public Customer(Industry industry)
    {
        UpdateIndustry(industry);
    }

    public void UpdateIndustry(Industry industry)
    {
        EmailCampaign = GetEmailCampaign(industry);
        Industry = industry;
    }

    private EmailCampaign GetEmailCampaign(Industry industry)
    {
        // Compares against the predefined instances; relies on Entity's equality members
        if (industry == Industry.Cars)
            return EmailCampaign.LatestCarModels;

        if (industry == Industry.Pharmacy)
            return EmailCampaign.PharmacyTips;

        if (industry == Industry.Media)
            return EmailCampaign.MediaNews;

        throw new ArgumentException();
    }
}

Instead of dotting into the industry instance and comparing its name with the constants, we now just compare the industry itself with the predefined objects. Of course, you will need to override the equality members for your entities to make it work. The best way to do that is to factor this logic out to the base Entity class.
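For `industry == Industry.Cars` to hold when `industry` was loaded from the database as a separate object, equality has to be based on identity (the Id) rather than on object references. In C#, that means overloading the == and != operators in addition to overriding Equals and GetHashCode. Below is a minimal sketch of such a base class, assuming int Ids where 0 marks an entity that hasn't been saved yet; LoadFromRow is a hypothetical stand-in for an ORM materializing a row into a new object.

```csharp
using System;

// Sketch of an Id-based Entity base class (an assumption, not the article's
// exact code). Two entities are equal when they have the same runtime type
// and the same non-zero Id, regardless of whether they are the same object.
public abstract class Entity
{
    public int Id { get; protected set; }

    public override bool Equals(object obj)
    {
        if (!(obj is Entity other))
            return false;
        if (ReferenceEquals(this, other))
            return true;
        if (GetType() != other.GetType())
            return false;
        // An Id of 0 means "not saved yet": such entities have no identity
        // to compare, so they are only equal to themselves.
        if (Id == 0 || other.Id == 0)
            return false;
        return Id == other.Id;
    }

    // For classes, == compares references unless it is overloaded, so
    // overriding Equals alone would not make `industry == Industry.Cars` work.
    public static bool operator ==(Entity a, Entity b)
    {
        if (a is null && b is null)
            return true;
        if (a is null || b is null)
            return false;
        return a.Equals(b);
    }

    public static bool operator !=(Entity a, Entity b) => !(a == b);

    // Equal entities share type and Id, so they also share this hash code.
    public override int GetHashCode() => (GetType().ToString() + Id).GetHashCode();
}

public class Industry : Entity
{
    public static readonly Industry Cars = new Industry(1, "Cars");
    public static readonly Industry Pharmacy = new Industry(2, "Pharmacy");

    public string Name { get; private set; }

    private Industry(int id, string name)
    {
        Id = id;
        Name = name;
    }

    // Hypothetical factory standing in for an ORM loading a row:
    // it returns a new object, not one of the static instances.
    internal static Industry LoadFromRow(int id, string name) => new Industry(id, name);
}
```

If your ORM hands out lazy-loading proxies, GetType() may return the proxy’s subclass rather than Industry itself, in which case the type check would need to unwrap the proxy first.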

The customer controller also gets simpler. Instead of fetching the pharmacy industry from the data store, we can just use the one we’ve defined:

public class CustomerController
{
    public void CreatePharmacy(string name)
    {
        var customer = new Customer(Industry.Pharmacy);
        _customerRepository.Save(customer);
    }
}

So basically, whenever you need an industry, you can refer to the static read-only fields of the Industry class, and that’s it. This is often helpful in unit tests, as it allows you to make them more concise:

[Fact]
public void Updating_industry_updates_email_campaign()
{
    var customer = new Customer(Industry.Pharmacy);

    customer.UpdateIndustry(Industry.Cars);

    // After switching to the cars industry, the campaign must follow
    Assert.Equal(EmailCampaign.LatestCarModels, customer.EmailCampaign);
}

Reference data and integration testing

Defining reference data declaratively in code helps simplify your application’s code base, but you need to be careful with it: cover all the predefined values with integration tests to make sure the Id and the name you set up in the domain model match those in the database.

Here’s what it can look like:

public class IndustryTests
{
    [Fact]
    public void All_required_values_are_in_DB()
    {
        Verify(1, Industry.Cars);
        Verify(2, Industry.Pharmacy);
        Verify(3, Industry.Media);
    }

    private void Verify(int industryId, Industry hardCodedIndustry)
    {
        using (var unitOfWork = new UnitOfWork())
        {
            var repository = new IndustryRepository(unitOfWork);
            Industry industry = repository.GetById(industryId);

            // The row must exist, and every mapped value must match the code
            Assert.NotNull(industry);
            Assert.Equal(hardCodedIndustry.Id, industry.Id);
            Assert.Equal(hardCodedIndustry.Name, industry.Name);
        }
    }
}

Note that you need to not only check the existence of the reference data in the DB but also verify that all values in the domain model and in the data store match. So if the Industry class gets a new Description field, you need to add a check for it to the integration test as well.
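The rule of checking every mapped value, not just existence, can also be expressed as a comparison that collects all mismatches instead of stopping at the first failed assertion. In this sketch, IndustryRow and FindMismatches are hypothetical; in a real integration test, the dictionary would be filled from the database (for example, through IndustryRepository) and the pairs would come from Industry.Cars, Industry.Pharmacy, and Industry.Media.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one row of the Industry table.
public sealed class IndustryRow
{
    public int Id { get; }
    public string Name { get; }

    public IndustryRow(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

public static class ReferenceDataCheck
{
    // Compares every hard-coded (Id, Name) pair with the database rows and
    // returns one message per problem; an empty list means code and data agree.
    public static List<string> FindMismatches(
        IEnumerable<(int Id, string Name)> hardCoded,
        IReadOnlyDictionary<int, IndustryRow> rowsById)
    {
        var problems = new List<string>();

        foreach (var expected in hardCoded)
        {
            if (!rowsById.TryGetValue(expected.Id, out IndustryRow row))
            {
                problems.Add($"Id {expected.Id} ({expected.Name}) is missing from the database");
                continue;
            }

            // Compare every mapped field, not just existence: a new field such
            // as Description would need its own check here.
            if (row.Name != expected.Name)
                problems.Add($"Id {expected.Id}: code says '{expected.Name}', database says '{row.Name}'");
        }

        return problems;
    }
}
```

A test could then assert that the returned list is empty, so a single failure message lists every out-of-sync value at once.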

Summary

The declared values essentially act as the constants we started off with, but at the same time, they are part of your domain model. They are fully-fledged instances of a given type, indistinguishable from those fetched from the database. By explicitly defining reference data as code, you are able to:

  • Avoid damaging cohesion in your application,
  • Cache this data and refer to it without additional roundtrips to the database,
  • Simplify the work with this data as it is already predefined and ready to be used.

Don’t forget to cover these values with integration tests, though, because they can easily go out of sync with the actual data from the DB.

Source code

The source code with the “before” and “after” code samples

  • Damir Bagapov

    How can you be sure that the properties of the industries defined in code are in sync with the database? What if someone changes the name (or even the Id) of an industry in the database?

    • Naeem Sarfraz

      In the past I’ve used integration tests to keep checking the values in the database against the ones defined in code (in my tests). This worked because we had a very good set of migration scripts, but you’ve identified a weakness: anyone with access to the database could change the values. However, if it is a developer change and they remembered to create and check in a migration script, we would catch that at the CI level.

    • Vladimir Khorikov

      You need to meet two conditions in order to make this work:
      1) Have a set of integration tests that verify the values match (I gave an example in the article).
      2) Never change reference data manually. I wrote about that a little bit here: http://enterprisecraftsmanship.com/2015/08/10/database-versioning-best-practices/. The idea is that reference data has the same meaning as the database schema for your application, and the only way you can update it is via migration scripts. With migration scripts, it’s pretty easy to verify that the values defined in code are in sync with the database using a standard CI environment.

  • Naeem Sarfraz

    I don’t know xUnit very well but I find the way you’ve defined essentially three tests in one frustrating. Just wondering why you wouldn’t use the Theory and InlineData attributes? Similar to this approach using NUnit http://stackoverflow.com/questions/1382313/how-do-i-specify-test-method-parameters-with-testdriven-net/7399946#7399946

    • Vladimir Khorikov

      InlineData would be an ideal solution, indeed. Unfortunately, to make it work, you need to use “compiler-friendly” constants (i.e. strings and integers), you can’t specify fields or properties as a parameter for an attribute.

  • Dennis

    Hey,
    I think that the solution that you’ve presented violates the SOLID principles.
    Take a look at the GetEmailCampaign method. It violates the OCP: for every new industry, you’ll have to modify this class. The same goes for the Industry class.

    I’d refactor this code to be more in line with Domain-Driven Design. First of all, I’d have a class for every Industry type and have the EmailCampaign or the Update service injected into the customer without making it care about the industry types.

    • Vladimir Khorikov

      Hi Dennis,

      It’s a nice alternative solution, but I think a separate class hierarchy for industries would be overkill here. A method with an if/switch statement like in the sample above would be a simpler design decision in this particular case.

  • Chris Clarke

    Hi Dennis,

    I like the approach, but whereas you can drop a defined constant variable into some C# LINQ, you cannot replace it with a reference to GetEmailCampaign("Cars").

    It won’t compile; I’ve tried it.

    The only way to make it work in LINQ is to define a receiving variable first and then include that variable in the LINQ statement, which makes it unwieldy and undermines the proposed simplification.

  • Michael G.

    One way to ensure that your reference data is kept up to date would be to include a small T4 template in your project and set it to execute automatically before each build.

    This approach might be overkill if the reference data is fairly simple and small. But reference data can be fairly complex, and there might be several sets of it to keep updated. In such a case, T4 templates can help alleviate some of the manual labor involved.

    • Vladimir Khorikov

      Great point!