Storing information in its highest form

There’s an interesting guideline I’ve been meaning to write about for a long time. I call it Storing information in its highest form.

1. Storing information in its highest form

I received a question on this topic recently, which prompted me to finally write about it, but we’ll start with a simpler example first and discuss the question after that.

So what is this guideline about, exactly?

Let’s say we are building an online movie theater and need to store movie duration in our database:

Movie example

On the front-end, the duration is represented as 1h 47m. But how should we store it in our database? What data type should we use?

We could keep the duration as a 1h 47m string, just as we render it on the screen, but what if we later decide to change the string format? We might need to display it as 1:47 or 107 minutes instead. Do we then parse all the existing strings and convert them into a new format?

Obviously, that would be terrible. For that reason, we shouldn’t store the movie duration as a string, because if we choose one particular format, we would lock ourselves into that format, and it’d be difficult to change it later on.

Instead, we should store the movie duration in a form that can be easily converted into any of these three formats: the number of minutes, as an integer.

Storing information in its highest form

Here, the number of minutes can be easily converted into any string format, and therefore, it is the highest form of the information about movie duration.
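To make the conversion concrete, here is a minimal sketch of rendering the stored integer in each of the three formats mentioned above. The DurationFormatter class and its method names are hypothetical and only serve as an illustration; they are not part of any real code base.

```csharp
// Hypothetical helper: every rendition is derived on demand
// from the single stored value (the number of minutes).
public static class DurationFormatter
{
    // 107 -> "1h 47m"
    public static string ToHoursAndMinutes(int totalMinutes) =>
        $"{totalMinutes / 60}h {totalMinutes % 60}m";

    // 107 -> "1:47" (minutes are zero-padded to two digits)
    public static string ToClock(int totalMinutes) =>
        $"{totalMinutes / 60}:{totalMinutes % 60:D2}";

    // 107 -> "107 minutes"
    public static string ToMinutes(int totalMinutes) =>
        $"{totalMinutes} minutes";
}
```

Switching the display format now means calling a different method (or adding a new one); the stored data doesn't change.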

Here’s another way to phrase this guideline:

Store the source, not a rendition.

There could be multiple renditions of some piece of information; you shouldn’t restrict yourself to just one of them. Keep the source of that information instead.

2. A more sophisticated example

I’m sure the guideline sounds trivial when illustrated using the example with movie duration. After all, the decision to store the duration as the number of minutes is quite obvious.

But while the above example is indeed simple, the guideline itself is really not. There are a number of more sophisticated examples where keeping the source of the information isn’t that obvious a solution.

Let’s look at one of them. This is an example from the question I mentioned previously.

Let’s say we have a Customer entity and a LoyaltyPoints value object:

public class Customer : Entity 
{
    public LoyaltyPoints Points { get; private set; }

    public void AddLoyaltyPoints(LoyaltyPoints points)
    {
        Points += points;
    }

    public void RedeemLoyaltyPoints(LoyaltyPoints points)
    {
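        // Redemption requires a balance of at least 250 and can't exceed that balance.
        if (Points < 250 || points > Points)
            throw new InvalidOperationException("Not enough loyalty points to redeem.");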

        Points -= points;
    }
}

There are two use cases associated with them:

  • When a customer places an order, the Order entity calculates loyalty points based on the order amount and calls AddLoyaltyPoints on the customer.

  • The customer can redeem loyalty points only once their balance is at least 250.

So far so good. The code above addresses these requirements perfectly.

Now let’s say we’ve got a new requirement. In addition to placing a new order, the customer can update an existing order. When an item is removed from an order, the Order entity must calculate the difference in loyalty points and subtract it from the customer’s balance.

How can we implement this requirement? Here are three potential solutions that come to mind.

  • One way is to reuse RedeemLoyaltyPoints to do the subtraction. The problem is that this method checks for the minimum value of 250 and would throw an exception for any customer with fewer than 250 loyalty points.

  • Another way is to use AddLoyaltyPoints, but pass a LoyaltyPoints with a negative value, so that Points += points would lead to subtraction instead of addition.

    That’s also no good because, by the business rules, LoyaltyPoints can’t be negative. Customers can’t "borrow" loyalty points and thus the principle of always valid domain model demands that we don’t allow loyalty points to be negative. Letting it become negative just for the sake of this workaround violates that foundational principle.

  • Finally, we could introduce a separate method that doesn’t check for the minimum value of 250:

    public void SubtractLoyaltyPoints(LoyaltyPoints points)
    {
        Points -= points;
    }
    

    But this option brings another set of potential issues. Now we have two public methods in the Customer entity that subtract points, and their names alone don’t make it clear which one to use when. While RedeemLoyaltyPoints is meaningful enough for us to guess its purpose, SubtractLoyaltyPoints is just too generic. Alternative names, such as ReduceLoyaltyPointsForOrderUpdate, aren’t much better.

    The fact that we can’t easily trace a public method to a business use case is a huge red flag that means we are exposing an implementation detail.
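The article doesn't show LoyaltyPoints itself, so here is one possible sketch of a value object that enforces the "can't be negative" rule discussed above. Everything in it is an assumption made for illustration: the int-based Value property, the set of operators, and the implicit conversion from int (added only so that comparisons like Points < 250 compile). The real class may look different.

```csharp
using System;

// Hypothetical sketch of the LoyaltyPoints value object.
public readonly struct LoyaltyPoints
{
    public int Value { get; }

    public LoyaltyPoints(int value)
    {
        // The invariant lives in one place: a negative amount can't be created.
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value), "Loyalty points can't be negative.");

        Value = value;
    }

    // Assumption: lets plain integers take part in expressions such as Points < 250.
    public static implicit operator LoyaltyPoints(int value) => new LoyaltyPoints(value);

    public static LoyaltyPoints operator +(LoyaltyPoints a, LoyaltyPoints b) =>
        new LoyaltyPoints(a.Value + b.Value);

    // Subtracting more than is available goes through the constructor check and throws.
    public static LoyaltyPoints operator -(LoyaltyPoints a, LoyaltyPoints b) =>
        new LoyaltyPoints(a.Value - b.Value);

    public static bool operator <(LoyaltyPoints a, LoyaltyPoints b) => a.Value < b.Value;
    public static bool operator >(LoyaltyPoints a, LoyaltyPoints b) => a.Value > b.Value;
}
```

With a value object along these lines, the second workaround isn't even expressible: new LoyaltyPoints(-40) throws before AddLoyaltyPoints is ever called.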

So, how should we deal with the new requirement then?

This is where the guideline of storing information in its highest form comes into play. In fact, all three solutions above are just attempts to deal with the consequences of choosing an incorrect design. The guideline helps correct that choice.

But what does it mean to store the information in its highest form when it comes to this particular example?

It means we need to store the source data of the calculation, not its result. The remaining loyalty points are the result of two pieces of information: how many points the customer earned and how many they redeemed.

Storing information in its highest form -- more sophisticated example

Instead of storing the remainder like we do now, we should keep the two upstream values and calculate the remainder on the fly:

public class Customer : Entity 
{
    public LoyaltyPoints PointsEarned { get; private set; }
    public LoyaltyPoints PointsRedeemed { get; private set; }

    public LoyaltyPoints Points => PointsEarned - PointsRedeemed;

    public void IncreaseEarnedPoints(LoyaltyPoints points)
    {
        PointsEarned += points;
    }

    public void ReduceEarnedPoints(LoyaltyPoints points)
    {
        PointsEarned -= points;
    }

    public void RedeemPoints(LoyaltyPoints points)
    {
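        // Redemption requires a balance of at least 250 and can't exceed that balance.
        if (Points < 250 || points > Points)
            throw new InvalidOperationException("Not enough loyalty points to redeem.");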

        PointsRedeemed += points;
    }
}

This solution fixes our issues. The core problem with the initial approach was that a single field couldn’t convey what a subtraction meant. A call to SubtractLoyaltyPoints could stand for either of two use cases:

  • Removal of earned points, or

  • Addition of redeemed points.

The single field is just too crude and can’t express the meaning properly. Meanwhile, the distinction between the two use cases is crucial because one of them must contain the validation (redeeming points), while the other must not (readjusting earned points). By splitting the field in two, we are eliminating this ambiguity.
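To see the distinction in action, here is a self-contained sketch of the two-field design. It is a simplification, not the entity from above: CustomerSketch is a hypothetical name, points are plain integers instead of LoyaltyPoints, and there is no Entity base class, so that the snippet compiles on its own.

```csharp
using System;

// Simplified stand-in for the Customer entity (assumptions: int points, no base class).
public class CustomerSketch
{
    private const int MinimumBalanceToRedeem = 250;

    public int PointsEarned { get; private set; }
    public int PointsRedeemed { get; private set; }

    // The remaining balance is derived from the two source values, never stored.
    public int Points => PointsEarned - PointsRedeemed;

    public void IncreaseEarnedPoints(int points) => PointsEarned += points;

    // Order updates correct what was earned; the 250-point rule doesn't apply here.
    public void ReduceEarnedPoints(int points) => PointsEarned -= points;

    // Redemption is the only operation that carries the validation.
    public void RedeemPoints(int points)
    {
        if (Points < MinimumBalanceToRedeem || points > Points)
            throw new InvalidOperationException("Not enough loyalty points to redeem.");

        PointsRedeemed += points;
    }
}
```

A customer with 100 earned points can have 40 of them removed by an order update without any exception, while RedeemPoints(10) on the same customer is rejected because the balance is below 250. Each method maps to exactly one use case.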

Notice that although the above examples (movie duration and loyalty points) are different, the guideline applies in exactly the same way. Just like we calculate the 1h 47m string on the fly from its 107 source, we calculate the remaining loyalty points from PointsEarned and PointsRedeemed.

In both scenarios, the guideline keeps us from locking ourselves into a particular format, so we can change it later on. For example, in addition to showing the customer the remaining loyalty points, our system could also display the total amount of redeemed points to emphasize how much they saved.

3. Event Sourcing and friends

You have probably noticed that our current design of the Customer entity could be improved further. While we’ve moved upstream from the loyalty points remainder toward the PointsEarned and PointsRedeemed, only one of these fields really is the source information.

Can you see which one it is?

It’s PointsRedeemed — this field is the sum of all redemptions made by the customer. PointsEarned, on the other hand, is derived from the customer’s orders.

So, we can take a step further in accordance with the guideline and store the loyalty points each order has earned within the Order class. We can then keep the list of orders in Customer instead of just the single PointsEarned field:

Replacing PointsEarned with a list of orders

Here’s how it may look in code:

public class Customer : Entity 
{
    public Order[] Orders { get; private set; }
    public LoyaltyPoints PointsRedeemed { get; private set; }

    public LoyaltyPoints PointsEarned => Orders.Sum(x => x.LoyaltyPoints);
    public LoyaltyPoints Points => PointsEarned - PointsRedeemed;

    public void RedeemPoints(LoyaltyPoints points)
    {
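        // Redemption requires a balance of at least 250 and can't exceed that balance.
        if (Points < 250 || points > Points)
            throw new InvalidOperationException("Not enough loyalty points to redeem.");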

        PointsRedeemed += points;
    }
}

There’s no need for IncreaseEarnedPoints and ReduceEarnedPoints anymore, since the earned loyalty points are now controlled by the Order class.

Even with this solution, we can still take another step and break down the PointsRedeemed field by storing individual redemptions instead of the total sum.

If we follow this to its logical conclusion, we end up with Event Sourcing. Events in the Event Sourcing architecture are the highest possible form of any information, and they can be molded into any shape we like.
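As a rough illustration of that end state, the sketch below keeps only a list of events and derives every number from it. The event types and the LoyaltyProjection class are hypothetical names invented for this example, points are plain integers, and a real event-sourced system would also need an event store, ordering, and versioning, none of which is shown here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical events: each one records a fact that happened, not a running total.
public abstract record LoyaltyEvent;
public sealed record PointsEarnedEvent(int Points) : LoyaltyEvent;
public sealed record EarnedPointsReducedEvent(int Points) : LoyaltyEvent;
public sealed record PointsRedeemedEvent(int Points) : LoyaltyEvent;

public static class LoyaltyProjection
{
    // How a single event changes the earned total (redemptions don't affect it).
    private static int EarnedDelta(LoyaltyEvent e) => e switch
    {
        PointsEarnedEvent earned => earned.Points,
        EarnedPointsReducedEvent reduced => -reduced.Points,
        _ => 0
    };

    public static int TotalEarned(IEnumerable<LoyaltyEvent> events) =>
        events.Sum(e => EarnedDelta(e));

    public static int TotalRedeemed(IEnumerable<LoyaltyEvent> events) =>
        events.OfType<PointsRedeemedEvent>().Sum(e => e.Points);

    // The balance is just one of many shapes the same events can be molded into.
    public static int Balance(IEnumerable<LoyaltyEvent> events) =>
        TotalEarned(events) - TotalRedeemed(events);
}
```

PointsEarned, PointsRedeemed, and the remaining balance all become projections over the same list, and a new view (say, points earned per month) could be added later without changing what is stored.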

4. The importance of balance

So, do I propose to use Event Sourcing everywhere?

No, I don’t. It’s important to keep a balance when following this guideline. In that sense, it’s not a guideline per se, but just something to be aware of — a tool that can help solve a design issue like in the example with loyalty points. How fine-grained the source information needs to be depends on your project needs.

In the above example, I would probably stop with just the two fields (PointsEarned and PointsRedeemed) unless there are new requirements that this solution can’t address.

The balance here is ultimately between flexibility on one hand and storage requirements, complexity, and performance on the other:

Costs and benefits of following the guideline

It’s more flexible to keep the source of the information, but you will pay for this flexibility with additional storage requirements, increased complexity, and even reduced performance:

  • If we decide to pre-aggregate data, we give up a degree of flexibility. We lock ourselves into a single format and potentially even lose valuable information (for example, with a single Points field, we can no longer say how many points the customer has earned in total).

  • But if we lean too much in the other direction, we may unnecessarily overcomplicate our solution. The need for the added flexibility may never arise in practice.

It’s hard to say where exactly you should stop when considering the guideline. Each project is different. But I’d say this: do follow the guideline as long as the effect of the 3 drawbacks is minimal.

For example:

  • For movie duration, there’s no meaningful difference in cost between storing an integer and storing a string, so it’s a no-brainer.

  • For loyalty points, two fields are also not much different from one — they can both be stored in the same database table.

  • Keeping points in the Order class may be fine as long as the number of orders per customer is small and you can make Order part of the Customer aggregate. But going further than that is probably overkill.

    And if the number of orders per customer is large and you have to make Order an aggregate of its own, then I’d stop at the solution with the PointsEarned and PointsRedeemed fields. This would probably be the best option balance-wise.

As always, balancing out different pros and cons is hard.

5. Summary

  • Store information in its highest form. Another way to phrase this guideline is: store the source, not a rendition.

  • Event Sourcing is the logical conclusion of this guideline. Events in Event Sourcing are the highest possible form for any information; they can be molded to any shape.

  • Consider balance when following this guideline. Added flexibility comes at a cost of storage requirements, complexity, and performance.
