Functional C#: Primitive obsession

March 7, 2015

The topic described in this article is part of my Applying Functional Principles in C# Pluralsight course.

This is the second article in my Functional C# blog post series.

Functional C#: Immutability
Functional C#: Primitive obsession
Functional C#: Non-nullable reference types
Functional C#: Handling failures and input errors

What is primitive obsession?

Primitive obsession means using primitive types to model the domain. For example, this is what a Customer class might look like in a typical C# application:

public class Customer
{
    public string Name { get; private set; }
    public string Email { get; private set; }
 
    public Customer(string name, string email)
    {
        Name = name;
        Email = email;
    }
}

The problem here is that when you want to enforce validation rules specific to your domain, you inevitably end up scattering validation logic all over your source code:

public class Customer
{
    public string Name { get; private set; }
    public string Email { get; private set; }
 
    public Customer(string name, string email)
    {
        // Validate name
        if (string.IsNullOrWhiteSpace(name) || name.Length > 50)
            throw new ArgumentException("Name is invalid");
 
        // Validate e-mail
        if (string.IsNullOrWhiteSpace(email) || email.Length > 100)
            throw new ArgumentException("E-mail is invalid");
        if (!Regex.IsMatch(email, @"^([\w\.\-]+)@([\w\-]+)((\.(\w){2,3})+)$"))
            throw new ArgumentException("E-mail is invalid");
 
        Name = name;
        Email = email;
    }
 
    public void ChangeName(string name)
    {
        // Validate name
        if (string.IsNullOrWhiteSpace(name) || name.Length > 50)
            throw new ArgumentException("Name is invalid");
 
        Name = name;
    }
 
    public void ChangeEmail(string email)
    {
        // Validate e-mail
        if (string.IsNullOrWhiteSpace(email) || email.Length > 100)
            throw new ArgumentException("E-mail is invalid");
        if (!Regex.IsMatch(email, @"^([\w\.\-]+)@([\w\-]+)((\.(\w){2,3})+)$"))
            throw new ArgumentException("E-mail is invalid");
 
        Email = email;
    }
}

Moreover, the exact same validation rules tend to leak into the application layer:

[HttpPost]
public ActionResult CreateCustomer(CustomerInfo customerInfo)
{
    if (!ModelState.IsValid)
        return View(customerInfo);
 
    Customer customer = new Customer(customerInfo.Name, customerInfo.Email);
    // Rest of the method
}

public class CustomerInfo
{
    [Required(ErrorMessage = "Name is required")]
    [StringLength(50, ErrorMessage = "Name is too long")]
    public string Name { get; set; }
 
    [Required(ErrorMessage = "E-mail is required")]
    [RegularExpression(@"^([\w\.\-]+)@([\w\-]+)((\.(\w){2,3})+)$", 
        ErrorMessage = "Invalid e-mail address")]
    [StringLength(100, ErrorMessage = "E-mail is too long")]
    public string Email { get; set; }
}

Clearly, this approach breaks the DRY principle, which calls for a single source of truth: every piece of domain knowledge in your software should have one authoritative representation. In the example above, each rule lives in at least three places: the Customer constructor, the ChangeName and ChangeEmail methods, and the CustomerInfo attributes.

How to get rid of primitive obsession?

To get rid of primitive obsession, we need to introduce two new types which could aggregate all the validation logic that is spread across the application:

// Needs: using System.Text.RegularExpressions;
public class Email
{
    private readonly string _value;
 
    private Email(string value)
    {
        _value = value;
    }
 
    public static Result<Email> Create(string email)
    {
        if (string.IsNullOrWhiteSpace(email))
            return Result.Fail<Email>("E-mail can't be empty");
 
        if (email.Length > 100)
            return Result.Fail<Email>("E-mail is too long");
 
        if (!Regex.IsMatch(email, @"^([\w\.\-]+)@([\w\-]+)((\.(\w){2,3})+)$"))
            return Result.Fail<Email>("E-mail is invalid");
 
        return Result.Ok(new Email(email));
    }
 
    public static implicit operator string(Email email)
    {
        return email._value;
    }
 
    public override bool Equals(object obj)
    {
        Email email = obj as Email;
 
        if (ReferenceEquals(email, null))
            return false;
 
        return _value == email._value;
    }
 
    public override int GetHashCode()
    {
        return _value.GetHashCode();
    }
}

public class CustomerName
{
    public static Result<CustomerName> Create(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
            return Result.Fail<CustomerName>("Name can't be empty");
 
        if (name.Length > 50)
            return Result.Fail<CustomerName>("Name is too long");
 
        return Result.Ok(new CustomerName(name));
    }
 
    // The rest is the same as in Email
}

The beauty of this approach is that whenever validation logic (or any other logic attached to those classes) changes, you need to change it in one place only. The fewer duplications you have, the fewer bugs you get, and the happier your customers become!

Note that the constructor of the Email class is private, so the only way to create an instance is through the Create method, which performs all the required validation. This guarantees that an Email instance is in a valid state from the very beginning and that all its invariants are met.

This is how the controller can use those classes:

[HttpPost]
public ActionResult CreateCustomer(CustomerInfo customerInfo)
{
    Result<Email> emailResult = Email.Create(customerInfo.Email);
    Result<CustomerName> nameResult = CustomerName.Create(customerInfo.Name);
 
    if (emailResult.Failure)
        ModelState.AddModelError("Email", emailResult.Error);
    if (nameResult.Failure)
        ModelState.AddModelError("Name", nameResult.Error);
 
    if (!ModelState.IsValid)
        return View(customerInfo);
 
    Customer customer = new Customer(nameResult.Value, emailResult.Value);
    // Rest of the method
}

The instances of Result<Email> and Result<CustomerName> explicitly tell us that the Create method may fail and, if it does, we can find out why by examining the Error property.
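
The Result type itself is not shown in this article. As a rough sketch only, it could look something like the following; the members Success, Failure, Error and Value, and the decision to throw when Value is read from a failed result, are my assumptions for illustration and not necessarily how the real class is written:

```csharp
using System;

// Hypothetical minimal Result type; the implementation used in the series may differ.
public class Result
{
    public bool Success { get; private set; }
    public string Error { get; private set; }

    public bool Failure
    {
        get { return !Success; }
    }

    protected Result(bool success, string error)
    {
        Success = success;
        Error = error;
    }

    // Wraps a value that passed all checks.
    public static Result<T> Ok<T>(T value)
    {
        return new Result<T>(value, true, null);
    }

    // Carries the reason for the failure instead of a value.
    public static Result<T> Fail<T>(string error)
    {
        return new Result<T>(default(T), false, error);
    }
}

public class Result<T> : Result
{
    private readonly T _value;

    // Assumption: reading Value from a failed result is a programming error.
    public T Value
    {
        get
        {
            if (Failure)
                throw new InvalidOperationException("A failed result has no value");

            return _value;
        }
    }

    internal Result(T value, bool success, string error)
        : base(success, error)
    {
        _value = value;
    }
}
```

With a type along these lines, Result.Ok(new Email(email)) infers the type argument from the value, while Result.Fail<Email>(...) has to state it explicitly, which matches the calls in the Create methods above.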

This is what the Customer class can look like after the refactoring:

public class Customer
{
    public CustomerName Name { get; private set; }
    public Email Email { get; private set; }
 
    public Customer(CustomerName name, Email email)
    {
        if (name == null)
            throw new ArgumentNullException("name");
        if (email == null)
            throw new ArgumentNullException("email");
 
        Name = name;
        Email = email;
    }
 
    public void ChangeName(CustomerName name)
    {
        if (name == null)
            throw new ArgumentNullException("name");
 
        Name = name;
    }
 
    public void ChangeEmail(Email email)
    {
        if (email == null)
            throw new ArgumentNullException("email");
 
        Email = email;
    }
}

Almost all of the validations have been moved to the Email and CustomerName classes. The only checks left are null checks. They can still be pretty annoying, but we’ll see how to handle them in a better way in the next article.

So, what benefits do we get by getting rid of primitive obsession?

We create a single authoritative knowledge source for every domain problem we solve in our code. No duplications, only clean and DRY code.
Stronger type system. The compiler now works for us twice as hard: it is impossible to mistakenly assign an email to a customer name field, because that results in a compiler error.
No need to validate values passed in. If we get an object of type Email or CustomerName, we are 100% sure that it is in a correct state.
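
To illustrate the second point, here is a small sketch. The wrapper classes below are stripped-down stand-ins (public constructors, no validation) that exist only to keep the example short, and TypeSafetyDemo is a hypothetical name:

```csharp
using System;

// Simplified stand-ins for the wrapper types; validation is omitted on purpose.
public sealed class CustomerName
{
    public string Value { get; private set; }

    public CustomerName(string value)
    {
        Value = value;
    }
}

public sealed class Email
{
    public string Value { get; private set; }

    public Email(string value)
    {
        Value = value;
    }
}

public class Customer
{
    public CustomerName Name { get; private set; }
    public Email Email { get; private set; }

    public Customer(CustomerName name, Email email)
    {
        Name = name;
        Email = email;
    }
}

public static class TypeSafetyDemo
{
    public static Customer Build()
    {
        CustomerName name = new CustomerName("Jane Doe");
        Email email = new Email("jane@example.com");

        // Swapping the arguments is rejected at compile time:
        // Customer wrong = new Customer(email, name);
        // error CS1503: cannot convert from 'Email' to 'CustomerName'

        return new Customer(name, email);
    }
}
```

With two string parameters, the swapped call would compile and the mistake would surface only at runtime, if at all.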

There’s one detail I’d like to point out. Some people tend to wrap and unwrap primitive values multiple times during a single operation:

public void Process(string oldEmail, string newEmail)
{
    Result<Email> oldEmailResult = Email.Create(oldEmail);
    Result<Email> newEmailResult = Email.Create(newEmail);
 
    if (oldEmailResult.Failure || newEmailResult.Failure)
        return;
 
    string oldEmailValue = oldEmailResult.Value;
    Customer customer = GetCustomerByEmail(oldEmailValue);
    customer.ChangeEmail(newEmailResult.Value); // the Email setter is private
}

Instead, it is better to use the custom types across the whole application and unwrap them only when the data leaves the domain boundary, i.e. when it is saved to a database or rendered to HTML. In your domain classes, use them as much as possible. The result is cleaner and more maintainable code:

public void Process(Email oldEmail, Email newEmail)
{
    Customer customer = GetCustomerByEmail(oldEmail);
    customer.ChangeEmail(newEmail);
}
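
The unwrapping at the boundary can be sketched roughly like this. CustomerRow and CustomerMapper are hypothetical names introduced for illustration, and the Email class is a cut-down stand-in that keeps only the implicit conversion to string:

```csharp
using System;

// Cut-down stand-in for the Email type; validation is omitted here.
public sealed class Email
{
    private readonly string _value;

    public Email(string value)
    {
        _value = value;
    }

    public static implicit operator string(Email email)
    {
        return email._value;
    }
}

// Hypothetical persistence model: plain strings, because this is what gets stored.
public class CustomerRow
{
    public string Email { get; set; }
}

public static class CustomerMapper
{
    // The one place where the wrapper is turned back into a primitive.
    public static CustomerRow ToRow(Email email)
    {
        // The implicit operator performs the conversion to string.
        return new CustomerRow { Email = email };
    }
}
```

Everything on the domain side of such a mapper keeps working with Email, so the value never needs to be validated again.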

The other side: limitations

Unfortunately, creating custom types in C# is not as neat as in functional languages like F#. That will probably change in C# 7 if we get record types and pattern matching, but until then we have to deal with the overall clunkiness of this approach.

Because of that, I find some really simple primitives not worth wrapping. For example, a money amount whose single invariant is that it can’t be negative could probably still be represented as a decimal. That leads to some duplication of validation logic but, again, it is probably the simpler design decision even in the long run.
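
As a sketch of that trade-off, here is a hypothetical Account class (not part of the article's domain) that keeps the amount as a plain decimal and accepts repeating the single check:

```csharp
using System;

// Hypothetical example: the amount stays a decimal, so the guard is repeated.
public class Account
{
    public decimal Balance { get; private set; }

    public void Deposit(decimal amount)
    {
        if (amount < 0)
            throw new ArgumentException("Amount can't be negative", "amount");

        Balance += amount;
    }

    public void Withdraw(decimal amount)
    {
        // The same check again: the duplication we accept for skipping the wrapper.
        if (amount < 0)
            throw new ArgumentException("Amount can't be negative", "amount");

        if (amount > Balance)
            throw new InvalidOperationException("Insufficient funds");

        Balance -= amount;
    }
}
```

If more rules accumulate around the amount (currency, rounding, upper limits), that is usually the signal to revisit the decision and introduce a dedicated type.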

As usual, use common sense and weigh the pros and cons in every single situation. And don’t hesitate to change your mind, even multiple times.

Summary

With immutable and non-primitive types, we are getting closer to designing applications in C# in a functional way. Next time, I’ll show how to mitigate the billion-dollar mistake: null references.