State vs migration-driven database delivery



The topic described in this article is a part of my Database Delivery Best Practices Pluralsight course

In the previous post, I wrote about the database versioning best practices. I’ve got a lot of feedback on that topic. One of the responses stated that the article was (admittedly) one-sided as it didn’t cover the other widely spread approach to managing the database delivery process. I decided to fill that gap and provide my own comparison of the two. This article is about the state versus migration-driven database delivery.

State-driven database delivery

The state-driven approach was initially introduced by such vendors as Redgate and later popularized by Microsoft when they shipped Visual Studio database project. The idea is pretty simple and yet very appealing: all we need to do is we need to maintain a snapshot of our database structure. A compare tool then will auto-generate the scripts required to upgrade any existing database to mimic the etalon.

The whole upgrade process looks like this:

State-driven database delivery

State-driven database delivery

It doesn’t matter how many differences are there between the database being upgraded and the snapshot database. The target DB can be empty or it can be one of the previous versions of the snapshot DB, the scripts generated will do everything needed to get it in sync with the snapshot DB.

Essentially, the tool generating the upgrade scripts does all the hard work. The intention behind this approach is to free us, developers, from that work and make our life easier.

Migration-driven database delivery

The migration-driven approach takes another side. Instead of having a single etalon, it implies creating a bunch of migrations that transition a database from one version to another:

Migration-driven database delivery

Migration-driven database delivery

Those migrations might be represented in different ways. Plain SQL scripts is the most popular choice, but they could also be written in some DSL-like language:

[Migration(1)]

public class CreateUserTable : Migration

{

    public override void Up()

    {

        Create.Table(“Users”);

    }

 

    public override void Down()

    {

        Delete.Table(“Users”);

    }

}

Regardless of how the migrations are represented, they should be able to be unambiguously translated into upgrade scripts which then can be executed against the database. This means that, by and large, we have to code the migrations manually.

State vs migration-driven database delivery: comparison

The two approaches have different implications to the database delivery process. Those differences can be boiled down to two points: resolving merge conflicts and handling data migrations. Let’s elaborate on that.

Merge conflicts arise when two or more programmers work on the same piece of a database. The state-driven approach allows us to handle them in the easiest way possible. As the whole database is represented as an SQL script, the merge process is pretty straightforward:

Resolving merge conflicts using an etalon

Resolving merge conflicts using an etalon

At the same time, if you deal with a set of migration scripts, it might be extremely tedious to handle conflicts in them. This is mainly related to the work with stored procedures and database functions as they are prone to the “last migration wins” problem the most.

With migrations, you have to carefully proof-read every line in the conflicting changes to make sure the resulting migrations take all of them into account.

Data migration is the process of changing the shape of existing data so that it conforms to the new database schema. An example of such migration would be the process of splitting a name column up into two fields: the first name and the last name.

There is no easy way to implement this change with state-driven database delivery. The compare tools are awful when it comes to managing not only a database schema but also its data. It’s no wonder. While the DB schema is objective, meaning that there is the only way to interpret it, the data inside is context-dependent. That means the tool can’t make reliable assumptions about the data in order to generate the upgrade scripts, and we have to deal with it by ourselves anyway.

Such duality is a huge problem in the world of state-driven database delivery. Eventually, we find ourselves in a position where we have auto-generated scripts for DB schema on one hand, and manually crafted scripts for the data on the other. This loosens the cohesion between schema and data changes and eventually becomes a pain in the butt, especially in a project with multiple production databases.

With the state-driven approach, we end up with messy hook-up scripts for pre-execution:

— Pre-execution script

IF @Version > 15 AND @Version <= 25

BEGIN

  — Create a temp table with the Cusomer names

  CREATE dbo.tmp_CustomerNames (ID int, Name)

END

 

IF @Version > 25 AND @Version <= 28

BEGIN

 

END

And post-execution steps:

— Post-execution script

IF @Version > 15 AND @Version <= 25

BEGIN

  — Fill the new First Name and Last Name columns

  UPDATE dbo.Customer

  SET FirstName = fn_Split(c.Name, 0), LastName = fn_Split(c.Name, 1)

  FROM dbo.Customer c

  INNER JOIN dbo.tmp_CustomerNames n ON n.CustomerID = c.CustomerID

 

  DROP TABLE dbo.tmp_CustomerNames

END

 

IF @Version > 25 AND @Version <= 28

BEGIN

 

END

With these hook-ups, it’s hard to overlook the whole picture of what is going on with the database as you need to assemble this picture from 3 different places.

The migration-driven approach, at the same time, handles this problem with ease. The migrations themselves may contain scripts for both schema and data upgrades. The cohesion between them comes out naturally, without any additional effort.

What approach to choose?

So the state-driven approach is good at handling merge problems while the migration-driven – at dealing with data migrations. What approach to choose? To answer this question, we should look at the problem from a slightly different perspective.

I strongly believe that a heavy use of stored procedures and database functions breaks the Separation of Concerns principle. Application logic should be stored in the application code, not in the database. This applies not only to C#/Java/etc code but also to SQL code. If you need to use any SQL within your application, go ahead but be sure you put this code in your application code files.

Ideally, a database should become a dump storage which doesn’t know how to deal with data in it. A project where the SoC principle is properly applied doesn’t contain many (if at all) stored procedures and functions, only tables, views, and the data in them.

In such projects, merge problems are not as big of a problem because of the lack of logic inside the database. Therefore, the migration-driven approach does the best job possible in such projects. I talked about best practices that regard to this approach in the previous post, so be sure to read them.

Does it mean we should always use migrations for maintaining our databases? Not at all. There still are some cases where the state-driven approach is applicable.

One of such cases is legacy projects. If you are working on a project which heavily relies on logic baked into the database, you might not be able to extract that logic out to the application code. If there is a large team of developers working on this project, the use of the state-driven approach might be the best choice as it allows for alleviating merge problems when it comes to stored procedures and database functions.

Another valid use case would be a small project that is not yet shipped to production. Usually, you don’t have to worry about data in such projects, so you can just wipe all of it out and re-create the whole database from scratch whenever you need to upgrade it. This approach allows you to quickly sketch your database without putting a lot of effort in building proper migration scripts. None the less, be sure to switch to the migration approach after the project is deployed.

Summary

This post was inspired by an NDC talk given by Alex Yates. Check it out if you want to learn more on this topic.

Let’s summarize:

  • The state-driven approach is good for projects with a lot of logic in the database and a large team working on it. You are also better off choosing it for small projects that are not in production yet.
  • The migration-driven approach is a better choice in all other cases.

Other articles in the series

Share




  • Alex Yates

    A nice post – and thanks for the shout out. 🙂

    I do have a few questions:

    1. With a migrations based approach, it is bad to go and edit a migration script after you have committed it. Agreed. However, lets say that I’m currently up to script 10, but I discover that script 5 made a change that accidentally dropped data. Perhaps I made a mistake in a complicated data migration or perhaps I didn’t think I needed the data at the time but now I realize that was an unwise decision. One of my customers is up to script 3 so in order to deploy to this customer I have a problem. Script 5 drops data. Ideally I should add a script 11 to correct the mistake at script 5, but how do I avoid data loss?

    2. Can you define “application logic”? Does this imply that there is also a category called “data logic”? Where does “business logic” fit in? I guess I get concerned when I hear people using these sorts of terms as I feel like these definitions are a bit fluffy.

    3. On a related point, you’ve clearly stated that you generally dislike to use many stored procedures. What do you think of the idea that applications should decide the data to put into/take out of my database, but the database should be responsible for handling the actual data changes, within transactions, probably using stored procedures. Have you read this thoroughly entertaining and thought provoking post that hit the twittersphere earlier this year?: http://rob.conery.io/2015/02/21/its-time-to-get-over-that-stored-procedure-aversion-you-have/ I’m not saying I agree with every point he makes but he makes a few good ones.

    Once again, thanks for sharing this post.

    • http://enterprisecraftsmanship.com/ Vladimir Khorikov

      Thank you for your feedback!

      I’ll start with #3.

      3. The article reminded me about the dispute taking place for as long as I can remember: the data vs the model approaches to software development. I do have a strong opinion on this topic as well 🙂 In short: I think the “business logic in the database” approach doesn’t scale because it’s not flexible enough, meaning that you can’t rely on it to express any more or less complex domain model. OO (as well as functional) design gives us much more opportunities for that than the procedural design that SP leaves us with. SP and database functions are “closer to the ground” and thus provide some benefits (performance for example) but they don’t allow us to build proper abstractions. I believe DDD is a key to creating successful software projects (at least in the enterprise world). SPs are just too rigid to follow the DDD principles.

      What do you think of the idea that applications should decide the data to put into/take out of my database, but the database should be responsible for handling the actual data changes, within transactions, probably using stored procedures.

      I think that wouldn’t differ much from the situation where we use stored procedures to define the application logic.

      2. I equate “application logic” with “business logic” but you are right, the definition for those is indeed inconsistent. I use it as a broad term – generally to define the behavior of a system that is related to business rules. I haven’t heard of the term “data logic” before.

      1. If the script 5 wasn’t yet committed to the source control, then that would be relatively easy to implement: just change the script #5 locally and adjust all subsequent scripts so that they work properly. That situation actually doesn’t counter the best practices I described in the previous post. The scripts become immutable only when they are deployed. While a developer works on them locally, that’s fine to change them. On the other hand, if they were deployed to QA or STG (but not to production), that would be much tougher. I think in this situation I would go ahead and change them despite the rule and then try to communicate to others that I screwed up and there should be some unusual steps taken to resolve the problem. There’s no simple solution in such situations (at least I don’t know of such). In my experience, they don’t happen too often, though.