Domain-centric vs data-centric approaches to software development

By Vladimir Khorikov

In this post, I’d like to compare two approaches that prevail in the world of (mostly enterprise) software development: domain-centric and data-centric.

If you read my last post (or any other post, quite frankly), you might have noticed I personally gravitate towards the domain-centric approach. Although this article is intended to be an impartial one, keep in mind that my bias may leak through.

Domain-centric vs data-centric approaches

The main difference between the two approaches is in the way people adhering to them treat software.

The data-centric style of thinking views data as the most valuable part of the application:


Data as the most important element of the system

There are two corollaries flowing naturally from that point:

  • Business logic tends to be placed as close to the data as possible. In the case of relational databases, this usually means database functions and stored procedures.
  • Application code is often considered to be secondary. Development usually starts with modeling the database structure, and the application code then conforms to the model built in the database.

With the domain-centric approach, on the other hand, programmers view the domain model as the most important part of the software project. It is usually represented in the application code, using an OO or functional language. Data (as well as other notions such as UI) is considered to be secondary in this case:


Domain model as the most important element of the system
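To make the domain-centric idea a bit more concrete, here is a minimal sketch of a domain model expressed in application code. The `Order` and `OrderLine` classes and the credit-limit rule are hypothetical, invented purely for illustration. The point is only that the business rules live in the application code rather than in the database.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:
    """Hypothetical value object: one product on an order."""
    product: str
    unit_price: Decimal
    quantity: int


@dataclass
class Order:
    """Hypothetical entity: the business rules are enforced here, not in the DB."""
    customer_credit_limit: Decimal
    lines: list[OrderLine] = field(default_factory=list)

    @property
    def total(self) -> Decimal:
        # Sum of price * quantity over all lines, starting from Decimal zero.
        return sum((l.unit_price * l.quantity for l in self.lines), Decimal("0"))

    def add_line(self, product: str, unit_price: Decimal, quantity: int) -> None:
        # Invariant 1: quantities must be positive.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        # Invariant 2 (made-up rule): an order may not exceed the credit limit.
        if self.total + unit_price * quantity > self.customer_credit_limit:
            raise ValueError("order would exceed the customer's credit limit")
        self.lines.append(OrderLine(product, unit_price, quantity))
```

How such an object gets persisted is a separate concern in this style; the storage layer is expected to adapt to the model, not the other way around.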

Each of the approaches brings its own pros and cons, as well as some differences in the way developers address common design challenges. Let’s elaborate on that.

Code reuse

The data-centric approach tends to achieve code reuse by using the database as an integration point. It introduces common functionality in the database itself using DB functions and stored procedures. This unlocks the ability to have more than one application working with the same data:


Code reuse with the data-centric style

The domain-centric standpoint, on the other hand, achieves code reuse by exposing APIs from the application code, for example over REST or SOAP. Software developers adhering to this approach tend not to share the database between applications. It means that each application usually has its own DB instance, which it owns entirely:


Code reuse with the domain-centric style

Consistency

One of the biggest benefits the data-centric style provides is the ease of maintaining data consistency. It’s much easier to keep data consistent when all of it is gathered in a single place and controlled primarily by the database itself in the form of stored procedures.

With the domain-centric approach, it is harder for developers to ensure consistency. Such things as stale data and write conflicts are more common in this case. That is true even if there’s only a single application working with the database because, compared to stored procedures, application code is “farther” away from the data it operates on and thus more prone to consistency issues.
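One common way application code deals with stale data and write conflicts is optimistic concurrency: each row carries a version number, and an update succeeds only if the version is still the one the application originally read. The sketch below uses Python's built-in `sqlite3` module; the `product` table, its columns, and the `save_price` function are assumptions made up for this example, not something the approaches above prescribe.

```python
import sqlite3


def save_price(conn: sqlite3.Connection, product_id: int,
               new_price: int, expected_version: int) -> bool:
    """Update only if nobody changed the row since we read it.

    Returns False when the write was rejected because of a conflict.
    """
    with conn:  # commits on success, rolls back on exception
        cursor = conn.execute(
            "UPDATE product SET price = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_price, product_id, expected_version),
        )
    # Zero affected rows means the version no longer matched: a stale read.
    return cursor.rowcount == 1


def demo_conflict() -> tuple[bool, bool, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, price INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO product VALUES (1, 10, 1)")
    conn.commit()
    # Two "users" both read version 1 and then both try to write.
    first = save_price(conn, 1, 12, expected_version=1)   # accepted
    second = save_price(conn, 1, 15, expected_version=1)  # rejected, stale
    (price,) = conn.execute("SELECT price FROM product WHERE id = 1").fetchone()
    return first, second, price
```

In this sketch the second writer is told about the conflict instead of silently overwriting the first writer's change; what to do next (retry, merge, show an error) is left to the application.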

Evolution

When it comes to refactoring, or modifying your software according to new requirements, the domain-centric approach works better than the data-centric one.

It is easier to apply changes to both the database and the application code if your application owns this DB completely. In this case, you can perform database migrations without negotiating them with other applications that might depend on the current database structure.
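As a rough illustration of what "owning the database" can look like, here is a sketch of a tiny migration runner that lives in the application's code base. It relies on SQLite's `PRAGMA user_version` to remember which migrations have been applied; the `MIGRATIONS` list and the `customer` table are hypothetical. Real projects typically use a dedicated migration tool, so treat this only as a demonstration of the idea.

```python
import sqlite3

# Hypothetical, ordered list of schema changes owned by the application.
MIGRATIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration the database has not seen yet; return the new version."""
    (current,) = conn.execute("PRAGMA user_version").fetchone()
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA values cannot be bound as parameters; version is a trusted int.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    (current,) = conn.execute("PRAGMA user_version").fetchone()
    return current
```

Because only one application depends on this schema, adding a third entry to the list is a local decision that needs no coordination with other teams.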

Even if your database is accessed by a single application only, the data-centric point of view usually implies there are separate developers or teams of developers working on the DB and the application. It means some negotiation is still required in order to make a change.

By contrast, teams adhering to the domain-centric standpoint tend to work on the application code and the database together and thus can apply the evolutionary approach with less friction.

Complexity growth

The most important distinction between the two methods is in the way they affect overall project complexity over time.

The data-centric approach is often easier to start with due to the simple and concise programming paradigm it proposes: Transaction Script. Application code in a data-centric code base tends to perform simple CRUD-like operations and is easy to grasp in the early stages of the project.
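For readers unfamiliar with the pattern, a Transaction Script is essentially one procedure per use case that reads, checks, and writes data directly. The following is a minimal sketch using Python's built-in `sqlite3` module; the `account` table and the `transfer_funds` function are made-up examples.

```python
import sqlite3


def transfer_funds(conn: sqlite3.Connection, from_id: int, to_id: int, amount: int) -> None:
    """One procedure per use case: read, check, write, all in one place."""
    with conn:  # commits on success, rolls back on exception
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (from_id,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, from_id)
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, to_id)
        )


def demo_transfer() -> list[tuple[int, int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    transfer_funds(conn, 1, 2, 40)
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```

With one or two such scripts the code is very easy to follow, which is exactly the early-stage appeal described above.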

However, in my experience, the more complex a project gets, the less appealing the data-centric approach becomes. After a certain point, the effort required to evolve such a system explodes, making it nearly impossible to introduce new functionality at a reasonable pace.

By contrast, the domain-centric approach brings additional maintenance overhead at the beginning but pays off greatly over time:


Complexity growth

Past a certain point, the domain-centric method becomes the less complex of the two: a system adhering to its principles is easier to maintain and evolve.

The reason here is that the problem domain itself is more important than the data it produces. Because of that, the investments we make in modeling that domain yield a better ROI.

The main drawback of the domain-centric style of thinking is its learning curve. It is much steeper because it requires you to learn both database and OOP (or FP) best practices.

That’s right, the domain-centric approach doesn’t mean you can ship software without ever knowing how your database works. You still need to dive into it pretty deep and get your head around such topics as SQL, N+1 problems, and the pros and cons of normalization (for relational databases), or sharding, replication, and schemaless data design (for NoSQL databases). On top of that, you also have to learn OO/functional design patterns and best practices in order to express your domain in the simplest and most maintainable way possible.
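The N+1 problem is a good example of why database knowledge still matters. It appears when code loads a list of parent rows with one query and then issues one more query per parent to load its children. The sketch below contrasts that with a single JOIN, using Python's built-in `sqlite3` module; the `author` and `post` tables and both function names are hypothetical, and the query counters exist only to make the difference visible.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO post VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
        """
    )
    return conn


def titles_n_plus_one(conn: sqlite3.Connection):
    """One query for the authors, then one more per author (N+1 in total)."""
    queries = 1
    result = {}
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM post WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_query(conn: sqlite3.Connection):
    """The same data in one round trip via a JOIN.

    Note: an inner join skips authors without posts; every author here has one.
    """
    result = {}
    rows = conn.execute(
        "SELECT a.name, p.title FROM author a "
        "JOIN post p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

With two authors the difference is three queries versus one; with thousands of parent rows it is usually the difference between a fast page and a slow one.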

Domain-centric vs data-centric: conclusion

The two approaches aren’t really as opposed to each other as it might seem. I view the domain-centric way of programming as a natural extension of the data-centric one. But that’s only my opinion: I myself gradually moved from one approach to the other earlier in my career.

To conclude, I’d like to summarize the points made in this post:

  • The data-centric style is easier to start with
  • The domain-centric approach does better in the long run
  • The domain-centric approach has a steeper learning curve: you have to study both database and application code design patterns and best practices.

For those who want to learn more about the domain-centric approach, I highly recommend reading this book, if you haven’t already.


  • David Raab

    That article should really be named “Two ways of working with relational databases” not the general “approaches to software design”.

    In your data-centric approach you describe stored procedures and functions in the database as the solution for code reuse. That only applies to relational databases. For example, most NoSQL databases don’t support those kinds of things. On top of that, you don’t even have to use a database at all.

    Your data can also be just files, in their own binary format or CSV, Excel, XML or whatever persistence you come up with. Nearly every image manipulation program works this way: Photoshop and Gimp (on binary files), Inkscape (on SVG/XML files) and so on. You can even have applications that only have in-memory data. Games usually process their data only in memory, and online games fetch everything from a network.

    You can still take a “data-centric” approach in all of those applications. This is also called “data-driven development”, but none of those applications will use a relational database, nor does it mean you keep your functions as close to your data as possible.

    In that sense data-driven is exactly the opposite. Data-driven means the data is the most important thing. Data stands on its own and doesn’t have any functionality attached to it, because in a data-driven design you can always replace the code entirely with something different.

    For example, if you have a PNG file you can work with it from Photoshop, Gimp, Krita or any other application that supports the PNG file format. Data stands on its own and a PNG file doesn’t have any functionality attached to it.

    So in general what you explain here doesn’t map to “software development” in general. It only applies to software that uses a relational data storage.

    • David Raab

      Just some additional notes. Data-centric and data-driven are actually different things, but I combine them because they usually come together. If you have a data-centric solution you will very likely also be data-driven. Data-driven basically means that the program flow is based on data.

      As for data-centric: the book “Real-world Functional Programming” ( https://www.manning.com/books/real-world-functional-programming ) has a whole chapter (7), “Designing data-centric programs”, on this topic. Here is what its first paragraph says:

      The first thing to do when designing a functional program is to think about the data that the program works with. Since any non-trivial program uses data, this phase is extremely important in the application design. When implementing a program in a functional language, we also begin with the data structures that we’ll use in the code and then write operations to manipulate the data as the second step. This is different to the object-oriented design, where data is encapsulated in the state of the objects; processing is expressed as methods that are part of the objects and interact with other objects involved in the operation. Most of functional programs are data-centric, which means that data is clearly separated from operations and adding a new operation to work with the data is a matter of writing a single function.

      Tomas Petricek also named this “Type-first-development”. http://tomasp.net/blog/type-first-development.aspx/

      And in this style you are basically also data-driven. Because if you define your data-types in that way you usually use pattern matching on your data and your program-flow is controlled by the data you put into a function.

      In that sense most functional programs are data-centric or data-driven. Data and code are separated from each other. The Actor model is also basically a data-centric/data-driven programming model: you send messages to actors as data, and those actors act based on the message they get. The whole Internet works this way too, as all internet protocols are just message exchange. A web server receives a message in the form of an HTTP Request and returns a Response. The flow of a web server and what it should do is completely controlled by the data it receives as a Request.

      In that sense your graphs should really be flipped. What raises complexity in the long run is really the domain-centric model: the more functionality you have, the more stuff you add to your classes (in OO programming). Classes just get bigger and bigger.

      In a data-centric approach, on the other hand, you have your data on its own, without any logic. You can create a library working on some data sets, and someone else can create another library doing something completely different. Thus different things are separated from each other and complexity is reduced.

      In a domain-centric approach, if you want to do two things with a domain object, the domain object itself has to know about both of them, as behaviour is directly attached to the domain itself and not clearly separated.

      It is exactly what ends up being described here:

      http://mergeconflict.com/coupling-in-object-oriented-programming/

      • ardalis

        I describe evolving OOP applications from data-centric to domain-centric in my N-Tier courses on Pluralsight, and then go into further detail on Domain-Driven Design in the DDD Fundamentals course there. The last point made about objects getting bigger and bigger is an anti-pattern. As long as you keep following the Single Responsibility Principle (also covered in one of my courses – SOLID), you should have many small, cohesive, well-factored objects that are each individually easy to understand and test but together model the domain effectively.

        Shameless plug for my courses: bit.ly/PS-ssmith

        • http://enterprisecraftsmanship.com/ Vladimir Khorikov

          Didn’t watch your N-Tier courses but I did watch your and Julie’s course on DDD fundamentals. Great course. Especially liked your idea with the guest star.

        • David Raab

          Yes, I totally agree with what you say. But that is not what I’m saying.

          DDD is still about putting your data and behaviour in one place together. Data-centric is the opposite, a complete separation between data and behaviour.

          Choosing not to put too much behaviour in a class, or “don’t make classes too big”, is a good design decision and I totally agree with it. But here you are just fighting symptoms. By not putting data and behaviour together at all (or avoiding DDD altogether) you never run into the problem of something getting too big in the first place.

          Once you embrace making your classes as small as possible, you can also just switch to a data-centric or functional style of programming.

          http://blog.ploeh.dk/2014/03/10/solid-the-next-step-is-functional/

          • http://enterprisecraftsmanship.com/ Vladimir Khorikov

            I think you are both right. It’s just that the OOP paradigm isn’t suitable for separating data and behavior, so all the points you are making are correct, but only within their area of application: functional programming.

            In OOP, we have to conform to the existing reality. And the reality is that the anemic domain model is a real problem in OO languages. This, arguably, makes FP superior to OOP, but that’s a topic for another discussion.

          • David Raab

            I actually don’t think so much in terms of OOP or functional anymore. I primarily use F# for development now and I use both OOP and functional design side by side.

            I even think that “OOP” is just a small subterm of functional programming anyway. OOP is good when you need/want mutable state. In that case I use a class. If I don’t have mutable state I use Records/DUs and Modules. Actually it goes hand in hand.

            On the other hand, an even better way to handle mutable state is Agents that use message passing, for example a MailboxProcessor in F#. And while that technique is often referred to as “functional”, it is really closer to what Alan Kay defined as OO when he coined the term.

            I also don’t think that an anemic or data-centric approach is bad in OO. It really comes down just to the question whether you have mutable state or not.

            OO is often considered to be mutable. So if you have mutable state an anemic model is bad. But if you have immutable state then it also works fine in an OO language. It really just comes down to whether you have mutability or immutability.

            With mutability a class, or DDD approach is better. If you do immutability then a data-centric/anemic model is better. And you also can do both in OO languages.

            But that doesn’t change the general idea, and overall I think a data-centric approach that embraces immutability is often easier, less complex and more flexible.

            The reality is rather that full immutability, forced everywhere, doesn’t make sense everywhere. Sometimes you just need mutability, for example because performance matters more.

  • johnnywell

    Nice starting point thanks!