Does software own data, or just use it? This post explores some of the issues around data ownership; particularly, whether it is safe to directly modify data in production.

Not so long ago, software design began with an ERD. The (relational) database was king and software revolved around it like planets around the sun. More recently, this thinking has been turned around, considering the database as another internal concern of the software. This is obviously not true of all systems, but it certainly holds for a large number of systems I've come across.

Treating the data store (regardless of kind) as an internal concern gives us the flexibility to choose the most appropriate data storage mechanism; possibly selecting different data stores for different parts of the system. We are largely free to change this whenever necessary and to expect these changes not to have any external effect (assuming, of course, that the outward behaviour of the system does not change). Whether we choose to use SQL Server, Oracle, MongoDB or Neo4j shouldn't be of concern to the user - provided the system meets the requirements. Many projects constrain the choice of data storage technologies, but there's almost always room to select a product within these constraints.

We can face a challenge when a team must transition from the old approach to the new. Where they would previously just change data directly in the database, they must now adapt to doing everything through the software. The data is no longer theirs to control: it belongs to the software. Failing to respect this change can lead to significant problems in production.

Data Integrity

Relational data stores provide powerful constraints to ensure data integrity. The ability to define foreign keys, CHECK constraints, UNIQUE constraints (and many others) is useful for ensuring basic data integrity is enforced. But this leads us into a false sense of security. Where databases were historically responsible for maintaining this level of data integrity, software is increasingly taking over. Many business rules either cannot be expressed with these constraints or are considered the responsibility of the application, with teams avoiding business logic embedded into the database.

Changing data directly in a production system by circumventing the application can cause major problems that are hard to recover from. Depending on how and when the business validity of data is checked, the application may no longer be able to load data if it is incorrectly updated.

Complexity

One issue that has reared its head recently (and the primary motivation behind writing this post) is around a third-party library that is used in one of my projects. In this scenario, the Operations Team wanted to amend the schedules of various jobs managed by Quartz . Unfortunately, many people made the assumption that they could simply amend the triggers in the database... probably because it looks so easy. This assumption was not tested against the facts, and was made without appreciation for the complexity behind Quartz. This is a good example of where a simple review of the documentation would have resolved the issue.

Making assumptions about your system without verifying them is a reckless approach to system management, and one that is easily avoided. Sometimes it might be safe to modify the database directly, but it's definitely an area where angels fear to tread. If in doubt, don't!



Published

23 January 2014

Share

submit to reddit