Schema Versioning in CodaServer

This is a repost of an article from my old blog about CodaServer. It's being published here for posterity.

Databases have always been a little awkward in the versioning department.

A database server is an environment all its own.  Different databases can have identical schemas but much different data in them.  Some of this data is very much part of the "structure" of the schema:  Tables that are used to populate important application drop down boxes, for instance, or metadata tables for various reporting functions.

This data is programmer data instead of user data, and the application can't work without it.

How does one keep both schema and programmer data in sync across different environments?

The Old Way

SQL databases are specialists.  They do their thing handling queries and update statements flying in from all directions.  They maintain data integrity by following the rules of the schema.  They handle enormous loads.  But that's about it.

Database servers are often known as "instances," and for good reason:  They are entirely self-contained.  They generally don't know about each other and don't involve themselves in the rest of the software ecosystem.  If you want to set up a testing environment, you "fire up a new instance" and load your schema onto it.  As you roll out new versions of application code, you run the diffs of the schema against the new environment and backfill the data.

The developer (or system administrator) does the thinking while the database server just runs its queries and twiddles its hard drive platters.

We should be so lucky.

The CodaServer Way

CodaServer has a much different architecture.  One CodaServer manages your development, testing, and production environments.  If you need more capacity, you can fire up multiple CodaServer instances, point them at the same system datasource, and put a load balancer in front of them.  Since all the communication with CodaServer is over HTTP-based Web Services, it has an architecture very similar to that of the web, and can leverage all the same techniques used to scale general purpose web sites.

(For more detail on CodaServer's architecture, you can look at the manual's architecture page)

Having one server (or load balanced group of servers) manage all of your environments has another benefit:  Simple version control.

As you build up your CodaServer application in the DEV environment, CodaServer keeps track of all Data Definition Language commands (like CREATE TABLE) in its transaction log.  When you want to move your application to TEST, simply run the PROMOTE command.  When it's ready for PROD, PROMOTE from test.  Your application is always constant in each environment because deployment always follows the same pattern.  There cannot physically be a schema in production that was not in development first.

But wait, there's more.

CodaServer also has the concept of REF TABLES, tables that remain constant at all times between environments.  All schema changes and INSERTs, UPDATEs, and DELETEs to REF TABLES occur immediately in DEV, TEST, and PROD.  This is ideal for most types of programmer data.

Some might think it's limiting that CodaServer doesn't allow developers to create tables directly on different application instances.  This isn't really as big a problem as it seems.

CodaServer's only concern is protecting the data model of your applications.  Since the applications use standard SQL databases, you are free to add any additional tables you want for reporting, ad hoc denormalization, data transformations, etc.  The only caveat is that CodaServer based applications won't know they are there, which is perfectly fine since they don't affect the datamodel.

(It is highly recommended that you adopt a naming convention for these tables that is different than that of your CodaServer tables to prevent collisions.)

So there you have it:  A database that does the heavy lifting for you.

Til next time.

comments powered by Disqus