Databases Are Less Important Now
The database has been a centerpiece of my work in application development for the better part of a decade. But now, Rails and other comparable frameworks are reshaping the way that I and others look at databases. A lot of this is stepping down the importance of the database’s role, which can take some getting used to.
The database’s role is now limited to:
- Storage
- Indexing - i.e. fast lookups
- Concurrency - i.e. multiuser access, locking, transactions
There are many things that I’ve used the database for in the past that are now considered bad form:
- Multiapp / multilanguage access - In enterprise settings, it’s a common reality that you have different applications which may have been written in different eras, by different people, and in different languages. They may be running on different operating systems and in different physical locations. The database became a central communications hub by which these applications could all interact and share data, since every language worth its salt can speak SQL.
In Rails, the database has been relegated to just a persistence engine; your data models are really not complete without the actual model code. Sure, you can tap into the database directly, but it’s not recommended; think of it as bypassing your OS’s filesystem code and reading or writing bytes directly to sectors on your hard drive. It works, sure, but over the long run it’s a fragile solution.
The modern solution is to share data through well-defined APIs, RESTful whenever possible, XMLRPC when necessary. This is a better solution over the long run. Schema changes are less likely to require changes to existing applications and data is less likely to be munged by an outside application, to name two benefits.
- Permissions - Database permissions are clumsy, as anyone who has ever tried to create a new MySQL user can well attest. One popular permissions technique is to hide account information or passwords in a table while still allowing reads on the remaining fields. You do this by creating a view which removes or obfuscates the sensitive data, for example:
CREATE VIEW v_orders AS SELECT name,address,phone, RIGHT(cc, 4) as cc_last_4_digits FROM orders;Any users which should not have access to the full credit card number are given read access to v_orders, but not to the underlying orders table.
This works reasonably well, but permission management is non-transparent and easy to mess up when migrating the database (something that will typically happen many times over the course of a long-lived application). It also puts you in the position of doing string manipulation in SQL, which is far inferior to doing it in your app’s primary language.
This point is solved by the same technique as the previous one: abstract access to your data through an API, and then you have nearly infinite flexibility in access control.
- Data transforms - A simple data transform (hiding part of the credit card number) is shown above. I’ve worked on apps that do some fairly complex calculations on the data in SQL. A simple example:
SELECT *,(sales_tax_rate * total) as sales_tax, (SELECT max(transaction_date) FROM transactions WHERE order_id=o.id) as last_transaction_date FROM orders o ORDER BY last_transaction_date LIMIT 20;This offers a number of benefits, such as consistent access across multiple applications. One of the most important benefits is the ability to use the database’s optimization capabilities, such as the example above where we can pluck out the top twenty orders from a list of millions in the blink of an eye.
This technique has its place, but its use should be limited. Sales tax calculation should be a method on the model, not in the database. This allows your calculation to grow arbitrarily complex. You’ll bump your head against the limitations of SQL if your calculation goes beyond a few simple operations. Sales tax calculation can potentially be pages of code if you want to handle all the variations of sales and use tax in all fifty US states, many of which have varying tax rates for different types of goods.
In a few cases it might make sense to cache a field which can be calculated from others in the database. Generally one wants to avoid this, but if you need to sort of list of millions of orders by the sales tax amount, this is often a good way to go.
- Views - Views are one of those things that I fell in love with as soon as I discovered them. But the truth is that they are really just a way to hide nasty SQL tricks (like some of those shown above) behind a pretty face. The nastiness is still there, and whenever you want to work with it you’ve got trouble. Plus they have to be destroyed and remade every time you update the schema, which is a lot more work than you might think. (Bitscribe’s internal app framework, which we used before getting on the Rails bandwagon, had a whole portion of its migrations system devoted to managing views.)
I also have historically used views to wrap up the sort of large hierarchy joins that are so common in business apps, e.g.
CREATE VIEW v_tasks AS SELECT * FROM companies JOIN departments USING (company_id) JOIN projects USING (department_id) JOIN milestones USING (milestone_id) JOIN tasks USING (milestone_id) ORDER BY company,department,project,milestone,task;Here’s what I’ve been using as the equivalent in Rails:
class Task def self.find_deep(options={}) options2 = { :include => { :milestone => { :project => { :department => :company } } }, :order => 'company,department,project,milestone,task' } Task.find(:all, options.merge(options2)) end end class Company def tasks Task.find_deep(:conditions => [ 'companies.id=?', id ]) end end - Validation - Data types, field sizes and precisions, constraints, triggers to check data integrity, and of course foreign keys are all techniques used to validate data correctness at the time of an INSERT or UPDATE. But once again, SQL is limited in many ways. It think the biggest problem here - at least with Postgres, the database I have used the most on enterprisey apps - is total lack of agility. When you use CREATE FUNCTION to make a function, it’s hard to see your code again, hard to edit it (really you just have to retype it), and next to impossible to debug. Even something as simple as getting a list of all defined functions or triggers is a huge hassle.
The modern solution is to validate on the model object. Rails makes this easy, and so do most other good frameworks. Now realize that old-school database types like myself cringe a little at this thought, because we think “how can we trust the code to always validate correctly?” Database constraints, foreign keys, and the like may be unwieldy to work with, but once you’ve got them working you know you can count on them. We’ve got years of accumulated trust in our database, and at the same time mistrust code (whose frequent changes and much higher level of complexity makes bugs far more likely). So what needs to happen is that we learn to trust the Rails validations, before/after save hooks, and so forth just as well. That will come with time, as they prove their reliability.
So here the modern solution is just about 10 gazillion times better than the SQL-based one. Besides being vastly more flexible and easy to use, validations offer a much better use experience since we can catch errors and present them in a friendly matter.
Foreign keys a bit more of a gray area. For starters, foreign keys are not particularly clumsy to deal with, unlike most of the other database validation techniques. But more importantly, it seems as if they belong in the database. A constraint which checks to see if a numeric field is non-negative or within a certain range is really application layer logic, and it seems right to check it in the application code. But foreign keys protect the integrity of the data’s relations to itself. Regardless of what your business logic is, having a dangling pointer (e.g. one table has an order_id=3 but there’s no order with id=3 in the orders table) is always invalid.
Migrations don’t directly support foreign keys, so many Rails apps are being built without them. And since the data is mostly managed by the relations defined on the model, this is probably more or less ok, at least on a small scale. I think any largeish app is still going to want to take the time to define foreign keys, as an extra layer of protection against data corruption.
But then, this exposes another place where SQL is simply not in touch with the times: polymorphism. The difficulty of mapping object-oriented models onto non-object-oriented SQL schema is a huge topic of its own that I won’t spend much time on here. But polymorphic table relations has been one of the biggest stumbling blocks of building complex SQL-backed applications, at least in my experience. Rails has a simple hack that works around this very nicely: track the relation via association_id and association_type. (I’ve done this in traditional SQL apps with association_id and association_table.) But this won’t work with foreign keys at all. So if you can get by without foreign keys on some of your relations, and that doesn’t cause a problem, what’s the big deal with leaving them off all your relations?
I’m still on the fence about Rails + foreign keys, but I’ll be keeping my eye on the matter. So far all I’ve witnessed from the lack of foreign keys are to be orphaned records when :dependent => true was forgotten on the relationship definition.
The guiding principle of modern agile development - and the (dare I say it) Web 2.0 phenomenon - is simple, simple, simple. Doing more with less. Complex problems, straightforward solutions. Streamlining the role of the database fits nicely with this theme. Coding in SQL sucks, but you shouldn’t be coding in SQL because that’s not what a database is good at. Databases are concerned only with storage and lookup, and when focused on that role, they can do it extraordinarily well.
December 27th, 2006 at 8:20 am
You need to take a look outside the Rails box, mate!
Web services are a nice abstraction if Bob’s team needs to access Alice’s app data, and there’s no common business supervision for both. But if both Alice’s and Bob’s teams are working for the same guys, why the extra insulation?
I already know my RDMS’ SQL dialect, or I can learn it. And databases are part of the infrastructure for a longer time than languages, as I see it.
I keep my function’s definition in a text file anyway if I use Rails. I roll my own migration support for each CREATE FUNCTION file (or VIEW, etc.), as I don’t have a framework for it. It’s basically a folder with numbered text files and some obvious shell commands.
December 27th, 2006 at 12:42 pm
i think youll find the idea that “databases arent important” and that they are just “simple object stores” is limited to the Ruby on Rails community. in the rest of the world, databases, SQL, and relational concepts are as important as ever. while some of the things you mention as being “bad form”, namely fine-grained permissions and validation, are the kinds of things that often work better in the application space rather than the database space, other concepts such as multi-app access are absolutely critical in the real world (maybe not for outside access, but within the organization the need to script against the tables/have multiple applications access is very common), views are incredibly useful (for abstracting away complex/legacy/highly tuned queries/often changing schemas), as well as what you call “transforms” which are lifesavers. designing your database around a particular tool that decides that those things are “bad” or “you shouldnt need them” just chops yourself off at the knees.
plus, you dont mention at all the supremely important relational concepts of joins and selection (i.e. building selects from selects) which are firmly in the SQL domain. Just because ActiveRecord takes the opinion that these things (such as composite primary keys) arent important doesnt make SQL irrelevant; ActiveRecord as part of Rails seeks to be “opinionated”, and that is just the “opinion” of one development platform. In reality it just prevents ActiveRecord from being taken seriously in a wider variety of organizations.
March 16th, 2007 at 8:34 am
I think the other commenters may be missing the point.
The opinion being expressed in using the database less is this: ‘the database and SQL isn’t such a great way to integrate between applications’. Neither is the filesystem. Or to take it further, neither is shared state, as application software developers are experiencing in moving with a more concurrent and parallel world.
To address a point Luis made: the reason to have the ‘extra insulation’ provided by a web application is because the constraints and invariants in the domain model are easier to enforce than in a database application. Maybe the storage layer level is just too low to express the same domain models that some are use DSLs to express at the application level.
This also may explain why MySQL is good enough (”worse is better”) for these applications, even though it is not as capable as other contemporary RDBMSs.
March 17th, 2007 at 4:08 am
I’ve (mis)interpreted Vincent’s comment as moving the “intelligence” out of the database and into a web app. Y’know, I can’t really disagree with this: I personally prefer to point several apps at the database and let her do the constraining — it ensures the “OS” can’t be bypassed, I guess.
But I can’t fault using the web app as the contraint enforcer; for me, it’s a matter of identifying the loophole and closing it.
Your thoughts?