Archive for the ‘SQL’ Category

Why Do We Still Use Relational Databases in an OO Era?

Tuesday, January 9th, 2007

Object-Relational Mappers (ORMs) have been a hot topic for, oh I don’t know, a decade now. Mapping an object-oriented data set onto a flat relational schema is unexpectedly difficult. Inheritence, polymorphism, and even simple deviations in type handling between SQL and your language of choice make ORMs a virtual minefield. Some have even said that ORMs are the Vietnam of computer science.

Really, though, it seems rather silly when you think about it. Why go to so very much trouble to map the data from an OO format and back again? Why not just keep the objects in object format all the time?

It’s been done. But it doesn’t work terribly well, and certainly doesn’t have anywhere near the robustness necessary for large and long-lifetime applications. Why not?

As it turns out, the limitations of a relational database are the very thing that make it so robust over the long term. There’s benefit to be gained by forcing the data to be turned into a simple, transparent, structured format every time you save it. Or indeed, to force an explicit saving step at all, instead of just persisting objects flagged to do so.

ZODB and its ilk are hauntingly similar to a core dump: easy to save and load, but fragile over any significant timespan. There’s no enforced strictness to data cleanliness when you can just dump everything out effortlessly.

Another case of embracing constraints? Maybe. I’d like to think that someday this longstanding dilemma will be resolved. Until then, thinking about ORMs this way will make me feel better about using them.

Databases Are Less Important Now

Wednesday, December 27th, 2006

The database has been a centerpiece of my work in application development for the better part of a decade. But now, Rails and other comparable frameworks are reshaping the way that I and others look at databases. A lot of this is stepping down the importance of the database’s role, which can take some getting used to.

The database’s role is now limited to:

  • Storage
  • Indexing - i.e. fast lookups
  • Concurrency - i.e. multiuser access, locking, transactions

There are many things that I’ve used the database for in the past that are now considered bad form:
(more…)

Massaging Data

Tuesday, August 22nd, 2006


Update: It seems that this behavior is no longer exhibited in the current version of Rails (1.1.6), as each test is wrapped in a transaction. So this entire post is pretty much moot now.

Rails’ tests (unit, functional, and integration) use a database with read-only values. Anything which modifies the database is not saved. This allows the tests to run in isolation from each other, which is a Good Thing.

In the real world, however, I often find myself needing to write methods which go and massage a bunch of data in the database. A simple example would be a nightly cron job which creates invoices for accounts whose billing cycle are due.

In Rails this is done by creating a static method like Account.create_invoices, and invoking it from the crontab via “script/runner -e production Account.create_invoices”. Since it is a method, it can be tested in a unit or integration test. (I think the latter is more appropriate, since a high-level method like this often touches a number of models.)

But this method’s main “output” is not a return value, but rather an adjustment (or lots of them) to the database. For example, I may want to go find every account with an open billing item, create an invoice in the invoices table, and then mark the item closed. What I really want to test when this method is done is that there are some number of new invoices in the invoices table, that the billing items have been marked closed, and that the account has no open billing items. But you can’t check most of this output, because it exists as changes to the database, not direct return values.

My impression is that the Rails way views this as a Bad Thing. Every method should return its results, rather than going in and massaging a bunch of data in the database. This makes it more orthogonal and easier to test.

I agree with this philosophy in theory, I’m not sure that this is realistic for real-world applications. I’m like to think that that’s because I’m still trapped in the SQL paradigm, so maybe someone can enlighten me on the pure Rails way to do this.

Here’s a concrete example:

class Account < ActiveRecord::Base
        has_many :invoices
        has_many :billable_items

        def open_billable_items
                items = []
                billable_items.each do |item|
                        items << item if item.open?
                end
                return items
        end

        def create_invoice
                open_billable_items.each do |item|
                        item.close   # creates the invoice and marks the item closed in the db
                end
        end

        def self.create_all_invoices
                Account.find(:all).each do |account|
                        account.create_invoice
                end
        end
end

class BillingTest < ActionController::IntegrationTest
        fixtures :accounts

        def test_create_all_invoices
                assert_equal 5, accounts(:first).open_billable_items.length
                assert_equal 0, Invoice.count

                Account.create_all_invoices

                assert_equal 0, accounts(:first).open_billable_items.length
                assert_equal 5, Invoice.count
        end
end

Maybe the right thing to do here is not to test at such a high level, but instead test only BillableItem.close by having it return the created invoice. This bothers me though. It needs to work at the high level, so why can’t I test that?

And this is a very simple example. In reality, the nightly cron job may be touching dozens of tables and thousands or even millions of rows. Returning all affected rows as a result doesn’t make much sense, and may be completely impossible due to memory limitations. (The whole reason we use a database is so that we can operate on large sets of data without having to instantiate every record at once!)