Archive for the ‘Ruby’ Category

Comet with Rails + Mongrel

Tuesday, May 8th, 2007

In my last post I described how to create a mongrel handler. I said you might want to do this for optimization purposes, but my own interest came about in an attempt to solve the server-push problem with Rails.

Comet is the term that seems to be catching on for server-push via XMLHttpRequest. Possible applications include chat clients or a stock ticker. Anything that wants constant updates will be both more responsive and less demanding of server resources if it waits for data to be pushed to it, instead of opening a new status query every few seconds.

Since the server can’t initiate a connection to the user’s browser, the only possible solution is to have the browser hold a connection open indefinitely, waiting for an update. Since Rails is single-threaded, however, this means that one whole server instance would be tied up by this connection - clearly infeasible in almost all situations.

You might say, “Why not have another small server listening on a separate port to hold on to these push-status connections?” Good idea - except that XMLHttpRequest won’t let you connect to another port. This is because the port is considered part of the origin (scheme, hostname, and port together), and connecting to a different origin from within the javascript sandbox would be a big security no-no. (It would be trivially easy, for example, to inject a little javascript into a site which caused all of its visitors’ browsers to start hammering another unrelated site as soon as they visited the homepage.)
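To make the rule concrete, here is a minimal Ruby sketch of the comparison involved. It is only an illustration, not what any browser actually runs: same_origin? is a hypothetical helper that compares scheme, host, and port using Ruby's standard URI class.

```ruby
require 'uri'

# Hypothetical helper: two URLs share an origin only when scheme,
# host, and port all match.
def same_origin?(url_a, url_b)
  a = URI.parse(url_a)
  b = URI.parse(url_b)
  [a.scheme, a.host, a.port] == [b.scheme, b.host, b.port]
end

same_origin?("http://localhost:3000/", "http://localhost:3000/status/1")  # => true
same_origin?("http://localhost:3000/", "http://localhost:3001/status/1")  # => false
```

A page served from port 3000 therefore can't make an XMLHttpRequest to a helper server on port 3001, even on the same machine.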

Juggernaut gets around this with a little hidden Flash component. This is a nifty idea, but for me it is unappealing because Flash is not readily available for my platform (Ubuntu AMD64). More importantly, I’d prefer to avoid building technology that depends on a proprietary plugin built by a monolithic, old-fashioned (i.e., shrink wrap) software company.

So holding open connections to Rails won’t work due to its controller lock. But as was demonstrated in the previous entry, a mongrel handler won’t have that problem. I’ll extend the auction example shown there to use server-push.


require 'active_record'

class StatusHandler < Mongrel::HttpHandler
   def process(request, response)
      id = request.params['PATH_INFO'].slice(1, 20)
      current = request.params['QUERY_STRING']

      while status(id) == current do
         sleep 0.2
      end

      response.start(200) do |head, out|
         head["Content-Type"] = "text/html"
         out.write status(id)
      end
   end

   def status(id)
      connection.select_value("select status from auctions where id=#{id.to_i}")
   end

   def connection
      ActiveRecord::Base.connection
   end
end

uri "/status", :handler => StatusHandler.new, :in_front => true

This assumes your auctions table has a field named “status,” which I’m using as an integer, but any type should work. http://localhost:3000/status/1 now delivers just one value, the status. Where it gets interesting is something like http://localhost:3000/status/1?100, assuming that the status of auction id=1 is currently set to 100 in the database. Now, the connection will hang and wait for the value to change. (You’ll see the database queries in development.log, but no web hits.) Pop open a sql shell and run “update auctions set status=101” and the connection will resolve immediately, printing out the new value.
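One thing the loop in the handler does not do is give up: if the status never changes, it keeps polling. Here is a minimal sketch of the same wait with an upper bound, assuming a 30 second limit is acceptable. wait_for_change and its timeout and interval parameters are hypothetical names, and the block stands in for the status(id) query.

```ruby
# Poll the block until it returns something other than `current`,
# or until `timeout` seconds have passed. Returns the latest value
# either way, so the caller can always write a response.
def wait_for_change(current, timeout: 30, interval: 0.2)
  deadline = Time.now + timeout
  value = yield
  while value == current && Time.now < deadline
    sleep interval
    value = yield
  end
  value
end

# Inside the handler this might be used as (assumption, untested with Mongrel):
#   latest = wait_for_change(current) { status(id) }
```

On a timeout the client simply gets back the value it already had, and can reconnect.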

Here’s a simple example of making an ajax call to this url from within a page:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
   <%= javascript_include_tag :defaults %>
</head>
<body>
   Status is now: <span id="status"></span>

   <script language="javascript">
      function respawn()
      {
         new Ajax.Updater('status', '/status/1?' + $('status').textContent, { onComplete: respawn });
      }

      respawn();
   </script>
</body>
</html>

Experiment with updating the status value in the sql shell and you’ll see that the page always updates instantly. To watch the connections, open the Firebug console, click Options in the upper-right, and make sure “Show XMLHttpRequests” is checked. Reload the page and you’ll now see a POST each time you update the status. There will always be an active one at the bottom, waiting, waiting for the status update.

And there you go. Server-push connections with only Rails and Mongrel.

Update: Mere minutes after I finished writing this article, I came across Shooting Star, a Rails plugin for adding Comet to your apps. So far this looks a little heavy-weight for my purposes, and somewhat platform-dependent - not to mention that they push the meteor metaphor a bit far in their method naming. Still, this may be a more robust solution than my little hack, so check it out. If anyone has tried Juggernaut, Shooting Star, and my hack, I’d be curious to hear a comparison.

HOWTO: Custom Mongrel Handlers

Sunday, May 6th, 2007

There doesn’t seem to be any good documentation for creating custom Mongrel handlers. The Mongrel site seems to be completely silent on the subject. I was able to extract what I needed to know from this article and Ezra Zygmuntowicz’s slides.

It’s actually quite easy to make a mongrel handler, so here’s a quick tutorial that should tell you everything you need to know.

First, why would you want to make a mongrel handler? The main reason would be speed and scalability. Mongrel is multithreaded even though Rails isn’t. Mongrel is fast, Rails is slow. Of course, Rails is Rails, and Mongrel is just a webserver; but many types of apps may have some services that are hit far more than others, and thus it would be worth writing what amounts to a stripped-down version of the controller action in order to handle just those requests.

For example, if you were writing eBay in Rails, you might want to implement the API calls which are used to check the status of an auction as a custom mongrel handler. This part of the site may be accessed with great frequency by third party apps trying to keep a current status, and chances are generating the results is pretty simple (converting a row in the database to XML and printing it out).

Here’s what that might look like:


class StatusHandler < Mongrel::HttpHandler
   def initialize
      @mutex = Mutex.new
   end

   def process(request, response)
      id = request.params['PATH_INFO'].slice(1, 20)  # trim leading slash

      response.start(200) do |head, out|
         head["Content-Type"] = "application/xml"
         out.write status(id).to_xml
      end
   end

   def status(id)
      rows = @mutex.synchronize { ActiveRecord::Base.connection.select_all("select * from auctions where id=#{id.to_i}") }
      return { 'error' => 'No such record' } if rows.length < 1
      return rows.first
   end
end

uri "/status", :handler => StatusHandler.new, :in_front => true

Name this status_handler.rb and drop it into the root dir of your Rails project. Instead of running script/server, execute this command:


mongrel_rails start -S status_handler.rb

Assuming you’ve set up a little database with some sample data in the table named by the handler (“auctions” in my example), accessing the url http://localhost:3000/status/1 will show the data for the record with id=1.

Now what’s so hot about this? For one, it’s fast - see Ezra’s slides for benchmarks. But more important, to my mind, is that a long request won’t hold up any others. Try putting a “sleep 10” as the first line of the process method. Restart your server and hit the status url again. The connection will hang temporarily, but now open another tab and hit any other page in your Rails app. Notice that it displays right away, even though the other tab is still loading.
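The effect is easy to reproduce outside Mongrel. This is a plain-Ruby sketch, not Mongrel's actual code: two threads stand in for two requests, the slow one sleeps (0.3 seconds here rather than 10), and completion_order is a hypothetical helper that reports which one finished first.

```ruby
require 'thread'

# Runs a slow "request" and a fast one on separate threads and
# returns the order in which they finished.
def completion_order
  finished = Queue.new   # thread-safe queue recording completion order

  slow = Thread.new { sleep 0.3; finished << :slow }   # the sleeping request
  fast = Thread.new { finished << :fast }              # any other request

  [slow, fast].each(&:join)
  [finished.pop, finished.pop]
end

completion_order   # => [:fast, :slow]
```

The fast thread finishes while the slow one is still sleeping, which is the behavior you see in the second browser tab.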

The downside is that you don’t have Rails, and as it turns out, we like Rails. So suddenly you’re stuck doing a lot of your own dirty work. Here, for example, I load up ActiveRecord and manage the database connection and raw sql manipulation. (This whole thing could be done in one line as a Rails controller: respond_to { |f| f.xml { Auction.find(params[:id]).to_xml } }) Parsing the request string can be time-consuming so I went for simplicity - String#slice instead of a regular expression or tokenization. You also have to protect against CGI parameter attacks, which I again simplify with a to_i.
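To see why slice plus to_i is enough of a guard here, this small sketch runs the same two calls on a few paths. auction_id is a hypothetical helper name; the handler performs the same steps in its process and status methods.

```ruby
# Hypothetical helper mirroring the handler: drop the leading slash,
# cap the length at 20 characters, and coerce to an integer.
def auction_id(path_info)
  path_info.to_s.slice(1, 20).to_i
end

auction_id("/1")                        # => 1
auction_id("/1; drop table auctions")   # => 1 (to_i stops at the first non-digit)
auction_id("/abc")                      # => 0
```

Whatever the client sends, only an integer ever reaches the SQL string.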

I wasn’t able to figure out how to load a custom handler from a mongrel yaml config file. It seems like the keyword should be config_script, but it doesn’t seem to produce the same result as -S. If anyone knows how to make this work, please comment.

Now that you know how to write a mongrel handler, the real fun can begin. In my next post I’ll describe how this can be used for server-push connections.

Update: Rick Olson improved the code by trimming out the superfluous establish_connection; I’ve used his version above.

Working With Rails

Tuesday, May 1st, 2007

I’ve been aware of Working With Rails for a while, and have even connected with a few developers who have done work for Bitscribe through it. Cool concept, definitely. Just recently I noticed that someone had recommended me (one of the contributors to Gyre - thanks, Michael!) so I decided to make a real profile. Interesting to note that, with one recommendation, my popularity is 89%. Guess there are a lot of empty records up there.

The profile form asks how long you’ve been working with Ruby, and Rails. For the former I looked at the datestamp of my first fooling-around Ruby script. For the latter I hit the svn log of my first Rails project. I was surprised to see that I’ve been doing Rails for exactly a year this month, with my Ruby tinkerings predating that by about six months. Woah, really? Ruby and Rails feel so comfortable now that it seems like years since I’ve used anything else. It’s gratifying that time seems to stretch out during periods of rapid change - I guess time doesn’t always fly when you’re having fun.

Rails / Ubuntu Feisty Quickstart

Saturday, April 21st, 2007

If you’ve just installed a fresh copy of Ubuntu 7.04 (Feisty Fawn), the following sequence of commands will give you everything you need to run Rails with MySQL or Postgres and Mongrel. This should be run as root (“sudo su -” will get you a root shell).

First, core packages through apt-get:


apt-get install ruby rubygems rake ruby1.8-dev irb rdoc libopenssl-ruby1.8 postgresql-8.2 libpgsql-ruby libmysql-ruby1.8 mysql-server gcc libc6-dev make subversion openssh-server

And your gems:


gem install -y rails mongrel --no-rdoc --no-ri

If it prompts you for which version of Mongrel (or other gems) to install, the first one on the list (type “1” and press enter) is almost always right, unless it reads “win32”, in which case pick the first one that says “ruby”. (This silliness is definitely a major weak point of the gem package manager. I’ve created a patch that fixes this issue, which is being studiously ignored by the rubygems maintainers.)

Now, enable mod_rewrite and mod_proxy in your Apache modules (the latter is only necessary if you plan to proxy mongrel, but might as well have it):


a2enmod rewrite
a2enmod proxy
/etc/init.d/apache2 restart

For bonus points, you might want to install a few other useful developer tools:


apt-get install php5 php5-cli php5-pgsql php5-mysql vim-gtk vim-ruby

Getting Chop-Happy with Axeman

Friday, March 16th, 2007

An important but oft-overlooked principle of software design is the aggressive culling of unused features. The best software products are slim and lean, with exactly the features their users need and few that they don’t. This is like weeding and pruning in your garden: without it, you’ll eventually be overrun.

The types of apps that I most commonly work on are internal applications used by perhaps a few hundred users in a single organization, or across several organizations. You’d think that with such a narrow audience, it would be easy to get information about what’s being used and what isn’t. Not as easy as it seems, though, because the users are not good at analyzing their own use. If you ask them whether a particular report is used, for example, they’ll make vague noises like “Oh yeah I clicked on that once” or “Huh I hadn’t seen that before, but I’ll definitely use it now that I know about it.” In most cases this stuff is not true, but it’s very hard to tell.

Historically my approach to this has been to cut a feature I think is unused and wait for someone to complain. This works well enough because my intuition is right 90% of the time; but the 10% of the time that it is wrong, I can end up with cranky users. (In doing experimental cutting like this, I usually remove the link a few days prior to deleting the connected code. That makes it a cinch to put it back in when necessary.)

A burst of inspiration hit me the other evening. We don’t need to ask the users: the application should be able to track this! To this end, I’ve created Axeman. This is a tiny Rails plugin that tracks usage in a SQLite database, and displays a simple report with the results constrained by time. Screenshot:

In the left column is a traffic report, comparable to a web log analyzer like the venerable AWStats. The Axeman report is way simpler and doesn’t have any fancy graphs, though, so this isn’t too exciting. Besides, you can install AWStats or whatever to get this info about your Rails app. Where it gets more interesting is the right column.

Here we see controller actions that have not been accessed during the selected time period. These are determined by analyzing the source of your app/controllers directory, and cross-referencing it against the usage data.
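The cross-referencing step can be sketched in a few lines. This is not Axeman's actual code, just an illustration of the idea under simple assumptions: defined maps controller names to their public action names, hits maps "controller#action" to a hit count, and unused_actions is a hypothetical name.

```ruby
# Returns every defined action that has no recorded hits.
def unused_actions(defined, hits)
  unused = []
  defined.each do |controller, actions|
    actions.each do |action|
      key = "#{controller}##{action}"
      unused << key if hits.fetch(key, 0) == 0
    end
  end
  unused
end

# Made-up sample data for illustration only.
defined = { "books" => %w[list show destroy sort_order], "authors" => %w[index list] }
hits    = { "books#list" => 120, "books#show" => 45, "authors#list" => 30 }

unused_actions(defined, hits)
# => ["books#destroy", "books#sort_order", "authors#index"]
```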

As an aside, I think it’s interesting that this is only possible because of the structured and convention-based nature of modern application frameworks. Axeman is a very simple example, but I am hoping that as time progresses, we will see more self-aware / self-introspecting application components.

What does it mean when actions appear in the righthand column? Let’s look at the example shown in the screenshot. This is a tiny app and I didn’t expect there to be many dark corners, but as it turns out there’s quite a bit of dead code. First, there’s a bunch of account signup stuff which is unused - this was created by the generator for the login engine. It’s not used, so axe it.

Next, we see that categories and authors both have index actions which are never accessed. Looking at the code I see that these are just redirects to the list action. However, the list action seems to be linked directly, since that one appears in the lefthand column. No need for them then: the axe claims two more victims.

Books has a few unused actions. sort_order is a vestigial remnant of an ajax feature which is no longer used; it goes under the axe. destroy is a working method, but not linked anywhere; most likely created as part of CRUD, but then whoever did the UI didn’t feel that it was needed. We could link it, I suppose, but why bother? If the app has gone this long without anyone complaining that they can’t delete books, then there seems little need to maintain the code that implements the feature. Chop, chop. Last, it seems that there is some confusion about new, edit, and new_and_edit methods on the Books controller. Looking at the code I see that new_and_edit is called by both new and edit, but is never accessed directly by the user’s browser. Therefore it should be made private (Axeman ignores private and protected controller methods). With all of these changes, the Books controller is quite a bit cleaner.

Also on the executioner’s block should be methods with low hit counts, that is, ones that appear at the very bottom of the lefthand column. This requires more knowledge of the user story for each page than do completely unaccessed pages. For example, you could have a page which displays some tax information which is only accessed once a year by one person in the organization. Therefore a low hit count should be expected, and the page should not face the axe. But most other kinds of pages should probably be removed if they haven’t been accessed frequently. The default time period is 3 months, which I think is about the timespan in which you’d expect something to be accessed at least a few dozen times. If it’s only got one or two clicks, chances are good this was just someone who hit the wrong link, or perhaps was just curious. Truly useful pages will have hundreds or thousands of views, depending on the size of your user base.

What about the idea that a page does offer useful features, but people don’t know about it? If you think that this is why it’s unused, then you need to find a better place to link it, or a better way to educate your users. The bottom line is that it doesn’t matter how theoretically useful a page is: if no one is using it, then it is not actually useful.

And keep in mind that this tool (and in fact, the entire concept of aggressive feature culling) is most effective not as a one-time event, but as a habit over time. A page which might have been extremely popular last year could fall into disuse when another page is added which provides similar but slightly improved functionality.

This plugin was the result of just an hour or two of hacking, but I’ve already been surprised at how useful it has been in my production apps. New ideas are suggesting themselves as I use it, including watching for unused partials, showing changes over time in a visual fashion, or even trying to look for unused model methods. For this last item, it’s been my experience that over time, models start to bristle with methods, many of which are remnants of historic functionality and no longer used, though this will be by far the hardest one to implement.

Another feature that I’ll try to add soon is a logfile analyzer which scans production.log in a manner similar to how AWStats processes Apache’s access.log. This will allow the importation of historic data, and will also make Axeman more suitable for use on high-traffic, public-facing sites, where hitting an external SQLite database on each pageview may not be acceptable.

A subtle but powerful point that is driven home by the usefulness of this plugin is just how much design is an evolutionary process, not a one-time occurrence. Of course I know this, as do most of us, but I’m finding that Axeman makes it tangible. Here’s a piece of code which exists for no other reason than to help the application’s design change over time. The only other component I can think of that really acknowledges this is migrations, but these are more at the underlying technical level, rather than at the level of user-facing features.

Gyre, the Open Source, Web-Based Debugger for Rails

Friday, February 2nd, 2007

Ruby people don’t need a debugger, because our language is so incredibly elegant that we never create any bugs, right?

Er, maybe not. It does seem like my code tends to work right the first time I run it a lot more with Rails, but this only causes me to tackle yet more complex problems. The tools offered by a traditional IDE environment, though such things are generally scorned by Rails types (and this is a vim user talking here), can be quite effective in certain situations.

Thus I’ve created Gyre, a visual debugger for Rails that runs in a web browser. It’s still very raw - in fine open source tradition, I’m releasing early, releasing often. But it does work, and I’ve already found it to be of use in tracking down gnarly issues in some of my own apps.

Databases Are Less Important Now

Wednesday, December 27th, 2006

The database has been a centerpiece of my work in application development for the better part of a decade. But now, Rails and other comparable frameworks are reshaping the way that I and others look at databases. A lot of this is stepping down the importance of the database’s role, which can take some getting used to.

The database’s role is now limited to:

  • Storage
  • Indexing - i.e. fast lookups
  • Concurrency - i.e. multiuser access, locking, transactions

There are many things that I’ve used the database for in the past that are now considered bad form:
(more…)

Rails Change Logger

Saturday, December 9th, 2006

Audit trails are extraordinarily useful in enterprise applications. When you’ve got dozens or hundreds of users doing CRUD operations on a large database full of data, sooner or later someone is going to come across a record that has been changed, and want to know: who did this, and when? (And maybe: why? - but they can find that out if they know the first two items, namely by going and asking the person who made the change.)

I recently put together a small module for storing change logs for one of our larger Rails apps. Here’s the code:

lib/change_logger.rb


module ChangeLogger
   def self.included(other_mod)
      other_mod.module_eval do
         has_many :change_logs, :as => :record, :order => 'created_at'

         def log_change(verb)
            log = ChangeLog.new
            log.record = self
            log.verb = verb
            log.user = ChangeLogger.user
            log.save!
         end

         after_create :log_create
         def log_create
            log_change('created')
         end

         after_update :log_update
         def log_update
            log_change('updated')
         end
      end
   end

   def ChangeLogger.user=(user)
      $changelog_user = user
   end
   def ChangeLogger.user
      $changelog_user
   end
end


(more…)

Zero, Blank, and Nil

Tuesday, November 14th, 2006

Back in the bad old days of web development, I used JSP to create page content. Java is a terrible language for doing front-end web work, because it’s bad at processing text, which usually comprises the majority of what web apps do. (And that’s also why Perl was the reigning king in this realm for so long.)

One of the things that drove me absolutely batty about JSP was the need to check every string for both null and blank, which meant your code was littered with stuff like:


if (request.getParameter("quantity") != null && !request.getParameter("quantity").equals(""))
{
   int quantity = Integer.parseInt(request.getParameter("quantity"));
   doSomething(quantity);
}

Yikes. When I moved on to PHP, I was immediately struck by the improved readability of the same code:


if (strlen($quantity) > 0)
   doSomething($quantity);

An empty string and a null will both give you zero on a string length. (You can always use === or isset() if you want to check for null explicitly.) A similar guard clause is often used for integers and arrays, too. In Java you have to check both whether a value is null, and whether it is zero or empty, respectively. In PHP you can just do:

if ($quantity != 0)

and

if (!empty($array))

Now that I’m living in Ruby land, I’ve found that my code is generally much prettier than PHP for most things… except this one. Ruby is a bit more strongly typed than PHP, so I’m back to the JSP syntax:

if params[:value] != nil and !params[:value].empty?

Hardly the succinct readability one has come to expect.

Ruby’s base classes provide some nice convenience functions to make your code more readable: Array.empty?, String.blank?, and Fixnum.zero?. The solution here seems obvious: make all of these methods exist on NilClass.

Thanks to Ruby’s metaprogramming, it’s easy to add this to your own application:


class NilClass
   def empty?; true; end
   def blank?; true; end
   def zero?; true; end
end

Great! Now we can just do:

if params[:value].empty?

Furthermore, I would tend to think that all of these should be collapsed together into a single method. I think empty? is the most applicable. So empty? means a blank string, a zero, or an empty array or hash. This reflects what the code really wants to know: is there anything in this variable? Should I bother doing anything with it? 95% of the time, the difference between nil and non-nil but empty is irrelevant.

Update: aidanf suggests using .blank? instead, which I like a bit better than the names suggested above.

Update 2: A coworker informed me that Rails extends most of the basic Ruby types with blank?. So you can do [].blank?, {}.blank?, nil.blank?, 5.blank?, etc. Apparently this is pretty unknown, because not one person mentioned it in the many dozens of comments on this post appearing on various news aggregators. Note that 0.blank? does return false, but in most cases I’m more interested in strings anyway.
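For code that runs outside Rails, the same idea is easy to approximate in plain Ruby without reopening NilClass. The blank_value? helper below is a hypothetical stand-in, not Rails' implementation: it treats nil, empty or whitespace-only strings, and empty collections as blank, and leaves numbers (including 0) as non-blank, in line with the 0.blank? behavior noted above.

```ruby
# Hypothetical stand-in for Rails' blank?, usable without Rails.
def blank_value?(value)
  return true if value.nil?
  return value.strip.empty? if value.is_a?(String)
  return value.empty? if value.respond_to?(:empty?)
  false
end

blank_value?(nil)    # => true
blank_value?("  ")   # => true
blank_value?([])     # => true
blank_value?(0)      # => false
blank_value?("5")    # => false
```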

Massaging Data

Tuesday, August 22nd, 2006


Update: It seems that this behavior is no longer exhibited in the current version of Rails (1.1.6), as each test is wrapped in a transaction. So this entire post is pretty much moot now.

Rails’ tests (unit, functional, and integration) use a database with read-only values. Anything which modifies the database is not saved. This allows the tests to run in isolation from each other, which is a Good Thing.

In the real world, however, I often find myself needing to write methods which go and massage a bunch of data in the database. A simple example would be a nightly cron job which creates invoices for accounts whose billing cycles are due.

In Rails this is done by creating a static method like Account.create_invoices, and invoking it from the crontab via “script/runner -e production Account.create_invoices”. Since it is a method, it can be tested in a unit or integration test. (I think the latter is more appropriate, since a high-level method like this often touches a number of models.)

But this method’s main “output” is not a return value, but rather an adjustment (or lots of them) to the database. For example, I may want to go find every account with an open billing item, create an invoice in the invoices table, and then mark the item closed. What I really want to test when this method is done is that there are some number of new invoices in the invoices table, that the billing items have been marked closed, and that the account has no open billing items. But you can’t check most of this output, because it exists as changes to the database, not direct return values.

My impression is that the Rails way views this as a Bad Thing. Every method should return its results, rather than going in and massaging a bunch of data in the database. This makes it more orthogonal and easier to test.

I agree with this philosophy in theory, but I’m not sure that this is realistic for real-world applications. I’d like to think that that’s because I’m still trapped in the SQL paradigm, so maybe someone can enlighten me on the pure Rails way to do this.

Here’s a concrete example:

class Account < ActiveRecord::Base
        has_many :invoices
        has_many :billable_items

        def open_billable_items
                items = []
                billable_items.each do |item|
                        items << item if item.open?
                end
                return items
        end

        def create_invoice
                open_billable_items.each do |item|
                        item.close   # creates the invoice and marks the item closed in the db
                end
        end

        def self.create_all_invoices
                Account.find(:all).each do |account|
                        account.create_invoice
                end
        end
end

class BillingTest < ActionController::IntegrationTest
        fixtures :accounts

        def test_create_all_invoices
                assert_equal 5, accounts(:first).open_billable_items.length
                assert_equal 0, Invoice.count

                Account.create_all_invoices

                assert_equal 0, accounts(:first).open_billable_items.length
                assert_equal 5, Invoice.count
        end
end

Maybe the right thing to do here is not to test at such a high level, but instead test only BillableItem.close by having it return the created invoice. This bothers me though. It needs to work at the high level, so why can’t I test that?

And this is a very simple example. In reality, the nightly cron job may be touching dozens of tables and thousands or even millions of rows. Returning all affected rows as a result doesn’t make much sense, and may be completely impossible due to memory limitations. (The whole reason we use a database is so that we can operate on large sets of data without having to instantiate every record at once!)
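One middle ground, offered as a sketch rather than as the Rails way: have the high-level method return a small summary (here, a count) instead of the affected rows, so a test has something to assert on without loading every record. StubItem, StubAccount, and the free-standing create_all_invoices below are plain-Ruby stand-ins invented for illustration; no database is involved.

```ruby
# Plain-Ruby stand-in for a billable item (hypothetical).
StubItem = Struct.new(:state) do
  def open?
    state == :open
  end

  def close
    self.state = :closed
  end
end

# Plain-Ruby stand-in for an account (hypothetical).
StubAccount = Struct.new(:items) do
  # Closes every open item and returns how many were closed.
  def create_invoice
    open_items = items.select(&:open?)
    open_items.each(&:close)
    open_items.length
  end
end

# Returns the total number of items closed, not the items themselves.
def create_all_invoices(accounts)
  accounts.inject(0) { |total, account| total + account.create_invoice }
end
```

A test can then assert on the returned count, and a second run returning 0 confirms nothing was left open.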