Archive for the ‘Code’ Category

Go Ahead, Do The Big Rewrite

Thursday, January 11th, 2007

Many experienced developers caution against the Big Rewrite. Perhaps the most famous of these is Joel’s adamant proclamation that the Mozilla team’s decision to rework Netscape from the ground up was “the single worst mistake” that a software team could make.

Although I agree with Chad’s arguments and to a lesser degree with Joel’s, I can’t help but notice: what Joel called a huge mistake turned into Firefox, which is the best thing that ever happened to the web, maybe even the best thing that’s ever happened to software in general. Some “mistake.”

Ok, ok. So most of the time the big rewrite is a bad idea. Mozilla got lucky, or tried really hard, or it doesn’t count if you’re an open source project, or something.

Except that I’ve done several big rewrites of commercial applications in my software career, and every time it has gone very smoothly. In all cases the end users have been pleased, and I and the other developers find ourselves saying “We should have done this ages ago!”

In fact, Bitscribe is finishing up one of these right now. The application being rewritten is one of those sprawling enterprise apps with countless nooks and crannies of functionality. Written in a blend of PHP, Perl, PL/pgSQL stored procedures, and C++ (GTK 1.0, baby!), it dates back to right around the turn of the millennium. So not utterly ancient, but plenty has changed in the software world since then. The rewritten application is 100% Rails.

Everyone involved with the project just couldn’t be more pleased, from what I’ve seen. It’s been a lot of hard work, sure, but the benefits are massive. Developers are energized and excited to be working with the latest tools, and throwing out years of accumulated developer debt. (The legacy app had been maintained by a variety of different companies over the years, so you can imagine what the code looked like.) The client’s staff is loving the shiny ajaxified interfaces. And the client’s management (you know, the people who sign the checks) are seeing a whole new world of software unfold before them, and seem very aware of the efficiency improvements that will come with it.

There was a benefit that we didn’t anticipate, and it’s turned out to be one of the most useful. In the process of rewriting we’re not just porting code; we’re rethinking the design. The technical design of course, but also application design. It turns out that the wisdom the clients had gained from years of production use of the legacy application pointed to many insights on a streamlined design. We’re able to provide the majority of the functionality of the original app with a fraction of the complexity.

And not just code complexity, but user interface complexity. When all is said and done, the client’s training costs for new staff will be greatly reduced; there’s much less likelihood of entry error; and the application matches their business process more closely. Et cetera.

So I’ve never had a rewrite go badly. But maybe that’s because I’m inherently cautious and skeptical about rewrites. I lean toward waiting a bit too long, rather than doing it a bit too soon. So then by the time we go to do it, it’s really overdue.

Raganwald says “And you’ll need to be 100% sure your team has the horsepower to get the job done and is going to use a process that can handle the load.” Maybe that’s why it’s gone well for us; Bitscribe has massive developer horsepower, and massive focus on process. And even for us, there’s been lots of strain on the team doing the rewrite.

I’ll admit that there’s a unique kind of pressure that you get in this situation. It’s not like the ordinary pressure to meet software deadlines. I think it’s that sense that you can’t turn back. You don’t want to fix bugs or add features to the legacy code, but the client needs those changes so that they can do business. So then you have to press on ahead and try to get the new version workable as quickly as you can. And as Chad points out in his essay, there are seemingly endless little bits of functionality in the original app that need to be provided in the new system. It’s not a kind of pressure I’d be willing to bear most of the time, but it can be worth it when an app is really due for a bottom-up overhaul.

And let’s face it, you’ve gotta rewrite sometime - no software system can run forever, and the cost of running even a marginally outdated app can be tremendous. I guess the trick is just picking the right time to do it - and then having a healthy fear of just how big a job it really is. The moment you think it’s going to be easy, that’s the moment you’re digging your own grave. Be scared of what you’re facing and you may just have a chance.

Cross Pollination

Tuesday, September 19th, 2006

I get bored easily. As such, I always want to move around to different projects to keep myself challenged, break up the routine, and get fresh perspectives. Luckily my current line of work, consulting, gives plenty of opportunity for this sort of intellectual wanderlust.

One surprising side-effect of my thirst for new projects is that it seems to result in higher quality output, especially in terms of code maintainability. I’m going to call this effect cross pollination. By moving developers around within the team to work on different components, no area of the code becomes the exclusive domain of just one person. This in turn results in healthier code, because different people are examining and trying to understand any given module. They ask questions like “Why does this bit of code work this way?” - a question very rarely asked of code one is familiar with.

If something is hard to understand or change right now, it’s probably going to be fragile over the long run. (It’s like the old technique for programmer job security: “If it was hard to write, it should be hard to understand and even harder to modify.”)

This article advocates a common approach to this problem: get programmers to train other programmers on the quirks of the codebase. I see this as little more than a band-aid. The real solution here is not to let software turn into quirky, only-one-or-two-people-know-how-it-works code. Cross-pollination of developers ensures that people are always writing code that is understandable to any other developer. The important thing is not having a certain number of developers who know the code; it’s making sure that you’re getting new eyes on the code from time to time.

A truly well-written piece of software can be easily understood and modified by any competent developer familiar with the underlying tools. They need not “know” the code. Reading it - and especially reviewing the unit tests - will give them all the information needed to fix bugs or add features.

The Future of Source Navigation

Thursday, September 7th, 2006

I’ve been using TextMate on my Mac for a while now, trying to train myself back into using (gasp!) the arrow keys for navigating a text document. After a decade of using the mighty vim to edit source, emails, and all other manner of text, it’s not easy to make the transition, even though TextMate is very easy to use and learn.

Ultimately TextMate is not as powerful for editing text as vim - nothing is, except probably emacs. But vim is a very focused piece of software (in the fine UNIX tradition of small, sharp tools). It absolutely cannot be beat for actually editing text, especially code. But for anything beyond that scope, forget it. For example, something as simple as search-and-replace through multiple files is not included. Programmers used to development in IDEs would probably be shocked at this apparent omission.

So TextMate can’t compete on editing text, but it makes up the difference elsewhere, by providing a broader set of tools. The most immediately useful feature I’ve discovered is Apple-T, which allows you to open a file anywhere in your project tree by typing an abbreviation, learned by the editor based on your historic habits. For projects with many files and especially a multi-level directory hierarchy, this is a much faster method of getting to individual files, especially those you access frequently, than traditional approaches.

But using it has left me feeling that this is only the tip of the iceberg: editors could go much further with the same idea. For example, in the Rails directory layout, you have many files named index.rhtml. Each is in a subdirectory named for its component, e.g. views/account/index.rhtml or views/product/index.rhtml. But TextMate searches on the filename only, not the directory, so you’re reduced to using the arrow keys to scroll through the list of index.rhtml entries.
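To make the complaint concrete, here is a minimal sketch of abbreviation matching that looks at the whole path instead of just the filename. It is not how TextMate works internally; the function names and the shortest-path-first ranking are my own assumptions, and the "learned from your habits" part is left out entirely.

```python
def is_subsequence(abbrev: str, candidate: str) -> bool:
    """True if the characters of abbrev appear in candidate, in order."""
    remaining = iter(candidate.lower())
    # `ch in remaining` consumes the iterator up to the match, so each
    # character must be found after the previous one.
    return all(ch in remaining for ch in abbrev.lower())


def match_paths(abbrev: str, paths: list[str]) -> list[str]:
    """Return the paths that match the abbreviation, shortest first."""
    hits = [p for p in paths if is_subsequence(abbrev, p)]
    return sorted(hits, key=len)


paths = [
    "app/views/account/index.rhtml",
    "app/views/product/index.rhtml",
    "app/models/account.rb",
]

# "index" alone is ambiguous; adding a directory hint narrows it down.
print(match_paths("index", paths))
print(match_paths("acc/ind", paths))
```

Because the directory is part of the string being matched, typing a few letters of the component name is enough to pick the right index.rhtml without touching the arrow keys.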

Why go by filename at all? Who really cares what source file our code resides in? Instead we should be thinking in class / object space. I want to go to Account::sendOverdueNotice(). Figuring out that that resides in classes/account.php, opening that file, and then searching for or scrolling to the method definition is a very indirect method of navigation. Extending the TextMate approach, this would work something like hitting Apple-T and then typing “acct:sendover”.

The astute reader might point out that ctags, a fairly standard UNIX tool, has offered something like this for decades. With a modern version like Exuberant Ctags (on Ubuntu: apt-get install exuberant-ctags), type “ctags `find . -name '*.rb'`” in the top level of your Rails project (the pattern is quoted so that the shell doesn’t expand it before find sees it). Then you can use a command like “vi -t send_overdue_notice” at the shell prompt, or “:tag send_overdue_notice” from within vim to jump straight to that method. (The latter offers tab completion, which is very nice, though not quite as nifty as the TextMate abbreviation thing.)

Ctags covers about 75% of what I’m suggesting. But it hasn’t changed much in recent years, and that last 25% is pretty crucial for this to really be a pleasure to use. In some ways ctags is worse for the wear today, because (as near as I can tell) it doesn’t support any object space mapping. If you have two identically named methods on different classes you can’t access more than one. You also have to regenerate the file periodically by hand… which could be automated with a cron job, but long story short is that I’ve never found myself using ctags long-term, despite the incredible potential it promises.

Take my idea above and put it in a nice friendly editor like TextMate, along with the niceties that you’d expect from such an app, like automatically generating the index, mapping the class space as well as method names, and autolearning abbreviations. Then allow the user to use this same interface to create a new method: if you try to jump to one that doesn’t exist, it starts you on a blank slate in the right place.
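As a rough illustration of what “mapping the class space” could mean, here is a sketch of an index keyed by class and method rather than by method name alone. It is a toy: the regexes only understand simple Ruby `class` and `def` lines, nested classes and modules are ignored, and the `Invoice` class and the file paths are made up for the example.

```python
import re
from collections import defaultdict

# Deliberately naive patterns for Ruby source (an assumption of this sketch).
CLASS_RE = re.compile(r"^\s*class\s+([A-Z]\w*)")
DEF_RE = re.compile(r"^\s*def\s+(?:self\.)?(\w+[?!=]?)")


def index_source(filename: str, text: str) -> dict[str, list[tuple[str, int]]]:
    """Map 'Class#method' to a list of (filename, line number) pairs.

    Simplification: a method is attributed to the most recent `class`
    line above it; nesting is not tracked.
    """
    index = defaultdict(list)
    current_class = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        class_match = CLASS_RE.match(line)
        if class_match:
            current_class = class_match.group(1)
            continue
        def_match = DEF_RE.match(line)
        if def_match and current_class:
            key = f"{current_class}#{def_match.group(1)}"
            index[key].append((filename, lineno))
    return dict(index)


# Two hypothetical classes that both define send_overdue_notice.
account_rb = "class Account\n  def send_overdue_notice\n  end\nend\n"
invoice_rb = "class Invoice\n  def send_overdue_notice\n  end\nend\n"

index = {}
index.update(index_source("app/models/account.rb", account_rb))
index.update(index_source("app/models/invoice.rb", invoice_rb))
print(sorted(index))
```

With the class as part of the key, two identically named methods stay distinct, which is exactly the part I couldn’t get out of ctags. An editor would rebuild this on save instead of relying on a cron job.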

With an interface like this, you won’t even care how your files are stored. They could be all in one giant directory, or spread out across a tree like Rails, or whatever. The editor would decide that according to some sort of basic criteria for the language, defined by some sort of internal scripting, similar to how syntax highlighting is handled.

Furthermore you won’t care about the layout within the file. Historically I’ve spent a lot of time worrying about how the methods should be ordered. This is not something that can be solved satisfactorily for any decently large object. Method interactions are in a web, two-dimensional at least, perhaps more. Files are one-dimensional, so trying to group like methods in a logical order is a pointless and frustrating exercise. Eventually I gave up on worrying about this much, but I wonder how many other coders waste brain cycles on this without even realizing that they are doing so.

What’s more, your editor should behave sort of like a web browser: every method call is a hyperlink (activated by the keyboard, of course) to its definition. I realize that many IDEs have something like this already, but it never feels like a first-class citizen in the navigation tool family, more of a tack-on feature. Done right, and in concert with the class/method name abbreviation jump, I think that we developers could be way more efficient in our day-to-day work of navigating source code.

Tabs Have All The Advantages

Wednesday, August 9th, 2006

Tabs vs. spaces is one of those arguments that has raged across software codebases since time immemorial. I am always astonished that this simple matter has not been cleared up yet. Let me make it easy for everyone and just provide the answer: tabs have every single advantage. There is every reason to use them, and no reason not to.

Tabs allow every programmer to configure their indents the way that they want. Just because you like 2-character indents doesn’t mean that I do. Why force me to use your indentation size? Let me make a comparison to throw further light on the issue.

Sometimes fledgling web designers, realizing that HTML displays a little differently for everyone depending on their browser and screen settings, decide that they must have pixel-perfect control over what every user sees. So they make their HTML and then dump it to a giant JPG to display as the website. This is dumb, of course, but the argument is that they want it to look the same for everyone. Using spaces instead of tabs is the same sort of mentality: that everyone else MUST have the same size indents as you. The newbie web designer can be forgiven. Experienced programmers, on the other hand, should know better.

What it boils down to is that the tab character is a better representation of the actual data. A line indent is a different operation from a space. We have a standard character to represent this difference. They even map one-to-one with the keys on the keyboard: to indent code we press tab, to break apart tokens on the line we press the spacebar. It seems so utterly and completely obvious that we should have these map to a tab character and a space character, respectively, that I am astonished anyone ever thought otherwise.

The only reasonable argument I’ve heard for spaces is that some editors don’t handle tabs correctly. Please tell me, what editor that is good enough for programming can’t handle a simple tab? Every modern programmer’s editor I’ve encountered (e.g. vim, emacs, TextMate…) handles them correctly. Why are we making allowances for crappy editors that can’t do something as simple as process a tab character? If we thought that way, then we’d all still be writing table-based HTML so that we could fully support Netscape 4.0.

It should be noted that tabs, when used correctly, are only to be used for code indents. A tab is not a shortcut for pressing the spacebar a few times. Let me illustrate a common tab-use mistake, filling in the → character to indicate a tab character and a . for space:


→if (($this->some_value && $this->some_other_value) ||
→→→$this->third_value)
→→do_stuff();

The correct use of tabs in this case is:


→if (($this->some_value && $this->some_other_value) ||
→....$this->third_value)
→→do_stuff();

See here for a more detailed description of this technique.

The second line has a single layer of code indent (one tab), but then a few spaces to make it line up in an aesthetically pleasing way with the previous line. This will work correctly with any size tabs, and more correctly represents the data.
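If you want to convince yourself, here is a small sketch that measures how far the continuation line sits to the right of the `if` line at several tab widths. The helper names are mine, and it leans on Python’s built-in `str.expandtabs` to simulate what an editor would display.

```python
def visual_column(line: str, tab_width: int) -> int:
    """Column where the first non-whitespace character is displayed."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


def alignment_offset(first: str, continuation: str, tab_width: int) -> int:
    """How many columns the continuation starts to the right of the first line."""
    return visual_column(continuation, tab_width) - visual_column(first, tab_width)


first = "\tif (($this->some_value && $this->some_other_value) ||"
tabs_only = "\t\t\t$this->third_value)"        # the mistake: tabs used to align
tab_then_spaces = "\t    $this->third_value)"  # one indent tab, four alignment spaces

for width in (2, 4, 8):
    print(width,
          alignment_offset(first, tabs_only, width),
          alignment_offset(first, tab_then_spaces, width))
```

The tabs-only version drifts (4, 8, then 16 columns) as the tab width changes, while the tab-then-spaces version stays exactly 4 columns in, so the two lines keep lining up no matter what indent size each reader prefers.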

So please, I beg you. We gave up on tag soup and table-based layouts in favor of CSS; we gave up on “This site optimized for 800×600” in favor of liquid layouts; now please, please, stop expanding tabs into spaces in code.

On the Importance of Naming

Saturday, May 27th, 2006

Programmers typically don’t worry much about naming within the code.  After all, the user isn’t going to see what your variables are named.  You know what they do; so what if you’ve got variables named a, a2, and _a?

The beginning of wisdom is to call things by their right names.  Giving a proper name to a module, class, method, or variable is a sign that you truly understand what it’s supposed to do.  In fact, the very act of searching for a name can often help you figure this out.  It’s a process of determining the component’s true identity.

If you can’t think of a good name for the component, there’s a good chance you don’t really understand what it should be doing.  Time to rethink your design.

There’s also the perception that the internal names for things need not match their external names.  The problem with this is that you impair communication between users, designers, and developers.  Bug reports from users will come in with their external names.  There’s a cost in mapping one name to another every time you get a bug report, and if you’re writing any kind of serious applications then you’ll have an awful lot of bug reports.  Changing the name of the component need only be done once, but you’ll have to deal with the mapping every time.

Then there’s the matter of new developers coming onto a project.  The more obvious and concrete your component names are, the easier it will be for them to get up to speed.  Let’s say you’re exploring an unfamiliar codebase that you know is an e-commerce suite of some sort.  You come across a module named “Process.”  Quick, what do you think it does?  One guess is as good as another.  On the other hand, if the module were named “Checkout”, you’d have a pretty good idea what it does before even opening up the source file.

Good names should be short and distinctive.  Many programmers, once they get past the a, a2, _a approach to naming, go to the other extreme and start naming variables number_of_orders and products_to_be_saved_to_disk.  A name that is too long will clutter up the code and make it hard to read, which is only slightly better than being short and cryptic.  num_orders and queued_products are about the right length.

Poor names are often not the result of a poor initial name choice, but instead a result of the natural organic change of the code over time.  The module may evolve to take on a different role, or it might stay static and the rest of the application changes around it.  Either way, the result is to make a once-good name obsolete.

As soon as you realize the name is no longer appropriate, take a moment to change it.  It’s easier than you think.  With modern search and replace tools (or the old fallback “perl -p -i -e 's/old/new/g' *”), version control, and even IDEs with refactoring menus, there’s just no excuse for a bad name.
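One caveat with a plain s/old/new/g is that it also rewrites longer names that merely contain the old one.  Here is a minimal sketch of a whole-word rename, assuming a simple regex word boundary is good enough for your language; the function name and the sample identifiers are made up for illustration, and a real refactoring tool understands scope in ways this does not.

```python
import re


def rename_identifier(source: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of `old` with `new`."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    # A function replacement keeps backslashes in `new` from being
    # interpreted as regex group references.
    return pattern.sub(lambda _match: new, source)


before = "Process.run(cart)\nProcessQueue.flush()\n"
after = rename_identifier(before, "Process", "Checkout")
print(after)
```

Here Process becomes Checkout while ProcessQueue is left alone.  Run something like this across the tree, look at the diff in version control, and commit.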