Saturday, September 26, 2009

URLEncoder Demonstrates API Design Hazards

I had to use URLEncoder yesterday. The experience was a great reminder of how annoying this API is. It demonstrates a couple of design issues, so I thought I would comment on it here.

The first lesson here is how important it is to get an API right the first time, lest things go downhill from there. In its original form, URLEncoder provided only one static method, which encoded a string as "application/x-www-form-urlencoded" (suitable for issuing an HTTP request). This method has the signature encode(String), and it uses the platform's default encoding. Unfortunately, this is hazardous: the output depends on the machine the code happens to run on, while URL encodings are supposed to always be UTF-8. Therefore, wiser heads built and used replacement classes (such as URLCodec from Apache Commons).

Java 1.4 finally "fixed" this deficiency in the platform with a new encode(String,String) overload, the second parameter being the name of a character encoding to use. The old encode(String) method is marked as deprecated. The documentation itself states: "Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilites."

I don't know, but I'm guessing that the powers-that-be would have liked to make encode(String) do UTF-8 encoding. But they felt constrained by the legacy decision to use the platform default, in the name of not breaking any existing code. Never mind the substantial odds that any such existing code is already broken in this case wherever the default encoding is not UTF-8.

But wait, it gets worse. The encode(String,String) method also throws UnsupportedEncodingException whenever (surprise!) the second parameter does not name a supported encoding. I'm no rocket scientist, but it seems to me that if you're following the W3C recommendation and always passing UTF-8, which every Java platform is required to support, you don't have to worry about this exception being thrown. And yet because the exception is checked, you have no choice but to handle it and clutter up your code, perhaps with a catch block that just does assert(false).
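To make the clutter concrete, here is a rough sketch of what a call site ends up looking like. The QueryBuilder class and its searchQuery method are made up for illustration; only URLEncoder.encode(String,String) and the exception type come from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical call site, invented for illustration only.
class QueryBuilder {
    // Builds a "q=..." query string fragment from a user-supplied search term.
    static String searchQuery(String term) {
        try {
            return "q=" + URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should be unreachable: UTF-8 is a charset every Java platform must support.
            // The checked exception still forces this block to exist.
            throw new AssertionError(e);
        }
    }
}
```

One line of real work, six lines of ceremony, and every caller that encodes a string has to repeat it.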

This all raises the question: if the second parameter should always be "UTF-8" for some definition of "always", why should the API require a second parameter at all? I submit that a better approach would have been to create, instead of or in addition to encode(String,String), an encodeUTF8(String) method that always did UTF-8 encoding without the added hassle of passing a (constant) parameter and dealing with a checked exception...
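For what it's worth, such a method is easy enough to approximate yourself. The following is only a sketch of the idea; the UrlEncoding class is a name I made up, and nothing like it exists in the JDK under that name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical utility class; not part of the JDK.
final class UrlEncoding {
    private UrlEncoding() {} // static helpers only, no instances

    // Always encodes as UTF-8, so callers pass no charset name
    // and never see the checked exception.
    static String encodeUTF8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform, so this
            // would indicate a broken runtime rather than bad input.
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }
}
```

With that in place, a call site shrinks to something like UrlEncoding.encodeUTF8(term), and the try/catch lives in exactly one spot. Of course, having to write this wrapper at all is rather the point.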

Sunday, September 13, 2009

"How to Think About Algorithms" Book

I have been gradually making my way through "How to Think About Algorithms" by Jeff Edmonds.

I have recently recommended this book to several people, and I wanted to mention it here, too. Edmonds uses what I'll call a more intuitive approach to algorithms that I think would be appealing to those with an engineering mindset. Here I'm contrasting it with the texts that I used as a student in algorithms courses -- the classic Sedgewick and CLRS books.
(Maybe more importantly in this economy, though, the subject matter and exercises in this book are a rich source of review for those preparing for tech interviews!)

Wednesday, September 2, 2009

Tailing Inside Eclipse

I'm currently working on a project that writes to a log file inside my workspace when I test-execute it. Obviously it's useful to actually view that log file once in a while, but to do so inside Eclipse requires regularly F5-refreshing it in the Project Explorer. I started to wish that Eclipse would prompt me to reload the file when it detected a modification, like some text editors do. But then it occurred to me how convenient it would be if I could actually tail the log from inside Eclipse itself. After all,

The less I leave the IDE,
The more productive I can be.

(That rhymes.)

This seemed like an obvious enough desire that somebody would have built a plugin for it already, and sure enough, a search turned up a couple of different examples. The one that I went with is called NTail. It works like a champ. It also has a few extra nice features, like the ability to highlight lines in the file based on a regular expression. If this sounds useful to you, I'd encourage you to check it out.