Saturday, September 26, 2009

URLEncoder Demonstrates API Design Hazards

I had to use java.net.URLEncoder yesterday. This experience was a great reminder of how annoying this API is. It demonstrates a couple of issues, so I thought I would comment on it here.

The first thing you can see here is how important it is to get the API right the first time, lest things go downhill from there. In its original form, URLEncoder provided only one static method, which encoded a string as "application/x-www-form-urlencoded" (suitable for issuing an HTTP request). This method had the signature encode(String), and it used the platform's default encoding. Unfortunately, this is technically hazardous: URL encodings are supposed to always be UTF-8, and with the platform default the same string can encode differently from one machine to the next. Therefore, wiser heads built and used replacement classes (such as URLCodec from Apache Commons).

Java 1.4 finally "fixed" this deficiency in the platform with a new encode(String,String) overload, whose second parameter is the name of the character encoding to use. The old encode(String) method is now marked as deprecated. The documentation itself states: "Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities."
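To make the hazard concrete, here is a small sketch of how the choice of encoding changes the output for the same input string. The class and method names (CharsetDifference, encodeWith) are made up for this illustration; only URLEncoder.encode(String,String) is the real API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; only URLEncoder.encode(String, String) is the real API.
class CharsetDifference {

    // Encodes text using the explicitly named character encoding.
    static String encodeWith(String text, String charsetName)
            throws UnsupportedEncodingException {
        return URLEncoder.encode(text, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        // UTF-8 represents U+00E9 as two bytes, so two escapes appear.
        System.out.println(encodeWith(text, "UTF-8"));      // caf%C3%A9

        // ISO-8859-1 represents U+00E9 as one byte, so only one escape appears.
        System.out.println(encodeWith(text, "ISO-8859-1")); // caf%E9
    }
}
```

A server that expects UTF-8 will not decode the second form back to the original string, which is exactly the trouble the one-argument encode(String) invites on a machine whose default encoding is not UTF-8.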

I don't know, but I'm guessing that the powers-that-be would have liked to make encode(String) do UTF-8 encoding, but felt constrained by the legacy decision to use the platform default, in the name of not breaking any existing code. Never mind the substantial odds that any such existing code is already broken wherever the default encoding is not UTF-8.

But wait, it gets worse. The encode(String,String) method is also declared to throw UnsupportedEncodingException, which happens whenever (surprise!) the second parameter does not name a supported encoding. I'm no rocket scientist, but it seems to me that if you're following the W3C recommendation and always passing "UTF-8", you don't have to worry about this exception being thrown. And yet because the exception is checked, you have no choice but to handle it and clutter up your code, perhaps with a catch block that just does assert(false).
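For the record, the clutter looks something like the sketch below. The class name, the searchUrl method, and the example.com URL are all invented for illustration; the shape of the try/catch is the point.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical call site showing the boilerplate forced by the checked exception.
class CheckedExceptionClutter {

    // Builds a search URL (example.com is a placeholder host).
    static String searchUrl(String query) {
        String encoded = null;
        try {
            encoded = URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8,
            // so this branch should never run.
            assert false : "UTF-8 is always supported";
        }
        return "http://example.com/search?q=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("rock & roll"));
        // http://example.com/search?q=rock+%26+roll
    }
}
```

Seven lines of ceremony for one line of work, and the compiler still insists on the dummy initialization of the local variable.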

This all raises the question: if the second parameter should always be "UTF-8" for some definition of "always", why should the API require a second parameter at all? I submit that a better approach would have been to create, instead of or in addition to encode(String,String), an encodeUTF8(String) method that always did UTF-8 encoding without the added hassle of passing a (constant) parameter and dealing with a checked exception...
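Nothing like encodeUTF8(String) exists in URLEncoder itself, but you can approximate it with a small utility of your own. The following is only a sketch: the class name UrlEncoding is hypothetical, and converting the "impossible" exception into an AssertionError is one reasonable choice among several.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical utility class; not part of the JDK.
final class UrlEncoding {

    private UrlEncoding() {
        // Static utility, not meant to be instantiated.
    }

    // Always encodes as UTF-8 and hides the checked exception from callers.
    static String encodeUTF8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Unreachable in practice: UTF-8 support is mandatory on every Java platform.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        // Call sites no longer need a charset name or a try/catch.
        System.out.println("name=" + encodeUTF8("J\u00fcrgen M"));
        // name=J%C3%BCrgen+M
    }
}
```

The try/catch now lives in exactly one place, which is roughly what the platform could have offered out of the box.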
