October 4, 2004

Alphabetization

There are, apparently, two competing schools of alphabetization at large in the English-speaking world. One of them you can recognize because it is stupid and wrong.

The correct form of alphabetization ignores spaces and punctuation. As Leonard Maltin writes in his Movie Guide:

[F]ilm titles are listed in strict letter-by-letter spelling sequence. Separation of words and punctuation is ignored; everything hinges on the letters. So Isadora is listed before I Shot Jesse James because "a" comes before "h".... Let the alphabet be your guide, letter-by-letter, and you'll find the title you're looking for.

The opposing, or "De Moroneris", school of alphabetization holds that a longer word comes after a shorter word, even when the shorter word is part of a phrase. E.g., "sea foam" comes before "seaborne". I don't know how this system handles hyphens and other punctuation; I would assume they work like spaces.

The most obvious reason this system sucks is that you can no longer look compound words up if you don't know how the compound is spaced. Is it "sea foam", "sea-foam", or "seafoam"? All three might be listed in different places according to this form of alphabetization.

This approach also favors the lazy, since it means that programmers can simply sort a list on the ASCII values of the characters in the list items. (Trust me; this is easy.) A space comes before any letter, automatically sorting "sea foam" before "seaborne". Sorting on just letter values would be harder. (Note that pure ASCII, in which every capital letter comes before every lowercase letter, has the side effect of putting "Coinage" between "Carter" and "carter".)
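To make the difference concrete, here is a minimal sketch in Python of both sorts. The helper name letter_key and the sample list are made up for illustration, and the sketch assumes plain English letters only; it is one way to do it, not the way any particular dictionary does.

```python
def letter_key(entry: str) -> str:
    """Hypothetical helper: keep only the letters, lowercased,
    so that spaces, hyphens, and capitalization are ignored."""
    return "".join(ch for ch in entry if ch.isalpha()).lower()


# Sample entries chosen for illustration.
entries = ["seaborne", "sea foam", "sea-born", "seabed", "sea bass", "Seabee"]

# The lazy way: compare raw character codes.
ascii_order = sorted(entries)

# Letter-by-letter: compare only the letters.
letter_order = sorted(entries, key=letter_key)

print(ascii_order)
print(letter_order)
```

The raw sort puts "Seabee" first (capital S), then "sea bass", "sea foam", "sea-born", "seabed", "seaborne". The letter-only sort gives "sea bass", "seabed", "Seabee", "sea-born", "seaborne", "sea foam". Entries whose letters are identical, such as "sea foam" and "seafoam", get the same key, so they end up next to each other in whatever order they started in.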

Because it is the correct method, the letter-by-letter version of alphabetization is the form you'll find in dictionaries and encyclopedias. Here's a listing of the first few "sea" entries in my Random House College Dictionary:

sea, sea anchor, sea anemone, sea bag, sea bass, seabed, Seabee, sea bird, sea biscuit, seaboard, Seaborg, sea-born, seaborne, sea bread, sea bream, sea breeze.

There are five columns of entries starting with "sea" in this dictionary. It would obviously be nonsensical to have "sea wrack" come immediately before "seabed", let alone to try to find "sea-maid" or "seat belt".

I just thought I should straighten folks out about this.

Posted by Greg at October 4, 2004 9:08 AM

Comments
#1 ::: Mason ::: October 4, 2004 12:47 PM ::: link

Mea culpa. Consider me straightened.

#2 ::: HWRNMNBSOL ::: October 4, 2004 4:43 PM ::: link

I consider it my duty to inform you about your wrongness on this issue. However, out of deference to you, I shall attempt to not be funny.

There are actually three separate schools of alphabetization, although the third is rather obscure and is only used in foreign dictionaries. Many German dictionaries alphabetize by context, putting compound words ahead of non-compound words that are alphabetically of higher order. Hence, 'affespiel', or 'ape play', would be sorted ahead of 'affekt', or 'affect', because the 'affe' portion of the word is significant unto itself in the compound word. In other words, they alphabetize using spaces without bothering with the actual spaces.

But this is the point: English lexicography does not stand by itself. Any practical lexicography must be usable internationally. And international standards of how words are joined together differ from those used by Americans.

Am I denying the utility of alphabetical-only sorting? No. But to call the alternative method of alphabetization stupid and wrong, even in a flip manner, ignores the fact that there are important reasons of precedence and international usage behind that convention.

I hope this comment has been satisfyingly humorless. I shall attempt in the future to be entirely dry and severe, thereby forestalling any ill will or bruised feelings.

Live long and prosper.

#3 ::: Greg Morrow ::: October 4, 2004 5:07 PM ::: link

(Being funny by not being funny is pretty funny, actually.)

I would argue that every language can, in principle, have its own alphabetization scheme. If my high school Spanish teacher was right, for example, 'rr' is treated as a separate letter even though it's a digraph. Thus, strict letter-by-letter order isn't alphabetization by definition in Spanish.

So German can have its own rules (although I would argue that the rule you cite is pretty dopey under the same argument as English).

Synthetic languages like Russian and polysynthetic languages like Mohawk rapidly render alphabetization conceptually difficult. Pictographic languages like Chinese and Japanese can easily support "alphabetization"--the characters are strictly ordered and numbered--but when you're dealing with thousands of characters, the result is of little direct use.

So my remarks are definitely meant only for English.

I'm not even sure why international usage would particularly matter. Back in high school, my French-English dictionary used an alphabetization order indistinguishable from English (by ignoring accent marks), but that could easily have been for the benefit of the FSL native English speakers it was aimed at. I wouldn't expect English alphabetization order to have any effect on French alphabetization order; there aren't compatibility issues to worry about.