
I don't see anything here that hasn't been solved by ICU. Am I missing something?

http://site.icu-project.org/

Anecdote time: I once worked on a software project where all of the sorting was done using an internal library made by a Large Trustworthy International Corporation. We discovered halfway through that the transitive property was not being maintained during sorts mixing half-width and full-width numerals. (In other words, 1 < 2 < 1 > 1.) Switching to ICU left me ultra-impressed at its thoroughness.
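For anyone who wants to test a comparator for this kind of problem, here is a minimal sketch in Python. The `buggy_cmp` function is entirely hypothetical (it is not the library from the anecdote, whose internals I can't reproduce); it just mixes two comparison rules in a way that happens to break transitivity for half-width and full-width digits. `consistent_cmp` is one possible fix, assuming NFKC normalization is acceptable for your data.

```python
import unicodedata
from itertools import permutations


def nfkc(s):
    """Fold width variants together, e.g. full-width digit one -> '1'."""
    return unicodedata.normalize("NFKC", s)


def is_fullwidth(s):
    """True if any character has East Asian Width 'F' (full-width)."""
    return any(unicodedata.east_asian_width(ch) == "F" for ch in s)


def sign(x, y):
    """Three-way comparison result: -1, 0, or 1."""
    return (x > y) - (x < y)


def buggy_cmp(a, b):
    """Hypothetical broken comparator that mixes two rules."""
    if nfkc(a) == nfkc(b):
        return 0  # width variants of the same text count as equal
    if is_fullwidth(a) == is_fullwidth(b):
        return sign(nfkc(a), nfkc(b))  # same width: compare folded text
    return sign(a, b)  # mixed widths: raw code point order


def consistent_cmp(a, b):
    """One possible fix: a single rule, with raw text as tie-breaker."""
    return sign((nfkc(a), a), (nfkc(b), b))


def transitivity_violations(cmp, items):
    """Return triples where a < b and b < c hold but a < c does not."""
    return [
        (a, b, c)
        for a, b, c in permutations(items, 3)
        if cmp(a, b) < 0 and cmp(b, c) < 0 and not cmp(a, c) < 0
    ]


# "\uff11" is FULLWIDTH DIGIT ONE
digits = ["1", "2", "\uff11"]
print(transitivity_violations(buggy_cmp, digits))
print(transitivity_violations(consistent_cmp, digits))
```

The brute-force check is cubic in the number of items, so it is only meant for small, hand-picked test sets, not for production data.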



ICU is very good, but even it isn't perfect, and its developers know it. From the user guide (http://userguide.icu-project.org/collation/customization):

"ICU provides a data-driven, flexible, and run-time-customizable mechanism called 'tailoring'. Tailoring overrides the default order of code points and the values of the ICU Collation Service attributes."

Also, you sometimes need context to properly sort strings. Examples:

Are you sorting phone book entries or items in a dictionary? In some languages, that does make a difference.

Are you sorting Swiss German or German German?

Given two 'obviously' Italian words, should you apply Italian collation rules? Probably not when both are entries in an English-language dictionary.

Some library catalogs want(ed?) to sort "2001: a Space Odyssey" as "Two thousand and…" (http://en.m.wikipedia.org/wiki/Library_catalog#Sorting)

For that last case, even ICU's tailoring feature won't help you.
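To make the "context" point concrete, here is a rough sketch in plain Python (not ICU) where the caller picks the sort key. All function names and the `SPELLED_OUT` table are made up for illustration. The German rules are simplified from the usual dictionary convention (ä sorts like a) and phone-book convention (ä sorts like ae); the catalog key assumes someone supplies the spelled-out forms by hand, since no collation table can derive them.

```python
import re
import unicodedata

# Phone-book convention: umlauts expand to base vowel + "e".
PHONEBOOK_EXPANSIONS = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
)

# Hypothetical catalog-supplied table of numbers "as spoken".
SPELLED_OUT = {"2001": "two thousand and one"}


def dictionary_key(word):
    """Dictionary-style key: umlauts sort like their base letters."""
    folded = unicodedata.normalize("NFC", word).casefold()
    decomposed = unicodedata.normalize("NFD", folded)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, word)  # original word breaks ties


def phonebook_key(word):
    """Phone-book-style key: umlauts sort as 'ae', 'oe', 'ue'."""
    folded = unicodedata.normalize("NFC", word).lower()
    return (folded.translate(PHONEBOOK_EXPANSIONS), word)


def catalog_key(title):
    """Catalog-style key: file a leading number under its spelled-out form."""
    match = re.match(r"\d+", title)
    if match and match.group() in SPELLED_OUT:
        title = SPELLED_OUT[match.group()] + title[match.end():]
    return title.casefold()


names = ["Müller", "Muffin", "Mueller"]
print(sorted(names, key=dictionary_key))
print(sorted(names, key=phonebook_key))

titles = ["2001: a Space Odyssey", "Twelve Monkeys", "Alien"]
print(sorted(titles))
print(sorted(titles, key=catalog_key))
```

The same three names come out in a different order under each German key, and the film title moves from the front of the list to the end once it is filed under "two thousand". In every case the data is identical and only the caller's intent changes, which is something a collation library cannot infer on its own.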



