"Use document.addEventListener('touchend', function(){}); to enable :active on mobile"
—
(Source: twitter.com)
Accent folding, or diacritic removal, replaces accented letters with their unaccented base letters from the English alphabet (which is different from Unicode normalization equivalence). It has several common applications, for example:
- sorting, both in simple implementations and in certain collating sequences
- indexing and search, auto-complete, text expansion
- URL slugs, code names, tags…
Carlos Bueno covered accent folding well, along with use cases such as auto-complete, in this article for A List Apart, so I’ll skip further “why”s and concentrate on the “how”.
I needed to remove diacritics to sort list items, and I wanted an implementation fast enough to run often (possibly on the client side, even in IE) on lists of a few hundred strings.
A search led me to this implementation, which uses regexes. So I took his character map, wrote my own version that doesn't use regexes, and ran some basic performance tests. As expected, avoiding regexes performed better.
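For contrast, a regex-based fold typically looks something like the sketch below: one global regex per base letter, each run over the whole string, even when the string contains no accented characters at all. This is a generic illustration, not the linked implementation; `REGEX_MAP` is a small hypothetical excerpt covering only a few lowercase letters.

```javascript
// Hypothetical, abridged map: one global regex per base letter.
// A real map would cover many more letters, including upper case.
var REGEX_MAP = [
  { base: "a", pattern: /[àáâãäå]/g },
  { base: "c", pattern: /[ç]/g },
  { base: "e", pattern: /[èéêë]/g },
  { base: "i", pattern: /[ìíîï]/g },
  { base: "n", pattern: /[ñ]/g },
  { base: "o", pattern: /[òóôõö]/g },
  { base: "u", pattern: /[ùúûü]/g }
];

// Every regex scans the full string, whether or not it matches anything.
function foldWithRegex(input) {
  var output = input;
  for (var i = 0; i < REGEX_MAP.length; i++) {
    output = output.replace(REGEX_MAP[i].pattern, REGEX_MAP[i].base);
  }
  return output;
}

console.log(foldWithRegex("crème brûlée")); // "creme brulee"
```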
The approach is pretty basic (see the code on Gist):
- split the string into an array of characters
- for each character, if it has a match in the map, replace it with the corresponding value (and set a flag)
- if the flag has been set, join the array back into a string; otherwise, return the input string unchanged
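The three steps above can be sketched as follows. This is a minimal illustration, not the Gist code itself: `FOLD_MAP` is a small hypothetical excerpt of a character map (lowercase only), and the function assumes precomposed characters such as `é` (U+00E9), so a decomposed `e` followed by a combining accent would pass through unchanged.

```javascript
// Hypothetical, abridged character map: accented letter -> base letter.
var FOLD_MAP = {
  "à": "a", "á": "a", "â": "a", "ä": "a",
  "ç": "c",
  "è": "e", "é": "e", "ê": "e", "ë": "e",
  "î": "i", "ï": "i",
  "ñ": "n",
  "ô": "o", "ö": "o",
  "ù": "u", "û": "u", "ü": "u"
};

function foldAccents(input) {
  // 1. Split the string into an array of characters (UTF-16 code units).
  var chars = input.split("");
  var changed = false;

  // 2. Replace each mapped character in place and remember that we did.
  for (var i = 0; i < chars.length; i++) {
    var c = chars[i];
    if (FOLD_MAP.hasOwnProperty(c)) {
      chars[i] = FOLD_MAP[c];
      changed = true;
    }
  }

  // 3. Only join when something was replaced; otherwise return the input.
  return changed ? chars.join("") : input;
}

console.log(foldAccents("Déjà vu")); // "Deja vu"
```

When nothing matches, as with `foldAccents("plain")`, the function returns the input string itself and skips the `join()`.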
Afterwards, I found the ALA article and realized that I had taken almost the same per-character approach as its author, except that he appends characters to a string, while I use an array. For the sake of completeness, I added his approach to the performance tests (hint: appending to a string was slower).
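For comparison, the string-building variant replaces the array and the flag with a single accumulator string. This sketch only illustrates that idea and is not the ALA article's code; `FOLD_MAP` is again a small hypothetical excerpt.

```javascript
// Hypothetical, abridged character map: accented letter -> base letter.
var FOLD_MAP = {
  "à": "a", "á": "a", "â": "a", "ä": "a",
  "ç": "c",
  "è": "e", "é": "e", "ê": "e", "ë": "e",
  "î": "i", "ï": "i",
  "ñ": "n",
  "ô": "o", "ö": "o",
  "ù": "u", "û": "u", "ü": "u"
};

// Appends one (possibly folded) character at a time to the output string.
function foldAccentsConcat(input) {
  var output = "";
  for (var i = 0; i < input.length; i++) {
    var c = input.charAt(i);
    output += FOLD_MAP.hasOwnProperty(c) ? FOLD_MAP[c] : c;
  }
  return output;
}

console.log(foldAccentsConcat("Déjà vu")); // "Deja vu"
```

Note that this version rebuilds the string on every call, even when nothing needs folding, whereas the array version can return its input untouched.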
Also, my implementation uses hasOwnProperty to check whether a character is present in the map, which guards against the (unlikely yet possible) case of third-party code adding properties to Object.prototype (for more on that, see An Object is not a Hash).
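A small demonstration of the Object.prototype issue, assuming some script has added a property named `x` (a hypothetical example). Because the map's keys are single characters, built-in inherited names such as `toString` never collide; the risk comes only from properties someone adds. A plain property read picks up the inherited value, while `hasOwnProperty` ignores it.

```javascript
var FOLD_MAP = { "é": "e" };

// Simulate third-party code extending Object.prototype.
Object.prototype.x = "?";

// Plain lookup: inherited properties are visible, so FOLD_MAP["x"] is "?".
function lookupNaive(c) {
  return FOLD_MAP[c] !== undefined ? FOLD_MAP[c] : c;
}

// Own-property lookup: only keys defined on FOLD_MAP itself count.
function lookupSafe(c) {
  return FOLD_MAP.hasOwnProperty(c) ? FOLD_MAP[c] : c;
}

console.log(lookupNaive("x")); // "?" (the letter x is silently corrupted)
console.log(lookupSafe("x"));  // "x"

// Remove the simulated pollution.
delete Object.prototype.x;
```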