Page 1 of 1

Diacritic-insensitive search in MySQL

Posted: Thu Mar 04, 2010 7:21 pm
by MarK (CZ)
Hi again!

The idea: Allow some nice url for a person, just as server.tld/peter-mayer
Then look through the database for a name that would fit this, e.g. WHERE REPLACE(LOWER(name), ' ', '-') = what_I_get_from_GET
There shouldn't even be any duplicate names, since my application doesn't allow that.

The problem: I want to do this for czech names. But the czech language has some diacritical marks like ˇ or ´. Obviously, characters with these diacritical marks cannot be used in the url, so I use my php function to remove them (all the diacritic letters are standard latin, just with the mark over it).
But then the problems arise when I want to check the database for the user. First, there may be more users found, since ě, é and e would all equal to e. But that's easily solved.
The main problem is how to build the query for MySQL. Is there even a way similar to the one above (using just REPLACE and LOWER all is done in MySQL)? I don't want to use url %-escaping, so I need to check it some other way. The only two possibilities I could think of are 1) matching the database records against all posible diacritic variants of the name (needs more computing time) or 2) add a new column containing the names stripped of diacritic (needs more db space).

Any better ideas?
Thanks in advance! :)

Martin

Re: Diacritic-insensitive search in MySQL

Posted: Thu Mar 04, 2010 7:50 pm
by Eran
1. save the URL friendly version of the username in a separate field in the table (and index it - should be a unique index)
2. Run urlencode() on it just to be safe - http://php.net/manual/en/function.urlencode.php