Diacritic-insensitive search in MySQL
Posted: Thu Mar 04, 2010 7:21 pm
Hi again!
The idea: Allow some nice url for a person, just as server.tld/peter-mayer
Then look through the database for a name that would fit this, e.g. WHERE REPLACE(LOWER(name), ' ', '-') = what_I_get_from_GET
There shouldn't even be any duplicate names, since my application doesn't allow that.
The problem: I want to do this for czech names. But the czech language has some diacritical marks like ˇ or ´. Obviously, characters with these diacritical marks cannot be used in the url, so I use my php function to remove them (all the diacritic letters are standard latin, just with the mark over it).
But then the problems arise when I want to check the database for the user. First, there may be more users found, since ě, é and e would all equal to e. But that's easily solved.
The main problem is how to build the query for MySQL. Is there even a way similar to the one above (using just REPLACE and LOWER all is done in MySQL)? I don't want to use url %-escaping, so I need to check it some other way. The only two possibilities I could think of are 1) matching the database records against all posible diacritic variants of the name (needs more computing time) or 2) add a new column containing the names stripped of diacritic (needs more db space).
Any better ideas?
Thanks in advance!
Martin
The idea: Allow some nice url for a person, just as server.tld/peter-mayer
Then look through the database for a name that would fit this, e.g. WHERE REPLACE(LOWER(name), ' ', '-') = what_I_get_from_GET
There shouldn't even be any duplicate names, since my application doesn't allow that.
The problem: I want to do this for czech names. But the czech language has some diacritical marks like ˇ or ´. Obviously, characters with these diacritical marks cannot be used in the url, so I use my php function to remove them (all the diacritic letters are standard latin, just with the mark over it).
But then the problems arise when I want to check the database for the user. First, there may be more users found, since ě, é and e would all equal to e. But that's easily solved.
The main problem is how to build the query for MySQL. Is there even a way similar to the one above (using just REPLACE and LOWER all is done in MySQL)? I don't want to use url %-escaping, so I need to check it some other way. The only two possibilities I could think of are 1) matching the database records against all posible diacritic variants of the name (needs more computing time) or 2) add a new column containing the names stripped of diacritic (needs more db space).
Any better ideas?
Thanks in advance!
Martin