Page 1 of 2

Mysql case insensitive and sensitive column

Posted: Mon Aug 20, 2007 1:54 am
by keenlearner
Hello, I want to build a dictionary table, with 3 columns, word_id, word, definition. Most of the words are case insensitive but some need to be case sensitive. E.g the word "hobby", if I SELECT the word from the database, it's ok to be either "HoBBy", but there are words that is very important to have case sensitivity such as "mm" means millimeter, "Mm" means megameter, so if my word column is case insensitive, and if I SELECT with the value Mm, the database can return either Mm or mm which is so much difference. So I was thinking of making extra one column, cased_word, just to put those word that is case sensitive. so cased_word and word column are mutually exclusive, both can't have at the same time. But is there a better way than this ? Note that, it's important to have the case sensitive and insensitive word sharing the same word_id, because I will be using their word_id in other tables. Thanks.

Posted: Mon Aug 20, 2007 2:11 am
by iknownothing
you need to set your column to a case sesitive collation.

Eg:

Add

Code: Select all

COLLATE latin1_general_cs
at the end of your SELECT statement,

Posted: Mon Aug 20, 2007 2:25 am
by keenlearner
Thanks, but the problem is when I am SELECTing a word, I don't know whether the word is case sensitive or insensitive. such as if user want to lookup a definition by writing the word "hoBBy", this word should be case insensitive, but if I SELECT using the collation COLLATE latin1_general_cs
it will not return the definition because it's stored as "hobby" in the database. The case sensitivity can be determined on the database only.

Posted: Mon Aug 20, 2007 2:32 am
by iknownothing
you should beable to access the COLLATION from phpMyAdmin also, for selected columns, same deal, look for postfix _cs (case sensitive).

Posted: Mon Aug 20, 2007 2:45 am
by keenlearner
I know what you mean, I can even use the binary column to have case sensitive column. But that is not my problem, please get my semantic meaning.

case sensitive word list
WHO
mm
Mm
K
Pa
Hgmm
s

case insensitive word list
who
me
you
I
they

see the first word for both case sensitive and insensitive, the "WHO" means World Health Organization but "who" is a pronoun for asking about a person. Same word but different capital letters gives different meanings. So if user type all lower case "who" I only check the case insensitive column, but if user type "WHo" I must check the case sensitive letter first and if not found then check the case insensitive column. So I want to know is there a way to mix case sensitive and insensitive in one column (i think it's not possible) So what are the alternatives ? Because I want them to share the same primary unique ID. Thanks.

Posted: Mon Aug 20, 2007 3:01 am
by keenlearner
Thanks, you have indirectly lighten me up. So I would have only one column word which is case insensitive, all case insensitive words will be in lower case, case sensitive words have at least one letter in upper case, so if there is a upper case letter in user lookup word such as "kilOMeter", I will SELECT with the case sensitive collation first and if not found then select again with case insensitive collation. But is there a better way than this ? Thanks

Posted: Mon Aug 20, 2007 2:38 pm
by superdezign
.. A simpler method would be

Code: Select all

SELECT `word` FROM `dictionary` WHERE BINARY `word` = 'foo' OR `word` = 'foo' LIMIT 1;
As long as your search is case-insensitive by default, you'll either get a exact case match or a case-insensitive match.

Posted: Mon Aug 20, 2007 2:53 pm
by VladSun
Hm, nice work superdezign. Just one notice - I think you need to specify the ORDER BY caluse in order to be sure which word (case sensitive or case insensitive) is listed first. I.e:

Code: Select all

SELECT `word` FROM `dictionary` WHERE BINARY `word` = 'foo' OR `word` = 'foo' ORDER BY `word` LIMIT 1;

Posted: Mon Aug 20, 2007 3:09 pm
by superdezign
I figured the precedence of the case-sensitivity in the WHERE clause would handle that.

Ordering it with ORDER BY `word` would seem to me to skew the results, as uppercase characters have a lower ASCII and UNICODE value than lowercased characters.

Posted: Mon Aug 20, 2007 3:27 pm
by VladSun
superdezign wrote:I figured the precedence of the case-sensitivity in the WHERE clause would handle that.
As far as I understand, you claim that this SQL query:

Code: Select all

select id from table where id>10 or id<10
would list the rows with id>10 and after that the ones with id<10? Is it correct?

Edit: I'm pretty sure you don't mean that but the second part of your topic still confuses me.
superdezign wrote:Ordering it with ORDER BY `word` would seem to me to skew the results, as uppercase characters have a lower ASCII and UNICODE value than lower cased characters.
keenlearner wrote:So if user type all lower case "who" I only check the case insensitive column, but if user type "WHo" I must check the case sensitive letter first and if not found then check the case insensitive column.
It is exactly what keenlearner needs.

In second thought, my last query is wrong. It must be:

Code: Select all

SELECT `word` FROM `dictionary` WHERE BINARY `word` = 'foo' OR `word` = 'foo' ORDER BY binary `word` [asc|desc] LIMIT 1;
in order to have asc/desc sorting.

In fact, my query and yours will have the same result.

Posted: Mon Aug 20, 2007 10:40 pm
by superdezign
Yes, I think the added BINARY would consistently produce the result we're after.

Posted: Tue Aug 21, 2007 6:00 am
by keenlearner
Thanks for the solutions, but if it return one row only, how can I differentiate if the word is a case insensitive or case sensitive ? I think I would create the Dictionary table with two word columns, case_sensitive and case_insensitive.

Dictonaray table

Code: Select all

<pre>
word_id	   insensitive_word
1		megameter	   
2		millimeter
3		mm
4		Mm
</pre>
Because the problem with your query will be if user looking for the word "mM" (if this should be a case sensitive word) , but this word is not in my database, but your query will still return one row for "mm", which might give wrong definition to the user.

So I would have to search two column for a word, the in_sensitive and sensitive column have the case insensitive and case sensitive collation respectively. So my query will have

Code: Select all

SELECT `in_sensitive`, `sensitive` FROM `dictionary` WHERE `sensitive` = 'mM' OR  `in_sensitive` = 'mM' LIMIT 1;
so this will not return any result, so not wrong definition given to the user. Or is there a better way than this ? thanks.

Dictionaray table

Code: Select all

<pre>
word_id	   in_sensitive	        sensitive         definition
1		megameter	   NULL              ......
2		millimeter	       NULL              ......
3		NULL                    Mm                 SI unit for megameter
4		NULL                    mm                 SI unit for millimeter
</pre>

Posted: Tue Aug 21, 2007 7:00 am
by VladSun
Hm, now I see that we gave you not the exact solution you want. Maybe this is closer to it:

Code: Select all

SELECT `word`, 1 FROM `dictionary` WHERE binary `word` = 'cM'
UNION
SELECT `word`, 2 FROM `dictionary` WHERE LOWER(`word`) = LOWER('cM');
It's the first row containing the true result. Using your second post word lists you will have:
WHO: WHO, 1
you: you, 2

So 1 is for case sensitive, while 2 is for case insensitive match.

Posted: Tue Aug 21, 2007 8:52 am
by keenlearner
Thanks for really2 helping out, I never know that we can put number at the selection. But do know for my previous post is there a way to get the column which is Non NULL only, rather than determine at the programming code ? Because I can collect those sensitive case word meaning for future usage.

Posted: Tue Aug 21, 2007 8:59 am
by VladSun

Code: Select all

SELECT IF (`sensitive` != '', `sensisitve`, `in_sensitive`) FROM `dictionary` WHERE `sensitive` = 'mM' OR  `in_sensitive` = 'mM';