Title Capitalization

Posted: Tue Aug 22, 2006 3:09 pm
by Ollie Saunders
I'm writing a filter that transforms text into title case. I'm doing something more than just ucwords() here because I don't want to capitalize minor words.
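
To make the gap concrete: ucwords() upper-cases the first letter of every word, so minor words come out capitalized too. A quick illustration (the sample title is just one picked for the example):

```php
<?php
// ucwords() capitalizes every word, including "of" and "the".
$plain = ucwords('the lord of the rings');
echo $plain, "\n"; // The Lord Of The Rings
```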

I've been looking at this section of a Wikipedia article on capitalization, and I've decided that I want to leave "internal articles, prepositions, conjunctions and forms of to be" uncapitalized. I need to find out what those things are. Can anybody tell me? I'll be amazed if anyone can. Or can anyone provide a good resource where I might find complete definitions of them?

Posted: Tue Aug 22, 2006 3:23 pm
by feyd
Articles seem a wee bit complicated, prepositions are a little bit tricky, conjunctions are pretty standard fare, and to be doesn't appear too difficult.

Posted: Tue Aug 22, 2006 3:47 pm
by Ollie Saunders
Thanks, feyd, that was great.

In case anyone else needs this, here's the magic array:

Code:

// Minor words to leave lowercase. The words are the keys; the values are unused.
$keyword = array('across' => true, 'after' => true, 'at'     => true,
                 'before' => true, 'by'    => true, 'during' => true,
                 'from'   => true, 'in'    => true, 'into'   => true,
                 'of'     => true, 'on'    => true, 'to'     => true,
                 'under'  => true, 'with'  => true, 'without'=> true,
                 'and'    => true, 'but'   => true, 'or'     => true,
                 'the'    => true, 'a'     => true, 'an'     => true,
                 'that'   => true, 'is'    => true, 'are'    => true,
                 'be'     => true, 'am'    => true, 'being'  => true,
                 'was'    => true, 'were'  => true, 'been'   => true);
I've used the words as keys because array keys are hashed, so checking isset($keyword[$word]) is faster than searching a list of values with in_array().
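
For anyone wondering how an array like this gets used, here is a minimal sketch of the lookup. It is not the actual filter from the library: title_case() is a hypothetical helper, and $minor is a shortened stand-in for the full array above.

```php
<?php
// Hypothetical helper for illustration only, not the real filter.
// $minor uses the words as keys, so isset() is a single hash lookup.
function title_case($text, array $minor)
{
    $words = explode(' ', strtolower($text));
    foreach ($words as $i => $word) {
        // Always capitalize the first word; leave minor words elsewhere alone.
        if ($i === 0 || !isset($minor[$word])) {
            $words[$i] = ucfirst($word);
        }
    }
    return implode(' ', $words);
}

// Shortened stand-in for the full word list.
$minor = array('of' => true, 'the' => true, 'and' => true, 'been' => true);
echo title_case('the return of the king', $minor), "\n"; // The Return of the King
```

Splitting on single spaces is a simplification; punctuation attached to a word (for example "the," with a trailing comma) would not match a key.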

Posted: Tue Aug 22, 2006 3:52 pm
by Ollie Saunders
Err, I've just realised there is no way I can write this properly. English is just too complicated!
For instance, this is the output from my function:
The Great Green Man Has been
I'm pretty sure "been" should be capitalized here because it's being used in a different form, but I'm buggered if I'm going to check for that.
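
One cheap rule that would cover this particular case, without any grammar checking: the quoted guideline only talks about "internal" minor words, which can be read as "always capitalize the first and last word". That reading is an assumption, and title_case_internal() below is a hypothetical sketch with a shortened word list, not the real filter.

```php
<?php
// Hypothetical sketch: minor words stay lowercase only when they are
// internal, i.e. neither the first nor the last word of the title.
function title_case_internal($text, array $minor)
{
    $words = explode(' ', strtolower(trim($text)));
    $last  = count($words) - 1;
    foreach ($words as $i => $word) {
        $isEdge = ($i === 0 || $i === $last);
        if ($isEdge || !isset($minor[$word])) {
            $words[$i] = ucfirst($word);
        }
    }
    return implode(' ', $words);
}

// Shortened stand-in for the full word list.
$minor = array('the' => true, 'of' => true, 'been' => true);
echo title_case_internal('the great green man has been', $minor), "\n";
// The Great Green Man Has Been
```

This only looks at position, so it says nothing about whether a word in the middle of a title is being used in a different grammatical role.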

Posted: Tue Aug 22, 2006 4:00 pm
by feyd
"has been" is a to-be, if memory serves.

Posted: Tue Aug 22, 2006 4:06 pm
by Ollie Saunders
feyd wrote: "has been" is a to-be, if memory serves.
Ugh, I wish I had never started this.

Posted: Tue Aug 22, 2006 4:08 pm
by feyd
ole wrote: Ugh, I wish I had never started this.
Now you know why I haven't done one... in a while.

Mine used to fix only conjunctions; it didn't care about anything else.

Posted: Tue Aug 22, 2006 4:28 pm
by Chris Corbyn
~ole... do this and you're basically writing AI code (depending upon how you do it).

Posted: Tue Aug 22, 2006 4:48 pm
by jayshields
Can I ask what's the point of making a script like this?

It's more hassle than it's worth. The number of characters you'd have to type to make this script perfect (as good as a human could do it) is probably the same as the number of characters it would take to type hundreds of thousands of correctly capitalized titles.

When would this script even be used? If the title is not fetched from a database, just type it correctly yourself. If the title is fetched from a database, just type it correctly in the first place.

Posted: Wed Aug 23, 2006 4:16 am
by Ollie Saunders
jayshields wrote: It's more hassle than it's worth.
Yes, I realize that now.
jayshields wrote: When would this script even be used?
If someone had made a complete, working one, you would find a use for it, I'm sure. It's part of a library anyway, so it's not for me specifically.