What is the best way to strip a predefined list of words from an arbitrary string? What I want to do is remove all simple connecting words like "a", "the", "what", "who", "where", etc. (my English teacher probably has a better term to describe these words than "connecting words") from a string.
I imagine it's something like the following, though perhaps there is a common tool, function, code fragment, etc. that people use to accomplish this?
$words_to_exclude = array(
    "the",
    "a ",
    "what",
    "who",
    // etc.
);
php_grep_function($string, $words_to_exclude); // placeholder name: not sure what the PHP grep or string-replace function is or how it works
[As a follow-on question, does anyone know of a tool that can discern and extract likely subject words from a sentence? In other words, a tool that can guess which words in a sentence are its key words?]
Thanks
Stripping words from a string
- feyd
preg_replace()
That's untested.
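// Escape each word so any regex metacharacters in it are matched literally.
function pregProtect($a) {
    return preg_quote($a, '#');
}
// Join the words into a single alternation pattern, then strip every match from $text.
$words_to_exclude = '#\b('.implode('|', array_map('pregProtect', $words_to_exclude)).')\b#';
$text = preg_replace($words_to_exclude, '', $text);
Brilliant!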
Appears to work very well! Thanks feyd. Very elegant solution.
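For anyone wanting to try the approach end to end, here is a minimal self-contained sketch of it. The function name `strip_words` is hypothetical, and the case-insensitive `i` flag, the `trim()` on each word, and the whitespace clean-up at the end are my own assumptions rather than part of feyd's snippet.

```php
<?php
// Hypothetical helper (not from the thread): removes whole-word matches of $words from $text.
function strip_words(string $text, array $words): string
{
    // Escape each word so regex metacharacters and the '#' delimiter are treated literally.
    // trim() is an assumption: it tolerates list entries such as "a " with stray spaces.
    $escaped = array_map(function ($w) {
        return preg_quote(trim($w), '#');
    }, $words);

    // \b stops "a" from matching inside "name"; the i flag (an assumption) ignores case.
    $pattern = '#\b(' . implode('|', $escaped) . ')\b#i';
    $stripped = preg_replace($pattern, '', $text);

    // Collapse the gaps left behind by the removed words.
    return trim(preg_replace('#\s+#', ' ', $stripped));
}

echo strip_words('What is the name of a good book?', ['the', 'a ', 'what', 'who']);
// prints: is name of good book?
```

Without the final clean-up step the removed words leave double spaces behind, which may or may not matter depending on what the string is used for afterwards.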
What do the "#"s and "\b"s do?
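A short sketch that may help with this: in PHP's preg functions the "#" characters are only pattern delimiters marking where the regex starts and ends ("/" is the other common choice), which is also why "#" is passed as the second argument to preg_quote(). The `\b` (backslash, not forward slash) is a word-boundary assertion, so a word only matches when it stands alone. The variable names and the sample string below are made up for illustration.

```php
<?php
$sample = 'the other theme';

// '#' and '/' are interchangeable delimiters; these two patterns behave identically.
$with_hash  = preg_replace('#\bthe\b#', '', $sample);
$with_slash = preg_replace('/\bthe\b/', '', $sample);

// Without \b, "the" is also removed from inside "other" and "theme".
$no_boundary = preg_replace('#the#', '', $sample);

var_dump($with_hash);   // string(12) " other theme"
var_dump($with_slash);  // string(12) " other theme"
var_dump($no_boundary); // string(6) " or me"
```

Using "#" rather than "/" as the delimiter is mostly a convenience: any word containing a slash then needs no extra escaping.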