The following regex matches underscore-separated words: at least two and at most four words per match, where each word begins with a letter and may continue with letters and digits. The pattern is case-insensitive (the i flag) and written in free-spacing mode (the x flag) with a comment on every line, so it should be easy to modify to meet your precise needs. The command-line script below demonstrates how to extract the first letter of each word and combine them into a shortened string.
<?php // test.php 2010-10-19
$data = "this string has two_words three_words_here and four_words_in_here and more string";
$re = '/# match underscore separated multi_word (2-4 words)
    \b              # anchor start to word boundary
    ([A-Z])         # $1 first letter of 1st word
    [A-Z0-9]*       # remainder of 1st word
    _               # underscore separates 1st and 2nd
    ([A-Z])         # $2 first letter of 2nd word
    [A-Z0-9]*       # remainder of 2nd word
    (?:             # begin optional third word
      _             # underscore separates 2nd and 3rd
      ([A-Z])       # $3 first letter of 3rd word
      [A-Z0-9]*     # remainder of 3rd word
    )?              # third word is optional
    (?:             # begin optional fourth word
      _             # underscore separates 3rd and 4th
      ([A-Z])       # $4 first letter of 4th word
      [A-Z0-9]*     # remainder of 4th word
    )?              # fourth word is optional
    \b              # anchor end to word boundary
    /ix';
$count = preg_match_all($re, $data, $matches, PREG_SET_ORDER);
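for ($i = 0; $i < $count; $i++) {
    // Trailing groups that did not participate in a match are left out of
    // each match set, so isset() stops after the last word actually matched.
    $letters = "";
    for ($j = 1; isset($matches[$i][$j]); $j++) {
        $letters .= $matches[$i][$j];
    }
    printf("Match %d = \"%s\", first letters = \"%s\"\n",
        $i + 1, $matches[$i][0], $letters);
}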
?>
Here is the output from the script:
[text]Match 1 = "two_words", first letters = "tw"
Match 2 = "three_words_here", first letters = "twh"
Match 3 = "four_words_in_here", first letters = "fwih"[/text]
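If you want the shortened strings substituted back into the text rather than just printed, something along these lines might work. This is only a sketch: the function name abbreviate_multiwords is made up for illustration, and the regex is a compact rewrite of the commented one above (same 2-4 word limit, same /i flag) rather than the exact pattern from the script. It uses preg_replace_callback and splits each match on the underscore instead of relying on capture groups.

```php
<?php
// Hypothetical helper, not part of the original script: replace every
// 2-4 word underscore-separated token with the first letters of its words.
function abbreviate_multiwords(string $text): string
{
    // Compact form of the commented regex: one word, then 1 to 3 more
    // words, each preceded by an underscore.
    $re = '/\b[A-Z][A-Z0-9]*(?:_[A-Z][A-Z0-9]*){1,3}\b/i';

    $result = preg_replace_callback($re, function (array $m): string {
        $letters = '';
        foreach (explode('_', $m[0]) as $word) {
            $letters .= $word[0];   // first character of each word
        }
        return $letters;
    }, $text);

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $text;
}

$data = "this string has two_words three_words_here and four_words_in_here and more string";
echo abbreviate_multiwords($data), "\n";
// prints: this string has tw twh and fwih and more string
```

To allow more words per token, raising the upper bound in {1,3} should be enough, since the callback does not depend on a fixed number of capture groups.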
Hope this helps!
