I'm going through a file line by line. On each line I need to catch ALL the expressions made up of consecutive words starting with a capital letter, and store them as a sub-array in an array.
For example, if line 2 contains the following text (the text is tokenized, i.e. all punctuation marks are padded with spaces):
Hello world ! Is there anybody named John Doe Spartacus here ?
I need the corresponding cell in the $results array to be (order doesn't really matter):
$results[2] =>
[0] => Hello
[1] => Is
[2] => John
[3] => John Doe
[4] => John Doe Spartacus
[5] => Doe
[6] => Doe Spartacus
[7] => Spartacus
So far, the best I've got (using preg_match_all) is:
$results[2] =>
[0] => Hello
[1] => Is
[2] => John Doe Spartacus
[3] => Spartacus
As this isn't exactly what I was looking for, I moved to using preg_match with the PREG_OFFSET_CAPTURE flag and the offset parameter, but still no good. I'm handling the array part fine; I need help with the regex. Here's the code:
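// $inf holds the path to the input file (defined elsewhere).
$inh = fopen($inf, "r");
$matches = array();
$i = 0; // index of the current line
// fgets() returns false at end of file, so no bogus last line is processed.
while (($line = fgets($inh)) !== false) {
    $matches[$i] = array();
    $tmatches = array();
    $offset = 0;
    do {
        // Look for the next run of capitalized words, starting at $offset.
        $ret = preg_match("/([A-Z][A-Za-z\-]+)( [A-Z][A-Za-z\-]+)*/", $line, $tmatches, PREG_OFFSET_CAPTURE, $offset);
        if ($ret == 1) {
            // Resume the search right after the end of this match.
            $offset = $tmatches[0][1] + strlen($tmatches[0][0]);
            $matches[$i][] = $tmatches[0][0];
        }
    } while ($ret == 1);
    print_r($matches[$i]);
    $i++;
}
fclose($inh);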
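To make the expected output concrete: the list above is every contiguous sub-sequence of each run of capitalized words. Since a regex match never overlaps a previous one, one possible approach is to match only the maximal runs and do the expansion in PHP. The following is a minimal sketch under that assumption, not a definitive solution; the function name `capitalizedSubRuns` is hypothetical, and the lookbehind is an addition to the word pattern used above.

```php
<?php
// Hypothetical helper (not part of the original code): returns every
// contiguous sub-sequence of each run of capitalized words in $line.
function capitalizedSubRuns(string $line): array
{
    $results = [];
    // Match maximal runs of capitalized words separated by single spaces.
    // The lookbehind (an assumption) stops a match starting mid-word.
    $pattern = '/(?<![A-Za-z\-])[A-Z][A-Za-z\-]+(?: [A-Z][A-Za-z\-]+)*/';
    preg_match_all($pattern, $line, $runs);

    foreach ($runs[0] as $run) {
        $words = explode(' ', $run);
        $n = count($words);
        // Emit every window of consecutive words: each start, each length.
        for ($start = 0; $start < $n; $start++) {
            for ($len = 1; $start + $len <= $n; $len++) {
                $results[] = implode(' ', array_slice($words, $start, $len));
            }
        }
    }
    return $results;
}

print_r(capitalizedSubRuns('Hello world ! Is there anybody named John Doe Spartacus here ?'));
```

A run of n words yields n(n+1)/2 entries, so long runs of capitalized words grow the result quadratically.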
edit: added "Is" to the arrays
2nd edit: added "Doe" as a singular match