
help with regular expression would be welcomed

Posted: Fri Dec 31, 2004 7:48 am
by jasongr
Hello

I am quite new to regular expressions and I am faced with a challenge that I feel could be solved quite easily using them.
I would appreciate it if someone could help me with the solution and, if possible, explain how he/she solved it so I can build up my regular expression knowledge.

Here is the problem:
I am given a PHP string like so:
$phrase = 'Jo?n* Mil??er'
Where * means zero or more occurrences of any letter (except whitespace)
and ? means a single occurrence of any letter (except whitespace)

I then get another sentence (in another PHP string) in which the phrase was found. For example: 'Johnny Miller was here'
I need to use regular expressions to highlight the phrase in the sentence (using the <b> tag).

So in the example above, the highlighted expression will be:
'<b>Johnny Miller</b> was here'

Here is another example:
$phrase = '*a*'
$sentence = 'The tall man walked down the street'
The highlighted sentence will be:
'The <b>tall</b> <b>man</b> <b>walked</b> down the street'
or, as rendered in HTML, the same sentence with "tall", "man" and "walked" in bold.
I would appreciate any help
regards
Jason

Posted: Fri Dec 31, 2004 9:39 am
by feyd

Code:

<?php

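	// Highlight every match of a wildcard phrase in $text with <b> tags.
	// In the phrase, ? stands for one letter and * for zero or more letters.
	function phraseHighlight($phrase, $text)
	{
		// Turn unescaped ? and * into letter classes, and escape @ (the delimiter used below).
		$convertFrom = array('#(?<!\\\\)\\?#', '#(?<!\\\\)\\*#', '#@#');
		$convertTo = array('[a-z]','[a-z]*?','\\@');
		
		$phrase = preg_replace($convertFrom, $convertTo, $phrase);
		
		// Collect the distinct matches (case-insensitive, on word boundaries).
		$matches = array();
		preg_match_all('@\\b' . $phrase . '\\b@i', $text, $matches);
		$matches = array_unique($matches[0]);
		// Wrap each distinct match in <b> tags.
		foreach($matches as $match)
			$text = preg_replace('@\\b' . preg_quote( $match, '@' ) . '\\b@', '<b>\\0</b>', $text);
		
		return $text;
	}

	echo phraseHighlight( 'Jo?n* Mil??er', 'Johnny Millner was here.' )."\n";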
	echo phraseHighlight( '*a*', 'The tall man walked down the street' )."\n";
	echo phraseHighlight( '*a*', '<a href="blahblah kitty">test have ten base</a>');

?>

Output:

<b>Johnny Millner</b> was here.
The <b>tall</b> <b>man</b> <b>walked</b> down the street
<<b>a</b> href="<b>blahblah</b> kitty">test <b>have</b> ten <b>base</b></<b>a</b>>
Note that having HTML in the text can royally screw it up. Also note that your original phrase doesn't match 'Johnny Miller' because of where the question marks are: 'Mil??er' needs exactly two letters between 'Mil' and 'er', and 'Miller' has only one.
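
One way around the HTML problem, as a rough sketch only: split the text into tags and plain text, and highlight just the plain text. The function names wildcardToRegex and highlightOutsideTags are made up for this example, and the tag pattern <[^>]*> is a simplifying assumption, not a real HTML parser (it will trip over a > inside an attribute value, comments, scripts and so on). It also does not support backslash-escaped wildcards.

```php
<?php

// Hypothetical helper: turn the wildcard phrase into a regex body.
function wildcardToRegex($phrase)
{
	// Quote everything first, then swap the quoted wildcards for letter classes.
	$quoted = preg_quote($phrase, '@');
	return strtr($quoted, array('\\?' => '[a-z]', '\\*' => '[a-z]*?'));
}

// Hypothetical helper: highlight matches only in the text between tags.
function highlightOutsideTags($phrase, $html)
{
	$pattern = '@\\b' . wildcardToRegex($phrase) . '\\b@i';

	// The capture group makes preg_split keep the tags in the result,
	// so even indexes hold text and odd indexes hold tags.
	$parts = preg_split('@(<[^>]*>)@', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

	foreach ($parts as $i => $part) {
		if ($i % 2 === 0) {
			$parts[$i] = preg_replace($pattern, '<b>$0</b>', $part);
		}
	}

	return implode('', $parts);
}

echo highlightOutsideTags('*a*', '<a href="blahblah kitty">test have ten base</a>'), "\n";
// <a href="blahblah kitty">test <b>have</b> ten <b>base</b></a>
```

This version also replaces in a single pass instead of collecting matches first, which avoids a later replacement touching the <b> tags added by an earlier one.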

Posted: Fri Dec 31, 2004 10:19 am
by jasongr
I get a strange PHP error when I try your code:

Code:

Warning: preg_replace() [function.preg-replace]: Compilation failed: missing ) at offset 8 in C:\www\test.php on line 5
where line 5 is the line:

Code:

$phrase = preg_replace($convertFrom, $convertTo, $phrase);
what could be the problem?

Posted: Fri Dec 31, 2004 10:25 am
by feyd
oops.. forgot to fix the backslashes.. fixed.

Posted: Fri Dec 31, 2004 10:28 am
by jasongr
what?
what do you mean by fixed?
where can I see the fix you mentioned?

Posted: Fri Dec 31, 2004 10:30 am
by feyd
I edited my previous post.

Posted: Fri Dec 31, 2004 10:33 am
by jasongr
thanks a lot for the help
I will now go look at some regular expression tutorials to understand how you solved it

thanks again

Posted: Fri Dec 31, 2004 10:53 am
by feyd
(?!...) (?=...) (?<!...) (?<=...) are lookahead (first 2) and lookbehind (last 2) assertions, or look-forwards and look-backs. They check the text around the current position for the contained pattern without making it part of the match. The ones with ! succeed only if the pattern is not there.
[] are metacharacters that denote a character class beginning and end, respectively. The a-z in them tells the pattern to match one letter from a through z.
* is a metacharacter that denotes a match against zero or more of the preceding character or grouping.
? is a metacharacter that denotes a match against zero or one of the preceding character or grouping.
*? is a metacharacter combination that tells the zero-or-more match to be as small as possible (a lazy match).
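
A small sketch of those pieces in action, if it helps. The sample strings and variable names are made up for illustration; only the first pattern comes from the function above.

```php
<?php

// Lookbehind: replace a ? only when no backslash comes right before it.
$unescaped = preg_replace('#(?<!\\\\)\\?#', '_', 'a?b\\?c');

// Character class with *: zero or more letters after "Jo".
preg_match('#Jo[a-z]*#', 'Johnny Miller', $zeroOrMore);

// ? as "zero or one": both spellings match.
preg_match_all('#colou?r#', 'color colour', $optional);

// Greedy * grabs as much as it can; lazy *? stops at the first chance.
preg_match('#<.*>#', '<b>tall</b>', $greedy);
preg_match('#<.*?>#', '<b>tall</b>', $lazy);

echo $unescaped, "\n";                 // a_b\?c
echo $zeroOrMore[0], "\n";             // Johnny
echo implode(',', $optional[0]), "\n"; // color,colour
echo $greedy[0], "\n";                 // <b>tall</b>
echo $lazy[0], "\n";                   // <b>
```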

Posted: Fri Dec 31, 2004 11:17 am
by jasongr
what if I wanted to modify the definition of the '?' character to mean
zero or 1 occurrences instead of a single occurrence as I initially requested?
what change do I need to make?

Posted: Sat Jan 01, 2005 12:38 am
by feyd

Code:

$convertTo = array('[a-z]?','[a-z]*?','\\@');
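
To see what that change does, here is a rough comparison of the two meanings of '?'. The helper wildcardPattern and its $optionalQuestionMark flag are invented for this sketch; the conversion arrays are the same ones used in phraseHighlight.

```php
<?php

// Hypothetical helper: build the search pattern with either meaning of '?'.
function wildcardPattern($phrase, $optionalQuestionMark)
{
	$convertFrom = array('#(?<!\\\\)\\?#', '#(?<!\\\\)\\*#', '#@#');
	// '[a-z]' = exactly one letter, '[a-z]?' = zero or one letter.
	$convertTo = array($optionalQuestionMark ? '[a-z]?' : '[a-z]', '[a-z]*?', '\\@');

	return '@\\b' . preg_replace($convertFrom, $convertTo, $phrase) . '\\b@i';
}

$sentence = 'Johnny Miller was here';

// Exactly one letter per ?: 'Mil??er' cannot match 'Miller'.
var_dump(preg_match(wildcardPattern('Jo?n* Mil??er', false), $sentence)); // int(0)

// Zero or one letter per ?: 'Johnny Miller' now matches.
var_dump(preg_match(wildcardPattern('Jo?n* Mil??er', true), $sentence));  // int(1)
```

So with the optional version the original phrase does match 'Johnny Miller', but it will also match shorter spellings such as 'Jon Miler'.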