help with regular expression would be welcomed

jasongr
Forum Contributor
Posts: 206
Joined: Tue Jul 27, 2004 6:19 am

Post by jasongr »

Hello

I am quite new to regular expressions, and I am faced with a challenge that I feel could be solved quite easily with one.
I would appreciate it if someone could help me with the solution and, if possible, explain how he/she solved it so I can pick up some regular expression knowledge.

Here is the problem:
I am given a PHP string like so:
$phrase = 'Jo?n* Mil??er'
Where * means zero or more occurrences of any letter (except whitespace)
and ? means a single occurrence of any letter (except whitespace)
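One way to turn this wildcard syntax into a regular expression is sketched below. It assumes "letter" means a-z (the `i` modifier also allows A-Z), and `wildcardToRegex()` is a hypothetical helper, not something from this thread. It escapes the whole phrase with `preg_quote()` first, so other regex metacharacters such as `.` stay literal, and then swaps the escaped wildcards back in with `strtr()`.

```php
<?php
// Hypothetical helper (not from the thread): convert a wildcard phrase into
// a PCRE pattern. Assumption: '*' = zero or more letters, '?' = exactly one
// letter, and "letter" means a-z (the 'i' modifier also allows A-Z).
function wildcardToRegex(string $phrase): string
{
    // Escape every regex metacharacter plus the '/' delimiter, so characters
    // such as '.' stay literal. '*' becomes '\*' and '?' becomes '\?'.
    $quoted = preg_quote($phrase, '/');

    // Swap the escaped wildcards for character-class fragments.
    // strtr() with an array never re-scans text it has already replaced.
    $regex = strtr($quoted, array(
        '\*' => '[a-z]*',
        '\?' => '[a-z]',
    ));

    // \b on both ends keeps the match to whole words.
    return '/\b' . $regex . '\b/i';
}

echo wildcardToRegex('Jo?n* Mil??er'), "\n";
// /\bJo[a-z]n[a-z]* Mil[a-z][a-z]er\b/i

echo preg_replace(wildcardToRegex('*a*'), '<b>$0</b>',
    'The tall man walked down the street'), "\n";
// The <b>tall</b> <b>man</b> <b>walked</b> down the street
```

In the replacement string, `$0` stands for the whole match, so each hit is wrapped in `<b>...</b>` by a single `preg_replace()` call. If "any letter except whitespace" should really mean any non-whitespace character, `[a-z]` could be swapped for `\S`.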

I then get another sentence (in another PHP string) in which the phrase was found. For example: 'Johnny Miller was here'
I need to use regular expressions to highlight the phrase in the sentence (using the <b> tag).

So in the example above, the highlighted expression will be:
'<b>Johnny Miller</b> was here'

Here is another example:
$phrase = '*a*'
$sentence = 'The tall man walked down the street'
The highlighted sentence will be:
'The <b>tall</b> <b>man</b> <b>walked</b> down the street'
which, rendered as HTML, shows tall, man and walked in bold.
I would appreciate any help
regards
Jason
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Code:

<?php

	function phraseHighlight($phrase, $text)
	{
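		// Wildcard -> regex: an unescaped ? becomes exactly one letter, an
		// unescaped * becomes zero or more letters (lazy), and @ is escaped
		// because it is the delimiter of the patterns built below.
		$convertFrom = array('#(?<!\\\\)\\?#', '#(?<!\\\\)\\*#', '#@#');
		$convertTo = array('[a-z]','[a-z]*?','\\@');
		
		$phrase = preg_replace($convertFrom, $convertTo, $phrase);
		
		// Collect each distinct whole-word match (case-insensitive).
		$matches = array();
		preg_match_all('@\b' . $phrase . '\b@i', $text, $matches);
		$matches = array_unique($matches[0]);
		// Wrap every occurrence of each match in <b>...</b> (\0 = whole match).
		foreach($matches as $match)
			$text = preg_replace('@\b' . preg_quote( $match, '@' ) . '\b@', '<b>\\0</b>', $text);
		
		return $text;
	}

	echo phraseHighlight( 'Jo?n* Mil??er', 'Johnny Millner was here.' )."\n";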
	echo phraseHighlight( '*a*', 'The tall man walked down the street' )."\n";
	echo phraseHighlight( '*a*', '<a href="blahblah kitty">test have ten base</a>');

?>

Output:

<b>Johnny Millner</b> was here.
The <b>tall</b> <b>man</b> <b>walked</b> down the street
<<b>a</b> href="<b>blahblah</b> kitty">test <b>have</b> ten <b>base</b></<b>a</b>>
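Note that having HTML in the text can royally screw it up, as the last line of output shows: the tag name a and the attribute value blahblah get highlighted too. Also note that your original phrase doesn't match 'Johnny Miller' because of where the question marks are: 'Mil??er' needs exactly two letters between 'Mil' and 'er', which fits 'Millner' but not 'Miller'.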
Last edited by feyd on Fri Dec 31, 2004 10:26 am, edited 1 time in total.
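A common workaround for the HTML problem is to split the text into tags and plain-text pieces and highlight only the plain text. The sketch below does that with `preg_split()` and `PREG_SPLIT_DELIM_CAPTURE`; `highlightOutsideTags()` is a hypothetical helper, and it assumes simple markup (no `>` inside attribute values). It takes an already-built pattern, here the one the conversion above produces for `*a*`.

```php
<?php
// Hypothetical helper: highlight matches of $pattern only in the text
// between HTML tags, leaving the tags themselves untouched.
// Assumption: tags are simple, i.e. no '>' inside attribute values.
function highlightOutsideTags(string $pattern, string $html): string
{
    // The capturing group makes preg_split keep the tags in the result,
    // so even indexes hold text and odd indexes hold tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // $0 is the whole match, so each hit is wrapped in <b>...</b>.
            $parts[$i] = preg_replace($pattern, '<b>$0</b>', $part);
        }
    }
    return implode('', $parts);
}

// The pattern the wildcard conversion builds for the phrase '*a*'.
$pattern = '@\b[a-z]*?a[a-z]*?\b@i';
echo highlightOutsideTags($pattern, '<a href="blahblah kitty">test have ten base</a>'), "\n";
// <a href="blahblah kitty">test <b>have</b> ten <b>base</b></a>
```

Because `preg_replace()` already replaces every match, this variant also does without the `array_unique()` loop. It is not a real HTML parser; for arbitrary markup, a DOM-based approach would be more robust.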

Post by jasongr »

I get a strange PHP error when I try your code:

Code:

Warning: preg_replace() [function.preg-replace]: Compilation failed: missing ) at offset 8 in C:\www\test.php on line 5
where line 5 is the line:

Code:

$phrase = preg_replace($convertFrom, $convertTo, $phrase);
what could be the problem?

Post by feyd »

Oops, I forgot to fix the backslashes. Fixed.
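The backslash problem is easy to reproduce. In a PHP single-quoted string, `\\` collapses to a single backslash before PCRE ever sees the pattern, so a regex that needs a literal backslash has to be written with four. The `$tooFew` string below is an assumption about what the first version looked like, chosen because it triggers the same "missing ) at offset 8" error; `$correct` escapes both levels properly.

```php
<?php
// Assumed reconstruction of the broken pattern: only one level of escaping.
// PHP turns '\\' into '\', so PCRE receives (?<!\)\? and the lookbehind's
// closing parenthesis is escaped away (hence "missing )" at offset 8).
$tooFew = '#(?<!\\)\?#';
// Correct: '\\\\' reaches PCRE as '\\' (a literal backslash)
// and '\\?' reaches it as '\?' (a literal question mark).
$correct = '#(?<!\\\\)\\?#';

echo $tooFew, "\n";   // #(?<!\)\?#
echo $correct, "\n";  // #(?<!\\)\?#

// preg_match() returns false when the pattern fails to compile;
// @ suppresses the warning so the result can be inspected.
var_dump(@preg_match($tooFew, 'a?b'));   // bool(false)
var_dump(preg_match($correct, 'a?b'));   // int(1)
var_dump(preg_match($correct, 'a\?b'));  // int(0), the ? is escaped
```

Echoing a pattern string like this is a quick way to see exactly what the regex engine receives.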

Post by jasongr »

what?
what do you mean by fixed?
where can I see the fix you mentioned?

Post by feyd »

I edited my previous post.

Post by jasongr »

thanks a lot for the help
I will now go look at some regular expression tutorials to understand how you solved it

thanks again

Post by feyd »

(?!...) and (?=...) are negative and positive lookaheads; (?<!...) and (?<=...) are negative and positive lookbehinds. They check the text just after (lookahead) or just before (lookbehind) the current position for the contained pattern without consuming any characters. The negative forms succeed only if the pattern is NOT there, so (?<!\\) means "not preceded by a backslash".
[ and ] are metacharacters that mark the beginning and end of a character class. The a-z inside tells the pattern to match any one letter from a through z.
* is a metacharacter that matches zero or more of the preceding character or group.
? is a metacharacter that matches zero or one of the preceding character or group.
*? is the lazy (non-greedy) form of *: it still matches zero or more, but as few as possible.
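A few one-line checks of the constructs above, as a quick sketch; the subject strings are made up for illustration.

```php
<?php
// Greedy vs. lazy: .+ grabs as much as it can, .+? as little as it can.
preg_match('/<.+>/', '<b>bold</b>', $greedy);
preg_match('/<.+?>/', '<b>bold</b>', $lazy);
echo $greedy[0], "\n"; // <b>bold</b>
echo $lazy[0], "\n";   // <b>

// ? after a token: zero or one of it.
var_dump(preg_match('/^colou?r$/', 'color')); // int(1)

// Character class: each [a-z] matches exactly one letter from a to z.
var_dump(preg_match('/^Mil[a-z][a-z]er$/', 'Millner')); // int(1)
var_dump(preg_match('/^Mil[a-z][a-z]er$/', 'Miller'));  // int(0)

// Negative lookbehind: replace '?' only when no backslash precedes it.
// The lookbehind inspects the character before '?' without consuming it.
echo preg_replace('/(?<!\\\\)\\?/', 'X', 'a?b\\?c'), "\n"; // aXb\?c
```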

Post by jasongr »

what if I wanted to modify the definition of the '?' character to mean
zero or one occurrence instead of a single occurrence, as I initially requested?
what change do I need to make?

Post by feyd »

Code:

$convertTo = array('[a-z]?','[a-z]*?','\\@');
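To see what that change does, here is a small check that reuses the conversion arrays with the new `$convertTo`; the word list is made up for illustration. With `[a-z]?`, 'Mil??er' accepts zero, one or two letters between 'Mil' and 'er', so 'Jo?n* Mil??er' now also matches 'Johnny Miller' from the original question.

```php
<?php
// The wildcard conversion with '?' mapped to [a-z]? (zero or one letter)
// instead of [a-z] (exactly one letter).
$convertFrom = array('#(?<!\\\\)\\?#', '#(?<!\\\\)\\*#', '#@#');
$convertTo   = array('[a-z]?', '[a-z]*?', '\\@');

$regex = preg_replace($convertFrom, $convertTo, 'Mil??er');
echo $regex, "\n"; // Mil[a-z]?[a-z]?er

foreach (array('Miler', 'Miller', 'Millner', 'Millaner') as $word) {
    // Same \b...\b and i modifier as in phraseHighlight().
    $hit = preg_match('@\b' . $regex . '\b@i', $word);
    echo $word, ': ', $hit ? 'match' : 'no match', "\n";
}
// Miler: match
// Miller: match
// Millner: match
// Millaner: no match
```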