find in file_get_contents

bimo
Forum Contributor
Posts: 100
Joined: Fri Apr 16, 2004 11:18 pm
Location: MD

Post by bimo »

Does 'nobr' or 'o' have any special meaning in regular expressions? I'm trying to use preg_match_all to pull out chunks that look like

Code: Select all

<p class=g>(<a href=http://...</font></nobr>
but every time the match gets to the 'o' in nobr it stops. (I'm building the pattern with a regular expression tool called Regex Coach, which shows what a given pattern will match in a given string.)
Here's my pattern so far:

Code: Select all

<p class=g>(<a href=http://[a-z1-9./?_=()&]*) onmousedown=[a-z1-9"' (),]*(>[a-z1-9./?_=()& -á-ú]*)


when I use it on a page like:
ousedown="return clk(this,'res',29)">iPodGeneration : Le podcasting est partout</a><font size=-1> - [ <a href=http://translate.google.com/translate?h ... D%26sa%3DN class=fl>Translate this page</a> ]</font><br><font size=-1><b>...</b> 27] Bonjour, Il ya des podcast francophones (au delà de la simple diffusion de fichiers<br>
musicaux): http://blog.saint-elie.com http://www. Voir <b>...</b>
<br><font color=#008000>www.ipodgeneration.com/fr/actu/805/ - 29k - </font><nobr> <a class=fl href="http://64.233.161.104/search?q=cache:hU ... >Cached</a> - <a class=fl href="/search?hl=en&lr=&q=related:www.ipodgeneration.com/fr/actu/805/">Si ... obr></font> <p class=g><a href=http://www.mesblogs.com/syndication.php3?id_syndic=114 onmousedown="return clk(this,'res',28)">mesblogs.com - Articles de Le blog à Ollie</a><font size=-1> - [ <a href=http://translate.google.com/translate?h ... D%26sa%3DN class=fl>Translate this page</a> ]</font><br><font size=-1><b>...</b> 4 février • Voeux chinois 2005 - 4 février • Interview - 4 février • WP 1.5 Gamma -<br>
4 février • Gouranga - 4 février • <b>Podcasteur</b>#7 - 3 février <b>...</b>
<br><font color=#008000>www.mesblogs.com/syndication.php3?id_syndic=114 - 41k - </font><nobr> <a class=fl href="http://64.233.161.104/search?q=cache:sE ... >Cached</a> - <a class=fl href="/search?hl=en&lr=&q=related:www.mesblogs.com/syndication.php3%3Fid_ ... obr></font>

<p class=g><a href=http://www.ipodgeneration.com/fr/actu/805/ onmousedown="return clk(this,'res',29)">iPodGeneration : Le podcasting est

It matches only the red part, but I want the match to run through to the </nobr>.

Does anyone know what the problem is?

Thanks
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Do you want to tell us which modifiers you are using with it?

The pattern you posted will not match the example you want to match. Modifiers inside a pattern only work under very specific circumstances, and your expression does not meet any of them.
bimo

Post by bimo »

I don't think I'm using any modifiers. If you mean things like PREG_PATTERN_ORDER, then I'm not using any.

The expression I showed above is not formatted for PHP yet. Had I put it into my PHP script, I would have done something like:

Code: Select all

$pattern = addslashes('#<p class=g>(<a href=http://[a-z1-9./?_=()&]*) onmousedown=[a-z1-9"' (),]*(>[a-z1-9./?_=()& -á-ú]*)#i');
(I want the expressions in the ()'s to be put into the array as two separate elements. I'm not positive that you can pull multiple patterns out of a target and get a 2-d array, but I could have sworn I read that you could.)
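
For reference, preg_match_all does fill a 2-d array when the pattern has several capture groups. The following is only a minimal sketch of that behaviour, not the pattern from this post: the sample markup, the example.com URLs, and the simplified pattern are all assumptions made up for illustration.

```php
<?php
// Hypothetical sample input, loosely modelled on the result markup quoted earlier.
$html = '<p class=g><a href=http://example.com/a onmousedown="x">First title</a>'
      . '<p class=g><a href=http://example.com/b onmousedown="y">Second title</a>';

// Simplified, assumed pattern with two capture groups: (1) the URL, (2) the link text.
$pattern = '#<p class=g><a href=(http://[^ >]+)[^>]*>([^<]*)</a>#i';

// PREG_SET_ORDER groups the captures per match:
// $matches[0] is the first match, $matches[0][1] its URL, $matches[0][2] its text.
$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

With the default PREG_PATTERN_ORDER flag the same data comes back transposed, so $matches[1] would hold every URL and $matches[2] every link text.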

I am writing this pattern because right now I am using four different ones (going from matching all links down to select links by weeding out the ones I don't want). Yesterday I did some reading and found out that the first of the four patterns was way too greedy, so now I'm going back to the beginning and writing it to be more "picky" from the start.
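
To make the "too greedy" point concrete, here is a small sketch comparing a greedy `.*` with its lazy form `.*?`. The input string and the example.com URLs are hypothetical; only the greedy pattern resembles the first of the four patterns in the code that follows.

```php
<?php
// Hypothetical input with two links on one line.
$html = '<a href="http://example.com/1">one</a> - <a href="http://example.com/2">two</a>';

// Greedy: .* runs to the last </a> on the line, so both links come back as one match.
preg_match_all('#<a .*</a>#i', $html, $greedy);

// Lazy: .*? stops at the first </a> it can reach, giving one match per link.
preg_match_all('#<a .*?</a>#i', $html, $lazy);

echo count($greedy[0]), " greedy match(es), ", count($lazy[0]), " lazy match(es)\n";
```

A negated character class such as `[^<]*` is another way to keep a match from running past the next tag, and it may be easier to reason about than a lazy quantifier.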

Here's the code I'm changing:

Code: Select all

<form method="get" action="pod_search5.php" name="pod_search">
	<input type="text" name="terms" id="terms" />
	<input type="hidden" name="target" id="target" value="http://www.google.com/search" />
	<input type="submit" value="find mph-cast" />
</form>

<?php
$search_terms = $_GET['terms'];
$target_engine = $_GET['target'];
$search_terms = str_replace(" ", "+", $search_terms);

$guy = web_search($search_terms, $target_engine);

function web_search($terms, $target) 
{

	if($terms) 
	{
		$query = array();
		
		$query = "$target?hl=en&num=100&lr=&q=$terms";
		print($query . "<br>");
		
		$result = file_get_contents($query);
		//print($result);
		
		// gets all anchor tags on page
		// preg_match(pattern, string, container)
		$pattern = addslashes('#<a .*</a*>#i');
		$pattern2 = addslashes('#(<a .*href="http://.*</a>)#i');
		$pattern3 = addslashes('#(&nbsp;&nbsp;&nbsp;&nbsp;<a (0|[a-z1-9= -_\/"'':?.+&])*)>#i');
		print("pattern" . $pattern3 . "<br>");
		$pattern4 = addslashes('#<a(0|[a-z1-9= -_\/"'':?.+&])*Translate this page</a>#i');
		preg_match_all($pattern, $result, $links);
		
		$pagelinks = array();
		$num = 0;
		
		for($i=0;$i<count($links[0]);$i++)
		{
			//print($links[0][$i]);
			//if(preg_grep($pattern2 ,$links))
			if(!strpos($links[0][$i], "http://")) continue;
			else {
				$temp = preg_replace($pattern3, '', $links[0][$i]);
				$temp2 = preg_replace('/<a.*Similar&nbsp;pages<\/a>.*<\/nobr>.*<\/font\>/i', '', $temp);
				//$temp3 = preg_replace($pattern4, '', $temp2);
				$pagelinks[$num] = preg_replace('/<a.*Cached.*<\/a>[^<]/i', '', $temp2);
				print("<br />" . $num . " " . $pagelinks[$num]); //($pagelinks[$num] . "<br />"); 
				$num++; 
			}
		}
		
		//print($links[1][1]);
	}
	else print("enter search term");
}
?>