Help with regex

Posted: Mon May 02, 2005 10:19 am
by mjseaden
Hi,

I'm reading an HTML document into $html using fread(). I know this is working, because I have checked that $html contains the code of the page. I also know it contains the string,

http://www.spanish-property-partnership.biz

that I am searching for.

I've tried using the following regex, but it doesn't seem to be matching:

Code: Select all

preg_match( '/www\.spanish-property-partnership\.biz/', $html, $matches );
$matches is empty.

Can anyone tell me where I'm going wrong with this regex?

Many thanks

Mark

Posted: Mon May 02, 2005 11:17 am
by mjseaden
Hi - I've tried the above regex with '#' instead of '/' as the delimiter, but it still doesn't detect the string.

Mark

Posted: Mon May 02, 2005 11:18 am
by Chris Corbyn
Try escaping your hyphens (you shouldn't need to, though).

Also, regex is case sensitive so you may need the "i" flag.
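
For what it's worth, here is a minimal sketch of where the "i" modifier goes: inside the pattern string, right after the closing delimiter, rather than as a separate argument (the fourth parameter of preg_match() takes integer flags such as PREG_OFFSET_CAPTURE). The sample $html string is made up for illustration.

```php
<?php
// Hypothetical sample input: the host name appears in mixed case.
$html = '<a href="http://WWW.Spanish-Property-Partnership.BIZ">link</a>';

// Without a modifier the pattern is case sensitive, so this finds nothing.
$caseSensitive = preg_match('/www\.spanish-property-partnership\.biz/', $html);

// The "i" modifier sits after the closing delimiter, inside the pattern string.
$caseInsensitive = preg_match('/www\.spanish-property-partnership\.biz/i', $html, $matches);

var_dump($caseSensitive);   // int(0)
var_dump($caseInsensitive); // int(1)
echo $matches[0], "\n";     // WWW.Spanish-Property-Partnership.BIZ
```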

I see nothing wrong with the regex itself. Perhaps you could show exactly how you're using it?

Posted: Mon May 02, 2005 11:35 am
by mjseaden
Hi,

It's spidering a URL stored in a MySQL database, like so:

Code: Select all

while( $row = mysql_fetch_array( $result ) )
{
    $fp = fopen( $row['TargettedLinkReciprocalURL'], 'r' );

    // Assume HTML won't be more than 30K, otherwise chances are there's too many links
    // on the page to be worth it anyway
    $html = fread( $fp, 30000 );

    if( preg_match( '/www\.spanish-property-partnership\.biz/', $html, $matches, 'i' ) == 0 )
    {
        // Remove link
        echo 'Link remove: '.$row['TargettedLinkURL'];
    }
    else
    {
        echo 'Link left alone - present';
    }
    
    fclose( $fp );
}
I know that $html is populated properly as I've echoed it. I've tried escaping the '-' but it still doesn't seem to work. I also know the whole page is read into $html, as it is less than 30K in size.

Mark

Posted: Mon May 02, 2005 12:34 pm
by Chris Corbyn
So what happens if you print_r($matches)?

Posted: Mon May 02, 2005 12:36 pm
by timvw
as we've mentioned more than enough: get that book on regular expressions....

btw, there is nothing wrong, because you didn't ask for a chunk to be matched in the regular expression.... so no matches are expected...

Posted: Mon May 02, 2005 1:58 pm
by mjseaden
Right, so does someone care to put my lazy, unlearning, forum-undisciplined ass to account and tell me what is wrong with this regex? I've been amazed over the last few days at the pompousness of some members of this forum in answer to some pretty basic queries - we've got the usual Forum/Totalitarian state mentality occurring here, which is a shame because this is one of the few forums where I haven't seen it before.

Mark

Posted: Mon May 02, 2005 3:57 pm
by Chris Corbyn
timvw.... just because he didn't specify a chunk to extract doesn't mean nothing is extracted (the zeroth index is the whole string)...

Example:

Code: Select all

$string = '1234567 Words 89';
preg_match('/[a-z]+/i', $string, $matches);

echo '<pre>';
print_r($matches);
echo '</pre>';
Outputs:

Code: Select all

Array (
    [0] => Words
)
mjseaden, what happened in the print_r()?

Posted: Mon May 02, 2005 4:31 pm
by shiznatix
why not just do...

Code: Select all

while($row = mysql_fetch_assoc($result))
{
    $fp = fopen( $row['TargettedLinkReciprocalURL'], 'r' );
 
    // Assume HTML won't be more than 30K, otherwise chances are there's too many links
    // on the page to be worth it anyway
    $html = fread( $fp, 30000 );
 
    if (!preg_match('/www\.spanish-property-partnership\.biz/', $html)) // no match found, so remove the link
    {
        // Remove link
        echo 'Link remove: '.$row['TargettedLinkURL'];
    }
    else // a match was found for that URL, so leave the link alone
    {
        echo 'Link left alone - present';
    }
    
    fclose( $fp );
}
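
A side note on the fread() call in the loop above, offered as something to rule out rather than a diagnosis: on a network stream, a single fread() may return fewer bytes than requested, so the 30000-byte assumption only holds if the read is repeated until the limit or end of file is reached. The sketch below uses a hypothetical helper, readUpTo(), which is not from this thread, and demonstrates it on an in-memory stream so it runs without network access.

```php
<?php
// Hypothetical helper (not from the thread): read at most $limit bytes,
// looping because one fread() on a network stream may return less than asked.
function readUpTo($fp, $limit)
{
    $data = '';
    while (!feof($fp) && strlen($data) < $limit) {
        $chunk = fread($fp, $limit - strlen($data));
        if ($chunk === false || $chunk === '') {
            break; // read error, or nothing more to read
        }
        $data .= $chunk;
    }
    return $data;
}

// Demo on an in-memory stream; a real run would pass the handle from fopen($url, 'r').
$fp = fopen('php://memory', 'r+');
fwrite($fp, str_repeat('x', 40000) . 'www.spanish-property-partnership.biz');
rewind($fp);

$html = readUpTo($fp, 30000);
fclose($fp);

echo strlen($html), "\n"; // 30000
```

Where available, stream_get_contents($fp, 30000) should do much the same job in one call.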

Posted: Mon May 02, 2005 5:03 pm
by timvw
d11wtq wrote: timvw.... just because he didn't specify a chunk to extract doesn't mean nothing is extracted (the zeroth index is the whole string)...

Code: Select all

$string = '1234567 Words 89';
preg_match('/[a-z]+/i', $string, $matches);
print_r($matches);
My mistake, sorry. This is even clearly mentioned in the PHP manual too... (At first glance, I couldn't find it in the PCRE manual...)
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
A bit further in that manual I also read:
preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1 time because preg_match() will stop searching after the first match.

So, if something is matched, preg_match() returns something != 0.
This explains why the == 0 branch is not executed.

But shiznatix noticed that already :)
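
To make the manual excerpts above concrete, here is a small sketch of what preg_match() returns and how $matches is filled. The sample text and the parenthesized subpattern are assumptions added for illustration; they are not part of the original code.

```php
<?php
// Hypothetical sample text containing the URL from this thread.
$html = 'See http://www.spanish-property-partnership.biz for details';

// One parenthesized subpattern, so $matches gets two entries on success.
$found = preg_match('/www\.([a-z-]+)\.biz/i', $html, $matches);

var_dump($found);       // int(1): the search stops after the first match
echo $matches[0], "\n"; // www.spanish-property-partnership.biz (full match)
echo $matches[1], "\n"; // spanish-property-partnership (first subpattern)

// No match: the return value is 0 and the matches array is empty.
$missing = preg_match('/www\.example\.org/i', $html, $noMatches);
var_dump($missing);          // int(0)
var_dump(count($noMatches)); // int(0)

// preg_match() can also return false on error, and false == 0 is true,
// so a strict comparison keeps "no match" apart from "something went wrong".
if ($missing === false) {
    echo "preg_match() failed\n";
} elseif ($missing === 0) {
    echo "no match\n";
}
```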