Help with regex

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

mjseaden
Forum Contributor
Posts: 458
Joined: Wed Mar 17, 2004 5:49 am

Post by mjseaden

Hi,

I'm reading an HTML document into $html using fread(). I know this works, because I have checked that $html contains the code of the page. I also know it contains the string I am searching for:

http://www.spanish-property-partnership.biz

I've tried using the following regex, but it doesn't seem to be matching:

Code:

preg_match( '/www\.spanish-property-partnership\.biz/', $html, $matches );
$matches is empty.

Can anyone tell me where I'm going wrong with this regex?

Many thanks

Mark
mjseaden
Forum Contributor
Posts: 458
Joined: Wed Mar 17, 2004 5:49 am

Post by mjseaden

Hi, I've tried the above regex with '#' as the delimiter instead of '/', but it still doesn't detect the string.

Mark
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn

Try escaping your hyphens (you shouldn't need to, though).

Also, regex matching is case-sensitive, so you may need the "i" modifier.

I see nothing wrong with the regex itself. Perhaps you could show exactly how you're using it?
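A minimal sketch of the two points above, using a made-up $html string rather than Mark's actual page. In PHP the "i" is a pattern modifier written after the closing delimiter; it is not passed to preg_match() as an argument, since preg_match()'s fourth parameter is $flags (for example PREG_OFFSET_CAPTURE). Outside a character class a hyphen is an ordinary literal character, so escaping it changes nothing.

```php
<?php
// Made-up sample text whose capitalisation differs from the pattern.
$html = '<a href="http://WWW.Spanish-Property-Partnership.BIZ/">Partner</a>';

// Case-sensitive: the lower-case pattern does not match the mixed-case text.
$caseSensitive = preg_match('/www\.spanish-property-partnership\.biz/', $html);

// The "i" modifier goes after the closing delimiter, not into the $flags argument.
$caseInsensitive = preg_match('/www\.spanish-property-partnership\.biz/i', $html, $matches);

// An escaped hyphen outside a character class means the same as a bare one.
$escapedHyphens = preg_match('/www\.spanish\-property\-partnership\.biz/i', $html);

var_dump($caseSensitive);   // int(0)
var_dump($caseInsensitive); // int(1)
var_dump($escapedHyphens);  // int(1)
echo $matches[0], "\n";     // WWW.Spanish-Property-Partnership.BIZ
```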
mjseaden
Forum Contributor
Posts: 458
Joined: Wed Mar 17, 2004 5:49 am

Post by mjseaden

Hi,

It's spidering a URL stored in a MySQL database, like so:

Code:

while( $row = mysql_fetch_array( $result ) )
{
    $fp = fopen( $row['TargettedLinkReciprocalURL'], 'r' );

    // Assume HTML won't be more than 30K, otherwise chances are there's too many links
    // on the page to be worth it anyway
    $html = fread( $fp, 30000 );

    if( preg_match( '/www\.spanish-property-partnership\.biz/', $html, $matches, 'i' ) == 0 )
    {
        // Remove link
        echo 'Link remove: '.$row['TargettedLinkURL'];
    }
    else
    {
        echo 'Link left alone - present';
    }
    
    fclose( $fp );
}
I know that $html is populated properly, because I've echoed it. I've tried escaping the '-' characters, but it still doesn't seem to work. I also know the whole page is read into $html, because it is less than 30K in size.

Mark
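One thing worth ruling out (nobody in the thread confirms it as the cause): the PHP manual notes that fread() on a network stream, such as an http:// URL opened with fopen(), can return as soon as a packet of data is available, so a single fread($fp, 30000) call may hand back only the start of the page even when the page is under 30K. A minimal sketch of reading up to a byte limit with stream_get_contents(), which keeps reading until it has the requested length or reaches the end of the stream. The helper name read_up_to() is hypothetical, and php://memory stands in for the remote page so the example runs offline.

```php
<?php
// Hypothetical helper: read at most $maxBytes from an open stream.
// stream_get_contents() keeps reading until it has $maxBytes bytes
// or hits end of stream, unlike a single fread() on a network stream.
function read_up_to($fp, $maxBytes)
{
    $data = stream_get_contents($fp, $maxBytes);
    return $data === false ? '' : $data;
}

// Stand-in for the remote page: the link sits near the end of the document.
$page = str_repeat('<p>filler</p>', 2000)
      . '<a href="http://www.spanish-property-partnership.biz">link</a>';

$fp = fopen('php://memory', 'r+');
fwrite($fp, $page);
rewind($fp);

$html = read_up_to($fp, 30000);
fclose($fp);

$found = preg_match('/www\.spanish-property-partnership\.biz/i', $html);
echo strlen($html), ' bytes read; link ', ($found === 1 ? 'present' : 'missing'), "\n";
```

In the spidering loop this would take the place of the fread() line, e.g. $html = read_up_to($fp, 30000);. fopen() also returns false when a URL cannot be opened, which is worth checking before reading.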
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn

So what happens if you print_r($matches)?
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw

As we've mentioned more than enough: get that book on regular expressions...

btw, there is nothing wrong, because you didn't ask for a chunk to be captured in the regular expression... so no matches are expected...
mjseaden
Forum Contributor
Posts: 458
Joined: Wed Mar 17, 2004 5:49 am

Post by mjseaden

Right, so does someone care to call my lazy, unlearned, forum-undisciplined ass to account and tell me what is wrong with this regex? I've been amazed over the last few days by the pompousness of some members of this forum in answer to some pretty basic queries. We've got the usual Forum/Totalitarian state mentality occurring here, which is a shame, because this is one of the few forums where I haven't seen it before.

Mark
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn

timvw... just because he didn't specify a chunk to extract doesn't mean nothing is extracted (index 0 holds the text matched by the whole pattern)...

Example:

Code:

$string = '1234567 Words 89';
preg_match('/[a-z]+/i', $string, $matches);

echo '<pre>';
print_r($matches);
echo '</pre>';
Outputs:

Code:

Array
(
    [0] => Words
)
mjseaden, what did the print_r() show?
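To make the point about index 0 concrete: $matches[0] always holds the text matched by the whole pattern, and each parenthesised (capturing) group adds another entry, $matches[1], $matches[2], and so on. A minimal sketch with a made-up subject string:

```php
<?php
$string = 'Visit http://www.spanish-property-partnership.biz today';

// No capturing group: only index 0 (the whole match) is filled.
preg_match('/www\.[a-z-]+\.biz/i', $string, $whole);

// One capturing group around the domain name: index 1 is added.
preg_match('/www\.([a-z-]+)\.biz/i', $string, $grouped);

print_r($whole);
print_r($grouped);
```

The hyphen at the end of the character class [a-z-] is taken literally, which is why the hyphenated domain name is matched in full.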
shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia

Post by shiznatix

Why not just do...

Code:

while($row = mysql_fetch_assoc($result))
{
    $fp = fopen( $row['TargettedLinkReciprocalURL'], 'r' );
 
    // Assume HTML won't be more than 30K, otherwise chances are there's too many links
    // on the page to be worth it anyway
    $html = fread( $fp, 30000 );
 
    if (!preg_match('/www\.spanish-property-partnership\.biz/', $html)) // no match found, so the link back is missing: remove link
    {
        // Remove link
        echo 'Link remove: '.$row['TargettedLinkURL'];
    }
    else // match found for that url, so leave the link alone
    {
        echo 'Link left alone - present';
    }
    
    fclose( $fp );
}
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw

d11wtq wrote: timvw.... just because he didn't specify a chunk to extract doesn't mean nothing is extracted (the zeroth index is the whole string)...

Code:

$string = '1234567 Words 89';
preg_match('/[a-z]+/i', $string, $matches);
print_r($matches);
My mistake, sorry. This is even clearly mentioned in the PHP manual... (at first glance, I couldn't find it in the PCRE manual...)
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
A bit further on in that manual, I also read:
preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1 time because preg_match() will stop searching after the first match

So, when something is matched, preg_match returns a value != 0.
This explains why the == 0 branch is not executed.

But shiznatix noticed that already :)
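One detail the quoted return-value description leaves out: the PHP manual also documents that preg_match() returns false when an error occurs, and in PHP false == 0 evaluates to true, so a loose "== 0" test treats an error exactly like "no match". A minimal sketch that separates the three outcomes with strict comparisons; the helper name match_status() and the sample strings are hypothetical, and the unbalanced pattern is there only to force an error.

```php
<?php
// Hypothetical helper: classify a preg_match() result without the loose "== 0"
// comparison, which cannot tell "no match" (0) apart from an error (false).
function match_status($pattern, $subject)
{
    // @ silences the warning PHP emits when the pattern fails to compile.
    $result = @preg_match($pattern, $subject);
    if ($result === false) {
        return 'error';
    }
    return $result === 1 ? 'match' : 'no match';
}

$html = '<a href="http://www.spanish-property-partnership.biz">link</a>';

echo match_status('/www\.spanish-property-partnership\.biz/', $html), "\n"; // match
echo match_status('/www\.example\.com/', $html), "\n";                       // no match
echo match_status('/(unclosed/', $html), "\n";                               // error

var_dump(false == 0); // bool(true): why "== 0" lumps an error in with "no match"
```

In the original loop, "=== 0" would be the strict equivalent of the "== 0" test, so an error would no longer be reported as a missing link.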