Regular expression turns invalid once it enters PHP

dahwan
Forum Newbie
Posts: 3
Joined: Thu Aug 20, 2009 12:10 pm

Post by dahwan »

Code:

<?php
    $link = $_POST["link"];
    $foo = file($link);
    $data = "";
 
    $data = preg_grep("[\w-.]+?@([\w-]+?\.)+[\w]{2,}", $foo);
    
    print_r($data)
?>
This script is meant to extract the email addresses from any page. The expression is valid; I have tested it in several regex testers, for instance http://gskinner.com/RegExr/

But when I try to use it in PHP I get this error message:
Warning: preg_grep() [function.preg-grep]: Unknown modifier '+' in /home/dahwan/public_html/emailextractor/emailextractor.php on line 6
Am I using it wrong?

Any help appreciated.
jackpf
DevNet Resident
Posts: 2119
Joined: Sun Feb 15, 2009 7:22 pm
Location: Ipswich, UK

Re: Regular expression turns invalid once it enters PHP

Post by jackpf »

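With the PCRE functions in PHP, the pattern has to be wrapped in delimiters: the same non-alphanumeric, non-backslash, non-whitespace character at the start and at the end, such as /. Bracket pairs also count as delimiters, so your pattern was read as [\w-.] followed by modifiers, and the + right after the ] is what triggered "Unknown modifier '+'".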

Like:

Code:

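<?php
    $link = $_POST["link"];
    $foo = file($link);

    // The pattern is now wrapped in / delimiters. The hyphen is moved to the
    // end of the class because newer PCRE versions can reject [\w-.] as a range.
    $data = preg_grep('/[\w.-]+?@([\w-]+?\.)+[\w]{2,}/', $foo);

    print_r($data);
?>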
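
The choice of delimiter is up to you. Here is a minimal sketch of the same pattern with three different delimiters; the sample line and the variable names are made up for illustration, and the character class is written as [\w.-] on the assumption that your PHP build may not accept [\w-.].

```php
<?php
// Hypothetical sample line; not taken from the page in question.
$line = 'Contact: someone@example.com';

// The same pattern with three different delimiter characters.
$patterns = [
    '/[\w.-]+?@([\w-]+?\.)+[\w]{2,}/',
    '#[\w.-]+?@([\w-]+?\.)+[\w]{2,}#',
    '~[\w.-]+?@([\w-]+?\.)+[\w]{2,}~',
];

$results = [];
foreach ($patterns as $pattern) {
    // $m[0] receives the full text matched by the pattern.
    if (preg_match($pattern, $line, $m)) {
        $results[] = $m[0];
    }
}

print_r($results);
```

A delimiter other than / is mostly useful when the pattern itself contains slashes, since those would otherwise need escaping.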
dahwan
Forum Newbie
Posts: 3
Joined: Thu Aug 20, 2009 12:10 pm

Re: Regular expression turns invalid once it enters PHP

Post by dahwan »

Thanks for the quick reply. At least PHP doesn't crash now :P But I'm getting unexpected results. When I tried this in the regex tester, I found all the emails perfectly. But in PHP it seems to grab the whole line. This is the result I got:
<td width="50">&nbsp;<a href="mailto:fjernlaan-nbo@nb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:innlaan@nb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:musikk-oslo@nb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:samkat@nb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:dag.t.henriksen@uis.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:filmbibliotek@nb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:depot-fjernlaan@nb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:bib-hald@hiof.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:bib.krsund@himolde.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:bib-sarp@hiof.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:bib-fred@hiof.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:bib-figur@hiof.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:post@ostfoldforskning.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:firmapost@ij.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:biblioteket@so-hf.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:info@frambu.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:len@sormarka.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:biblioteket@umb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:karin.lyngmo@umb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:library.noragric@umb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:bibliotek@skogoglandskap.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:anne.ombustvedt@umb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:liv.korslund@umb.no">E-post</a>&nbsp;</td>
<br /><td width="50">&nbsp;<a href="mailto:plantehelse.bibl@bioforsk.no">E-post</a>&nbsp;</td>
<br />
I was expecting a list with the emails only. Could anyone shed some light on this?

Thanks

EDIT: On second thought, preg_grep probably just returns the array elements that contain a match. Any idea how I can cut the extra HTML out of there?
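
To make that concrete, here is a small sketch of the difference as I understand it: preg_grep() filters the array and keeps whole entries, while preg_match() hands back the matched text itself. The $lines array and the example.com addresses are made-up stand-ins for the real page.

```php
<?php
// Hypothetical stand-in for the array returned by file().
$lines = [
    '<td><a href="mailto:first@example.com">E-post</a></td>',
    '<td>No address here</td>',
    '<td><a href="mailto:second@example.org">E-post</a></td>',
];

$pattern = '/[\w.-]+?@([\w-]+?\.)+[\w]{2,}/';

// preg_grep() returns the entries that contain a match, keys preserved.
$matchingLines = preg_grep($pattern, $lines);

// preg_match() puts the matched text itself into $m[0].
$emails = [];
foreach ($matchingLines as $line) {
    if (preg_match($pattern, $line, $m)) {
        $emails[] = $m[0];
    }
}

print_r(array_keys($matchingLines));
print_r($emails);
```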
jackpf
DevNet Resident
Posts: 2119
Joined: Sun Feb 15, 2009 7:22 pm
Location: Ipswich, UK

Re: Regular expression turns invalid once it enters PHP

Post by jackpf »

Well, normally strip_tags(), but that would strip out the link, which includes the email address, which is what I presume you want to keep...

What does this output?

Code:

// $data is the array from preg_grep(), so join it into one string first.
preg_match_all('/<a href="mailto:(.*?)"/', implode("\n", $data), $matches);
 
print_r($matches);
dahwan
Forum Newbie
Posts: 3
Joined: Thu Aug 20, 2009 12:10 pm

Re: Regular expression turns invalid once it enters PHP

Post by dahwan »

Actually I got it working.

Code:

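<?php
    $link = $_POST["link"];
    $foo = file($link);
    // Hyphen placed last in the class; newer PCRE versions can reject [\w-.].
    $pattern = '/[\w.-]+?@([\w-]+?\.)+[\w]{2,}/';
 
    // Keep only the lines that contain an address.
    $data = preg_grep($pattern, $foo);
    
    $formattedstring = "";
    
    foreach($data as $piece)
    {
        preg_match_all($pattern, $piece, $matches);
        
        // $matches[0][0] is the first full match on this line.
        $formattedstring .= $matches[0][0] . "<br />\n";
    }
    
    echo $formattedstring;
?>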
I know it's a little messy, and it won't work if there are several email addresses per line, but I'll cross that bridge when I come to it, and return to this forum. Thanks for the priceless help!
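
If several addresses per line do turn up, one option is to skip preg_grep() and run preg_match_all() over the whole page as a single string. This is only a sketch: the extractEmails() helper name and the sample HTML are hypothetical, and the pattern is the one from this thread with the hyphen moved to the end of the character class.

```php
<?php
/**
 * Return every address-like substring found in $html, without duplicates.
 * extractEmails() is a hypothetical helper, not a built-in PHP function.
 */
function extractEmails(string $html): array
{
    $pattern = '/[\w.-]+?@([\w-]+?\.)+[\w]{2,}/';

    // $matches[0] holds every full match found in the string.
    preg_match_all($pattern, $html, $matches);

    // array_unique() keeps the first occurrence; array_values() reindexes.
    return array_values(array_unique($matches[0]));
}

// Made-up sample: two addresses on one line, one of them repeated later.
$html = '<a href="mailto:a.b@example.com">A</a> <a href="mailto:c-d@example.org">C</a>' . "\n"
      . '<a href="mailto:a.b@example.com">A again</a>';

$emails = extractEmails($html);
echo implode("<br />\n", $emails), "\n";
```

For a real page, file_get_contents($link) returns the content as one string, which could be passed straight to the helper instead of the array that file() produces.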
jackpf
DevNet Resident
Posts: 2119
Joined: Sun Feb 15, 2009 7:22 pm
Location: Ipswich, UK

Re: Regular expression turns invalid once it enters PHP

Post by jackpf »

Cool, no problem.