Help extracting email address from .html file

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

talfstad
Forum Newbie
Posts: 2
Joined: Mon Mar 02, 2009 2:00 pm

Post by talfstad »

I am trying to create a PHP script that reads in a file (.html) and then echoes only the email addresses to the screen.

Here's an example of what the .html file looks like:
**********************************************************

<tr valign="top">
                      <td>African Student Drama Association </td>
 
                      <td>Through the common interest of art, foster unity among students and scholars at SDSU. </td>
                      <td>Adeyinka Glover </td>
                      <td>afdeyinkaglover2005@yahoo.com</td>
      </tr>
                    <tr valign="top">
                      <td><span style="font-family:times new roman;font-size:16px;">Air Force ROTC, Detachment 075 Honor Guard "The Nighthawks"</span> </td>
*********************************************************

Ideally, I want to echo only "afdeyinkaglover2005@yahoo.com" back onto the screen.

Here is the code I've created:
***********************************************************

<?php
$file = "./test2.txt";
$handle = @fopen($file, "r");
$reg = '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/';
 
if ($handle) 
{
    
    while (!feof($handle)) {
        $buffer = fgetss($handle,4096); 
    }
    
    
if(preg_match_all($reg, $buffer, $matches)) {   
    foreach( $matches as $val => $i) {
            echo $val[$i];          
    }
                        
        } else {
            echo "no emails in file";
        }   
    
    fclose($handle);
}
?>
**************************************************
This code returns "no emails in file". I am new to PHP, though not to coding, and I feel a little lost. Can anyone please help?


Thank you
atonalpanic
Forum Commoner
Posts: 29
Joined: Mon Mar 02, 2009 10:20 pm

Re: Help extracting email address from .html file

Post by atonalpanic »

Hello,

Try this version of your regex, which should match the example you gave:
$regex = "/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b/";

I've added a-z ranges because PCRE patterns are case-sensitive by default. Alternatively, you could keep your original pattern and add the i (case-insensitive) modifier:

$regex = "/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i";
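To see why the modifier matters, here is a minimal sketch that runs both forms of the pattern against one cell from the sample HTML. The variable names are just for illustration; the patterns and the sample address come from the posts above.

```php
<?php
// One table cell copied from the sample HTML in the question.
$line = '<td>afdeyinkaglover2005@yahoo.com</td>';

// Original pattern: upper-case ranges only, no modifier.
$caseSensitive = '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/';

// Same pattern with the i modifier, so A-Z also matches a-z.
$caseInsensitive = '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i';

// preg_match() returns 1 on a match and 0 when nothing matches.
$withoutFlag = preg_match($caseSensitive, $line, $m1);
$withFlag    = preg_match($caseInsensitive, $line, $m2);

echo $withoutFlag, PHP_EOL; // 0: the lower-case address is not matched
echo $withFlag, PHP_EOL;    // 1
echo $m2[0], PHP_EOL;       // afdeyinkaglover2005@yahoo.com
```

Either fix (explicit a-z ranges or the i modifier) gives the same result on this input.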

If you want a complete solution, I recommend something along the lines of the regex in this article (I think this is the correct one):
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
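The regex is probably not the only thing to change in the posted script. Its while loop overwrites $buffer on every pass, so only the last line read gets searched, and the foreach indexes $matches incorrectly (the full matches are in $matches[0]). Also, fgetss() was removed in PHP 8.0. Below is a rough sketch of one way to put it together. It assumes it is fine to search the raw HTML without stripping tags first, and extractEmails() is a made-up helper name, not anything from the original code.

```php
<?php
// Hypothetical helper (not from the thread): return each distinct
// e-mail-like string found in $html.
function extractEmails(string $html): array
{
    // Pattern from the reply above, with the i modifier.
    $regex = '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i';

    // Search the whole document at once; $matches[0] holds the full matches.
    preg_match_all($regex, $html, $matches);

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($matches[0]));
}

// Sample rows copied from the question. In the real script this would be
// read from disk, e.g. file_get_contents('./test2.txt').
$html = <<<'HTML'
<tr valign="top">
  <td>African Student Drama Association </td>
  <td>Adeyinka Glover </td>
  <td>afdeyinkaglover2005@yahoo.com</td>
</tr>
HTML;

$emails = extractEmails($html);

if ($emails) {
    foreach ($emails as $email) {
        echo $email, PHP_EOL;
    }
} else {
    echo 'no emails in file', PHP_EOL;
}
```

Reading the file in one go with file_get_contents() avoids the overwritten-buffer problem entirely, which is why the sketch does not loop over lines.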
talfstad
Forum Newbie
Posts: 2
Joined: Mon Mar 02, 2009 2:00 pm

Re: Help extracting email address from .html file

Post by talfstad »

Thanks! I'm new to regex syntax and also to PHP. I ended up getting this to work well enough for my application. You can see my code at:

http://www.phpfreaks.com/forums/index.p ... 204.0.html

I appreciate the help.


Trevor