
Extract unknown text from a table?

Posted: Wed Nov 01, 2006 6:04 pm
by SmokyBarnable
I am trying to get customers' shipping addresses, which are contained in emails imported into my database. I managed to parse the email and create a table 'message' that contains all the text of the email. Now I am wondering how to extract just the shipping address and nothing else. Is this possible? The only idea I have had so far comes from the fact that the shipping address always appears under the line 'Buyer's shipping address'. Is there a way to pull a certain number of characters after that line to get the shipping address?

In other words, I don't know exactly what to extract, but I know exactly where it is.

Thanks for any help.

Posted: Wed Nov 01, 2006 7:06 pm
by feyd
strpos() + substr(), or preg_match().

Most addresses have a regular pattern that can be built into a regular expression.
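A minimal sketch of both suggestions, assuming (not stated in the thread) that the address starts on the line after the phrase "Buyer's shipping address" and ends at the first blank line or the end of the text. The helper names are hypothetical.

```php
<?php
// Hypothetical helpers. Assumed layout: marker line, then the address,
// then a blank line (or the end of the text).

function normalize_newlines($text) {
    // Emails often use "\r\n"; convert to "\n" so the searches below work.
    return str_replace("\r\n", "\n", $text);
}

// Approach 1: strpos() + substr()
function address_after_marker($text, $marker) {
    $text = normalize_newlines($text);
    $pos = strpos($text, $marker);
    if ($pos === false) {
        return null; // marker not present
    }
    // The address begins on the line after the marker.
    $eol = strpos($text, "\n", $pos);
    if ($eol === false) {
        return null; // nothing after the marker line
    }
    $start = $eol + 1;
    // It ends at the next blank line, or at the end of the text.
    $end = strpos($text, "\n\n", $start);
    if ($end === false) {
        $end = strlen($text);
    }
    // substr() takes a length, so pass $end - $start.
    return trim(substr($text, $start, $end - $start));
}

// Approach 2: preg_match() with a lazy capture group
function address_after_marker_regex($text, $marker) {
    $text = normalize_newlines($text);
    // Marker, rest of its line, then everything up to a blank line or end of text.
    $pattern = '/' . preg_quote($marker, '/') . '[^\n]*\n(.*?)(?:\n\n|\z)/s';
    if (preg_match($pattern, $text, $m)) {
        return trim($m[1]);
    }
    return null;
}
```

The strpos()/substr() version is easier to step through; the regex version is shorter and can be tightened later to match the address pattern itself.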

Posted: Thu Nov 02, 2006 6:55 am
by SmokyBarnable
The email text for the message is in an array. I assume I need to convert it to a string to use the functions you mentioned. Could I use serialize() to do this?

Thanks.

Posted: Thu Nov 02, 2006 7:21 am
by feyd
implode() may be a better choice.
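A small sketch of the implode() route. serialize() is meant for storing PHP values and wraps the lines in its own type/length format, so it is not a good fit for searching. This assumes the array elements still carry their line endings (as lines read with file() do); lines_to_text() is a hypothetical helper.

```php
<?php
// Hypothetical helper: strip each line's trailing "\r\n" and rejoin the
// lines with "\n", so every line boundary is still marked in the string.
function lines_to_text(array $lines) {
    $lines = array_map('rtrim', $lines); // remove trailing whitespace and line endings
    return implode("\n", $lines);
}
```

The glue matters: with implode('', ...) and lines that have no line ending, the last word of one line runs straight into the first word of the next.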

Posted: Thu Nov 02, 2006 10:21 pm
by SmokyBarnable
OK, I got my address into a string using implode() and used trim() to get rid of the surrounding whitespace. The string starts with a name, so I am trying to get the first name. I used strpos() to find the first space, and now I am trying to use substr() to get every character to the left of that space. Am I on the right track, or is there a better way?

Thanks.

Posted: Thu Nov 02, 2006 10:33 pm
by feyd
A bit of math should get you there.
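The "bit of math" can be sketched like this, assuming the function receives only the name line (not the whole address) and that words are separated by single spaces. split_name() is a hypothetical helper.

```php
<?php
// Hypothetical helper: first and second word of a name line.
function split_name($name_line) {
    $name_line = trim($name_line);

    // First word: everything to the left of the first space.
    $first_space = strpos($name_line, ' ');
    if ($first_space === false) {
        return array($name_line, ''); // one-word name
    }
    $first = substr($name_line, 0, $first_space);

    // Second word: starts one character after the first space and runs
    // to the next space or the end of the line. substr() takes a length,
    // not an end position, so the length is $end - $start.
    $start = $first_space + 1;
    $next_space = strpos($name_line, ' ', $start);
    $end = ($next_space === false) ? strlen($name_line) : $next_space;
    $second = substr($name_line, $start, $end - $start);

    return array($first, $second);
}
```

explode(' ', $name_line) yields the same words without the offset arithmetic. Either way, if the name line has been run together with the next address line, the second word picks up the start of that line (e.g. "Smith123").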

Posted: Fri Nov 03, 2006 4:28 am
by SmokyBarnable
My math was working for the first name, but it seems a little off when I try to start from the end of the first name and go to the next space just after the last name. When I look at the string in Zend, there doesn't seem to be a space between the last character of the first line and the first character of the second line. I then looked at the address as it appears in the email and noticed that the shipping address is three lines, with a <br /> after each line. Is this causing the lines to merge without a space? Here is the code I am using to parse the email message:


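function parse_email($email) {
    // Split the raw lines into header and message parts.
    // Marker lines are compared after rtrim() so the trailing space and
    // "\r\n" line ending don't have to match exactly.
    $header = array();
    $message = array();

    $is_header = true;
    foreach ($email as $line) {
        $marker = rtrim($line);
        if ($marker == '<HEADER>') continue;
        if ($marker == '<MESSAGE>') continue;
        if ($marker == '</MESSAGE>') continue;
        if ($marker == '</HEADER>') { $is_header = false; continue; }

        if ($is_header) {
            $header[] = $line;
        } else {
            $message[] = $line;
        }
    }

    // Parse headers into a key => value array
    $headers = array();
    $previous = null;
    foreach ($header as $line) {
        $colon_pos = strpos($line, ':');
        $space_pos = strpos($line, ' ');

        // No colon, or a space before the colon: a continuation line
        if ($colon_pos === false OR ($space_pos !== false AND $space_pos < $colon_pos)) {
            // attach to previous header value, if there is one
            if ($previous !== null) {
                $previous .= "\r\n" . $line;
            }
            continue;
        }

        // Key: everything before the colon
        $key = substr($line, 0, $colon_pos);

        // Value: everything after ": " (colon plus one space)
        $value = substr($line, $colon_pos + 2);
        $headers[$key] = $value;

        $previous =& $headers[$key];
    }

    // Assumed ending (the posted code stops here): return both parts
    return array($headers, $message);
}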
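One plausible explanation, not confirmed in the thread, is that whatever separated the lines (the `<br />` tag or the line ending) was lost before the string was searched; strip_tags(), for example, removes the tag and leaves nothing in its place. A sketch that splits the address block into lines first, assuming lines are separated by `<br>`-style tags and/or line breaks; address_lines() is a hypothetical helper.

```php
<?php
// Hypothetical helper: split an address block into trimmed, non-empty lines.
function address_lines($block) {
    // Split on <br>, <br/> or <br /> (any case) and on "\r\n" or "\n".
    $parts = preg_split('/<br\s*\/?>|\r?\n/i', $block);
    // Trim each piece and drop the empty strings left between adjacent separators.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, 'strlen'));
}
```

With the lines separated, the name line can be handled on its own with strpos()/substr(), and nothing from the street line can leak into it.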

Posted: Fri Nov 03, 2006 1:26 pm
by feyd
It would appear your data is in an XML-style format. It may be easier and more reliable to use one of the XML facilities PHP offers:

http://php.net/ref.simplexml
http://php.net/ref.dom
http://php.net/ref.domxml
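A rough sketch of the SimpleXML route, assuming the stored text is well-formed XML once it is wrapped in a single root element. The `<email>` root and load_email_sections() are hypothetical, not part of the original data.

```php
<?php
// Hypothetical helper: read the <HEADER> and <MESSAGE> sections with SimpleXML.
function load_email_sections($raw) {
    // Collect libxml parse errors instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string('<email>' . $raw . '</email>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return null; // not well-formed, e.g. an unescaped "&" in the message
    }
    return array(
        'header'  => trim((string) $xml->HEADER),
        'message' => trim((string) $xml->MESSAGE),
    );
}
```

Any `<br />` inside MESSAGE is parsed as a child element rather than as text, so it will not appear in the returned string.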