Extract unknown text from a table?
Posted: Wed Nov 01, 2006 6:04 pm
by SmokyBarnable
I am trying to get customers' shipping addresses, which are contained in emails imported into my database. I managed to parse each email and create a table 'message' that contains all the text of the email. So now I am wondering how to extract just the shipping address and nothing else. Is this possible? The only idea I have had so far is that the shipping address always appears under the line 'Buyer's shipping address'. Is there a way to pull a fixed number of characters after that line to get the shipping address?
In other words I don't know exactly what to extract but I know exactly where it is.
Thanks for any help.
Posted: Wed Nov 01, 2006 7:06 pm
by feyd
strpos() + substr(), or preg_match().
Most addresses have a regular pattern that can be built into a regular expression.
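As a rough sketch of both suggestions: the sample message, the function names address_after_marker() and address_by_regex(), the 200-character window, and the idea that a blank line ends the address are all assumptions for illustration, not details from the actual emails.

```php
<?php
// Hypothetical sample; the real text would come from the 'message' table.
$message = "Thanks for your order.\r\n"
         . "Buyer's shipping address\r\n"
         . "Jane Doe\r\n"
         . "123 Main St\r\n"
         . "Springfield, IL 62701\r\n"
         . "\r\n"
         . "Payment details follow.\r\n";

$marker = "Buyer's shipping address";

// Approach 1: strpos() finds the marker, substr() takes what follows it.
function address_after_marker($text, $marker, $length = 200)
{
    $pos = strpos($text, $marker);
    if ($pos === false) {
        return null; // marker not present in this message
    }
    // Start right after the marker and take a fixed-size window.
    $chunk = substr($text, $pos + strlen($marker), $length);
    // Assumption: the first blank line ends the address block.
    $parts = preg_split('/\r?\n\s*\r?\n/', ltrim($chunk), 2);
    return trim($parts[0]);
}

// Approach 2: one preg_match() that captures everything between the
// marker and the next blank line (or the end of the text).
function address_by_regex($text, $marker)
{
    $pattern = '/' . preg_quote($marker, '/') . '\s*(.+?)(?:\r?\n\s*\r?\n|\z)/s';
    if (preg_match($pattern, $text, $m)) {
        return trim($m[1]);
    }
    return null;
}

$a = address_after_marker($message, $marker);
$b = address_by_regex($message, $marker);
```

With this sample, both calls should give the same three address lines. The strict `=== false` check matters because strpos() returns 0 when the marker sits at the very start of the text.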
Posted: Thu Nov 02, 2006 6:55 am
by SmokyBarnable
The email text for the message is in an array. I assume I need to convert to a string to use the functions you mentioned. Could I use the serialize command to do this?
Thanks.
Posted: Thu Nov 02, 2006 7:21 am
by feyd
implode() may be a better choice.
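A small sketch of the difference, using a made-up array of lines (the contents are hypothetical):

```php
<?php
// Hypothetical lines, as they might sit in the message array.
$lines = array("Buyer's shipping address\r\n", "Jane Doe\r\n", "123 Main St\r\n");

// implode() joins the elements into one plain string that
// strpos(), substr() and preg_match() can search directly.
$text = implode('', $lines);

// serialize() produces PHP's storage format instead: type and length
// markers are wrapped around every element, which gets in the way of
// searching. It is meant to be reversed later with unserialize().
$stored = serialize($lines);
$restored = unserialize($stored);
```

Here $text is just the three lines run together, while $stored begins with `a:3:{`, the marker for a three-element array.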
Posted: Thu Nov 02, 2006 10:21 pm
by SmokyBarnable
OK, I got my address into a string using implode() and used trim() to get rid of whitespace. The string starts with a name, so I am trying to get the first name. I used strpos() to find the first space. Now I am trying to use substr() to get every character to the left of that space. Am I on the right track, or is there a better way?
Thanks.
Posted: Thu Nov 02, 2006 10:33 pm
by feyd
A bit of math should get you there.
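Roughly, the offsets work out as below. This is only a sketch: the address string is invented, and it assumes the name is "First Last" on the first line. strcspn() is used for the last name because the character after it may be a line break rather than a space.

```php
<?php
// Hypothetical address string after implode() and trim().
$address = "Jane Doe\r\n123 Main St\r\nSpringfield, IL 62701";

$first_name = '';
$last_name  = '';

$first_space = strpos($address, ' ');
if ($first_space !== false) {
    // First name: offset 0, length = position of the first space.
    $first_name = substr($address, 0, $first_space);

    // Last name: starts one character past the first space and runs
    // until the next space OR line break, whichever comes first.
    $rest      = substr($address, $first_space + 1);
    $last_len  = strcspn($rest, " \r\n");
    $last_name = substr($rest, 0, $last_len);
}
```

Searching only for the next space with strpos() would run past the end of the first line here, because the name line ends in a line break and the next space is inside the street address.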
Posted: Fri Nov 03, 2006 4:28 am
by SmokyBarnable
My math was working for the first name, but it seemed a little off when I tried to start from the end of the first name and go to the next space, just after the last name. When I look at the string in Zend, it looks like there isn't a space between the last character of the first line and the first character of the second line. I then looked at the address as it appears in the email and noticed that the shipping address is 3 lines, with a <br /> after each line. Is this causing the lines to merge without a space? Here is the code I am using to parse the email message.
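Code:
function parse_email($email) {
    // Split header and message
    $header = array();
    $message = array();
    $is_header = true;
    foreach ($email as $line) {
        // Skip the wrapper tags; </HEADER> marks the switch to the message body
        if ($line == '<HEADER> ' . "\r\n") continue;
        if ($line == '<MESSAGE> ' . "\r\n") continue;
        if ($line == '</MESSAGE> ' . "\r\n") continue;
        if ($line == '</HEADER> ' . "\r\n") { $is_header = false; continue; }
        if ($is_header) {
            $header[] = $line;
        } else {
            $message[] = $line;
        }
    }

    // Parse headers
    $headers = array();
    $previous = ''; // placeholder until the first "Key: value" line is seen
    foreach ($header as $line) {
        $colon_pos = strpos($line, ':');
        $space_pos = strpos($line, ' ');
        // No colon, or a space before the colon: continuation of the previous header
        if ($colon_pos === false || ($space_pos !== false && $space_pos < $colon_pos)) {
            // attach to previous
            $previous .= "\r\n" . $line;
            continue;
        }
        // Get key
        $key = substr($line, 0, $colon_pos);
        // Get value (skips the colon and the space after it)
        $value = substr($line, $colon_pos + 2);
        $headers[$key] = $value;
        $previous =& $headers[$key];
    }
    unset($previous); // drop the reference into $headers

    // The posted code was cut off here; this return is an assumed ending.
    return array('headers' => $headers, 'message' => $message);
}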
Posted: Fri Nov 03, 2006 1:26 pm
by feyd
It would appear your data is in XML style formatting. It may be easier/better to use one of the facilities PHP offers for such an occasion.
http://php.net/ref.simplexml
http://php.net/ref.dom
http://php.net/ref.domxml
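On the <br /> question itself: if the lines are only separated by <br /> tags or line breaks, there is no space character for strpos() to find, so the lines look merged. One possible workaround, sketched here on made-up data (the sample lines and variable names are assumptions), is to split on the tags and line breaks first and only then pick the name apart:

```php
<?php
// Hypothetical message lines: each address line ends in <br /> and a
// line break, with no space character in between.
$message = array(
    "Jane Doe<br />\r\n",
    "123 Main St<br />\r\n",
    "Springfield, IL 62701<br />\r\n",
);

// implode('') alone leaves "Doe<br />\r\n123" with no space to search for.
$joined = implode('', $message);

// Treat any run of <br> tags and line breaks as one separator, so each
// address line becomes its own array element.
$lines = preg_split('/(?:<br\s*\/?>|\r?\n)+/i', $joined, -1, PREG_SPLIT_NO_EMPTY);
$lines = array_map('trim', $lines);

// The name line can then be split on whitespace by itself.
$name_parts = preg_split('/\s+/', $lines[0]);
```

With this sample, $lines holds the three address lines and $name_parts holds the first and last name. The XML extensions linked above would need the input to be well-formed XML first, which this sketch does not assume.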