extract part of an email body
Posted: Wed Aug 19, 2009 9:30 am
by aaaaaaaa
Hello,
I would like to extract only the newest part of an email's body, not the older messages quoted in it. An email looks like this:
Code: Select all
new part of the message
>part of the message
>that comes from the previous email
>that we don't want
However, an email can mark old messages in other ways, depending on the client and the user's configuration.
Is there any function from a library that can do that?
------------ ------------ ------------
You'll get some extra points if you can tell me which IRC channel you use for PHP or web-related topics.
Besides, which tech news sites do you read?
Thanks in advance,
Bye,
Cedric
Re: extract part of an email body
Posted: Wed Aug 19, 2009 9:39 am
by Ollie Saunders
Is there any function from a library that can do that?
Probably yes. You may be able to write it from scratch quicker than you can find and integrate a library to do it. Difficult call.
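You could use preg_replace() and /^>.*$/m to remove the lines beginning with ">". The m modifier is needed so that ^ and $ match at every line rather than only at the start and end of the whole string.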
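A minimal sketch of that suggestion, assuming the body is already available as plain text. The helper name strip_quoted_lines is made up for illustration, and the pattern also swallows the line break after each quoted line so no blank lines are left behind.

```php
<?php
// Hypothetical helper: drop every line that starts with ">".
function strip_quoted_lines(string $body): string
{
    // Normalise line endings so CRLF mail bodies behave like LF ones.
    $body = str_replace(["\r\n", "\r"], "\n", $body);
    // "m" makes ^ match at the start of each line; \n? removes the line break too.
    $stripped = preg_replace('/^>.*\n?/m', '', $body);
    // preg_replace() returns null on error; fall back to the original body.
    return rtrim($stripped ?? $body);
}

$message = "new part of the message\n>part of the message\n>that comes from the previous email\n>that we don't want";
echo strip_quoted_lines($message);
```

Because the pattern is anchored to the start of a line, a ">" in the middle of a sentence is left alone. It only covers clients that quote with ">", though.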
Re: extract part of an email body
Posted: Wed Aug 19, 2009 9:39 am
by lord_webby
Code: Select all
<?php
// Find where the quoted part of the message starts.
$pos = strpos($message, ">part of the message");
// Shorten $text to $limit characters, then cut back to the last space.
function truncate($text, $limit = 25, $ending = '...') {
    if (strlen($text) > $limit) {
        $text = strip_tags($text);
        $text = substr($text, 0, $limit);
        $text = substr($text, 0, -(strlen(strrchr($text, ' '))));
        $text = $text . $ending;
    }
    return $text;
}
// Cut the end off; keep the whole message if the marker was not found.
$text = ($pos === false) ? $message : truncate($message, $pos, "");
echo $text;
?>
Think that'll work - got a feeling there's another php function to do it though.
Re: extract part of an email body
Posted: Wed Aug 19, 2009 9:46 am
by Ollie Saunders
lord_webby, that won't work where:
Code: Select all
$message = "I can't believe he actually gave this massive argument why 4 is actually > 9 and then told me my maths degree was worthless. What a douche!";
Sorry!
Re: extract part of an email body
Posted: Wed Aug 19, 2009 10:17 am
by lord_webby
You need to have a common string at the start of every message to search for.
Re: extract part of an email body
Posted: Wed Aug 19, 2009 10:19 am
by lord_webby
You could try searching for three newline characters followed by a ">", for example.
Re: extract part of an email body
Posted: Wed Aug 19, 2009 10:34 am
by Ollie Saunders
lord_webby wrote: You need to have a common string at the start of every message to search for.
Why?
Re: extract part of an email body
Posted: Wed Aug 19, 2009 10:45 am
by lord_webby
Ollie Saunders wrote: lord_webby wrote: You need to have a common string at the start of every message to search for.
Why?
To use the function above to chop off the old messages - you need some way of discerning what bits of the message are old. The strpos function requires a "needle".
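Here is a rough sketch of the "needle" idea extended to several needles at once. It tries a handful of patterns and cuts at the earliest match. The pattern list and the names QUOTE_MARKERS and newest_part are assumptions made up for illustration; real mail clients use many more layouts, and this assumes the newest text comes first.

```php
<?php
// Hypothetical list of patterns that often introduce quoted text.
// These are illustrative assumptions, not a complete catalogue.
const QUOTE_MARKERS = [
    '/^>/m',                                  // classic ">" quoting
    '/^-{2,}\s*Original Message\s*-{2,}/mi',  // "----- Original Message -----"
    '/^On .+ wrote:\s*$/m',                   // English attribution line
    '/^Le .+ a écrit\s*:\s*$/m',              // French attribution line
];

// Return the text before the earliest marker, or the whole body if none match.
function newest_part(string $body): string
{
    $cut = strlen($body);
    foreach (QUOTE_MARKERS as $pattern) {
        // PREG_OFFSET_CAPTURE gives the byte offset of the match in $m[0][1].
        if (preg_match($pattern, $body, $m, PREG_OFFSET_CAPTURE)) {
            $cut = min($cut, $m[0][1]);
        }
    }
    return rtrim(substr($body, 0, $cut));
}
```

Calling newest_part($message) returns everything above the first marker found. A message where the reply is written underneath the quoted text would need a different rule.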
Re: extract part of an email body
Posted: Tue Sep 01, 2009 12:15 pm
by aaaaaaaa
Thank you for your help Ollie Saunders and lord_webby.
However, it seems I haven't been clear enough. So, here is another explanation with further examples.
What I would like to do is extract the different parts of an email's body: the newest answer, the answer that it replies to, and so on. In other words, I would like to split an email into the separate messages of a discussion between two people.
Sometimes, answers look like this:
Code: Select all
Hi
bla bla
----- Original Message -----
From: <mail@mail.com>
To: "Name" <name@ploc.co.uk>
Sent: Saturday, August 32, 2019 11:56 PM
Subject: Re: where is my mind ?
Hi again dear you,
BLA !
I would then like to extract
But the problem is that, depending on the language, the client, and the user's specific configuration, answers can have other layouts, like
Code: Select all
here is the answer
Le 6 août 2015 21:47, <mee@mail.fr> a écrit :
Ah ben trop tard O_o
And the first message
And sometimes the newest part of the message is at the bottom of the email, not at the top.
It might be useful to know that every message that is received is stored in a database.
Re: extract part of an email body
Posted: Tue Sep 01, 2009 4:38 pm
by Ollie Saunders
It might be useful to know that every message that is received is stored in a database.
Yes, it is. Take the message you want to split up; retrieve from the database the message that it replies to; save both messages as files; pass the filenames as arguments to the UNIX diff command and keep the output it produces. To get the two parts you're after, process that output twice: first select only the lines that begin with '>', then select only the lines that begin with '<'. Delete the files you created. There are standard PHP functions for creating and managing temporary files, splitting and searching strings, and executing shell commands.
If you don't like this method you could use a diff library for PHP. My preference is for UNIX diff because I know it works and it's already there.
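The steps above could be sketched roughly like this, assuming a diff binary is on the PATH and shell_exec() is enabled. The function names split_diff_output and diff_messages are made up for illustration. The sketch also assumes the reply reproduces the earlier text unchanged; if the client prefixes quoted lines with ">", those prefixes would have to be stripped from the reply first.

```php
<?php
// Split diff's default (normal-format) output into lines that exist only
// in the first file ("<") and lines that exist only in the second (">").
function split_diff_output(string $diff): array
{
    $old = [];
    $new = [];
    foreach (explode("\n", $diff) as $line) {
        if ($line === '') {
            continue;
        }
        // Drop the two-character prefix ("> " or "< ") to recover the text.
        if ($line[0] === '>') {
            $new[] = (string) substr($line, 2);
        } elseif ($line[0] === '<') {
            $old[] = (string) substr($line, 2);
        }
    }
    return ['old' => $old, 'new' => $new];
}

// Hypothetical wrapper: write both bodies to temp files and run UNIX diff.
function diff_messages(string $previous, string $reply): array
{
    $a = tempnam(sys_get_temp_dir(), 'msg');
    $b = tempnam(sys_get_temp_dir(), 'msg');
    file_put_contents($a, $previous);
    file_put_contents($b, $reply);
    // diff exits with status 1 when the files differ; that is not an error.
    $out = shell_exec('diff ' . escapeshellarg($a) . ' ' . escapeshellarg($b));
    unlink($a);
    unlink($b);
    return split_diff_output((string) $out);
}
```

With the stored message as $previous and the incoming one as $reply, diff_messages($previous, $reply)['new'] holds the lines that appear only in the reply.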
Re: extract part of an email body
Posted: Wed Sep 02, 2009 3:47 am
by aaaaaaaa
Thank you Ollie Saunders.
I will indeed try this.
Just a practical question: if I use UNIX commands, won't that be slower than the PHP functions (assuming that they do the same thing)?
Re: extract part of an email body
Posted: Wed Sep 02, 2009 4:12 am
by lord_webby
The difference should be negligible. Which is faster depends on the command, but I imagine a native Linux command is probably faster in general (PHP itself runs on top of Linux, so I'd expect it to be the slower of the two). Unless you're working with a hundred thousand files, I think you'll be alright.
