
Invalid HTML - I don't get it??? :-(

Posted: Mon Dec 27, 2004 7:03 am
by Chris Corbyn
Hi,

What's this weirdness with putting an " = " and then a new line in an HTML document when it should all be on the same line?

Example (just a cut and paste random section):
<body marginwidth=3D"4" marginheight=3D"4" topmargin=3D"4" leftmargin=3D"4"=
bgcolor=3D"white" vlink=3D"#0000ff" link=3D"#0000ff">
<table cellspacing=3D"0" cellpadding=3D"0" width=3D"600" border=3D"0">
<tr>
<td colspan=3D"2" height=3D"2" bgcolor=3D"#D6DCFE"><img src=3D"http://pics.=
ebaystatic.com/aw/pics/x.gif" width=3D"1" height=3D"2"></td>
</tr>
<tr valign=3D"top">
<td width=3D"600" bgcolor=3D"#D6DCFE">
<font size=3D"3" face=3D"Arial, Helvetica, sans-serif=09=09=09"><img src=3D=
"http://pics.ebaystatic.com/aw/pics/x.gif" width=3D"2" height=3D"1" alt=3D"=
" title=3D""><strong>Question about your item</strong></font>
I'm making an email app and some emails have the body encoded like this so the email doesn't display correctly.

I tried

Code:

str_replace("=\r\n","",$email_body)
But I get the same output (also tried \n).

I got the correct layout with

Code:

str_replace("=","",$email_body)
but that's no good since all attributes, image sources, etc. are broken.

Any clues or decoding functions?

Thanks :-)

Posted: Mon Dec 27, 2004 8:11 am
by feyd
that sure looks like invalid HTML to me....

Posted: Mon Dec 27, 2004 8:26 am
by Chris Corbyn
There are a lot of emails which look like it.

Look closer: there are also "3D"s all over the place, following any "=" in attributes.

I seem to have fixed it by using this nested str_replace:

Code:

str_replace("=3D","=",str_replace("=\r\n","",imap_fetchbody($mbox,$p,$HTML)))
But please, please, please... If anybody knows why this is happening with some emails, please advise and explain so I can solve it properly. I am viewing my method as a scrappy fix.

Posted: Mon Dec 27, 2004 9:22 am
by feyd
the 3D's are what I was referring to as invalid html (when combined with your quoted values...)

Posted: Mon Dec 27, 2004 10:26 am
by Chris Corbyn
Sorry, I'm being dumb here. Is "invalid HTML" a common problem whereby certain chars are replaced with weird codes?

I don't really understand if it's something I should be stumbling on or if there's a fix for it?

There are masses of code problems. Example

newline (\r\n) becomes =20
wrapped lines seem to have = at the end
gaps before attribute values show as =3D
Pound sign (£) shows as =A3

There must be a tonne of others. Looks like hex to me.

Do I need to do lots of laborious str_replaces to get rid of these chars, or is there another, somewhat more foolproof way?

Cheers.

Posted: Mon Dec 27, 2004 11:45 am
by magicrobotmonkey
How are you building $email_body? I'd say it's probably got something to do with that.

Posted: Mon Dec 27, 2004 12:15 pm
by Weirdan
3Ds are there because the message was encoded as 'quoted printable'. Use quoted_printable_decode() to get rid of these char sequences.
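
To illustrate the suggestion above, here is a minimal sketch. The sample string is made up for illustration (modelled on the fragment quoted at the start of the thread), not taken from a real message. In quoted-printable, each =XX is a byte written as two hex digits, so =3D is "=", =20 is a space, and =A3 is the byte 0xA3 (the pound sign in ISO-8859-1). A "=" at the very end of a line is a soft line break that is removed on decoding.

```php
<?php
// Hypothetical sample, modelled on the fragment quoted at the top of the thread.
$encoded = "<body marginwidth=3D\"4\" leftmargin=3D\"4\" =\r\n"
         . "bgcolor=3D\"white\">";

// Undoes the =XX escapes and joins lines that were split with a trailing "=".
$decoded = quoted_printable_decode($encoded);
echo $decoded, "\n";
// prints: <body marginwidth="4" leftmargin="4" bgcolor="white">

// Single escapes decode to the byte with that hex value.
$equals = quoted_printable_decode("=3D"); // "="
$space  = quoted_printable_decode("=20"); // " "
$pound  = quoted_printable_decode("=A3"); // byte 0xA3, the pound sign in ISO-8859-1
```

The result is raw bytes in whatever charset the message declares, so non-ASCII characters such as the pound sign may still need a charset conversion before display.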

Posted: Mon Dec 27, 2004 12:16 pm
by Chris Corbyn
I'm using the imap_fetchbody() function and doing nothing else to it.

It gets the body but the html looks wrong.

As I say.... it only does it on a handful of emails.

Somebody in another forum has mentioned it could be to do with the transfer-encoding of the email. I don't know anything about that.

Code:

echo imap_fetchbody($mbox,$msg,$part); //Part is 2 which corresponds to the html part and the rest is ok

Posted: Mon Dec 27, 2004 12:18 pm
by Chris Corbyn
Nice one thanks very much Weirdan!

I'll try that out and let you know.

How do I get the content-transfer encoding, since I'll have to rewrite this to ensure individual messages are decoded with the correct method?

Posted: Mon Dec 27, 2004 12:23 pm
by Chris Corbyn
Ok, I can confirm that this works, thanks. Does anybody know if there are other encoding methods I need to be aware of in order to display messages correctly? 8O

Posted: Mon Dec 27, 2004 12:53 pm
by Weirdan
d11wtq wrote: Ok, I can confirm that this works, thanks. Does anybody know if there are other encoding methods I need to be aware of in order to display messages correctly? 8O

The imap extension detects 6 encoding methods:

Code:

0	7BIT
1	8BIT
2	BINARY
3	BASE64
4	QUOTED-PRINTABLE
5	OTHER
Check out imap_fetchstructure() for more info.
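
As a rough sketch of how the encoding number could be used to pick a decoder: the helper name decode_part_body() is hypothetical, and the mailbox usage in the comments is an untested assumption about a simple multipart message. Only BASE64 and QUOTED-PRINTABLE need decoding here; the other values are passed through unchanged.

```php
<?php
// Encoding numbers as in the table above. The imap extension also exposes
// them as constants such as ENCBASE64 and ENCQUOTEDPRINTABLE.
function decode_part_body($raw, $encoding)
{
    switch ($encoding) {
        case 3: // BASE64
            return base64_decode($raw);
        case 4: // QUOTED-PRINTABLE
            return quoted_printable_decode($raw);
        default: // 7BIT, 8BIT, BINARY, OTHER: pass through unchanged
            return $raw;
    }
}

// With a live mailbox this might be driven roughly like so (untested sketch,
// assuming a simple multipart message where part "2" is parts[1]):
//   $structure = imap_fetchstructure($mbox, $msg);
//   $encoding  = $structure->parts[1]->encoding;
//   echo decode_part_body(imap_fetchbody($mbox, $msg, "2"), $encoding);

echo decode_part_body("width=3D\"600\"", 4), "\n";          // prints: width="600"
echo decode_part_body(base64_encode("<b>hi</b>"), 3), "\n"; // prints: <b>hi</b>
```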

Posted: Mon Dec 27, 2004 1:28 pm
by Chris Corbyn
Brilliant, thanks. I could see the number for the encoding but I had no idea what the numbers corresponded to.

I'll make a little modification to get around it :-)

Posted: Mon Dec 27, 2004 1:30 pm
by Chris Corbyn
Oh, just spotted it in the manual.