Encoding Problems [message #186334]
Wed, 02 July 2014 15:53
stef_204
Hi,
Newbie at PHP; please bear with me.
I am trying to use a script which creates an XML feed of the email
messages found in an IMAP account; one can then either subscribe to the
feed with an RSS reader/aggregator or read the HTML page in a browser.
Here is the script: <http://linuxtrove.com/wp/?p=209>
(imap2rss.php)
My problem: almost all of the messages (emails) show garbled text, I
believe due to encoding problems.
Here is a picture of how it looks, both in a browser like Firefox and in an
RSS feed reader:
<http://imagebin.org/314843>
The emails are legible in an email client, but not in the XML and HTML
pages created by the PHP script.
This is what I first thought:
the emails are plain text encoded as UTF-8, and when the feed (and the
HTML page) is created, the encoding is not recognised and it all gets
garbled.
Server (shared) is running PHP 5.2.17 on Apache/2.2.22.
PHP config --> default_charset: iso-8859-1
I cannot change the config on the server since it is shared hosting but I
can certainly modify the script to hopefully fix this issue.
I have tried inserting the following in the script:
    ini_set('default_charset', 'UTF-8');
    htmlentities($string, ENT_COMPAT, 'UTF-8');
    header('Content-type: text/plain; charset=utf-8');
but no joy.
I have also tried adding IndexOptions +Charset=UTF-8 to .htaccess. No joy
there either.
I could be completely wrong about the UTF-8 issue; perhaps it has more
to do with decoding base64 in general.
This is what I now think is the problem, more specifically:
I have just pasted the garbled text into an online base64 decoder and used
its "decode" feature, and the result is perfectly legible once decoded,
whether I choose UTF-8 or ASCII as the charset (but I should use UTF-8).
So it looks like the feed is "echoed" or "printed" in base64 format.
It doesn't look like the charset is the problem, but rather the missing
base64 decoding.
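PHP's base64_decode() can reproduce the online decoder step. Below is a minimal sketch. The sample string is made up for illustration and is not taken from the screenshot.

```php
<?php
// Hypothetical sample of a base64 transfer-encoded body.
$raw = "SGVsbG8sIHdvcmxkIQ==";

// Passing true enables strict mode: false is returned if the input
// contains characters outside the base64 alphabet.
$decoded = base64_decode($raw, true);

echo $decoded, "\n"; // prints: Hello, world!
```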
I can see the base64_decode function here
<http://www.php.net/manual/en/function.base64-decode.php>
but I am not sure whether this is the right way to go about it, or how to
apply it in this script.
I must say I am going about this fairly blind, doing trial and error,
which is just wrong.
Again, this is the script.
<http://linuxtrove.com/wp/?p=209>
Any pointers?
Tx.

Re: Encoding Problems [message #186335 is a reply to message #186334]
Wed, 02 July 2014 19:15
Jerry Stuckle
On 7/2/2014 11:53 AM, stef_204 wrote:
> [...]
Stef,
Just looking at the output, my immediate thought was "this is base64
encoded". That would match if it works in an email reader as a base64
encoded attachment.
I don't have the time to go through 300+ LOC to try to figure out what
your code is doing, but when sending as either XML or HTML, you need to
first base64_decode() the text. Once you've done that, apply
htmlentities() to the string to encode the HTML entities. Then send it
to the RSS or HTML feed.
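The sketch below follows that order of operations: decode first, then escape. It assumes the body is base64 and already UTF-8. It uses htmlspecialchars() instead of htmlentities(). XML predefines only a handful of entities, so named entities such as &eacute;, which htmlentities() can produce, would break an RSS feed. htmlspecialchars() only emits &amp;, &lt;, &gt;, &quot; and &#039;. The function name bodyForFeed is hypothetical.

```php
<?php
// Hypothetical helper: decode a base64 transfer-encoded body and escape it
// so it can be embedded in HTML or in an RSS <description> element.
function bodyForFeed($encodedBody)
{
    $text = base64_decode($encodedBody, true); // strict mode
    if ($text === false) {
        return ''; // not valid base64: emit nothing rather than garbage
    }
    // Only &, <, >, " and ' are escaped, which keeps the result valid XML.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo bodyForFeed(base64_encode('Fish & <chips>')), "\n"; // prints: Fish &amp; &lt;chips&gt;
```

In PHP 5.4 and later, htmlspecialchars() also returns an empty string when the decoded text is not valid UTF-8.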
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex(at)attglobal(dot)net
==================

Re: Encoding Problems [message #186337 is a reply to message #186334]
Wed, 02 July 2014 20:25
Christoph Michael Becker
stef_204 wrote:
> Here is the script: <http://linuxtrove.com/wp/?p=209>
> (imap2rss.php)
>
> My problem: almost all of the messages (emails) show garbled text, I
> believe due to encoding problems.
Indeed, that seems to be an encoding problem; not so much a character
encoding problem but rather a body transfer encoding problem. You will
most likely have to take into account the "encoding" property of
imap_fetchstructure()'s return value.
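For a multipart message, the encoding that matters belongs to the individual part, not to the top-level structure. The sketch below illustrates this. It assumes the documented properties of imap_fetchstructure()'s result (type, subtype, encoding, parts). The function name findPlainTextPart is hypothetical. Attached message/rfc822 parts, which are numbered differently, are not handled.

```php
<?php
// Hypothetical helper: find the first text/plain part in an object shaped
// like imap_fetchstructure()'s return value. Returns array(section, encoding),
// where section is the part number for imap_fetchbody(), or null if none.
function findPlainTextPart($part, $section = '')
{
    if (!empty($part->parts)) {
        // Multipart: children are numbered 1, 2, ... and nested as 1.1, 1.2, ...
        foreach ($part->parts as $i => $child) {
            $childSection = ($section === '' ? '' : $section . '.') . ($i + 1);
            $found = findPlainTextPart($child, $childSection);
            if ($found !== null) {
                return $found;
            }
        }
        return null;
    }
    // type 0 is "text"; subtypes are reported in upper case, e.g. "PLAIN".
    if (isset($part->type, $part->subtype) && $part->type == 0
        && strtoupper($part->subtype) === 'PLAIN') {
        // A single-part message has its body in section "1".
        return array($section === '' ? '1' : $section, $part->encoding);
    }
    return null;
}
```

With a live connection, this might be used as `$found = findPlainTextPart(imap_fetchstructure($mbox, $msgNo));` followed by `imap_fetchbody($mbox, $msgNo, $found[0])`. Here $mbox and $msgNo stand for the script's own connection and message number.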
Please heed Thomas' advice. It's always nice to "know" who you're
talking to and to be able to send a private reply, if appropriate.
--
Christoph M. Becker

Re: Encoding Problems [message #186339 is a reply to message #186334]
Thu, 03 July 2014 12:08
bill
On 7/2/2014 11:53 AM, stef_204 wrote:
Are you the stef that translates HTML-kit into Dutch?
If you are, nice seeing you here.
--bill

Re: Encoding Problems [message #186349 is a reply to message #186334]
Sat, 05 July 2014 01:16
Arno Welzel
stef_204, 2014-07-02 17:53:
> Hi,
>
> Newbie at php; please bear with me.
>
> I am trying to use a script which basically creates an xml feed for email
> messages found in an IMAP account; one can then either subscribe to feed
> via an rss reader/aggregator or use browser to read html page.
>
> Here is the script: <http://linuxtrove.com/wp/?p=209>
> (imap2rss.php)
>
> My problem: almost all of the messages (emails) show garbled text, I
> believe due to encoding problems.
>
> Here is a picture of how it looks both using a browser like Firefox or an
> rss news feed reader.
>
> <http://imagebin.org/314843>
This is base64 and has to be decoded.
[...]
> I could be completely wrong about the utf-8 issue; and perhaps it has
> more something to do with decoding base64 in general.
Yep - that's the point.
> This is what I now think is the problem, more specifically:
> I have just tried a base64 online decoder and pasted the garbled text in
> it and used the "decode" online feature, and the result is perfectly
> legible once decoded, whether I choose utf-8 or ascii as charset (but I
> should use utf-8).
>
> So, looks like the feed is "echoed" or "printed" in base64 format....
> It doesn't look like charset is the problem but decoding base64.
Yep.
>
> I can see the base64_decode function here
> <http://www.php.net/manual/en/function.base64-decode.php>
> but not sure if this is the right way to go about this; or how to apply
> it in this script.
Well - unfortunately there is no easy "put it in there and it works".
First of all: just testing for the subtype "PLAIN" is not enough. You
also have to check for the encoding.
As far as I can see, this should be added here:
    if($msgStructure->subtype=="PLAIN")
        $body = renderPlainText($body);
So extend that for the encoding:
    if($msgStructure->subtype=="PLAIN")
    {
        switch($msgStructure->encoding)
        {
            case 4:
                // Body text is quoted-printable encoded
                $body = quoted_printable_decode($body);
                break;
            case 3:
                // Body text is base64 encoded
                $body = base64_decode($data);
                break;
        }
        $body = renderPlainText($body);
    }
Also see <http://php.net/manual/en/function.imap-fetchstructure.php> and
the comments there.
--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de
http://fahrradzukunft.de

Re: Encoding Problems [message #186350 is a reply to message #186349]
Sat, 05 July 2014 01:19
Arno Welzel
Arno Welzel, 2014-07-05 03:16:
> [...]
>
> So extend that for the encoding:
>
> if($msgStructure->subtype=="PLAIN")
> {
> switch($msgStructure->encoding)
> {
> case 4:
> // Body text is quoted-printable encoded
> $body = quoted_printable_decode($body);
> break;
>
> case 3:
> // Body text is base64 encoded
> $body = base64_decode($data);
Oops, sorry for the copy-and-paste typo. Of course it should be:
$body = base64_decode($body);
> break;
> }
>
> $body = renderPlainText($body);
> }
>
> Also see <http://php.net/manual/en/function.imap-fetchstructure.php> and
> the comments there.

Re: Encoding Problems [message #186352 is a reply to message #186349]
Sat, 05 July 2014 01:28
Christoph Michael Becker
Arno Welzel wrote:
> So extend that for the encoding:
>
> if($msgStructure->subtype=="PLAIN")
> {
> switch($msgStructure->encoding)
> {
> case 4:
> // Body text is quoted-printable encoded
> $body = quoted_printable_decode($body);
> break;
>
> case 3:
> // Body text is base64 encoded
> $body = base64_decode($data);
> break;
> }
>
> $body = renderPlainText($body);
> }
What about a default clause, at least triggering a notice/warning that
the encoding is not understood?
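The sketch below shows one way to add such a default clause, written as a standalone function. It takes the numeric encoding code from imap_fetchstructure(). Per the PHP manual, the codes are 0 = 7BIT, 1 = 8BIT, 2 = BINARY, 3 = BASE64, 4 = QUOTED-PRINTABLE and 5 = OTHER. It also uses $body rather than $data in the base64 branch. The name decodeBody and the choice to return null for unhandled codes are assumptions, not part of the original script.

```php
<?php
// Hypothetical standalone version of the switch, with a default clause.
// Codes as documented for imap_fetchstructure():
// 0 = 7BIT, 1 = 8BIT, 2 = BINARY, 3 = BASE64, 4 = QUOTED-PRINTABLE, 5 = OTHER.
function decodeBody($body, $encoding)
{
    switch ((int) $encoding) {
        case 0: // 7BIT
        case 1: // 8BIT
            return $body; // no transformation needed
        case 3: // BASE64
            return base64_decode($body);
        case 4: // QUOTED-PRINTABLE
            return quoted_printable_decode($body);
        default: // 2 (BINARY), 5 (OTHER) or an unexpected value
            trigger_error("Unhandled transfer encoding: $encoding", E_USER_NOTICE);
            return null;
    }
}
```

A caller could then render only decodable bodies, for example `$decoded = decodeBody($body, $msgStructure->encoding); $body = ($decoded === null) ? '' : renderPlainText($decoded);`. Here renderPlainText() is the script's existing function.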

Re: Encoding Problems [message #186353 is a reply to message #186352]
Sat, 05 July 2014 23:36
Arno Welzel
Christoph Michael Becker, 2014-07-05 03:28:
> Arno Welzel wrote:
>
>> So extend that for the encoding:
>>
>> if($msgStructure->subtype=="PLAIN")
>> {
>> switch($msgStructure->encoding)
>> {
>> case 4:
>> // Body text is quoted-printable encoded
>> $body = quoted_printable_decode($body);
>> break;
>>
>> case 3:
>> // Body text is base64 encoded
>> $body = base64_decode($data);
>> break;
>> }
>>
>> $body = renderPlainText($body);
>> }
>
> What about a default clause, at least triggering a notice/warning that
> the encoding is not understood?
Good point. But which other encodings, apart from no encoding at all,
base64 and quoted-printable, may be used?

Re: Encoding Problems [message #186354 is a reply to message #186353]
Sun, 06 July 2014 02:16
Denis McMahon
On Sun, 06 Jul 2014 01:36:20 +0200, Arno Welzel wrote:
> Christoph Michael Becker, 2014-07-05 03:28:
>> What about a default clause, at least triggering a notice/warning that
>> the encoding is not understood?
> Good Point. But which other encoding except no encoding at all, base64
> and or quoted printable may be used?
multipart/form-data
It might not be expected in an email, but then the email might be
generated by someone looking to target some notional email application
which had an exploitable vulnerability, in that it would try to decode
multipart/form-data in a manner that further allowed a carefully crafted
invalid data sequence to trigger arbitrary code execution.
Now that exploit might not even be targeting the code being discussed
here, but when a script-kiddie spam house sends out billions of emails in
an attempt to exploit that vulnerability, the chances are that one of
them will make its way into this processing chain.
Detecting and cleanly handling both unrecognised declared encoding types
and malformed encoded data is therefore probably good practice.
--
Denis McMahon, denismfmcmahon(at)gmail(dot)com

Re: Encoding Problems [message #186355 is a reply to message #186354]
Sun, 06 July 2014 03:19
Richard Damon
On 7/5/14, 10:16 PM, Denis McMahon wrote:
> On Sun, 06 Jul 2014 01:36:20 +0200, Arno Welzel wrote:
>
>> [...]
>
> multipart/form-data
>
> [...]
>
multipart/form-data would be a value for Content-Type, not a value for
Content-Transfer-Encoding.
The values of Content-Transfer-Encoding defined by RFC 2045 are:
quoted-printable
base64
binary
8bit
7bit
(Binary and 8bit are only allowed if the receiving server indicates it
is capable of handling them.)
Binary, 8bit, and 7bit imply that no transform should be performed on
the data to decode it.
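The script might read the raw Content-Transfer-Encoding header instead of imap_fetchstructure()'s numeric code. In that case, the same idea can be keyed on the RFC 2045 tokens listed above. RFC 2045 defines these tokens as case-insensitive and makes 7bit the default when the header is absent. This is a hypothetical sketch. decodeByHeader is an illustrative name, and returning null for unknown values leaves the decision to skip or reject the part to the caller.

```php
<?php
// Hypothetical helper: decode according to a Content-Transfer-Encoding
// header value. Returns null for tokens it does not recognise.
function decodeByHeader($body, $cte)
{
    switch (strtolower(trim($cte))) { // RFC 2045 tokens are case-insensitive
        case '':      // header absent: RFC 2045 default is 7bit
        case '7bit':
        case '8bit':
        case 'binary':
            return $body; // identity encodings: nothing to undo
        case 'base64':
            return base64_decode($body);
        case 'quoted-printable':
            return quoted_printable_decode($body);
        default:
            return null; // unknown or bogus value, e.g. "rot13"
    }
}
```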

Re: Encoding Problems [message #186356 is a reply to message #186355]
Sun, 06 July 2014 05:21
gordonb.defz8
> multipart/form-date would be a value for Content-Type, not a value for
> Content-Transfer-Encoding.
>
> The defined vauls (By RFC 2045) of Content-Transfer-Encoding are:
> quoted-printable
> base64
> binary
> 8bit
> 7bit
Never forget that viruses and malware aren't required to obey the
rules, and you could very well end up with Content-Type: spam/virus
and Content-Transfer-Encoding: rot13 .

Re: Encoding Problems [message #186358 is a reply to message #186356]
Sun, 06 July 2014 11:14
Richard Damon
On 7/6/14, 1:21 AM, Gordon Burditt wrote:
>> multipart/form-date would be a value for Content-Type, not a value for
>> Content-Transfer-Encoding.
>>
>> The defined vauls (By RFC 2045) of Content-Transfer-Encoding are:
>> quoted-printable
>> base64
>> binary
>> 8bit
>> 7bit
>
> Never forget that viruses and malware aren't required to obey the
> rules, and you could very well end up with Content-Type: spam/virus
> and Content-Transfer-Encoding: rot13 .
>
And if you don't properly "decode" that payload, is there a problem?
Yes, you don't want your program to crash on an improper value, and
somehow rejecting malformed messages is preferable, but ignoring the
error isn't bad (assuming your ultimate processing of the message is
well controlled).

Re: Encoding Problems [message #186359 is a reply to message #186353]
Sun, 06 July 2014 13:30
Christoph Michael Becker
Arno Welzel wrote:
> Christoph Michael Becker, 2014-07-05 03:28:
>
>> [...]
>> What about a default clause, at least triggering a notice/warning that
>> the encoding is not understood?
>
> Good Point. But which other encoding except no encoding at all, base64
> and or quoted printable may be used?
The PHP manual documents 6 values for the transfer encodings[1].
Particularly 2 (BINARY) and 5 (OTHER) seem to demand some further
handling (if only to ignore the body in these cases, which might be
necessary to avoid potential vulnerabilities).
[1] <http://www.php.net/manual/en/function.imap-fetchstructure.php>

Re: Encoding Problems [message #186360 is a reply to message #186359]
Sun, 06 July 2014 19:46
Arno Welzel
Christoph Michael Becker, 2014-07-06 15:30:
> Arno Welzel wrote:
>
>> [...]
>
> The PHP manual documents 6 values for the transfer encodings[1].
> Particularly 2 (BINARY) and 5 (OTHER) seem to demand some further
> handling (if only to ignore the body in these cases, what might be
> necessary to avoid potential vulnerabilities).
>
> [1] <http://www.php.net/manual/en/function.imap-fetchstructure.php>
Thanks for the clarification - that's the URL I also referred to
originally ;-)

Re: Encoding Problems [message #186361 is a reply to message #186360]
Sun, 06 July 2014 20:21
Richard Damon
On 7/6/14, 3:46 PM, Arno Welzel wrote:
> Christoph Michael Becker, 2014-07-06 15:30:
>
>> [...]
>
> Thanks for the clarification - that's the URL I also referred to
> originally ;-)
Looking at your original code, your base64 path is converting $data to
$body, while the other paths are $body to $body.
The RFC defines binary as a raw encoding, meaning the message holds the
desired byte stream. The difference from 7bit and 8bit is that, besides
using all byte values like 8bit, binary data may also contain NULs (0),
and CR (13) and LF (10) don't delimit lines (so there is no longer a
998-byte line-length limit). If renderPlainText can't handle that sort
of data, maybe you should discard parts with encoding binary. But just
because a message doesn't say it is binary doesn't force it to obey the
rules (unless your MTA checks and enforces this), so renderPlainText
should do something "valid" for these cases anyway (even if it is just
outputting nothing).
Similarly, "Other" probably means that the encoding wasn't validly
specified, so you might want to reject it, but you don't need to (you
should be able to handle whatever "garbage" is sent to you in some
manner, even if that means rejecting it or outputting nothing).
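Here is a sketch of "doing something valid" with whatever bytes arrive. It assumes the text has already been converted to UTF-8. Invalid UTF-8, as binary data usually is, produces no output. The control characters that XML 1.0 does not permit are stripped so they cannot break the feed. The function name textForFeed is hypothetical.

```php
<?php
// Hypothetical helper: turn arbitrary decoded bytes into text that is safe
// to put into the feed, or into an empty string if that is not possible.
function textForFeed($bytes)
{
    // preg_match() with the /u modifier fails on invalid UTF-8.
    if (preg_match('//u', $bytes) !== 1) {
        return ''; // not valid UTF-8: treat as garbage and emit nothing
    }
    // Strip C0 controls except TAB, LF and CR (not allowed in XML 1.0),
    // plus DEL, which XML allows but is unwanted in display text.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $bytes);
}
```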

Re: Encoding Problems [message #186363 is a reply to message #186358]
Mon, 07 July 2014 01:24
Denis McMahon
On Sun, 06 Jul 2014 07:14:06 -0400, Richard Damon wrote:
> On 7/6/14, 1:21 AM, Gordon Burditt wrote:
>>> multipart/form-date would be a value for Content-Type, not a value for
>>> Content-Transfer-Encoding.
>>> The defined vauls (By RFC 2045) of Content-Transfer-Encoding are:
>>> quoted-printable base64 binary 8bit 7bit
>> Never forget that viruses and malware aren't required to obey the
>> rules, and you could very well end up with Content-Type: spam/virus and
>> Content-Transfer-Encoding: rot13 .
> And if you don't properly "decode" that payload there is a problem?
No, but doing something sensible and clean in the face of unexpected data
values is better than just bombing out.
My original points are and remain that (a) the values you get for the
content-transfer-encoding might not be in your list, and (b) that the
actual content-transfer-encoding might not match the declared content-
transfer-encoding.
This combination of factors means that having a default that assumes "if
it didn't match anything else it must be x" is a bad idea. It also means
that it's a good idea to expect, and try to cleanly detect, any content
decoding errors, on the assumption that at some point malformed content
will arrive and you want to handle it in a manner that, at the very
least, doesn't create a vulnerability.
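For base64 at least, PHP can detect malformed data. In strict mode, base64_decode() returns false when the input contains characters outside the base64 alphabet. The sketch below uses the hypothetical name strictBase64Decode. It removes line breaks first because RFC 2045 limits encoded lines to 76 characters, so real bodies are wrapped. quoted_printable_decode() has no comparable strict mode, so this does not cover quoted-printable.

```php
<?php
// Hypothetical helper: decode a base64 body, failing loudly on bad input
// instead of silently producing partial garbage.
function strictBase64Decode($encoded)
{
    // Email base64 is wrapped into short lines; drop all whitespace first.
    $compact = preg_replace('/\s+/', '', $encoded);
    $decoded = base64_decode($compact, true); // strict: false on bad input
    if ($decoded === false) {
        throw new UnexpectedValueException('Malformed base64 body');
    }
    return $decoded;
}
```

The caller can catch UnexpectedValueException and skip or reject the message rather than render it.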

Re: Encoding Problems [message #186365 is a reply to message #186361]
Mon, 07 July 2014 01:49
Arno Welzel
Richard Damon, 2014-07-06 22:21:
> On 7/6/14, 3:46 PM, Arno Welzel wrote:
>> [...]
>
> Looking at your original code, your base64 path is converting $data to
> $body, while the other paths are $body to $body.
Yep - that's why I corrected this fault in my follow-up in
<53B7529D(dot)9050002(at)arnowelzel(dot)de>.
This was just meant as a suggestion of how one could handle the different
content-transfer-encodings, not tested code ready to use. Therefore I
also mentioned the PHP manual for further reading.
Of course one should add a default case to handle "unknown"
content-transfer-encodings.

Re: Encoding Problems [message #186387 is a reply to message #186334]
Fri, 11 July 2014 16:47
Arno Welzel
stef_204, 2014-07-10 15:57:
> On Sat, 05 Jul 2014 03:19:25 +0200, Arno Welzel wrote:
>
>> [...]
>
> Arno,
>
> The above seems to work. Thanks.
> I still get a little bit of garbled text due to charset utf-8 (I believe)
> but we are now 99% better on the $body.
Just keep in mind that my example is not complete and is just a suggestion
of how to start: there should also be a case to handle text with a transfer
encoding which does not need decoding at all, and a default case to
handle unknown encodings.
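For the remaining charset problem, one possible approach is to read the part's CHARSET parameter and convert the decoded text to UTF-8 before rendering. The sketch below assumes the parameters property of imap_fetchstructure()'s result, which holds objects with attribute and value. It also assumes the iconv extension is available. charsetOf and toUtf8 are hypothetical names. US-ASCII is the fallback because it is the RFC 2045 default for text/plain.

```php
<?php
// Hypothetical helper: return the CHARSET parameter of a structure part,
// or US-ASCII (the RFC 2045 default) if none is declared.
function charsetOf($part)
{
    if (!empty($part->parameters)) {
        foreach ($part->parameters as $param) {
            if (strcasecmp($param->attribute, 'charset') === 0) {
                return $param->value;
            }
        }
    }
    return 'US-ASCII';
}

// Hypothetical helper: convert decoded text to UTF-8 with iconv().
function toUtf8($text, $charset)
{
    if (strcasecmp($charset, 'UTF-8') === 0 || strcasecmp($charset, 'US-ASCII') === 0) {
        return $text; // US-ASCII is a subset of UTF-8
    }
    // iconv() returns false for unknown charsets or invalid input;
    // in that case the text is left as it was.
    $converted = @iconv($charset, 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}
```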

Re: Encoding Problems [message #186396 is a reply to message #186334]
Fri, 18 July 2014 09:29
Tim Streater
In article <lqao35$8v5$1(at)dont-email(dot)me>, stef_204 <notvalid(at)nomail(dot)nul>
wrote:
> Sorry to ask but I am struggling with the "subject" part of the email.
>
> I tried to find a fix but not joy, yet.
>
> I can decode the base64 encoded subject of each email individually by
> adding a: mb_decode_mimeheader as follows, but that's really just a
> "hack" and not proper, IMHO.
>
> And that only decodes $subject on the html page produced for individual
> emails, not the top level html page/rss feed which lists all of the
> emails.
>
> The subjects there are still reading:
> "=?UTF-8?B?"InsertGarbledText (base64) here"=?=
These are called "encoded words". You can read about them in RFC 2047, or
look up the Wikipedia article on MIME (in caps). You'll have to write
some PHP to decode those. In general the format is:
introducer: =?
charset: UTF-8 (in this case)
separator: ?
coding: B for base64, Q for (a variant of) quoted-printable
separator: ?
encoded text follows
terminator: ?=
That will allow you to pick the item apart and know what to do with it.
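Below is a rough sketch of a decoder for that format. It assumes UTF-8 output and uses the iconv extension for other charsets. The function name is hypothetical. It needs PHP 5.3 or later for the anonymous function and ignores RFC 2231 language tags. It handles two details from RFC 2047: in Q encoding an underscore stands for a space, and whitespace between two adjacent encoded words is dropped. imap_mime_header_decode() and mb_decode_mimeheader() are ready-made alternatives.

```php
<?php
// Hypothetical decoder for RFC 2047 encoded words such as
// =?UTF-8?B?...?= or =?ISO-8859-1?Q?...?= in a Subject header.
// Output is UTF-8; other charsets are converted with iconv().
function decodeEncodedWords($header)
{
    // =? charset ? B or Q ? encoded text ?=
    $word = '=\?([^?]+)\?([BbQq])\?([^?]*)\?=';

    // Whitespace between two adjacent encoded words is not part of the text.
    $header = preg_replace('/(' . $word . ')\s+(?=' . $word . ')/', '$1', $header);

    return preg_replace_callback('/' . $word . '/', function ($m) {
        if (strtoupper($m[2]) === 'B') {
            $text = base64_decode($m[3]);
        } else {
            // Q encoding: "_" is a space, =XX is a hex-encoded byte.
            $text = quoted_printable_decode(str_replace('_', ' ', $m[3]));
        }
        $charset = $m[1];
        if (strcasecmp($charset, 'UTF-8') !== 0 && strcasecmp($charset, 'US-ASCII') !== 0) {
            $converted = @iconv($charset, 'UTF-8', $text);
            if ($converted !== false) {
                $text = $converted; // otherwise keep the raw bytes
            }
        }
        return $text;
    }, $header);
}
```

Applied to $subject wherever it is output, this would cover both the individual message pages and the top-level list/RSS feed.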
BTW, if you are doing stuff with emails, there's no substitute for
reading the RFCs and understanding how emails are put together. That's
what I did as part of the process for writing my own email client. RFCs
2045, 2046, 2047, 2048, and 2049 are a good place to start.
--
"People don't buy Microsoft for quality, they buy it for compatibility
with what Bob in accounting bought last year. Trace it back - they buy
Microsoft because the IBM Selectric didn't suck much" - P Seebach, afc

Re: Encoding Problems [message #186397 is a reply to message #186334]
Fri, 18 July 2014 10:05
Tim Streater
In article <lqapur$mnb$1(at)dont-email(dot)me>, stef_204 <notvalid(at)nomail(dot)nul>
wrote:
> On Fri, 18 Jul 2014 10:29:32 +0100, Tim Streater wrote:
>
>> BTW, if you are doing stuff with emails, there's no substitute for
>> reading the RFCs and understanding how emails are put together. That's
>> what I did as part of the process for writing my own email client. RFCs
>> 2045, 2046, 2047, 2048, and 2049 are a good place to start.
>
> I agree with you--I'm just really pressed at the moment on this work
> project unfortunately and need to get it up and running ASAP.
> Not ideal at all, obviously, but coding/programming is only incidental to
> my work and not my main work.
Um, I see the problem. Would it help if I emailed you the function I
put together for the purpose?

Re: Encoding Problems [message #186398 is a reply to message #186334]
Fri, 18 July 2014 13:32
Tim Streater
In article <lqb17k$9bb$1(at)dont-email(dot)me>, stef_204 <notvalid(at)nomail(dot)nul>
wrote:
> On Fri, 18 Jul 2014 11:05:47 +0100, Tim Streater wrote:
>
>> Um, I see the problem. Would it help if I emailed you the function I put
>> together for the purpose?
>
> Sure, let's give it a shot and see if I am able to integrate it in to the
> script to resolve the issue.
OK - on its way, let me know if you don't receive it.

Re: Encoding Problems [message #186401 is a reply to message #186334]
Sat, 19 July 2014 07:36
Arno Welzel
stef_204, 2014-07-18 11:06:
> On Fri, 11 Jul 2014 18:47:55 +0200, Arno Welzel wrote:
>
>> Just keep in mind, that my example is not complete and just a suggestion
>> how to start - there should also be a case to handle text with transfer
>> encoding which does not need decoding at all and a default case to
>> handle unknown encodings.
>
> Arno,
>
> Sorry to ask but I am struggling with the "subject" part of the email.
>
> I tried to find a fix but not joy, yet.
>
> I can decode the base64 encoded subject of each email individually by
> adding a: mb_decode_mimeheader as follows, but that's really just a
> "hack" and not proper, IMHO.
>
> And that only decodes $subject on the html page produced for individual
> emails, not the top level html page/rss feed which lists all of the
> emails.
>
> The subjects there are still reading:
> "=?UTF-8?B?"InsertGarbledText (base64) here"=?=
imap_mime_header_decode() may help.
See <http://php.net/manual/en/function.imap-mime-header-decode.php>