PDF attachment is corrupt, is missing CR (0D) #9728

Open
2 tasks done
ner00 opened this issue Dec 12, 2024 · 9 comments

Comments

@ner00

ner00 commented Dec 12, 2024

Prerequisites

  • I have searched for duplicate or closed issues
  • I can recreate the issue with all plugins disabled

Describe the issue

Note: I did find this seemingly similar yet old discussion: https://www.roundcubeforum.net/index.php?topic=8742.0

The problem I'm facing with a particular incoming e-mail is that Roundcube produces a corrupt PDF attachment, whether in the preview window or when downloaded. The only difference between the proper PDF and the one output by Roundcube is that the latter is missing all carriage returns (0D) and only has line feeds. What's also striking about this e-mail is that the PDF is embedded in the EML with quoted-printable encoding instead of base64. Nevertheless, Thunderbird and SnappyMail don't seem to have an issue parsing the attachment correctly.

Example of the encoded attachment

------=_NextPart_000_0021_01DB4C75.CE93A9F0
Content-Type: application/pdf;
	name="=?iso-8859-1?Q?Filename.PDF?="
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="=?iso-8859-1?Q?Filename.PDF?="

%PDF-1.7
%=E2=E3=CF=D3
%generated by XFRX 17.0.0002, (c) 2002-2016 www.eqeus.com
15 0 obj
<<
/Type /Page
/Parent 2 0 R
/MediaBox [0 0 595.274 841.889]

I don't want to share the e-mail because the PDF in it has sensitive data, but I could share it privately if necessary.

What browser(s) are you seeing the problem on?

Firefox

What version of PHP are you using?

v8.1

What version of Roundcube are you using?

v1.6.9

JavaScript errors

No response

PHP errors

No response

@alecpl
Member

alecpl commented Dec 12, 2024

Try $config['imap_disabled_caps'] = ['BINARY'];.

@ner00
Author

ner00 commented Dec 12, 2024

Try $config['imap_disabled_caps'] = ['BINARY'];.

That didn't change anything, unfortunately.

@alecpl
Member

alecpl commented Dec 12, 2024

We'll need a sample message. Indeed the problem might be in how we decode quoted-printable.

@ner00
Author

ner00 commented Dec 12, 2024

We'll need a sample message. Indeed the problem might be in how we decode quoted-printable.

Can I send it to you privately somehow? It's a company document, I wouldn't want to post it here.

@ner00
Author

ner00 commented Dec 13, 2024

It appears that the replacement regex might be the issue, at least that's the case with the e-mail I have:

$chunk = preg_replace('/[\t\r\0\x0B]+\n/', "\n", $chunk);

I can do either of these:

          $chunk = preg_replace('/[\t\0\x0B]+\n/', "\n", $chunk);
          $chunk = preg_replace('/[\t\r\0\x0B]+\n/', "\r\n", $chunk);

Not using preg_replace at all also worked for this example, although I'm not sure if any of these tests could create a regression in other cases.
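
To make the difference between the three patterns concrete, here is a minimal sketch that runs each of them over a made-up chunk with CRLF line endings and counts the surviving CR bytes. The sample data and the `count_cr()` helper are assumptions for illustration only; they are not taken from the Roundcube source or from the affected e-mail.

```php
<?php
// Hypothetical helper: number of CR (0x0D) bytes in a string.
function count_cr(string $s): int
{
    return substr_count($s, "\r");
}

// Made-up sample resembling decoded PDF content with CRLF line endings.
$chunk = "%PDF-1.7\r\n15 0 obj\r\n<<\r\n/Type /Page\r\n";

// Pattern quoted in the comment above: TAB/CR/NUL/VT runs before LF become a bare LF.
$current = preg_replace('/[\t\r\0\x0B]+\n/', "\n", $chunk);

// Variant A: CR removed from the character class, so CRLF is never matched.
$variantA = preg_replace('/[\t\0\x0B]+\n/', "\n", $chunk);

// Variant B: same class as the current pattern, but CRLF is written back.
$variantB = preg_replace('/[\t\r\0\x0B]+\n/', "\r\n", $chunk);

printf("input:     %d CR\n", count_cr($chunk));
printf("current:   %d CR\n", count_cr($current));
printf("variant A: %d CR\n", count_cr($variantA));
printf("variant B: %d CR\n", count_cr($variantB));

// The variants are not equivalent on mixed input: one line ends in TAB+CRLF,
// the other in TAB+LF.
$mixed  = "a\t\r\nb\t\n";
$mixedA = preg_replace('/[\t\0\x0B]+\n/', "\n", $mixed);     // first line untouched
$mixedB = preg_replace('/[\t\r\0\x0B]+\n/', "\r\n", $mixed); // both lines end in CRLF
echo json_encode($mixedA), "\n", json_encode($mixedB), "\n";
```

Both variants keep the CRs in the PDF-like sample, but they behave differently elsewhere: variant A no longer strips a TAB that sits in front of a CRLF, and variant B turns a TAB+LF ending into CRLF, adding a CR that was not in the input. Either difference could matter for text parts.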

@alecpl
Member

alecpl commented Dec 17, 2024

All right. Using quoted-printable for pdf files is unusual, but maybe not necessarily invalid. Therefore we should fix it, either by removing that replacement, or doing this only for text/* parts.
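
The second option, applying the replacement only to text/* parts, could look roughly like the sketch below. The function name `normalize_chunk()` and its `$mimetype` parameter are hypothetical; how Roundcube actually passes the part's content type to the decoding code is not shown in this thread.

```php
<?php
/**
 * Hypothetical helper (not Roundcube's actual API): collapse TAB/CR/NUL/VT
 * runs before LF only when the part is text/*.
 */
function normalize_chunk(string $chunk, string $mimetype): string
{
    // Case-insensitive prefix check, so "Text/Plain" also counts as text.
    if (stripos($mimetype, 'text/') === 0) {
        // preg_replace() returns null on error; fall back to the input then.
        return preg_replace('/[\t\r\0\x0B]+\n/', "\n", $chunk) ?? $chunk;
    }

    // Non-text parts (application/pdf and so on) pass through byte for byte.
    return $chunk;
}

$pdf  = "%PDF-1.7\r\n15 0 obj\r\n";
$text = "Hello \t\r\nWorld\r\n";

echo json_encode(normalize_chunk($pdf, 'application/pdf')), "\n";
echo json_encode(normalize_chunk($text, 'text/plain')), "\n";
```

With this approach binary attachments keep their CRLF sequences, while text parts are normalized exactly as they are today.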

@alecpl added this to the later milestone Dec 17, 2024

@ner00
Author

ner00 commented Dec 17, 2024

Using quoted-printable for pdf files is unusual

It sure is unusual. I tried to replicate it using the oldest client I had at hand (WinXP's Outlook Express), but even that one used base64 to encode the attachment.

I also ran grep over a bunch of EML files, from multiple mailboxes, for the string %PDF-1, and got matches in 224 e-mails out of a total of 356,360, all received in the past 12 months. Upon closer inspection, most of those had the PDF as Content-Transfer-Encoding: 7bit instead of Content-Transfer-Encoding: quoted-printable. As for the sender mail clients, they range from Outlook 2007 to Outlook 365.

What most quoted-printable PDFs have in common is that they were generated by EQEUS XFRX (a Visual FoxPro report library used by some old ERPs) and embedded as an attachment in a new e-mail (probably through MAPI). Also, not all PDFs are visibly corrupt when parsed by RC; it's hit or miss, depending on how much CRLF gets discarded.

@alecpl
Member

alecpl commented Dec 26, 2024

I looked back into the code changes. Back in the day it was just rtrim($line, "\t\r\0\x0B");. It got changed in 57a285d. So, it looks like it might be a fresh regression. I have not yet investigated whether an rtrim()-based solution would work; it might not.

@ner00
Author

ner00 commented Dec 30, 2024

Interesting. For the e-mail in question, as far as the attachment is concerned, $chunk = rtrim($chunk, "\t\r\0\x0B"); seems to work fine, although I'm unable to test a situation where it could fail.
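
As a rough illustration of where an rtrim()-based approach might differ, the sketch below shows that rtrim() with this character mask only touches the very end of the string, and does nothing when the string ends in LF because LF is not in the mask. The chunk boundaries are made up; whether Roundcube can ever hand over a chunk that ends between the CR and the LF is an assumption this thread does not settle.

```php
<?php
// Same mask as in the thread; double quotes make PHP expand the escapes to bytes.
$mask = "\t\r\0\x0B";

// Case 1: the chunk ends in CRLF. The last byte is LF, which is not in the
// mask, so rtrim() removes nothing and every CR survives.
$whole        = "%PDF-1.7\r\n15 0 obj\r\n";
$trimmedWhole = rtrim($whole, $mask);

// Case 2 (hypothetical): a chunk boundary falls between CR and LF.
$first  = "%PDF-1.7\r\n15 0 obj\r";
$second = "\n<<\r\n";
$joined = rtrim($first, $mask) . rtrim($second, $mask);

printf("case 1 unchanged: %s\n", $trimmedWhole === $whole ? 'yes' : 'no');
printf("case 2 CR count:  %d before, %d after\n",
    substr_count($first . $second, "\r"),
    substr_count($joined, "\r"));
```

In case 2 the CR at the end of the first chunk is dropped, so one CRLF becomes a bare LF after the chunks are joined. If chunks always end on a full line, this case never arises.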
