PDF image file processing issue: bad RST marker error #15

Closed
gunnsth opened this issue Aug 3, 2017 · 8 comments
Labels
bug (Something isn't working), encoding/image

Comments

@gunnsth
Contributor

gunnsth commented Aug 3, 2017

Reported by Peter Williams:

I get an invalid JPEG format: bad RST marker error on this

This looks like an error in the Go JPEG library.

MFP Scan.pdf

@gunnsth
Contributor Author

gunnsth commented Apr 18, 2019

Confirmed that this is still an issue in Go 1.12.4.
Potentially related to golang/go#28717,
which has a proposed fix and has been marked for the Go 1.13 milestone.

@gunnsth gunnsth transferred this issue from unidoc/unidoc May 23, 2019
@borud

borud commented Aug 12, 2019

The Go project is in no particular hurry to fix this. Would it be possible to make a workaround? For instance, could you add a version of ExtractPageImages() that doesn't decode the images, so that the developer can pass the data on to a JPEG decoder that actually works?

@peterwilliams97
Contributor

Coincidentally, I am away from PDF this month working on image processing. Which Go JPEG decoder(s) do you recommend?

@borud

borud commented Aug 13, 2019

In my current situation: any decoder that would do the job :-) I don't have a particular preference.

The reason I think extracting the raw data might be a good way to do this is so the user can choose what to do. Ideally you would always want to use the standard library rather than add another dependency to maintain. But I've seen very old bug reports on the RST marker issue, and as I mentioned, it doesn't look like anyone is in any hurry to fix it in the standard library.

By adding the ability to extract images as just raw data you would broaden the API, but on the other hand, you wouldn't have to choose between JPEG decoders - potentially having to add another dependency.
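
A minimal sketch of that idea, using only the standard library: try image/jpeg first and keep the untouched bytes so the caller can pass them to a decoder of their choice when decoding fails. The names extracted, decodeOrRaw, isFormatError and tinyJPEG are hypothetical and exist only for illustration; none of them is part of the library discussed in this issue.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"image"
	"image/jpeg"
)

// extracted is a hypothetical result type for this sketch: Img is set when
// decoding succeeded, and Raw always holds the untouched encoded bytes.
type extracted struct {
	Img image.Image
	Raw []byte
	Err error
}

// decodeOrRaw tries the standard library decoder and keeps the raw bytes
// so the caller can hand them to another decoder when Err is non-nil.
func decodeOrRaw(data []byte) extracted {
	img, err := jpeg.Decode(bytes.NewReader(data))
	if err != nil {
		return extracted{Raw: data, Err: err}
	}
	return extracted{Img: img, Raw: data}
}

// isFormatError reports whether err is a jpeg.FormatError, the error type
// behind messages such as "invalid JPEG format: bad RST marker".
func isFormatError(err error) bool {
	var fe jpeg.FormatError
	return errors.As(err, &fe)
}

// tinyJPEG encodes a blank 2x2 image so the sketch has valid input to try.
func tinyJPEG() []byte {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, image.NewRGBA(image.Rect(0, 0, 2, 2)), nil); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	for _, data := range [][]byte{tinyJPEG(), []byte("not a jpeg")} {
		res := decodeOrRaw(data)
		fmt.Printf("decoded=%v rawBytes=%d formatError=%v\n",
			res.Img != nil, len(res.Raw), isFormatError(res.Err))
	}
}
```

With this shape the API grows by one result type, but no second JPEG decoder has to be chosen or vendored on the library side.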

If it is an easy fix one might be able to detect the problem and repair the JPEG prior to parsing it, but I haven't investigated this. It might take longer to do?
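
As a starting point for the detection half, here is a rough diagnostic sketch. It lists the restart (RST) marker numbers found in the first scan of a JPEG and reports the first one that breaks the expected 0, 1, ..., 7, 0, ... cycle. It does not repair anything and does not reproduce the checks inside image/jpeg. It assumes every header segment before SOS carries a length field and looks only at the first scan. The function names rstSequence and firstOutOfOrder are hypothetical, and the input in main is a synthetic byte sequence, not a real image.

```go
package main

import (
	"errors"
	"fmt"
)

// rstSequence returns the restart marker numbers (0-7) found in the first
// scan of a JPEG byte stream, in order of appearance.
func rstSequence(data []byte) ([]int, error) {
	if len(data) < 2 || data[0] != 0xFF || data[1] != 0xD8 {
		return nil, errors.New("missing SOI marker")
	}
	i := 2
	// Skip length-prefixed segments up to and including the SOS header.
	for {
		if i+4 > len(data) || data[i] != 0xFF {
			return nil, errors.New("malformed or truncated segment header")
		}
		marker := data[i+1]
		segLen := int(data[i+2])<<8 | int(data[i+3])
		if segLen < 2 {
			return nil, errors.New("invalid segment length")
		}
		i += 2 + segLen
		if marker == 0xDA { // SOS: entropy-coded data follows
			break
		}
	}
	if i > len(data) {
		return nil, errors.New("truncated SOS header")
	}
	var seq []int
	for i+1 < len(data) {
		if data[i] != 0xFF {
			i++
			continue
		}
		next := data[i+1]
		switch {
		case next == 0x00: // stuffed byte: a literal 0xFF in the data
			i += 2
		case next == 0xFF: // fill byte before a marker
			i++
		case next >= 0xD0 && next <= 0xD7: // RST0..RST7
			seq = append(seq, int(next-0xD0))
			i += 2
		default: // any other marker ends the scan
			return seq, nil
		}
	}
	return seq, nil
}

// firstOutOfOrder returns the index of the first marker that does not
// follow the 0..7 cycle, or -1 if the whole sequence is in order.
func firstOutOfOrder(seq []int) int {
	for i, n := range seq {
		if n != i%8 {
			return i
		}
	}
	return -1
}

func main() {
	// Synthetic stream: SOI, an empty SOS header, data with RST0 and RST1, EOI.
	data := []byte{0xFF, 0xD8, 0xFF, 0xDA, 0x00, 0x02,
		0x01, 0xFF, 0xD0, 0x02, 0xFF, 0xD1, 0xFF, 0xD9}
	seq, err := rstSequence(data)
	fmt.Println(seq, firstOutOfOrder(seq), err)
}
```

Running this over the bytes of a problematic image would at least show whether the markers themselves are out of sequence or whether the decoder is tripping over something else near them.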

@gunnsth
Contributor Author

gunnsth commented Aug 18, 2019

@borud There is already a way to change the image handling: set model.ImageHandling in the model package (see also model.SetImageHandler) to a handler that implements the model.ImageHandler interface. If you find a way to address this, please submit a PR.
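
To illustrate the general pattern of a swappable decoder, here is a self-contained sketch. It deliberately does not use model.ImageHandler, whose method set is not shown in this thread: jpegDecoder, stdlibDecoder, stubDecoder and chain are hypothetical names, and stubDecoder only stands in for whatever alternative decoder one would actually plug in.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"image"
	"image/jpeg"
)

// jpegDecoder is a hypothetical interface for this sketch; it is not the
// library's model.ImageHandler.
type jpegDecoder interface {
	Decode(data []byte) (image.Image, error)
}

// stdlibDecoder decodes with the standard library's image/jpeg package.
type stdlibDecoder struct{}

func (stdlibDecoder) Decode(data []byte) (image.Image, error) {
	return jpeg.Decode(bytes.NewReader(data))
}

// stubDecoder stands in for a third-party or cgo-backed decoder. It returns
// a fixed 1x1 image so the sketch runs without extra dependencies.
type stubDecoder struct{}

func (stubDecoder) Decode(data []byte) (image.Image, error) {
	if len(data) == 0 {
		return nil, errors.New("empty input")
	}
	return image.NewGray(image.Rect(0, 0, 1, 1)), nil
}

// chain tries each decoder in order and returns the first success, or the
// last error seen if every decoder fails.
type chain []jpegDecoder

func (c chain) Decode(data []byte) (image.Image, error) {
	err := errors.New("no decoders configured")
	for _, d := range c {
		img, derr := d.Decode(data)
		if derr == nil {
			return img, nil
		}
		err = derr
	}
	return nil, err
}

func main() {
	var dec jpegDecoder = chain{stdlibDecoder{}, stubDecoder{}}
	img, err := dec.Decode([]byte("bytes the stdlib decoder rejects"))
	if err != nil {
		fmt.Println("all decoders failed:", err)
		return
	}
	fmt.Println("decoded bounds:", img.Bounds())
}
```

A real handler would have to satisfy the library's own interface; the chain above only shows how a fallback could sit behind it.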

We may at some point consider addressing those image handling issues; however, we have not really had any complaints from our customers so far. We designed the image handling around an interface because we felt the performance of the standard library is rather poor in many cases compared with some C libraries.

It might also be a good idea to extract the problematic images and submit issues on the golang repository to encourage getting those issues addressed.

@gunnsth
Contributor Author

gunnsth commented Jun 2, 2020

Might be fixed in Go 1.15? Issue golang/go#28717 has been closed.

@gunnsth gunnsth added the bug and encoding/image labels Jun 2, 2020
@3ace

3ace commented Aug 6, 2024

@gunnsth This issue is still not fixed as of Go 1.22.5, but it should be fixed in Go 1.23: I tested with 1.23rc2 and it worked fine.

This is the output of using pdf_extract_images.go:

using Go 1.22.5

Input file: MFP.Scan.pdf
PDF Num Pages: 1
-----
Page 1:
Error: invalid JPEG format: bad RST marker
exit status 1

using Go 1.23rc2

Input file: MFP.Scan.pdf
PDF Num Pages: 1
-----
Page 1:
1 Images
Image 1 - X: 4.80 Y: 3.54, Width: 852.48, Height: 587.52
Total: 1 images

Here's the extraction result mfp.zip

@gunnsth
Contributor Author

gunnsth commented Aug 6, 2024

OK great, probably related to golang/go#40130 which appears fixed in 1.23
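
For anyone who wants to warn users on older toolchains, a small sketch that checks the running Go version against 1.23. The 1.23 threshold is taken from the test results in this thread, not from a guarantee that every "bad RST marker" case is resolved there. The names goMinor and predatesGo123 are hypothetical, and version strings that do not look like "go1.N..." (for example development builds) are treated as unknown.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// goMinor extracts N from version strings shaped like "go1.N", "go1.N.P"
// or "go1.NrcX". ok is false for any other shape.
func goMinor(version string) (minor int, ok bool) {
	const prefix = "go1."
	if !strings.HasPrefix(version, prefix) {
		return 0, false
	}
	rest := version[len(prefix):]
	n := 0
	for n < len(rest) && rest[n] >= '0' && rest[n] <= '9' {
		n++
	}
	if n == 0 {
		return 0, false
	}
	m, err := strconv.Atoi(rest[:n])
	if err != nil {
		return 0, false
	}
	return m, true
}

// predatesGo123 reports whether version is a recognizable Go 1.x release
// older than 1.23. Unrecognized versions yield false.
func predatesGo123(version string) bool {
	minor, ok := goMinor(version)
	return ok && minor < 23
}

func main() {
	v := runtime.Version()
	if predatesGo123(v) {
		fmt.Println(v, "predates Go 1.23; some JPEGs may fail with a bad RST marker error")
	} else {
		fmt.Println(v, "is Go 1.23 or later, or could not be parsed")
	}
}
```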

@gunnsth gunnsth closed this as completed Aug 7, 2024

4 participants