Error when accessing PdfDocument.AcroForm.Fields #213

Open
eZprava opened this issue Nov 22, 2024 · 3 comments
Labels
Cannot Reproduce https://xkcd.com/583/

Comments

@eZprava

eZprava commented Nov 22, 2024

I tried to load the attached file and iterate over pdf.AcroForm.Fields. At first I encountered an error:

No appropriate constructor found for type: PdfAcroFieldCollection at PdfSharp.Pdf.PdfDictionary.DictionaryElements.CreateArray(Type type, PdfArray oldArray)

So I added the constructor it was looking for:
public PdfAcroFieldCollection(PdfDocument document): base(document){ }

That error was "solved", but the next one appeared:
'Object already in table.' at PdfSharp.Pdf.Advanced.PdfCrossReferenceTable.Add(PdfObject value)

I then tried replacing the Add call with ObjectTable[value.ObjectID]=value.ReferenceNotNull;. After that there was no exception, but pdf.AcroForm.Fields was empty.

21a352e0-dd03-4855-a1e5-82fb3690493c.pdf

Steps to reproduce:
dotnet new console -n PdfSharpBug
cd PdfSharpBug
dotnet add package PdfSharp
code .

using System.Net;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.IO;

using (var ms = new MemoryStream(new WebClient().DownloadData("https://github.com/user-attachments/files/17873789/21a352e0-dd03-4855-a1e5-82fb3690493c.pdf")))
using (var pdf = PdfReader.Open(ms, PdfDocumentOpenMode.Import))
{
    foreach (PdfReference fieldReference in pdf.AcroForm.Fields)
    {
        Console.WriteLine(fieldReference.ToString());
    }
}

dotnet run

Observed result:
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
at PdfSharp.Pdf.PdfDictionary.DictionaryElements.CreateArray(Type type, PdfArray oldArray)
at PdfSharp.Pdf.PdfDictionary.DictionaryElements.GetValue(String key, VCF options)
at PdfSharp.Pdf.AcroForms.PdfAcroForm.get_Fields()
at Program.<Main>$(String[] args) in C:\git\PdfSharpBug\Program.cs:line 8

@ThomasHoevel ThomasHoevel added the Cannot Reproduce https://xkcd.com/583/ label Nov 25, 2024
@ThomasHoevel
Member

Please consider using the Issue Submission Template:
https://docs.pdfsharp.net/General/Issue-Reporting.html

@eZprava
Author

eZprava commented Nov 25, 2024

I added the steps needed to reproduce the NullReferenceException. I hope that helps.

@packdat

packdat commented Dec 15, 2024

When I added the attached document to my test files and ran my usual tests on it, I noticed this message in PDFsharp's log output:

Error [0]: Object '43 0' already exists in xref table’s references, referring to position 160538. The latter one referring to position 159770 is used. This should not occur. If somebody came here, please send us your PDF file so that we can fix it (issues (at) pdfsharp.net.

So there you have your document, as requested! 😉

I did some digging and found that object 43 0 is indeed defined twice in the document:

  • a PdfAcroForm that was added by an incremental update
  • an XRef-stream in the previous version of the document (before the update)

As the file is read from back to front, the AcroForm-reference is read first.
When the XRef-stream is read, the library detects an already existing reference with this ID and (wrongly) assumes that the existing one must also be an XRef-stream.
But in this case the existing reference is an AcroForm!
The library nonetheless overwrites the file-position of the AcroForm with the position of the XRef-stream.
When finally resolving the AcroForm-reference, the XRef-stream is read instead, resulting in an AcroForm with no fields.
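The sequence above can be sketched in a few lines of C#. This is only an illustration of the read order and of one possible conflict policy (keep the entry that was read first, i.e. the one from the newest revision). It is not PDFsharp's actual code, and it is not necessarily what the referenced commit does. The dictionary, the TryRegister helper and the "AcroForm"/"XRefStream" kind strings are hypothetical names. The two file positions are taken from the log message, and their assignment to the two objects is inferred from the description above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a cross-reference table:
// object number -> (file position, kind of object found there).
var xref = new Dictionary<int, (long Position, string Kind)>();

// Registers an entry unless the object number is already known.
// Because the file is read from back to front, an existing entry
// comes from a newer revision, so this sketch keeps it instead of
// overwriting its position.
bool TryRegister(int objectNumber, long position, string kind)
{
    if (xref.TryGetValue(objectNumber, out var existing))
    {
        Console.WriteLine(
            $"Object {objectNumber} already registered as {existing.Kind} " +
            $"at {existing.Position}; ignoring {kind} at {position}.");
        return false;
    }

    xref[objectNumber] = (position, kind);
    return true;
}

// Read first: the AcroForm added by the incremental update (assumed position).
bool first = TryRegister(43, 160538, "AcroForm");

// Read later: the XRef-stream of the previous revision with the same number.
bool second = TryRegister(43, 159770, "XRefStream");

Console.WriteLine($"Object 43 resolves to {xref[43].Kind} at {xref[43].Position}");
```

With this policy the reference to object 43 still points at the AcroForm after both entries have been processed. The behavior described above corresponds to unconditionally assigning the second position instead.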

I would say this issue is twofold:

  • the software used to apply the incremental update assigned an incorrect object ID
  • PDFsharp making wrong assumptions

packdat added a commit to packdat/PDFsharp-net6 that referenced this issue Dec 15, 2024