Hi jcushman!
I am a freshman from Hong Kong, currently trying to find a way to read tables from a PDF and work with their data.
I ran the following code on the attached PDF and obtained the results stored in the .txt file, which I have also attached.
import pdfquery

# Load the PDF and dump the parsed element tree as XML
pdf = pdfquery.PDFQuery('Amazon_CF.pdf')
pdf.load()
pdf.tree.write('test.xml', pretty_print=True)
My questions are:
How is the index determined? The index order does not appear to follow line-by-line order.
Are there any methods to rearrange the index, preferably in line-by-line, left-to-right order?
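The ordering asked about here could be sketched as follows. This is only an illustration of the desired result, not pdfquery's own behaviour: it assumes each text element carries `x0` (left edge) and `y1` (top edge) coordinates, as the bounding-box attributes in the XML dump suggest. The `Box` record, the `reading_order` helper, the `line_tolerance` parameter, and the sample values are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one text element: its text, left edge (x0)
# and top edge (y1). With pdfquery these would have to be read from
# each element's attributes and converted to float (an assumption).
Box = namedtuple("Box", ["text", "x0", "y1"])


def reading_order(boxes, line_tolerance=2.0):
    """Return boxes sorted line by line (top to bottom), then left to right."""
    # PDF coordinates start at the bottom-left corner, so a larger y1
    # means the box sits higher on the page.
    by_height = sorted(boxes, key=lambda b: -b.y1)

    lines = []
    for box in by_height:
        # Treat the box as part of the current line when its top edge is
        # within line_tolerance of the first box placed on that line.
        if lines and abs(lines[-1][0].y1 - box.y1) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])

    # Within each line, order the boxes by their left edge.
    return [b for line in lines for b in sorted(line, key=lambda b: b.x0)]


# Made-up sample data, deliberately out of order.
boxes = [
    Box("col B", 400.0, 700.5),
    Box("row 1", 50.0, 650.0),
    Box("col A", 300.0, 700.0),
    Box("1A", 300.0, 650.4),
    Box("header", 50.0, 699.8),
    Box("1B", 400.0, 649.9),
]
ordered = [b.text for b in reading_order(boxes)]
print(ordered)
```

The tolerance exists because boxes on the same visual line rarely share exactly the same top coordinate; a suitable value would depend on the font size used in the document.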
Hopefully my explanation is clear enough.
Any help would be greatly appreciated!
Cheers,
Simon
SalmonTT changed the title from "How the pdfquery determine the index?" to "How does pdfquery determine the index?" on Jun 13, 2018
Amazon_CF.pdf
Amazon.txt