Add in .csv version of some of the largest pdfs. #12
Comments
Hmmm. On reflection, one easier solution might be to take the entries that we rejected from the OCR because they were too large and manually check whether they would in fact OCR. I'd guess that most of the really large files are large because they are images, dirty faxes, etc. But a small minority would be the SumOfUs, ColorofChange and CREDO petitions. (I don't have CREDO's .csv.) Then we could just let the OCR run on those huge files.
On Sat, Jun 29, 2013 at 08:29:12AM -0700, Peter Wagner wrote:
There are about 80 documents that failed to OCR for various reasons. I'll need
Depending on its structure, we may be able to do it. Could you send me a copy -Gyepi
On Sat, Jun 29, 2013 at 08:45:00AM -0700, Peter Wagner wrote:
Yes that would work. Unfortunately, we're not rejecting any documents; they -Gyepi
I'll send these files to Gyepi via email.
The SumOfUs and ColorofChange filings were 1000+ pages with ~50,000 comments. They may have been too large to OCR, since they aren't showing up in the search.
However, I have these in .csv format and can separate the redundant comments from the ~5,000 original ones. That said, I'm not sure how to link that back to the individual pages in our system...
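For reference, here is a minimal sketch of what filtering the redundant comments out of a .csv could look like. It is not the filter actually used on these filings: the column names `name` and `comment` and the sample rows are hypothetical, and it treats two comments as redundant only when they match after lowercasing and collapsing whitespace.

```python
import csv
import io
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unique_comments(rows, column="comment"):
    """Yield only rows whose normalized comment text has not been seen yet.

    `column` is a hypothetical header name; adjust it to the real export.
    """
    seen = set()
    for row in rows:
        key = normalize(row.get(column) or "")
        if key and key not in seen:
            seen.add(key)
            yield row


# Tiny in-memory stand-in for a petition export (made-up rows).
sample = io.StringIO(
    "name,comment\n"
    "A,I support this petition.\n"
    "B,i support   this petition.\n"
    "C,Here is my own story.\n"
)

originals = list(unique_comments(csv.DictReader(sample)))
print([row["name"] for row in originals])
```

This keeps the first copy of each distinct comment and drops later repeats. It does not address linking the rows back to individual pages, which would need a page number or similar field in the .csv.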