This repository has been archived by the owner on Jan 20, 2021. It is now read-only.

Spreadsheet extractor: failing case #42

Closed
jazzido opened this issue Dec 7, 2013 · 4 comments


@jazzido
Contributor

jazzido commented Dec 7, 2013

https://www.dropbox.com/s/0i6ae5kgtcy0frb/s-013163.pdf

There seem to be invisible lines, or lines close together.

@ghost ghost assigned jeremybmerrill Dec 7, 2013
@jeremybmerrill
Member

There are three little L-shaped things, two pixels to a side, at various rotations on the table on page 5. These appear to cause two problems:

  1. These throw off the fancy computational geometry, so we end up with a too-small cell. In the diagram below, we get only a and not something like b (or both, or something). (The content of b is totally ignored right now.)

      ___________
     |_a_|   b   |
     |___________|

  2. These throw off add_merged_cells! for some reason that I'm investigating. It's not a case I had really considered, I guess, since Excel doesn't allow you to make cells like that.

I don't have a good way to measure width (until #40), but I do wonder if the lines' width is 2px, so the extra lines are "invisible".

In the short term, excluding those lines makes everything work.

@jazzido
Contributor Author

jazzido commented Dec 9, 2013

I started this branch a few days ago, with the idea of using the maximum character size as a cut-off value for what a usable line would be.

The rationale is that a ruling line shouldn't be shorter/narrower than the smallest character in the area of interest.
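
To make the idea concrete, here is a minimal sketch of such a cut-off. `Ruling`, `Char`, and `usable_rulings` are hypothetical stand-ins, not the extractor's actual classes, and the sketch assumes the cut-off is the smallest character dimension in the area, following the rationale above:

```ruby
# Hypothetical stand-ins for illustration; not the extractor's real classes.
Ruling = Struct.new(:x1, :y1, :x2, :y2) do
  def length
    Math.hypot(x2 - x1, y2 - y1)
  end
end

Char = Struct.new(:width, :height)

# Keep only rulings at least as long as the smallest character dimension.
# With no characters there is nothing to compare against, so keep everything.
def usable_rulings(rulings, chars)
  return rulings if chars.empty?
  cutoff = chars.flat_map { |c| [c.width, c.height] }.min
  rulings.select { |r| r.length >= cutoff }
end

chars = [Char.new(5.0, 9.0), Char.new(4.0, 8.0)]
rulings = [
  Ruling.new(0, 0, 200, 0), # a real table border
  Ruling.new(50, 0, 52, 0)  # one leg of a 2-unit L-shaped mark
]
kept = usable_rulings(rulings, chars)
```

Here the 2-unit leg is shorter than the smallest character dimension (4.0) and is dropped, while the border survives.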

I don't know if that would work. Ideas?

@jeremybmerrill
Member

Sounds plausible, for sure. We'd want to discard them after collapse_.*_lines, in case small lines make up what appear to be a larger line.
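
A rough sketch of why the order matters, using horizontal segments only. `HSeg`, `collapse_horizontal`, and `discard_short` are made-up names for illustration and the merge logic is an assumption, not what `collapse_.*_lines` actually does:

```ruby
# Hypothetical horizontal segment: a y position plus left/right extent.
HSeg = Struct.new(:y, :left, :right) do
  def length
    right - left
  end
end

# Merge segments on the same y that touch or overlap (within a tolerance).
def collapse_horizontal(segs, tolerance = 0.5)
  segs.group_by(&:y).flat_map do |y, row|
    row.sort_by(&:left).each_with_object([]) do |seg, merged|
      last = merged.last
      if last && seg.left <= last.right + tolerance
        last.right = [last.right, seg.right].max
      else
        merged << HSeg.new(y, seg.left, seg.right)
      end
    end
  end
end

# Drop segments shorter than the cut-off.
def discard_short(segs, cutoff)
  segs.select { |s| s.length >= cutoff }
end

segs = [
  HSeg.new(10, 0, 3), HSeg.new(10, 3, 6), HSeg.new(10, 6, 9), # pieces of one border
  HSeg.new(20, 40, 42)                                        # stray 2-unit mark
]
cutoff = 4

filter_first   = collapse_horizontal(discard_short(segs, cutoff))
collapse_first = discard_short(collapse_horizontal(segs), cutoff)
```

Filtering first throws away every piece of the border, since each piece is shorter than the cut-off on its own. Collapsing first joins the pieces into one 9-unit line that passes the cut-off, and only the stray mark is discarded.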

@jazzido
Contributor Author

jazzido commented Dec 18, 2013

Works now.

@jazzido jazzido closed this as completed Dec 18, 2013