- implement CMap parsing for Encoding CMaps
- add "default" as a synonym of badly-named "user" space
- update `pdfplumber` branch and run `pdfplumber` tests in CI
  - reimplement on top of ContentObject
- Fix ToUnicode CMaps for CID fonts (file bug against pdfminer)
- `decode_text` is remarkably slow
- `render_char` and `render_string` are also quite slow
- add something in between `chars` and full bbox for TextObject (what do you actually need for heuristic or model-based extraction? probably just `adv`?)
- remove the rest of the meaningless abuses of `cast`
- document how to transform bbox attributes on StructElement, Destination, etc (but you should just use "default" space)
- deprecate LayoutDict
- make the structure tree lazy
- support ExtGState (submit PR to pdfminer)
- better API for document outline, destinations, links, etc
- test coverage and more test coverage
- support matching ActualText to text objects when possible
  - if the text object is a single MCS (LibreOffice will do this)