TODO.md

PAVÉS 0.1.x

  • Reimplement PLAYA LayoutDict here using the lazy API
  • Implement pdfminer.six LTTextBox compatibility and support the projects that depend on it:
    • sycamore
    • unstructured.io
    • OCRmyPDF
  • Implement and demonstrate CRF-based text extraction and segmentation
    • Find the friendliest and least dependency-encumbered Python CRF package
  • Implement pypdfium2-based visualization