- Reimplement PLAYA LayoutDict here using lazy API
- Implement pdfminer.six LTTextBox compatibility and support its
  "customers"
  - sycamore
  - unstructured.io
  - OCRmyPDF
- Implement and demonstrate CRF-based text extraction and
  segmentation
  - Find the friendliest and least dependency-encumbered Python CRF package
- Implement pypdfium2-based visualization