The PowerPoint file text extraction leaves a lot to be desired. It's a little oversimplified and doesn't find text that isn't directly in the `ppt/slides/` directory. Should it do this?
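A minimal sketch of widening the filter beyond `ppt/slides/`, assuming the zip entry names have already been listed (for example with a zip-reading crate). `is_text_part` and `TEXT_FOLDERS` are hypothetical names, and the folder list is only the set of names PowerPoint usually writes. Parts are really located through the `_rels` relationship files, so a robust extractor would eventually follow those instead of hard-coding paths.

```rust
/// Folders under `ppt/` that commonly hold text-bearing XML parts.
/// Assumption: these are the names PowerPoint usually writes; the spec
/// locates parts via relationships, so treat this list as a heuristic.
const TEXT_FOLDERS: &[&str] = &[
    "slides",
    "slideLayouts",
    "slideMasters",
    "notesSlides",
    "notesMasters",
    "diagrams",
    "charts",
];

/// Hypothetical helper: returns true if a zip entry name looks like an
/// XML part that may contain slide text.
fn is_text_part(name: &str) -> bool {
    // Only consider parts inside the `ppt/` folder.
    let Some(rest) = name.strip_prefix("ppt/") else {
        return false;
    };
    // Skip non-XML media and relationship files.
    if !rest.ends_with(".xml") || rest.contains("_rels/") {
        return false;
    }
    // The first path segment after `ppt/` names the folder.
    rest.split('/')
        .next()
        .map_or(false, |folder| TEXT_FOLDERS.contains(&folder))
}

fn main() {
    let entries = [
        "[Content_Types].xml",
        "ppt/presentation.xml",
        "ppt/slides/slide1.xml",
        "ppt/slides/_rels/slide1.xml.rels",
        "ppt/slideMasters/slideMaster1.xml",
        "ppt/notesSlides/notesSlide1.xml",
        "ppt/media/image1.png",
    ];
    let picked: Vec<&str> = entries.iter().copied().filter(|n| is_text_part(n)).collect();
    // Prints ["ppt/slides/slide1.xml", "ppt/slideMasters/slideMaster1.xml", "ppt/notesSlides/notesSlide1.xml"]
    println!("{:?}", picked);
}
```

Keeping the check as a pure function of the entry name makes it easy to grow the list one folder at a time as new cases turn up.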
Thanks for opening this issue! Yeah, I agree that it should find text anywhere on the slides. I also found an example of a slide with text directly on it that the text extraction feature doesn't find: test4.pptx
I don't know how to go about this, though. I'm an even bigger noob, having just learned Rust this spring semester in one of my classes. My guess is that we need to figure out everywhere text can legally appear in a PPTX file. It seems really daunting, though.
Yeah, the Office Open XML specification is absurd. Whoever works on the PowerPoint extractor would probably have to read the actual PresentationML documentation to really figure it all out.
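Much of the visible text in slides, layouts, masters, notes, tables and SmartArt is stored in DrawingML text runs (`<a:t>` elements), so one incremental step is to pull every `<a:t>` out of each candidate part rather than walking specific shape types. The sketch below is a naive string scan, not a real XML parser: it assumes the conventional `a:` prefix, ignores namespaces, and decodes only the five predefined XML entities. `extract_runs` and `unescape` are hypothetical helper names.

```rust
/// Hypothetical helper: collects the text of every DrawingML run
/// (`<a:t>...</a:t>`) in one XML part. Naive string scan, not a real XML
/// parser: it assumes the `a:` prefix and does no namespace handling.
fn extract_runs(xml: &str) -> Vec<String> {
    let mut runs = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<a:t") {
        rest = &rest[start + 4..];
        // The tag name must end here; this skips look-alikes such as
        // `<a:tbl>` and `<a:tc>`, and the empty `<a:t/>`.
        if !rest.starts_with(|c: char| c == '>' || c.is_whitespace()) {
            continue;
        }
        // Move past the end of the opening tag, which may carry attributes.
        let Some(gt) = rest.find('>') else { break };
        let self_closing = rest[..gt].ends_with('/');
        rest = &rest[gt + 1..];
        if self_closing {
            continue; // `<a:t />` holds no text
        }
        // Everything up to the closing tag is the run's text.
        let Some(end) = rest.find("</a:t>") else { break };
        runs.push(unescape(&rest[..end]));
        rest = &rest[end + "</a:t>".len()..];
    }
    runs
}

/// Decodes only the five predefined XML entities; numeric references such
/// as `&#8217;` are left untouched. `&amp;` is replaced last so that
/// `&amp;lt;` becomes `&lt;` rather than `<`.
fn unescape(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        .replace("&amp;", "&")
}

fn main() {
    let part = r#"<p:txBody><a:p><a:r><a:rPr lang="en-US"/><a:t>Q&amp;A</a:t></a:r><a:r><a:t> time</a:t></a:r></a:p></p:txBody>"#;
    // Prints ["Q&A", " time"]
    println!("{:?}", extract_runs(part));
}
```

Once the list of elements to handle grows, a proper streaming XML reader (for example the `quick-xml` crate) would be a sturdier base than string scanning.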
Realistically, the extractor will just have to be improved incrementally as the crate gets better and better at parsing the format.