
Support partial updates #63

Open
codetheweb opened this issue Jun 13, 2021 · 4 comments

Comments

@codetheweb
Contributor

My goal is to have a folder of auto-updating ebooks (cron job). I saw that there's a --cache flag, but even running with the cache on there's a lot of unnecessary processing for any download after the initial one. Would it be possible to add some kind of partial update mode, where if an .epub already exists it checks for and downloads just 1-2 chapters instead of re-assembling the whole book?

I would be happy to add this myself, just want to hear your thoughts and where to begin implementing this.

(Also, you should set up GitHub Sponsors / Buy Me a Coffee or something. I'd be happy to throw a few bucks your way, and I'm sure other folks would too. 😄)

@mathiasfoster

+1

@kemayo
Owner

kemayo commented Jan 2, 2022

I have been holding off on this because I think it's sort of complicated. To brain-dump what I think the complexities are:

  • I'd need to write new code to parse an existing epub. This isn't hard, but it's something I don't cover currently.
    • This isn't just getting the chapter list out -- there's also book-level metadata that needs to be rebuilt.
    • Specifically I'm thinking of footnotes as being a pain. They're currently built with ids just based on the number of footnotes found so far; that'd probably need to be changed to a GUID system like many other ids, and then we could just append new footnotes onto the end of an existing file.
  • I'd maybe need to write new code to alter the existing epub file rather than creating it from scratch. Alternately, document and warn that any edits you've made to metadata we don't explicitly handle will be overwritten.
  • Some sites wouldn't be compatible with this. A site that's using the crawler method (i.e. following the next-chapter links through a story, rather than a table of contents) won't be able to pick up where it left off, unless we store more metadata in the ebooks to cover this case.
  • We'd need to decide how to handle matching up chapters on the server and locally. Do we just trust the numbering, or do we match on something more specific?
    • If the former, we'd have trouble when a chapter is deleted (e.g. some people put up placeholder chapters to announce delays/hiatuses).
    • If the latter, we'd have to decide what to do when a chapter vanishes from the server -- do we delete it from the local copy?
  • Probably an edge case, but this wouldn't fetch edits to an existing chapter.
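As a rough illustration of the first bullet, here is a minimal sketch of reading the chapter list back out of an existing EPUB, which a partial-update mode would need as a starting point. This is not leech's actual code; the function name and structure are made up, and it only handles the happy path (a well-formed container.xml and package document).

```python
# Hypothetical sketch: recover the reading-order chapter list from an
# existing .epub file. EPUBs are zip archives; META-INF/container.xml
# points at the package document (.opf), whose <spine> lists chapters.
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def existing_chapters(epub_path):
    """Return the hrefs of spine items in an existing .epub, in order."""
    with zipfile.ZipFile(epub_path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_name = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_name))
        # Map manifest ids to hrefs, then walk the spine in reading order.
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall(".//opf:manifest/opf:item", OPF_NS)
        }
        return [
            manifest[ref.get("idref")]
            for ref in opf.findall(".//opf:spine/opf:itemref", OPF_NS)
        ]
```

Note that this only recovers the chapter list; as the bullets above point out, book-level metadata and footnote ids would still need to be rebuilt separately.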

@codetheweb
Contributor Author

Yeah, after opening this I realized it would probably be a lot more complicated than I thought at first. I think using some kind of intermediate storage like an SQLite database or something might work better than trying to read back data from the generated epub.

Given that I really only need to update books once a day at most, just using the cache and scraping directly from the web works well enough for now.
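To make the intermediate-storage idea concrete, here is a minimal sketch of tracking already-fetched chapters in a SQLite table, so a cron run only downloads what's new. Everything here is hypothetical (table, column, and function names are made up), and it sidesteps the chapter-matching questions raised above by trusting whatever chapter id the site provides.

```python
# Hypothetical sketch: a tiny SQLite store recording which chapters of
# each story have already been fetched, so repeat runs can skip them.
import sqlite3

def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS chapters (
               story_url  TEXT,
               chapter_id TEXT,
               title      TEXT,
               PRIMARY KEY (story_url, chapter_id)
           )"""
    )
    return db

def new_chapters(db, story_url, server_chapters):
    """Given (chapter_id, title) pairs currently listed on the server,
    return the ones not yet stored locally, and record them."""
    seen = {
        row[0]
        for row in db.execute(
            "SELECT chapter_id FROM chapters WHERE story_url = ?", (story_url,)
        )
    }
    fresh = [(cid, title) for cid, title in server_chapters if cid not in seen]
    db.executemany(
        "INSERT OR IGNORE INTO chapters VALUES (?, ?, ?)",
        [(story_url, cid, title) for cid, title in fresh],
    )
    db.commit()
    return fresh
```

This handles the "new chapter appeared" case only; deleted or edited chapters (the edge cases discussed above) would still need a policy decision.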

@mathiasfoster

I've put together a scraper that works off RSS feeds — it downloads the content, turns it into a MOBI, and emails it to my Kindle.
Still needs a bit of work before it's ready to be open-sourced, unfortunately!

From a user perspective (if this were ever integrated into leech), for my use case it would make more sense for each new chapter to be converted into its own EPUB, rather than altering the combined EPUB to integrate the new chapter.
