Fix for Issue #2 #89

Closed
wants to merge 17 commits into from

Contributor

@Jemeni11 Jemeni11 commented Feb 22, 2023

EDIT: Fix for Issue #2
Here's a snippet of the new README


Images support

Leech creates EPUB 2.0.1 files, which means that Leech can only save images in the following
formats:

  • JPEG (JPG/JFIF)
  • PNG
  • GIF

See the Open Publication Structure (OPS) 2.0.1 for more information.

Leech cannot save images in SVG because it is not supported by Pillow.

Leech uses Pillow for image manipulation and conversion. To use a different
image format, install the Pillow dependencies that format requires; you will probably also have to tinker with Leech itself. See the Pillow documentation for more information.

By default, Leech will try to save all non-animated images as JPEG.
The only animated images that Leech will save are GIFs.

To configure image support, you will need to create a file called leech.json. See the section below for more information.

Configuration

A very small amount of configuration is possible by creating a file called leech.json in the project directory. Currently you can define login information for sites that support it, and some options for book covers.

Example:

{
    "logins": {
        "QuestionableQuesting": ["username", "password"]
    },
    "images": true,
    "image_format": "png",
    "compress_images": true,
    "max_image_size": 100000,
    "cover": {
        "fontname": "Comic Sans MS",
        "fontsize": 30,
        "bgcolor": [20, 120, 20],
        "textcolor": [180, 20, 180],
        "cover_url": "https://website.com/image.png"
    },
    "output_dir": "/tmp/ebooks",
    "site_options": {
        "RoyalRoad": {
            "output_dir": "/tmp/litrpg_isekai_trash"
        }
    }
}

Note: The images key is a boolean and can only be true or false. Booleans in JSON are written in lowercase.
If it is false, Leech will not download any images.
Leech will also ignore the image_format key if images is false.

Note: If the image_format key does not exist, Leech will default to jpeg.
The three image formats are jpeg, png, and gif. The image_format key is case-insensitive.

Note: The compress_images key tells Leech to compress images. This is only supported for jpeg and png images.
This also goes hand-in-hand with the max_image_size key. If the compress_images key is true but there's no max_image_size key,
Leech will compress the image to a size less than 1MB (1000000 bytes). If the max_image_size key is present, Leech will compress the image
to a size less than the value of the max_image_size key. The max_image_size key is in bytes.
If compress_images is false, Leech will ignore the max_image_size key.
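
The rules in the notes above can be summarised in a short sketch. This is not Leech's actual code: `resolve_image_options` and the constant names are hypothetical, and treating a missing `images` key as false is an assumption, since the README snippet does not say what happens in that case.

```python
import json

DEFAULT_MAX_IMAGE_SIZE = 1_000_000  # bytes; the 1 MB fallback described in the notes
SUPPORTED_FORMATS = {"jpeg", "png", "gif"}


def resolve_image_options(config: dict) -> dict:
    """Hypothetical helper: apply the documented defaults to a parsed leech.json."""
    # Assumption: a missing "images" key is treated like false.
    if not config.get("images", False):
        # image_format is ignored entirely when images is false.
        return {"images": False}

    # image_format defaults to jpeg and is case-insensitive.
    image_format = str(config.get("image_format", "jpeg")).lower()
    if image_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image_format: {image_format!r}")

    compress = bool(config.get("compress_images", False))
    options = {
        "images": True,
        "image_format": image_format,
        "compress_images": compress,
    }
    if compress:
        # max_image_size only matters when compression is enabled.
        options["max_image_size"] = int(
            config.get("max_image_size", DEFAULT_MAX_IMAGE_SIZE)
        )
    return options


if __name__ == "__main__":
    example = json.loads(
        '{"images": true, "image_format": "PNG", "compress_images": true}'
    )
    print(resolve_image_options(example))
```

With this shape, a config that sets max_image_size but leaves compress_images false simply never sees the size limit, which matches the last rule in the note.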

Warning: Compressing images might make Leech take a lot longer to download images.

Warning: Compressing images might make the image quality worse.

Warning: max_image_size is not a hard limit. Leech will try to compress each image to less than the value of the max_image_size key,
but it might not always be able to get an image that small.

Warning: max_image_size should not be too small. For instance, if you set max_image_size to 1000, Leech will probably not be able to
compress the image to under 1000 bytes. If you set max_image_size to 1000000, Leech will probably be able to compress the image to under 1000000 bytes.

Warning: Leech will not compress GIFs, because that might damage the animation.
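
To illustrate why max_image_size can only be a target, here is a minimal sketch of one common approach: re-encode at progressively lower quality until the result fits or a quality floor is reached. This conversation does not show how Leech actually compresses images, so `compress_to_limit`, `pillow_jpeg_encoder`, and the quality values are all assumptions made for illustration. The Pillow calls used (`Image.open`, `convert`, `save` with `quality` and `optimize`) are standard, and Pillow is imported lazily so that the loop itself has no dependency.

```python
from io import BytesIO
from typing import Callable


def compress_to_limit(encode: Callable[[int], bytes], max_size: int,
                      start_quality: int = 90, min_quality: int = 10,
                      step: int = 10) -> bytes:
    """Re-encode at decreasing quality until the result fits or quality bottoms out."""
    quality = start_quality
    data = encode(quality)
    while len(data) >= max_size and quality - step >= min_quality:
        quality -= step
        data = encode(quality)
    # The result may still be larger than max_size: the limit is a target,
    # not a guarantee.
    return data


def pillow_jpeg_encoder(raw: bytes) -> Callable[[int], bytes]:
    """Build a JPEG encoder backed by Pillow for the given source image bytes."""
    from PIL import Image  # requires Pillow to be installed

    image = Image.open(BytesIO(raw)).convert("RGB")  # JPEG has no alpha channel

    def encode(quality: int) -> bytes:
        buffer = BytesIO()
        image.save(buffer, format="JPEG", quality=quality, optimize=True)
        return buffer.getvalue()

    return encode
```

A caller would write something like `compress_to_limit(pillow_jpeg_encoder(raw_bytes), 100000)`. If the image is still too large at the lowest quality, the oversized bytes are returned anyway, which is the behaviour the warnings describe.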


Old:

Partial Fix for Issue #2

Thanks to @IdanDor for this pull request.

Specifically, added image_selector for arbitrary sites that allows selecting img tags from chapters, downloading them and embedding them within the resulting epub.
In the case of Pale, this means that the character banners and extra materials do not require an internet connection to view.
Also made the two pale.json's more consistent (pale.json now correctly includes the title of the chapters).
#84 (comment)

This doesn't work for other sites (like fiction.live) so I did this:

else:
  soup = BeautifulSoup(chapter.contents, 'html5lib')
  for count, img in enumerate(soup.find_all('img')):
    img_contents = get_image_from_url(img['src']).read()
    chapter.images.append(Image(
      path=f"images/ch{i}_leechimage_{count}.png",
      contents=img_contents,
      content_type='image/png'
    ))
    img['src'] = f"../images/ch{i}_leechimage_{count}.png"
    if not img.has_attr('alt'):
      img['alt'] = f"Image {count} from chapter {i}"   

It builds on @IdanDor's code as well, since that code adds all the images it can find to the chapter.images list:

# Add all pictures on this chapter as well.
for image in chapter.images:
  # For/else syntax: check if the image path already exists; if it doesn't, add the image.
  # Duplicates are not allowed in the format.
  for other_file in chapters:
    if other_file.path == image.path:
      break
  else:
    chapters.append(EpubFile(path=image.path, contents=image.contents, filetype=image.content_type))
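
For comparison, the same duplicate check can be written with a set of already-seen paths, which avoids rescanning the whole list for every image. This is only a sketch of an alternative, not what the PR does: the `EpubFile` dataclass here is a hypothetical stand-in for the project's own class, and `add_unique` is an invented helper name.

```python
from dataclasses import dataclass


@dataclass
class EpubFile:  # hypothetical stand-in for the project's own class
    path: str
    contents: bytes
    filetype: str


def add_unique(files: list, candidates: list) -> None:
    """Append each candidate whose path is not already present in files."""
    seen = {f.path for f in files}
    for candidate in candidates:
        if candidate.path not in seen:
            files.append(candidate)
            seen.add(candidate.path)


if __name__ == "__main__":
    files = [EpubFile("images/a.png", b"", "image/png")]
    add_unique(files, [
        EpubFile("images/a.png", b"", "image/png"),  # duplicate path, skipped
        EpubFile("images/b.png", b"", "image/png"),
    ])
    print([f.path for f in files])
```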

I only tested this with stories from fiction.live, but they've all worked fine.
I also ran the resulting epubs through epubcheck, and there were no fatal errors, only minor ones.

Just like you wrote in the linked issue, I thought it should be something one can somehow disable.
And the selector simply matches in my mind what the codebase does with every other "choice".
#84 (comment)

I would not even know where to start with making images an option which is why I called this a partial fix

IdanDor and others added 7 commits January 25, 2021 21:02
Specifically, added image_selector for arbitrary sites that allows
selecting img tags from chapters, downloading them
and embedding them within the resulting epub.

In the case of Pale, this means that the character banners and
extra materials do not require an internet connection to view.

Also made the two pale.json's more consistent (pale.json now correctly
includes the title of the chapters).
…here is no way of disabling this option and this was only tested with stories from fiction.live

BREAKING CHANGE:
@Jemeni11
Contributor Author

Jemeni11 commented Feb 25, 2023

Ah, there's a problem with this.
PNG files are huge, so a story with many images can end up as a massive epub file.
So maybe some image compression is needed as well?
And conversion to jpg/jpeg, which is a lot smaller?

EDIT: No really, I accidentally downloaded a story that was 1.5 GB in size so be careful 😆

@Jemeni11
Contributor Author

Turns out on fiction.live, you can have an empty image tag. Just <img /> no src. Crazy!
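
A minimal sketch of guarding against that case: collect only the img tags that actually carry a non-empty src. The PR itself uses BeautifulSoup; this example swaps in the standard library's `html.parser` so that it runs without extra dependencies, and `ImageSourceCollector` and `collect_image_sources` are hypothetical names.

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collects the src of every <img> tag that has a usable one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img /> are also routed here by default.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:  # skips a missing src, a bare src, and an empty src=""
            self.sources.append(src)


def collect_image_sources(html: str) -> list:
    collector = ImageSourceCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    print(collect_image_sources('<p><img /><img src="a.png" alt="x" /></p>'))
```

With BeautifulSoup, the equivalent guard would be to check `img.has_attr('src')` before reading `img['src']`, in the same way the snippet in the PR description already checks for `alt`.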

…se apparently that's a thing).

feat(ebook/__init__.py): Leech prints out more information about the images it is downloading: the number of images in each chapter and the image currently being downloaded.
@TheMetalCenter

TheMetalCenter commented Mar 6, 2023

I tested this out on the Wandering Inn (https://wanderinginn.com/table-of-contents/), which has images in, for example, the Cover page, title page, and chapter 1.02, but it fails to detect all but one of the images. I imagine this has to do with how images are embedded in the HTML on this Wordpress site, but I'm still parsing it out.

Edit: Ah, the issue was that my JSON filter selector was preventing them from being read. All of the pictures are detected now, but most fail to load for some reason. It may be an issue with my ebook viewer, however (Calibre).

Edit2: Confirmed the images show up on my Kindle, so it's a Calibre issue that they are broken in their e-book viewer. Thank you and @IdanDor for your work on adding this feature!

@Jemeni11
Contributor Author

There's this weird image-hosting site called filepicker.io that's causing problems when you try to download from it.
This new commit should fix it.
The fix:
JimmXinu/FanFicFare#933 (comment)

@Jemeni11
Contributor Author

Jemeni11 commented Apr 3, 2023

These new updates work for me but I only tested them on one site (fiction.live)

@Jemeni11 Jemeni11 changed the title from "Partial Fix for Issue #2" to "Fix for Issue #2" Apr 3, 2023
@Jemeni11
Contributor Author

Jemeni11 commented Apr 9, 2023

This code doesn't download images in xenforo spoilers yet. This will be fixed soon.

EDIT: These xenforo spoiler images are weird. The images get downloaded twice for some reason.

Jemeni11 added 2 commits April 9, 2023 18:01
…de-spoilers` tag has to be added for Leech to download images in spoilers.
Fiction.Live seems to have changed how they host images
@Jemeni11 Jemeni11 closed this by deleting the head repository Jun 7, 2024
@kemayo
Owner

kemayo commented Nov 23, 2024

I merged this with a number of additions.

4 participants