Saving PNG images with PIL is 4 times slower than saving them with OpenCV #5986

Open
apacha opened this issue Jan 25, 2022 · 22 comments

@apacha

apacha commented Jan 25, 2022

What did you do?

I want to save an image to disk and noticed a severe performance bottleneck for the part of my code that used PIL for saving images compared to a similar part in my codebase that uses OpenCV to save the images.

What did you expect to happen?

I expected both methods to be somewhat similar in performance.

What actually happened?

PIL was at least four times slower than converting the PIL.Image into a numpy array and storing the array using cv2.imwrite.

What are your OS, Python and Pillow versions?

  • OS: macOS 12.1
  • Python: 3.9
  • Pillow: 9.0.0

Here is the benchmark code that I used:

import time
import cv2
import numpy
from PIL import Image
from tqdm import tqdm
from PIL.ImageDraw import ImageDraw

if __name__ == '__main__':
    image = Image.new("RGB", (4000, 2800))
    image_draw = ImageDraw(image)
    image_draw.rectangle((10, 20, 60, 120), fill=(230, 140, 25))
    trials = 50

    t1 = time.time()
    for i in tqdm(range(trials)):
        image.save("tmp1.png")
    t2 = time.time()
    print(f"Total time for PIL: {t2 - t1}s ")

    t1 = time.time()
    for i in tqdm(range(trials)):
        image_array = numpy.array(image)
        image_array = cv2.cvtColor(image_array, cv2.COLOR_RGB2BGR)
        cv2.imwrite("tmp2.png", image_array)
    t2 = time.time()
    print(f"Total time for OpenCV: {t2 - t1}s ")

    img1 = cv2.imread("tmp1.png")
    img2 = cv2.imread("tmp2.png")
    print(f"Images are equal: {numpy.all(img1 == img2)}")

which produced

100%|██████████| 50/50 [00:26<00:00,  1.91it/s]
Total time for PIL: 26.21s 
100%|██████████| 50/50 [00:06<00:00,  8.00it/s]
Total time for OpenCV: 6.24s 
Images are equal: True

The produced images differ slightly in file size, so a more sophisticated compression setting is presumably being used.

Here are the two (black) images I obtained
tmp1.png (PIL-image, 33KB)
tmp2.png (OpenCV-image, 38KB)

My questions are:

  • Why is PIL so much slower?
  • Is there some way to speed up PIL to match the performance of OpenCV in this scenario (other than converting to a numpy array and using OpenCV), e.g., by providing extra parameters to the save method?

Interestingly, if I switch from PNG to JPG, the results are flipped and PIL is faster than OpenCV:

# saving as "tmp1.jpg" and "tmp2.jpg" instead
100%|██████████| 50/50 [00:04<00:00, 11.46it/s]
Total time for PIL: 4.37s 
100%|██████████| 50/50 [00:08<00:00,  6.11it/s]
Total time for OpenCV: 8.17s 

Could this be a problem in the PNG encoding library?

@radarhere
Member

Here are some thoughts for you.

We allow the compression level to be set when saving PNGs. If I change your code to image.save("tmp1.png", compress_level=1), Pillow on my machine is almost as fast as OpenCV.

We also allow setting the compression type when saving PNGs. With image.save("tmp1.png", compress_type=3), Pillow on my machine is evenly matched with OpenCV, sometimes faster, sometimes slower.
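These options can be passed straight to Image.save; here is a minimal sketch of how one might time the defaults against the two suggested options (the timings will of course vary by machine):

```python
import time

from PIL import Image

# Reproduce the benchmark image from the original post.
img = Image.new("RGB", (4000, 2800))

# Compare the default save against the two suggested options.
for kwargs in ({}, {"compress_level": 1}, {"compress_type": 3}):
    start = time.time()
    img.save("tmp1.png", **kwargs)
    elapsed = time.time() - start
    print(f"{kwargs or 'defaults'}: {elapsed:.2f}s")
```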

@apacha
Author

apacha commented Jan 28, 2022

Thanks for the hints, I tried to adapt my code and I'm getting the following results:

# image.save("tmp1.png", compress_type=3)
100%|██████████| 50/50 [00:16<00:00,  2.98it/s]
Total time for PIL: 16.7s 
100%|██████████| 50/50 [00:05<00:00,  9.34it/s]
Total time for OpenCV: 5.35s 

and

# image.save("tmp1.png", compress_level=1)
100%|██████████| 50/50 [00:15<00:00,  3.15it/s]
Total time for PIL: 15.90s 
100%|██████████| 50/50 [00:05<00:00,  9.64it/s]
Total time for OpenCV: 5.19s 

Both are still pretty far from the OpenCV results, and lead to file sizes of 33KB (PIL, compress_type=3) and 147KB (PIL, compress_level=1) vs 38KB (OpenCV). Maybe it's a macOS-specific issue?

@radarhere
Member

No, it's not macOS specific. I am also a macOS user.

@animetosho

I recall looking into this a while ago, so my memory may be a bit sketchy, but here is something I noticed.

For whatever reason, PIL implements its own PNG filtering (as opposed to using something like libpng).
The code lacks any SIMD and looks relatively unoptimised (e.g. multiple passes over the data without any cache blocking), so even if you set the zlib compression level to 0, it's still awfully slow, because most of the CPU time is spent on filtering.

It'd be nice if PNG encoding could use a more speed optimised library.
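For context, PNG filtering is a per-scanline prediction pass that runs before the zlib stage; the "Sub" filter (type 1), for instance, replaces each byte with its difference from the corresponding byte one pixel to the left. A pure-Python sketch of that single filter (real encoders do this in C, and typically try several filter types per scanline):

```python
def sub_filter(scanline: bytes, bytes_per_pixel: int = 3) -> bytes:
    """Apply the PNG 'Sub' filter: each byte minus the byte one pixel to the left."""
    out = bytearray(len(scanline))
    for i, b in enumerate(scanline):
        left = scanline[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        out[i] = (b - left) & 0xFF  # modulo-256 difference, per the PNG spec
    return bytes(out)

# A flat run of identical pixels filters down to mostly zeros,
# which is what makes the zlib stage compress well afterwards.
line = bytes([200, 100, 50] * 4)  # 4 RGB pixels, all the same colour
filtered = sub_filter(line)
print(filtered)  # first pixel unchanged, remaining bytes become zero deltas
```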

@animetosho

I may have spoken too soon in my previous comment. Considering how long PNG has been around, I assumed the popular libraries would be reasonably well optimised; however, looking into this, it seems they're predominantly optimised for decoding, not encoding. Somewhat surprising to me, but not completely unreasonable, I guess.

So I'm not sure why OpenCV is faster here - they appear to be using libpng for PNG creation, which doesn't use SIMD for encoding. Maybe the compiler's auto-vectorizer just happens to work there?

Regardless, I did find two speed focused encoders, fpng and fpnge, which only surfaced relatively recently. I made some changes to the latter to make it more usable and made a quick-and-dirty Python module for it.
Being a speed focused encoder, which sacrifices some compression for performance, it's likely not suitable for integrating into Pillow. But if it helps, there's direct support for exporting a PIL.Image to PNG, if great compression isn't a high priority.

Whilst writing this comment, I came across Python bindings for fpng. I haven't tried this myself, but it may also be worth checking out.

@radarhere
Member

Doing a basic investigation, I found that most of the time is spent in

err = deflate(&context->z_stream, Z_NO_FLUSH);

I thought maybe changing a setting in deflateInit2 could improve the situation, but it already uses the maximum memLevel and the largest windowBits without switching to gzip encoding, and we've already discussed changing the compression level and type.

@animetosho

Thanks for looking into it. If you set compression level to 0, is that still where most of the time is spent?

@radarhere
Member

Yes.

@animetosho

That is indeed very surprising. It would be interesting to know what it's spending all its time on, even when it's doing no compression.
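For what it's worth, part of what deflate still does at level 0 is visible from Python's stdlib zlib: it doesn't short-circuit, but streams every byte through as "stored" blocks (with a small header per 64KB block) and checksums the whole stream with Adler-32. A quick demonstration that "no compression" still touches all the data:

```python
import zlib

# Raw size of the 4000x2800 RGB benchmark image.
data = bytes(4000 * 2800 * 3)

# Level 0 = no compression: the data is emitted as "stored" deflate blocks.
stored = zlib.compress(data, level=0)

# The output is slightly *larger* than the input: stored-block headers
# plus the zlib wrapper and the Adler-32 checksum.
print(len(data), len(stored))
```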

@MathijsNL

MathijsNL commented Jan 19, 2023

Not sure if anything has changed in the meantime, but for me PIL is waaaay faster than OpenCV.

Python 3.10.6
cv2 4.6.0
PIL version 9.2.0

I used compression level 9, and I moved the image_array conversion outside the loop to make the benchmark fairer.

import time
import cv2
import numpy
from PIL import Image
from PIL.ImageDraw import ImageDraw

if __name__ == '__main__':
    image = Image.new("RGB", (4000, 2800))
    image_draw = ImageDraw(image)
    image_draw.rectangle((10, 20, 60, 120), fill=(230, 140, 25))
    trials = 20

    t1 = time.time()
    for i in range(trials):
        image.save("tmp1.png", compress_level=9)
    t2 = time.time()
    print(f"Total time for PIL: {t2 - t1}s ")

    compression_level = [cv2.IMWRITE_PNG_COMPRESSION, 9]
    image_array = numpy.array(image)
    image_array = cv2.cvtColor(image_array, cv2.COLOR_RGB2BGR)

    t1 = time.time()
    for i in range(trials):
        cv2.imwrite("tmp2.png", image_array, compression_level)
    t2 = time.time()
    print(f"Total time for OpenCV: {t2 - t1}s ")

    img1 = cv2.imread("tmp1.png")
    img2 = cv2.imread("tmp2.png")
    print(f"Images are equal: {numpy.all(img1 == img2)}")

Total time for PIL: 5.534168481826782s
Total time for OpenCV: 9.936758279800415s
Images are equal: True

Compress level 3 (which is OpenCV standard) gives me this:
Total time for PIL: 2.8487443923950195s
Total time for OpenCV: 7.241240978240967s
Images are equal: True

@radarhere
Member

Pillow is indeed faster in the code from the previous comment - but that isn't because anything changed; it's because the previous comment uses compression, whereas the original post doesn't.

If the previous comment shows an acceptable comparison for compression, then Pillow is slower without compression, but faster with it.

@apacha
Author

apacha commented Aug 7, 2023

When I'm running the code from #5986 (comment), I still obtain the following result (averaged the numbers from three runs):

Total time for PIL: 10.6s 
Total time for OpenCV: 6.6s 
Images are equal: True

It would be interesting to understand why the results are so contradictory when the same code is run on different machines.

@MathijsNL

MathijsNL commented Aug 7, 2023

whereas the original post isn't using compression.

That is simply not true. Doing so would result in a file size as big as a BMP image. I modified the drawing code slightly so the image isn't almost entirely black.

The table below shows the file size for each compression option. With the default option, PIL's file is 2.7x smaller. It is also clear from the default row (i.e. not specifying compression at all) that PIL uses compression level 6 by default, since the file sizes match; OpenCV's default falls somewhere between levels 3 and 4.

Compressing with level 0 gives a file roughly the size of the raw image data (4000 × 2800 × 3 = 33,600,000 bytes, plus some PNG headers).

| Compress Level | OpenCV Size | PIL Size |
| --- | --- | --- |
| Default | 113301 | 42045 |
| 0 | 33656757 | 33613677 |
| 1 | 160752 | 160484 |
| 2 | 160233 | 159916 |
| 3 | 159575 | 159291 |
| 4 | 41927 | 41745 |
| 5 | 42023 | 41840 |
| 6 | 42226 | 42045 |
| 7 | 42212 | 42030 |
| 8 | 41786 | 41578 |
| 9 | 41792 | 41585 |
import random

from PIL import Image
from PIL.ImageDraw import ImageDraw

image = Image.new("RGB", (4000, 2800))
image_draw = ImageDraw(image)
image_draw.rectangle((10, 20, 60, 120), fill=(230, 140, 25))

num_squares = 100

# Scatter randomly sized, randomly coloured rectangles across the canvas.
for _ in range(num_squares):
    x1 = random.randint(0, 3950)
    y1 = random.randint(0, 2750)
    x2 = x1 + random.randint(10, 150)
    y2 = y1 + random.randint(10, 150)

    fill_color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
    image_draw.rectangle((x1, y1, x2, y2), fill=fill_color)

would be interesting to understand, why the results are so contradicting, when running the same code on different machines.

As for the save times, there does seem to be a big speed difference between Pillow 9.0 and 9.2! I just noticed that I used the newer version in my earlier tests, and that is probably the cause of the difference in the measurements.

So my conclusion so far is that Pillow 9.0 does indeed have something odd going on with PNG save times, which can be resolved by updating to Pillow 9.1.0 or higher; I just tested that as well.

@radarhere
Member

whereas the original post isn't using compression.

That is simply not true. Doing so would result in a file size as big as a BMP image.

Perhaps I could have said that the code in the original post didn't specify compression.

@NeedsMoar

Just for another data point: when I was still using Adobe Photoshop last year, their "best compression" option, which tries multiple compression parameters to find the optimal settings (or at least the best within some amount of time) for an image, took something like 10s for relatively small (~2560x1600) images. Lossless JPEG 2000 was actually a decent amount faster, besides producing smaller files. I remember reading a long time ago that finding the optimal PNG compression involved trying multiple parameter combinations within a range, since the results couldn't be predicted accurately in advance, but things may have changed since then.

@dofuuz

dofuuz commented Oct 24, 2024

Replacing zlib with zlib-ng yields a huge speedup.

Here is my quick-and-dirty benchmark.

Original (with zlib):

PIL.__version__='11.1.0.dev0'
read PNG: 0.092544 (sec)
on-memory PNG: 1.034144 (sec)
write PNG: 1.054776 (sec)
PNG size 1352945 bytes

With zlib-ng:

PIL.__version__='11.1.0.dev0'
read PNG: 0.089149 (sec)
on-memory PNG: 0.515073 (sec)
write PNG: 0.527402 (sec)
PNG size 1394511 bytes

It's about 2x speed-up with almost no effort.

Another candidate is zlib-cloudflare, but it does not support 32-bit CPUs.
It could be adopted if the Pillow devs decide to drop 32-bit platform support.
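The level-vs-speed tradeoff under discussion is easy to measure against stock zlib from the stdlib; swapping in zlib-ng (e.g. via a rebuilt interpreter) changes the absolute times but not the shape of the curve. A rough sketch:

```python
import time
import zlib

# Mildly repetitive synthetic data, standing in for filtered scanlines.
data = (b"\x00" * 500 + bytes(range(256))) * 2000

# Time compression at a few levels and record the output sizes.
results = {}
for level in (1, 5, 9):
    start = time.perf_counter()
    out = zlib.compress(data, level=level)
    results[level] = (time.perf_counter() - start, len(out))

for level, (secs, size) in results.items():
    print(f"level {level}: {secs * 1000:.1f} ms, {size} bytes")
```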

@dofuuz

dofuuz commented Oct 24, 2024

I made PR #8495 for this.

@nulano
Contributor

nulano commented Oct 24, 2024

Another candidate is zlib-cloudflare. But it does not support 32-bit CPUs.
It may be adopted if Pillow devs decides to drop 32-bit platform support.

We did drop 32-bit Windows wheels for a few releases, but there were too many people still depending on them: #7443 (comment)

But if there is a benefit to using zlib-cloudflare over zlib-ng, I don't see a reason that we couldn't use zlib-cloudflare in the 64-bit wheels and zlib-ng (or even just zlib) in the 32-bit wheels.

@dofuuz

dofuuz commented Oct 25, 2024

@nulano
Later, I found that the zlib-ng readme says "Includes improvements from Cloudflare and Intel forks". So I think we should just use zlib-ng for both 32-bit and 64-bit.

@MathijsNL

It's about 2x speed-up with almost no effort.

It does come at the cost of about a 3% increase in file size. Do you know if there are tweaks / settings for zlib-ng to get comparable file sizes?

@nulano
Contributor

nulano commented Oct 28, 2024

Do you know if there are tweaks / settings for zlib-ng to get comparable file sizes?

You can use the compress_level setting: https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#png-saving

The Pillow default is 5, but even using the maximum 9 is faster with zlib-ng than the default 5 with regular zlib. See the benchmarks in my PR (comments at the end of the collapsed block): #8500

@MathijsNL

@nulano Thanks for linking your test results, impressive gains!
