Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

request: Pillow-simd w/ libjpeg-turbo conda/pip package #44

Open
stas00 opened this issue Dec 14, 2018 · 8 comments
Open

request: Pillow-simd w/ libjpeg-turbo conda/pip package #44

stas00 opened this issue Dec 14, 2018 · 8 comments

Comments

@stas00
Copy link

stas00 commented Dec 14, 2018

Since you have been making this amazing effort to port more and more things into SIMD, I was wondering if perhaps you have a modified conda recipe that builds Pillow-simd w/ built-in libjpeg-turbo library.

Currently, Pillow-simd is already a problem with conda, since any update of any package relying on Pillow will wipe out Pillow-simd. And if one wants to build either Pillow or Pillow-SIMD with libjpeg-turbo to include SIMD compression/decompression it becomes even more complicated package management-wise. Plus there is a potential conflict of libjpeg-turbo and jpeg should they both have the same libjpeg.so.x.y - that would be the worst!

I spent the last few days trying to sort it out, resulting in this documentation https://docs.fast.ai/performance.html#faster-image-processing and python-pillow#3493 kindly contributed by @radarhere, which I hope you will sync into your branch soon. The main effort was made to detect the SIMD versions of Pillow and libjpeg because they are so volatile in the pypi/conda environments.

We really need a reliable binary package that has all the SIMD goodness and which won't get swapped out and replaced by the default-limited versions of the same. And so that other packages (fastai in our case) could depend on it in their pypi/conda dependencies.

I propose a new version of pillow-simd called pillow-simd-jpeg-turbo which will be a binary build of Pillow-SIMD, built with libjpeg-turbo's libjpeg.so at a unique location under PIL/, rather than environment's libs, so that one doesn't need to install the turbo package (and no need for any other files from the libjpeg-turbo package). And once installed it'll reliable work, no matter what other packages get installed or updated.

And for our use, linux-only build would be a great start.

Thoughts?

edit: re-reading again your README file, I now see that pre-compiling won't work. What about making a source pip package then that will be compiled by each user, but once compiled it'll be left alone and not swapped out by conda/pip updates of other packages.

and moreover my idea won't work unless it'll replace PIL with PIL-foo, so it's no longer a drop-in replacement, but a different namespace altogether. Which perhaps could be aliased back to PIL, and it'd go something like:

try:
    import PIL_SIMD
    PIL = PIL_SIMD # alias to PIL
    ...etc...
except:
    import PIL

What a mess.

Finally, libjpeg-turbo is distributed as a binary, and it says AVX2 is supported. Sorry, I am new to SIMD, you say Pillow-SIMD has to be built from source, but libjpeg-turbo doesn't? Please kindly explain.

@homm
Copy link
Collaborator

homm commented Dec 14, 2018

python-pillow#3493 which I hope you will sync into your branch soon

This will happen after the release 5.4 of Pillow. No additional action is required.

I propose a new version of pillow-simd called pillow-simd-jpeg-turbo which will be a binary build of Pillow-SIMD built with libjpeg-turbo's libjpeg.so

Pillow-SIMD is not a binary package, it could build with both libjpeg and libjpeg-turbo depending on which is installed in your system. Installing Pillow-SIMD is the same thing as installing Pillow from sources. Pillow-SIMD doesn't aim to change Pillow's build system. If you have problems when building Pillow from sources, you'd better fix it in Pillow, not in Pillow-SIMD.

What about making a source pip package then that will be compiled by each user

This is exactly what happens now.

but once compiled it'll be left alone and not swapped out by conda/pip updates of other packages.

Pillow and Pillow-SIMD don't control pip and conda. If you want conda/pip to behave in some particular way, you probably need to ask them about it.

replace PIL with PIL-foo, so it's no longer a drop-in replacement, but a different namespace altogether.

Pillow doesn't work this way right now: It has many "from PIL" import in codebase. So before using alternative namespace, you need to fix Pillow.

@homm
Copy link
Collaborator

homm commented Dec 14, 2018

Sorry, I am new to SIMD, you say Pillow-SIMD has to be built from source, but libjpeg-turbo doesn't? Please kindly explain.

libjpeg-turbo detects CPU features and choose different branches of code at runtime. Pillow-SIMD doesn't have such feature.

@stas00
Copy link
Author

stas00 commented Dec 14, 2018

Thank you for answering my questions, @homm.

Here are proposed solutions for building Pillow-SIMD from scratch by users and not getting Pillow-SIMD wiped out on conda/pip install/update for Pillow or any package that depends on Pillow.

1. Solution for pip

That's an easy one:

Apply this patch:

diff --git a/setup.py b/setup.py
index 6c2e78e7..84611207 100755
--- a/setup.py
+++ b/setup.py
@@ -134,7 +134,7 @@ except (ImportError, OSError):
     # pypy emits an oserror
     _tkinter = None

-NAME = 'Pillow-SIMD'
+NAME = 'Pillow' if 'REPLACEPILLOW' in os.environ else 'Pillow-SIMD'
 PILLOW_VERSION = get_version()
 JPEG_ROOT = None
 JPEG2K_ROOT = None

Now build Pillow-SIMD with:

REPLACEPILLOW=1 CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile .

Now we get a true drop-in replacement and not half-baked. Observe:

$ REPLACEPILLOW=1 CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile .
Processing /home/stas/github/00images/pillow-simd
Installing collected packages: Pillow
  Found existing installation: Pillow 5.3.0
    Uninstalling Pillow-5.3.0:
      Successfully uninstalled Pillow-5.3.0
  Running setup.py install for Pillow ... done
Successfully installed Pillow-5.3.0.post0

$ pip install pillow
Requirement already satisfied: pillow in /home/stas/anaconda3/envs/pytorch-dev/lib/python3.6/site-packages (5.3.0.post0)

Since we need a user to build this from scratch this works

If a user is not using the conda environment, this problem is now solved for her.

2. Solution for Conda

While pip will respect conda's installs (i.e. won't try to reinstall a package with the same version installed by conda), conda doesn't respect pip's packages - it'll not look at whether the package has already been installed. So anybody using the conda environment will have a constant problem of losing their custom-build Pillow-SIMD on any update/install of conda packages requiring Pillow.

Since you say we can't have a binary release and users need to rebuild from scratch, we need to provide them a way to build a conda package on their machine from scratch.

First, unzip the conda recipe folder attached to this comment.

So:

$ git checkout https://github.com/uploadcare/pillow-simd
$ cd pillow-simd
$ conda install conda-build
# 1. apply the patch from Solution 1
# 2. unzip the conda recipe folder attached to this comment inside the pillow-simd folder
$ REPLACEPILLOW=1 conda-build recipe/meta.yaml
# conda-build is churning here with resulting:
Automatic uploading is disabled
If you want to upload package(s) to anaconda.org later, type:
anaconda upload /home/stas/anaconda3/envs/pytorch-dev/conda-bld/linux-64/pillow-5.3.0-py36h14c3975_1000.tar.bz2
$ conda install -c ${CONDA_PREFIX}/conda-bld/ pillow

Now the user has their own Pillow-SIMD conda package, but named Pillow and it'll not get removed by conda install/update of other packages.

I'm getting intermittent problems with the overriding, it works with:

conda config --add channels ${CONDA_PREFIX}/conda-bld/

but need to do more testing.

So, we will just need you to include the patch in solution #1 and the conda recipe.

Someone could please repeat my solutions and see that I haven't made any mistakes. I have made many in the process of writing this comment.

recipe.zip

@stas00
Copy link
Author

stas00 commented Dec 14, 2018

Sorry, I am new to SIMD, you say Pillow-SIMD has to be built from source, but libjpeg-turbo doesn't? Please kindly explain.

libjpeg-turbo detects CPU features and choose different branches of code at runtime. Pillow-SIMD doesn't have such feature.

Could Pillow-SIMD to do the same or will that have a negative impact on the performance?

Also the README of this project isn't going into details why Pillow-SIMD needs to be built from scratch.
Would it be helpful to have just 2 versions for optimal performance on linux-64 (a. SSE4 b. AVX2)? If more than two, please, explain why? Thank you.

@stas00
Copy link
Author

stas00 commented Jan 10, 2019

Any chance you could follow up on the last comment, @homm? I'm not asking you to commit to build any binary packages. Just needing to know whether it's enough to build 2 versions for optimal performance on linux-64 (a. SSE4 b. AVX2) to cover all bases. I don't have the expertise in this domain, and I was hoping that you could share your insights. Thank you.

@rgommers
Copy link

rgommers commented Jun 2, 2019

The solution at https://docs.fast.ai/performance.html#conda-packages does work, but is (as the description says) very fragile. Conda immediately starts to display warnings like:

The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/linux-64::scikit-image==0.15.0=py36he6710b0_0
  - defaults/linux-64::imageio==2.5.0=py36_0
  - defaults/linux-64::bokeh==1.2.0=py36_0

and will indeed wipe out pillow-simd as soon as you try to update anything else.

Just needing to know whether it's enough to build 2 versions for optimal performance on linux-64 (a. SSE4 b. AVX2) to cover all bases.

This sounds correct to me, SSE4 and AVX2 versions do seem to be enough.

Could Pillow-SIMD to do the same or will that have a negative impact on the performance?

This is possible without significant performance impact. NumPy, OpenCV and other packages do the same.

The long-term solution for this packaging problem seems to be to add runtime selection, and then upstream everything into Pillow. But of course that may be a lot of work ....

Thanks for the work on this packaging issue @stas00. And of course thanks @homm for making Pillow faster - it does make a noticeable difference to me in some deep learning pipelines.

@stas00
Copy link
Author

stas00 commented Jun 2, 2019

Thank you for your comments, @rgommers. I'm with you that ideally this would be part of upstream.

Thomas Brandon, has expanded on my initial work, and created pillow-accel-sse3, pillow-accel-avx2, pillow-accel conda packages. Please, see the middle of discussion here (but there are posts preceding that one).

homm pushed a commit that referenced this issue May 8, 2020
@IamGianluca
Copy link

I second this request. I've been trying to install pillow-simd with libjpeg-turbo in a conda environment for the last couple of days and can't find a reproducible way to make it work. This is quite a shame as computer vision learning practitioners would benefit enormously by using pillow-simd with libjpeg-turbo instead of PIL/pillow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants