Contribution: Dockerfile for cellxgene VIP #102

Open
Neah-Ko opened this issue Jan 15, 2024 · 12 comments

Comments

@Neah-Ko

Neah-Ko commented Jan 15, 2024

Hello VIP team,

I have been tasked by my organization, @bag-cnag, with getting the VIP plugin to work with cellxgene.

I am sharing the result here, and am offering to contribute it to the repo via a pull request:
https://github.com/bag-cnag/cxg_on_k8/blob/main/docker/Dockerfile_cellxgene_VIP_slim

Notes:

  • Functions standalone: can be built from anywhere
    • The .yml conda env file is stored in a gist
  • Size-optimized: the resulting image is just above 2.1 GB
  • Micromamba-based, but the final layer doesn't contain the executable (I kept apt and pip though)
  • Builds cellxgene from source with a couple of homebrew fixes; in particular, it bumps quite a few of the dependencies.
    • Uses v1.1.2; the latest is 1.2.0. However, judging by the latest commits on the main cellxgene repo, nothing but dependencies is changing
  • Notebooks are still a little buggy. However, some of those bugs seem to come from missing annotations in the datasets I have access to. If someone from the development team could share a dataset that works with all the vignettes, I could tackle any remaining bugs.
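
The point about the final layer not containing the micromamba executable describes a multi-stage build. A minimal sketch of that pattern follows. It is not the linked Dockerfile: `env.yml` is a hypothetical local file standing in for the env file that the real Dockerfile fetches from a gist.

```dockerfile
# Stage 1: micromamba is only available here, to solve and create the env.
FROM mambaorg/micromamba:1.5.6-bookworm-slim AS build
USER root
# env.yml is a hypothetical placeholder for the conda environment file.
COPY env.yml /tmp/env.yml
RUN micromamba env create -y -p /env --file /tmp/env.yml && \
    micromamba clean --all --yes

# Stage 2: plain Debian image. Only the env directory is copied over,
# so the micromamba binary never reaches the final image.
FROM debian:bookworm-slim
COPY --from=build /env /env
ENV PATH=/env/bin:$PATH
```

Because the env is created at a fixed prefix (`/env`) and copied to the same path, the tools inside it keep working in the second stage.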

Let me know what you think.

Best,
Etienne

@z5ouyang
Collaborator

z5ouyang commented Feb 3, 2024

Hi Etienne,
Thank you very much for considering/doing this. Currently we don't have the bandwidth to do (or maintain) your proposal, but it is a good resource. We (@baohongz) could put your repo on the list in case some users would like to use Docker.

@Neah-Ko
Author

Neah-Ko commented Feb 5, 2024

Hello @z5ouyang ,
Sure, half the point was to make it visible in case someone is looking for such a Dockerfile.

Best,

@rohitrrj

rohitrrj commented Feb 19, 2024

@Neah-Ko Thanks for sharing the Dockerfile! Looks great and seems like a lot of effort, kudos.
However, it looks like the build fails while installing the "rpy2" dependency. Do you have any pointers on resolving this issue?

@Neah-Ko
Author

Neah-Ko commented Feb 20, 2024

Hi @rohitrrj,

> @Neah-Ko Thanks for sharing the Dockerfile! Looks great and seems like a lot of effort, kudos. However, it looks like the build fails while installing the "rpy2" dependency. Do you have any pointers on resolving this issue?

So yeah, it's an issue I encountered while crafting the Dockerfile, most likely due to rpy2 not finding the R install path. It wasn't occurring on my last runs, however; I guess the micromamba solver can be inconsistent with package ordering.

In principle you should set the R_HOME environment variable just above the environment creation part of the Dockerfile like this:

ENV R_HOME=/env/lib/R/

Could you please post a log of your failing build, so that I can make sure?

Also, if you are building on macOS, I know it can introduce some side effects. In that case, let me know your chip model as well, as it can be important.

Best,

@rohitrrj

rohitrrj commented Feb 20, 2024

@Neah-Ko Thanks for the suggestion. Unfortunately, that doesn't seem to solve the issue. The build still fails with the same error. I have attached my build log. I am building on macOS with an Intel chip. The following excerpt from the log file seems to be where the build breaks.

#25 379.9       In file included from build/temp.linux-x86_64-cpython-38/_rinterface_cffi_api.c:57:0:
#25 379.9       /env/include/python3.8/Python.h:44:10: fatal error: crypt.h: No such file or directory
#25 379.9        #include <crypt.h>
#25 379.9                 ^~~~~~~~~
#25 379.9       compilation terminated.
#25 379.9       error: command '/env/bin/x86_64-conda-linux-gnu-cc' failed with exit code 1
#25 379.9       [end of output]

cxgVIP_build.log

@rohitrrj

@Neah-Ko I was able to resolve the above issue by following the documentation in the rpy2 repo described here.
Along with your suggestion above, adding the following seems to have resolved it.

RUN export LD_PATH=$(python -m rpy2.situation LD_LIBRARY_PATH)
ENV LD_LIBRARY_PATH=$LD_PATH:${LD_LIBRARY_PATH}

The build does finish without errors.
Most of the functions seem to work as expected.
The only exception was the Single Gene Violin plot, which does not seem to render the actual plot, although Get Data does seem to export the underlying matrix.
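
A hedged note on the snippet above, not a confirmed diagnosis: a variable exported inside a `RUN` instruction is not visible to later instructions, and `ENV` only expands `ARG`/`ENV` values, so `$LD_PATH` would be empty on the `ENV` line. A minimal sketch that keeps the lookup and its use in the same `RUN` follows. It assumes rpy2 is already installed in `/env`, that R lives under `/env/lib/R`, and that the installed rpy2 version accepts the `LD_LIBRARY_PATH` argument to `rpy2.situation`.

```dockerfile
# Assumption: R from the conda env lives under /env/lib/R.
ENV R_HOME=/env/lib/R
# Compute the path and use it within the same shell, since exported
# variables do not survive past the end of a RUN instruction.
RUN export LD_LIBRARY_PATH="$(/env/bin/python3 -m rpy2.situation LD_LIBRARY_PATH)${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" && \
    /env/bin/python3 -c "import rpy2.robjects"
```

The import check only verifies the value at build time. To have it available when the container runs, the printed path would need to be written as a literal value on an `ENV` line.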

@mohammed-hussain1259

Hi @Neah-Ko

Thank you so much for all your work on this. I was just wondering whether, in this Docker container, you can also build cellxgene using the custom tiledb_version of cellxgene you built.

@Neah-Ko
Author

Neah-Ko commented Feb 26, 2024

> Hi @Neah-Ko
>
> Thank you so much for all your work on this. I was just wondering whether, in this Docker container, you can also build cellxgene using the custom tiledb_version of cellxgene you built.

Hi @mohammed-hussain1259,

Yeah, so I've tried to craft a Dockerfile to get both CXG VIP and the TileDB backend. It is possible to build such an image; however, if the TileDB backend is used, it breaks VIP functionality.

The reason is very simple: the VIP codebase retrieves data by referencing the AnnData object. I invite you to check out this createData function, which does the job.

def createData(data):

This means that, to have a unified product, we would need to either:

  • Adapt the VIP codebase for the case where the TileDB backend is used.
  • Find a way to have the backend return a TileDB object as an AnnData object.
    • I have advised the CXG team to switch to TileDB-SOMA as their backend. As a bonus, we would get this conversion to AnnData and thus be able to support VIP easily.

@bobermayer

hi, thanks a lot for sharing. this looks great, however, I'm unable to build the docker image (on ubuntu 22) even including the additional lines suggested by @rohitrrj. I'm still getting the same error when trying to build rpy2.
any suggestions greatly appreciated!

@Neah-Ko
Author

Neah-Ko commented Apr 24, 2024

> hi, thanks a lot for sharing. this looks great, however, I'm unable to build the docker image (on ubuntu 22) even including the additional lines suggested by @rohitrrj. I'm still getting the same error when trying to build rpy2. any suggestions greatly appreciated!

Hi @bobermayer ,

Here's what you could try:

  1. Copy the conda-env VIP_cnag.yml file from my gist onto your local machine
  2. Remove rpy2 from this file, save it, change Dockerfile to use your file instead
  3. The following snippet is the one managing env creation:

https://github.com/bag-cnag/cxg_on_k8/blob/f4d66f50f7a8bc8eedc48a0a909cac1e12ca6b31/docker/Dockerfile_cellxgene_VIP_slim#L118-L123

Since you took out rpy2 from the env file, you want to install it manually in the env after it is created.
Append the following line like this:

RUN micromamba env create ...
    ...
    python3 -m pip install --no-deps /cellxgene*.whl && \
    [export R_HOME=/env/lib/R/ && \]
    python3 -m pip install rpy2==3.3.5

I'm not sure the R_HOME line (shown in brackets) is necessary, but you may try both versions.

Let me know if that worked.

Best,

@bobermayer

Hi @Neah-Ko

thanks for your message. none of that worked, but I found a workaround by explicitly installing libcrypt-dev and copying crypt.h to the expected location (see stanford-futuredata/ColBERT#309).

Dockerfile
ARG PYTHON__V=3.8

FROM mambaorg/micromamba:1.5.6-bookworm-slim AS base

USER root

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

ARG PYTHON__V

# ------------------------------------------------------------------------------
FROM base AS builder

ENV LLVM_CONFIG=/usr/lib/llvm14/bin/llvm-config

# Build dependencies
RUN apt-get update                       && \
    apt-get -y install bash              && \
    apt-get -y install build-essential   && \
    apt-get -y install jq                && \
    apt-get -y install git               && \
    apt-get -y install libhdf5-dev       && \
    apt-get -y install python3-pkgconfig && \
    apt-get -y install python3-dev       && \
    apt-get -y install python3-pip       && \
    apt-get -y install python3-wheel     && \
    apt-get -y install llvm-dev          && \
    apt-get -y install libblas-dev       && \
    apt-get -y install cpio

WORKDIR /
RUN mkdir cellxgene cellxgene_VIP

# Copies a single commit: lighter and fixes the version
WORKDIR /cellxgene_VIP
RUN git init && \
    git remote add origin https://github.com/interactivereport/cellxgene_VIP.git  && \
    git fetch --depth 1 origin 6d4e496b94701e742d99fa0a0f0362ebea82814b && \
    git checkout FETCH_HEAD

WORKDIR /cellxgene
RUN git init && \
    git remote add origin https://github.com/chanzuckerberg/cellxgene.git  && \
    git fetch --depth 1 origin ffcf6eb5d842972f2562c359cc2276a0fbbe77d5 && \
    git checkout FETCH_HEAD

# Applying cellxgene fixes:
#  - Upgrade: Flask, boto, s3fs, fsspec, numpy
#  - limit Werkzeug version as new update (3.0.0) breaks server
#  - np.bool deprecated since numpy 1.20 -> replace by bool
#  - Replace Flask.json.JSONEncoder by json.JSONEncoder in utils.py
#  - Sets (s3) region name to false in default_config.py
#  - Add --legacy-peer-deps and --openssl-legacy-provider flags to npm commands in makefiles
#  - Extra Makefile entry to build a wheel
RUN cp ./environment.default.json /environment.default.json
RUN sed -i 's/np.bool/bool/g'                      server/data_common/data_adaptor.py    && \
    printf "\nWerkzeug<=2.3.7"              >>     server/requirements.txt               && \
    sed -i '/^boto3>/ s/=.*/=1.27.47/'             server/requirements.txt               && \
    sed -i '/^anndata/ s/==.*$/==0.9.2/'           server/requirements.txt               && \
    sed -i '/^Flask>/ s/,.*$/,<3.0.0/'             server/requirements.txt               && \
    sed -i '/^numpy>/ s/=.*$/=1.24.4/'             server/requirements.txt               && \
    sed -i '/^fsspec>/ s/,.*$//'                   server/requirements.txt               && \
    sed -i '/^s3fs==/ s/==.*$/==2023.9.0/'         server/requirements.txt               && \
    sed -i '10s/^/from json import JSONEncoder\n/' server/common/utils/utils.py          && \
    sed -i 's/json.JSONEncoder/JSONEncoder/g'      server/common/utils/utils.py          && \
    sed -i '/region_name/ s/:.*$/: false/'         server/default_config.py              && \
    sed -i 's/npm ci/npm ci --legacy-peer-deps/'   client/Makefile                       && \
    sed -i '6s/^/WHEELBUILD := $(BUILDDIR)\/lib\/server\n/' Makefile                     && \
    printf '\n\
build_wheel: build                                                                       \n\
	$(call copy_client_assets,$(CLIENTBUILD),$(WHEELBUILD))                              \n\
pywheel:                                                                                 \n\
	NODE_OPTIONS=--openssl-legacy-provider $(MAKE) build_wheel                           \n\
	python3 setup.py bdist_wheel -d wheel\n' >> Makefile

RUN cp /cellxgene_VIP/index_template.insert ./index_template.insert

# Patch from cellxgene_VIP/config.sh: update cellxgene client source code for VIP
RUN echo -e "\nwindow.store = store;" >> client/src/reducers/index.js && \
    sed -i "s|<div id=\"root\"></div>|$(sed -e 's/[&\\/]/\\&/g; s/|/\\|/g; s/$/\\/;' -e '$s/\\$//' index_template.insert)\n&|" client/index_template.html && \
    sed -i "s|logoRelatedPadding = 50|logoRelatedPadding = 60|" client/src/components/leftSidebar/index.js && \
    sed -i "s|title=\"cellxgene\"|title=\"cellxgene VIP\"|" client/src/components/app.js && \
    sed -i "s|const *scaleMax *= *[0-9\.]\+|const scaleMax = 50000|; s|const *scaleMin *= *[0-9\.]\+|const scaleMin = 0.1|; s|const *panBound *= *[0-9\.]\+|const panBound = 80|" client/src/util/camera.js && \
printf '\n\
from server.app.VIPInterface import route\n\
@webbp.route("/VIP", methods=["POST"])\n\
def VIP():\n\
    return route(request.data, current_app.app_config)\n' >> server/app/app.py && \
    sed -i '/^-e/d' ./client/src/reducers/index.js
##

# Build cellxgene wheel in node env for next stage
RUN micromamba create -yn node18 'nodejs>=18,<19' -c conda-forge && \
    micromamba run -n "node18" \
        make pywheel

# ------------------------------------------------------------------------------
FROM base AS final

ARG PYTHON__V

# Get wheel and VIP sources
COPY --from=builder /cellxgene/wheel/cellxgene*.whl /
COPY --from=builder /cellxgene_VIP /cellxgene_VIP
COPY --from=builder /cellxgene/test/decode_fbs.py /cellxgene/test/decode_fbs.py
# Conda runs with bash
SHELL ["/bin/bash", "-c"]

WORKDIR /tmp

# Get env file 
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget && \
    wget https://gist.githubusercontent.com/Neah-Ko/d260316d77a42c5e7a698a766d8404a0/raw/6196bd8342350d01452500541151fd7e81e66443/VIP_cnag.yml
# remove rpy2 from the yml file
RUN cat VIP_cnag.yml | grep -v rpy2 > VIP_cnag_no_rpy2.yml

# Create env and install cellxgene and ipykernel in it
RUN micromamba env create -p /env -y --file VIP_cnag_no_rpy2.yml             && \
    eval "$(micromamba shell hook --shell bash)"                             && \
    micromamba activate -p /env                                              && \
    python3 -m ipykernel install --display-name "Python (/env)" --sys-prefix && \
    python3 -m pip install --no-deps /cellxgene*.whl                         

# install rpy2 separately and first hack crypt.h into the expected location (see https://github.com/stanford-futuredata/ColBERT/issues/309)
RUN apt-get update                                                           && \
    apt-get -y install libcrypt-dev                                          && \
    cp /usr/include/crypt.h /env/include/crypt.h                             && \
    eval "$(micromamba shell hook --shell bash)"                             && \
    micromamba activate -p /env                                              && \
    python3 -m pip install rpy2==3.3.5 

ENV PYTHONPATH=/env/lib/python${PYTHON__V}/site-packages
ENV APPPATH=${PYTHONPATH}/server/app

# Patch from cellxgene_VIP/update.VIPInterface.sh
WORKDIR /cellxgene_VIP
RUN mkdir ${APPPATH}/gsea && \
    sed -i "s|MAX_LAYOUTS *= *[0-9]\+|MAX_LAYOUTS = 300|" ${PYTHONPATH}/server/common/constants.py && \
    # To display notebook results:
    sed -i 's|      $("#CLIresize").html(filteredRes);|      $("#CLIresize").html(filteredRes + res);|' ./interface.html && \
    cp ./interface.html ${PYTHONPATH}/server/common/web/static/ && \
    cp ./gsea/*.gmt                      ${APPPATH}/gsea/ && \
    cp ./VIPInterface.py                 ${APPPATH} && \
    cp ./fgsea.R                         ${APPPATH} && \
    cp ./complexHeatmap.R                ${APPPATH} && \
    cp ./volcano.R                       ${APPPATH} && \
    cp ./Density2D.R                     ${APPPATH} && \
    cp ./bubbleMap.R                     ${APPPATH} && \
    cp ./violin.R                        ${APPPATH} && \
    cp ./browserPlot.R                   ${APPPATH} && \
    cp ./proteinatlas_protein_class.csv  ${APPPATH} && \
    cp ./complex_vlnplot_multiple.R      ${APPPATH} && \
    cp /cellxgene/test/decode_fbs.py     ${APPPATH}
##

# Some R packages need to be installed from sources
RUN apt-get update && \
    apt-get install -y --no-install-recommends libfreetype6-dev libharfbuzz-dev libfribidi-dev libpng-dev libtiff5-dev libjpeg-dev xfonts-base && \
    ln -s /usr/include/freetype2/freetype /env/include/freetype && \
    ln -s /usr/include/freetype2/ft2build.h /env/include/ft2build.h && \
    eval "$(micromamba shell hook --shell bash)" && \
    micromamba activate -p /env && \
    R -q -e 'if(!require(ggrastr)) \
       devtools::install_version("ggrastr", version="0.2.1", upgrade=FALSE, repos = c("https://packagemanager.posit.co/cran/__linux__/bookworm/latest/", "http://cran.us.r-project.org"))' && \
    R -q -e 'if(!require(hexbin)) \
       devtools::install_version("hexbin", version="1.28.2", upgrade=FALSE, repos = c("https://packagemanager.posit.co/cran/__linux__/bookworm/latest/", "http://cran.us.r-project.org"))' && \
    R -q -e 'if(!require(dbplyr)) \
        devtools::install_version("dbplyr", version="1.0.2", upgrade=FALSE, repos = c("https://packagemanager.posit.co/cran/__linux__/bookworm/latest/", "http://cran.us.r-project.org"))' && \
    apt-get remove -y libfreetype6-dev libharfbuzz-dev libfribidi-dev libpng-dev libtiff5-dev libjpeg-dev && \
    apt-get -y autoremove && \
    micromamba clean --all --yes

# Clean env of now-unnecessary stuff
RUN find /env -name '*.a'                                             | xargs rm -rf && \
    find /env -type d -name '__pycache__'                             | xargs rm -rf && \
    find /env -type d -name 'tests' -not -path '*site-packages/tables*' | xargs rm -rf && \
    find /env -name 'x86_64-conda*'                                   | xargs rm -rf && \
    rm -rf /env/share/doc /env/share/gtk-doc /env/conda-meta /env/compiler_compat    && \
    rm -rf /env/etc/conda /env/lib/gcc /env/lib/cmake /env/lib/ldscripts

# ------------------------------------------------------------------------------
# Needs a shell
FROM debian:bookworm-slim

ARG PYTHON__V

# Keep only the env & drop intermediate layers
COPY --from=final /env /env

# Set syspaths
ENV PYTHONPATH=/env/lib/python${PYTHON__V}/site-packages
ENV PATH=/env/bin:$PATH

# Needed at runtime
RUN apt-get update && \
    apt-get install -y --no-install-recommends xfonts-base && \
    apt-get clean && \
    rm -rf /var/cache/apt/* /var/cache/debconf/* /var/lib/apt/lists/*

# Add user: cellxgeneuser, -> gives ownership over /data 
ARG UID=1000
ARG GID=1000
RUN mkdir /data                           && \
    addgroup --gid "${GID}" cellxgeneuser && \
    adduser --no-create-home                 \
            --disabled-password              \
            --uid "${UID}" --gid "${GID}"    \
            cellxgeneuser                 && \
    chown -R cellxgeneuser:cellxgeneuser /data

# Ensures that users have permissions over /tmp
USER root
RUN chmod 1777 /tmp

USER cellxgeneuser
# Sets temporary directories for (numba | matplotlib)
ENV NUMBA_CACHE_DIR=/tmp
ENV MPLCONFIGDIR=/tmp

ENTRYPOINT ["/env/bin/cellxgene"] 
CMD ["launch", "--help"]

@Neah-Ko
Author

Neah-Ko commented Apr 25, 2024

Hi @bobermayer,
Thanks for your input.

Interestingly, the build also failed with a missing crypt.h on my machine. I guess something changed somewhere since I designed the Dockerfile.

I have added libxcrypt=4.4.36 to the conda env file and updated the Dockerfile to pull the latest version.
It seems to have fixed the issue.

Best,
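
For a Dockerfile that cannot pull the updated env file, a roughly equivalent sketch is to add the package to the existing env before building rpy2. This is not taken from the updated Dockerfile. It assumes the `/env` prefix used in this thread and that `libxcrypt` 4.4.36 is available from conda-forge.

```dockerfile
# Assumption: /env was created earlier with `micromamba env create -p /env`.
# libxcrypt is expected to provide crypt.h in the env's include directory;
# the test makes the build fail early if the header is not there.
RUN micromamba install -y -p /env -c conda-forge libxcrypt=4.4.36 && \
    test -f /env/include/crypt.h
```

This would have to run in a stage where micromamba is still present, before the `pip install rpy2` step.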
