CELLxGENE Discover enables the publication, discovery and exploration of interoperable single-cell datasets. Data contributors can upload, review and publish datasets for private or public use. Via Discover, data consumers are able to discover, download and connect data to visualization tools such as CELLxGENE Explorer to perform further analysis. The goal of Discover is to catalyze distributed collaboration of single-cell research by providing a large, well-labeled repository of interoperable datasets.
Follow instructions in /frontend README
Note: Before you begin to install any Python packages, make sure you have activated your Python virtual environment.
- Install pre-commit: `pre-commit install`, or check the doc here.
- Set up your machine to work with AWS using the instructions here. Please follow the step 3 "AWS CLI access" instructions all the way to the bottom so that you are also set up for SSH access. When you run the final command that requires the team's infra repo, use `single-cell-infra`.
- Install jq. If brew is installed, run `brew install jq`.
- Install libpq. If brew is installed, run `brew install libpq`. We need this tool to invoke the `psql` command-line utility.
- Install chamber. For running functional tests below, you will need to install Chamber on your machine. Chamber is a tool for reading secrets stored in AWS Secret Store and Parameter Store. On Linux, go to https://github.com/segmentio/chamber/releases to download the latest version >= 2.9.0, and add it somewhere on your path. On Mac, run `brew install chamber`.
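As a quick sanity check after working through the list above, a small script can report which command-line tools are still missing. This is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and it assumes the AWS CLI is installed under the name `aws` and that `psql` (from libpq) has been added to your `PATH`.

```shell
#!/bin/sh
# Print every tool from the argument list that cannot be found on PATH.
# (missing_tools is a hypothetical helper name used only for illustration.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tools named in the prerequisites; psql is the client shipped with libpq.
missing=$(missing_tools pre-commit aws jq psql chamber)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on a single line.
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found."
fi
```

The script only reports; it never installs anything, so it is safe to rerun after each install step.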
Note: Make sure you are running your Python virtual environment before going through the development guides.
Once you have completed the prerequisite steps, you are ready to begin developing for CELLxGENE Discover. As you start to change code, you may want to deploy a test instance of Discover so that you can check how your changes perform. We have three options for doing this:
- Creating a local python-only development environment. This local development environment is a minimal environment that is suitable for running the automated test suite quickly. It also allows you to run specific test cases, which is not possible in the dockerized local deployment environment (see below), because the `make` commands invoke docker commands to run test suites in their entirety. Moreover, creating an environment without docker also makes for much faster test runs. To set up a python-only local development environment, see this guide.
- Creating a local deployment environment. This environment will be entirely hosted on your own machine. It relies upon Docker to run both CELLxGENE Discover servers, Discover unit tests, and infrastructure service dependencies (AWS, Postgres, OIDC). The environment will be initialized with a small amount of dummy data. This environment is great to have up and running while you are actively developing. See this guide for instructions on how to set up a local deployment.
- Creating a remote deployment. This environment creates a lightweight replica of CELLxGENE Discover, hosted by AWS, and provides a more realistic test bed for your changes before either sending them to a PR or trying them out with a cross-functional partner. It takes longer to deploy your changes to a remote development environment, which is why the local deployment is preferred until your changes are ready for broader review. See this guide for instructions on how to set up an rDev environment.
Command | Description | Notes |
---|---|---|
`make fmt` | Auto-format codebase using black. | This should be run before merging in any changes. |
`make lint` | Perform lint checks on codebase using flake8. | This should be run before merging in any changes. |
`make unit-test` | Run all unit tests. | |
`DEPLOYMENT_STAGE=<deployed_env> python3 -m unittest discover tests/functional/backend` | Run all functional tests. | |
Environment variables are set using the command `export <name>=<value>`. For example, `export DEPLOYMENT_STAGE=dev`. These environment variables typically need to be set before you are able to set up your environments (i.e. local, rDev) and before you are able to successfully run any test suite.
Name | Description | Values |
---|---|---|
`DEPLOYMENT_STAGE` | Specifies an app deployment stage for tasks such as deployments and functional tests. The `test` value implies local Docker development environment (and should probably be renamed `local`). | `test`, `dev`, `staging`, `prod` |
`AWS_PROFILE` | Specifies the profile used to interact with AWS resources via the awscli. | `single-cell-dev`, `single-cell-prod` |
`CORPORA_LOCAL_DEV` | Flag: If this variable is set to any value, the app will look for the database on `localhost:5432` and will use the aws secret `corpora/backend/${DEPLOYMENT_STAGE}/database`. | Any |
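
The variables in the table can be exported once per shell session. The snippet below is a minimal sketch that sets two of them and checks `DEPLOYMENT_STAGE` against the documented values; the helper name `is_valid_stage` is hypothetical and not something the repository provides.

```shell
#!/bin/sh
# Succeed only if the argument is one of the documented DEPLOYMENT_STAGE values.
# (is_valid_stage is a hypothetical helper used only for illustration.)
is_valid_stage() {
    case "$1" in
        test|dev|staging|prod) return 0 ;;
        *) return 1 ;;
    esac
}

# Example values taken from the table; pick the ones that match your target.
export DEPLOYMENT_STAGE=dev
export AWS_PROFILE=single-cell-dev
# CORPORA_LOCAL_DEV is a flag: export it with any value only when the app
# should look for the database on localhost:5432.

if is_valid_stage "$DEPLOYMENT_STAGE"; then
    echo "DEPLOYMENT_STAGE=$DEPLOYMENT_STAGE AWS_PROFILE=$AWS_PROFILE"
else
    echo "Unexpected DEPLOYMENT_STAGE: $DEPLOYMENT_STAGE" >&2
fi
```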
If you need to make a change to the CELLxGENE Discover database, see CELLxGENE Discover Database Procedures.
- Set `AWS_PROFILE`.
- Ensure that you have set up your local development environment per the instructions above and run `make local-init` to launch a local dev environment.
- Run the tests using the command `make unit-test`.
Please take a look at the tutorial notebooks in `example_dev_notebooks` for examples of how to run the WMG pipeline and endpoints locally for development purposes.
- If your unit tests crash due to an `Error 137`, that means the docker containers currently running are using more memory than the docker application has allocated. First run `make local-stop` to kill all docker containers, then rerun the tests. If that doesn't work either, you may need to increase the memory allocation in the settings pane of your desktop docker application. A reasonable allocation of memory is `16GB`.
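
For background on the number itself: an exit status above 128 conventionally means the process was ended by a signal, and 137 is 128 + 9, i.e. SIGKILL, which is what the Linux out-of-memory killer sends. The sketch below decodes such a status and, if the docker CLI happens to be on your `PATH`, prints the total memory (in bytes) that the Docker engine reports. The helper name `signal_from_status` is hypothetical.

```shell
#!/bin/sh
# Map an exit status above 128 to the signal number that ended the process;
# print 0 when the status does not indicate a signal.
# (signal_from_status is a hypothetical helper used only for illustration.)
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    else
        echo 0
    fi
}

sig=$(signal_from_status 137)
echo "Exit status 137 corresponds to signal $sig (SIGKILL is 9)"

# Optional: show how much memory the Docker engine can see, in bytes.
# Skipped entirely when the docker CLI is not installed.
if command -v docker >/dev/null 2>&1; then
    docker info --format '{{.MemTotal}}' 2>/dev/null \
        || echo "Docker daemon not reachable"
fi
```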
- Set `AWS_PROFILE`.
- Set `DEPLOYMENT_STAGE` to the deployed environment (dev, staging, or prod) against which you want to run the tests as written locally.
- Run a specific suite of tests using `DEPLOYMENT_STAGE=<deployed_env> python3 -m unittest <path_to_functional_test>`. For example, `DEPLOYMENT_STAGE=dev python3 -m unittest tests/functional/backend/corpora/test_revisions.py`.
- Run all functional tests using `DEPLOYMENT_STAGE=<deployed_env> python3 -m unittest discover tests/functional/backend`.
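
The two invocations above differ only in whether a test path is given. As a minimal sketch, a small wrapper can assemble the right command and reject stages that are not deployed environments. The function name `functional_test_cmd` is hypothetical, and the wrapper only prints the command (a dry run) rather than executing it.

```shell
#!/bin/sh
# Print the unittest command for a deployed stage and an optional test path.
# With no path, fall back to discovering every functional test.
# (functional_test_cmd is a hypothetical helper used only for illustration.)
functional_test_cmd() {
    stage=$1
    path=${2:-}
    case "$stage" in
        dev|staging|prod) ;;
        *) echo "stage must be dev, staging, or prod" >&2; return 1 ;;
    esac
    if [ -n "$path" ]; then
        echo "DEPLOYMENT_STAGE=$stage python3 -m unittest $path"
    else
        echo "DEPLOYMENT_STAGE=$stage python3 -m unittest discover tests/functional/backend"
    fi
}

# Dry run: print the commands instead of running them.
functional_test_cmd dev tests/functional/backend/corpora/test_revisions.py
functional_test_cmd staging
```

To actually run a printed command from the repository root, with `AWS_PROFILE` already set, one option is `eval "$(functional_test_cmd dev)"`.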
Follow instructions in e2e tests README
The upload processing container is split into 2 parts: a base container that contains R libraries, and the CELLxGENE Discover upload application code that builds on top of this.
Because the base container takes a long time to build and is expected to change infrequently, the container is built separately from the standard release process.
The base image is built using GitHub Actions. It is built both nightly and whenever the Dockerfile.processing_base file is changed.
The CELLxGENE Discover upload application code by default uses the base image tagged with the tag "branch-main" (which the nightly and on-change base image build reassigns).
If a new base image build is needed but the Dockerfile has no functional change (e.g. upstream R library versions have changed), `Dockerfile.processing_base` can be modified with a non-functional change (e.g. adding a blank line) to force the build.
In the rare event that the base image needs to be rebuilt without GitHub Actions (e.g. GitHub Actions is down), follow the steps in GitHub's documentation for creating a personal access token, then build locally and push like any other Docker image.