Packaging (#272)
* renamed scIB to scib

* fixed imports

* Fixed name in setup.py

* fix name in CI

* refactored package options to setup.cfg

* separate command for pip install scib

* fix finding packages

* moved pytest.ini content to pyproject.toml

* removed old module notes

* removed redundant requirements files

* rename functions to snake case

* wrap integration functions with old functions names

* Better deprecation warning

* rename read_conos and read_scanorama

* moved test dependencies to setup.cfg

* Update README

* Rename batch in README usage

* add per batch trajectory score (#273)

* add per batch trajectory score

* Update trajectory.py

add missing comma

* Update trajectory.py

import pandas

* don't recompute trajectories per batch

* correct batch var handling

* Update trajectory.py

* Check batch key

* add tests for trajectory score

* update test values

* update test values

Co-authored-by: Strobl <[email protected]>
Co-authored-by: Michaela Mueller <[email protected]>

* renamed scIB to scib

* fixed imports

* Fixed name in setup.py

* fix name in CI

* refactored package options to setup.cfg

* separate command for pip install scib

* fix finding packages

* moved pytest.ini content to pyproject.toml

* removed old module notes

* removed redundant requirements files

* rename functions to snake case

* wrap integration functions with old functions names

* Better deprecation warning

* rename read_conos and read_scanorama

* moved test dependencies to setup.cfg

* Update README

* Rename batch in README usage

* rename trajectory batch function usage

* integrated code review

* add kwargs to integration methods

* Use tempfile for model paths

* rename packaging tools file

* minor code review changes

* use tempfile for conos saving

* Throw error when batches mismatch in trajectory conservation metric

* restructured utils functions in metrics module

* Revert "Throw error when batches mismatch in trajectory conservation metric"

This reverts commit fe0200c.

* Throw error when batches mismatch in trajectory conservation metric

* Revert "restructured utils functions in metrics module"

This reverts commit c30a501.

* fix batch check in TI conservation

* update import order

* setup bumpversion

* Bump version: 0.2.0 → 1.0.0

* add MANIFEST.in

Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: Strobl <[email protected]>
3 people authored Oct 24, 2021
1 parent a757bfc commit 985d815
Showing 57 changed files with 1,064 additions and 886 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/python-package.yml
@@ -23,19 +23,19 @@ jobs:
       run: |
         python -m pip install --upgrade pip
         python -m pip install flake8 pytest
-        pip install .
-        pip install -r tests/requirements.txt
     - name: Lint with flake8
       run: |
         # stop the build if there are Python syntax errors or undefined names
         flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
         flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+    - name: Install package
+      run: pip install .[test]
     - name: Import package
       run: |
         pip list
         cd tests
-        python -c 'import scIB'
+        python -c 'import scib'
     - name: Test with pytest
       run: |
         pytest --durations 0 -s
1 change: 1 addition & 0 deletions .gitignore
@@ -7,6 +7,7 @@ testing.h5ad
data
.ipynb_checkpoints
*.egg-info
*dist/
*cache*
.snakemake

1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include VERSION.txt
87 changes: 65 additions & 22 deletions README.md
@@ -12,7 +12,7 @@ batches of gene expression and chromatin accessibility data.
+ On our [website](https://theislab.github.io/scib-reproducibility) we visualise the results of the study.

+ The reusable pipeline we used in the study can be found in the
separate [scIB pipeline](https://github.com/theislab/scib-pipeline.git) repository. It is reproducible and automates
separate [scib pipeline](https://github.com/theislab/scib-pipeline.git) repository. It is reproducible and automates
the computation of preprocessing combinations, integration methods and benchmarking metrics.

+ For reproducibility and visualisation we have a dedicated
@@ -24,14 +24,24 @@ batches of gene expression and chromatin accessibility data.
MD Luecken, M Büttner, K Chaichoompu, A Danese, M Interlandi, MF Mueller, DC Strobl, L Zappia, M Dugas, M Colomé-Tatché,
FJ Theis bioRxiv 2020.05.22.111161; doi: https://doi.org/10.1101/2020.05.22.111161_

## Package: `scIB`
## Package: `scib`

We created the python package called `scIB` that uses `scanpy` to streamline the integration of single-cell datasets and
evaluate the results. The evaluation of integration quality is based on a number of metrics.
We created the python package called `scib` that uses `scanpy` to streamline the integration of single-cell datasets and
evaluate the results. For evaluating the integration quality it provides a number of metrics.

### Requirements

+ Linux or UNIX system
+ Python >= 3.7
+ 3.6 <= R <= 4.0

We recommend working with environments such as Conda or virtualenv, so that python and R dependencies are in one place.
Please also check out [scib pipeline](https://github.com/theislab/scib-pipeline.git) for ready-to-use environments.
Alternatively, manually install the package on your system using pip, described in the next section.

### Installation

The `scIB` python package is in the folder scIB. You can install it from the root of this repository using
The `scib` python package is in the folder scib. You can simply install it from the root of this repository using

```
pip install .
@@ -49,53 +59,72 @@ Additionally, in order to run the R package `kBET`, you need to install it throu
devtools::install_github('theislab/kBET')
```

We recommend to use a conda environment or something similar, so that python and R dependencies are in one place. Please
also check out [scIB pipeline](https://github.com/theislab/scib-pipeline.git) for ready-to-use environments.
> **Note:** By default dependencies for integration methods are not installed due to dependency clashes.
> In order to use integration methods, see the next section.
### Installing additional packages

This package contains code for running integration methods as well as for evaluating their output. However, due to
dependency clashes, `scIB` is only installed with the packages needed for the metrics. In order to use the integration
dependency clashes, `scib` is only installed with the packages needed for the metrics. In order to use the integration
wrapper functions, we recommend to work with different environments for different methods, each with their own
installation of `scIB`. Check out the `Tools` section for a list of supported integration methods.
installation of `scib`. You can install optional Python dependencies via pip as follows:

```
pip install .[bbknn] # using BBKNN
pip install .[scanorama] # using Scanorama
pip install .[bbknn,scanorama] # Multiple methods in one go
```

See `setup.cfg` for a full list of Python dependencies. For a comprehensive list of supported integration methods,
including R packages, check out the `Tools` section.

## Usage

The package contains several modules for the different steps of the integration and benchmarking pipeline. Functions for
the integration methods are in `scIB.integration`. The methods can be called using
the integration methods are in `scib.integration` or for short `scib.ig`. The methods can be called using

```py
scib.integration.<method>(adata, batch=<batch_key>)
```
scIB.integration.run<method>(adata, batch=<batch>)

where `<method>` is the name of the integration method and `<batch_key>` is the name of the batch column in `adata.obs`.
For example, to run Scanorama on a dataset with batch key 'batch', call

```py
scib.integration.scanorama(adata, batch='batch')
```

where `<method>` is the name of the integration method and `<batch>` is the name of the batch column in `adata.obs`.
> **Warning:** the following notation is deprecated.
> ```
> scib.integration.run<method>(adata, batch=<batch_key>)
> ```
> Please use the snake case naming without the `run` prefix.
Some integration methods (scGEN, SCANVI) also use cell type labels as input. For these, you need to additionally provide
Some integration methods (`scgen`, `scanvi`) also use cell type labels as input. For these, you need to additionally provide
the corresponding label column.
```
runScGen(adata, batch=<batch>, cell_type=<cell_type>)
runScanvi(adata, batch=<batch>, labels=<cell_type>)
```py
scgen(adata, batch=<batch_key>, cell_type=<cell_type>)
scanvi(adata, batch=<batch_key>, labels=<cell_type>)
```
`scIB.preprocessing` contains methods for preprocessing of the data such as normalisation, scaling or highly variable
gene selection per batch. The metrics are located at `scIB.metrics`. To run multiple metrics in one run, use
the `scIB.metrics.metrics()` function.
`scib.preprocessing` (or `scib.pp`) contains functions for normalising, scaling or selecting highly variable genes per batch.
The metrics are under `scib.metrics` (or `scib.me`).

### Metrics
## Metrics

For a detailed description of the metrics implemented in this package, please see
the [manuscript](https://www.biorxiv.org/content/10.1101/2020.05.22.111161v2).

#### Batch removal metrics include:
### Batch removal metrics include:

- Principal component regression `pcr_comparison()`
- Batch ASW `silhouette()`
- K-nearest neighbour batch effect `kBET()`
- Graph connectivity `graph_connectivity()`
- Graph iLISI `lisi_graph()`

#### Biological conservation metrics include:
### Biological conservation metrics include:

- Normalised mutual information `nmi()`
- Adjusted Rand Index `ari()`
@@ -107,6 +136,20 @@ the [manuscript](https://www.biorxiv.org/content/10.1101/2020.05.22.111161v2).
- Trajectory conservation `trajectory_conservation()`
- Graph cLISI `lisi_graph()`

### Metrics Wrapper Functions
We provide wrapper functions to run multiple metrics in one function call.
The `scib.metrics.metrics()` function returns a `pandas.DataFrame` of all metrics specified as parameters.

```py
scib.metrics.metrics(adata, adata_int, ari=True, nmi=True)
```

Furthermore, `scib.metrics.metrics()` is wrapped by convenience functions that only select certain metrics:

+ `scib.me.metrics_fast()` only computes metrics that require little preprocessing
+ `scib.me.metrics_slim()` includes all functions of `scib.me.metrics_fast()` and adds clustering-based metrics
+ `scib.me.metrics_all()` includes all metrics
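
The relationship between a flag-driven general function and its convenience wrappers can be illustrated with a small, self-contained sketch. It is not the `scib` implementation: `toy_accuracy`, `toy_n_labels`, the `METRICS` registry and the plain `dict` return value are all hypothetical stand-ins chosen so the example runs with the standard library only.

```python
# Hypothetical toy metrics; these are NOT scib functions.
def toy_accuracy(truth, pred):
    """Fraction of positions where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def toy_n_labels(truth, pred):
    """Number of distinct predicted labels."""
    return len(set(pred))


# Registry mapping a flag name to the function that computes the metric.
METRICS = {"accuracy": toy_accuracy, "n_labels": toy_n_labels}


def metrics(truth, pred, **flags):
    """Compute only the metrics whose flag is True; others are reported as None."""
    unknown = set(flags) - set(METRICS)
    if unknown:
        raise TypeError(f"unknown metric flags: {sorted(unknown)}")
    return {
        name: func(truth, pred) if flags.get(name, False) else None
        for name, func in METRICS.items()
    }


def metrics_fast(truth, pred):
    """Convenience wrapper that presets a cheap subset of flags."""
    return metrics(truth, pred, accuracy=True)


def metrics_all(truth, pred):
    """Convenience wrapper that switches every registered metric on."""
    return metrics(truth, pred, **{name: True for name in METRICS})


truth = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
fast_result = metrics_fast(truth, pred)
all_result = metrics_all(truth, pred)
```

Reporting unselected metrics as `None` is a choice made for this sketch only; it keeps the result shape identical across wrappers so that outputs can be compared or stacked.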

## Tools

Tools that are compared include:
1 change: 1 addition & 0 deletions VERSION.txt
@@ -0,0 +1 @@
1.0.0
11 changes: 11 additions & 0 deletions pyproject.toml
@@ -0,0 +1,11 @@
[build-system]
requires = [
"setuptools",
"wheel",
]
build-backend = "setuptools.build_meta"

[tool.pytest.ini_options]
log_cli = 'True'
log_cli_level = 'INFO'
addopts = '-p no:warnings'
4 changes: 0 additions & 4 deletions pytest.ini

This file was deleted.

20 changes: 0 additions & 20 deletions requirements.txt

This file was deleted.

11 changes: 0 additions & 11 deletions requirements_extra.txt

This file was deleted.

9 changes: 0 additions & 9 deletions scIB/__init__.py

This file was deleted.

35 changes: 35 additions & 0 deletions scib/__init__.py
@@ -0,0 +1,35 @@
try:
from importlib import metadata
except ImportError: # for Python<3.8
import importlib_metadata as metadata

__version__ = metadata.version('scib')

from . import integration, metrics, preprocessing
from . import utils as utils
from ._package_tools import rename_func
from .metrics import clustering

alias_func_map = {
'runScanorama': integration.scanorama,
'runTrVae': integration.trvae,
'runTrVaep': integration.trvaep,
'runScGen': integration.scgen,
'runScvi': integration.scvi,
'runScanvi': integration.scanvi,
'runMNN': integration.mnn,
'runBBKNN': integration.bbknn,
'runSaucie': integration.saucie,
'runCombat': integration.combat,
'runDESC': integration.desc,
'readSeurat': preprocessing.read_seurat,
'readConos': preprocessing.read_conos,
}

for alias, func in alias_func_map.items():
rename_func(func, alias)

pp = preprocessing
ig = integration
me = metrics
cl = clustering
29 changes: 29 additions & 0 deletions scib/_package_tools.py
@@ -0,0 +1,29 @@
import inspect
import warnings
from functools import wraps

warnings.simplefilter('default') # or 'always'


def wrap_func_naming(func, name):
"""
Decorator that adds a `DeprecationWarning` and a name to `func`.
"""

@wraps(func)
def wrapper(*args, **kwargs):
warnings.warn(
f"Mixed case function naming is deprecated for '{name}'. "
f"Please use '{func.__name__}' instead.",
DeprecationWarning
)
return func(*args, **kwargs)

wrapper.__name__ = name
return wrapper


def rename_func(function, new_name):
if callable(function):
function = wrap_func_naming(function, new_name)
setattr(inspect.getmodule(function), new_name, function)
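
Review note: the aliasing pattern above can be exercised in isolation. The sketch below is a simplified variant, not the code from this commit: `deprecated_alias`, the `demo` module and the dummy `scanorama` function are hypothetical, and `stacklevel=2` is an addition so that the warning is attributed to the caller rather than to the wrapper.

```python
import types
import warnings
from functools import wraps


def deprecated_alias(func, old_name):
    """Wrap `func` so that calling it under `old_name` emits a DeprecationWarning."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"'{old_name}' is deprecated, please use '{func.__name__}' instead.",
            DeprecationWarning,
            stacklevel=2,  # report the caller's line, not this wrapper
        )
        return func(*args, **kwargs)

    # Give the wrapper the legacy name after `wraps` copied the new one.
    wrapper.__name__ = old_name
    return wrapper


# Hypothetical stand-in for a module such as `scib.integration`.
demo = types.ModuleType("demo_integration")


def scanorama(adata, batch):
    """Dummy integration function that only echoes its arguments."""
    return f"integrated {adata} by {batch}"


demo.scanorama = scanorama
demo.runScanorama = deprecated_alias(scanorama, "runScanorama")

# Record warnings explicitly so the example does not depend on global filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = demo.runScanorama("adata", batch="batch")
```

The old name keeps working and returns the same result as the new function, while each call through the alias records one `DeprecationWarning` that names the replacement.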
File renamed without changes.