Implement readers and writers for MultiAssayExperiment with the new takane specification (#1)

* Implement the savers and loaders
* Add tests, documentation
* Update README
jkanche authored Jan 26, 2024
1 parent fe24d79 commit f8a0bc4
Showing 13 changed files with 557 additions and 190 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/pypi-publish.yml
@@ -0,0 +1,51 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Publish to PyPI

on:
  push:
    tags: "*"

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.9
        uses: actions/setup-python@v2
        with:
          python-version: 3.9
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install flake8 pytest tox
      # - name: Lint with flake8
      #   run: |
      #     # stop the build if there are Python syntax errors or undefined names
      #     flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
      #     # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
      #     # flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
      - name: Test with tox
        run: |
          tox
      - name: Build docs
        run: |
          tox -e docs
      - run: touch ./docs/_build/html/.nojekyll
      - name: GH Pages Deployment
        uses: JamesIves/[email protected]
        with:
          branch: gh-pages # The branch the action should deploy to.
          folder: ./docs/_build/html
          clean: true # Automatically remove deleted files from the deploy branch
      - name: Build Project and Publish
        run: |
          python -m tox -e clean,build
      - name: Publish package
        uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
        with:
          user: __token__
          password: ${{ secrets.PYPI_PASSWORD }}
37 changes: 37 additions & 0 deletions .github/workflows/pypi-test.yml
@@ -0,0 +1,37 @@
name: Test the library

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [ '3.8', '3.9', '3.10', '3.11', '3.12' ]

    name: Python ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v2
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install flake8 pytest tox
      # - name: Lint with flake8
      #   run: |
      #     # stop the build if there are Python syntax errors or undefined names
      #     flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
      #     # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
      #     # flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
      - name: Test with tox
        run: |
          tox
3 changes: 0 additions & 3 deletions CHANGELOG.md
@@ -2,6 +2,3 @@

## Version 0.1 (development)

-- Feature A added
-- FIX: nasty bug #1729 fixed
-- add your changes here!
73 changes: 66 additions & 7 deletions README.md
@@ -11,17 +11,76 @@
-->

[![Project generated with PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/)
[![PyPI-Server](https://img.shields.io/pypi/v/dolomite-mae.svg)](https://pypi.org/project/dolomite-mae/)
![Unit tests](https://github.com/ArtifactDB/dolomite-mae/actions/workflows/pypi-test.yml/badge.svg)

-# dolomite-mae
# Save and load `MultiAssayExperiments` in Python

-> Add a short description here!
## Introduction

-A longer description of your project goes here...
The **dolomite-mae** package is the Python counterpart to the [**alabaster.mae**](https://github.com/ArtifactDB/alabaster.mae) R package,
providing methods for saving/reading `MultiAssayExperiment` objects within the [**dolomite** framework](https://github.com/ArtifactDB/dolomite-base).
All components of the `MultiAssayExperiment` (column data, sample map, and experiments) are saved to their respective file representations,
which can be loaded in a new R/Python environment for cross-language analyses.

## Quick start

-<!-- pyscaffold-notes -->
Let's mock up a `MultiAssayExperiment`:

-## Note
```python
from multiassayexperiment import MultiAssayExperiment
from singlecellexperiment import SingleCellExperiment
from summarizedexperiment import SummarizedExperiment
import biocframe
import numpy

-This project has been set up using PyScaffold 4.5. For details and usage
-information on PyScaffold see https://pyscaffold.org/.
x = numpy.random.rand(1000, 200)
x2 = (numpy.random.rand(1000, 200) * 10).astype(numpy.int32)

sce = SingleCellExperiment(
{"logcounts": x, "counts": x2},
main_experiment_name="aaron's secret modality",
row_data=biocframe.BiocFrame(
{"foo": numpy.random.rand(1000), "bar": numpy.random.rand(1000)},
row_names=["gene_sce_" + str(i) for i in range(1000)],
),
column_data=biocframe.BiocFrame(
{"whee": numpy.random.rand(200), "stuff": numpy.random.rand(200)},
row_names=["cell_sce" + str(i) for i in range(200)],
),
)

se = SummarizedExperiment(
{"counts": numpy.random.rand(100, 200)},
row_data=biocframe.BiocFrame(
{"foo": numpy.random.rand(100), "bar": numpy.random.rand(100)},
row_names=["gene_se_" + str(i) for i in range(100)],
),
column_data=biocframe.BiocFrame(
{"whee": numpy.random.rand(200), "stuff": numpy.random.rand(200)},
row_names=["cell_se" + str(i) for i in range(200)],
),
)

mae = MultiAssayExperiment(experiments={"jay_expt": sce, "aarons_expt": se})
```

Now we can save it:

```python
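from dolomite_base import save_object
import dolomite_mae  # registers the MultiAssayExperiment saver
import dolomite_sce  # registers the SingleCellExperiment saver
import dolomite_se  # registers the SummarizedExperiment saver
import os
from tempfile import mkdtemp

path = os.path.join(mkdtemp(), "test")
save_object(mae, path)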
```

And load it again, e.g., in a new session:

```python
from dolomite_base import read_object

roundtrip = read_object(path)
```
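Based on the paths that `read_multi_assay_experiment` (added in this commit) looks up, a saved `MultiAssayExperiment` directory appears to contain `sample_data`, `sample_map.h5`, an optional `other_data`, and an `experiments` directory holding `names.json` plus one numbered subdirectory per experiment. The sketch below mirrors those lookups with the standard library only. `describe_layout` is a hypothetical helper, not part of dolomite-mae, and the toy directory it inspects contains only a `names.json` file, not real saved objects.

```python
import json
import os
import tempfile


def describe_layout(path):
    """Map each MultiAssayExperiment component to the path the reader checks.

    Hypothetical helper for illustration; it only builds paths and does not
    validate or load any saved objects.
    """
    experiments_dir = os.path.join(path, "experiments")
    layout = {
        "sample_data": os.path.join(path, "sample_data"),
        "sample_map": os.path.join(path, "sample_map.h5"),
        "other_data": os.path.join(path, "other_data"),  # optional metadata
        "experiments": {},
    }
    names_file = os.path.join(experiments_dir, "names.json")
    if os.path.exists(names_file):
        with open(names_file) as handle:
            names = json.load(handle)
        # Experiment i lives in a subdirectory named after its position in names.json.
        for i, name in enumerate(names):
            layout["experiments"][name] = os.path.join(experiments_dir, str(i))
    return layout


# Toy skeleton: only names.json is written, not real saved objects.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "experiments"))
with open(os.path.join(root, "experiments", "names.json"), "w") as handle:
    json.dump(["jay_expt", "aarons_expt"], handle)

layout = describe_layout(root)
for name, expt_path in layout["experiments"].items():
    print(name, "->", expt_path)
```

Because experiments are addressed by position rather than by name, the order in `names.json` is what ties each numbered subdirectory to its experiment name.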
5 changes: 4 additions & 1 deletion docs/conf.py
@@ -72,6 +72,7 @@
    "sphinx.ext.ifconfig",
    "sphinx.ext.mathjax",
    "sphinx.ext.napoleon",
    "sphinx_autodoc_typehints",
]

# Add any paths that contain templates here, relative to this directory.
@@ -171,7 +172,7 @@

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
-html_theme = "alabaster"
html_theme = "furo"

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
@@ -299,6 +300,8 @@
    "scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
    "setuptools": ("https://setuptools.pypa.io/en/stable/", None),
    "pyscaffold": ("https://pyscaffold.org/en/stable", None),
    "dolomite_base": ("https://artifactdb.github.io/dolomite-base", None),
    "multiassayexperiment": ("https://biocpy.github.io/MultiAssayExperiment/", None),
}

print(f"loading configurations for {project} {version} ...", file=sys.stderr)
2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -4,3 +4,5 @@
# sphinx_rtd_theme
myst-parser[linkify]
sphinx>=3.2.1
furo
sphinx-autodoc-typehints
16 changes: 11 additions & 5 deletions setup.cfg
@@ -5,17 +5,17 @@

[metadata]
name = dolomite-mae
-description = Add a short description here!
description = Save and load multi-assay experiments in the dolomite framework!
author = Jayaram Kancherla
author_email = [email protected]
license = MIT
license_files = LICENSE.txt
long_description = file: README.md
long_description_content_type = text/markdown; charset=UTF-8; variant=GFM
-url = https://github.com/pyscaffold/pyscaffold/
url = https://github.com/ArtifactDB/dolomite-mae
# Add here related links, for example:
project_urls =
-    Documentation = https://pyscaffold.org/
    Documentation = https://github.com/ArtifactDB/dolomite-mae
#    Source = https://github.com/pyscaffold/pyscaffold/
#    Changelog = https://pyscaffold.org/en/latest/changelog.html
#    Tracker = https://github.com/pyscaffold/pyscaffold/issues
@@ -41,15 +41,21 @@ package_dir =
    =src

# Require a min/specific Python version (comma-separated conditions)
-# python_requires = >=3.8
python_requires = >=3.8

# Add here dependencies of your project (line-separated), e.g. requests>=2.2,<3.0.
# Version specifiers like >=2.2,<3.0 avoid problems due to API changes in
# new major versions. This works if the required packages follow Semantic Versioning.
# For more information, check out https://semver.org/.
install_requires =
-    importlib-metadata; python_version<"3.8"

    dolomite_base==0.2.0-alpha6
    dolomite_sce==0.1.0-alpha
    dolomite_se==0.1.0-alpha2
    multiassayexperiment>=0.4.2,<0.5.0
    biocutils
    pandas
    numpy

[options.packages.find]
where = src
3 changes: 3 additions & 0 deletions src/dolomite_mae/__init__.py
@@ -14,3 +14,6 @@
    __version__ = "unknown"
finally:
    del version, PackageNotFoundError

from .read_multi_assay_experiment import read_multi_assay_experiment
from .save_multi_assay_experiment import save_multi_assay_experiment
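The reader module below registers the string `"dolomite_mae.read_multi_assay_experiment"` under the `multi_sample_dataset` type, and `__init__.py` re-exports the function of the same name. Assuming dolomite_base resolves such strings lazily by splitting at the last dot and importing the module (check `dolomite_base.read_object` to confirm), the re-export is what makes that string point at the reader function. The sketch below shows a lazily resolved registry of this kind with the standard library; `registry` and `resolve_reader` are hypothetical names, and `json.loads` stands in for a real reader so the example runs without dolomite installed.

```python
import importlib

# Hypothetical registry modeled loosely on dolomite_base's read_object_registry:
# object type -> "module.function" string, replaced by the callable on first use.
registry = {
    "multi_sample_dataset": "dolomite_mae.read_multi_assay_experiment",
    "toy_json": "json.loads",  # stdlib stand-in so the sketch runs anywhere
}


def resolve_reader(obj_type):
    """Return the reader registered for obj_type, importing its module lazily."""
    entry = registry[obj_type]
    if isinstance(entry, str):
        # Split at the last dot: module path on the left, attribute on the right.
        module_name, _, func_name = entry.rpartition(".")
        module = importlib.import_module(module_name)
        entry = getattr(module, func_name)
        registry[obj_type] = entry  # cache the callable for later lookups
    return entry


reader = resolve_reader("toy_json")
print(reader('{"a": 1}'))  # {'a': 1}
```

Under this scheme, `"dolomite_mae.read_multi_assay_experiment"` would resolve to `getattr(dolomite_mae, "read_multi_assay_experiment")`, and storing a string rather than the function avoids importing every reader package up front.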
101 changes: 101 additions & 0 deletions src/dolomite_mae/read_multi_assay_experiment.py
@@ -0,0 +1,101 @@
import json
import os

import dolomite_base as dl
import h5py
from biocframe import BiocFrame
from dolomite_base.read_object import read_object_registry
from multiassayexperiment import MultiAssayExperiment

read_object_registry[
    "multi_sample_dataset"
] = "dolomite_mae.read_multi_assay_experiment"


def read_multi_assay_experiment(
    path: str, metadata: dict, **kwargs
) -> MultiAssayExperiment:
    """Load a
    :py:class:`~multiassayexperiment.MultiAssayExperiment.MultiAssayExperiment`
    from its on-disk representation.

    This method should generally not be called directly but instead be invoked by
    :py:meth:`~dolomite_base.read_object.read_object`.

    Args:
        path:
            Path to the directory containing the object.

        metadata:
            Metadata for the object.

        kwargs:
            Further arguments, ignored.

    Returns:
        A
        :py:class:`~multiassayexperiment.MultiAssayExperiment.MultiAssayExperiment`
        with file-backed arrays in the assays.
    """

    _sample_path = os.path.join(path, "sample_data")
    _sample_data = None
    if os.path.exists(_sample_path):
        _sample_data = dl.read_object(_sample_path)

    if _sample_data is None:
        raise RuntimeError("Cannot read 'sample_data'.")

    _srow_names = _sample_data.get_row_names()
    if _srow_names is None:
        raise RuntimeError("'sample_data' does not contain 'row_names'.")

    _expts_path = os.path.join(path, "experiments")
    _expts = {}
    _expt_names = []
    _sample_map_data = None
    if os.path.exists(_expts_path):
        with open(os.path.join(_expts_path, "names.json"), "r") as handle:
            _expt_names = json.load(handle)

    if len(_expt_names) > 0:
        _sample_map_path = os.path.join(path, "sample_map.h5")
        _primary = []
        _assay = []
        _colname = []

        # Use a context manager so the HDF5 file is closed after all
        # per-experiment index vectors have been consumed.
        with h5py.File(_sample_map_path, "r") as _shandle:
            _sghandle = _shandle["multi_sample_dataset"]

            for _aidx, _aname in enumerate(_expt_names):
                _expt_read_path = os.path.join(_expts_path, str(_aidx))

                try:
                    _expts[_aname] = dl.read_object(_expt_read_path)
                except Exception as ex:
                    raise RuntimeError(
                        f"failed to load experiment '{_aname}' from '{path}'; "
                        + str(ex)
                    ) from ex

                # One integer per column: its row position in 'sample_data'.
                _expt_map = dl.load_vector_from_hdf5(
                    _sghandle[str(_aidx)], expected_type=int, report_1darray=True
                )

                _assay.extend([_aname] * _expts[_aname].shape[1])
                _colname.extend(_expts[_aname].get_column_names())
                _primary.extend([_srow_names[i] for i in _expt_map])

        _sample_map_data = BiocFrame(
            {"primary": _primary, "colname": _colname, "assay": _assay}
        )

    mae = MultiAssayExperiment(
        experiments=_expts, column_data=_sample_data, sample_map=_sample_map_data
    )

    _meta_path = os.path.join(path, "other_data")
    if os.path.exists(_meta_path):
        _meta = dl.read_object(_meta_path)
        mae = mae.set_metadata(_meta.as_dict())

    return mae
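In the reader's loop above, each experiment's column names are paired with an integer vector from `sample_map.h5`, and each integer is used as a position in the row names of `sample_data`. The sketch below restates that decoding in plain Python, together with a hypothetical inverse (`encode_sample_map`) of the kind a saver would need. The saver module's contents are not shown on this page, so the encoding side is an assumption about the format rather than its actual code; both function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def decode_sample_map(
    sample_names: Sequence[str],
    experiments: Sequence[Tuple[str, Sequence[str], Sequence[int]]],
) -> Dict[str, List[str]]:
    """Rebuild the primary/colname/assay columns from per-experiment indices.

    `experiments` holds (assay name, column names, index vector) triples in
    the order listed in names.json (hypothetical input format).
    """
    primary: List[str] = []
    colname: List[str] = []
    assay: List[str] = []
    for name, columns, indices in experiments:
        if len(columns) != len(indices):
            raise ValueError(
                f"experiment '{name}' has {len(columns)} columns "
                f"but {len(indices)} sample indices"
            )
        assay.extend([name] * len(columns))
        colname.extend(columns)
        # Each index is a position in the row names of 'sample_data'.
        primary.extend(sample_names[i] for i in indices)
    return {"primary": primary, "colname": colname, "assay": assay}


def encode_sample_map(
    sample_names: Sequence[str],
    sample_map: Dict[str, List[str]],
    assay_name: str,
    column_names: Sequence[str],
) -> List[int]:
    """Hypothetical inverse: one sample index per column of `assay_name`."""
    position = {s: i for i, s in enumerate(sample_names)}
    primary_of = {
        c: p
        for p, c, a in zip(
            sample_map["primary"], sample_map["colname"], sample_map["assay"]
        )
        if a == assay_name
    }
    return [position[primary_of[c]] for c in column_names]


samples = ["donor_A", "donor_B"]
smap = decode_sample_map(
    samples,
    [("rna", ["c1", "c2", "c3"], [0, 0, 1]), ("adt", ["x1", "x2"], [1, 0])],
)
print(smap["primary"])  # ['donor_A', 'donor_A', 'donor_B', 'donor_B', 'donor_A']
print(encode_sample_map(samples, smap, "adt", ["x1", "x2"]))  # [1, 0]
```

Unlike the reader in this commit, `decode_sample_map` checks that each index vector has one entry per column, which turns a misaligned sample map into an explicit error instead of silently mismatched columns.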