Merge pull request #97 from draeger-lab/dev
Integrate new features database access & more general functions to main
GwennyGit authored Aug 21, 2023
2 parents 23e307f + b8ab57c commit 5eac900
Showing 25 changed files with 127 additions and 68 deletions.
33 changes: 33 additions & 0 deletions .github/workflows/publish-to-test-pypi.yml
@@ -0,0 +1,33 @@
name: Publish new refineGEMs release to PyPI and TestPyPI

on: workflow_dispatch

jobs:
  build-n-publish:
    name: Build and publish new refineGEMs release to TestPyPI
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: "3.x"
    - name: Install pypa/build
      run: >-
        python3 -m
        pip install
        build
        --user
    - name: Build a binary wheel and a source tarball
      run: >-
        python3 -m
        build
        --sdist
        --wheel
        --outdir dist/
        .
    - name: Publish distribution 📦 to Test PyPI
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        password: ${{ secrets.TEST_PYPI_API_TOKEN }}
        repository-url: https://test.pypi.org/legacy/
4 changes: 4 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,4 @@
include LICENSE
include refinegems/database/current_bigg_db_version.txt
include refinegems/database/sbo_media_db.sql
include refinegems/database/data.db
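
The manifest entries above add the database files to the source distribution. As a minimal sketch of how such files could be located at runtime, assuming they are also installed as package data (the manifest alone does not guarantee this), the hypothetical helper ``packaged_file`` below resolves a path inside an installed package. It is not part of refineGEMs.

```python
from importlib import resources


def packaged_file(package: str, relative_path: str):
    """Return a handle to a file that ships inside an installed package.

    Hypothetical helper for illustration only; not part of refineGEMs.
    """
    location = resources.files(package)  # root directory of the installed package
    for part in relative_path.split("/"):
        location = location.joinpath(part)  # descend one path segment at a time
    return location


if __name__ == "__main__":
    # Demonstrated with a standard-library package so the sketch runs anywhere.
    print(packaged_file("json", "__init__.py").is_file())
```

For refineGEMs the call would presumably look like ``packaged_file("refinegems", "database/data.db")``, assuming the package is installed and the file is shipped with it.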
2 changes: 1 addition & 1 deletion Pipfile.lock

Generated file; diff not rendered by default.

1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -1,4 +1,5 @@
nbsphinx
ipython
sphinxcontrib-bibtex
sphinx_copybutton
accessible-pygments
6 changes: 5 additions & 1 deletion docs/source/conf.py
@@ -23,7 +23,7 @@
author = 'Famke Bäuerle and Gwendolyn O. Gusak'

# The full version, including alpha/beta/rc tags
release = '1.2.2'
release = '1.3.0'


# -- General configuration ---------------------------------------------------
@@ -35,12 +35,16 @@
'sphinx.ext.autodoc',
'sphinx.ext.autosectionlabel',
'sphinx.ext.mathjax',
'sphinx_copybutton',
'nbsphinx',
'sphinx_rtd_theme',
'IPython.sphinxext.ipython_console_highlighting',
'sphinxcontrib.bibtex'
]

# For copy buttons in code blocks
copybutton_selector = "div.copyable pre"

# For citations
bibtex_bibfiles = ['library.bib']

9 changes: 6 additions & 3 deletions docs/source/development.rst
@@ -13,12 +13,13 @@ Development installation
* `pandoc`
* `ipython`
* `sphinxcontrib-bibtex`
* `sphinx_copybutton`

You can install the packages via pip to your local environment:

.. code:: bash
pip install sphinx nbsphinx sphinx_rtd_theme pandoc ipython sphinxcontrib-bibtex
pip install sphinx nbsphinx sphinx_rtd_theme pandoc ipython sphinxcontrib-bibtex sphinx_copybutton
If you run into an error with jinja2, just switch to version 3.0.3:

@@ -36,7 +37,7 @@ If you want your print message to show in the log file, replace the ```print()``
Documentation notes
-------------------

We use the autoDocstring extension (njpwerner.autodocstring) for vscode with the google format to generate function docstrings. To ensure a nice looking sphinx documentation, we add ``-`` to all variables that are passed as Args. And tuple returns are written as follows:
We use the autoDocstring extension (njpwerner.autodocstring) for VSCode with the google format to generate function docstrings. To ensure a nice looking sphinx documentation, we add ``-`` to all variables that are passed as Args. And tuple returns are written as follows:

.. code:: python
:linenos:
@@ -57,4 +58,6 @@ We are also trying to make input and return types explicit by declaring those in
.. code:: python
:linenos:
def my_func(input1: int, input2: str, input3: Model) -> tuple[str, int]:
def my_func(input1: int, input2: str, input3: Model) -> tuple[str, int]:
More details for certain specifics can also be found `here <https://github.com/draeger-lab/refinegems/issues/74>`__.
6 changes: 3 additions & 3 deletions docs/source/in_silico_media_generation.rst
@@ -3,11 +3,11 @@ From laboratory to *in silico* medium

.. hint::
If you want to use the medium with ``refineGEMs.growth`` add the definition to the database schema ``sbo_media_db.sql``
in the folder *data/database* in the downloaded repository. To update the database with the newly added table just
in the folder *refinegems/database* in the downloaded repository. To update the database with the newly added table just
delete the file ``data.db`` in the same folder and run refineGEMs.

1. Search papers containing medium definitions/ Search paper or provider information for a medium that could be
interesting for your organism
1. Search papers containing medium definitions./ Search paper or provider information for a medium that could be
interesting for your organism.
2. | If the paper already contains an *in silico* definition: Go to step 3.
| If not:
20 changes: 11 additions & 9 deletions docs/source/index.rst
@@ -1,32 +1,32 @@
Welcome to refineGEMs!
======================================
``refineGEMs`` is a python package intended
``refineGEMs`` is a Python package intended
to help with the curation of genome-scale metabolic models (GEMS).

.. hint:: For bug reports please write issues on the `GitHub page <https://github.com/draeger-lab/refinegems/issues>`__.
.. hint:: For bug reports please write issues on the `GitHub page <https://github.com/draeger-lab/refinegems/issues>`__ or open a discussion `here <https://github.com/draeger-lab/refinegems/discussions>`__.

Overview
--------

Currently ``refineGEMs`` can be used for the investigation of a GEM, it can complete the following tasks:

* loading GEMS with ``cobrapy`` and ``libSBML``
* loading GEMS with ``COBRApy`` and ``libSBML``
* report number of metabolites, reactions and genes
* report orphaned, deadends and disconnected metabolites
* report mass and charge unbalanced reactions
* report `Memote <https://memote.readthedocs.io/en/latest/index.html>`__ score
* compare the genes present in the model to the genes found in:
* the `KEGG <https://www.genome.jp/kegg/kegg1.html>`__ Database (Note: This requires a GFF file of your organism and the KEGG identifier of your organism.)
* Or the `BioCyc <https://biocyc.org/>`__ Database (Note: This requires that a database entry for your organism exists in BioCyc.)
* compare the charges and masses of the metabolites present in the model to the charges and masses denoted in the `ModelSEED <https://modelseed.org/>`__ Database
* compare the charges and masses of the metabolites present in the model to the charges and masses denoted in the `ModelSEED <https://modelseed.org/>`__ Database.

Other applications of ``refineGEMs`` to curate a given model include:

* The correction of a model created with `CarveMe <https://github.com/cdanielmachado/carveme>`__ v.1.5.1 (for example moving all relevant information from the notes to the annotation field) this includes automated annotation of NCBI genes to the GeneProtein section of the model
* The addition of `KEGG <https://www.genome.jp/kegg/kegg1.html>`__ Pathways as Groups (using the `libSBML <https://synonym.caltech.edu/software/libsbml/5.18.0/docs/formatted/python-api/classlibsbml_1_1_groups_model_plugin.html>`__ Groups Plugin)
* Updating the SBO-Term annotations based on a SBOannotator
* Updating the annotation of metabolites and extending the model with reactions (for the purpose of filling gaps) based on a table filled by the user ``data/manual_annotations.xlsx``, note that this only works when the structure of the given table is used
* And extending the model with all information surrounding reactions including the corresponding GeneProducts and metabolites by filling in the table ``data/modelName_gapfill_analysis_date_example.xlsx``, note this also only works when the structure of the given Excel file is used
* The correction of a model created with `CarveMe <https://github.com/cdanielmachado/carveme>`__ v.1.5.1 (for example moving all relevant information from the notes to the annotation field) this includes automated annotation of NCBI genes to the GeneProduct section of the model,
* The addition of `KEGG <https://www.genome.jp/kegg/kegg1.html>`__ Pathways as Groups (using the `libSBML <https://synonym.caltech.edu/software/libsbml/5.18.0/docs/formatted/python-api/classlibsbml_1_1_groups_model_plugin.html>`__ Groups Plugin),
* Updating the SBO-Term annotations based on SBOannotator\ :footcite:p:`Leonidou2023_sboann`,
* Updating the annotation of metabolites and extending the model with reactions (for the purpose of filling gaps) based on a table filled by the user ``data/manual_annotations.xlsx`` (Note: This only works when the structure of the given table is used.),
* And extending the model with all information surrounding reactions including the corresponding GeneProducts and metabolites by filling in the table ``data/modelName_gapfill_analysis_date_example.xlsx`` (Note: This also only works when the structure of the given Excel file is used).


.. toctree::
@@ -43,3 +43,5 @@ Other applications of ``refineGEMs`` to curate a given model include:

* :ref:`genindex`
* :ref:`search`

.. footbibliography::
15 changes: 9 additions & 6 deletions docs/source/installation.rst
@@ -3,11 +3,12 @@ Installation

Installation via pip
--------------------
To install refineGEMs as Python package, simply install it via ``pip``:
To install refineGEMs as Python package from `PyPI <https://pypi.org/project/refineGEMs/>`__, simply install it via ``pip``:

.. code:: bash
.. code:: console
:class: copyable
pip install refinegems
pip install refineGEMs
The corresponding project site can be found `here <https://pypi.org/project/refineGEMs/>`__.

@@ -60,6 +61,10 @@ If `which pip` does not show pip in the conda environment you can also create a
**Pipenv**

.. warning::
| Since version 1.1.0 the Pipfile and Pipfile.lock files are not up to date anymore.
| This installation method might not work.
You can use
`pipenv <https://pipenv.pypa.io/en/latest/>`__ to keep all dependencies together. You will need to install
``pipenv`` first. To install ``refineGEMs`` locally complete the
@@ -96,13 +101,11 @@ Troubleshooting
``pipenv install``.

- If you run into a problem with ``pipenv`` not locking after f.ex. moving the repository try uninstalling ``pipenv`` and reinstalling it via pip. Then run ``pipenv install`` and it should work again.
- If you use vscode terminals and have trouble accessing the python from within your conda environment, deactivate base and reactivate again:
- If you use VSCode terminals and have trouble accessing the python from within your conda environment, deactivate base and reactivate again:

.. code:: bash
conda deactivate
conda deactivate
conda activate base
conda activate <your conda env>
2 changes: 1 addition & 1 deletion docs/source/modules/examples.ipynb
@@ -32,7 +32,7 @@
"metadata": {},
"outputs": [],
"source": [
"model = rg.ioload_model_cobra('../../data/e_coli_core.xml')"
"model = rg.io.load_model_cobra('../../data/e_coli_core.xml')"
]
},
{
2 changes: 1 addition & 1 deletion docs/source/modules/gapfill.rst
@@ -65,7 +65,7 @@ To perform the gap analysis the following parameters are relevant for the config
To add genes, metabolites and reactions from an Excel table to a model the following parameters need to be set:
(The Excel file is either obtained by running gapfill_analysis or created by hand with the same structure as the result file from gapfill_analysis.
An example Excel file to fill in by hand can be found in the cloned repository under 'data/modelName_gapfill_analysis_date_example.xlsx')
An example Excel file to fill in by hand can be found in the cloned repository under ``data/modelName_gapfill_analysis_date_example.xlsx``)

.. code:: yaml
4 changes: 2 additions & 2 deletions docs/source/modules/growth.rst
@@ -12,14 +12,14 @@ Outputs a table with the column headers:
Implementation
--------------

Growth rates and thus doubling times can be determined with Flux Balance Analysis (FBA). RefineGEMs uses a COBRApy based implementation that adds metabolites one-by-one to custom media definitions until growth is obtained. The pseudocode is shown below.
Growth rates and, thus, doubling times can be determined with Flux Balance Analysis (FBA). RefineGEMs uses a COBRApy based implementation that adds metabolites one-by-one to custom media definitions until growth is obtained. The pseudocode is shown below.

.. image:: ../images/growth_algorithm.png
:align: center
:width: 400
:alt: Pseudocode representation of the algorithm implemented for growth simulation.

There is a flag called basis which can be set to either default_uptake or minimal_uptake. You can decide from which uptake you want to fill your medium of interest when looking for missing metabolites. Either the default_uptake which is the uptake that the model has when no specific medium is set or the minimal_uptake which is the uptake resulting from cobrapys minimal_medium optimization.
There is a flag called basis which can be set to either ``default_uptake`` or ``minimal_uptake``. You can decide from which uptake you want to fill your medium of interest when looking for missing metabolites. Either the ``default_uptake`` which is the uptake that the model has when no specific medium is set or the ``minimal_uptake`` which is the uptake resulting from COBRApy's minimal_medium optimization.
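
The one-by-one idea described above can be sketched roughly as follows. This is not the refineGEMs implementation: the function name ``find_missing_metabolites`` and the ``grows`` callback (which stands in for an FBA run) are hypothetical, and ``basis_uptake`` plays the role of either ``default_uptake`` or ``minimal_uptake``.

```python
from typing import Callable, Iterable


def find_missing_metabolites(
    medium: Iterable[str],
    basis_uptake: Iterable[str],
    grows: Callable[[set[str]], bool],
) -> list[str]:
    """Add metabolites from ``basis_uptake`` one by one until ``grows`` is true.

    ``grows`` is a hypothetical callback standing in for an FBA simulation.
    Returns the metabolites that had to be added to the medium.
    """
    current = set(medium)
    added: list[str] = []
    for metabolite in basis_uptake:
        if grows(current):
            break  # growth obtained, stop supplementing
        if metabolite not in current:
            current.add(metabolite)
            added.append(metabolite)
    if not grows(current):
        raise ValueError("no growth even with the full basis uptake")
    return added


if __name__ == "__main__":
    # Toy stand-in for FBA: the "model" grows once all required metabolites are present.
    required = {"glc", "o2", "nh4"}
    toy_grows = lambda available: required <= available
    print(find_missing_metabolites(["glc"], ["glc", "h2o", "o2", "nh4", "pi"], toy_grows))
```

Because the sketch adds metabolites greedily in the order of the basis, the result can contain metabolites that are not strictly required (``h2o`` in the toy run), which is why the choice of basis matters.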

Available media
---------------
10 changes: 5 additions & 5 deletions docs/source/modules/pathways.rst
@@ -1,13 +1,13 @@
Addition of KEGG Pathways
=========================

The KEGG database holds information on metabolic pathways. If your organism occurs in the KEGG database, you can use this module to add KEGG pathways with the libSBML Groups plugin.
The KEGG database holds information on metabolic pathways. You can use this module to add KEGG pathways with the libSBML Groups plugin.

The workflow of the script is as follows:
1. Extraction of the KEGG reaction ID from the annotations of your reactions
2. Identification, in which KEGG pathways this reaction occurs
3. Addition of all KEGG pathways for a reaction then as annotations with the biological qualifier OCCURS_IN to the respective reaction.
4. Addition of all KEGG pathways as groups with references to the contained reactions as groups:member
1. Extraction of the KEGG reaction IDs from the annotations of your reactions
2. Identification, in which KEGG pathways these reactions occur
3. Addition of all KEGG pathways for a reaction with the biological qualifier ``OCCURS_IN`` to the annotations
4. Addition of all KEGG pathways as groups with references to the contained reactions as ``groups:member``
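
As a rough illustration of steps 1, 2 and 4, the following sketch works on plain Python data instead of libSBML objects. The function names, the assumed identifiers.org-style URI format and the toy lookup table are all assumptions for illustration. They do not show how ``kegg_pathways`` is implemented, and a real run would query KEGG for the reaction-to-pathway mapping.

```python
import re

# Assumed annotation format: identifiers.org-style URIs such as
# https://identifiers.org/kegg.reaction/R00200 (KEGG reaction IDs are "R" + 5 digits).
KEGG_REACTION = re.compile(r"kegg\.reaction[:/](R\d{5})")


def kegg_reaction_ids(annotation_uris):
    """Step 1: extract the KEGG reaction IDs from a reaction's annotation URIs."""
    found = {match.group(1) for uri in annotation_uris for match in KEGG_REACTION.finditer(uri)}
    return sorted(found)


def pathway_groups(model_annotations, reaction_to_pathways):
    """Steps 2 and 4: map each pathway to the model reactions that occur in it."""
    groups = {}
    for reaction_id, uris in model_annotations.items():
        for kegg_id in kegg_reaction_ids(uris):
            for pathway in reaction_to_pathways.get(kegg_id, []):
                groups.setdefault(pathway, set()).add(reaction_id)
    return {pathway: sorted(members) for pathway, members in groups.items()}


if __name__ == "__main__":
    # Toy data: model reaction ID -> annotation URIs.
    annotations = {
        "R_PYK": ["https://identifiers.org/kegg.reaction/R00200"],
        "R_PGK": ["https://identifiers.org/kegg.reaction/R01512"],
        "R_EX_glc": [],  # no KEGG annotation, so it joins no group
    }
    # Toy lookup table standing in for a KEGG query.
    lookup = {"R00200": ["map00010", "map00620"], "R01512": ["map00010"]}
    print(pathway_groups(annotations, lookup))
```

Each key of the returned dictionary corresponds to one group, and its values correspond to the ``groups:member`` references; writing them into the model (and the ``OCCURS_IN`` annotations of step 3) is left to libSBML and is not shown here.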

The only function that you will need to access is ``kegg_pathways``:

2 changes: 1 addition & 1 deletion docs/source/modules/polish.rst
@@ -1,7 +1,7 @@
Polishing a CarveMe model
=========================

The newer version of CarveMe leads to some irritations in the model, the scripts in ``polish`` enable for example the addition of BiGG Ids to the annotations as well as a correct formatting of the annotations.
CarveMe version 1.5.1 leads to some irritations in the model. The scripts in ``polish`` enable, for example, the addition of BiGG IDs to the annotations as well as correct formatting of the annotations.

.. warning::
Using ``lab_strain=True`` has the following two requirements:
4 changes: 2 additions & 2 deletions docs/source/modules/sboann.rst
@@ -3,7 +3,7 @@ SBOannotator with refineGEMs

RefineGEMs offers access to the functionalities of `SBOannotator <https://github.com/draeger-lab/SBOannotator>`__\ :footcite:p:`Leonidou2023_sboann`.

The ``sboann`` module is split into a lot of small functions which are all annotated. However, when using it for SBO-Term annotation it only makes sense to run the "main" function:
The ``sboann`` module is split into a lot of small functions which are all annotated. However, when using it for SBO-Term annotation it only makes sense to run the function ``sbo_annotation``:

.. autofunction:: refinegems.sboann.sbo_annotation
:noindex:
@@ -15,6 +15,6 @@ The ``sboann`` module is split into a lot of small functions which are all an
model_sboann = rg.sboann.sbo_annotation(<path to your model>)
rg.io.write_to_file(model_sboann, <path to modified model>)
If you use it from the refineGEMs toolbox with the config you can get visualizations of SBO-Term distribution before and after SBO-Term updates.
If you use it from the refineGEMs toolbox with the config you can get a visualization of the SBO-Term distribution before and after the SBO-Term update.

.. footbibliography::
10 changes: 5 additions & 5 deletions docs/source/pipeline.rst
@@ -5,16 +5,16 @@ Generating a model for an organism where no information on genes and proteins is
causes the problem that the model will not contain valid database identifiers for any GeneProduct. To resolve this issue the
workflow in Figure :numref:`workflow` can be used.

1. First annotate the genome with NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) to obtain the same FASTA format as used in NCBI.
2. Then use diamond with the ``nr`` database from NCBI and the obtained annotated FASTA file as input. Restrict the search to your organism's taxon if known and use the flag for taxonomy checking.
1. First annotate the genome with NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) to obtain the same FASTA format as used in NCBI and use the flag for taxonomy checking.
2. Then use DIAMOND with the ``nr`` database from NCBI and the obtained annotated FASTA file as input. Restrict the search to your organism's taxon if known.
3. Check if any protein in the annotation FASTA file still has no database identifier.

| -> YES: Rerun diamond without the taxonomy check and without the restriction for the organism's taxon.
| -> YES: Rerun DIAMOND without the taxonomy check and without the restriction for the organism's taxon.
|
| -> NO: Continue with step 4.
4. Add the diamond result to the annotated FASTA file.
5. Run e.g. ``CarveME`` to obtain a draft model.
4. Add the DIAMOND result to the annotated FASTA file.
5. Run e.g. ``CarveMe`` to obtain a draft model.
6. Check if in the model any GeneProducts without NCBI Protein or RefSeq identifiers occur.

| -> YES:
10 changes: 5 additions & 5 deletions docs/source/usage.rst
@@ -5,7 +5,7 @@ Usage as standalone application
-------------------------------

The script ``main.py`` can be used directly in the command line after
entering the virtual environment with ``pipenv shell``.
entering the virtual environment with ``pipenv shell`` or ``conda activate <EnvName>``.

The ``config.yaml`` file contains defaults for all variables that need
to be set by the user.
@@ -130,11 +130,11 @@ to be set by the user.
The repository structure has the following intention:

* ``refineGEMs/`` contains all the functions needed in ``main.py``
* ``data/`` contains all tables that are used by different parts of the script as well as a toy model ``e_coli_core.xml``
* ``refinegems/`` contains all the functions needed in ``main.py``
* ``data/`` contains all example tables that can be used as input for the curation scripts as well as the ``media_db.csv`` and a toy model ``e_coli_core.xml``
* Instead of using the files given in ``data/``, you can use your own files and just change the paths in ``config.yaml``. Please be aware that some functions rely on input in a certain format so make sure to check the files given in the ``data/`` folder and use the same formatting.
* ``databases/`` contains the ``sql`` file as well as the ``db`` file necessary for the SBOAnn script by Elisabeth Fritze as well as the modules ``gapfill``, ``growth`` and ``modelseed``.
* The ``setup.py`` and ``pyproject.toml`` enable creating a PyPi package called ``refineGEMs``.
* ``refinegems/database/`` contains the SQL schema file for the media and ``sboann``-related tables as well as the ready-to-use database file, which is required by the SBOAnn script by Elisabeth Fritze and by the modules ``gapfill``, ``growth`` and ``modelseed``.
* The ``setup.py`` and ``pyproject.toml`` enable creating a PyPI package called ``refineGEMs``.


Usage as python module