Merge pull request #97 from draeger-lab/dev

Integrate new features database access & more general functions to main
draeger-lab · Aug 21, 2023 · 5eac900
2 parents 23e307f + b8ab57c
commit 5eac900
Showing 25 changed files with 127 additions and 68 deletions.
diff --git a/.github/workflows/publish-to-test-pypi.yml b/.github/workflows/publish-to-test-pypi.yml
@@ -0,0 +1,33 @@
+name: Publish new refineGEMs release to PyPI and TestPyPI
+
+on: workflow_dispatch
+
+jobs:
+  build-n-publish:
+    name: Build and publish new refineGEMs release to TestPyPI
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: "3.x"
+    - name: Install pypa/build
+      run: >-
+        python3 -m
+        pip install
+        build
+        --user
+    - name: Build a binary wheel and a source tarball
+      run: >-
+        python3 -m
+        build
+        --sdist
+        --wheel
+        --outdir dist/
+        .
+    - name: Publish distribution 📦 to Test PyPI
+      uses: pypa/gh-action-pypi-publish@release/v1
+      with:
+        password: ${{ secrets.TEST_PYPI_API_TOKEN }}
+        repository-url: https://test.pypi.org/legacy/
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -0,0 +1,4 @@
+include LICENSE
+include refinegems/database/current_bigg_db_version.txt
+include refinegems/database/sbo_media_db.sql
+include refinegems/database/data.db
diff --git a/Pipfile.lock b/Pipfile.lock
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -1,4 +1,5 @@
 nbsphinx
 ipython
 sphinxcontrib-bibtex
+sphinx_copybutton
 accessible-pygments
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -23,7 +23,7 @@
 author = 'Famke Bäuerle and Gwendolyn O. Gusak'
 
 # The full version, including alpha/beta/rc tags
-release = '1.2.2'
+release = '1.3.0'
 
 
 # -- General configuration ---------------------------------------------------
@@ -35,12 +35,16 @@
     'sphinx.ext.autodoc',
     'sphinx.ext.autosectionlabel',
     'sphinx.ext.mathjax',
+    'sphinx_copybutton',
     'nbsphinx',
     'sphinx_rtd_theme',
     'IPython.sphinxext.ipython_console_highlighting',
     'sphinxcontrib.bibtex'
 ]
 
+# For copy buttons in code blocks
+copybutton_selector =  "div.copyable pre"
+
 # For citations
 bibtex_bibfiles = ['library.bib']
 

diff --git a/docs/source/development.rst b/docs/source/development.rst
@@ -13,12 +13,13 @@ Development installation
     * `pandoc`
     * `ipython`
     * `sphinxcontrib-bibtex`
+    * `sphinx_copybutton`
 
 You can install the packages via pip to your local environment:
 
 .. code:: bash
 
-    pip install sphinx nbsphinx sphinx_rtd_theme pandoc ipython sphinxcontrib-bibtex
+    pip install sphinx nbsphinx sphinx_rtd_theme pandoc ipython sphinxcontrib-bibtex sphinx_copybutton
 
 If you run into an error with jinja2, just switch to version 3.0.3:
 
@@ -36,7 +37,7 @@ If you want your print message to show in the log file, replace the ```print()``
 Documentation notes
 -------------------
 
-We use the autoDocstring extension (njpwerner.autodocstring) for vscode with the google format to generate function docstrings. To ensure a nice looking sphinx documentation, we add ``-`` to all variables that are passed as Args. And tuple returns are written as follows:
+We use the autoDocstring extension (njpwerner.autodocstring) for VSCode with the Google format to generate function docstrings. To ensure nice-looking Sphinx documentation, we add ``-`` to all variables that are passed as Args. Tuple returns are written as follows:
 
 .. code:: python
     :linenos:
@@ -57,4 +58,6 @@ We are also trying to make input and return types explicit by declaring those in
 .. code:: python
     :linenos:
 
-    def my_func(input1: int, input2: str, input3: Model) -> tuple[str, int]:
+    def my_func(input1: int, input2: str, input3: Model) -> tuple[str, int]:
+
+More details for certain specifics can also be found `here <https://github.com/draeger-lab/refinegems/issues/74>`__.
diff --git a/docs/source/in_silico_media_generation.rst b/docs/source/in_silico_media_generation.rst
@@ -3,11 +3,11 @@ From laboratory to *in silico* medium
 
 .. hint:: 
    If you want to use the medium with ``refineGEMs.growth`` add the definition to the database schema ``sbo_media_db.sql`` 
-   in the folder *data/database* in the downloaded repository. To update the database with the newly added table just 
+   in the folder *refinegems/database* in the downloaded repository. To update the database with the newly added table just 
    delete the file ``data.db`` in the same folder and run refineGEMs.
 
-1. Search papers containing medium definitions/ Search paper or provider information for a medium that could be 
-   interesting for your organism
+1. Search for papers containing medium definitions, or search paper or provider information for a medium that could be 
+   interesting for your organism.
 2. | If the paper already contains an *in silico* definition: Go to step 3.
    | If not:
 

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -1,32 +1,32 @@
 Welcome to refineGEMs!
 ======================================
-``refineGEMs`` is a python package intended
+``refineGEMs`` is a Python package intended
 to help with the curation of genome-scale metabolic models (GEMs).
 
-.. hint:: For bug reports please write issues on the `GitHub page <https://github.com/draeger-lab/refinegems/issues>`__.
+.. hint:: For bug reports please write issues on the `GitHub page <https://github.com/draeger-lab/refinegems/issues>`__ or open a discussion `here <https://github.com/draeger-lab/refinegems/discussions>`__.
 
 Overview
 --------
 
 Currently, ``refineGEMs`` can be used to investigate a GEM. It can complete the following tasks:
 
-* loading GEMS with ``cobrapy`` and ``libSBML``
+* loading GEMs with ``COBRApy`` and ``libSBML``
 * report number of metabolites, reactions and genes
 * report orphaned, deadends and disconnected metabolites
 * report mass and charge unbalanced reactions
 * report `Memote <https://memote.readthedocs.io/en/latest/index.html>`__ score
 * compare the genes present in the model to the genes found in:
   * the `KEGG <https://www.genome.jp/kegg/kegg1.html>`__ Database (Note: This requires a GFF file of your organism and the KEGG identifier of your organism.)
   * Or the `BioCyc <https://biocyc.org/>`__ Database (Note: This requires that a database entry for your organism exists in BioCyc.)
-* compare the charges and masses of the metabolites present in the model to the charges and masses denoted in the `ModelSEED <https://modelseed.org/>`__ Database
+* compare the charges and masses of the metabolites present in the model to the charges and masses denoted in the `ModelSEED <https://modelseed.org/>`__ Database.
 
 Other applications of ``refineGEMs`` to curate a given model include: 
 
-* The correction of a model created with `CarveMe <https://github.com/cdanielmachado/carveme>`__ v.1.5.1 (for example moving all relevant information from the notes to the annotation field) this includes automated annotation of NCBI genes to the GeneProtein section of the model
-* The addition of `KEGG <https://www.genome.jp/kegg/kegg1.html>`__ Pathways as Groups (using the `libSBML <https://synonym.caltech.edu/software/libsbml/5.18.0/docs/formatted/python-api/classlibsbml_1_1_groups_model_plugin.html>`__ Groups Plugin)
-* Updating the SBO-Term annotations based on a SBOannotator
-* Updating the annotation of metabolites and extending the model with reactions (for the purpose of filling gaps) based on a table filled by the user ``data/manual_annotations.xlsx``, note that this only works when the structure of the given table is used
-* And extending the model with all information surrounding reactions including the corresponding GeneProducts and metabolites by filling in the table ``data/modelName_gapfill_analysis_date_example.xlsx``, note this also only works when the structure of the given Excel file is used
+* The correction of a model created with `CarveMe <https://github.com/cdanielmachado/carveme>`__ v.1.5.1 (for example, moving all relevant information from the notes to the annotation field); this includes the automated annotation of NCBI genes to the GeneProduct section of the model,
+* The addition of `KEGG <https://www.genome.jp/kegg/kegg1.html>`__ Pathways as Groups (using the `libSBML <https://synonym.caltech.edu/software/libsbml/5.18.0/docs/formatted/python-api/classlibsbml_1_1_groups_model_plugin.html>`__ Groups Plugin),
+* Updating the SBO-Term annotations based on SBOannotator\ :footcite:p:`Leonidou2023_sboann`,
+* Updating the annotation of metabolites and extending the model with reactions (for the purpose of filling gaps) based on a table filled by the user ``data/manual_annotations.xlsx`` (Note: This only works when the structure of the given table is used.),
+* And extending the model with all information surrounding reactions including the corresponding GeneProducts and metabolites by filling in the table ``data/modelName_gapfill_analysis_date_example.xlsx`` (Note: This also only works when the structure of the given Excel file is used).
 
 
 .. toctree::
@@ -43,3 +43,5 @@ Other applications of ``refineGEMs`` to curate a given model include:
 
 * :ref:`genindex`
 * :ref:`search`
+
+.. footbibliography::
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -3,11 +3,12 @@ Installation
 
 Installation via pip
 --------------------
-To install refineGEMs as Python package, simply install it via ``pip``:
+To install refineGEMs as a Python package from `PyPI <https://pypi.org/project/refineGEMs/>`__, use ``pip``:
 
-.. code:: bash
+.. code:: console
+   :class: copyable
 
-   pip install refinegems
+   pip install refineGEMs
 
 The corresponding project site can be found `here <https://pypi.org/project/refineGEMs/>`__.
 
@@ -60,6 +61,10 @@ If `which pip` does not show pip in the conda environment you can also create a
 
 **Pipenv**
 
+.. warning::
+   | Since version 1.1.0 the Pipfile and Pipfile.lock files are not up to date anymore.
+   | This installation method might not work.
+
 You can use
 `pipenv <https://pipenv.pypa.io/en/latest/>`__ to keep all dependencies together. You will need to install
 ``pipenv`` first. To install ``refineGEMs`` locally complete the
@@ -96,13 +101,11 @@ Troubleshooting
    ``pipenv install``.
 
 - If you run into a problem with ``pipenv`` not locking after, for example, moving the repository, try uninstalling ``pipenv`` and reinstalling it via pip. Then run ``pipenv install`` and it should work again.
-- If you use vscode terminals and have trouble accessing the python from within your conda environment, deactivate base and reactivate again:
+- If you use VSCode terminals and have trouble accessing the python from within your conda environment, deactivate base and reactivate again:
 
 .. code:: bash
 
    conda deactivate
    conda deactivate
    conda activate base
    conda activate <your conda env>
-
-
diff --git a/docs/source/modules/examples.ipynb b/docs/source/modules/examples.ipynb
@@ -32,7 +32,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "model = rg.ioload_model_cobra('../../data/e_coli_core.xml')"
+    "model = rg.io.load_model_cobra('../../data/e_coli_core.xml')"
    ]
   },
   {

diff --git a/docs/source/modules/gapfill.rst b/docs/source/modules/gapfill.rst
@@ -65,7 +65,7 @@ To perform the gap analysis the following parameters are relevant for the config
 
 To add genes, metabolites and reactions from an Excel table to a model the following parameters need to be set:
 (The Excel file is either obtained by running gapfill_analysis or created by hand with the same structure as the result file from gapfill_analysis.
-An example Excel file to fill in by hand can be found in the cloned repository under 'data/modelName_gapfill_analysis_date_example.xlsx')
+An example Excel file to fill in by hand can be found in the cloned repository under ``data/modelName_gapfill_analysis_date_example.xlsx``)
 
 .. code:: yaml
 

diff --git a/docs/source/modules/growth.rst b/docs/source/modules/growth.rst
@@ -12,14 +12,14 @@ Outputs a table with the column headers:
 Implementation
 --------------
 
-Growth rates and thus doubling times can be determined with Flux Balance Analysis (FBA). RefineGEMs uses a COBRApy based implementation that adds metabolites one-by-one to custom media definitions until growth is obtained. The pseudocode is shown below.
+Growth rates and, thus, doubling times can be determined with Flux Balance Analysis (FBA). RefineGEMs uses a COBRApy-based implementation that adds metabolites one by one to custom media definitions until growth is obtained. The pseudocode is shown below.
 
 .. image:: ../images/growth_algorithm.png
   :align: center
   :width: 400
   :alt: Pseudocode representation of the algorithm implemented for growth simulation.
 
-There is a flag called basis which can be set to either default_uptake or minimal_uptake. You can decide from which uptake you want to fill your medium of interest when looking for missing metabolites. Either the default_uptake which is the uptake that the model has when no specific medium is set or the minimal_uptake which is the uptake resulting from cobrapys minimal_medium optimization.
+There is a flag called ``basis`` which can be set to either ``default_uptake`` or ``minimal_uptake``. It determines the uptake from which your medium of interest is filled when looking for missing metabolites: ``default_uptake`` is the uptake that the model has when no specific medium is set, and ``minimal_uptake`` is the uptake resulting from COBRApy's ``minimal_medium`` optimization.
 
 Available media
 ---------------

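The growth procedure described in the hunk above (add metabolites one by one to a medium until growth is obtained) can be illustrated with a minimal sketch. This is not the refineGEMs implementation: ``fill_medium_until_growth``, ``toy_grows`` and the metabolite names are hypothetical, and the ``grows`` callback stands in for an FBA call on a real model. The ``candidates`` argument plays the role of the uptake chosen via ``basis``.

```python
from typing import Callable, Iterable


def fill_medium_until_growth(
    medium: set[str],
    candidates: Iterable[str],
    grows: Callable[[set[str]], bool],
) -> list[str]:
    """Add candidate metabolites one by one until ``grows`` reports growth.

    Returns the metabolites that were added, in order. Raises ValueError
    if growth is not reached after all candidates have been added.
    """
    current = set(medium)  # work on a copy, leave the input medium untouched
    added: list[str] = []
    if grows(current):
        return added
    for metabolite in candidates:
        if metabolite in current:
            continue  # already part of the medium
        current.add(metabolite)
        added.append(metabolite)
        if grows(current):
            return added
    raise ValueError("no growth even after adding all candidate metabolites")


# Hypothetical stand-in for an FBA check: growth needs these three metabolites.
REQUIRED = {"glc", "o2", "pi"}


def toy_grows(medium: set[str]) -> bool:
    return REQUIRED <= medium


missing = fill_medium_until_growth({"glc", "h2o"}, ["nh4", "o2", "pi", "fe2"], toy_grows)
print(missing)
```

Note that this greedy sketch also reports metabolites that were added before growth was reached but are not strictly required (here ``nh4``). A real implementation may prune such entries afterwards.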
diff --git a/docs/source/modules/pathways.rst b/docs/source/modules/pathways.rst
@@ -1,13 +1,13 @@
 Addition of KEGG Pathways
 =========================
 
-The KEGG database holds information on metabolic pathways. If your organism occurs in the KEGG database, you can use this module to add KEGG pathways with the libSBML Groups plugin.
+The KEGG database holds information on metabolic pathways. You can use this module to add KEGG pathways with the libSBML Groups plugin.
 
 The workflow of the script is as follows:
-1. Extraction of the KEGG reaction ID from the annotations of your reactions
-2. Identification, in which KEGG pathways this reaction occurs
-3. Addition of all KEGG pathways for a reaction then as annotations with the biological qualifier ‘OCCURS_IN’ to the respective reaction.
-4. Addition of all KEGG pathways as groups with references to the contained reactions as groups:member
+1. Extraction of the KEGG reaction IDs from the annotations of your reactions
+2. Identification of the KEGG pathways in which these reactions occur
+3. Addition of all KEGG pathways for a reaction with the biological qualifier ``OCCURS_IN`` to the annotations
+4. Addition of all KEGG pathways as groups with references to the contained reactions as ``groups:member``
 
 The only function that you will need to access is ``kegg_pathways``:
 

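The four workflow steps listed in the hunk above amount to inverting a reaction-to-pathway mapping. The sketch below shows that grouping step with plain dictionaries. It does not use libSBML or the KEGG API, and ``group_reactions_by_pathway`` as well as all identifiers and mappings are hypothetical illustration values, not data fetched from KEGG.

```python
from collections import defaultdict


def group_reactions_by_pathway(
    reaction_to_kegg_ids: dict[str, list[str]],
    kegg_id_to_pathways: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Collect, for every pathway, the model reactions that occur in it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for reaction_id, kegg_ids in reaction_to_kegg_ids.items():
        for kegg_id in kegg_ids:
            # Reactions without a known pathway are simply skipped.
            for pathway in kegg_id_to_pathways.get(kegg_id, []):
                if reaction_id not in groups[pathway]:
                    groups[pathway].append(reaction_id)
    return dict(groups)


# Step 1 (assumed already done): KEGG reaction IDs taken from the annotations.
reaction_to_kegg_ids = {"R_PYK": ["R00200"], "R_PGK": ["R01512"], "R_EX_glc": []}
# Step 2 (assumed already done): pathways per KEGG reaction, illustrative values.
kegg_id_to_pathways = {"R00200": ["rn00010", "rn00620"], "R01512": ["rn00010"]}

# Steps 3 and 4: each pathway becomes a group whose members are the reactions.
groups = group_reactions_by_pathway(reaction_to_kegg_ids, kegg_id_to_pathways)
print(groups)
```

In the real module, each key would correspond to a group in the libSBML Groups plugin and each list entry to a ``groups:member``.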
diff --git a/docs/source/modules/polish.rst b/docs/source/modules/polish.rst
@@ -1,7 +1,7 @@
 Polishing a CarveMe model
 =========================
 
-The newer version of CarveMe leads to some irritations in the model, the scripts in ``polish`` enable for example the addition of BiGG Ids to the annotations as well as a correct formatting of the annotations.
+CarveMe version 1.5.1 introduces some irregularities into the model. The scripts in ``polish`` enable, for example, the addition of BiGG IDs to the annotations as well as correct formatting of the annotations.
 
 .. warning:: 
     Using ``lab_strain=True`` has the following two requirements:

diff --git a/docs/source/modules/sboann.rst b/docs/source/modules/sboann.rst
@@ -3,7 +3,7 @@ SBOannotator with refineGEMs
 
 RefineGEMs offers access to the functionalities of `SBOannotator <https://github.com/draeger-lab/SBOannotator>`__\ :footcite:p:`Leonidou2023_sboann`. 
 
-The ``sboann`` module is splitted into a lot of small functions which are all annotated, however when using it for SBO-Term annotation it only makes sense to run the "main" function: 
+The ``sboann`` module is split into many small functions, all of which are annotated. However, for SBO-Term annotation it only makes sense to run the function ``sbo_annotation``:
 
 .. autofunction:: refinegems.sboann.sbo_annotation
     :noindex:
@@ -15,6 +15,6 @@ The ``sboann`` module is splitted into a lot of small functions which are all an
     model_sboann = rg.sboann.sbo_annotation(<path to your model>)
     rg.io.write_to_file(model_sboann, <path to modified model>)
 
-If you use it from the refineGEMs toolbox with the config you can get visualizations of SBO-Term distribution before and after SBO-Term updates.
+If you use it from the refineGEMs toolbox with the config you can get a visualization of the SBO-Term distribution before and after the SBO-Term update.
 
 .. footbibliography::
diff --git a/docs/source/pipeline.rst b/docs/source/pipeline.rst
@@ -5,16 +5,16 @@ Generating a model for an organism where no information on genes and proteins is
 causes the problem that the model will not contain valid database identifiers for any GeneProduct. To resolve this issue the 
 workflow in Figure :numref:`workflow` can be used.
 
-1. First annotate the genome with NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) to obtain the same FASTA format as used in NCBI.
-2. Then use diamond with the ``nr`` database from NCBI and the obtained annotated FASTA file as input. Restrict the search to your organism's taxon if known and use the flag for taxonomy checking.
+1. First annotate the genome with NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) to obtain the same FASTA format as used in NCBI and use the flag for taxonomy checking.
+2. Then use DIAMOND with the ``nr`` database from NCBI and the obtained annotated FASTA file as input. Restrict the search to your organism's taxon if known.
 3. Check if any protein in the annotation FASTA file still has no database identifier.
 
-    | -> YES: Rerun diamond without the taxonomy check and without the restriction for the organism's taxon.
+    | -> YES: Rerun DIAMOND without the taxonomy check and without the restriction for the organism's taxon.
     |
     | -> NO: Continue with step 4.
 
-4. Add the diamond result to the annotated FASTA file.
-5. Run e.g. ``CarveME`` to obtain a draft model.
+4. Add the DIAMOND result to the annotated FASTA file.
+5. Run e.g. ``CarveMe`` to obtain a draft model.
 6. Check whether any GeneProducts without NCBI Protein or RefSeq identifiers occur in the model.
 
     | -> YES: 

diff --git a/docs/source/usage.rst b/docs/source/usage.rst
@@ -5,7 +5,7 @@ Usage as standalone application
 -------------------------------
 
 The script ``main.py`` can be used directly in the command line after
-entering the virtual environment with ``pipenv shell``.
+entering the virtual environment with ``pipenv shell`` or ``conda activate <EnvName>``.
 
 The ``config.yaml`` file contains defaults for all variables that need
 to be set by the user.
@@ -130,11 +130,11 @@ to be set by the user.
 
 The repository structure has the following intention: 
 
-* ``refineGEMs/`` contains all the functions needed in ``main.py`` 
-* ``data/`` contains all tables that are used by different parts of the script as well as a toy model ``e_coli_core.xml`` 
+* ``refinegems/`` contains all the functions needed in ``main.py`` 
+* ``data/`` contains all example tables that can be used as input for the curation scripts as well as the ``media_db.csv`` and a toy model ``e_coli_core.xml`` 
 * Instead of using the files given in ``data/``, you can use your own files and just change the paths in ``config.yaml``. Please be aware that some functions rely on input in a certain format so make sure to check the files given in the ``data/`` folder and use the same formatting. 
-* ``databases/`` contains the ``sql`` file as well as the ``db`` file necessary for the SBOAnn script by Elisabeth Fritze as well as the modules ``gapfill``, ``growth`` and ``modelseed``.
-* The ``setup.py`` and ``pyproject.toml`` enable creating a PyPi package called ``refineGEMs``.
+* ``refinegems/database/`` contains the SQL schema file for the media and ``sboann``-related tables as well as the ready-to-use database file required by the SBOAnn script by Elisabeth Fritze and by the modules ``gapfill``, ``growth`` and ``modelseed``.
+* The ``setup.py`` and ``pyproject.toml`` enable creating a PyPI package called ``refineGEMs``.
 
 
 Usage as python module