From a078bba1e2cdbb3fc1d9f623502823ac9e4488a9 Mon Sep 17 00:00:00 2001 From: Eric Date: Mon, 4 Nov 2024 11:14:39 -0600 Subject: [PATCH] First pass of updating Mkdocs website. --- CHANGELOG.md | 60 +++++----- docs/examples.md | 8 +- docs/extra.css | 23 ---- docs/format.md | 18 +-- docs/{neptune.ico => img/favicon.ico} | Bin docs/index.md | 10 +- docs/install.md | 154 +++++++------------------- docs/news.md | 8 +- docs/output.md | 16 +-- docs/parameters.md | 14 +-- docs/walkthrough.md | 16 +-- install/debian_dependencies.sh | 7 -- install/neptune | 6 - mkdocs.yml | 5 +- 14 files changed, 119 insertions(+), 226 deletions(-) rename docs/{neptune.ico => img/favicon.ico} (100%) delete mode 100755 install/debian_dependencies.sh delete mode 100755 install/neptune diff --git a/CHANGELOG.md b/CHANGELOG.md index ce7f9ad..7096977 100755 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,162 +2,162 @@ All notable changes to Neptune will be documented in this file. -## 2.0.0 ## +## 2.0.0 2024-10-21 This release updates Neptune to Python3, removes DRMAA support, fixes a crash when no signatures are produced, and updates the installation process. -### Changed ### +### Changed - Python3 has replaced Python2. - Improved and updated the installation process. -### Fixed ### +### Fixed - Fixed a crash that occurred when candidate signatures were of such low quality (as a consequence of ambiguous sequence characters) that these regions could not be aligned with themselves using BLAST. -### Removed ### +### Removed - DRMAA support. -## 1.2.5 ## +## 1.2.5 2017-05-03 This release provides fixes for ambiguous crashes and improvements to the code quality. -### Changed ### +### Changed - We have made an effort to improve the readability of function comments in the source code. -### Fixed ### +### Fixed - When running Neptune in parallel (non-DRMAA mode), runtime errors in forked jobs now correctly inform the calling process instead of hanging forever with no meaningful error message. 
Additionally, the runtime error message is reported to the user. This relates the a known error in Python 2.7 (https://bugs.python.org/issue9400). - Inputs containing no A, C, G, or T characters will now cause an appropriate runtime error with an informative message about this problem. - Lowercase characters are no longer ingnored when calculating the GC content of inputs. -## 1.2.4 ## +## 1.2.4 2017-02-27 This release makes several small improvements, including: reducing the standard output clutter, adding timings to stages, and updating the documentation. -### Added ### +### Added - Links in the README to the manual. - Walkthrough to the manual. - Example data to test the software. - Timings for stages. -### Changed ### +### Changed - Improved clarity in manual. - Codeblocks in the manual. -### Removed ### +### Removed - Considerable clutter has been removed from standard output. -## 1.2.3 ## +## 1.2.3 2016-07-11 This release simplifies the installation process. -### Added ### +### Added - A script for automatically installing Debian dependencies. -### Changed ### +### Changed - The dependencies have changed. Several are now installed as part of Neptune. - The Neptune installation no longer requires security privilages. - Neptune may be installed multiple times in multiple locations. - NumPy and SciPy are now installed using pip. -### 1.2.2 ### +## 1.2.2 2016-04-06 This release includes some Galaxy improvements and fixes a signature scoring problem. -### Changed ### +### Changed - Galaxy XML files have been updated to use different packages of Python. -### Fixed ### +### Fixed - A bug confusing inclusion and exclusion has been fixed. -## 1.2.1 ## +## 1.2.1 2016-03-23 This release of Neptune adds support for Galaxy. -### Added ### +### Added - Galaxy-related files: capsules, XML files. -### Changed ### +### Changed - Neptune.py and Execution.py are now compatible with Galaxy. 
-## 1.2.0 ## +## 1.2.0 2016-03-18 This release of Neptune allows for execution on a single machine without requiring DRMAA. Furthermore, several command line parameters have been modified. -### Added ### +### Added - Neptune may be run in parallel on a single machine without DRMAA. - "--version" command line option. -### Changed ### +### Changed - Several command-line parameters have been changed. - The "--parallelization" / "-p" parameter effects all parallelization. - The exclusion score is now displayed as a positive number. -## 1.1.1 ## +## 1.1.1 2016-02-24 This release of Neptune updates the installation instructions to be more informative. -### Changed ### +### Changed - Updated README and manual installation instructions. - Modified the style of code examples in the manual. -## 1.1.0 ## +## 1.1.0 2016-01-19 This release of Neptune introduces a simple signature consolidation step, which consolidates signatures produced from multiple files into a single file. Furthermore, the software has been updated to be compatible with the Slurm scheduler. -### Added ### +### Added - Neptune now automatically consolidates signatures into a single file. - DRMAA job names. - Neptune now maintains DRMAA log files. - Added the ability to specify the BLAST seed size. -### Changed ### +### Changed - The run receipt has been reorganized. - Removed some unneeded output files. - Removed some unneeded print statements. - Removed the --verbose parameter. There was no functionality. -### Fixed ### +### Fixed - Neptune is now compatible with the Slurm scheduler. - Updated PEP8/Flake8 code compliance (W503). 
-## 1.0.0 ## +## 1.0.0 2015-11-18 diff --git a/docs/examples.md b/docs/examples.md index 152064a..130bc12 100644 --- a/docs/examples.md +++ b/docs/examples.md @@ -1,6 +1,6 @@ -# Examples # +# Examples -## Basic Execution ## +## Basic Execution The following basic example will report all of the signatures that are sufficiently shared by the (FASTA) sequences in the inclusion directory and sufficiently absent from the (FASTA) sequences in the exclusion directory. Neptune will automatically calculate many of the parameters used in this execution. @@ -17,7 +17,7 @@ The output of immediate interest will be located in the follow file: This file will contain a consolidated list of signatures, sorted by their Neptune score, which is a combined estimate of sensitivity and specificity. The signatures with higher scores, near the top of the file, are considered the most discriminatory signatures. -## Faster Execution ## +## Faster Execution The following example highlights options that allow Neptune to run faster when running in parallel mode (default). It will attempt to run Neptune on 16 parallel processes (`--parallelization`) and parallelize *k*-mer counting and aggregation into 64 tasks (`--organization`) distributed over the 16 parallel processes available. @@ -30,7 +30,7 @@ neptune --organization 3 ``` -## Specifying File Locations ## +## Specifying File Locations You may wish to specify particular files used in signature discovery. 
This may be important when specifying references for signature extraction: diff --git a/docs/extra.css b/docs/extra.css index 3f00bf9..e69de29 100644 --- a/docs/extra.css +++ b/docs/extra.css @@ -1,23 +0,0 @@ -.bs-sidenav { - font-size: 16px; -} - -.navbar { - font-size: 16px; -} - -.navbar-default { - background-color: #343838; -} - -.navbar .dropdown-menu>li>a, .navbar .dropdown-menu>li>a:focus { - font-size: 14px; -} - -table { - font-size: 13px; -} - -body { - font-size: 14px; -} diff --git a/docs/format.md b/docs/format.md index 44639e6..11de7aa 100644 --- a/docs/format.md +++ b/docs/format.md @@ -1,4 +1,4 @@ -# Signature Format # +# Signature Format The signatures produced by Neptune are output in FASTA format with additional information in the description line. Signatures are output in the following format: @@ -27,35 +27,35 @@ Where: | [POS] | Position | The starting position of the signature in the reference. | | [SEQUENCE] | Sequence | The sequence content of the signature. | -## ID ## +## ID The signature ID is an __arbitrary__, run-unique ID assigned to the signature. The signatures within the same FASTA file will have unique IDs, relative to each other. However, signatures within multiple output files will have overlapping signature IDs. This will be the case when using multiple references or not specifying any reference files. The signatures within the `consolidated.fasta` output will have unique signature IDs. -## Total Score ## +## Total Score Signatures are assigned a score corresponding to their highest-scoring BLAST alignments with all inclusion and exclusion targets, which is a sum of the positive inclusion score (sensitivity) and the negative exclusion component (specificity). This score is maximized when all inclusion targets contain a region exactly matching the entire signature and there exists no exclusion targets that match the signature. 
-## Inclusion Score ## +## Inclusion Score The inclusion score is a non-negative number between 0.00 and 1.00 and relates to the signature's sensitivity. This score is determined by the signature's highest-scoring BLAST alignments with all inclusion targets. The inclusion score is maximized (good) when the signature is found exactly and completely in all inclusion targets and minimized (bad) when the signature is not found whatsoever in any inclusion targets. -## Exclusion Score ## +## Exclusion Score The exclusion score is a non-positive number between -1.00 and 0.00 and relates to the signature's specificity. This score is determined by the signature's highest-scoring BLAST alignments with all exclusion targets. The exclusion score is maximized (bad) when the signature is found exactly and completely in all exclusion targets and minimized (good) when the signature is not found whatsoever in any exclusion targets. -## Length ## +## Length The length describes the length of the signature in bases. Although this can be calculated from the sequence, it is included in the FASTA description to accommodate other tools. -## Reference ## +## Reference The reference describes the sequence identifier of the contig the signature was extracted from. This is useful for determining where the signature lies and what sequence surrounds it. -## Position ## +## Position The position describes the base position of the signature within the contig reference it was extracted from. This is useful for determining where the signature lies and what sequence surrounds it. -## Sequence ## +## Sequence The sequence describes the sequence content of the signature and follows the specifications of FASTA format. However, the sequence will not contain line breaks, regardless of the sequence length. 
diff --git a/docs/neptune.ico b/docs/img/favicon.ico similarity index 100% rename from docs/neptune.ico rename to docs/img/favicon.ico diff --git a/docs/index.md b/docs/index.md index 41ec1eb..1da69de 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,24 +1,22 @@ -# Neptune # +# Neptune A genomic signature is a genomic locus that is sufficiently represented in an inclusion group, and sufficiently absent from a background, or exclusion group. A signature might correlate genomic features with phenotypic traits, such as the presence of a gene with increased organism pathogenicity. Neptune locates genomic signatures using an exact *k*-mer matching strategy while accommodating *k*-mer mismatches. The software identifies sequences that are sufficiently represented within inclusion targets and sufficiently absent from exclusion targets. The signature discovery process is accomplished using probabilistic models instead of heuristic strategies. Neptune may be leveraged to reveal discriminatory signature sequences to uniquely delineate one group of organisms, such as isolates associated with a disease cluster or event, from unrelated sporadic or environmental microbes. -## Release ## - -## Neptune 2.0.0 ## +## Neptune v2.0.0 2024-10-21 This release updates Neptune to Python3, removes DRMAA support, fixes a crash when no signatures are produced, and updates the installation process. 
-## Resources ## +## Resources * **Source**: [https://github.com/phac-nml/neptune](https://github.com/phac-nml/neptune) * **Installation**: [https://phac-nml.github.io/neptune/install/](https://phac-nml.github.io/neptune/install/) * **Walkthrough**: [https://phac-nml.github.io/neptune/walkthrough/](https://phac-nml.github.io/neptune/walkthrough/) -## Contact ## +## Contact * **Eric Marinier**: eric.marinier@phac-aspc.gc.ca * **Gary van Domselaar**: gary.vandomselaar@phac-aspc.gc.ca diff --git a/docs/install.md b/docs/install.md index 80e63f4..b5ef645 100755 --- a/docs/install.md +++ b/docs/install.md @@ -1,147 +1,79 @@ -# Installation # +# Installation -This installation guide assumes the use of the [BASH](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) Unix shell and a 64-bit Linux system. Neptune may either be installed directly or as a [Bioconda](https://bioconda.github.io/) package. +This installation guide assumes the use of the [BASH](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) Unix shell and a 64-bit Linux system. -## Bioconda ## +## Direct -The simpliest way to install Neptune is using the [Bioconda](https://bioconda.github.io/) channel for the [conda](https://conda.io/docs/intro.html) package management system. We recommend installing conda with the [Miniconda](https://conda.io/miniconda.html) package. There are two variants of the Miniconda installer: Python 2.7 and Python 3.6. However, the choice of Miniconda only affects the Python version in root environment. We recommend installing Miniconda using Python 2.7 64-bit Linux [installer](https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh). Otherwise, you will need to explicitly use Neptune within a [conda environment](https://conda.io/docs/using/envs.html). +The following instructions describe how to install Neptune directly. These instructions may require administrative privileges. Directly installing Neptune from the source files involves the following: -### Overview ### + 1. 
Installing Python>=3.10 + 2. Installing pip + 3. Installing BLAST (aptitude: `sudo apt-get install ncbi-blast+`) + 4. Installing Neptune (`pip install .`) -The Bioconda-based Neptune installation involves the following: +More detailed instructions are provided below. - 1. Installing [Bioconda](https://bioconda.github.io/) - 2. Installing the "neptune" Bioconda package (`conda install neptune`). +### Python 3 -We provide detailed instructions below. +Ensure your version of Python is compatible (python>=3.10): -### Miniconda (Python 2.7) ### +`python --version` -[Bioconda](https://bioconda.github.io/) requires conda to be installed and we recommend using the [Miniconda](https://conda.io/miniconda.html) package. Miniconda may be installed with the follow instructions: +You may wish to use [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) to create an environment specifically for this purpose: -```bash -wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -chmod 755 Miniconda2-latest-Linux-x86_64.sh -./Miniconda2-latest-Linux-x86_64.sh -``` +`conda create --name neptune 'python>=3.10'` +`conda activate neptune` -You will likely want Miniconda to append the install location to your PATH and will need to select this option during the installation process. After installation, you will then need to either open a new terminal or source your bashrc file in the current terminal for Miniconda to become available on the PATH: +### pip -```bash -source ~/.bashrc -``` +Ensure you can run pip: -You can check if your Miniconda installation was successful with the following: +`pip --version` or `python -m pip --version` -```bash -conda --version -``` +If pip is unavailable, please refer to [these instructions](https://packaging.python.org/en/latest/tutorials/installing-packages/) on how to install pip. -### Bioconda ### +### BLAST -You will need to add the following channels to conda. 
They must be added in this order so that priority is set correctly. +Neptune requires BLAST to be manually installed and made available as a command-line program: -```bash -conda config --add channels conda-forge -conda config --add channels defaults -conda config --add channels r -conda config --add channels bioconda -``` +`sudo apt-get install ncbi-blast+` -### Neptune (Miniconda 2.7) ### +You can verify BLAST was installed by ensuring the following commands are available: -The following instructions assume you are using Miniconda with Python 2.7, as described above. After enabling Bioconda, Neptune may be installed as a Bioconda package with the following: +`makeblastdb -h` +`blastn -h` -```bash -conda install neptune -``` +### Neptune and Dependencies -You can check if Neptune was installed correctly with the following: +After downloading Neptune's source files, you can install Neptune and all of its pip dependencies with the following: -```bash -neptune --version -``` +`pip install /path/to/neptune_directory/` or `pip install .` -### Neptune (Miniconda 3.6) ### +**CAUTION**: If you attempt `pip install neptune` (not interpreted as a file path), then you'll download a different package that's also named "neptune" that's available directly from pip. -The follwing instructions assume you are using the Python 3.6 version of Miniconda. 
In this circumstance, we need to install Neptune within a Python 2.7 environment: +The following packages and their dependencies will be installed: -```bash -conda create --name neptune python=2.7 neptune -``` - -This Neptune environment can be activated with the following: +- numpy +- scipy +- biopython +- neptune -```bash -source activate neptune -``` +You can verify the installation was successful with the following: -You can check if Neptune was installed correctly with the following: +`neptune --version` -```bash -neptune --version -``` - -The current environment may be deactivated with the following: - -```bash -source deactivate -``` +And you can test the installation with simple test inputs with the following: -It is important to note that this Neptune Bioconda environment will need to be activated in order to run the Neptune application. However, the benefit is that your system will be shielded from the Python 2.7 installation required by Neptune. +`neptune -i tests/data/example/inclusion/ -e tests/data/example/exclusion/ -o output` -## Direct ## +## Bioconda -The following instructions describe how to install Neptune directly. These instructions will likely require administrative privilages. - -### Overview ### - -The direct Neptune installation involves the following: - - 1. Installing Python 2.7 - 2. Installing dependencies (Ubuntu: `sudo neptune/install/debian_dependencies.sh`) - 3. Installing Neptune (`neptune/INSTALL.sh`) - -We provide more detailed instructions below. - -### Python ### - -Neptune requires Python 2.7. Note that Python 2.7 is provided with many major distributions of Linux. The following may check your Python version: - -```bash -python --version -``` - -### Dependencies ### - -#### Debian-Based Installation #### - -This section assumes the user has the [APT](https://help.ubuntu.com/community/AptGet/Howto) package manager. This is common to the [Ubuntu](https://en.wikipedia.org/wiki/Ubuntu_(operating_system)) operating system. 
However, this section should be compatible with any 64-bit Debian distribution. The following operation will automatically install Neptune's dependencies and require security privileges (sudo) to install the dependencies: +Currently, there is no [Bioconda](https://bioconda.github.io/) package for Neptune v2.0.0. If you wish to install Neptune v1.2.5 with Bioconda, please use the following command to create an environment with Neptune installed: ```bash -sudo neptune/install/debian_dependencies.sh +conda create -n neptune bioconda::neptune -c conda-forge -c defaults ``` -#### Manual Installation #### - -If you cannot install the dependencies using the above script, the following dependencies must be manually installed, if necessary, by the user: - -* pip -* virtualenv -* build-essential -* python-dev -* NCBI BLAST+ +Please note that specifying `bioconda::neptune` is necessary, because otherwise Conda is likely to resolve the name to different software that's also named `neptune`. -### Neptune ### - -Neptune will be installed using pip into its own Python virtual environment. The following will install Neptune locally into the source directory and will not require security privileges: - -```bash -neptune/INSTALL.sh -``` - -Alternatively, you may specify an install location, PREFIX, such as /usr/local/. Neptune will create the directories PREFIX/lib and PREFIX/bin. This may require security privileges: - -```bash -neptune/INSTALL.sh PREFIX -``` +You may also wish to review an [older version](https://github.com/phac-nml/neptune/blob/release/1.2/docs/install.md) of these installation instructions for installing Neptune v1.2.5. diff --git a/docs/news.md b/docs/news.md index 6941f69..37aea84 100755 --- a/docs/news.md +++ b/docs/news.md @@ -1,12 +1,12 @@ -# News # +# News -## 2.0.0 ## +## Version 2.0.0 2024-10-21 This release updates Neptune to Python3, removes DRMAA support, fixes a crash when no signatures are produced, and updates the installation process. 
-## Version 1.2.5 ## +## Version 1.2.5 2017-05-04 @@ -18,7 +18,7 @@ Neptune version 1.2.5 has been released to GitHub, Bioconda, and Galaxy: Please see the [Change Log](CHANGELOG) for additional information. -## Bioconda Installation ## +## Bioconda Installation 2017-04-07 diff --git a/docs/output.md b/docs/output.md index 2c1c6b3..0a4f373 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,4 +1,4 @@ -# Output # +# Output Neptune's output directory contains the following items: @@ -14,30 +14,30 @@ Neptune's output directory contains the following items: A file with the same name as each reference will be placed in each output directory (candidates, filtered, sorted), corresponding to the reference file from which it was derived. -## Candidates ## +## Candidates The candidate signatures are the sequences produced from the signature extraction step. These signatures will relatively sensitive, but not necessarily specific. This is because signature extraction is done using exact *k*-mer matches. The candidate signatures are guaranteed to contain no more exact matches with any exclusion *k*-mer than specified by the `--exhits` parameter. However, there may be inexact matches with exclusion targets. -## Filtered ## +## Filtered The filtering step is designed to remove signatures which are not interesting enough to warrant further investigation, because the negative component of their score is prohibitively large. The filtering step removes signatures that align sufficiently with any exclusion target. The filtered signatures are a subset of the candidate signatures. -## Sorted ## +## Sorted The sorted signatures files are organized as FASTA records containing the same signatures as their filtered signatures counterparts. However, the signatures are listed in descending order by their signature score. 
Signatures are assigned a score corresponding to their highest-scoring BLAST alignments with all inclusion and exclusion targets, which is a sum of a positive inclusion component and a negative exclusion component. This score is maximized when all inclusion targets contain a region exactly matching the entire signature and there exists no exclusion targets that match the signature. -## Consolidated ## +## Consolidated The sorted signatures from all references are combined into a single "consolidated.fasta" file, located within the "consolidated" directory. Signatures are added to the consolidated signatures file in a greedy manner by selecting the next highest scoring signature available from all references. While effort is taken to prevent signatures from overlapping entirely, it is possible for consolidate signatures to have a small amount of overlap. In many circumstances, this output might be considered the final output of Neptune. -## Databases ## +## Databases The databases directory contains BLAST databases constructed from the inclusion and exclusion files. -## Aggregate k-mers ## +## Aggregate k-mers The aggregated *k*-mers file, aggregated.kmers, contains a list of all *k*-mers observed in the inclusion and exclusion groups. These *k*-mers are sorted and followed by two integers: the number of inclusion and exclusion targets the *k*-mer appears in, respectively. -## Run Receipt ## +## Run Receipt The run receipt contains information about the Neptune execution. It contains a list of all the files in the inclusion and exclusion group, and the command line parameters used for the execution. 
diff --git a/docs/parameters.md b/docs/parameters.md index 03c6e7f..03a1201 100644 --- a/docs/parameters.md +++ b/docs/parameters.md @@ -1,4 +1,4 @@ -# Parameters # +# Parameters A help message may be viewed by running: @@ -7,7 +7,7 @@ A help message may be viewed by running: neptune --help ``` -## Mandatory ## +## Mandatory Neptune requires the location of the inclusion, exclusion, and output directories. The remaining parameters will be estimated based on the input sequence or revert to default settings. The following is the minimum number of command line parameters required to run Neptune: @@ -26,11 +26,11 @@ The following parameters are required by Neptune: | -e | --exclusion | FASTA | A list of exclusion targets in FASTA format. You may list multiple file or directory locations following the parameter. Neptune will automatically include all files within directories. However, Neptune will not recurse into additional directories. | | -o | --output | directory | The location of the output directory. If this directory exists, any files produced with existing names will be overwritten. If this directory does not exist, then it will be created. | -## Optional ## +## Optional The optional parameters will either be automatically calculated or be assigned default values. -### *k*-mer ### +### *k*-mer The following parameters relate to *k*-mer generation and aggregation: @@ -39,7 +39,7 @@ The following parameters relate to *k*-mer generation and aggregation: | -k | --kmer | integer | The size of the *k*-mers. This must be a positive integer and should be large enough such that random intra-genome *k*-mer matches, within the largest genome, are unexpected. The size of *k*-mers cannot be larger than the smallest sequence record. This will be automatically calculated if not specified. | | | --organization | integer | The degree of organization of *k*-mer counting and aggregation. 
This parameter determines the number nucleotide bases used in parallelized *k*-mer counting and, in turn, the number of parallel instances of *k*-mer aggregation. The number of parallel instances is determined by 4^n, where n is the specified organization parameter. This value must be a non-negative integer smaller than *k*. If the parameter is not specified, then n = 0 and there will be no parallel *k*-mer aggregation. This will likely require a much longer computation time to complete *k*-mer aggregation. | -### Filtering ### +### Filtering The following command-line parameters relate to signature filtering: @@ -49,7 +49,7 @@ The following command-line parameters relate to signature filtering: | | --filter-percent | float | The minimum percent identity of a signature candidate against a exclusion target required to filter out the candidate. The percent identity is calculated as identities divided by the alignment length. This value is a percentage expressed as a floating point number [0.0, 1.0]. If the any exclusion hit exceeds the percent length **and** percent identity of any candidate, the candidate is removed. The default value is 0.5. | | | --seed-size | integer | The seed size used for alignments. This value must be no smaller than 4. The default value is 11. | -### Extraction ### +### Extraction The following command-line parameters relate to signature extraction: @@ -64,7 +64,7 @@ The following command-line parameters relate to signature extraction: | | --gap | int | The maximum allowable number of base positions shifted before seeing an exact *k*-mer match. If this value is not specified, it will be automatically calculated using the rate, GC-content, and the *k*-mer size. The calculation can be found in the *Mathematics* documentation. This value must be a positive integer. | | | --size | int | The minimum size for a signature. Signatures which are shorter than this length will not be reported. 
If this value is not specified, the minimum signature size will be four times the length of the *k*-mer size. It is not recommended to locate signatures smaller than this size, unless application-specific. This value must be a positive integer. | -### Parallelization ### +### Parallelization The following parameters relate to the parallelization of Neptune: diff --git a/docs/walkthrough.md b/docs/walkthrough.md index b45ada2..be34522 100644 --- a/docs/walkthrough.md +++ b/docs/walkthrough.md @@ -1,10 +1,10 @@ -# Walkthrough # +# Walkthrough -## Overview ## +## Overview The purpose of this walkthrough will be to illustrate a simple, but complete example of using Neptune to locate discriminatory sequences. We will identity signature sequences within an artificial data set containing three inclusion sequences and three exclusion sequences. The output will be a list of signatures, sorted by score, for each inclusion target, and one consolidated signatures file, sorted by signature score, containing signatures from all inclusion targets. -## Input Data ## +## Input Data We will be using very small, artificial genomes for this walkthrough. However, these small genomes will be sufficient to illustrate the operation of Neptune. The artificial genome sequence content is derived from *Escherichia coli* and has been modified to introduce simple variation between genomes. @@ -22,7 +22,7 @@ neptune/tests/data/example/exclusion/ The inclusion and exclusion directories each contain three FASTA format genomes. The genomes all have some insertions and deletions that differentiate them from each other. However, the three inclusion genomes primarily differ from the three exclusion genomes in that they share large sequences that are absent from all exclusion genomes. 
-## Running Neptune ## +## Running Neptune Neptune will automatically calculate many of the parameters that might otherwise be specified by the user, such as the minimum number of targets signature sequence must be present within for it to be considered shared sequence. At minimum, Neptune requires the user specify the inclusion sequences, exclusion sequences, and an output directory. We will provide Neptune inclusion and exclusion sequences in the form of FASTA file genomes located within directories. The following command will run Neptune on the example data and output to the specified directory: @@ -33,9 +33,9 @@ neptune --output output/ ``` -## Output ## +## Output -### Standard Output ### +### Standard Output After running Neptune, very similar output will be printed to standard output, indicating that Neptune is starting and completing different stages of operation: @@ -69,7 +69,7 @@ Submitted 1 jobs. Complete! ``` -### Consolidated Signatures ### +### Consolidated Signatures As we did not specify references from which to extract signatures, Neptune will automatically investigate all inclusion genomes for signatures and consolidate those signatures into a single consolidated signature file. The `output/consolidated/consolidated.fasta` file contains these consolidated signatures. This file may be understood as the final output of the application. The following FASTA output is from the consolidated signatures file produced from this example: @@ -86,7 +86,7 @@ The FASTA header contains information relavent to the identified signature. A de In this example, Neptune identified three signatures: 1.0, 1.1, and 1.2 of lengths 103, 640, and 98, respectively. We see that all of these signatures originated from the inclusion1 reference. These signatures were located at positions 99, 3497, and 5209 within the inclusion1 reference. 
These signatures are of very high quality, within the context of our data set, with scores of 1.0000, 0.9979, and 0.9969, within the possible range of score values from -1.00 to +1.00. -### Sorted Signatures ### +### Sorted Signatures If we're interested in looking at the signatures produced from each individual inclusion target, we need to investigate the output in the `output/sorted` directory. The following are the signatures extracted exclusively from the inclusion1.fasta target: diff --git a/install/debian_dependencies.sh b/install/debian_dependencies.sh deleted file mode 100755 index c604293..0000000 --- a/install/debian_dependencies.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -apt-get --yes --force-yes install python-pip -apt-get --yes --force-yes install python-virtualenv -apt-get --yes --force-yes install build-essential -apt-get --yes --force-yes install python-dev -apt-get --yes --force-yes install ncbi-blast+ diff --git a/install/neptune b/install/neptune deleted file mode 100755 index a9057b8..0000000 --- a/install/neptune +++ /dev/null @@ -1,6 +0,0 @@ -#!/bin/bash - -DIR=`dirname $0` - -. $DIR/../lib/neptune/bin/activate -neptune "$@" diff --git a/mkdocs.yml b/mkdocs.yml index 4cea217..a086dfc 100755 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,11 +1,10 @@ site_name: Neptune -theme: cinder +theme: readthedocs repo_url: "https://github.com/phac-nml/neptune" -site_favicon: neptune.ico copyright: "Neptune is licensed under the Apache License Version 2.0
Copyright Government of Canada 2015-2024" -pages: +nav: - Home: index.md - News: - news.md