Commit

Merge branch 'master' of github.com:griffithlab/pVAC-Seq into develop
susannasiebert committed Aug 28, 2017
2 parents 85cf52a + d7c9e0d commit a874380
Showing 10 changed files with 251 additions and 19 deletions.
26 changes: 22 additions & 4 deletions docs/additional_commands.rst
@@ -8,7 +8,11 @@ To make using pVAC-Seq easier several convenience methods are included in the pa
Download Example Data
---------------------

.. argparse::
.. topic:: For usage instructions run

``pvacseq download_example_data --help``

.. .. argparse::
:module: lib.download_example_data
:func: define_parser
:prog: pvacseq download_example_data
@@ -18,23 +22,37 @@ Download Example Data
Install VEP Plugin
------------------

.. argparse::
.. topic:: For usage instructions run

``pvacseq install_vep_plugin --help``

.. .. argparse::
:module: lib.install_vep_plugin
:func: define_parser
:prog: pvacseq install_vep_plugin

.. _valid_alleles:

List Valid Alleles
------------------

.. argparse::
.. topic:: For usage instructions run

``pvacseq valid_alleles --help``

.. .. argparse::
:module: lib.valid_alleles
:func: define_parser
:prog: pvacseq valid_alleles

Documentation For Configuration Files
-------------------------------------

.. argparse::
.. topic:: For usage instructions run

``pvacseq config_files --help``

.. .. argparse::
:module: lib.config_files
:func: define_parser
:prog: pvacseq config_files
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -33,7 +33,7 @@
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.coverage',
'sphinxarg.ext',
#'sphinxarg.ext',
]

# Add any paths that contain templates here, relative to this directory.
20 changes: 14 additions & 6 deletions docs/filter_commands.rst
@@ -10,23 +10,31 @@ Both filters can also be run manually to narrow the final results down further.
Binding Filter
--------------

.. argparse::
.. topic:: For usage instructions run

``pvacseq binding_filter --help``

.. .. argparse::
:module: lib.binding_filter
:func: define_parser
:prog: pvacseq binding_filter

The binding filter removes variants that don't pass the chosen binding threshold. The user can choose whether to apply this filter to the "lowest" or the "median" binding affinity score. The "lowest" binding affinity score is recorded in the "Best MT Score" column and represents the lowest IC50 score of all prediction algorithms that were picked during the previous pVAC-Seq run. The "median" binding affinity score is recorded in the "Median MT Score" column and corresponds to the median IC50 score of all prediction algorithms used to create the report.

The binding filter also offers the option to filter on Fold Change columns, which contain the ratio of the WT score to the MT score. If the binding filter is set to "lowest", the "Corresponding Fold Change" column will be used ("Corresponding WT Score"/"Best MT Score"). If the binding filter is set to "median", the "Median Fold Change" column will be used ("Median WT Score"/"Median MT Score").
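
The fold change described above can be sketched in a few lines of Python. This is only an illustration of the ratio given in the parentheses ("WT Score"/"MT Score"); the function name ``fold_change`` is hypothetical and is not part of pVAC-Seq.

```python
def fold_change(wt_score, mt_score):
    """Ratio of the wildtype score to the mutant score.

    Hypothetical helper for illustration only. IC50 scores are
    positive, so mt_score is assumed to be non-zero.
    """
    return wt_score / mt_score


# A mutant peptide that binds 20x more strongly (lower IC50) than the wildtype.
print(fold_change(wt_score=2000.0, mt_score=100.0))  # 20.0
```

A larger fold change means the mutant peptide is predicted to bind more strongly than its wildtype counterpart.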

Coverage Filter
---------------

.. argparse::
.. topic:: For usage instructions run

``pvacseq coverage_filter --help``

.. .. argparse::
:module: lib.coverage_filter
:func: define_parser
:prog: pvacseq coverage_filter

If a pVAC-Seq process has been run with bam-readcount or Cufflinks input files then the coverage_filter can be run again on the final report file to narrow down the results even further.

If no additional coverage input files have been provided to the main pVAC-Seq run then this information would need to be manually added to the report in order to run this filter.
167 changes: 167 additions & 0 deletions docs/frequently_asked_questions.rst
@@ -0,0 +1,167 @@
.. raw:: html

<style> .large {font-size: 110%; font-weight: bold} </style>
<style> .large-code {font-size: 110%; font-family: monospace} </style>


Frequently Asked Questions
==========================

.. role:: large
.. role:: large-code

:large:`My pVAC-Seq command has been running for a long time. Why is
that?`

The rate-limiting factor in running pVAC-Seq is the number of calls that are
made to the IEDB software for binding score predictions.

.. note::

It is generally faster to make IEDB calls using a local install of IEDB than
using the IEDB web API. It is, therefore, recommended to use a local IEDB
install for any in-depth analysis.

There are a number of factors that determine the number of IEDB calls to be made:

- Number of variants in your VCF

pVAC-Seq will make predictions for each missense, inframe indel, and
frameshift variant in your VCF.

**Speedup suggestion**: Split the VCF into smaller subsets and process each one
individually, in parallel.

- Number of transcripts for each variant

pVAC-Seq will make predictions for each transcript of a supported variant
individually. The number of transcripts for each variant depends on how VEP was
run when the VCF was annotated.

**Speedup suggestion**: Use the ``--pick`` option when running VEP to
annotate each variant with the top transcript only.

- The ``--fasta-size`` parameter value

pVAC-Seq takes an input VCF and creates a wildtype and a mutant
fasta for each transcript. The number of fasta entries that get submitted
to IEDB at a time is limited by the ``--fasta-size`` parameter in order
to reduce the load on the IEDB servers. The smaller the fasta-size, the
more calls have to be made to IEDB.

**Speedup suggestion**: When using a local IEDB install, increase the size
of this parameter.

- Number of prediction algorithms, epitope lengths, and HLA-alleles

One call to IEDB is made for each combination of these parameters for each chunk
of fasta sequences. That means, for example, when 7 prediction
algorithms, 4 epitope lengths, and 6 HLA-alleles are chosen, 7*4*6=168 calls to
IEDB have to be made for each chunk of fastas.

**Speedup suggestion**: Reduce the number of prediction algorithms,
epitope lengths, and/or HLA-alleles to the ones that will be the most
meaningful for your analysis. For example, the NetMHCcons method is
already a consensus method between NetMHC, NetMHCpan, and PickPocket.
If NetMHCcons is chosen, you may want to omit the underlying prediction
methods. Likewise, if you want to run NetMHC, NetMHCpan, and PickPocket
individually, you may want to skip NetMHCcons.

- ``--downstream-sequence-length`` parameter value

This parameter determines how many amino acids of the downstream sequence after a
frameshift mutation will be included in the wildtype fasta sequence. The
shorter the downstream sequence length, the lower the number of epitopes
that IEDB needs to make binding predictions for.

**Speedup suggestion**: Reduce the value of this parameter.
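
The factors listed above multiply together. The following is a rough sketch of that arithmetic, not pVAC-Seq code: the function ``estimate_iedb_calls`` and its parameter names are assumptions made for illustration. The result is best read as an upper bound, since combinations that a prediction method does not support are not submitted.

```python
import math


def estimate_iedb_calls(n_fasta_entries, fasta_size, n_algorithms, n_lengths, n_alleles):
    """Rough upper bound on the number of IEDB calls for one run.

    Hypothetical helper: parameter names are illustrative only.
    """
    # The fasta entries are submitted in chunks of at most `fasta_size`.
    n_chunks = math.ceil(n_fasta_entries / fasta_size)
    # One call per (algorithm, epitope length, allele) combination per chunk.
    return n_chunks * n_algorithms * n_lengths * n_alleles


# The example from the text: 7 algorithms, 4 lengths, 6 alleles, one chunk.
print(estimate_iedb_calls(200, 200, 7, 4, 6))   # 168
# Five chunks multiply the number of calls by five.
print(estimate_iedb_calls(1000, 200, 7, 4, 6))  # 840
```

Doubling ``fasta_size`` roughly halves the number of chunks, which is why a larger value helps when a local IEDB install is used.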

:large:`My pVAC-Seq output file does not contain entries for all of the
alleles I chose. Why is that?`

There could be a few reasons why the pVAC-Seq output does not contain
predictions for all alleles:

- The alleles you picked might not have been compatible with the prediction algorithm and/or epitope lengths chosen. In that case no calls for that allele would have been made and a status message would have been printed to the screen.

- It could be that all epitope predictions for some alleles got filtered out. You can check the ``<sample_name>.combined.parsed.tsv`` file to see all called epitopes before filtering.

:large:`Why are some values in the` :large-code:`WT Epitope Seq` :large:`column` :large-code:`NA` :large:`?`

Not all mutant epitope sequences will have a corresponding wildtype epitope sequence. This
occurs when the mutant epitope sequence is novel and a comparison is therefore not
meaningful:

- An epitope in the downstream portion of a frameshift might not have a corresponding wildtype epitope at the same position at all. The epitope is completely novel.

- An epitope that overlaps an inframe indel or multinucleotide polymorphism (MNP) might have a large number of amino acids that are different from the wildtype epitope at the corresponding position. If fewer than half of the amino acids match between the mutant epitope sequence and the corresponding wildtype sequence, the corresponding wildtype sequence in the report is set to ``NA``.
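
The "fewer than half match" rule can be sketched as follows. This is an illustrative reading of the rule, not the pVAC-Seq implementation: the function name ``wildtype_for_report`` is hypothetical, and treating a missing or differently sized wildtype epitope as ``NA`` is an assumption.

```python
def wildtype_for_report(mt_epitope, wt_epitope):
    """Return the wildtype epitope to report, or "NA" if it is not comparable.

    Hypothetical helper for illustration only.
    """
    # Assumption: no wildtype epitope, or one of a different length, is not comparable.
    if wt_epitope is None or len(wt_epitope) != len(mt_epitope):
        return "NA"
    # Count the positions at which both epitopes carry the same amino acid.
    matches = sum(1 for m, w in zip(mt_epitope, wt_epitope) if m == w)
    # Fewer than half of the positions match: the mutant epitope is treated as novel.
    if matches < len(mt_epitope) / 2:
        return "NA"
    return wt_epitope


print(wildtype_for_report("KLMNPQRST", "KLMNAQRST"))  # KLMNAQRST (8 of 9 match)
print(wildtype_for_report("KLMNPQRST", "AAAAAQRST"))  # NA (4 of 9 match)
```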

:large:`What filters are applied during a pVAC-Seq run?`

By default we filter the neoepitopes on their binding score. If bam-readcount
files and/or cufflinks files are provided we also filter on the depth, VAF,
and FPKM. In addition, candidates where the mutant epitope sequence is the
same as the wildtype epitope sequence will also be filtered out.

:large:`How can I see all of the candidate epitopes without any filters
applied?`

The ``<sample_name>.combined.parsed.tsv`` file will contain all of the epitopes predicted
before filters are applied.

:large:`Why have some of my epitopes been filtered out even though the` :large-code:`Best MT Score` :large:`is below 500?`

By default, the binding filter will be applied to the ``Median MT Score``
column. This is the median score value among all chosen prediction algorithms.
The ``Best MT Score`` column shows the lowest score among all
chosen prediction algorithms. To change this behavior and apply the binding
filter to the ``Best MT Score`` column you may set the ``--top-score-metric``
parameter to ``lowest``.
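
The difference between the two metrics can be illustrated with a small sketch. The function ``passes_binding_filter`` is hypothetical, the threshold of 500 is taken from the question above, and whether the real comparison is strict or inclusive is an assumption here.

```python
from statistics import median


def passes_binding_filter(mt_scores, threshold=500, top_score_metric="median"):
    """Decide whether an epitope passes, given its per-algorithm MT scores.

    Hypothetical helper: mirrors the behavior described in the text only.
    """
    if top_score_metric == "lowest":
        score = min(mt_scores)     # corresponds to the "Best MT Score" column
    elif top_score_metric == "median":
        score = median(mt_scores)  # corresponds to the "Median MT Score" column
    else:
        raise ValueError("top_score_metric must be 'lowest' or 'median'")
    # Assumption: a score strictly below the threshold passes.
    return score < threshold


scores = [120.0, 650.0, 900.0]  # one IC50 prediction per chosen algorithm
print(passes_binding_filter(scores, top_score_metric="median"))  # False (median is 650)
print(passes_binding_filter(scores, top_score_metric="lowest"))  # True (lowest is 120)
```

In this example the best score is well below 500, yet the epitope is filtered out under the default median metric.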

:large:`Why are entries with` :large-code:`NA` :large:`in the`
:large-code:`VAF` :large:`and` :large-code:`depth` :large:`columns not
filtered?`

We do not filter out ``NA`` entries for depth and VAF since there is not
enough information to determine whether the cutoff has been met one way or another.

:large:`Why don't some of my epitopes have score predictions for certain prediction methods?`

Not all prediction methods support all epitope lengths or all alleles. To see
a list of supported alleles for a prediction method you may use the
``pvacseq valid_alleles`` :ref:`command <valid_alleles>`. For more details on
each algorithm refer to the IEDB MHC `Class I <http://tools.iedb.org/mhci/help/#Method>`_
and `Class II <http://tools.iedb.org/mhcii/help/#Method>`_ documentation.


:large:`How do I use StringTie instead of Cufflinks for transcript/gene abundance
estimates?`

You may also provide FPKM values from other sources, including StringTie, by creating
`cufflinks-formatted input files
<http://cole-trapnell-lab.github.io/cufflinks/file_formats/#fpkm-tracking-format>`_.

**For transcript FPKM**: a tab-separated file with a ``tracking_id`` column
containing Ensembl transcript IDs and a ``FPKM`` column containing
FPKM values.

**For gene FPKM**: a tab-separated file with a ``tracking_id`` column
containing Ensembl gene IDs, a ``locus`` column describing the
region within the gene, and a ``FPKM`` column containing FPKM values. In the
pVAC-Seq pipeline the FPKM values will be summed for all loci of a gene. You
may also provide already summed FPKM values. In that case you will still need
to provide a ``locus`` column but the values in that column can be empty.
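
As a minimal sketch, files with the columns described above could be written with Python's ``csv`` module. The function names and the identifiers in the example are hypothetical, and only the columns named in the text are written; a full Cufflinks tracking file contains additional columns.

```python
import csv
import io


def write_transcript_fpkm(rows, handle):
    """Write (transcript_id, fpkm) pairs as a tab-separated file."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["tracking_id", "FPKM"])
    for transcript_id, fpkm in rows:
        writer.writerow([transcript_id, fpkm])


def write_gene_fpkm(rows, handle):
    """Write (gene_id, fpkm) pairs that are already summed per gene."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["tracking_id", "locus", "FPKM"])
    for gene_id, fpkm in rows:
        # The locus column must exist, but its value may be left empty.
        writer.writerow([gene_id, "", fpkm])


# Usage with made-up identifiers; write to a real file with open(path, "w", newline="").
buffer = io.StringIO()
write_gene_fpkm([("ENSG_EXAMPLE_1", 12.5)], buffer)
print(buffer.getvalue(), end="")
```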

:large:`How is pVAC-Seq licensed?`

pVAC-Seq is licensed under `NPOSL-3.0
<http://opensource.org/licenses/NPOSL-3.0>`_.

:large:`How do I cite pVAC-Seq?`

Jasreet Hundal, Beatriz M. Carreno, Allegra A. Petti, Gerald P. Linette, Obi
L. Griffith, Elaine R. Mardis, and Malachi Griffith. `pVAC-Seq: A genome-guided
in silico approach to identifying tumor neoantigens <http://www.genomemedicine.com/content/8/1/11>`_. Genome Medicine. 2016,
8:11, DOI: 10.1186/s13073-016-0264-5. PMID: `26825632
<http://www.ncbi.nlm.nih.gov/pubmed/26825632>`_.

3 changes: 2 additions & 1 deletion docs/index.rst
@@ -9,7 +9,7 @@ pVAC-Seq
pVAC-Seq is a cancer immunotherapy pipeline for the identification of **p**\ ersonalized **V**\ ariant **A**\ ntigens by **C**\ ancer **Seq**\ uencing (pVAC-Seq) that integrates tumor mutation and expression data (DNA- and RNA-Seq). It enables cancer immunotherapy research by using massively parallel sequence data to predict tumor-specific mutant peptides (neoantigens) that can elicit anti-tumor T cell immunity. It is being used in studies of checkpoint therapy response and to identify targets for cancer vaccines and adoptive T cell therapies. For more general information, see the `manuscript published in Genome Medicine <http://www.genomemedicine.com/content/8/1/11>`_.

.. toctree::
:maxdepth: 3
:maxdepth: 2

features
install
@@ -18,6 +18,7 @@ pVAC-Seq is a cancer immunotherapy pipeline for the identification of **p**\ ers
filter_commands
additional_commands
optional_downstream_analysis_tools
frequently_asked_questions
contact

New in version |version|
9 changes: 8 additions & 1 deletion docs/install.rst
@@ -84,7 +84,14 @@ ____________
tar -zxvf IEDB_MHC_II-2.16.tar.gz
cd mhc_ii
./configure.py
Open the ``configure.py`` file and update the lines that set the ``smm`` and ``nn`` variables to use relative paths like so:

.. code-block:: none

   smm = re.compile(curDir + "/netMHCII-1.1")
   nn = re.compile(curDir + "/netMHCII-2.2")

.. note::

   Running the ``configure`` script requires a Python 2 environment. If you are currently emulating a Python 3 environment with Conda you will need to run ``source deactivate`` before executing the ``configure`` script.
6 changes: 5 additions & 1 deletion docs/optional_downstream_analysis_tools.rst
@@ -6,7 +6,11 @@ Optional Downstream Analysis Tools
Generate Protein Fasta
----------------------

.. argparse::
.. topic:: For usage instructions run

``pvacseq generate_protein_fasta --help``

.. .. argparse::
:module: lib.generate_protein_fasta
:func: define_parser
:prog: pvacseq generate_protein_fasta
23 changes: 23 additions & 0 deletions docs/prerequisites.rst
@@ -31,6 +31,12 @@ To create a VCF for use with pVAC-Seq follow these steps:
The ``--dir_plugins <VEP_plugins directory>`` option may need to be set depending on where the VEP_plugins were installed to.

The ``--pick`` option might be useful to limit the annotation to the top
transcripts. Otherwise, VEP will annotate each variant with all possible
transcripts. pVAC-Seq will provide predictions for all transcripts in the VEP
CSQ field. Running VEP without the ``--pick`` option can therefore drastically
increase the runtime of pVAC-Seq.

Additional VEP options that might be desired can be found `here <http://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html>`_.

**Example VEP Command**
@@ -45,6 +51,9 @@ Additional VEP options that might be desired can be found `here <http://useast.e
Optional Preprocessing
----------------------

Coverage and Expression Data
############################

Coverage and expression data can be added to the pVAC-Seq processing by providing bam-readcount and/or Cufflinks output files as additional input files. These additional input files must be provided as a yaml file in the following structure:

.. code-block:: none
@@ -91,3 +100,17 @@ Installation instructions for Cufflinks can be found on their `GitHub page <http
.. code-block:: none

   cufflinks <sam_file>

You may also provide FPKM values from other sources by creating
cufflinks-formatted input files.

**For transcript FPKM**: a tab-separated file with a ``tracking_id`` column
containing Ensembl transcript IDs and a ``FPKM`` column containing
FPKM values.

**For gene FPKM**: a tab-separated file with a ``tracking_id`` column
containing Ensembl gene IDs, a ``locus`` column describing the
region within the gene, and a ``FPKM`` column containing FPKM values. In the
pVAC-Seq pipeline the FPKM values will be summed for all loci of a gene. You
may also provide already summed FPKM values. In that case you will still need
to provide a ``locus`` column but the values in that column can be empty.
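
The per-gene summing described above can be sketched as follows. This is not the pipeline's own code: the function name ``sum_gene_fpkm`` and the identifiers in the example are hypothetical, and the input is assumed to be a tab-separated file with the ``tracking_id``, ``locus``, and ``FPKM`` columns named in the text.

```python
import csv
import io
from collections import defaultdict


def sum_gene_fpkm(handle):
    """Sum the FPKM values of all loci that share a gene ID.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(handle, delimiter="\t"):
        # The locus value is not needed for the sum, so it may be empty.
        totals[row["tracking_id"]] += float(row["FPKM"])
    return dict(totals)


# Two loci for one made-up gene ID and a single, already summed entry for another.
example = io.StringIO(
    "tracking_id\tlocus\tFPKM\n"
    "GENE_A\t1:100-200\t1.5\n"
    "GENE_A\t1:300-400\t2.5\n"
    "GENE_B\t\t7.0\n"
)
print(sum_gene_fpkm(example))  # {'GENE_A': 4.0, 'GENE_B': 7.0}
```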
12 changes: 8 additions & 4 deletions docs/run.rst
@@ -9,7 +9,11 @@ Usage
prediction algorithms. More information on how to install IEDB locally can
be found on the :ref:`Installation <iedb_install>` page.

.. argparse::
:module: lib.main
:func: define_parser
:prog: pvacseq run
.. topic:: For usage instructions run

``pvacseq run --help``

.. .. argparse::
:module: lib.main
:func: define_parser
:prog: pvacseq run
2 changes: 1 addition & 1 deletion pvacseq/pvacseq.py
@@ -5,7 +5,7 @@
import pkg_resources
try:
from . import lib
except SystemError:
except (SystemError, ImportError):
import lib

def main():