Merge pull request #69 from sbslee/0.37.0-dev
0.37.0 dev
sbslee authored Sep 9, 2023
2 parents 40627bc + 6e022af commit 4b84de8
Showing 9 changed files with 422 additions and 48 deletions.
7 changes: 6 additions & 1 deletion .readthedocs.yaml
@@ -5,6 +5,12 @@
# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.7"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py
@@ -15,6 +21,5 @@ sphinx:

# Optionally set the version of Python and requirements required to build your docs
python:
  version: 3.7
  install:
    - requirements: docs/requirements.txt
9 changes: 9 additions & 0 deletions CHANGELOG.rst
@@ -1,6 +1,15 @@
Changelog
*********

0.37.0 (2023-09-09)
-------------------

* :issue:`67`: Fix bug in :meth:`pymaf.MafFrame.plot_waterfall` method where ``count=1`` was causing a color mismatch.
* Add new submodule ``pychip``.
* Add new method :meth:`common.reverse_complement`.
* Fix bug in :meth:`common.extract_sequence` method where the output for a long DNA sequence was truncated.
* :issue:`68`: Refresh the variant consequences database from Ensembl VEP. The database's latest update was on May 31, 2021.

0.36.0 (2022-08-12)
-------------------

9 changes: 6 additions & 3 deletions README.rst
@@ -20,9 +20,6 @@ README
.. image:: https://anaconda.org/bioconda/fuc/badges/downloads.svg
   :target: https://anaconda.org/bioconda/fuc/files

.. image:: https://anaconda.org/bioconda/fuc/badges/installer/conda.svg
   :target: https://conda.anaconda.org/bioconda

Introduction
============

@@ -65,6 +62,11 @@ and cite the following article:

Lee et al., 2022. `ClinPharmSeq: A targeted sequencing panel for clinical pharmacogenetics implementation <https://doi.org/10.1371/journal.pone.0272129>`__. PLOS ONE.

Support fuc
===========

If you find my work useful, please consider becoming a `sponsor <https://github.com/sponsors/sbslee>`__.

Installation
============

@@ -183,6 +185,7 @@ Below is the list of submodules available in the fuc API:
- **common** : The common submodule is used by other fuc submodules such as pyvcf and pybed. It also provides many day-to-day actions used in the field of bioinformatics.
- **pybam** : The pybam submodule is designed for working with sequence alignment files (SAM/BAM/CRAM). It essentially wraps the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation. If you are mainly interested in working with depth of coverage data, please check out the pycov submodule which is specifically designed for the task.
- **pybed** : The pybed submodule is designed for working with BED files. It implements ``pybed.BedFrame`` which stores BED data as ``pandas.DataFrame`` via the `pyranges <https://github.com/biocore-ntnu/pyranges>`_ package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `BED specification <https://genome.ucsc.edu/FAQ/FAQformat.html>`_.
- **pychip** : The pychip submodule is designed for working with annotation or manifest files from the Axiom (Thermo Fisher Scientific) and Infinium (Illumina) array platforms.
- **pycov** : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (SAM/BAM/CRAM). It implements ``pycov.CovFrame`` which stores read depth data as ``pandas.DataFrame`` via the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation. The ``pycov.CovFrame`` class also contains many useful plotting methods such as ``CovFrame.plot_region`` and ``CovFrame.plot_uniformity``.
- **pyfq** : The pyfq submodule is designed for working with FASTQ files. It implements ``pyfq.FqFrame`` which stores FASTQ data as ``pandas.DataFrame`` to allow fast computation and easy manipulation.
- **pygff** : The pygff submodule is designed for working with GFF/GTF files. It implements ``pygff.GffFrame`` which stores GFF/GTF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `GFF specification <https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md>`_.
7 changes: 7 additions & 0 deletions docs/api.rst
@@ -14,6 +14,7 @@ Below is the list of submodules available in the fuc API:
- **common** : The common submodule is used by other fuc submodules such as pyvcf and pybed. It also provides many day-to-day actions used in the field of bioinformatics.
- **pybam** : The pybam submodule is designed for working with sequence alignment files (SAM/BAM/CRAM). It essentially wraps the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation. If you are mainly interested in working with depth of coverage data, please check out the pycov submodule which is specifically designed for the task.
- **pybed** : The pybed submodule is designed for working with BED files. It implements ``pybed.BedFrame`` which stores BED data as ``pandas.DataFrame`` via the `pyranges <https://github.com/biocore-ntnu/pyranges>`_ package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `BED specification <https://genome.ucsc.edu/FAQ/FAQformat.html>`_.
- **pychip** : The pychip submodule is designed for working with annotation or manifest files from the Axiom (Thermo Fisher Scientific) and Infinium (Illumina) array platforms.
- **pycov** : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (SAM/BAM/CRAM). It implements ``pycov.CovFrame`` which stores read depth data as ``pandas.DataFrame`` via the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation. The ``pycov.CovFrame`` class also contains many useful plotting methods such as ``CovFrame.plot_region`` and ``CovFrame.plot_uniformity``.
- **pyfq** : The pyfq submodule is designed for working with FASTQ files. It implements ``pyfq.FqFrame`` which stores FASTQ data as ``pandas.DataFrame`` to allow fast computation and easy manipulation.
- **pygff** : The pygff submodule is designed for working with GFF/GTF files. It implements ``pygff.GffFrame`` which stores GFF/GTF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `GFF specification <https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md>`_.
@@ -48,6 +49,12 @@ fuc.pybed
.. automodule:: fuc.api.pybed
   :members:

fuc.pychip
==========

.. automodule:: fuc.api.pychip
   :members:

fuc.pycov
=========

8 changes: 5 additions & 3 deletions docs/create.py
@@ -48,9 +48,6 @@
.. image:: https://anaconda.org/bioconda/fuc/badges/downloads.svg
   :target: https://anaconda.org/bioconda/fuc/files

.. image:: https://anaconda.org/bioconda/fuc/badges/installer/conda.svg
   :target: https://conda.anaconda.org/bioconda

Introduction
============
@@ -93,6 +90,11 @@

Lee et al., 2022. `ClinPharmSeq: A targeted sequencing panel for clinical pharmacogenetics implementation <https://doi.org/10.1371/journal.pone.0272129>`__. PLOS ONE.

Support fuc
===========

If you find my work useful, please consider becoming a `sponsor <https://github.com/sponsors/sbslee>`__.

Installation
============
61 changes: 59 additions & 2 deletions fuc/api/common.py
@@ -804,7 +804,12 @@ def parse_variant(variant):

def extract_sequence(fasta, region):
    """
    Extract the region's DNA sequence from the FASTA file.
    Extract the DNA sequence corresponding to a selected region from a FASTA
    file.

    The method also allows users to retrieve the reference allele of a
    variant in a genomic coordinate format, instead of providing a genomic
    region.

    Parameters
    ----------
@@ -817,9 +822,20 @@ def extract_sequence(fasta, region):
    -------
    str
        DNA sequence. Empty string if there is no matching sequence.

    Examples
    --------

    >>> from fuc import common
    >>> fasta = 'resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta'
    >>> common.extract_sequence(fasta, 'chr1:15000-15005')
    'GATCCG'
    >>> # rs1423852 is chr16-80874864-C-T
    >>> common.extract_sequence(fasta, 'chr16:80874864-80874864')
    'C'
    """
    try:
        sequence = pysam.faidx(fasta, region).split('\n')[1]
        sequence = ''.join(pysam.faidx(fasta, region).split('\n')[1:])
    except pysam.SamtoolsError as e:
        warnings.warn(str(e))
        sequence = ''
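The switch from `.split('\n')[1]` to `''.join(... .split('\n')[1:])` is what fixes the truncation noted in the changelog. `pysam.faidx` runs `samtools faidx` and returns its standard output as FASTA text: a `>` header line followed by the sequence wrapped across several lines (60 bases per line by default). Indexing `[1]` therefore keeps only the first wrapped line. The sketch below reproduces the difference on a hard-coded string so it runs without pysam or a reference genome; the string and both helper names are hypothetical and simply mirror the two expressions in the diff.

```python
# Hypothetical stdout of `samtools faidx ref.fa chr1:1-70`: a header line
# followed by the sequence wrapped at 60 bases per line.
FAIDX_OUTPUT = (
    ">chr1:1-70\n"
    + "A" * 60 + "\n"
    + "C" * 10 + "\n"
)


def first_line_only(faidx_output):
    """Old expression: keep only the first sequence line after the header."""
    return faidx_output.split('\n')[1]


def all_lines_joined(faidx_output):
    """New expression: join every line after the header into one string."""
    # The trailing newline yields a final empty element, which join ignores.
    return ''.join(faidx_output.split('\n')[1:])


old = first_line_only(FAIDX_OUTPUT)
new = all_lines_joined(FAIDX_OUTPUT)
print(len(old), len(new))  # 60 70
```

Any region longer than one wrapped line was cut short by the old expression, while regions of 60 bases or fewer came back intact, which would explain why the bug only showed up for long sequences.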
@@ -1434,3 +1450,44 @@ def parse_list_or_file(obj, extensions=['txt', 'tsv', 'csv', 'list']):
return convert_file2list(obj[0])

return obj

def reverse_complement(seq, complement=True, reverse=False):
    """
    Given a DNA sequence, generate its reverse, complement, or
    reverse-complement.

    Parameters
    ----------
    seq : str
        DNA sequence.
    complement : bool, default: True
        Whether to return the complement.
    reverse : bool, default: False
        Whether to return the reverse.

    Returns
    -------
    str
        Updated sequence.

    Examples
    --------

    >>> from fuc import common
    >>> common.reverse_complement('AGC')
    'TCG'
    >>> common.reverse_complement('AGC', reverse=True)
    'GCT'
    >>> common.reverse_complement('AGC', reverse=True, complement=False)
    'CGA'
    >>> common.reverse_complement('agC', reverse=True)
    'Gct'
    """
    new_seq = seq[:]
    # Keep the lookup table under its own name so it does not shadow the
    # ``complement`` flag (a non-empty dict is always truthy).
    complement_map = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A',
                      'a': 't', 'c': 'g', 'g': 'c', 't': 'a'}
    if complement:
        new_seq = ''.join([complement_map[x] for x in new_seq])
    if reverse:
        new_seq = new_seq[::-1]
    return new_seq
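A minimal alternative sketch of the same behavior, assuming only the standard library, builds one translation table with `str.maketrans` and applies it with `str.translate` instead of looking up each character in a dictionary. Storing the table under a module-level name separate from the `complement` argument keeps the two flags independent. The function name `reverse_complement_sketch` is hypothetical and is not part of fuc.

```python
# Translation table mapping each base to its Watson-Crick complement,
# preserving case. Assumption: input is expected to contain A/C/G/T only.
_COMPLEMENT_TABLE = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement_sketch(seq, complement=True, reverse=False):
    """Return the complement, reverse, or reverse-complement of ``seq``."""
    new_seq = seq
    if complement:
        # str.translate leaves characters missing from the table (e.g. 'N')
        # unchanged instead of raising an error.
        new_seq = new_seq.translate(_COMPLEMENT_TABLE)
    if reverse:
        new_seq = new_seq[::-1]
    return new_seq


print(reverse_complement_sketch('AGC'))                                  # TCG
print(reverse_complement_sketch('AGC', reverse=True))                    # GCT
print(reverse_complement_sketch('AGC', reverse=True, complement=False))  # CGA
print(reverse_complement_sketch('agC', reverse=True))                    # Gct
```

One behavioral difference: the dictionary lookup in the diff raises `KeyError` for characters outside `ACGTacgt` (for example `N`), whereas `str.translate` passes them through unchanged. Which is preferable depends on whether ambiguous bases should be rejected or preserved.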