Commit

Merge pull request #80 from NBISweden/2.3.0
* Remove python_version>="3.8.0" from the install_requires section of setup.py, where it was not working properly. Use an old-fashioned check instead, e.g. if sys.version_info < (3,9): sys.exit('Sorry, Python >= 3.9 is required')
This change fixes "Installation issue EMBLmyGFF3 & python version requirements" #79 and "if I can specify certain python source during python setup.py install" #77
* Update tests: remove dates (DT and RL lines) from the comparison between expected output and result. Change the taxonomy for the prokka test, which recently changed in the NCBI taxonomy DB.
* Raise the minimum Python requirement from 3.8 to 3.9 to resolve some dependency requirement issues
* Fix "Not for ENA submission: Sequence too short" #78: add an option to keep short sequences (< 100 nt) instead of skipping them
* Change the CI badge in the README
Juke34 authored Sep 1, 2023
2 parents 975eb31 + d9f951f commit d7a7f35
Showing 20 changed files with 126 additions and 229 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -16,7 +16,7 @@ jobs:
strategy:
matrix:
# Run in all these versions of Python
python-version: [3.8, 3.9]
python-version: [3.9, '3.10', 3.11]

# Steps represent a sequence of tasks that will be executed as part of the job
steps:
9 changes: 6 additions & 3 deletions EMBLmyGFF3/EMBLmyGFF3.py
@@ -1299,6 +1299,7 @@ def main():
parser.add_argument("--force_uncomplete_features", action="store_true", help="Force to keep features whithout all the mandatory qualifiers. /!\ Option not suitable for submission purpose.")
parser.add_argument("--interleave_genes", action="store_false", help="Print gene features with interleaved mRNA and CDS features.")
parser.add_argument("--keep_duplicates", action="store_true", help="Do not remove duplicate features during the process. /!\ Option not suitable for submission purpose.")
parser.add_argument("--keep_short_sequences", action="store_true", help="Do not skip short sequences (<100bp). /!\ Option not suitable for submission purpose.")
parser.add_argument("--locus_numbering_start", default=1, type=int, help="Start locus numbering with the provided value.")
parser.add_argument("--no_progress", action="store_false", help="Hide conversion progress counter.")
parser.add_argument("--no_wrap_qualifier", action="store_true", help="By default there is a line wrapping at 80 characters. The cut is at the world level. Activating this option will avoid the line-wrapping for the qualifiers.")
@@ -1472,9 +1473,11 @@ def main():
"For you information, if you use the --translate option the tool will raise an error due to ??? codons that do not exist." % (record.id))

# Check sequence size and skip if < 100 bp
if len(record.seq)<100:
logging.warning("Sequence %s too short (%s bp)! Minimum accpeted by ENA is 100, we skip it !" % (record.name, len(record.seq) ) )
continue
if not args.keep_short_sequences:
if len(record.seq)<100:
logging.warning("Sequence %s too short (%s bp)! Minimum accpeted by ENA is 100, we skip it !" % (record.name, len(record.seq) ) )
continue

writer = EMBL( record, True )

# qualifiers / features json information
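The effect of the new flag can be illustrated in isolation. The following is a minimal sketch, not the tool's actual code: `select_records` and `MIN_ENA_LENGTH` are hypothetical names, and plain `(name, sequence)` pairs stand in for the Biopython records that the real loop iterates over. Only the `--keep_short_sequences` option and the 100 bp threshold come from the diff above.

```python
import argparse
import logging

# Threshold quoted in the warning message of the diff above.
MIN_ENA_LENGTH = 100


def build_parser():
    """Parser exposing only the option added in this commit."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--keep_short_sequences", action="store_true",
                        help="Do not skip short sequences (<100bp).")
    return parser


def select_records(records, keep_short_sequences=False):
    """Return the names of the records that would be converted.

    `records` is an iterable of (name, sequence) pairs, a hypothetical
    stand-in for the SeqRecord objects used by the tool.
    """
    kept = []
    for name, seq in records:
        # Skip short sequences unless the user asked to keep them.
        if not keep_short_sequences and len(seq) < MIN_ENA_LENGTH:
            logging.warning("Sequence %s too short (%s bp), skipping it",
                            name, len(seq))
            continue
        kept.append(name)
    return kept
```

Because the option uses `action="store_true"`, it defaults to `False`, so the previous behaviour (skipping sequences shorter than 100 bp) is unchanged unless the flag is given.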
7 changes: 7 additions & 0 deletions EMBLmyGFF3/modules/help.py
@@ -569,6 +569,13 @@ def Help(string):
Bolean - Doesnt expect any value
Do not remove duplicate features during the process.
/!\ Option not suitable for submission purpose. Features that have the same key (feature type) and location as another feature are considered as duplicates and aren't allowed by the EMBL database. So they are remove during the process. If you don't plan to submit the file to ENA and you wish to keep these features, use the --keep_duplicates option.
"""
if(string == "keep_short_sequences" or string == "all"):
output += string+""":
EMBLmyGFF3 tool specific
Bolean - Doesnt expect any value
Do not remove short sequences (< 100bp) during the process.
/!\ Option not suitable for submission purpose.
"""
if(string == "force_unknown_features" or string == "all"):
output += string+""":
2 changes: 1 addition & 1 deletion EMBLmyGFF3/version.py
@@ -1 +1 @@
__version__ = '2.2'
__version__ = '2.3'
8 changes: 4 additions & 4 deletions README.md
@@ -1,6 +1,5 @@


[![Build Status](https://travis-ci.org/NBISweden/EMBLmyGFF3.svg?branch=master)](https://travis-ci.org/NBISweden/EMBLmyGFF3) [![DOI](EMBLmyGFF3.svg)](https://doi.org/10.1186/s13104-018-3686-x)
![GitHub CI](https://github.com/NBISweden/EMBLmyGFF3/actions/workflows/main.yml/badge.svg)
[![DOI](EMBLmyGFF3.svg)](https://doi.org/10.1186/s13104-018-3686-x)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/emblmygff3/README.html)
[![Anaconda-Server Badge](https://img.shields.io/conda/dn/bioconda/emblmygff3.svg?style=flat)](https://anaconda.org/bioconda/emblmygff3)
[<img alt="docker_emblmygff3" src="https://img.shields.io/badge/container-Docker-blue">](https://quay.io/repository/biocontainers/emblmygff3)
@@ -62,7 +61,7 @@ __You don't know how to submit to ENA ? Please visit the [ENA: Guidelines and Ti

## Prerequisites

**Python >=3.8**, **biopython >=1.78**, **numpy >=1.22** and the **bcbio-gff >=0.6.4** python packages.
**Python >=3.9**, **biopython >=1.78**, **numpy >=1.22** and the **bcbio-gff >=0.6.4** python packages.

In order to install pip please use the following steps:

@@ -321,6 +320,7 @@ You can also find a comprehensive help about the different parameters using the
| --isolate| Individual isolate from which the sequence was obtained. May be needed when organism belongs to Bacteria.|
| --isolation_source| Describes the physical, environmental and/or local geographical source of the biological sample from which the sequence was derived. Mandatory when environmental_sample option used.|
| --keep_duplicates| Do not remove duplicate features during the process. /!\ Option not suitable for submission purpose.|
| --keep_short_sequences| Do not remove short sequences (< 100bp) during the process. /!\ Option not suitable for submission purpose.|
| --locus_numbering_start| Start locus numbering with the provided value.|
| --no_progress| Hide conversion progress counter.|
| --no_wrap_qualifier| By default there is a line wrapping at 80 characters. The cut is at the world level. Activating this option will avoid the line-wrapping for the qualifiers.|
2 changes: 1 addition & 1 deletion conda_environment_EMBLmyGFF3.yml
@@ -6,7 +6,7 @@ channels:
- defaults

dependencies:
- python>=3.8.0
- python>=3.9.0
- biopython>=1.78
- bcbio-gff>=0.6.4
- numpy>=1.22
2 changes: 1 addition & 1 deletion examples/aa_example.py
@@ -42,7 +42,7 @@ def main():
MOLECULE="genomic DNA"

#Create the command
command = "EMBLmyGFF3 --translate --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t linear -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-aa-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
command = "EMBLmyGFF3 --no_progress --translate --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t linear -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-aa-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
print("Running the following command: "+command)

#Execute the command
2 changes: 1 addition & 1 deletion examples/augustus_example.py
@@ -42,7 +42,7 @@ def main():
MOLECULE="genomic DNA"

#Create the command
command = "EMBLmyGFF3 --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t linear -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-augustus-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
command = "EMBLmyGFF3 --no_progress --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t linear -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-augustus-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
print("Running the following command: "+command)

#Execute the command
2 changes: 1 addition & 1 deletion examples/dbxref_test_example.py
@@ -42,7 +42,7 @@ def main():
MOLECULE="genomic DNA"

#Create the command
command = "EMBLmyGFF3 --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t linear -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-dbxref_test-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
command = "EMBLmyGFF3 --no_progress --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t linear -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-dbxref_test-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
print("Running the following command: "+command)

#Execute the command
2 changes: 1 addition & 1 deletion examples/maker_example.py
@@ -42,7 +42,7 @@ def main():
MOLECULE="genomic DNA"

#Create the command
command = "EMBLmyGFF3 --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t linear -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-maker-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
command = "EMBLmyGFF3 --no_progress --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t linear -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-maker-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
print("Running the following command: "+command)

#Execute the command
2 changes: 1 addition & 1 deletion examples/prokka_disorder_example.py
@@ -48,7 +48,7 @@ def main():
STRAIN="K-12"

#Create the command
command = "EMBLmyGFF3 --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t "+TOPOLOGY+" --strain \""+STRAIN+"\" -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-prokka_disorder-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
command = "EMBLmyGFF3 --no_progress --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t "+TOPOLOGY+" --strain \""+STRAIN+"\" -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-prokka_disorder-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
print("Running the following command: "+command)

#Execute the command
2 changes: 1 addition & 1 deletion examples/prokka_example.py
@@ -48,7 +48,7 @@ def main():
STRAIN="K-12"

#Create the command
command = "EMBLmyGFF3 --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t "+TOPOLOGY+" --strain \""+STRAIN+"\" -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-prokka-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
command = "EMBLmyGFF3 --no_progress --rg REFERENCE_GROUP -i "+LOCUS_TAG+" -p "+PROJECT+" -m \""+MOLECULE+"\" -r "+TABLE+" -t "+TOPOLOGY+" --strain \""+STRAIN+"\" -s \""+SPECIES+"\" -x "+TAXONOMY+" -o EMBLmyGFF3-prokka-example.embl "+fill_path(ANNOTATION)+" "+fill_path(GENOME)
print("Running the following command: "+command)

#Execute the command
7 changes: 6 additions & 1 deletion setup.py
@@ -1,6 +1,11 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Check python version - it is not possible to specify which Python version to use in the setup.py file
import sys
if sys.version_info < (3,9):
sys.exit('Sorry, Python >= 3.9 is required')

from setuptools import setup, find_packages

# access the version wihtout importing the EMBLmyGFF3 package
@@ -19,7 +24,7 @@
license='GPL-3.0',
packages=find_packages(),

install_requires=['biopython>=1.78', 'bcbio-gff>=0.6.4','numpy>=1.22', 'python_version>="3.8.0"' ],
install_requires=['biopython>=1.78', 'bcbio-gff>=0.6.4','numpy>=1.22'],
include_package_data=True,

entry_points={
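The version guard added at the top of setup.py relies on tuple comparison of `sys.version_info`. A minimal sketch of the same idea as a reusable function follows; `python_is_supported` and `MIN_PYTHON` are hypothetical names introduced here for illustration and do not appear in the commit.

```python
import sys

# Minimum version stated in the commit message (hypothetical constant name).
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare as integer tuples, so (3, 10) correctly sorts after (3, 9).
    return tuple(version_info[:2]) >= minimum


# Same behaviour as the guard in the diff: abort early on an old interpreter.
if not python_is_supported():
    sys.exit("Sorry, Python >= %d.%d is required" % MIN_PYTHON)
```

Comparing integer tuples avoids the pitfall of comparing version strings, where "3.10" sorts before "3.9". The same numeric ambiguity is why the workflow matrix in this commit quotes '3.10': unquoted, YAML would read it as the number 3.1. Separately, setuptools documents a `python_requires` keyword for `setup()` that expresses a minimum version declaratively; the commit does not say whether that was considered.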
2 changes: 0 additions & 2 deletions t/EMBLmyGFF3-aa-test.embl
@@ -6,7 +6,6 @@ AC * _4
XX
PR Project:17285;
XX
DT 02-SEP-2022 (Rel. 133, Created)
XX
DE XXX
XX
@@ -26,7 +25,6 @@ RN [1]
RP 1-1351857
RG REFERENCE_GROUP
RT ;
RL Submitted (02-SEP-2022) to the INSDC.
XX
FH Key Location/Qualifiers
FH
2 changes: 0 additions & 2 deletions t/EMBLmyGFF3-augustus-test.embl
@@ -6,7 +6,6 @@ AC * _4
XX
PR Project:17285;
XX
DT 04-MAR-2021 (Rel. 133, Created)
XX
DE XXX
XX
@@ -26,7 +25,6 @@ RN [1]
RP 1-1351857
RG REFERENCE_GROUP
RT ;
RL Submitted (04-MAR-2021) to the INSDC.
XX
FH Key Location/Qualifiers
FH
2 changes: 0 additions & 2 deletions t/EMBLmyGFF3-dbxref_test-test.embl
@@ -6,7 +6,6 @@ AC * _4
XX
PR Project:17285;
XX
DT 02-SEP-2022 (Rel. 133, Created)
XX
DE XXX
XX
@@ -26,7 +25,6 @@ RN [1]
RP 1-1351857
RG REFERENCE_GROUP
RT ;
RL Submitted (02-SEP-2022) to the INSDC.
XX
FH Key Location/Qualifiers
FH
2 changes: 0 additions & 2 deletions t/EMBLmyGFF3-maker-test.embl
@@ -6,7 +6,6 @@ AC * _4
XX
PR Project:17285;
XX
DT 04-MAR-2021 (Rel. 133, Created)
XX
DE XXX
XX
@@ -26,7 +25,6 @@ RN [1]
RP 1-1351857
RG REFERENCE_GROUP
RT ;
RL Submitted (04-MAR-2021) to the INSDC.
XX
FH Key Location/Qualifiers
FH
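The expected .embl files above no longer contain the date-bearing DT and RL lines, so the comparison with freshly generated output has to ignore them as well. The test harness itself is not shown in this diff, so the following is only a sketch of one way to do that, assuming the two files are available as strings and that every line starting with DT or RL can be dropped; `strip_date_lines` and `same_ignoring_dates` are hypothetical names.

```python
# EMBL line codes that carry the run date in the files above.
DATE_LINE_PREFIXES = ("DT", "RL")


def strip_date_lines(text):
    """Return the lines of an EMBL flat file without DT and RL lines."""
    return [line for line in text.splitlines()
            if not line.startswith(DATE_LINE_PREFIXES)]


def same_ignoring_dates(expected_text, result_text):
    """Compare two EMBL flat files while ignoring date-bearing lines."""
    return strip_date_lines(expected_text) == strip_date_lines(result_text)
```

Note that this drops every RL line, which is only appropriate when, as in these test files, the RL line holds nothing but the submission date.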