Genbank formatted output #188

Closed
innovate-invent opened this issue Nov 27, 2019 · 1 comment

innovate-invent commented Nov 27, 2019

Some barriers need to be overcome before GenBank (or EMBL) format can be supported for output.

GenBank supports a limited number of feature types, and genomic islands are not among them.
I emailed the NIH requesting advice on how to store unsupported features, and they recommended using the misc_feature feature type. For example:

     misc_feature    654..26955
                     /note="AbGRI1-5 genomic island"

This is not ideal, as it places structured data in a free-form text field.
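To illustrate the concern, here is a minimal sketch of what a downstream consumer would have to do to recover islands from misc_feature notes. The regular expression and the function name `is_island_note` are hypothetical, chosen only for this example; they are not part of any existing tool.

```python
import re

# Hypothetical heuristic: treat any note that mentions "genomic island"
# as marking a genomic island. Nothing in the GenBank format enforces this.
ISLAND_NOTE = re.compile(r"genomic\s+island", re.IGNORECASE)


def is_island_note(note):
    """Guess from free text whether a misc_feature is a genomic island."""
    return bool(ISLAND_NOTE.search(note))


# The recommended note is recognised...
print(is_island_note("AbGRI1-5 genomic island"))
# ...but so is a feature that merely mentions an island (false positive)...
print(is_island_note("gene flanking the AbGRI1-5 genomic island"))
# ...and an abbreviated note is missed entirely (false negative).
print(is_island_note("AbGRI1-5 GI"))
```

Any such heuristic depends on every producer wording the note the same way, which is the weakness of storing the island type in free text.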

If genomic islands can accurately be referred to as mobile elements then another feature was recommended:

     mobile_element  3190..57412
                     /note="Integrative Element (IE)"
                     /mobile_element_type="other:Acinetobacter Genomic
                     Island 1 (AGI1)"

However, this does not contain structured data identifying the feature as a genomic island.

My alternative proposal is:

     mobile_element  3190..57412
                     /note="Integrative Element (IE)"
                     /mobile_element_type="other:genomic_island"
                     /standard_name="Acinetobacter Genomic
                     Island 1 (AGI1)"

This conforms to the GenBank feature table standard: http://www.insdc.org/files/feature_table.html
The mobile_element_type feature qualifier is defined as semi-structured data, and genomic_island is the appropriate term from the Sequence Ontology. The Sequence Ontology also defines genomic_island as a descendant of mobile_genetic_element.
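As a rough sketch of how the proposed entry could be emitted and checked, the following standard-library Python renders a feature in flat-file layout and validates the mobile_element_type value. The function names `check_mobile_element_type` and `format_feature` are hypothetical, and the vocabulary set reflects my reading of the feature table linked above, so it should be verified against that document. This is not the stitcher's actual code.

```python
import textwrap

# Assumed controlled vocabulary for /mobile_element_type, per my reading of
# the INSDC feature table; verify against the linked document.
MOBILE_ELEMENT_TYPES = {
    "transposon", "retrotransposon", "integron", "insertion sequence",
    "non-LTR retrotransposon", "SINE", "MITE", "LINE", "other",
}


def check_mobile_element_type(value):
    """Split '<type>[:<name>]' and reject values outside the vocabulary."""
    kind, _, name = value.partition(":")
    if kind not in MOBILE_ELEMENT_TYPES:
        raise ValueError(f"unknown mobile element type: {kind!r}")
    if kind == "other" and not name:
        raise ValueError("'other' requires a mobile element name")
    return kind, name


def format_feature(key, start, end, qualifiers):
    """Render one feature: key at column 6, location and qualifiers at column 22."""
    lines = [f"     {key:<16}{start}..{end}"]
    indent = " " * 21
    for name, value in qualifiers:
        # Wrap long qualifier values so no line exceeds 80 characters.
        lines.extend(textwrap.wrap(f'/{name}="{value}"', width=80,
                                   initial_indent=indent,
                                   subsequent_indent=indent))
    return "\n".join(lines)


qualifiers = [
    ("note", "Integrative Element (IE)"),
    ("mobile_element_type", "other:genomic_island"),
    ("standard_name", "Acinetobacter Genomic Island 1 (AGI1)"),
]
check_mobile_element_type(dict(qualifiers)["mobile_element_type"])
print(format_feature("mobile_element", 3190, 57412, qualifiers))
```

With this layout a consumer can identify islands by comparing the name part of mobile_element_type to genomic_island, rather than searching free text.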

The other major barrier is that the stitcher currently generates invalid GenBank files; see brinkmanlab/galaxy-tools#5 and brinkmanlab/galaxy-tools#6. The first linked issue could be resolved by brinkmanlab/galaxy-tools#8, but I doubt the second would be.

@innovate-invent

Merging issue with #171
