Interim taxonomy patch feature (moved)
!!! Page moved to https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Interim-taxonomy-patch-feature
The patch facility permits custom additions and repairs to the Open Tree Taxonomy (OTT).
- Introduction to patch files
- Modifying OTT
- Format of the patch files
- Modifying OTT on your own machine
In the process of constructing a new version of the taxonomy (OTT), patches are applied after the source taxonomies (NCBI, GBIF, ...) are algorithmically combined, but before OTT identifiers are assigned (or reassigned) to taxa. Patches can therefore refer to parts of any of the input taxonomies, or to details of the way in which they were combined.
## Adding / editing taxa in OTT

To add or modify taxa in OTT, you must add the required operation to a text file (a _patch file_) that gets processed at the end of OTT assembly. The simplest way to modify a patch file is directly on github.com (i.e. this site). There are instructions at the bottom of this page for [editing the files locally and uploading to GitHub](#modifying_ott_locally). Names of patch files should end in '.tsv' (for tab-separated values).

- If you don't already have one, create a GitHub account: use the green Sign up button on the top right (there are more instructions if you need them). You will also need to ask Rick, Karen, Mark or Stephen to add you to the OpenTreeOfLife organization in order to edit files; just send us your GitHub username.
- Log in to GitHub.
- Click on one of the files in the list of patch files. Use ott_edits.txt unless your changes fit into one of the other categories. If you want to create a new file, you will have to look at the instructions for working locally.
- Click the Edit button near the top of the file. If it is greyed out, you probably aren't logged in.
- Add a new line to the file for every change that you want to make. Each line contains six columns, separated by tabs (not spaces!). The format section describes what should go in each column.
- Add one or more comment lines (lines that start with #) to document why you are making the change.
- Click the green Commit changes button at the bottom to save your changes. You can add an optional message describing the change (and it is a good idea to do so!).
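
Because each row must be separated by tabs rather than spaces, it can help to generate rows programmatically before pasting them into the editor. The following is a minimal sketch, not part of the OTT tooling: the function name `format_patch_line` and the taxon names are hypothetical, and the six columns are assumed to appear in the order given in the format section.

```python
def format_patch_line(command, name1, rank, name2, context, source_info):
    """Join six column values with tabs to form one patch-file row.

    Hypothetical helper for illustration; it is not part of the OTT code.
    """
    fields = [command, name1, rank, name2, context, source_info]
    for field in fields:
        # A tab or newline inside a value would corrupt the column layout.
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    return "\t".join(fields)


# Hypothetical example values; the taxon names are made up.
line = format_patch_line(
    "add", "Examplea nova", "species", "Examplea", "life",
    "http://dx.doi.org/10.12345",
)
print("# Hypothetical example: new species described in the cited article")
print(line)
```

The printed comment line and row can then be pasted into the patch file as they are.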
## Format of the patch files

Each patch file is a plain (UTF-8) text file containing a tab-delimited table. Blank rows and rows beginning with '#' are ignored. Please make liberal use of comment lines ('#') to explain the reason for each change.

The columns of a patch file are as follows:
- command - what kind of operation to perform (includes add, synonym, move, prune and elide)
- name1 - the name of a taxon (call it taxon1)
- rank - rank to be associated with taxon1
- name2 - the name of another taxon (call it taxon2)
- context - the name of a taxon that is an ancestor of both taxon1 and taxon2 (used for homonym disambiguation). This can typically be at the kingdom level, since names are usually, though not always, unique within any particular nomenclatural code.
- sourceInfo - abbreviated information about the source of this taxon, see below
If neither name1 nor name2 is a homonym (of some other name) then context can be 'life'. Otherwise context should be some non-homonym name with the property that the taxon it names has at most one descendant named name1 (or name2).
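
The file format described above can be read with a few lines of Python. This is a sketch under stated assumptions rather than the parser OTT actually uses: `PatchRow` and `parse_patch_text` are hypothetical names, the sample rows use made-up taxa, and a row with any number of columns other than six is treated as an error.

```python
from typing import List, NamedTuple


class PatchRow(NamedTuple):
    """One row of a patch file (hypothetical container for illustration)."""
    command: str
    name1: str
    rank: str
    name2: str
    context: str
    source_info: str


def parse_patch_text(text: str) -> List[PatchRow]:
    """Parse patch-file text, skipping blank rows and rows starting with '#'."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        # Split on tabs only; empty columns (e.g. a blank rank) are preserved.
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(
                f"line {lineno}: expected 6 tab-separated columns, "
                f"got {len(fields)}"
            )
        rows.append(PatchRow(*(f.strip() for f in fields)))
    return rows


# Hypothetical sample content; the taxa are made up.
sample = (
    "# Hypothetical example rows\n"
    "\n"
    "add\tExamplea nova\tspecies\tExamplea\tlife\thttp://dx.doi.org/10.12345\n"
    "synonym\tExamplea antiqua\t\tExamplea nova\tlife\tncbi:1234\n"
)
for row in parse_patch_text(sample):
    print(row.command, row.name1, "->", row.name2, "(context:", row.context + ")")
```

Both sample rows use 'life' as the context, which is appropriate only when neither name is a homonym.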
The commands are:
### add - taxon1 is a (probably new) child of taxon2

- taxon2 should already exist in the taxonomy.
- If taxon1 already exists, no action is taken. The command is flagged as an error if taxon1's parent is other than taxon2, or if its rank is not as given.
- Otherwise taxon1 is added to the taxonomy as a child of taxon2, with the specified rank.

### synonym - name1 is a synonym for taxon2

- taxon2 should already exist in the taxonomy.
- If name1 is already a name for taxon2, no action is taken.
- Otherwise name1 is added as a name for taxon2 (i.e. a synonym of name2).
- The rank field should be blank.

### move - taxon1 becomes a child of taxon2

- taxon1 and taxon2 should already exist.
- taxon2 is made to be the parent of taxon1.
- The rank of taxon1 is made to be as specified.

### prune - taxon1 is removed from the taxonomy

- taxon1 should exist; if it doesn't, no action is taken.
- name2 should be blank.

### elide - taxon1 is deleted

Its children are altered so that their parent becomes the parent of the deleted taxon.

- taxon1 should exist; if it doesn't, no action is taken.
- name2 should be blank.
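
The command semantics above can be illustrated on a toy in-memory taxonomy. This is a simplified sketch, not the OTT implementation: `ToyTaxonomy` is a hypothetical class, names are assumed to be unique (so the context column is ignored), and `prune` is assumed to remove the descendants of taxon1 as well, which the text above does not state.

```python
class ToyTaxonomy:
    """Toy model for illustration; names are assumed unique (no homonyms)."""

    def __init__(self, root="life"):
        self.parent = {root: None}  # taxon name -> parent name
        self.rank = {root: ""}      # taxon name -> rank
        self.synonyms = {}          # synonym -> name of the taxon it refers to

    def children(self, name):
        return [t for t, p in self.parent.items() if p == name]

    def apply(self, command, name1, rank="", name2=""):
        """Apply one patch command and return a short status string."""
        if command == "add":
            if name2 not in self.parent:
                return "error: taxon2 does not exist"
            if name1 in self.parent:
                if self.parent[name1] != name2 or self.rank[name1] != rank:
                    return "error: taxon1 exists with another parent or rank"
                return "no action"
            self.parent[name1] = name2
            self.rank[name1] = rank
            return "ok"
        if command == "synonym":
            if name2 not in self.parent:
                return "error: taxon2 does not exist"
            if name1 == name2 or self.synonyms.get(name1) == name2:
                return "no action"
            self.synonyms[name1] = name2
            return "ok"
        if command == "move":
            if name1 not in self.parent or name2 not in self.parent:
                return "error: taxon1 and taxon2 should already exist"
            self.parent[name1] = name2
            self.rank[name1] = rank
            return "ok"
        if command == "prune":
            if name1 not in self.parent:
                return "no action"
            # Assumption: pruning also removes all descendants of taxon1.
            stack = [name1]
            while stack:
                taxon = stack.pop()
                stack.extend(self.children(taxon))
                del self.parent[taxon]
                del self.rank[taxon]
            return "ok"
        if command == "elide":
            if name1 not in self.parent:
                return "no action"
            # Children are re-attached to the parent of the deleted taxon.
            grandparent = self.parent[name1]
            for child in self.children(name1):
                self.parent[child] = grandparent
            del self.parent[name1]
            del self.rank[name1]
            return "ok"
        return "error: unknown command"


# Hypothetical taxa, made up for the example.
tax = ToyTaxonomy()
print(tax.apply("add", "Examplea", "genus", "life"))
print(tax.apply("add", "Examplea nova", "species", "Examplea"))
print(tax.apply("add", "Examplea nova", "species", "Examplea"))  # repeat: no action
print(tax.apply("elide", "Examplea"))
print(tax.parent["Examplea nova"])  # the species now sits directly under 'life'
```

Returning a status string instead of raising keeps the sketch close to the wording above, where some conditions are flagged as errors and others simply result in no action.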
## The sourceInfo column

The sourceInfo column provides source or provenance information for newly added taxa. If you are adding a new taxon, under no circumstances should this field be left empty.
- The value should be either a URI or a CURIE.
- This should be a reference to an accession in an established taxonomic database.
- If no such reference is available, it should be a reference to a published description of the taxon in question.
- If no such reference is available, it should be the DOI for the article reporting the study, assuming it explains the new name (and it should!).
- If there is no DOI for the article, use the best (most stable) URL for the article that you can find.
- If there is no provenance other than that it's your own unpublished opinion, put your ORCID.
Examples:
- ncbi:1234 - the taxon comes from record 1234 of the NCBI taxonomy
- gbif:1234 - the taxon comes from record 1234 of the GBIF taxonomy
- IF:1234 - Index Fungorum
- MB:1234 - Mycobank
- http://dx.doi.org/10.12345 - see published article
- http://books.google.com/books?isbn=0643099298 - book
- http://orcid.org/0000-0001-7694-8250 - personal identifier
The system currently (Sept 2013) knows about the four databases above (ncbi:, gbif:, IF:, MB:). Other source prefixes can be added on request; submit the request as an issue in the 'opentree' issue tracker at https://github.com/OpenTreeOfLife/opentree/issues. Absent any other kind of provenance, provide a URL that leads to an explanation of the name.
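
A quick check of sourceInfo values before committing might look like the sketch below. It is illustrative only: `classify_source_info` is a hypothetical function, only `http` and `https` URIs are recognized (an assumption, since the text says only "a URI"), and the prefix set contains just the four databases named above.

```python
from urllib.parse import urlparse

# The four CURIE prefixes named in the text above.
KNOWN_PREFIXES = {"ncbi", "gbif", "IF", "MB"}


def classify_source_info(value):
    """Return 'uri', 'curie', 'empty' or 'unknown' for a sourceInfo value.

    Hypothetical helper; assumes that only http(s) URIs are acceptable.
    """
    value = value.strip()
    if not value:
        return "empty"
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"
    # A CURIE has the form prefix:local_id, e.g. ncbi:1234.
    prefix, sep, local_id = value.partition(":")
    if sep and prefix in KNOWN_PREFIXES and local_id:
        return "curie"
    return "unknown"


for example in ("ncbi:1234", "MB:1234", "http://dx.doi.org/10.12345", "", "foo:1234"):
    print(repr(example), "->", classify_source_info(example))
```

A value classified as 'empty' or 'unknown' would need attention: either fill in the provenance, or request the new prefix through the issue tracker.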
## Modifying a patch file on your local machine

Instructions for this are coming soon. Reading up on git would be a good place to start; here are a few resources:

- http://git-scm.com/documentation
- http://www.sbf5.com/~cduan/technical/git/