diff --git a/_wiki/classify.seqs.md b/_wiki/classify.seqs.md
index 0317964..4de9ede 100644
--- a/_wiki/classify.seqs.md
+++ b/_wiki/classify.seqs.md
@@ -10,7 +10,7 @@ neighbor consensus and zap. Taxonomy outlines and reference sequences
 can be obtained from the [taxonomy outline](/wiki/taxonomy_outline)
 page. The command requires that you provide a fasta-formatted input
 and database sequence file and a
-taxonomy file for the reference sequences. To run through the example below, download [Example Data](https://mothur.s3.us-east-2.amazonaws.com/wiki/ExampleDataSet.zip)
+taxonomy file for the reference sequences. To run through the example below, download [Example Data](https://mothur.s3.us-east-2.amazonaws.com/wiki/exampledataset.zip)
 and [mothur-formatted version of the RDP training set (v.9)](https://mothur.s3.us-east-2.amazonaws.com/wiki/trainset9_032012.pds.zip).
diff --git a/_wiki/greengenes-formatted_databases copy.md b/_wiki/greengenes-formatted_databases copy.md
new file mode 100644
index 0000000..3a6e714
--- /dev/null
+++ b/_wiki/greengenes-formatted_databases copy.md
@@ -0,0 +1,10 @@
+---
+title: 'greengenes2-formatted databases'
+redirect_from: '/wiki/greengenes2-formatted_databases'
+---
+
+The [biocore group](https://github.com/biocore/greengenes2/) released an updated version of the greengenes taxonomy in [October 2022](https://ftp.microbio.me/greengenes_release/2022.10/), which was published in [Nature Biotechnology](https://www.nature.com/articles/s41587-023-01845-1). If you use these files, you should cite McDonald et al.
+
+I have modified the version made available on the greengenes2 ftp server. The most notable difference is that I removed the species level names since more than two thirds of the genera only have one species name. In my opinion, this would give an overly specific sense of the classification of your sequences since there is insufficient diversity within each species. If you would like to see how to get the species names and see how else I modified the files, please see the [mothur blog post](/blog/2014/greengenes-v13_8_99-reference-files) which is the same as the README file found within the download that I am making available.
+
+* [greengenes2 (2020_10, wo/ species level names)](https://mothur.s3.us-east-2.amazonaws.com/wiki/greengenes2_2020_10.wo_sp.tgz)
\ No newline at end of file
diff --git a/_wiki/miseq_sop.md b/_wiki/miseq_sop.md
index 95ca2d9..25d2fb9 100644
--- a/_wiki/miseq_sop.md
+++ b/_wiki/miseq_sop.md
@@ -75,8 +75,9 @@ For this tutorial you will need `mothur` and several sets of files:
 You can easily substitute these choices (and should) for the reference
 and taxonomy alignments using the updated [Silva reference
 files](/wiki/Silva_reference_files), [RDP reference
-files](/wiki/RDP_reference_files), and [Greengenes-formatted
-databases](/wiki/Greengenes-formatted_databases). We use the above
+files](/wiki/RDP_reference_files), [Greengenes-formatted
+databases](/wiki/Greengenes-formatted_databases), and [Greengenes2-formatted
+databases](/wiki/greengenes2-formatted_databases). We use the above
 files because they're compact and do a pretty good job. The various
 classification references perform differently with different sample
 types so your mileage may vary. It is generally easiest to decompress
diff --git a/_wiki/taxonomy_outline.md b/_wiki/taxonomy_outline.md
index a526dcc..28ea374 100644
--- a/_wiki/taxonomy_outline.md
+++ b/_wiki/taxonomy_outline.md
@@ -27,3 +27,6 @@ You can download our version of the..
 files](/wiki/Greengenes-formatted_databases): The fasta and
 taxonomic outline that greengenes uses with their classifier and can
 be used with the Bayesian classifier
+- [ greengenes2 reference
+ files](/wiki/greengenes2-formatted_databases): The fasta and
+ taxonomic outline that was modified by McDonald et al.