
Releases: roblanf/sarscov2phylo

9-11-20

24 Nov 23:11

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.
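
As a rough illustration of the second input, the sketch below checks that a previous-iteration directory holds the two files named above before the pipeline is started. The helper name check_previous_iteration, the example paths, and the assumption that both files sit directly inside that directory are hypothetical and are not taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the sarscov2phylo scripts).
# Assumes excluded_sequences.tsv and ft_SH.tree sit directly inside the
# directory passed as the previous iteration; the real layout may differ.
check_previous_iteration() {
    local prev="$1" missing=0 f
    for f in excluded_sequences.tsv ft_SH.tree; do
        if [ ! -f "$prev/$f" ]; then
            echo "missing: $prev/$f" >&2
            missing=1
        fi
    done
    # Returns 0 when both files are present, 1 otherwise.
    return "$missing"
}

# Example usage with placeholder paths:
# check_previous_iteration ./previous_release &&
#   bash global_tree_gisaid_start_tree.sh -i gisaid.fasta -p ./previous_release -t 250
```

Running the check first means a missing starting tree or exclusion list is reported up front rather than partway through a long run.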

Filtering statistics

sequences downloaded from GISAID
143253
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 140652
Alignment length:    29903
Total # residues:    4191538620
Smallest:            29105
Largest:             29903
Average length:      29800.8
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 140652
Alignment length:    29903
Total # residues:    4172839779
Smallest:            29036
Largest:             29675
Average length:      29667.8
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 140608
Alignment length:    29903
Total # residues:    4171537236
Smallest:            29036
Largest:             29675
Average length:      29667.9
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 140608
Alignment length:    29646
Total # residues:    4161263119
Smallest:            28337
Largest:             29646
Average length:      29594.8
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	243604
#leaves:	140559
#dichotomies:	97586
#leaf labels:	140559
#inner labels:	91526
Number of new sequences added this iteration
2313 alignment_names_new.txt
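
The counts in the log above can be tallied to see how many sequences are lost between stages. The sketch below only does arithmetic on numbers copied from this release's log. The variable names are illustrative, and the stage labels are a reading of the log headings, not a statement of what each script does internally.

```shell
#!/usr/bin/env bash
# Counts copied from the 9-11-20 filtering statistics.
downloaded=143253   # sequences downloaded from GISAID
aligned=140652      # sequences in the global alignment
filtered=140608     # after filtering out short/ambiguous sequences
leaves=140559       # leaves in the tree reported after TreeShrink

lost_before_alignment=$((downloaded - aligned))
lost_short_ambiguous=$((aligned - filtered))
lost_by_tree_stage=$((filtered - leaves))

# Alignment columns removed by trimming sites that are >50% gaps.
columns_trimmed=$((29903 - 29646))

echo "absent from global alignment:  $lost_before_alignment"
echo "removed as short/ambiguous:    $lost_short_ambiguous"
echo "removed by the TreeShrink step: $lost_by_tree_stage"
echo "alignment columns trimmed:     $columns_trimmed"

# Share of downloaded sequences that reach the final tree (awk for decimals).
awk -v a="$leaves" -v b="$downloaded" \
    'BEGIN { printf "retained in final tree: %.1f%%\n", 100 * a / b }'
```

For this release the differences are 2601, 44, and 49 sequences, and 257 alignment columns. The log does not say why the 2601 sequences are absent from the global alignment.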

Notable changes to the scripts in this release

  • None

Notable aspects of the trees

  • None

7-11-20

24 Nov 23:09

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.

Filtering statistics

sequences downloaded from GISAID
140940
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 138365
Alignment length:    29903
Total # residues:    4123335297
Smallest:            29105
Largest:             29903
Average length:      29800.4
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 138365
Alignment length:    29903
Total # residues:    4104977719
Smallest:            29036
Largest:             29675
Average length:      29667.7
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 138321
Alignment length:    29903
Total # residues:    4103675176
Smallest:            29036
Largest:             29675
Average length:      29667.8
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 138321
Alignment length:    29646
Total # residues:    4093552083
Smallest:            28337
Largest:             29646
Average length:      29594.6
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	239758
#leaves:	138295
#dichotomies:	96101
#leaf labels:	138295
#inner labels:	90149
Number of new sequences added this iteration
4069 alignment_names_new.txt

Notable changes to the scripts in this release

  • None

Notable aspects of the trees

  • None

5-11-20

24 Nov 23:05

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.

Filtering statistics

sequences downloaded from GISAID
136871
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 134414
Alignment length:    29903
Total # residues:    4005663633
Smallest:            29105
Largest:             29903
Average length:      29800.9
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 134414
Alignment length:    29903
Total # residues:    3987748459
Smallest:            29036
Largest:             29675
Average length:      29667.7
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 134370
Alignment length:    29903
Total # residues:    3986445916
Smallest:            29036
Largest:             29675
Average length:      29667.7
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 134370
Alignment length:    29646
Total # residues:    3976491642
Smallest:            28498
Largest:             29646
Average length:      29593.6
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	233128
#leaves:	134252
#dichotomies:	93693
#leaf labels:	134252
#inner labels:	87918
Number of new sequences added this iteration
5656 alignment_names_new.txt

Notable changes to the scripts in this release

  • None

Notable aspects of the trees

  • None

13-11-20

24 Nov 23:15

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.

Filtering statistics

sequences downloaded from GISAID
150054
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 147328
Alignment length:    29903
Total # residues:    4390461015
Smallest:            29105
Largest:             29903
Average length:      29800.6
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 147328
Alignment length:    29903
Total # residues:    4370913227
Smallest:            29036
Largest:             29675
Average length:      29667.9
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 147284
Alignment length:    29903
Total # residues:    4369610684
Smallest:            29036
Largest:             29675
Average length:      29667.9
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 147284
Alignment length:    29646
Total # residues:    4358914020
Smallest:            28337
Largest:             29646
Average length:      29595.3
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	255148
#leaves:	147156
#dichotomies:	102247
#leaf labels:	147156
#inner labels:	95805
Number of new sequences added this iteration
3502 alignment_names_new.txt

Notable changes to the scripts in this release

  • None

Notable aspects of the trees

  • None

11-11-20

24 Nov 23:13

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.

Filtering statistics

sequences downloaded from GISAID
146552
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 143902
Alignment length:    29903
Total # residues:    4288314661
Smallest:            29105
Largest:             29903
Average length:      29800.2
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 143902
Alignment length:    29903
Total # residues:    4269256702
Smallest:            29036
Largest:             29675
Average length:      29667.8
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 143858
Alignment length:    29903
Total # residues:    4267954159
Smallest:            29036
Largest:             29675
Average length:      29667.8
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 143858
Alignment length:    29646
Total # residues:    4257455014
Smallest:            28337
Largest:             29646
Average length:      29594.8
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	249182
#leaves:	143782
#dichotomies:	99803
#leaf labels:	143782
#inner labels:	93540
Number of new sequences added this iteration
3299 alignment_names_new.txt

Notable changes to the scripts in this release

  • None

Notable aspects of the trees

  • None

30-10-20

18 Nov 22:51

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.

Filtering statistics

sequences downloaded from GISAID
125559
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 123289
Alignment length:    29903
Total # residues:    3674259396
Smallest:            25059
Largest:             29903
Average length:      29802.0
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 123289
Alignment length:    29903
Total # residues:    3657631006
Smallest:            24962
Largest:             29675
Average length:      29667.1
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 123095
Alignment length:    29903
Total # residues:    3651899312
Smallest:            28961
Largest:             29675
Average length:      29667.3
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 123095
Alignment length:    29646
Total # residues:    3642447151
Smallest:            28437
Largest:             29646
Average length:      29590.5
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	214496
#leaves:	123048
#dichotomies:	86812
#leaf labels:	123048
#inner labels:	81578
Number of new sequences added this iteration
2820 alignment_names_new.txt

Notable changes to the scripts in this release

  • None

Notable aspects of the trees

  • None

3-11-20

18 Nov 23:15

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.

Filtering statistics

sequences downloaded from GISAID
131215
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 128821
Alignment length:    29903
Total # residues:    3839063069
Smallest:            29105
Largest:             29903
Average length:      29801.5
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 128821
Alignment length:    29903
Total # residues:    3821789644
Smallest:            29036
Largest:             29675
Average length:      29667.4
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 128777
Alignment length:    29903
Total # residues:    3820487101
Smallest:            29036
Largest:             29675
Average length:      29667.5
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 128777
Alignment length:    29646
Total # residues:    3810742331
Smallest:            28498
Largest:             29646
Average length:      29591.8
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	224247
#leaves:	128714
#dichotomies:	90656
#leaf labels:	128714
#inner labels:	85123
Number of new sequences added this iteration
6018 alignment_names_new.txt

Notable changes to the scripts in this release

  • None

Notable aspects of the trees

  • Some pangolin lineages show a notable lack of clustering. I have not yet had a chance to determine whether this is caused by my recently added QC code (quite possible), the tree, or something else.

1-11-20

18 Nov 23:11

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.

Filtering statistics

sequences downloaded from GISAID
125525
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 123213
Alignment length:    29903
Total # residues:    3672030124
Smallest:            29105
Largest:             29903
Average length:      29802.3
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 123213
Alignment length:    29903
Total # residues:    3655405481
Smallest:            29036
Largest:             29675
Average length:      29667.4
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 123169
Alignment length:    29903
Total # residues:    3654102938
Smallest:            29036
Largest:             29675
Average length:      29667.4
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 123169
Alignment length:    29646
Total # residues:    3644707382
Smallest:            28498
Largest:             29646
Average length:      29591.1
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	214501
#leaves:	123070
#dichotomies:	86781
#leaf labels:	123070
#inner labels:	81333
Number of new sequences added this iteration
493 alignment_names_new.txt

Notable changes to the scripts in this release

  • None

Notable aspects of the trees

  • None

28-10-20

11 Nov 02:46

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.

Filtering statistics

sequences downloaded from GISAID
122740
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 120561
Alignment length:    29903
Total # residues:    3592959392
Smallest:            25059
Largest:             29903
Average length:      29802.0
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 120561
Alignment length:    29903
Total # residues:    3576678015
Smallest:            24962
Largest:             29675
Average length:      29667.0
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 120367
Alignment length:    29903
Total # residues:    3570946321
Smallest:            28961
Largest:             29675
Average length:      29667.2
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 120367
Alignment length:    29646
Total # residues:    3561657772
Smallest:            28437
Largest:             29646
Average length:      29590.0
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	209776
#leaves:	120275
#dichotomies:	84953
#leaf labels:	120275
#inner labels:	79848
Number of new sequences added this iteration
5293 alignment_names_new.txt

Notable changes to the scripts in this release

  • None

Notable aspects of the trees

  • None

26-10-20

30 Oct 22:01

Citation and reuse

Please cite this release as:

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

You can visit that DOI here: https://doi.org/10.5281/zenodo.3958883

If you publish papers that use this tree you must still follow the GISAID data sharing and attribution rules.

Details

The trees in this release were generated with the following command line:

bash global_tree_gisaid_start_tree.sh -i [gisaid.fasta] -p [previous_iteration] -t 250

  • [gisaid.fasta] is the FASTA file of high-coverage, complete raw sequences from GISAID submitted up to and including the date in the release title, as determined by the 'submission date' filter on a GISAID data feed.

  • [previous_iteration] is the file path of the previous release. It provides the excluded_sequences.tsv and ft_SH.tree files that serve as the starting points for the current iteration.

Filtering statistics

sequences downloaded from GISAID
117452
//
alignment stats of global alignment
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 115343
Alignment length:    29903
Total # residues:    3437577737
Smallest:            25059
Largest:             29903
Average length:      29803.1
Average identity:    100%
//
alignment stats of global alignment after masking sites
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 115343
Alignment length:    29903
Total # residues:    3421874903
Smallest:            24962
Largest:             29675
Average length:      29666.9
Average identity:    100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 115149
Alignment length:    29903
Total # residues:    3416143209
Smallest:            28961
Largest:             29675
Average length:      29667.2
Average identity:    100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number:    1
Format:              aligned FASTA
Number of sequences: 115149
Alignment length:    29646
Total # residues:    3407086202
Smallest:            28437
Largest:             29646
Average length:      29588.5
Average identity:    100%
//
After filtering sequences with TreeShrink
Type:	Phylogram
#nodes:	201308
#leaves:	115078
#dichotomies:	81906
#leaf labels:	115078
#inner labels:	76993
Number of new sequences added this iteration
1534 alignment_names_new.txt

Notable changes to the scripts in this release

  • Scripts have changed to include some simple QC.

Notable aspects of the trees

  • None