From adbab82ffd93fa705491f53a09433461f2750b1f Mon Sep 17 00:00:00 2001
From: John Huddleston
Date: Wed, 14 Aug 2024 14:25:23 -0700
Subject: [PATCH] Small text edit

---
 manuscript/cartography.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/manuscript/cartography.tex b/manuscript/cartography.tex
index c5bd65af..9b366b13 100644
--- a/manuscript/cartography.tex
+++ b/manuscript/cartography.tex
@@ -437,7 +437,7 @@ \subsection{SARS-CoV-2 clusters recapitulate broad genetic groups corresponding
 Unlike the Pango lineages in the early SARS-CoV-2 data, the lineages from the later data exhibited fewer pairwise genetic distances between samples in each lineage than samples in Nextstrain clades or any embedding cluster (Supplementary Fig.~S\ref{S_Fig_sarscov2_within_between_group_distances}).
 To understand whether t-SNE clusters could capture Pango-resolution genetic groups within a single Nextstrain clade, we evenly sampled approximately 2,000 sequences from a dominant Nextstrain clade with many Pango lineages, 21J (Delta), and identified clusters from a t-SNE embedding of those data.
-Within the 1,992 sequences from 21J (Delta), we found 38 Pango lineages after collapsing lineages with fewer than 10 sequences into their parent lineages.
+Within the 1,992 sequences sampled from 21J (Delta), we found 38 Pango lineages after collapsing lineages with fewer than 10 sequences into their parent lineages.
 We found 28 t-SNE clusters representing 1,806 sequences (91\%) with 186 sequences (9\%) assigned to the unclustered ``-1'' label (Supplementary Fig.~\ref{S_Fig_sarscov2_single_clade_embeddings_tsne_counts}).
 The VI distance between Pango lineages and all clusters (including the unclustered group) was 0.17 (Supplementary Table~\ref{S_Table_optimal_cluster_parameters}).
 This distance was consistent with the distance of 0.14 between Pango lineages and t-SNE clusters from both the full early and late SARS-CoV-2 datasets.