-
Notifications
You must be signed in to change notification settings - Fork 42
/
Copy pathREADME.Rmd
504 lines (427 loc) · 23.5 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
---
title: crAssphage project
output:
github_document:
toc: true
toc_depth: 3
fig_height: 4
fig_width: 6
dev: jpeg
bibliography: crAssphage.bib
---
# Background
Supplementary data analysis for the paper:
_**Karkman, A., Pärnänen, K., Larsson, DGJ.**_ 2019. Fecal pollution explains antibiotic resistance gene abundances in anthropogenically impacted environments. _Nature Communications_ **10**:80. DOI: [10.1038/s41467-018-07992-3](https://doi.org/10.1038/s41467-018-07992-3)
## Introduction
Discharge of treated sewage introduces antibiotic resistance genes (ARGs) and resistant bacteria (ARBs) to the environment [@karkman_antibiotic-resistance_2018]. It has been speculated that the resistance determinants could be selected and/or disseminated to other bacteria in the downstream environments due to the antibiotcs and other selective agents released in the effluents [@rizzo_urban_2013; @guo_metagenomic_2017]. However, the increased abundance of ARGs in downstream environments could as well be explained by the amount of fecal pollution without any large scale selection and/or dissemination of the genes.
Recently a human feacal phage, crAssphage, was discovered from human fecal metagenomes [@dutilh_highly_2014] and was shown to infect _Bacteroides intestinalis_ [@shkoporov_crass001_2018]. It has been shown to be very abundant in human fecal material and mostly specific to humans [@garcia-aljaro_determination_2017]. Due to these facts, it has already been used as fecal marker in various studies [@stachler_correlation_2018;@ahmed_precipitation_2018;@stachler_metagenomic_2014;@stachler_quantitative_2017;@ahmed_evaluation_2018]. Another fecal Bacteroides phage, ɸB124-14, has also been used for microbial source tracking. Unlike crAssphage, it should be abundant in porcine and bovine guts as well [@ogilvie_resolution_2017]. However, we did not find it to perform as well as crAssphage and it won't be included in the analyses.
In this study we show that in most of the the studied environments the abundance of ARGs, *intI1* integrase gene and mobile genetic elements correlates well with fecal pollution levels with no evident signs of selection or dissemination of the resistance genes. The only exception being sediments polluted with wastewater from drug manufacturing containing exceptionally high levels of antibiotics [@bengtsson-palme_shotgun_2014;@kristiansson_pyrosequencing_2011].
**This supplementary data analysis file is meant to document the analyses. For more detailed information about the background, methods, results and conclusions we of course recommend the original article.**
# Bioinformatics
The bioinformartics part shows only example commands and is not meant to be run as such. The results from the bioinformatics part are available in the `data` folder and will be used in the data analysis part with R.
All metagenomic samples were downloaded from public repositories as described in the methods.
The crAssphage ([NC_024711.1](https://www.ncbi.nlm.nih.gov/nuccore/NC_024711.1)) and ɸB124-14 ([HE608841.1](https://www.ncbi.nlm.nih.gov/nuccore/HE608841.1)) genomes were downloaded from
GenBank as fasta files.
## Phage genomes and Bowtie index
The phage genomes were indexed using `bowtie2-build`
```{bash, eval=FALSE}
bowtie2-build phage_genome.fasta phage_genome
```
## Mapping reads against phage genomes and calculating genome coverage
After indexing the phage genomes each sample was mapped against the genomes. The average genome coverage was used as a proxy for phage abundance.
__Paired-end reads:__
```{bash, eval=FALSE}
bowtie2 -x phage_genome -1 SampleX_R1_reads.fastq.gz -2 SampleX_R2_reads.fastq.gz -S SampleX.sam
samtools view -Sb -f 2 SampleX.sam > SampleX.bam
samtools sort SampleX.bam -o SampleX_sort.bam
samtools index SampleX_sort.bam
export GEN_COV=$(samtools depth -a SampleX_sort.bam |\
awk '{ sum += $3; n++ } END { if (n > 0) print sum / n; }')
echo 'SampleX\t'$GEN_COV
```
__Single reads:__
```{bash, eval=FALSE}
bowtie2 -x phage_genome -U SampleY.fastq -S SampleY.sam
samtools view -Sb -q 10 SampleY.sam > SampleY.bam
samtools sort SampleY.bam -o SampleY_sort.bam
samtools index SampleY_sort.bam
export GEN_COV=$(samtools depth -a SampleY_sort.bam |\
awk '{ sum += $3; n++ } END { if (n > 0) print sum / n; }')
echo 'SampleY\t'$GEN_COV
```
## ARG and MGE abundance
The `fastq` files were converted to `fasta`. The sample name was added to each sequence header before concatenating all fasta files to one file for annotation with DIAMOND against the [ARG](https://bitbucket.org/genomicepidemiology/resfinder_db) and [MGE](https://github.com/KatariinaParnanen/MobileGeneticElementDatabase) databases.
The count tables are generated from the DIAMOND outputs using custom scripts. The scripts for single (`parse_diamond.py`) and paired-end reads (`parse_diamondPE.py`) can be downloaded from my other [Github repository ](https://github.com/karkman/parse_diamond).
__Paired-end reads:__
```{bash, eval=FALSE}
diamond blastx -d ResFinder -q SampleX_R1.fasta --max-target-seqs 1 -o SampleX_R1_res.txt \
-f 6 --id 90 --min-orf 20 -p 24 --masking 0
diamond blastx -d ResFinder -q SampleX_R2.fasta --max-target-seqs 1 -o SampleX_R2_res.txt \
-f 6 --id 90 --min-orf 20 -p 24 --masking 0
python parse_diamondPE.py -1 SampleX_R1_res.txt -2 SampleX_R2_res.txt -o SampleX_ResFinder.csv
```
__Single reads:__
```{bash, eval=FALSE}
diamond blastx -d ResFinder -q SampleY.fasta --max-target-seqs 1 -o SampleY_res.txt \
-f 6 --id 90 --min-orf 20 -p 24 --masking 0
python parse_diamondPE.py -i SampleY_res.txt -o SampleY_ResFinder.csv
```
To bp count in each metagenome was used for normalization.
# Data analysis and statistics in R
The results from mapping against crAssphage and the gene annotations were imported to R and combined in data frames. Resulting data frames for each part of the study can be found from the `data`folder.
## Load in the data and libraries needed in the analyses
Packages `tidyverse`, `vegan`, `grid` and `gridExtra` are needed for the analyses
(they can be installed with `install.packages` function).
In here the results are read to R and the colors used in the figures are defined.
```{r, message=FALSE}
library(tidyverse)
library(vegan)
library(grid)
library(gridExtra)
cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
HMP <- read.table("data/HMP.txt")
crass_impact <- read.table("data/crass_impact.txt")
MG_RAST <- read.table("data/MG-RAST.txt")
crass_wwtp <- read.table("data/crass_wwtp.txt")
res_risk <- read.table("data/res_risk.txt")
```
## Figure 1 - crAssphage and ARG dynamics in human feacal metagenomes
The first figure shows the correlation between crAssphage and ARGs and _intI1_ integrase gene in human fecal metagenomes in different populations.
```{r, warning=FALSE, fig.height=6, fig.width=12}
par(fig=c(0,0.45,0,0.8), new=TRUE)
plot(log10(rel_res)~log10(rel_crAss), data=HMP, bg=cols[as.factor(HMP$country)], pch=21,
ylab = "Normalized ARG abundance (log10)",
xlab="Normalized crAssphage abundance (log10)", cex=2, ylim=c(2.5, 4.5))
par(fig=c(0,0.45,0.5,1), new=TRUE)
boxplot(log10(rel_crAss)~country, data=HMP, horizontal=TRUE, col=cols, axes=F)
axis(2, at=1:3, labels=c("China", "Europe", "US"), las=1)
title("A", adj = 0, line = 0)
par(fig=c(0.45,0.9,0,0.8), new=TRUE)
tmp <- subset(HMP, rel_int>0)
plot(log10(rel_res)~log10(rel_int), data=tmp, bg=cols[as.factor(tmp$country)], pch=21,
ylab = "", xlab="Normalized intI1 abundance (log10)", cex=2, ylim=c(2.5, 4.5))
par(fig=c(0.45,0.9,0.5,1), new=TRUE)
boxplot(log10(rel_int)~country, data=tmp, horizontal=TRUE, col=cols, axes=F)
axis(2, at=1:3, labels=c("China", "Europe", "US"), las=1)
title("B", adj = 0, line = 0)
par(fig=c(0.8,1,0,0.8),new=TRUE)
boxplot(log10(rel_res)~country, data=HMP, col=cols, axes=F)
axis(1, at=1:3, labels=c("China", "Europe", "US"), las=3)
```
**Figure 1.** Abundance of antibiotic resistance genes, intI1 gene and crAssphage in human fecal metagenomes.
The regression model between ARGs and crAssphage shows no significant correlation.
```{r}
crass_mod <- lm(log10(rel_res)~country+log10(rel_crAss), data=HMP)
summary(crass_mod)
```
However, for ARGs and *intI1* integrase gene the correlation was significant.
```{r}
int_mod <- lm(log10(rel_res)~country+log10(rel_int), data=HMP)
summary(int_mod)
```
## Figure 2 - Industrially polluted sediment is a hotspot for ARG selection
Figure 2 shows the correlation between crAssphage and ARGs in impacted environments. The only environenmt where selection possibly is happening are the heavily polluted Indian sediments having therapeutic concemntrations of antibioitcs.
```{r}
ggplot(crass_impact, aes(x=rel_crAss, y=rel_res, color=country)) +
geom_smooth(method="lm") +
geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() +
scale_y_log10() +
theme_classic() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
color="Study", shape="crAssphage detection") + scale_colour_manual(values=cols)
```
**Figure 2.** Correlation between ARG abundance and crAssphage abundance in environments with
642 pollution from WWTPs, hospitals or drug manufacturing.
The regression model for ARGs and crAssphage with different intercepts for different studies.
```{r}
impact_mod <- lm(log10(rel_res)~country+log10(rel_crAss), data=crass_impact)
summary(impact_mod)
```
## Figure 3 - ARG abundance is largely explained by fecal matter, not selection
In figure 3 the link between fecal pollution and ARG abundance was studied in MG-RAST metagenomes. Only part of the metagenomes were from human impacted environemnts, so all samples where crAssphage was not detected can be removed. Also samples that were the only representative from the environemnt where they came where removed.
The annotations in the resulting samples were manually curated ands revised.
```{r}
MG_RAST_crass <- subset(MG_RAST, crAss!="NA")
MG_RAST_crass <- MG_RAST_crass[MG_RAST_crass$feature %in%
levels(MG_RAST_crass$feature)[table(MG_RAST_crass$feature)>2],]
MG_RAST_crass$revised_source <- "WWTP"
MG_RAST_crass[MG_RAST_crass$project_id=="mgp9679",]$revised_source <- "High ammonia AS"
MG_RAST_crass[MG_RAST_crass$project_id=="mgp9798",]$revised_source <- "High ammonia AS"
MG_RAST_crass[MG_RAST_crass$project_id=="mgp6153",]$revised_source <- "Mouse gut"
MG_RAST_crass[MG_RAST_crass$project_id=="mgp6698",]$revised_source <- "Mouse gut"
MG_RAST_crass[MG_RAST_crass$project_id=="mgp3907",]$revised_source <- "Mouse gut"
MG_RAST_crass[MG_RAST_crass$project_id=="mgp3190",]$revised_source <- "River water"
MG_RAST_crass[MG_RAST_crass$project_id=="mgp3756",]$revised_source <- "Beijing air"
ggplot(MG_RAST_crass, aes(x=rel_crAss, y=rel_res, color=revised_source)) +
geom_point(size=5) +
scale_x_log10() +
scale_y_log10() +
geom_smooth(method="lm") +
theme_classic() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
color = "Revised source") + scale_colour_manual(values=cols)
```
**Figure 3.** The correlation between crAssphage abundance and total ARG abundance in MG-RAST
651 metagenomes where crAssphage was detected.
## Predicting antibiotic resistance gene abundance with crAssphage
The results from the impacted environemnts were used to build a regression model and that model was used to predict the ARG abundance using the crAssphage abundance. For the figure, see under [Supplementary figures.](#supplementary-figures)
```{r}
crass_df <- data.frame(crass=log10(crass_impact$rel_crAss), res=log10(crass_impact$rel_res))
crass_mod <- lm(res~crass, data=crass_df)
pos_crass <- subset(MG_RAST_crass, revised_source=="River water" |
revised_source == "Beijing air" | revised_source == "WWTP" )
pos_crass <- data.frame(crass=log10(pos_crass$rel_crAss), res=log10(pos_crass$rel_res),
sample=row.names(pos_crass), revised_source=pos_crass$revised_source)
pos_crass$predicted <- predict(crass_mod, pos_crass)
pred_mod <- (lm(res~predicted, data=pos_crass))
summary(pred_mod)
```
## Figure 4 - Antibiotic resistance gene dynamics in waste water treatment plants
```{r}
ggplot(crass_wwtp, aes(rel_crAss, rel_res, color=country_wwtp)) +
geom_smooth(method="lm") +
geom_point(size=5) +
scale_x_log10() +
scale_y_log10() +
theme_classic() +
scale_colour_manual(values=cols) +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
color="Country:WWTP")
```
**Figure 4.** ARG and crAssphage abundance in two US and three Swedish waste water treatment
658 plants showing similar correlation with different base level of resistance.
The correlation was significant and similar in both countries. Only the intercept differed.
```{r}
wwtp_mod <- lm(log10(rel_res)~country+log10(rel_crAss), data=crass_wwtp)
summary(wwtp_mod)
```
## Estimated resistance risk correlates with fecal pollution
The resistance risk values were taken from the original publication (see main article) and ARG and crAssphage abundances were measured in this publication as described earlier.
The results show that due to the one outlier (hospital effluent) with higher ARG abundance than would be estimated from the crAssphage abundance the regression is not significant.
The regression model between the ARG abundance and resistance risk is significant revealing the main driver behind the resistasnce risk calculations. With the resistance risk approach the hospital effluent, a possible hotspot for ARGs, would not have been spotted. For the figures, see under [Supplementary figures.](#supplementary-figures)
```{r}
risk_mod_crAss <- lm(ResRisk~log10(rel_crAss), data=res_risk)
summary(risk_mod_crAss)
risk_mod_res <- lm(ResRisk~log10(rel_res), data=res_risk)
summary(risk_mod_res)
```
## Supplementary figures
The supplementary figures are not described in detail. Only the data and codes are provided below.
A function for shared legend in multi panel plots copied from [here](https://github.com/tidyverse/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs).
```{r}
grid_arrange_shared_legend <- function(..., ncol = length(list(...)), nrow = 1,
position = c("bottom", "right")) {
require(gridExtra)
require(grid)
plots <- list(...)
position <- match.arg(position)
g <- ggplotGrob(plots[[1]] + theme(legend.position = position))$grobs
legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]]
lheight <- sum(legend$height)
lwidth <- sum(legend$width)
gl <- lapply(plots, function(x) x + theme(legend.position="none"))
gl <- c(gl, ncol = ncol, nrow = nrow)
combined <- switch(position,
"bottom" = arrangeGrob(do.call(arrangeGrob, gl),
legend,
ncol = 1,
heights = unit.c(unit(1, "npc") - lheight, lheight)),
"right" = arrangeGrob(do.call(arrangeGrob, gl),
legend,
ncol = 2,
widths = unit.c(unit(1, "npc") - lwidth, lwidth)))
grid.newpage()
grid.draw(combined)
# return gtable invisibly
invisible(combined)
}
```
### Supplementary Figure 1
The ARG categories are based on the ResFinder annotations.
```{r, fig.width=12, fig.height=10}
crass_categ <- read.table("data/crAss_categ.txt")
# Tetracycline
p1 <- ggplot(crass_categ, aes(rel_crAss, rel_tet, color=country)) +
geom_smooth(method="lm") +
geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
title="Tetracycline", shape="crAssphage detection") +
theme_classic()
# Aminoglycoside
p2 <- ggplot(crass_categ, aes(rel_crAss, rel_amino, color=country)) +
geom_smooth(method="lm") +
geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
title="Aminoglycoside", shape="crAssphage detection") +
theme_classic()
# MLSB
p3 <- ggplot(crass_categ, aes(rel_crAss, rel_mls, color=country)) +
geom_smooth(method="lm") + geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
title="MLSB", shape="crAssphage detection") +
theme_classic()
# Beta_lactam
p4 <- ggplot(crass_categ, aes(rel_crAss, rel_beta, color=country)) +
geom_smooth(method="lm") + geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
title="Beta-lactam", shape="crAssphage detection") +
theme_classic()
# Trimethoprim
p5 <- ggplot(crass_categ, aes(rel_crAss, rel_tri, color=country)) +
geom_smooth(method="lm") + geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
title="Trimethoprim", shape="crAssphage detection") +
theme_classic()
# Sulphonamide
p6 <- ggplot(crass_categ, aes(rel_crAss, rel_sul, color=country)) +
geom_smooth(method="lm") + geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
title="Sulphonamide", shape="crAssphage detection") +
theme_classic()
# Vancomycin
p7 <- ggplot(crass_categ, aes(rel_crAss, rel_van, color=country)) +
geom_smooth(method="lm") + geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
title="Vancomycin", shape="crAssphage detection") +
theme_classic()
# Chloramphenicol
p8 <- ggplot(crass_categ, aes(rel_crAss, rel_clo, color=country)) +
geom_smooth(method="lm") + geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
title="Chloramphenicol", shape="crAssphage detection") +
theme_classic()
# Quinolone
p9 <- ggplot(crass_categ, aes(rel_crAss, rel_qui, color=country)) +
geom_smooth(method="lm") + geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
title="Quinolone", shape="crAssphage detection") +
theme_classic()
grid_arrange_shared_legend(p1, p2, p3, p4, p5, p6, p7, p8, p9, ncol=3, nrow=3)
```
### Supplementary Figure 2
```{r, fig.width=12, fig.height=8, warning=FALSE}
p1 <- ggplot(crass_impact, aes(x=rel_crAss, y=rel_res, color=country)) +
geom_smooth(method="lm") +
geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() +
scale_y_log10() +
theme_classic() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
color="Study", shape="crAssphage detection")
p2 <- ggplot(crass_impact, aes(x=rel_crAss, y=rel_rich, color=country)) +
geom_smooth(method="lm") +
geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() +
scale_y_log10() +
theme_classic() +
labs(y = "Normalized ARG richness", x="Normalized crAssphage abundance",
color="Study", shape="crAssphage detection")
p3 <- ggplot(crass_impact, aes(x=rel_crAss, y=rel_mge, color=country)) +
geom_smooth(method="lm") +
geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() +
scale_y_log10() +
theme_classic() +
labs(y = "Normalized MGE abundance", x="Normalized crAssphage abundance",
color="Study", shape="crAssphage detection")
p4 <- ggplot(crass_impact, aes(x=rel_crAss, y=rel_int, color=country)) +
geom_smooth(method="lm") +
geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() +
scale_y_log10() +
theme_classic() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
color="Study", shape="crAssphage detection")
grid_arrange_shared_legend(p1, p2, p3, p4, ncol=2, nrow=2)
```
### Supplementary Figure 3
The most abundant ARGs were removed due to possible bias introduced by the whole genome amplification used in the study.
Gene | ResFinder accessions
----- | --------------------
*sul2* |sul2_1_AF542061, sul2_2_GQ421466, sul2_12_AF497970
*strA* | strA_1_M96392,
*strB* | strB_1_M96392,
*qnrD* | QnrD_1_FJ228229,
*msr(E)* | msr(E)_4_EU294228
*qnrVC* | QnrVC4_1_GQ891757
```{r}
crass_reduced <- read.table("data/crass_reduced.txt")
ggplot(crass_reduced, aes(x=rel_crAss, y=rel_res, color=country)) +
geom_smooth(method="lm") +
geom_point(aes(shape=crAss_detection), size=5) +
scale_x_log10() +
scale_y_log10() +
theme_classic() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance",
color="Study", shape="crAssphage detection")
```
### Supplementary Figure 4
```{r, fig.width=10}
MG_RAST_NocrAss <- MG_RAST[is.na(MG_RAST$crAss),]
MG_RAST_NocrAss <- subset(MG_RAST_NocrAss, feature %in%
names(table(MG_RAST_NocrAss$feature)[table(MG_RAST_NocrAss$feature)>2]))
ggplot(MG_RAST_NocrAss, aes(x=feature, y=rel_res)) +
geom_boxplot() +
theme_classic() +
theme(axis.text.x=element_text(angle=45, hjust=1, size=10)) +
labs(y = "Normalized ARG abundance", x="")
```
### Supplementary Figure 5
```{r}
ggplot(pos_crass, aes(x=predicted, y=res, color=pos_crass$revised_source)) +
geom_point(size=5) + theme_classic() +
labs(x="Predicted ARG abundance", y="Measured ARG abundance", color="Revised source")
```
### Supplementary Figure 6
More detailed annotations for the WWTPs taken from the original publications. Note the slightly different samples annotations between US and SWE due to differences in the processes. Also the SWE WWTP sample annotations have been modified.
Original annotation | Modified annotation
-------------------- | ------------------
Primary/Surplus | Primary sludge
Primary | Primary sludge
Surplus | Sludge
Digested | Digested sludge
Inlet - ... | Raw Sewage
Treated | Treated sewage
Sand-filtered | Sand-filtered treated sewage
```{r, fig.width=12}
us_wwtp <- read.table("data/us_wwtp.txt")
swe_wwtp <- read.table("data/swe_wwtp.txt")
p1 <- ggplot(us_wwtp, aes(rel_crAss, rel_res, shape=geo_loc_name)) +
geom_smooth(method="lm") +
geom_point(aes(color=Sample_loc), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance") +
theme_classic() +
guides(shape=guide_legend(title="WWTP"), color=guide_legend(title="Sample"))
p2 <- ggplot(swe_wwtp, aes(rel_crAss, rel_res, shape=WWTP)) +
geom_smooth(method="lm") +
geom_point(aes(color=Sample), size=5) +
scale_x_log10() + scale_y_log10() +
labs(y = "Normalized ARG abundance", x="Normalized crAssphage abundance") +
theme_classic() +
guides(shape=guide_legend(title="WWTP"), color=guide_legend(title="Sample"))
grid.arrange(p1, p2, ncol=2)
```
### Supplementary Figure 7
```{r, fig.width=8, warning=FALSE}
p1 <- ggplot(res_risk,aes(y=ResRisk,x=log10(rel_res),color=Environment)) +
geom_point(size=5) +
labs(y = "Resistance risk", x="Normalized ARG abundance") +
theme_classic()
p2 <- ggplot(res_risk,aes(y=ResRisk,x=log10(rel_crAss),color=Environment)) +
geom_point(size=5) +
labs(y = "Resistance risk", x="Normalized crAssphage abundance") +
theme_classic()
grid_arrange_shared_legend(p1,p2, ncol=2)
```
# References