Commit

Deduplication notebook
willbradshaw committed Nov 8, 2023
1 parent 2fcf383 commit 67ac7a5
Showing 12 changed files with 1,263 additions and 7 deletions.
@@ -0,0 +1,20 @@
sample,read_id_full,read_id_short,assignment_old,assignment_new,blast_hit_nt_viral,blast_hit_nt,blast_best_match
D23-13405-1,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:1:10102:0403:3142:TCTAGGCT+NNNNNNNN",1:10102:0403:3142,Human mastadenovirus B (taxid 108098),Unclassified (hits to taxid 108098),FALSE,FALSE,None
D23-13405-1,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:1:10603:0328:0567:TCTAGGCT+NNNNNNNN",1:10603:0328:0567,Macacine alphaherpesvirus 1 (taxid 10325),Unclassified (hits to taxid 10325),FALSE,FALSE,Bacterial (Stenotrophomonas)
D23-13405-1,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:1:21304:0821:2638:TCTAGGCT+NNNNNNNN",1:21304:0821:2638,Yaba monkey tumor virus (taxid 38804),Unclassified (hits to taxid 38804),FALSE,FALSE,Bacterial (Flavobacterium) / Viral (Gordonia phage Nedarya)
D23-13405-1,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:1:10204:5122:0531:TCTAGGCT+NNNNNNNN",1:10204:5122:0531,Avian paramyxovirus goose/Shimane/67/2000 (taxid 1401445),Unclassified (no viral hits),FALSE,FALSE,Bacterial (Acidovorax temperans strain LMJ)
D23-13405-1,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:1:20901:3396:1256:TCTAGGCT+NNNNNNNN",1:20901:3396:1256,Hepatitis C virus genotype 2 (taxid 40271),Unclassified (no viral hits),FALSE,FALSE,Bacterial (Flavobacterium)
D23-13405-1,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:1:11503:3203:1707:TCTAGGCT+NNNNNNNN",1:11503:3203:1707,Molluscum contagiosum virus subtype 1 (taxid 10280),Unclassified (read 2 failed quality filtering),FALSE,FALSE,Bacterial (Microbacterium)
D23-13406-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:20202:3004:2385:CCGCTATT+NNNNNNNN",2:20202:3004:2385,Ekpoma virus 1 (taxid 1987020),Unclassified (hits to taxid 1987020),FALSE,FALSE,Bacterial (Cloacibacterium)
D23-13406-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:10304:4444:3122:CCGCTATT+NNNNNNNN",2:10304:4444:3122,Ekpoma virus 1 (taxid 1987020),Unclassified (hits to taxid 1987020),FALSE,FALSE,Bacterial (Cloacibacterium)
D23-13406-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:11403:3457:2144:CCGCTATT+NNNNNNNN",2:11403:3457:2144,Human metapneumovirus (taxid 162145),Unclassified (hits to taxid 162145),FALSE,FALSE,Eukaryotic (Ptychoptera)
D23-13406-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:21902:3424:3184:CCGCTATT+NNNNNNNN",2:21902:3424:3184,Human betaherpesvirus 6B (taxid 32604),Unclassified (hits to taxid 32604),FALSE,FALSE,Bacterial (Quatrionicoccus)
D23-13406-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:11903:2428:0754:CCGCTATT+NNNNNNNN",2:11903:2428:0754,Human alphaherpesvirus 1 (taxid 10298),Unclassified (hits to taxid 10298),FALSE,FALSE,"Bacterial ( Verrucomicrobia)"
D23-13406-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:21904:3435:0179:CCGCTATT+NNNNNNNN",2:21904:3435:0179,Ekpoma virus 1 (taxid 1987020),Unclassified (hits to taxid 1987020),FALSE,FALSE,Bacterial (Cloacibacterium)
D23-13406-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:20702:5106:2394:CCGCTATT+NNNNNNNN",2:20702:5106:2394,Human papillomavirus KC5 (taxid 1647924),Unclassified (no viral hits),FALSE,FALSE,Bacterial (Pulveribacter) / Eukaryotic (Darwinula)
D23-13406-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:21102:4135:0742:CCGCTAAT+NNNNNNNN",2:21102:4135:0742,Influenza A virus (A/New York/392/2004(H3N2)) (taxid 335341),Unclassified (excluded during ribodepletion),TRUE,TRUE,Bacterial (Xiphinematobacter) / Viral (Influenza A)
D23-13405-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:10502:3202:1468:TCTAGGCT+NNNNNNNN",2:10502:3202:1468,Human mastadenovirus E (taxid 130308),Unclassified (hits to taxid 130308),FALSE,FALSE,Viral (Caudoviricetes phage)
D23-13405-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:12203:5006:1380:TCTAGGCT+NNNNNNNN",2:12203:5006:1380,Senegalvirus marseillevirus (taxid 944645),Unclassified (hits to taxid 944645),FALSE,FALSE,Eukaryotic (Vigna unguiculata)
D23-13405-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:12102:3184:2302:TCTAGGCT+NNNNNNNN",2:12102:3184:2302,Human herpesvirus 4 type 2 (taxid 12509),Unclassified (hits to taxid 12509),FALSE,FALSE,None
D23-13405-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:21902:2639:1967:TCTAGGCT+NNNNNNNN",2:21902:2639:1967,Sandfly fever Turkey virus (taxid 688699),Unclassified (hits to taxid 688699),TRUE,TRUE,Viral (Sandfly Sicilian Turkey virus)
D23-13405-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:21603:4385:2524:TCTAGGCT+NNNNNNNN",2:21603:4385:2524,Orf virus (taxid 10258),Unclassified (read 2 failed quality filtering),FALSE,FALSE,Bacterial (Deinococcus)
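In the table above, read_id_short appears to be the four colon-separated fields (presumably lane, tile, x and y) that precede the index sequence at the end of read_id_full. The Python sketch below derives it that way and checks the result against two rows copied from the table. The field interpretation and the helper name short_read_id are assumptions for illustration, not taken from the notebook.

```python
import csv
import io


def short_read_id(read_id_full: str) -> str:
    """Return the four colon-separated fields that precede the trailing index sequence.

    Assumes read_id_full ends with ':<lane>:<tile>:<x>:<y>:<index sequence>',
    as in the rows above (hypothetical helper, not from the notebook).
    """
    parts = read_id_full.split(":")
    return ":".join(parts[-5:-1])


# Two rows copied from the table above, trimmed to the first three columns.
# The full IDs contain commas, so they are quoted, as in the original file.
SAMPLE = """sample,read_id_full,read_id_short
D23-13405-1,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:1:10102:0403:3142:TCTAGGCT+NNNNNNNN",1:10102:0403:3142
D23-13406-2,"AV224802:231013_72603_6408E:2320572603,810-00002,2307310022,20240730:2:20202:3004:2385:CCGCTATT+NNNNNNNN",2:20202:3004:2385
"""

for row in csv.DictReader(io.StringIO(SAMPLE)):
    derived = short_read_id(row["read_id_full"])
    # Print the sample, the derived short ID, and whether it matches the stored column.
    print(row["sample"], derived, derived == row["read_id_short"])
```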
97 changes: 97 additions & 0 deletions data/2023-11-06_pr-dedup/n_dup.csv
@@ -0,0 +1,97 @@
sample,replicate,read_pairs_in,n_reads_out,read_pairs_out,n_dup,p_dup,unpair_repair
D23-13404,1,10000,19994,9997,3,0.0003,TRUE
D23-13404,2,10000,19992,9996,4,0.0004,TRUE
D23-13404,3,10000,19996,9998,2,0.0002,TRUE
D23-13405,1,10000,20000,10000,0,0,TRUE
D23-13405,2,10000,19998,9999,1,0.0001,TRUE
D23-13405,3,10000,19990,9995,5,0.0005,TRUE
D23-13406,1,10000,19996,9998,2,0.0002,TRUE
D23-13406,2,10000,19998,9999,1,0.0001,TRUE
D23-13406,3,10000,20000,10000,0,0,TRUE
D23-13404,1,100000,199486,99743,257,0.00257,TRUE
D23-13404,2,100000,199500,99750,250,0.0025,TRUE
D23-13404,3,100000,199598,99799,201,0.00201,TRUE
D23-13405,1,100000,199512,99756,244,0.00244,TRUE
D23-13405,2,100000,199568,99784,216,0.00216,TRUE
D23-13405,3,100000,199524,99762,238,0.00238,TRUE
D23-13406,1,100000,199840,99920,80,0.0008,TRUE
D23-13406,2,100000,199806,99903,97,0.00097,TRUE
D23-13406,3,100000,199808,99904,96,0.00096,TRUE
D23-13404,1,1000000,1970336,985168,14832,0.014832,TRUE
D23-13404,2,1000000,1970432,985216,14784,0.014784,TRUE
D23-13404,3,1000000,1969858,984929,15071,0.015071,TRUE
D23-13405,1,1000000,1970370,985185,14815,0.014815,TRUE
D23-13405,2,1000000,1969994,984997,15003,0.015003,TRUE
D23-13405,3,1000000,1970210,985105,14895,0.014895,TRUE
D23-13406,1,1000000,1988396,994198,5802,0.005802,TRUE
D23-13406,2,1000000,1988482,994241,5759,0.005759,TRUE
D23-13406,3,1000000,1987738,993869,6131,0.006131,TRUE
D23-13404,1,10000000,17731492,8865746,1134254,0.1134254,TRUE
D23-13404,2,10000000,17729818,8864909,1135091,0.1135091,TRUE
D23-13404,3,10000000,17732345,8866172.5,1133827.5,0.11338275,TRUE
D23-13405,1,10000000,17728293,8864146.5,1135853.5,0.11358535,TRUE
D23-13405,2,10000000,17726169,8863084.5,1136915.5,0.11369155,TRUE
D23-13405,3,10000000,17726870,8863435,1136565,0.1136565,TRUE
D23-13406,1,10000000,18907178,9453589,546411,0.0546411,TRUE
D23-13406,2,10000000,18908203,9454101.5,545898.5,0.05458985,TRUE
D23-13406,3,10000000,18909204,9454602,545398,0.0545398,TRUE
D23-13404,1,100000000,140097589,70048794.5,29951205.5,0.299512055,TRUE
D23-13404,2,100000000,140098194,70049097,29950903,0.29950903,TRUE
D23-13404,3,100000000,140115115,70057557.5,29942442.5,0.299424425,TRUE
D23-13405,1,100000000,139910932,69955466,30044534,0.30044534,TRUE
D23-13405,2,100000000,139927287,69963643.5,30036356.5,0.300363565,TRUE
D23-13405,3,100000000,139919577,69959788.5,30040211.5,0.300402115,TRUE
D23-13406,1,100000000,169354120,84677060,15322940,0.1532294,TRUE
D23-13406,2,100000000,169362691,84681345.5,15318654.5,0.153186545,TRUE
D23-13406,3,100000000,169362791,84681395.5,15318604.5,0.153186045,TRUE
D23-13404,1,229823475,279881653,139940826.5,89882648.5,0.391094289,TRUE
D23-13405,1,196470009,246083898,123041949,73428060,0.373736737,TRUE
D23-13406,1,118938720,198682176,99341088,19597632,0.164770833,TRUE
D23-13404,1,10000,19996,9998,2,0.0002,FALSE
D23-13404,2,10000,19994,9997,3,0.0003,FALSE
D23-13404,3,10000,20000,10000,0,0,FALSE
D23-13405,1,10000,20000,10000,0,0,FALSE
D23-13405,2,10000,20000,10000,0,0,FALSE
D23-13405,3,10000,19994,9997,3,0.0003,FALSE
D23-13406,1,10000,20000,10000,0,0,FALSE
D23-13406,2,10000,19998,9999,1,0.0001,FALSE
D23-13406,3,10000,20000,10000,0,0,FALSE
D23-13404,1,100000,199716,99858,142,0.00142,FALSE
D23-13404,2,100000,199748,99874,126,0.00126,FALSE
D23-13404,3,100000,199792,99896,104,0.00104,FALSE
D23-13405,1,100000,199736,99868,132,0.00132,FALSE
D23-13405,2,100000,199764,99882,118,0.00118,FALSE
D23-13405,3,100000,199786,99893,107,0.00107,FALSE
D23-13406,1,100000,199910,99955,45,0.00045,FALSE
D23-13406,2,100000,199896,99948,52,0.00052,FALSE
D23-13406,3,100000,199908,99954,46,0.00046,FALSE
D23-13404,1,1000000,1982716,991358,8642,0.008642,FALSE
D23-13404,2,1000000,1982784,991392,8608,0.008608,FALSE
D23-13404,3,1000000,1982104,991052,8948,0.008948,FALSE
D23-13405,1,1000000,1982832,991416,8584,0.008584,FALSE
D23-13405,2,1000000,1982594,991297,8703,0.008703,FALSE
D23-13405,3,1000000,1982574,991287,8713,0.008713,FALSE
D23-13406,1,1000000,1993000,996500,3500,0.0035,FALSE
D23-13406,2,1000000,1993062,996531,3469,0.003469,FALSE
D23-13406,3,1000000,1992524,996262,3738,0.003738,FALSE
D23-13404,1,10000000,19050540,9525270,474730,0.047473,FALSE
D23-13404,2,10000000,19051530,9525765,474235,0.0474235,FALSE
D23-13404,3,10000000,19053706,9526853,473147,0.0473147,FALSE
D23-13405,1,10000000,19044532,9522266,477734,0.0477734,FALSE
D23-13405,2,10000000,19045302,9522651,477349,0.0477349,FALSE
D23-13405,3,10000000,19045442,9522721,477279,0.0477279,FALSE
D23-13406,1,10000000,19613738,9806869,193131,0.0193131,FALSE
D23-13406,2,10000000,19614816,9807408,192592,0.0192592,FALSE
D23-13406,3,10000000,19615128,9807564,192436,0.0192436,FALSE
D23-13404,1,100000000,166457946,83228973,16771027,0.16771027,FALSE
D23-13404,2,100000000,166457242,83228621,16771379,0.16771379,FALSE
D23-13404,3,100000000,166480106,83240053,16759947,0.16759947,FALSE
D23-13405,1,100000000,166191238,83095619,16904381,0.16904381,FALSE
D23-13405,2,100000000,166206048,83103024,16896976,0.16896976,FALSE
D23-13405,3,100000000,166199686,83099843,16900157,0.16900157,FALSE
D23-13406,1,100000000,184420200,92210100,7789900,0.077899,FALSE
D23-13406,2,100000000,184422484,92211242,7788758,0.07788758,FALSE
D23-13406,3,100000000,184425006,92212503,7787497,0.07787497,FALSE
D23-13404,1,229823475,350045876,175022938,54800537,0.238446212,FALSE
D23-13405,1,196470009,304338346,152169173,44300836,0.225483962,FALSE
D23-13406,1,118938720,217466564,108733282,10205438,0.085804169,FALSE
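The derived columns in n_dup.csv appear to satisfy read_pairs_out = n_reads_out / 2, n_dup = read_pairs_in - read_pairs_out and p_dup = n_dup / read_pairs_in, which would explain the half-integer values in rows where an odd number of reads survived. A minimal Python sketch, assuming those relationships, that checks a few rows copied from the file and averages p_dup per input depth and unpair_repair value (the meaning of that flag is not stated in this commit):

```python
import csv
import io
from collections import defaultdict

# A few rows copied from n_dup.csv above; point csv.DictReader at the real file instead.
N_DUP_CSV = """sample,replicate,read_pairs_in,n_reads_out,read_pairs_out,n_dup,p_dup,unpair_repair
D23-13404,1,10000,19994,9997,3,0.0003,TRUE
D23-13404,3,10000000,17732345,8866172.5,1133827.5,0.11338275,TRUE
D23-13406,1,118938720,198682176,99341088,19597632,0.164770833,TRUE
D23-13404,1,10000000,19050540,9525270,474730,0.047473,FALSE
"""


def check_row(row: dict) -> bool:
    """Return True if a row matches the assumed column relationships."""
    pairs_in = float(row["read_pairs_in"])
    pairs_out = float(row["n_reads_out"]) / 2  # assumed: two reads per pair
    n_dup = pairs_in - pairs_out
    p_dup = n_dup / pairs_in
    return (
        abs(pairs_out - float(row["read_pairs_out"])) < 1e-6
        and abs(n_dup - float(row["n_dup"])) < 1e-6
        and abs(p_dup - float(row["p_dup"])) < 1e-8  # p_dup is rounded in the file
    )


rows = list(csv.DictReader(io.StringIO(N_DUP_CSV)))
print("all rows consistent:", all(check_row(r) for r in rows))

# Mean duplicate fraction per (input depth, unpair_repair flag).
groups = defaultdict(list)
for r in rows:
    groups[(int(r["read_pairs_in"]), r["unpair_repair"])].append(float(r["p_dup"]))
for (depth, flag), values in sorted(groups.items()):
    print(f"{depth:>11} {flag:<5} mean p_dup = {sum(values) / len(values):.6f}")
```

Run over the full table, the same grouping shows mean p_dup increasing with input depth for both flag values, and higher in the TRUE rows than in the FALSE rows at every depth listed.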
Binary file added data/2023-11-06_pr-dedup/n_dup.xlsx
Binary file not shown.
34 changes: 28 additions & 6 deletions docs/index.html
@@ -151,7 +151,29 @@ <h3 class="no-anchor listing-title">
</a>
</div>
</div>
-<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1698724800000" data-listing-file-modified-sort="1698941598593" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="17">
+<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1698897600000" data-listing-file-modified-sort="1699450558652" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="13">
+<div class="thumbnail">
+<p><a href="./notebooks/2023-11-02_project-runway-dna-deduplication.html"> <p class="card-img-top"><img src="notebooks/2023-11-02_project-runway-dna-deduplication_files/figure-html/unnamed-chunk-2-1.png" class="thumbnail-image card-img"/></p> </a></p>
+</div>
+<div class="body">
+<a href="./notebooks/2023-11-02_project-runway-dna-deduplication.html">
+<h3 class="no-anchor listing-title">
+Estimating the effect of read depth on duplication rate for Project Runway DNA data
+</h3>
+<div class="listing-subtitle">
+How deep can we go?
+</div>
+</a>
+</div>
+<div class="metadata">
+<a href="./notebooks/2023-11-02_project-runway-dna-deduplication.html">
+<div class="listing-date">
+Nov 2, 2023
+</div>
+</a>
+</div>
+</div>
+<div class="quarto-post image-right" data-index="2" data-listing-date-sort="1698724800000" data-listing-file-modified-sort="1698941598593" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="17">
<div class="thumbnail">
<p><a href="./notebooks/2023-10-31_project-runway-initial.html"> <p class="card-img-top"><img src="notebooks/2023-10-31_project-runway-initial_files/figure-html/unnamed-chunk-3-1.png" class="thumbnail-image card-img"/></p> </a></p>
</div>
@@ -173,7 +195,7 @@ <h3 class="no-anchor listing-title">
</a>
</div>
</div>
-<div class="quarto-post image-right" data-index="2" data-listing-date-sort="1697688000000" data-listing-file-modified-sort="1697766328595" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11">
+<div class="quarto-post image-right" data-index="3" data-listing-date-sort="1697688000000" data-listing-file-modified-sort="1697766328595" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11">
<div class="thumbnail">
<p><a href="./notebooks/2023-10-19_deduplication.html"> <p class="card-img-top"><img src="notebooks/2023-10-19_deduplication_files/figure-html/unnamed-chunk-2-1.png" class="thumbnail-image card-img"/></p> </a></p>
</div>
@@ -195,7 +217,7 @@ <h3 class="no-anchor listing-title">
</a>
</div>
</div>
-<div class="quarto-post image-right" data-index="3" data-listing-date-sort="1697428800000" data-listing-file-modified-sort="1697493211896" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="15">
+<div class="quarto-post image-right" data-index="4" data-listing-date-sort="1697428800000" data-listing-file-modified-sort="1697493211896" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="15">
<div class="thumbnail">
<p><a href="./notebooks/2023-10-13_rrna-removal.html"> <p class="card-img-top"><img src="notebooks/2023-10-13_rrna-removal_files/figure-html/rrna-overlap-venn-johnson-1.png" class="thumbnail-image card-img"/></p> </a></p>
</div>
@@ -217,7 +239,7 @@ <h3 class="no-anchor listing-title">
</a>
</div>
</div>
-<div class="quarto-post image-right" data-index="4" data-listing-date-sort="1697083200000" data-listing-file-modified-sort="1697319460554" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12">
+<div class="quarto-post image-right" data-index="5" data-listing-date-sort="1697083200000" data-listing-file-modified-sort="1697319460554" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12">
<div class="thumbnail">
<p><a href="./notebooks/2023-10-12_fastp-vs-adapterremoval.html"> <p class="card-img-top"><img src="notebooks/2023-10-12_fastp-vs-adapterremoval_files/figure-html/unnamed-chunk-2-1.png" class="thumbnail-image card-img"/></p> </a></p>
</div>
@@ -239,7 +261,7 @@ <h3 class="no-anchor listing-title">
</a>
</div>
</div>
-<div class="quarto-post image-right" data-index="5" data-listing-date-sort="1696996800000" data-listing-file-modified-sort="1697148020355" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10">
+<div class="quarto-post image-right" data-index="6" data-listing-date-sort="1696996800000" data-listing-file-modified-sort="1697148020355" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10">
<div class="thumbnail">
<p><a href="./notebooks/2023-10-12_how-does-element-sequencing-work.html"> <p class="card-img-top"><img src="img/2023-10-11_rolling-circle-amplification.png" class="thumbnail-image card-img"/></p> </a></p>
</div>
@@ -261,7 +283,7 @@ <h3 class="no-anchor listing-title">
</a>
</div>
</div>
-<div class="quarto-post image-right" data-index="6" data-listing-date-sort="1695268800000" data-listing-file-modified-sort="1695331351195" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11">
+<div class="quarto-post image-right" data-index="7" data-listing-date-sort="1695268800000" data-listing-file-modified-sort="1695331351195" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11">
<div class="thumbnail">
<p><a href="./notebooks/2023-09-12_settled-solids-extraction-test.html"> <p class="card-img-top"><img src="notebooks/2023-09-12_settled-solids-extraction-test_files/figure-html/plot-concentrations-1.png" class="thumbnail-image card-img"/></p> </a></p>
</div>
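The data-listing-date-sort attributes look like Unix timestamps in milliseconds. Read that way (an assumption; the diff does not say so), the new card's value 1698897600000 is 2023-11-02 04:00 UTC, which is midnight in a UTC-4 zone such as US Eastern Daylight Time and matches the displayed "Nov 2, 2023". The visible cards are also in descending date-sort order after the insertion. A short Python check (listing_date is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone


def listing_date(ms: int) -> datetime:
    """Interpret a data-listing-date-sort value as milliseconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# data-listing-date-sort values of the cards visible in this diff, in page order.
DATE_SORT = [1698897600000, 1698724800000, 1697688000000, 1697428800000,
             1697083200000, 1696996800000, 1695268800000]

new_card = listing_date(DATE_SORT[0])
print(new_card.isoformat())  # 2023-11-02T04:00:00+00:00
# Shift to a fixed UTC-4 offset (assumed to be the site's local time zone).
print(new_card.astimezone(timezone(timedelta(hours=-4))).isoformat())  # 2023-11-02T00:00:00-04:00
print("descending:", DATE_SORT == sorted(DATE_SORT, reverse=True))
```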
1 change: 1 addition & 0 deletions docs/listings.json
@@ -3,6 +3,7 @@
"listing": "/index.html",
"items": [
"/notebooks/2023-11-02_project-runway-comparison.html",
+"/notebooks/2023-11-02_project-runway-dna-deduplication.html",
"/notebooks/2023-10-31_project-runway-initial.html",
"/notebooks/2023-10-19_deduplication.html",
"/notebooks/2023-10-13_rrna-removal.html",
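The new path is placed directly after the other 2023-11-02 notebook. Assuming the items are meant to stay in reverse-chronological order by their yyyy-mm-dd filename prefix (the commit does not state the ordering rule), a quick Python check on the items visible in this hunk (date_prefix is a hypothetical helper):

```python
import re

# Items visible in the listings.json hunk, after this commit.
ITEMS = [
    "/notebooks/2023-11-02_project-runway-comparison.html",
    "/notebooks/2023-11-02_project-runway-dna-deduplication.html",
    "/notebooks/2023-10-31_project-runway-initial.html",
    "/notebooks/2023-10-19_deduplication.html",
    "/notebooks/2023-10-13_rrna-removal.html",
]


def date_prefix(path: str) -> str:
    """Extract the yyyy-mm-dd prefix from a notebook filename."""
    m = re.search(r"/(\d{4}-\d{2}-\d{2})_", path)
    if m is None:
        raise ValueError(f"no date prefix in {path!r}")
    return m.group(1)


prefixes = [date_prefix(p) for p in ITEMS]
# ISO dates sort correctly as strings, so a reverse string sort checks the order.
print(prefixes == sorted(prefixes, reverse=True))  # True
```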