Updates and addition of lab b2

kth-gt · Aug 12, 2023 · commit d554267
1 parent 0ee8fe6

Showing 6 changed files with 414 additions and 9 deletions.
diff --git a/lab/b2/MarB.fasta b/lab/b2/MarB.fasta
@@ -0,0 +1,110 @@
+>gb|AF226275.1|:1823-2035 Salmonella enteritidis multiple antibiotic resistance operon, complete sequence
+ATGAAAATGCTGTTTCCCGCCCTGCCGGGTCTGTTACTTATCGCCTCCGGATATGGCATTGCAGAACAAA
+CTTTGTTACCTGTGGCGCAAAATAGCCGCGATGTGATGCTGCTGCCCTGTGTAGGCGATCCGCCAAATGA
+CCTTCACCCCGTGAGCGTGAACAGCGATAAGTCAGATGAATTAGGCGTGCCCTATTATAACGACCAACAC
+CTT
+>gb|CP003047.1|:1361810-1362022 Salmonella enterica subsp. enterica serovar Gallinarum/pullorum str. RKS5078, complete genome
+AAGGTGTTGGTCGTTATAATAGGGCACGCCTAATTCATCTGACTTATCGCTGTTCACGCTCACGGGGTGA
+AGGTCATTTGGCGGATCGCCTACACAGGGCAGCAGCATCACATCGCGGCTATTTTGCGCCACAGGTAACA
+AAGTTTGTTCTGCAATGCCATATCCGGAGGCGATAAGTAACAGACCCGGCAGGGCGGGAAACAGCATTTT
+CAT
+>gb|CP007254.1|:1627993-1628205 Salmonella enterica subsp. enterica serovar Enteritidis str. EC20111095, complete genome
+ATGAAAATGCTGTTTCCCGCCCTGCCGGGTCTGTTACTTATCGCCTCCGGATATGGCATTGCAGAACAAA
+CTTTGTTACCTGTGGCGCAAAATAGCCGCGATGTGATGCTGCTGCCCTGTGTAGGCGATCCGCCAAATGA
+CCTTCACCCCGTGAGCGTGAACAGCGATAAGTCAGATGAATTAGGCGTGCCCTATTATAACGACCAACAC
+CTT
+>gb|CP007376.1|:1632173-1632385 Salmonella enterica subsp. enterica serovar Enteritidis str. EC20120927 genome
+ATGAAAATGCTGTTTCCCGCCCTGCCGGGTCTGTTACTTATCGCCTCCGGATATGGCATTGCAGAACAAA
+CTTTGTTACCTGTGGCGCAAAATAGCCGCGATGTGATGCTGCTGCCCTGTGTAGGCGATCCGCCAAATGA
+CCTTCACCCCGTGAGCGTGAACAGCGATAAGTCAGATGAATTAGGCGTGCCCTATTATAACGACCAACAC
+CTT
+>gb|CP007395.1|:1632241-1632453 Salmonella enterica subsp. enterica serovar Enteritidis str. EC20121748 genome
+ATGAAAATGCTGTTTCCCGCCCTGCCGGGTCTGTTACTTATCGCCTCCGGATATGGCATTGCAGAACAAA
+CTTTGTTACCTGTGGCGCAAAATAGCCGCGATGTGATGCTGCTGCCCTGTGTAGGCGATCCGCCAAATGA
+CCTTCACCCCGTGAGCGTGAACAGCGATAAGTCAGATGAATTAGGCGTGCCCTATTATAACGACCAACAC
+CTT
+>gb|EU900879.1|:1-216 Escherichia coli strain TW14359 multiple antibiotic resistance protein (ECs2139) gene, complete cds
+ATGAAACCGCTTTTATCCGCAATAGCAGCTGCGCTTATTCTCTTTTCCGCGCAGGGCGTTGCGGAACAAA
+CCCAGCAACCGCTCGTTACTTCCTGTGGCGATGTGGTGGTTGTTCCCCCATCGCAGGAACAACCACCGTT
+CGATTTAAATCACATGGGTACAGGCAGTGACAAATCGGATGCGCTGGGCGTGCCCTATTATAACCAACAA
+GCCATG
+>gb|CP001846.1|:1949318-1949533 Escherichia coli O55:H7 str. CB9615, complete genome
+ATGAAACCGCTTTTATCCGCAATAGCAGCTGCGCTTATTCTCTTTTCCGCGCAGGGCGTTGCGGAACAAA
+CCCAGCAACCGCTCGTTACTTCCTGTGGCGATGTGGTGGTTGTTCCCCCATCGCAGGAACAACCACCGTT
+CGATTTAAATCACATGGGTACAGGCAGTGACAAATCGGATGCGCTGGGCGTGCCCTATTATAACCAACAA
+GCCATG
+>gb|CP002212.1|:1738314-1738523 Escherichia coli str. 'clone D i14', complete genome
+ATGAAACCACTTTTATCCGCAATAGCAACTGCGCTTATTCTCTTTTCTGCGCAGGGCGTTGCGGAACAAA
+CCACGCAGCCGGTTGTTACTTCCTGTGGCAATGTCGTGGTTGTTCCCACATCGCAGGAACAACCACCGTT
+TGATTTAAATCACATGGGTACTGGCAGTGATAAGTCGGATGCGCTCGGCGTGCCCTATTATAATCAACAC
+>gi|544340954:1679621-1679839 Escherichia coli PMV-1 main chromosome, complete genome
+ATGAAACCACTTTTATCCGCAATAGCAACTGCGCTTATTCTCTTTTCTGCGCAGGGCGTTGCGGAACAAA
+CCACGCAGCCGGTTGTTACTTCCTGTGGCAATGTCGTGGTTGTTCCCACATCGCAGGAACAACCACCGTT
+TGATTTAAATCACATGGGTACTGGCAGTGATAAGTCGGATGCGCTCGGCGTGCCCTATTATAATCAACAC
+GCTATGTAG
+>gb|CP007557.1|:3666787-3667002 Citrobacter freundii CFNIH1, complete genome
+CAGGTTGCGCGTATTGTAATAAGGCACACCCAATTCGTCGGACTTATCGCTGCCGGCACCCATATGATTA
+AAATCGAACGGGGACTGATCGTGCGCAGGGGGCATAATCATCGCGTCACGGCTGTTCTGGGTGGCTGGTT
+GCGCGGTCTTTTCCGCCATGCTCTGGCCCGAAACAATCAACAGCAGGGCCAGCATCGCATAAGACAGAAT
+TTTCAT
+>gb|CP004887.1|:3422041-3422259 Klebsiella oxytoca HKOPL1, complete genome
+TTACAGGCTTTGCTGGTTGTAATAGGGCACGCCAAGTTCATCTGATTTATCACTGCCAGAGGCCATGTGA
+TTGAAATCAAAAGGCGAATCATTATGTTCGGAAGGAATAATCATCGCATCCCGGTTGTTGTGGCGAACCG
+GGGTGCTGGTTTGCTCCGCATAACTTTGGCTCGAGACCAGCGCCAATAGCACGATGGCGGCGGAAGCGAA
+TAGTTTCAT
+>gb|CP003683.1|:3309185-3309403 Klebsiella oxytoca E718, complete genome
+ATGAAACTATTCGCTTCCGCCGCCATCGTGCTATTGGCGCTGGTCTCGAGCCAAAGTTATGCGGAGCAAA
+CCAGCACCCCGGTTCGCCACAACAACCGGGATGCGATGATTATTCCTTCCGAACATAATGATTCGCCTTT
+TGATTTCAATCACATGGCCTCTGGCAGTGATAAATCAGATGAACTTGGCGTGCCCTATTACAACCAGCAA
+AGCCTGTAA
+>gb|KJ694144.1|:401-604 Uncultured bacterium clone S10_CH_30 genomic sequence
+ATGAAACTATTCGCTTCCGCCGCCCTCACCGCCCTGGTGCTGGTCTCCGGCCAGAGTTTTGCGGAGCAAA
+CTCCACGTGTTCCGCAGCAGAACAACCGCGACACGATGATCCTGCCAACGGCTAACGGCCAGTCGCCCCA
+TGACTTTAACCATATGGGCGCAGGCAGCGACAAATCCGACGAGTTAGGCGTCCCTTATTACAAT
+>gb|CP000036.1|:1635481-1635699 Shigella boydii Sb227, complete genome
+ATGAAACCACTTTCATCCGCAATAGCAGCTGCGCTTATTCTCTTTTCCGCGCAGGGCGTTGCGGAACAAA
+CCACGCAGCCAGTTGTTACTTCTTGTGCCAATGTCGTGGTTGTTCCCCCATCGCAGGAACAACCACCGTT
+TGATTTAAATCACATGGGTACTGGCAGTGATAAGTCGGATGCGCTCGGCGTGCCCTATTATAATCAACAC
+GCTATGTAG
+>gb|CP001383.1|:1650797-1651015 Shigella flexneri 2002017, complete genome
+CTACATAGCGTGTTGATTATAATAGGGCACGCCGAGCGCATCCGACTTATCACTGCCAGTACCCATGTGA
+TTTAAATCAAACGGTGGTTGTTCCTGCGATGGGGGAACAACCACGACATTGGCACAAGAAGTAACAACTG
+GCTGCGTGGTTTGTTCCGCAACGCTCTGCGCGGAAAAGAGAATAAGCGCAGCTGCTATTGCGGATGAAAG
+TGGTTTCAT
+>gb|CP000647.1|:1804847-1805065 Klebsiella pneumoniae subsp. pneumoniae MGH 78578, complete sequence
+TCACAGGTCGTGCTGCTGGTAGTACGGCACGCCAAGTTCGTCAGATTTATCATTACCAGCCGCCATATGA
+TTGAAATCAAACGGCGAATCATTATGTTCGGAAGGGATAATCATTGTATCACGCTGGTTTTGACGCACCG
+GCGTGGTGTTTTGCTCCGCATAGCTGAGGCTGGAGGCCAGCGACAAGAGTACGATAGCTGCGGCAGCGAA
+TAGTTTCAT
+>gb|CP007727.1|:2602333-2602551 Klebsiella pneumoniae subsp. pneumoniae KPNIH10, complete genome
+TCACAGGTCGTGCTGCTGGTAGTACGGCACGCCAAGTTCGTCAGATTTATCATTACCAGCCGCCATATGA
+TTGAAATCAAACGGCGAATCATTATGTTCGGAAGGGATAATCATTGTATCACGCTGGTTTTGACGCACCG
+GCGTGGTGTTTTGCTCCGCATAGCTGAGGCTGGAGGCCAGCGACAAGAGTACGATAGCTGCGGCAGCGAA
+TAGTTTCAT
+>gb|CP000964.1|:2815073-2815291 Klebsiella pneumoniae 342, complete genome
+ATGAAACTATTCGCTGCCGCAGCTATCGTACTCCTGTCGCTGGTCTCCAGCCTCAGCTATGCGGAGCAAA
+ACACCACGCTGGTGCGTCAAAACCAGCGTGATACAATGATTATCCCTTCGGAACATAACGATTCGCCATT
+TGATTTCAATCATATGGCGGCTGGCAGTGATAAATCCGACGAACTGGGCGTGCCGTACTACCAGCAGCAC
+GACCTGTGA
+>gb|CP003312.1|:2087372-2087584 Cronobacter sakazakii ES15, complete genome
+TTAACGCGACTGGTTGTAGTACGGCACGCCCAGTTCGTCTGATTTATCGCTGCCTGCGCTGCGATGGCTG
+AGATCCAGCGATTCATGATGCTCAATCGGCACCATCATTGTGCTGGTGTCCCCGGCGCACGCGTCAGTTT
+TGGGGCTGCCTGCCAGCGCGTAGCCGGAGGTCAACGCCAGCAACAGCGCAGCGGCGCACGTGACGGATTT
+CAT
+>gi|323575285:2280852-2281064 Cronobacter turicensis z3032 complete genome
+ATGAAATCCGTAACGTGCGCCGCAGCGCTGTTGCTGGCGCTGACCTCCGGCTATGCGCTGGCAGGCAGCC
+CCAAAACCGACGCGTGCGCCGGGGACCAGAGCACGATGATGGTGCCGATTGAGCATCACGAGTCGCTGGA
+TCTCAGCCATCGCAGCGCGGGCAGCGATAAATCAGACGAGCTGGGTGTGCCGTACTACAACCAGTCGCGT
+TAA
+>gb|CP001918.1|:2045053-2045271 Enterobacter cloacae subsp. cloacae ATCC 13047, complete genome
+ATGAACGTTACCGCCTCCGCCGCCCTCGCCTTGCTGGTGCTCTTTTCCAGCCAGACCTTCGCGGAGCAAC
+CCCCTCGTGCAACGCAGCAAAATAATCATGACACGATGATTTTGCCGTCAGCACATAGCCAGTCCCCTTA
+TGATTTCAACCACATGGGGTCTGGTAGCGACAAATCCGACGAATTAGGCGTGCCTTATTATAATCAGCAC
+GGCTTCTGA
+>gi|629665248:741-914 Eutypa lata UCREL1 putative amp-binding enzyme protein mRNA
+GAAGTACGGCGCGCCCAACGGATCGCTCGTGATCGACAACTGGTGGTCCTCGGAGGCCGGGTCGCCCATC
+TCGGGTATCAGCCTGCTGCCACATACTACGAGCGATAGGAAGGCAGGGGTCAAGGACTACCACCCAATGC
+CGCTGATCAAGCCGGGGAGCGCGGGGAAGCCCAT
+>gb|CP005287.1|:259392-259520 Propionibacterium avidum 44067, complete genome
+TACGCCAAGATCGTCCTTCCGGTGTCGATTCCCGGCTTCGTCGTGACCCTCATCTGGCAGTTCACCAGCG
+CATGGAATGACTTCCTCTTCGCGCTGTTCCTGACGAACCAGAACAATGGTCCGGTCACC
diff --git a/lab/b2/readme.md b/lab/b2/readme.md
@@ -0,0 +1,124 @@
+# LAB B2: rRNA finding, taxonomic classification and multiple sequence alignment
+
+## Preparation questions
+
+1. What is rRNA? Which organisms have it?
+2. How does Blast work?
+3. What is “bootstrapping”? How does this procedure estimate the statistical support to the results of an analysis?
+4. Which is “worse” in a sequence alignment: substitutions, small gaps or big gaps?
+5. How does multiple sequence alignment work? Can you explain the “once a gap, always a gap” rule?
+
+## Instructions and questions
+
+Welcome to the second computer exercise in bioinformatics! In the previous bioinformatics lab, we learned about the epidemic caused by unknown bacteria. We found genes in their recently sequenced genomes, and we saw how to use Blast to compare genes to databases in different ways. Today we will start off again by finding genes, but only of a particular kind: ribosomal RNA (rRNA) genes. rRNA genes are often used for identifying unknown bacteria at the species level. Go to the [BioToolsBooklet](../biotoolsbooklet.md) and find a tool for identifying rRNA genes. Run your bacterial genome through it.
+
+#### Q1
+
+How many rRNA genes did you find? Which rRNA subunit does each correspond to (see the last column)? How many of each are there?
+
+Download the FASTA results and make a file containing only the 16S rRNA gene, the subunit most commonly used for classification. Then classify it with an online rRNA classification tool: find one in the booklet or online and run it.
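If you would rather script this step than copy the record by hand, the filtering can be sketched in a few lines of Python. This is only a sketch: it assumes that the rRNA tool writes the subunit name (for example `16S`) into each FASTA header, which you should check against your own output, and the function names are illustrative, not part of any tool.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs; headers lose their '>'."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(records: List[Tuple[str, str]], width: int = 70) -> str:
    """Format (header, sequence) pairs as FASTA text, wrapping sequences at `width`."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


def keep_subunit(fasta_text: str, keyword: str = "16S") -> str:
    """Keep only records whose header contains `keyword`.

    Assumption: the rRNA tool writes the subunit name (e.g. "16S") in the header.
    """
    return to_fasta([rec for rec in parse_fasta(fasta_text) if keyword in rec[0]])


# Usage (hypothetical file names):
# with open("rrna_results.fasta") as src, open("16S_only.fasta", "w") as dst:
#     dst.write(keep_subunit(src.read()))
```

Because the match is a plain substring test, a header that happens to contain `16S` elsewhere (for example in a contig name) would also be kept, so glance at the result before classifying it.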
+
+#### Q2
+
+Is the classification entirely consistent (do all the matches agree with each other)? Does it have strong bootstrap support?
+
+Look up information on this genus online. Wikipedia might be enough, but look up other websites if you're not convinced.
+
+#### Q3
+
+Does it make sense that these patients are so sick? Justify your answers and include your sources.
+
+Now that you know a bit more about your bacteria, we can use this information to compare this new isolate to previously sequenced ones. This might give interesting clues about what is going on. On Canvas, find the collection of reference genomes for your bacterium of interest and download it.
+
+#### Q4
+
+Is it likely that very closely related bacterial species will have entire genes added or missing? Why/why not?
+*Hint: consider mechanisms of horizontal gene transfer.*
+
+Let's try to see what distinguishes your new genome from others in the species. One way of finding this (albeit not necessarily the most efficient) is through Blast.
+
+#### Q5
+
+In which format is the reference genome, nucleotide or protein? What about the file containing the genes you found in the previous bioinformatics lab? Given these formats, which type of Blast is recommended?
+
+Blast the genes that you found in the previous bioinformatics lab against the reference genome of the bacterium that you chose (the reference genomes are provided in this lab). Remember to choose a suitable E-value for your search.
+
+#### Q6
+
+Which genes do NOT find a good match in this database?
+*Hint: open the Blast result in a text editor and use Ctrl+F to find empty headers (headers without any matches).* It’s recommended to use a “tabular” output format; see the Blast help section for how to choose the format. Depending on the version of Blast, the output for a query without matches may look like this:
+
+```verbatim
+# BLASTN 2.9.0+
+# Query: xxxxx ID= xxxxx
+# Database: xxxxxx
+# 0 hits found 
+```
+
+Find these by searching “# 0” and take note of the query. 
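Instead of searching by hand, you can collect the empty queries with a short script. The sketch below assumes the comment layout shown above (with the BLAST+ command-line tools, this tabular-with-comments layout is what `-outfmt 7` produces) and takes the query ID to be the first word after `# Query:`; the function name is hypothetical.

```python
from typing import List


def queries_without_hits(blast_text: str) -> List[str]:
    """Return the IDs of queries whose section reports '# 0 hits found'.

    Assumption: each query section starts with a '# Query:' comment line,
    as in tabular output with comment lines.
    """
    missing: List[str] = []
    current = None
    for line in blast_text.splitlines():
        if line.startswith("# Query:"):
            # The ID is taken to be the first word after '# Query:'.
            fields = line[len("# Query:"):].split()
            current = fields[0] if fields else None
        elif line.strip() == "# 0 hits found" and current is not None:
            # Exact comparison, so '# 10 hits found' is not mistaken for zero.
            missing.append(current)
    return missing
```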
+
+Retrieve only the non-matching genes and make a separate fasta file with them. Submit these to online Blast.
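Given the IDs of the non-matching genes, the new FASTA file can be built by keeping only their records. A minimal sketch, assuming the ID is the first word of each FASTA header (the same word that appears first after `# Query:` in the Blast report); `subset_fasta` is an illustrative name, not an existing tool.

```python
from typing import Iterable


def subset_fasta(fasta_text: str, wanted_ids: Iterable[str]) -> str:
    """Return only the FASTA records whose ID (first word of the header) is wanted."""
    wanted = set(wanted_ids)
    kept_lines = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A new record starts: decide whether to keep it from its first word.
            header_words = line[1:].split()
            keep = bool(header_words) and header_words[0] in wanted
        if keep and line.strip():
            kept_lines.append(line)
    return "".join(line + "\n" for line in kept_lines)


# Usage (hypothetical file name and IDs):
# with open("genes.fasta") as src:
#     print(subset_fasta(src.read(), ["gene_12", "gene_40"]), end="")
```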
+
+#### Q7
+
+Which Blast program did you select? Did you change any parameters from the default values? Which and why?
+
+#### Q8
+
+What are the main Blast hits found? Do they explain the toxicity of this new strain of bacteria?
+
+#### Q9
+
+Which organism do these genes seem to come from? Does this make sense? Why/why not?
+
+Antibiotic-resistance genes are a class of genes with very important clinical implications. Common mechanisms of antibiotic resistance include pumps that keep the drugs outside the bacterial cell and enzymes that break down the antibiotic, but bacteria can also mutate in ways that make their own proteins insensitive to the antibiotics.
+
+You’ve asked the experts at the sequencing center for help characterizing the antibiotic-resistance genes in your unknown bacteria. They’ve informed you that the genes belong to a multiple antibiotic resistance operon known as mar. Its mode of action is still unknown, but some things are already understood. The operon contains four protein-coding genes: marA, marB, marC and marR. We’re going to work more with these protein products in later labs. For now, let’s take a closer look at the marB genes.
+
+Download from Canvas the file [MarB.fasta](MarB.fasta), which contains marB-related nucleotide sequences from several different bacteria.
+
+We’ll use online Blast again for pairwise sequence comparison. Comparing every pair of sequences would take too long, so we’ll focus on 3 pairs. For each pair, you’ll have to choose between blastn, megablast and tblastx as the best tool for aligning them (here, “best” means “the tool that gives the most information”). Justify all your answers in your own words, using the dot-matrix and any other pictures from the Blast output that you find relevant.
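To read the dot-matrix pictures, it helps to know what a dot plot encodes. The sketch below is a simplified illustration of the general idea, not the way NCBI Blast draws its dot-matrix view: it puts a dot wherever a short word of `word` letters (an arbitrary choice) occurs in both sequences, so a run of dots along a diagonal marks a stretch of similarity. It compares the sequences only as given, with no reverse complement or translation.

```python
from typing import Dict, List, Tuple


def dot_plot(seq_a: str, seq_b: str, word: int = 3) -> List[Tuple[int, int]]:
    """Return every (i, j) where seq_a[i:i+word] == seq_b[j:j+word].

    Runs of points along a diagonal mark stretches that are similar in the
    same orientation; no reverse complement or translation is attempted.
    """
    index: Dict[str, List[int]] = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    points: List[Tuple[int, int]] = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            points.append((i, j))
    return points


def render(seq_a: str, seq_b: str, word: int = 3) -> str:
    """Draw the dot plot as text: one row per position in seq_a, one column per seq_b."""
    hits = set(dot_plot(seq_a, seq_b, word))
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(len(seq_b)))
        for i in range(len(seq_a))
    )
```

Lowering `word` reveals weaker, shorter matches at the cost of more random dots scattered off the diagonal.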
+
+Compare the first sequence, `gb|AF226275.1|:1823-2035`, with the second, `gb|CP003047.1|:1361810-1362022`.
+
+#### Q10
+
+Which Blast tool(s) did you pick? Using this, how closely related are these two proteins?
+
+Now compare the first sequence with `gb|CP004887.1|:3422041-3422259`.
+
+#### Q11
+
+Which Blast tool(s) did you pick? Using this, how closely related are these two proteins? How does this answer change comparing different tools?
+
+Finally, compare the first sequence with `gi|629665248:741-914`.
+
+#### Q12
+
+Which Blast tool(s) did you pick? Using this, how closely related are these two proteins? 
+
+As you’ve noticed, pairwise sequence comparison is slow when there are many sequences. Fortunately, there are tools for comparing several sequences at the same time. Run this file through a multiple sequence alignment tool. Choose ClustalW format for the multiple alignment; it will make the next steps a little easier.
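If you want to inspect the ClustalW output programmatically, for example to compute how similar two aligned sequences are, its blocks can be joined back into one gapped string per sequence. A minimal sketch assuming the usual layout (a header line, then blocks of `ID  residues [count]` lines separated by blank lines, with conservation lines that start with spaces); percent identity here is computed over columns where neither sequence has a gap, which is only one of several definitions in use.

```python
from typing import Dict


def parse_clustal(text: str) -> Dict[str, str]:
    """Join the blocks of a ClustalW-format alignment into one gapped string per ID."""
    aligned: Dict[str, str] = {}
    for line in text.splitlines()[1:]:  # skip the program header line
        if not line.strip() or line[0].isspace():
            continue  # blank separator or conservation line (*, :, .)
        fields = line.split()
        if len(fields) < 2:
            continue
        aligned[fields[0]] = aligned.get(fields[0], "") + fields[1]
    return aligned


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)
```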
+
+#### Q13
+
+Which multiple sequence alignment tool did you choose? Do the results confirm that the sequences in this file are all related? Justify your answer in your own words and by copying relevant parts of the alignment.
+
+Keep this output open and run one more multiple sequence alignment tool. Choose the same format as before.
+
+#### Q14
+
+Which tool did you use now? Do you see any differences between the two results? Which tool seems better? Justify your answer in your own words and by copying relevant parts of the alignments.
+
+Now select a relatively conserved part of the multiple alignment and use an online tool for creating a sequence logo.
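Logo tools compute the stack heights for you, but the underlying quantity is easy to sketch. Under the usual definition, the information content of a column is log2 of the alphabet size minus the column's Shannon entropy, and each letter's height is its frequency times that value. The sketch below takes equal-length gapped sequences, ignores gaps, and skips the small-sample correction that some logo tools apply, so its numbers can differ from your tool's; use `alphabet_size=20` for proteins.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def column_profile(aligned: Sequence[str], col: int) -> Dict[str, float]:
    """Relative frequency of each residue in column `col`, ignoring gaps ('-')."""
    counts = Counter(seq[col].upper() for seq in aligned if seq[col] != "-")
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()} if total else {}


def information_content(profile: Dict[str, float], alphabet_size: int = 4) -> float:
    """Bits in one column: log2(alphabet_size) minus the column's Shannon entropy.

    No small-sample correction is applied (a simplifying assumption).
    """
    if not profile:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in profile.values())
    return math.log2(alphabet_size) - entropy


def letter_heights(aligned: Sequence[str], col: int, alphabet_size: int = 4) -> Dict[str, float]:
    """Height of each letter in the logo stack: its frequency times the column's bits."""
    profile = column_profile(aligned, col)
    bits = information_content(profile, alphabet_size)
    return {res: p * bits for res, p in profile.items()}
```

A fully conserved DNA column scores the maximum of 2 bits, while a column where all four bases are equally common scores 0, which is why conserved motifs stand out as tall stacks.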
+
+#### Q15
+
+What remarkable characteristics do you see in your logo? Can you formulate a biological hypothesis to explain them? Include your logo in the answer.
+
+#### Q16
+
+What kind of information does each of these sequence comparison methods highlight: pairwise alignment, multiple alignment and logo? In which situations would you pick each of them?
+
+#### Q17
+
+From a biological perspective, why are there conserved regions or motifs?
diff --git a/lab/b3/readme.md b/lab/b3/readme.md
@@ -7,8 +7,12 @@ See the ‘Practicals’ document for more information.
 
 1. Based on just the hydrophobicity/hydrophilicity of the amino acid side chains, which parts of the hypothetical membrane protein LIFIRDNDEPTLIF will be inside the membrane and which will be outside?
 1. Look up what the Hamming distance is and calculate the Hamming distance between the aligned sequences (where a "-" indicates a deletion):  
-`PRI-LFDNRLDEFL  
-DRINLFRNR--NRL`
+
+   ```verbatim
+   PRI-LFDNRLDEFL  
+   DRINLFRNR--NRL
+   ```
+
 1. Look up how the UPGMA clustering method works and draw the dendrogram for the following distance matrix:
 
 |    | Seq1 | Seq2 | Seq3 | Seq4|
@@ -26,13 +30,13 @@ As was mentioned in the previous lab, one of the mechanisms of antibiotic resist
 
 In the last lab you found some non-matching genes in the bacterial genome you chose to examine. Four of these corresponded to antibiotic resistance genes and two to a toxin. If you did not find these six genes in the previous lab, use the files [AB_Resistance_GeneMarkS_proteins.fasta](AB_Resistance_GeneMarkS_proteins.fasta), [toxin_Bact1_aminoacids.fasta](toxin_Bact1_aminoacids.fasta), [toxin_Bact2_aminoacids.fasta](toxin_Bact2_aminoacids.fasta), and [toxin_Bact3_aminoacids.fasta](toxin_Bact3_aminoacids.fasta).
 
-Run one of the tools from your tools booklet to find out if any of the antibiotic resistance genes have a transmembrane efflux pump candidate. Note that you might have to translate the nucleotide sequence to an amino acid sequence for the tool to work.
+Run one of the tools from your [tools booklet](../biotoolsbooklet.md) to find out if any of the antibiotic resistance genes have a transmembrane efflux pump candidate. Note that you might have to translate the nucleotide sequence to an amino acid sequence for the tool to work.
 
 #### Q1
 
 Which tool did you use? Which of the genes had TM helices? What is the 2D structure of this candidate (e.g. how many TM helices are there; does the protein start/end inside/outside the cell)?
 
-One way of finding out the function of a protein is to search the Pfam database, which contains information about functionality of protein families and domains and their protein structure.
+One way of finding out the function of a protein is to search the Pfam database, which contains information about the functionality of protein families and domains and their protein structure.
 
 #### Q2