Updates and addition of lab b2
percolator committed Aug 12, 2023
1 parent 0ee8fe6 commit d554267
Showing 6 changed files with 414 additions and 9 deletions.
110 changes: 110 additions & 0 deletions lab/b2/MarB.fasta
@@ -0,0 +1,110 @@
>gb|AF226275.1|:1823-2035 Salmonella enteritidis multiple antibiotic resistance operon, complete sequence
ATGAAAATGCTGTTTCCCGCCCTGCCGGGTCTGTTACTTATCGCCTCCGGATATGGCATTGCAGAACAAA
CTTTGTTACCTGTGGCGCAAAATAGCCGCGATGTGATGCTGCTGCCCTGTGTAGGCGATCCGCCAAATGA
CCTTCACCCCGTGAGCGTGAACAGCGATAAGTCAGATGAATTAGGCGTGCCCTATTATAACGACCAACAC
CTT
>gb|CP003047.1|:1361810-1362022 Salmonella enterica subsp. enterica serovar Gallinarum/pullorum str. RKS5078, complete genome
AAGGTGTTGGTCGTTATAATAGGGCACGCCTAATTCATCTGACTTATCGCTGTTCACGCTCACGGGGTGA
AGGTCATTTGGCGGATCGCCTACACAGGGCAGCAGCATCACATCGCGGCTATTTTGCGCCACAGGTAACA
AAGTTTGTTCTGCAATGCCATATCCGGAGGCGATAAGTAACAGACCCGGCAGGGCGGGAAACAGCATTTT
CAT
>gb|CP007254.1|:1627993-1628205 Salmonella enterica subsp. enterica serovar Enteritidis str. EC20111095, complete genome
ATGAAAATGCTGTTTCCCGCCCTGCCGGGTCTGTTACTTATCGCCTCCGGATATGGCATTGCAGAACAAA
CTTTGTTACCTGTGGCGCAAAATAGCCGCGATGTGATGCTGCTGCCCTGTGTAGGCGATCCGCCAAATGA
CCTTCACCCCGTGAGCGTGAACAGCGATAAGTCAGATGAATTAGGCGTGCCCTATTATAACGACCAACAC
CTT
>gb|CP007376.1|:1632173-1632385 Salmonella enterica subsp. enterica serovar Enteritidis str. EC20120927 genome
ATGAAAATGCTGTTTCCCGCCCTGCCGGGTCTGTTACTTATCGCCTCCGGATATGGCATTGCAGAACAAA
CTTTGTTACCTGTGGCGCAAAATAGCCGCGATGTGATGCTGCTGCCCTGTGTAGGCGATCCGCCAAATGA
CCTTCACCCCGTGAGCGTGAACAGCGATAAGTCAGATGAATTAGGCGTGCCCTATTATAACGACCAACAC
CTT
>gb|CP007395.1|:1632241-1632453 Salmonella enterica subsp. enterica serovar Enteritidis str. EC20121748 genome
ATGAAAATGCTGTTTCCCGCCCTGCCGGGTCTGTTACTTATCGCCTCCGGATATGGCATTGCAGAACAAA
CTTTGTTACCTGTGGCGCAAAATAGCCGCGATGTGATGCTGCTGCCCTGTGTAGGCGATCCGCCAAATGA
CCTTCACCCCGTGAGCGTGAACAGCGATAAGTCAGATGAATTAGGCGTGCCCTATTATAACGACCAACAC
CTT
>gb|EU900879.1|:1-216 Escherichia coli strain TW14359 multiple antibiotic resistance protein (ECs2139) gene, complete cds
ATGAAACCGCTTTTATCCGCAATAGCAGCTGCGCTTATTCTCTTTTCCGCGCAGGGCGTTGCGGAACAAA
CCCAGCAACCGCTCGTTACTTCCTGTGGCGATGTGGTGGTTGTTCCCCCATCGCAGGAACAACCACCGTT
CGATTTAAATCACATGGGTACAGGCAGTGACAAATCGGATGCGCTGGGCGTGCCCTATTATAACCAACAA
GCCATG
>gb|CP001846.1|:1949318-1949533 Escherichia coli O55:H7 str. CB9615, complete genome
ATGAAACCGCTTTTATCCGCAATAGCAGCTGCGCTTATTCTCTTTTCCGCGCAGGGCGTTGCGGAACAAA
CCCAGCAACCGCTCGTTACTTCCTGTGGCGATGTGGTGGTTGTTCCCCCATCGCAGGAACAACCACCGTT
CGATTTAAATCACATGGGTACAGGCAGTGACAAATCGGATGCGCTGGGCGTGCCCTATTATAACCAACAA
GCCATG
>gb|CP002212.1|:1738314-1738523 Escherichia coli str. 'clone D i14', complete genome
ATGAAACCACTTTTATCCGCAATAGCAACTGCGCTTATTCTCTTTTCTGCGCAGGGCGTTGCGGAACAAA
CCACGCAGCCGGTTGTTACTTCCTGTGGCAATGTCGTGGTTGTTCCCACATCGCAGGAACAACCACCGTT
TGATTTAAATCACATGGGTACTGGCAGTGATAAGTCGGATGCGCTCGGCGTGCCCTATTATAATCAACAC
>gi|544340954:1679621-1679839 Escherichia coli PMV-1 main chromosome, complete genome
ATGAAACCACTTTTATCCGCAATAGCAACTGCGCTTATTCTCTTTTCTGCGCAGGGCGTTGCGGAACAAA
CCACGCAGCCGGTTGTTACTTCCTGTGGCAATGTCGTGGTTGTTCCCACATCGCAGGAACAACCACCGTT
TGATTTAAATCACATGGGTACTGGCAGTGATAAGTCGGATGCGCTCGGCGTGCCCTATTATAATCAACAC
GCTATGTAG
>gb|CP007557.1|:3666787-3667002 Citrobacter freundii CFNIH1, complete genome
CAGGTTGCGCGTATTGTAATAAGGCACACCCAATTCGTCGGACTTATCGCTGCCGGCACCCATATGATTA
AAATCGAACGGGGACTGATCGTGCGCAGGGGGCATAATCATCGCGTCACGGCTGTTCTGGGTGGCTGGTT
GCGCGGTCTTTTCCGCCATGCTCTGGCCCGAAACAATCAACAGCAGGGCCAGCATCGCATAAGACAGAAT
TTTCAT
>gb|CP004887.1|:3422041-3422259 Klebsiella oxytoca HKOPL1, complete genome
TTACAGGCTTTGCTGGTTGTAATAGGGCACGCCAAGTTCATCTGATTTATCACTGCCAGAGGCCATGTGA
TTGAAATCAAAAGGCGAATCATTATGTTCGGAAGGAATAATCATCGCATCCCGGTTGTTGTGGCGAACCG
GGGTGCTGGTTTGCTCCGCATAACTTTGGCTCGAGACCAGCGCCAATAGCACGATGGCGGCGGAAGCGAA
TAGTTTCAT
>gb|CP003683.1|:3309185-3309403 Klebsiella oxytoca E718, complete genome
ATGAAACTATTCGCTTCCGCCGCCATCGTGCTATTGGCGCTGGTCTCGAGCCAAAGTTATGCGGAGCAAA
CCAGCACCCCGGTTCGCCACAACAACCGGGATGCGATGATTATTCCTTCCGAACATAATGATTCGCCTTT
TGATTTCAATCACATGGCCTCTGGCAGTGATAAATCAGATGAACTTGGCGTGCCCTATTACAACCAGCAA
AGCCTGTAA
>gb|KJ694144.1|:401-604 Uncultured bacterium clone S10_CH_30 genomic sequence
ATGAAACTATTCGCTTCCGCCGCCCTCACCGCCCTGGTGCTGGTCTCCGGCCAGAGTTTTGCGGAGCAAA
CTCCACGTGTTCCGCAGCAGAACAACCGCGACACGATGATCCTGCCAACGGCTAACGGCCAGTCGCCCCA
TGACTTTAACCATATGGGCGCAGGCAGCGACAAATCCGACGAGTTAGGCGTCCCTTATTACAAT
>gb|CP000036.1|:1635481-1635699 Shigella boydii Sb227, complete genome
ATGAAACCACTTTCATCCGCAATAGCAGCTGCGCTTATTCTCTTTTCCGCGCAGGGCGTTGCGGAACAAA
CCACGCAGCCAGTTGTTACTTCTTGTGCCAATGTCGTGGTTGTTCCCCCATCGCAGGAACAACCACCGTT
TGATTTAAATCACATGGGTACTGGCAGTGATAAGTCGGATGCGCTCGGCGTGCCCTATTATAATCAACAC
GCTATGTAG
>gb|CP001383.1|:1650797-1651015 Shigella flexneri 2002017, complete genome
CTACATAGCGTGTTGATTATAATAGGGCACGCCGAGCGCATCCGACTTATCACTGCCAGTACCCATGTGA
TTTAAATCAAACGGTGGTTGTTCCTGCGATGGGGGAACAACCACGACATTGGCACAAGAAGTAACAACTG
GCTGCGTGGTTTGTTCCGCAACGCTCTGCGCGGAAAAGAGAATAAGCGCAGCTGCTATTGCGGATGAAAG
TGGTTTCAT
>gb|CP000647.1|:1804847-1805065 Klebsiella pneumoniae subsp. pneumoniae MGH 78578, complete sequence
TCACAGGTCGTGCTGCTGGTAGTACGGCACGCCAAGTTCGTCAGATTTATCATTACCAGCCGCCATATGA
TTGAAATCAAACGGCGAATCATTATGTTCGGAAGGGATAATCATTGTATCACGCTGGTTTTGACGCACCG
GCGTGGTGTTTTGCTCCGCATAGCTGAGGCTGGAGGCCAGCGACAAGAGTACGATAGCTGCGGCAGCGAA
TAGTTTCAT
>gb|CP007727.1|:2602333-2602551 Klebsiella pneumoniae subsp. pneumoniae KPNIH10, complete genome
TCACAGGTCGTGCTGCTGGTAGTACGGCACGCCAAGTTCGTCAGATTTATCATTACCAGCCGCCATATGA
TTGAAATCAAACGGCGAATCATTATGTTCGGAAGGGATAATCATTGTATCACGCTGGTTTTGACGCACCG
GCGTGGTGTTTTGCTCCGCATAGCTGAGGCTGGAGGCCAGCGACAAGAGTACGATAGCTGCGGCAGCGAA
TAGTTTCAT
>gb|CP000964.1|:2815073-2815291 Klebsiella pneumoniae 342, complete genome
ATGAAACTATTCGCTGCCGCAGCTATCGTACTCCTGTCGCTGGTCTCCAGCCTCAGCTATGCGGAGCAAA
ACACCACGCTGGTGCGTCAAAACCAGCGTGATACAATGATTATCCCTTCGGAACATAACGATTCGCCATT
TGATTTCAATCATATGGCGGCTGGCAGTGATAAATCCGACGAACTGGGCGTGCCGTACTACCAGCAGCAC
GACCTGTGA
>gb|CP003312.1|:2087372-2087584 Cronobacter sakazakii ES15, complete genome
TTAACGCGACTGGTTGTAGTACGGCACGCCCAGTTCGTCTGATTTATCGCTGCCTGCGCTGCGATGGCTG
AGATCCAGCGATTCATGATGCTCAATCGGCACCATCATTGTGCTGGTGTCCCCGGCGCACGCGTCAGTTT
TGGGGCTGCCTGCCAGCGCGTAGCCGGAGGTCAACGCCAGCAACAGCGCAGCGGCGCACGTGACGGATTT
CAT
>gi|323575285:2280852-2281064 Cronobacter turicensis z3032 complete genome
ATGAAATCCGTAACGTGCGCCGCAGCGCTGTTGCTGGCGCTGACCTCCGGCTATGCGCTGGCAGGCAGCC
CCAAAACCGACGCGTGCGCCGGGGACCAGAGCACGATGATGGTGCCGATTGAGCATCACGAGTCGCTGGA
TCTCAGCCATCGCAGCGCGGGCAGCGATAAATCAGACGAGCTGGGTGTGCCGTACTACAACCAGTCGCGT
TAA
>gb|CP001918.1|:2045053-2045271 Enterobacter cloacae subsp. cloacae ATCC 13047, complete genome
ATGAACGTTACCGCCTCCGCCGCCCTCGCCTTGCTGGTGCTCTTTTCCAGCCAGACCTTCGCGGAGCAAC
CCCCTCGTGCAACGCAGCAAAATAATCATGACACGATGATTTTGCCGTCAGCACATAGCCAGTCCCCTTA
TGATTTCAACCACATGGGGTCTGGTAGCGACAAATCCGACGAATTAGGCGTGCCTTATTATAATCAGCAC
GGCTTCTGA
>gi|629665248:741-914 Eutypa lata UCREL1 putative amp-binding enzyme protein mRNA
GAAGTACGGCGCGCCCAACGGATCGCTCGTGATCGACAACTGGTGGTCCTCGGAGGCCGGGTCGCCCATC
TCGGGTATCAGCCTGCTGCCACATACTACGAGCGATAGGAAGGCAGGGGTCAAGGACTACCACCCAATGC
CGCTGATCAAGCCGGGGAGCGCGGGGAAGCCCAT
>gb|CP005287.1|:259392-259520 Propionibacterium avidum 44067, complete genome
TACGCCAAGATCGTCCTTCCGGTGTCGATTCCCGGCTTCGTCGTGACCCTCATCTGGCAGTTCACCAGCG
CATGGAATGACTTCCTCTTCGCGCTGTTCCTGACGAACCAGAACAATGGTCCGGTCACC
124 changes: 124 additions & 0 deletions lab/b2/readme.md
@@ -0,0 +1,124 @@
# LAB B2: rRNA finding, taxonomic classification and multiple sequence alignment

## Preparation questions

1. What is rRNA? Which organisms have this?
2. How does Blast work?
3. What is “bootstrapping”? How does this procedure estimate the statistical support for the results of an analysis?
4. What is “worse” in a sequence alignment? Substitutions, small gaps or big gaps?
5. How does multiple sequence alignment work? Can you explain the “once a gap, always a gap” rule?

## Instructions and questions

Welcome to the second computer exercise in bioinformatics! In the previous bioinformatics lab, we learned about the epidemic of unknown bacteria, found genes in their recently sequenced genomes, and saw how to use Blast to compare genes to databases in different ways. Today we will start off again by finding genes, but only of a particular kind: ribosomal RNA (rRNA) genes. Ribosomal RNA is often used to identify unknown bacteria at the species level. Go to the [BioToolsBooklet](../biotoolsbooklet.md) and find a tool for identifying rRNA genes. Run your bacterial genome through it.

#### Q1

How many rRNA genes did you find? What subunit of the rRNA are they (last column)? How many of each are there?

Download the FASTA results and make a file with just the 16S rRNA gene. This is the subunit that is most commonly used for classification. You will use an online tool for classifying rRNA. Find a tool for doing this in the booklet or online and run it.
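
Picking out the 16S record can be done by hand in a text editor, or with a short script. The sketch below is one possible approach and not part of the lab material. It assumes that the rRNA-finding tool writes the molecule type into each FASTA header (for example a header containing `16S`), which depends on the tool you chose. The function names and the example headers are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(lines, keyword="16S"):
    """Keep records whose header contains the keyword (case-insensitive)."""
    key = keyword.lower()
    return [(h, s) for h, s in read_fasta(lines) if key in h.lower()]


def to_fasta(records, width=70):
    """Format (header, sequence) pairs as FASTA text, wrapped at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


# Hypothetical example input; real headers depend on the tool used.
example = """\
>rRNA_1 molecule=16S_rRNA
AGAGTTTGATCATGGCTCAG
ATTGAACGCTGGCGGCAGGC
>rRNA_2 molecule=23S_rRNA
GGTTAAGCGACTAAGCGTAC
""".splitlines()

hits = select_records(example)
print(to_fasta(hits), end="")
```

To run this on a real result file, pass an open file handle instead of `example`, and check the headers first so that the keyword matches what your tool actually writes.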

#### Q2

Is the classification entirely consistent (do all the matches agree with each other)? Does it have strong bootstrap support?

Look up information on this genus online. Wikipedia might be enough, but look up other websites if you're not convinced.

#### Q3

Does it make sense that these patients are so sick? Justify your answers and include your sources.

Now that you know a bit more about your bacteria, we can use this information to compare this new isolate to previously sequenced ones. This might give interesting clues about what is going on. On Canvas, find a collection of reference genomes for your bacterium of interest and download it.

#### Q4

Is it likely that very closely related bacterial species will have entire genes added or missing? Why/why not?
*Hint: consider mechanisms of horizontal gene transfer.*

Let's try to see what distinguishes your new genome from others in the species. One way of finding this (albeit not necessarily the most efficient) is through Blast.

#### Q5

In which format is the reference genome, nucleotide or protein? What about the file containing the genes you found in the previous bioinformatics lab? In that case, which type of Blast is recommended?

Blast the genes that you found in the previous bioinformatics lab against the reference genome of the bacteria that you chose (reference genomes provided in this lab). Remember to choose a suitable E-value for your search.

#### Q6

Which genes do NOT find a good match in this database?
*Hint: open the Blast result in a text editor and use ctrl+F to find empty headers (headers without any matches).* A “tabular” output format is recommended; see the Blast help section for how to choose the format. Depending on the version of Blast, you might get output of the form:

```verbatim
# BLASTN 2.9.0+
# Query: xxxxx ID= xxxxx
# Database: xxxxxx
# 0 hits found
```

Find these by searching for “# 0” and take note of the query.
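
For a long result file, the same search can be scripted. The following is a minimal sketch, assuming the commented tabular layout shown above, where each query block has a `# Query:` line followed later by a `# N hits found` line. The function name and the example report (including the gene and database names) are hypothetical.

```python
def queries_without_hits(lines):
    """Return the IDs of queries whose block reports '# 0 hits found'."""
    missing, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith("# Query:"):
            # The query ID is taken to be the first word after '# Query:'.
            fields = line[len("# Query:"):].split()
            current = fields[0] if fields else None
        elif line.startswith("# 0 hits found") and current is not None:
            missing.append(current)
    return missing


# Hypothetical, shortened report in the commented tabular layout.
example = """\
# BLASTN 2.9.0+
# Query: gene_1 ID=1_1
# Database: reference.fasta
# 0 hits found
# BLASTN 2.9.0+
# Query: gene_2 ID=1_2
# Database: reference.fasta
# 1 hits found
gene_2\tref_1\t99.1
""".splitlines()

print(queries_without_hits(example))
```

Matching on the full phrase `# 0 hits found` rather than just `# 0` avoids accidental matches elsewhere in the file.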

Retrieve only the non-matching genes and make a separate fasta file with them. Submit these to online Blast.
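
If there are many non-matching genes, copying them by hand is error-prone. The sketch below shows one way to keep only selected records from a FASTA file, assuming the gene ID is the first word of each header line. The function name, gene IDs and sequences are invented for the example.

```python
def filter_fasta(lines, wanted_ids):
    """Return the FASTA lines of records whose ID (first header word) is wanted."""
    wanted = set(wanted_ids)
    kept, keep = [], False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep and line:
            kept.append(line)
    return kept


# Hypothetical gene file; real IDs come from your gene-finding results.
genes = """\
>gene_1 ID=1_1
ATGAAACCGCTT
TTATCCGCAATA
>gene_2 ID=1_2
ATGAAACTATTC
>gene_3 ID=1_3
ATGAACGTTACC
""".splitlines()

subset = filter_fasta(genes, ["gene_1", "gene_3"])
print("\n".join(subset))
```

The printed lines can be saved to a new file and submitted to online Blast.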

#### Q7

Which Blast program did you select? Did you change any parameters from the default values? Which and why?

#### Q8

What are the main Blast hits found? Do they explain the toxicity of this new strain of bacteria?

#### Q9

Which organism do these genes seem to come from? Does this make sense? Why/why not?

A class of genes that has very important clinical implications is antibiotic-resistance genes. Common mechanisms of antibiotic resistance are pumps that keep the drugs outside the bacterial cell or enzymes that break down the antibiotic, but bacteria can also mutate in a way that makes their own proteins immune to the antibiotics.

You’ve asked the experts at the sequencing center for help characterizing the antibiotic-resistance genes in your unknown bacteria. They’ve informed you that the genes form a multi-antibiotic-resistance operon known as mar. Its mode of action is still unknown, but some things are already understood. The operon contains 4 protein-coding genes: marA, marB, marC and marR. We’re going to work more with these protein products in later labs. For now, let’s take a closer look at the marB genes.

Download from Canvas a file called [MarB.fasta](MarB.fasta), which contains marB-related nucleotide sequences from several different bacteria.

We’ll use online Blast again for pairwise sequence comparison. It would take too long to compare every pair of sequences, so we’ll focus on 3 pairs. For each pair you’ll have to choose between blastn, megablast and tblastx as the best tool for aligning them (here, “best” means the tool that gives the most information). Justify all your answers in your own words, supported by the dot-matrix and any other pictures from the Blast output that you find relevant.

Compare the first sequence, `gb|AF226275.1|:1823-2035`, with the second, `gb|CP003047.1|:1361810-1362022`.

#### Q10

Which Blast tool(s) did you pick? Using this, how closely related are these two proteins?

Now compare the first sequence with `gb|CP004887.1|:3422041-3422259`.

#### Q11

Which Blast tool(s) did you pick? Using this, how closely related are these two proteins? How does the answer change when you compare different tools?

Finally, compare the first sequence with `gi|629665248:741-914`.

#### Q12

Which Blast tool(s) did you pick? Using this, how closely related are these two proteins?

As you’ve noticed, doing pairwise sequence comparison is quite slow. Fortunately, there are tools for comparing several proteins at the same time. Run this file through a multiple sequence alignment tool. Choose ClustalW format for the multiple alignment; it will make the next steps a little easier.

#### Q13

Which multiple sequence alignment tool did you choose? Do the results confirm that the sequences in this file are all related? Justify your answer in your own words and by including parts of the alignment.

Keep this result output open and run one more multiple-alignment tool. Choose the same format as before.

#### Q14

Which tool did you use now? Do you see any differences between the two results? Which tool seems better? Justify your answer in your own words and by including parts of the alignments.

Now select a relatively conserved part of the multiple alignment and use an online tool for creating a sequence logo.
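
To help decide which part of the alignment is “relatively conserved”, it can be useful to score each column. Sequence logos are commonly scaled by information content in bits, which for DNA is at most 2 bits per position. The sketch below computes that value per column under simplifying assumptions: gaps are ignored, no small-sample correction is applied, and the aligned sequences are a made-up example rather than lab data. The function name is hypothetical.

```python
import math
from collections import Counter


def column_information(aligned, alphabet_size=4):
    """Information content (bits) per alignment column, ignoring gap characters."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    info = []
    for i in range(len(aligned[0])):
        counts = Counter(seq[i] for seq in aligned if seq[i] != "-")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)  # column made up entirely of gaps
            continue
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


# Made-up aligned DNA sequences, for illustration only.
aligned = ["ATGAAA", "ATGAAC", "ATGAAG", "ATG-AT"]
scores = column_information(aligned)
print([round(s, 2) for s in scores])
```

Columns with scores close to the maximum are fully conserved; columns close to zero vary freely. For protein alignments, `alphabet_size=20` would be the corresponding choice.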

#### Q15

What remarkable characteristics do you see in your logo? Can you make any biological hypothesis about this? Include your logo in the answer.

#### Q16

Which sort of information is highlighted in each of these sequence comparison methods: pairwise alignment, multiple alignment and logo? Can you say in which situation you would pick each of these?

#### Q17

From a biological perspective, why are there conserved regions or motifs?
12 changes: 8 additions & 4 deletions lab/b3/readme.md
@@ -7,8 +7,12 @@ See the ‘Practicals’ document for more information.

1. Based on just the hydrophobicity/hydrophilicity of the amino acid side chains, which parts of the hypothetical membrane protein LIFIRDNDEPTLIF will be inside the membrane and which will be outside?
1. Look up what the Hamming distance is and calculate the Hamming distance between the aligned sequences (where a "-" indicates a deletion):
`PRI-LFDNRLDEFL
DRINLFRNR--NRL`

```verbatim
PRI-LFDNRLDEFL
DRINLFRNR--NRL
```

1. Look up how the UPGMA clustering method works and draw the dendrogram for the following distance matrix:

| | Seq1 | Seq2 | Seq3 | Seq4|
@@ -26,13 +30,13 @@ As was mentioned in the previous lab, one of the mechanisms of antibiotic resist

In the last lab you found some non-matching genes in the bacterial genome you chose to examine. Four of these corresponded to antibiotic resistance genes and two to a toxin. If you did not find these six genes in the previous lab, use the files [AB_Resistance_GeneMarkS_proteins.fasta](AB_Resistance_GeneMarkS_proteins.fasta), [toxin_Bact1_aminoacids.fasta](toxin_Bact1_aminoacids.fasta), [toxin_Bact2_aminoacids.fasta](toxin_Bact2_aminoacids.fasta), and [toxin_Bact3_aminoacids.fasta](toxin_Bact3_aminoacids.fasta).

Run one of the tools from your tools booklet to find out if any of the antibiotic resistance genes have a transmembrane efflux pump candidate. Note that you might have to translate the nucleotide sequence to an amino acid sequence for the tool to work.
Run one of the tools from your [tools booklet](../biotoolsbooklet.md) to find out if any of the antibiotic resistance genes have a transmembrane efflux pump candidate. Note that you might have to translate the nucleotide sequence to an amino acid sequence for the tool to work.

#### Q1

Which tool did you use? Which of the genes had TM helices? What is the 2D structure of this candidate (e.g. how many TM helices are there; does the protein start/end inside/outside the cell)?

One way of finding out the function of a protein is to search the Pfam database, which contains information about functionality of protein families and domains and their protein structure.
One way of finding out the function of a protein is to search the Pfam database, which contains information about the functionality of protein families and domains and their protein structure.

#### Q2
