jolo2486 edited this page Jan 11, 2016 · 8 revisions

Welcome!

This project gathers human gene data and builds sequence logos from it to visualize motifs. The analysis covers the regions immediately before and after the translation start site, and the beginning and end of each gene's first intron.
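A sequence logo stacks letters at each position of an alignment, with the total stack height equal to the information content of that column. The Python sketch below shows the standard calculation. It is a generic illustration and not the project's own code. The function name `information_content` is hypothetical, the example sequences are made up, and the small-sample correction that some logo tools apply is deliberately left out.

```python
import math
from collections import Counter

BASES = "ACGT"

def information_content(aligned):
    """Letter heights (in bits) for a DNA sequence logo.

    `aligned` is a list of equal-length sequences, e.g. windows around the
    translation start site. Column height is R_i = log2(4) - H_i, where H_i is
    the Shannon entropy of the base frequencies in column i; each base gets
    frequency * R_i. Characters other than A/C/G/T are ignored, and the
    small-sample correction some logo tools apply is omitted.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must all have the same length")

    heights = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in aligned if s[i].upper() in BASES)
        total = sum(counts.values())
        if total == 0:  # column made only of N or gaps
            heights.append({b: 0.0 for b in BASES})
            continue
        freqs = {b: counts[b] / total for b in BASES}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        column_height = math.log2(len(BASES)) - entropy
        heights.append({b: freqs[b] * column_height for b in BASES})
    return heights

# Toy example (made-up sequences, not project data):
heights = information_content(["ATG", "ATG", "ATG", "ACG"])
print([round(sum(col.values()), 2) for col in heights])  # [2.0, 1.19, 2.0]
```

Fully conserved columns reach the 2-bit maximum for DNA. The mixed T/C column is shorter, and within it T is drawn three times as tall as C.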

This project is part of the course Applied Bioinformatics at KTH.

Directory structure

This is a brief explanation of what each directory in the project contains. It will be updated as new directories are added.

/tests

This is the home of all experiments and temporary files we used to figure out how things work.

/bin

This is the home of executable scripts.

/results

This is the home of our result files, such as graphs and diagrams.

/data

This is the home of data files; in our case, mostly FASTA files and other genomic data files.

/doc

This is the home of all documentation such as the final report.

Reproducing our results

  • Download this repository: unravel_motifs/
  • Create the directory seq: unravel_motifs/seq
  • Use the queries in unravel_motifs/data/human_genome_biomart_sequence_links.txt to download three sequence files in FASTA format.
  • Put these files in unravel_motifs/seq and rename them to coding_sequences.fa, usg.fa and exons.fa (in the same order as the queries appear in the above-mentioned query file). They should be approximately 1.4 GB in total.
  • Run this script: unravel_motifs/bin/run.sh
  • Results will be created in unravel_motifs/results/output
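Part of the steps above can be scripted. The Python sketch below checks that `seq/` contains the three renamed files and that each one begins with a FASTA header character (`>`) before it launches `bin/run.sh`. This check catches a BioMart error message that was saved in place of sequence data. The helper names `check_inputs` and `run_pipeline` are hypothetical. The sketch also assumes that run.sh should be started with bash from the repository root, and this page does not confirm that.

```python
import subprocess
import sys
from pathlib import Path

# Names the wiki asks for, in the order the queries appear in
# data/human_genome_biomart_sequence_links.txt.
EXPECTED_FILES = ["coding_sequences.fa", "usg.fa", "exons.fa"]

def check_inputs(repo_root):
    """Return a list of problems with <repo_root>/seq (empty if it looks ready)."""
    seq_dir = Path(repo_root) / "seq"
    if not seq_dir.is_dir():
        return [f"missing directory: {seq_dir}"]
    problems = []
    for name in EXPECTED_FILES:
        path = seq_dir / name
        if not path.is_file():
            problems.append(f"missing file: {path}")
            continue
        with open(path, "rb") as fh:
            first = fh.read(1)
        if first != b">":  # empty file, or not FASTA (e.g. an error message)
            problems.append(f"not a FASTA file: {path}")
    return problems

def run_pipeline(repo_root):
    """Run bin/run.sh only if the inputs pass the checks; return its exit code."""
    problems = check_inputs(repo_root)
    for problem in problems:
        print(problem, file=sys.stderr)
    if problems:
        return 1
    # Assumption: run.sh is launched with bash from the repository root.
    return subprocess.call(["bash", "bin/run.sh"], cwd=repo_root)
```

For example, `run_pipeline("path/to/unravel_motifs")` either reports what is missing and returns 1, or returns the exit code of run.sh.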

Note:

BioMart seems to have some issues with URL queries like these. Unfortunately, this came to our attention very late in the project. If the script does not work, we recommend entering the queries manually at BioMart as they are described in human_genome_biomart_sequence_links.txt.
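If the URL queries fail only intermittently, retrying the download and rejecting non-FASTA responses can help separate a transient failure from a broken query. This Python sketch makes several assumptions. It assumes that human_genome_biomart_sequence_links.txt holds one query URL per line, each starting with http:// or https://, listed in the order coding_sequences.fa, usg.fa, exons.fa. This page does not describe the file's format, so check it before relying on the sketch. The names `query_urls` and `download_all` are hypothetical.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# Target names in the order the wiki gives them.
TARGET_NAMES = ["coding_sequences.fa", "usg.fa", "exons.fa"]

def query_urls(text):
    """Collect query URLs, assuming one per line starting with http(s)://.

    Characters that are not legal in a URL (spaces, <, >, quotes in an inline
    XML query) are percent-encoded; existing %-escapes are left alone.
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(("http://", "https://")):
            urls.append(urllib.parse.quote(line, safe=":/?&=%;,+"))
    return urls

def download_all(links_file, seq_dir, attempts=3):
    """Download each query into seq_dir, retrying until a FASTA file arrives."""
    urls = query_urls(Path(links_file).read_text())
    if len(urls) != len(TARGET_NAMES):
        raise ValueError(f"expected {len(TARGET_NAMES)} query URLs, found {len(urls)}")
    seq_dir = Path(seq_dir)
    seq_dir.mkdir(parents=True, exist_ok=True)
    for url, name in zip(urls, TARGET_NAMES):
        target = seq_dir / name
        for attempt in range(1, attempts + 1):
            try:
                urllib.request.urlretrieve(url, target)
            except OSError as err:  # URLError/HTTPError are OSError subclasses
                print(f"{name}: attempt {attempt} failed: {err}")
                continue
            with open(target, "rb") as fh:
                if fh.read(1) == b">":  # FASTA records start with '>'
                    break
            print(f"{name}: attempt {attempt} did not return FASTA")
        else:
            raise RuntimeError(f"could not download {name}; enter the query at BioMart manually")
```

When every attempt fails, the sketch stops and points back to the manual route described in the note.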
