Home
This project gathers human gene data and creates sequence logos from it to visualize motifs. The logos cover the regions before and after the translation start site, and the beginning and end of the first intron of a gene.
This project is part of the course Applied Bioinformatics at KTH.
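As an illustration of what a sequence logo encodes, the sketch below computes the information content (in bits) at each position of a set of equal-length DNA sequences. This is the quantity that usually sets the height of each column in a logo. It is a minimal sketch and not the code used in this project: the function name `information_content` is hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter


def information_content(sequences, alphabet="ACGT"):
    """Return the information content (bits) per position.

    Hypothetical helper for illustration only. Assumes all sequences
    are aligned and of equal length. Characters outside `alphabet`
    (for example N) are ignored when computing frequencies.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")

    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    result = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in sequences)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            # No usable bases in this column.
            result.append(0.0)
            continue
        # Shannon entropy of the observed base frequencies.
        entropy = 0.0
        for b in alphabet:
            p = counts[b] / total
            if p > 0:
                entropy -= p * math.log2(p)
        result.append(max_bits - entropy)
    return result
```

A fully conserved position (such as each base of a start codon that is always ATG) scores 2 bits, while a position where all four bases are equally frequent scores 0 bits.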
This is a brief explanation of the content in each directory of the project. This will be updated continuously as new directories are added.
This is the home of all experimenting and temporary files used by us to figure out how things work.
This is the home of executable scripts.
This is the home of our result files such as graphs and diagrams.
This is the home of data files, in our case mostly FASTA files and other genomic data files.
This is the home of all documentation such as the final report.
- Download this repository: unravel_motifs/
- Create the directory seq: unravel_motifs/seq
- Use the queries in unravel_motifs/data/human_genome_biomart_sequence_links.txt to download three sequence files in FASTA format.
- Put these files in unravel_motifs/seq and rename them to coding_sequences.fa, usg.fa and exons.fa (in the same order as the queries appear in the query file mentioned above). They should be approximately 1.4 GB in total.
- Run this script: unravel_motifs/bin/run.sh
- Results will be created in unravel_motifs/results/output
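Before running the script, it can help to confirm that the three downloads are in place. The following is a minimal sketch of such a check and is not part of the repository: the function name `check_sequence_files` is hypothetical. It assumes the file names from the steps above and that a valid FASTA download begins with a `>` header line.

```python
from pathlib import Path

# File names taken from the setup steps above.
EXPECTED_FILES = ("coding_sequences.fa", "usg.fa", "exons.fa")


def check_sequence_files(seq_dir):
    """Return (problems, total_bytes) for the expected FASTA files.

    Hypothetical helper for illustration only. A file is reported if it
    is missing or if its first byte is not '>', which may indicate that
    the download returned something other than FASTA.
    """
    seq_dir = Path(seq_dir)
    problems = []
    total_bytes = 0
    for name in EXPECTED_FILES:
        path = seq_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        total_bytes += path.stat().st_size
        with path.open("rb") as handle:
            if handle.read(1) != b">":
                problems.append(f"{name}: does not start with a FASTA header ('>')")
    return problems, total_bytes


if __name__ == "__main__":
    problems, total_bytes = check_sequence_files("unravel_motifs/seq")
    for problem in problems:
        print(problem)
    print(f"total size: {total_bytes / 1e9:.2f} GB")
```

If the reported total is far from the expected size, or a file does not start with `>`, the download probably did not complete as intended.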
Biomart seems to have some issues with URL queries like these. Unfortunately, this came to our attention very late in the project. If the script does not work, we recommend entering the queries manually at Biomart as they are described in human_genome_biomart_sequence_links.txt.