CBB752Spring2016 Final Project

About the Course

  • Title: Bioinformatics: Practical Application of Data Mining & Simulation

  • Instructor: Gerstein, Mark

  • Introduction: Bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine-learning approaches to data integration.

  • Check out our awesome course website.

  • Check out this awesome post of related bioinformatics topics.

About the Final Project

  • Why: Instead of generating papers or code that nobody would ever read (except for the TAs), we want to encourage the innovative generation of products that could potentially benefit the bioinformatics community.

  • When: Released on April 14th; due at 11:59pm on May 5th.

  • How: Each student will cooperate with classmates to work on three (or four, for extra credit) different projects. The resulting code and documents will be published on this website as resources for later students and researchers.

  • What: Project topics are as follows. Students can choose three to four favorite projects to work on.

For all sub-projects, your group will have to provide:

  1. R card: sample input, source code in R, sample output, and documentation on how to execute your code

  2. Python card: sample input, source code in Python, sample output, and documentation on how to execute your code

  3. English card: methodology and background introduction

Available Topics (Note that this is a draft. If you see an issue, come to us and we will edit it accordingly.)

1. QC steps

1.1 Propose a tool that removes the barcode or sequence identifier from a FastQ file. (Kevin)

1.2 Propose a tool that generates “quality control statistics” from a FastQ file.

1.3 Propose a tool that trims reads based on quality score from a FastQ file.
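
As one possible starting point for topic 1.3, the sketch below trims low-quality bases from the 3' end of a single read. It assumes Phred+33 quality encoding and a hypothetical threshold `min_q`; the function names are illustrative and are not part of any course-provided code.

```python
def phred_scores(quality, offset=33):
    """Decode a FastQ quality string into integer Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20, offset=33):
    """Drop trailing bases whose quality is below min_q.

    Returns the trimmed (sequence, quality) pair; both may be empty.
    """
    scores = phred_scores(quality, offset)
    end = len(scores)
    # Walk backwards from the 3' end until a base passes the threshold.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    # 'I' decodes to Q40 and '#' to Q2 under Phred+33.
    print(trim_3prime("ACGTAC", "IIII##"))
```

A full tool would apply this to every four-line FastQ record and would likely also discard reads that become too short; that policy is left open here.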

2. Sequence Analysis

2.1 Propose a tool that generates pileup format from a SAM file. (Kevin)

2.2 Propose a tool that calculates FPKM (or TPM, and justify your choice) from given SAM and GTF files. (Julian)

2.3 Propose a tool that calculates the intersection between two BED files.

2.4 Propose a tool that calls SNVs from a pileup file and generates the output in VCF format.

2.5 Propose a tool that calculates differentially expressed genes from a GCT file of gene expressions. (Edmond Dantes; Julian)

2.6 Propose a tool that finds k-mer motif enrichment from a given nucleotide sequence. (Edmond Dantes; Julian)
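
For topic 2.2, the difference between FPKM and TPM may be easier to weigh with the two normalizations side by side. The sketch below assumes that per-gene fragment counts and gene lengths (in bases) have already been derived from the SAM and GTF files, and that at least one fragment was counted; the dictionary inputs and function names are hypothetical.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())  # total counted fragments in the sample
    return {g: counts[g] * 1e9 / (lengths[g] * total) for g in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then rescale."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    scale = sum(rates.values())
    return {g: rates[g] / scale * 1e6 for g in counts}


if __name__ == "__main__":
    counts = {"geneA": 10, "geneB": 20}        # made-up example values
    lengths = {"geneA": 1000, "geneB": 2000}   # gene lengths in bases
    print(fpkm(counts, lengths))
    print(tpm(counts, lengths))
```

TPM values sum to one million in every sample, whereas the sum of FPKM values varies from sample to sample, which is one argument a group could use when justifying its choice.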

3. Network Analysis

3.1 Propose a tool that calculates a co-expressed gene network from a GCT file of gene expressions. (Edmond Dantes; Julian; Kevin)

3.2 Propose a tool that calculates the degree centrality and betweenness centrality of each node from a PPI file. PPI data can be downloaded from the DIP, BIND, MIPS, MINT, and IntAct databases. (Edmond Dantes; Julian; Kevin)

3.3 Propose a tool that calculates the enrichment level of gene expression data given pre-defined gene sets (http://software.broadinstitute.org/gsea/msigdb). (Edmond Dantes)
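
For topic 3.2, degree centrality is the simpler of the two measures and can be sketched in a few lines. The example below assumes the PPI file has already been parsed into a list of protein pairs treated as an undirected graph; the input format and the function name are assumptions. Betweenness centrality needs shortest-path counting and is not covered here.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return degree / (n - 1) for every node in an undirected edge list.

    Duplicate edges are counted once and self-interactions are ignored.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            neighbors.setdefault(a, set())  # keep the node, skip the loop
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


if __name__ == "__main__":
    # Made-up interactions between four hypothetical proteins.
    ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
    for node, score in sorted(degree_centrality(ppi).items()):
        print(node, round(score, 3))
```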

4. Structure Analysis

4.1 Propose a tool that calculates the distance between two alpha carbons from a PDB file. (The program should output the distance between the two atoms in angstroms.) (Nathan)

4.2 Propose a tool that calculates the Lennard-Jones potential based on the input of a PDB file consisting of just alpha carbons and a query point’s xyz coordinates. (Nathan)

4.3 Propose a tool that calculates the dihedral angle based on the input of four points’ xyz coordinates in PDB format. (Kevin; Nathan)
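
The geometric quantities in topics 4.1 to 4.3 can be sketched as follows. The example assumes coordinates have already been read from the PDB file into `(x, y, z)` tuples in angstroms. The Lennard-Jones parameters `epsilon` and `sigma` are placeholders that each group would have to choose and justify, and summing the pairwise potential over all alpha carbons is one interpretation of topic 4.2, not a stated requirement.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(p, q):
    """Euclidean distance between two points (requires Python 3.8+)."""
    return math.dist(p, q)


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four points."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Pairwise 12-6 potential; epsilon and sigma are placeholder values."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_lennard_jones(query, atoms, epsilon=1.0, sigma=1.0):
    """Sum the pairwise potential between a query point and every atom."""
    return sum(lennard_jones(distance(query, a), epsilon, sigma) for a in atoms)


if __name__ == "__main__":
    print(distance((0, 0, 0), (3, 4, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The dihedral is computed with `atan2` rather than `acos` so that the sign of the angle is preserved and the result stays stable near 0 and 180 degrees.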

Next Steps to Do

  • Edit this page! Fix it and make it better and better!

  • Pick your topics!

  • Form your groups!

  • Make specs for topics you are interested in! You can even modify the topic if you find it too difficult.