String-Graph-Assembly

CS 225 Extreme Extra Credit Projects

Project Summary

Our project attempts to reproduce the genome of the bacterium S. aureus, using data sourced from the National Library of Medicine's Sequence Read Archive. The data is processed and a De Bruijn graph is constructed; all edges of weight one are removed, and the graph is traversed. An FM-index is then constructed for the k-mers visited during the traversal, and the most frequently repeated patterns are identified and extracted using the backward search algorithm. Finally, the generated string is compared to the output genome using the Needleman-Wunsch global alignment algorithm.
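To make the final comparison step concrete, here is a minimal sketch of Needleman-Wunsch scoring. The function name `NeedlemanWunschScore` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration and are not taken from the project's code. The sketch returns only the optimal score, not the alignment itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: optimal global alignment score of a and b.
// The default scoring values are illustrative assumptions.
int NeedlemanWunschScore(const std::string& a, const std::string& b,
                         int match = 1, int mismatch = -1, int gap = -1) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // score[i][j] = best score for aligning a[0..i) with b[0..j).
    std::vector<std::vector<int>> score(n + 1, std::vector<int>(m + 1, 0));

    // Aligning a prefix against the empty string costs one gap per character.
    for (std::size_t i = 1; i <= n; ++i) score[i][0] = static_cast<int>(i) * gap;
    for (std::size_t j = 1; j <= m; ++j) score[0][j] = static_cast<int>(j) * gap;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            // Either align a[i-1] with b[j-1], or insert a gap in one string.
            const int diag = score[i - 1][j - 1] +
                             (a[i - 1] == b[j - 1] ? match : mismatch);
            const int up = score[i - 1][j] + gap;
            const int left = score[i][j - 1] + gap;
            score[i][j] = std::max({diag, up, left});
        }
    }
    return score[n][m];
}
```

With these assumed scores, a higher result means the generated string is closer to the genome it is compared against. The full table uses memory proportional to the product of the two lengths, so a genome-scale comparison would likely need a more memory-efficient variant.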

Code

All code files can be found in the code/ directory. To run the code:

Compile using make exec
Run using ./bin/exec
Enter the data set to be used, e.g. data/small.fasta
Enter the length of the k-mers to be used, e.g. 7
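The k-mer length entered in the last step controls how the reads are split when the graph is built. As a rough sketch of that idea, and not the project's actual DeBruijnGraph class, the code below treats each k-mer as an edge from its (k-1)-length prefix to its (k-1)-length suffix, counts how often each edge occurs, and then drops edges seen only once. The names `EdgeWeights`, `BuildDeBruijnEdges`, and `RemoveSingleWeightEdges` are hypothetical, as is the choice of (k-1)-mers as nodes.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: (prefix, suffix) edge -> number of occurrences.
using EdgeWeights = std::map<std::pair<std::string, std::string>, int>;

// Slide a window of length k over every read and count each k-mer as an edge
// between its (k-1)-length prefix and its (k-1)-length suffix.
EdgeWeights BuildDeBruijnEdges(const std::vector<std::string>& reads,
                               std::size_t k) {
    EdgeWeights edges;
    if (k < 2) return edges;  // nodes need at least one character
    for (const std::string& read : reads) {
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            const std::string prefix = read.substr(i, k - 1);
            const std::string suffix = read.substr(i + 1, k - 1);
            ++edges[std::make_pair(prefix, suffix)];
        }
    }
    return edges;
}

// Remove every edge that was observed exactly once.
void RemoveSingleWeightEdges(EdgeWeights& edges) {
    for (auto it = edges.begin(); it != edges.end();) {
        if (it->second == 1) {
            it = edges.erase(it);
        } else {
            ++it;
        }
    }
}
```

For the reads ACGT and ACGA with k = 3, the edge AC -> CG is seen twice and survives pruning, while CG -> GT and CG -> GA are each seen once and are removed.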

A few test cases check the construction of the DeBruijnGraph and the ReadFile method. To run them, compile with make tests and then run ./bin/tests.

Data

Our data is originally sourced from the Sequence Read Archive (SRA), where it is available for download.

The data files on this repo are subsets of the original file.

data/small.fasta
data/evensmaller.fasta
data/smallest.fasta is not a subset of the original file and was updated throughout the project to test the functionality of the code.

Output

The graph and the other generated data are also written to the data/ directory. Running the program generates data/outputgraph.txt.

Documents

Our signed contract and development log can be found in the documents/ directory.

Feedback

All feedback from our project mentor can be found in the feedback/ directory.
