Skip to content

ilkhom19/dna_pattern_recognition_algo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Substring Matching Algorithm in C++ / Java / Python

The Karp-Rabin algorithm aims to use skillful modulo and arithmetic operations as well as hashing functions to decrease the time complexity of matching exact substring of the text. Here is the application in Genetics: Looking for DNA sequnces in the precalculated Data Set of DNA sequence

Input:

  • The first line of the input contains the long DNA Sequence.The DNA Sequence cannot be empty.
  • The next line contains two integers 1 <= N <= 1000 and 1 <= L <= 1000 representing the number of short DNA queries in the predefined set and their length.
  • Each of the next N lines contains a DNA sequence consisting of exactly L characters.

Example:

  • Input:

GCTCTAAGGACAATTACGGAGTGGGCGGCCTGGCGGGAGCACTACCCCATCGACGCGTACTCGAATACTGTATATTGCTCTCACATGAACAAATTAGTAGAGTGCCGCTTTCAGCCCCCCTGTCGTCGGCGACGTCTGTAAAATGGCGTTGATGTGGATCGACTCTATAGAGGCATCTACTGATGCGTAGGGAGATCCGGAATGTATTGGCCTATGTCACTGAAACTGTCCAAACACCCCATGTCGTTACTGAACGTATCGACGCATACCTCCTTCGTTGAGAACTCACAATTATACAACTGGGGACATAATCCCTACGCCCATCTTCTACACCCGTCTCTGTGGGTCCAGTTCAAGTGCTGGGAGAGCATCCTCCACAAGGTCTAGTGGTATGGTGGTATAGTAAGCTCGTACTGTGATACATGCGACAGGGGTAAGACCATCAGTAGTAGGGATAGTGCCAAAGCTCACTCACCACTGCCTATAAGGGGTGCTTACCTCTAGAATAAGTGTCAGCCAGTATAACCCCATGAGGAACCGAAAAGGCGAACCGGGCCAGACAATCCGGAGGCACCGGGCTCAAAGCCGCGACACGACGGCTCACAGCCGGTAAGAGTAACCCCGGAGTGAACACCTATGGGGCTGGATAAAACTGCCCTGGTGACCGCCATCAACAACCCGAATACGTGGCATTTCAGGAGGCGGCCGGAGGGGGGATGTTTTCTACTATTCGAGGCCGTTCGTTATAACTTGTTGCGTTCCTAGCCGCTATATTTGTCTCTTTGCCGACTAATGAGAACAACCACACCATAGCGATTTGACGCAGCGCCTCGGAATACCGTATCAGCAGGCGCCTCGTAAGGCCATTGCGAATACCAGGTATCGTGTAAGTAGCGTAGGCCCGTACGCGAGATAAACTGCTAGGGAACCGCGTCTCTACGACCGGTGCTCGATTTAATTTCGCCGACGTGATGACATTCCAGGCAGTGCCTCTGCCGCC
6 10
CCGCTGTGCC
CATCCTAGTA
GCCATCAACA
CTATGACCCT
GGTAAGAGTA
CGGTGGCGGA

  • Output:

The sequence: 'CCGCTGTGCC' not found
The sequence: 'CATCCTAGTA' not found
The sequence: 'GCCATCAACA' found at the position: 667
The sequence: 'CTATGACCCT' not found
The sequence: 'GGTAAGAGTA' found at the position: 609
The sequence: 'CGGTGGCGGA' not found

About

Substring Matching Algorithm in O(n) time.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published