Convert fasta into a Numeric Summarization Vector (NSV) #1895

Open
solshiferaw opened this issue Sep 26, 2019 · 2 comments

@solshiferaw

solshiferaw commented Sep 26, 2019

I want to convert a FASTA file to an NSV for k-mer frequency counts. What code needs to be written in Python, and how do I load the file? Thank you!

@standage
Member

Hi Solshi!

Could you please describe with a bit more detail what the contents of the NSV will be and what sequence characteristics they will summarize? An example of what you expect this vector to look like would help as well.

@solshiferaw
Author

Hi,
My NSV file would contain nucleotide and k-mer frequencies.
Suppose the FASTA file contains:

RF00050|AECL01000049.1/43972-43822
GGUUGUUCUCAGGGCGGGGUGCAAUUCCCCACCGG
RF00050|CP000628.1/2430019-2430165
GACCGUUCUCAGGGCGGGGUGAGAUUCCCCAC
I want to convert this to k-mer frequency counts (a, aa, aaa, aaaaa, aaaaaa, ...; aa, au, cua, cug, ggg, aaac, ...)
and then turn those k-mer frequencies into a vector to analyse in Weka.
Thank you!
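
One way the conversion described above could be sketched in plain Python is shown below. It is only an illustration under several assumptions that the thread does not confirm: header lines in the file start with `>` (standard FASTA), the alphabet is `ACGU`, every k-mer from length 1 up to a chosen `max_k` becomes one column, and a CSV file is an acceptable input for Weka. The function names (`read_fasta`, `kmer_counts`, `feature_names`, `nsv`, `write_csv`) are hypothetical helpers made up for this example, not part of any library.

```python
import csv
import io
import sys
from collections import Counter
from itertools import product


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle.

    Assumes header lines start with '>'; sequence lines are upper-cased.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def kmer_counts(seq, k):
    """Count overlapping k-mers of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_names(alphabet="ACGU", max_k=3):
    """List every k-mer for k = 1..max_k in a fixed, reproducible order."""
    return ["".join(p)
            for k in range(1, max_k + 1)
            for p in product(alphabet, repeat=k)]


def nsv(seq, names):
    """Return the counts for seq, ordered like names (one value per k-mer)."""
    counts = Counter()
    for k in sorted({len(name) for name in names}):
        counts.update(kmer_counts(seq, k))
    # k-mers that never occur (or contain other letters) simply count as 0.
    return [counts[name] for name in names]


def write_csv(records, out, max_k=3):
    """Write one row per sequence: its id followed by its k-mer counts."""
    names = feature_names(max_k=max_k)
    writer = csv.writer(out)
    writer.writerow(["id"] + names)
    for header, seq in records:
        writer.writerow([header] + nsv(seq, names))


if __name__ == "__main__":
    # The two records from the comment above, with '>' headers assumed.
    example = io.StringIO(
        ">RF00050|AECL01000049.1/43972-43822\n"
        "GGUUGUUCUCAGGGCGGGGUGCAAUUCCCCACCGG\n"
        ">RF00050|CP000628.1/2430019-2430165\n"
        "GACCGUUCUCAGGGCGGGGUGAGAUUCCCCAC\n"
    )
    write_csv(read_fasta(example), sys.stdout, max_k=2)
```

To load a real file instead of the inline example, the same functions could be called as `write_csv(read_fasta(open("input.fasta")), open("nsv.csv", "w", newline=""))`, where both file names are placeholders. The number of columns grows as 4 + 16 + ... + 4^max_k, so a large `max_k` produces very wide and mostly zero vectors; dividing each count by the number of k-mers of that length in the sequence would give frequencies rather than raw counts.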
