
Fasta to SampleData #674

Answered by hyanwong
reedacartwright asked this question in Q&A

Alternatively, if you just want to read it in as a massive matrix:

import numpy as np
import tsinfer
# Slurp it all into a big matrix. This assumes each sequence sits on a single
# line; otherwise, first join the wrapped sequence lines, keeping only the
# newlines that end a ">" header line and those that come right before one.
binary_data = np.genfromtxt(
    "tmp.fasta",
    comments=">",  # ignore any lines starting with ">"
    delimiter=1,  # integer delimiter = fixed field width: one char per value
    dtype=int,
)

with tsinfer.SampleData(
    path="my_data.samples",
    sequence_length=binary_data.shape[1]
) as sd:
    for pos, column in enumerate(binary_data.T):  # iterate over transposed matrix
        sd.add_site(pos, column)
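
If the sequences in your FASTA file are wrapped over several lines, you don't need a text editor to join them. Here is a rough sketch using only the standard library. The function name `unwrap_fasta` and the sample records are made up for illustration, and it assumes every record starts with a ">" header line.

```python
import io


def unwrap_fasta(lines):
    """Yield (header, sequence) pairs, joining sequences wrapped over several lines.

    Hypothetical helper: lines that appear before the first ">" header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


# Made-up example input: two records, each wrapped over two lines
wrapped = io.StringIO(">s1\n0101\n10\n>s2\n1100\n01\n")
records = list(unwrap_fasta(wrapped))

# Rebuild a FASTA text with exactly one line per sequence
single_line_fasta = "".join(f">{name}\n{seq}\n" for name, seq in records)
```

You could then write `single_line_fasta` to a file and read that file with the `np.genfromtxt` call above.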

Answer selected by reedacartwright