masked DNA strings #32

kchu25 · 2022-10-14T16:39:26Z

Some DNA strings in the datasets consist partially or entirely of masked bases. For example, the 7th sequence in the DemoHumanOrWorm training set (checked via dset[6]) is a string of 'NNNNNNN....NNNN'. Maybe consider extracting the DNA strings from the unmasked genome?
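
A minimal sketch of how such sequences could be spotted, assuming the sequences have already been pulled out of the dataset into a plain list of strings. The helper names `n_fraction` and `find_masked` and the 0.5 threshold are hypothetical, chosen for illustration, and are not part of the library:

```python
def n_fraction(seq: str) -> float:
    """Return the fraction of characters in seq that are 'N' (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def find_masked(seqs, threshold=0.5):
    """Return indices of sequences whose N fraction is at least threshold.

    The threshold is an assumed cutoff, not a value taken from the datasets.
    """
    return [i for i, s in enumerate(seqs) if n_fraction(s) >= threshold]


# Toy data standing in for the real sequences.
toy = ["ACGTACGT", "NNNNNNNN", "ACGNNNNN"]
print(find_masked(toy))
```

Running something like this over each dataset would show how many entries are affected and whether they are fully or only partially masked.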

simecek · 2022-10-14T18:50:24Z

I believe we use the unmasked genome, but I will look into that. It might still be that we hit the beginning or end of chromosomes, which are often unknown. Maybe we should check the randomly chosen sequences and remove long all-N sequences.
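
One way that check might look, as a rough sketch only: measure the longest run of consecutive Ns in each sampled sequence and drop any sequence that exceeds a limit. The function names and the `max_run=10` default are assumptions for illustration, not the project's actual sampling code:

```python
import re


def longest_n_run(seq: str) -> int:
    """Return the length of the longest run of consecutive 'N' characters."""
    runs = re.findall(r"N+", seq.upper())
    return max((len(r) for r in runs), default=0)


def drop_long_n_runs(seqs, max_run=10):
    """Keep only sequences whose longest N run is at most max_run (assumed limit)."""
    return [s for s in seqs if longest_n_run(s) <= max_run]


# Toy example: the middle sequence contains a run of 20 Ns and is dropped.
sampled = ["ACGT", "A" + "N" * 20 + "C", "ACNNGT"]
print(drop_long_n_runs(sampled))
```

Filtering on run length rather than total N count keeps sequences with a few scattered unknown bases while still removing unassembled regions such as chromosome ends.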
