I haven't tested anything, but I'm getting more and more interested in using SQLite for quick BAM queries, because it could also scale to larger queries using BigQuery or by parallelizing per library. I'm curious about your thoughts on using a Levenshtein distance function inside SQLite. I guess it depends on how long the sequences are, and on whether you're matching very small subsequences or the majority of each read. For single-cell and adjacent applications, it could be nice to match cell barcodes more accurately, since some programs just use simple Hamming distance, which doesn't handle indels at all. Or maybe a variant of this.
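To make the barcode idea concrete, here is a rough, untested-at-scale sketch using Python's standard `sqlite3` module. It assumes no edit-distance function is already available in the SQLite build, so it registers a pure-Python one as a user-defined SQL function. The table name `whitelist`, the column `barcode`, and the helpers `open_whitelist` and `nearest_barcodes` are all made up for illustration.

```python
import sqlite3


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, and deletions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free on a match)
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Mismatches at aligned positions; only defined for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def open_whitelist(barcodes):
    """Hypothetical helper: in-memory DB with a `whitelist(barcode)` table."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name `levenshtein`.
    conn.create_function("levenshtein", 2, levenshtein)
    conn.execute("CREATE TABLE whitelist (barcode TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO whitelist (barcode) VALUES (?)",
        [(bc,) for bc in barcodes],
    )
    return conn


def nearest_barcodes(conn, observed, max_dist=1):
    """Return (barcode, distance) rows within max_dist edits of `observed`."""
    return conn.execute(
        "SELECT barcode, levenshtein(barcode, ?) AS dist FROM whitelist "
        "WHERE levenshtein(barcode, ?) <= ? ORDER BY dist, barcode",
        (observed, observed, max_dist),
    ).fetchall()


if __name__ == "__main__":
    conn = open_whitelist(["ACGTACGT", "TTGGCCAA"])
    # "ACTACGTA" is "ACGTACGT" with one base deleted and the next base
    # of the read shifted in at the end.
    print(hamming("ACGTACGT", "ACTACGTA"))
    print(nearest_barcodes(conn, "ACTACGTA", max_dist=2))
```

A single deletion shifts every downstream base, so Hamming distance treats the read as almost entirely mismatched, while edit distance sees two edits. Note that the user-defined function is evaluated for every row, so this is a linear scan rather than an indexed lookup. That is probably fine for a barcode whitelist but would not by itself scale to searching whole reads.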
We'd like to add sequence content indexing, to answer queries for stored DNA/RNA [sub]sequences with similarity to a given one. I'd like advice from the community on what data structures / algos should implement this.
For a table target_table where each row has a column storing DNA/RNA text, after some sort of indexing we can query it like
where query_sequence is a literal DNA/RNA text.
Wish list:
Non-goals: