I noticed strange behavior: when I change `process.extract(query, limit=k)` from `k=10` to `k=1000`, I get different top-k results (the results are better with the higher `k`).
Expected behavior: the top 10 matches should be the same for `limit=10` and `limit=1000`.
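The expected behavior amounts to a prefix-consistency check: the first `k` results of a call with a large limit should carry the same scores as a call with `limit=k`. Below is a minimal sketch, assuming a hypothetical `extract(query, limit)` callable that returns `(option, score)` pairs, for example a wrapper such as `lambda q, n: process.extract(q, limit=n)`. `make_exact_extract` is a hypothetical brute-force extractor used only to exercise the check.

```python
from typing import Callable, List, Sequence, Tuple

Match = Tuple[str, float]


def top_k_is_stable(
    extract: Callable[[str, int], List[Match]], query: str, k: int, big_k: int
) -> bool:
    """Return True if limit=k and the first k results of limit=big_k agree.

    `extract` is a hypothetical stand-in for a call like process.extract.
    """
    small = extract(query, k)
    large = extract(query, big_k)[:k]
    # Compare scores rather than strings so that equally scored ties,
    # which may legitimately be ordered differently, are not flagged.
    return [score for _, score in small] == [score for _, score in large]


def make_exact_extract(options: Sequence[str]) -> Callable[[str, int], List[Match]]:
    """Hypothetical exact extractor: score = number of matching positions.

    Assumes equal-length strings such as the 20-nt UMIs (zip truncates otherwise).
    """
    def extract(query: str, limit: int) -> List[Match]:
        scored = [(opt, float(sum(a == b for a, b in zip(query, opt)))) for opt in options]
        scored.sort(key=lambda m: m[1], reverse=True)  # best score first
        return scored[:limit]

    return extract
```

An exact search ranks every option the same way regardless of `limit`, so it always passes this check; an extractor that fails it is returning approximate results whose quality depends on `limit`.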
I can confirm this behavior. I need to compare the UMI of a sequence against a long list of known UMIs, and as soon as I change the `limit` of the search, the results change.
Here is example code to reproduce it:
```python
import random

from neofuzz import char_ngram_process

letters = "ATGC"

# Build 1000 random 20-nt UMIs to index and 1000 more to use as queries.
umi_list = []
umi_check_list = []
for _ in range(1000):
    umi_list.append("".join(random.choice(letters) for _ in range(20)))
    umi_check_list.append("".join(random.choice(letters) for _ in range(20)))

process = char_ngram_process()
process.index(umi_list)

# Query random UMIs until the best match differs between limit=10 and limit=1000.
difference = False
while not difference:
    umi = random.choice(umi_check_list)
    found_ten = process.extract(umi, limit=10, refine_levenshtein=True)
    found_thousand = process.extract(umi, limit=1000, refine_levenshtein=True)
    # [0][1] is the score of the top-ranked match
    if found_ten[0][1] != found_thousand[0][1]:
        print(found_ten)
        print(found_thousand)
        difference = True
```
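To tell which of the two outputs holds the true best match, one option is an exact brute-force search over the same list. The sketch below is a plain-Python reference, not part of neofuzz: `levenshtein` and `exact_top_k` are hypothetical helpers, and it returns raw edit distances (lower is better), which need not be on the same scale as the scores neofuzz reports, so compare the matched strings rather than the numbers.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def exact_top_k(query: str, options: List[str], k: int) -> List[Tuple[str, int]]:
    """Brute-force top-k by edit distance: slow (scores every option) but exact."""
    scored = [(opt, levenshtein(query, opt)) for opt in options]
    scored.sort(key=lambda m: m[1])  # stable sort: ties keep their input order
    return scored[:k]
```

With the variables from the reproduction above, `exact_top_k(umi, umi_list, 10)` gives the ground-truth ranking; because it scores every option, its first 10 results are identical whether `k` is 10 or 1000.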