Changing limit parameter influences top k results #10

Open
SeanPedersen opened this issue Oct 26, 2024 · 2 comments

Comments

@SeanPedersen

I noticed some strange behavior: when I change the limit in process.extract(query, limit=k) from k=10 to k=1000, I get different top k results (better for higher k).

Expected behavior: the top 10 matches should be the same for limit=10 and limit=1000.

@x-tabdeveloping
Owner

Eerie... Can you give me a minimal reproducible example of this so I can investigate?

@Glombsen

Hey

I can confirm this behavior. I have to compare the UMI of a sequence with a long list of known UMIs. As soon as I change the limit of the search, the results change.

Here is some example code to reproduce it:

```python
import random

from neofuzz import char_ngram_process

umi_list = []
umi_check_list = []
letters = "ATGC"

# Build 1000 random 20-letter UMIs to index and 1000 more to use as queries.
for _ in range(1000):
    umi_list.append("".join(random.choice(letters) for _ in range(20)))
    umi_check_list.append("".join(random.choice(letters) for _ in range(20)))

process = char_ngram_process()
process.index(umi_list)

# Query random UMIs until the best match differs between limit=10 and limit=1000.
difference = False
while not difference:
    umi = random.choice(umi_check_list)
    found_ten = process.extract(umi, limit=10, refine_levenshtein=True)
    found_thousand = process.extract(umi, limit=1000, refine_levenshtein=True)
    # Compare the score of the top hit (index 1 of the first result).
    if found_ten[0][1] != found_thousand[0][1]:
        print(found_ten)
        print(found_thousand)
        difference = True
```

Hope it helps, and thanks for this great module!
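
One possible explanation, not confirmed by the maintainer in this thread: if the search first fetches `limit` candidates using a cheap n-gram similarity and only then re-ranks those candidates by Levenshtein distance, the top results would depend on how many candidates were fetched. The sketch below illustrates that effect in plain Python. It does not use neofuzz or reproduce its internals; `coarse_score`, `levenshtein`, and `two_stage_search` are hypothetical helpers written only for this illustration.

```python
import random
from collections import Counter


def ngrams(text, n=2):
    """Count the character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def coarse_score(a, b):
    """Cheap similarity: number of character bigrams shared by a and b."""
    return sum((ngrams(a) & ngrams(b)).values())


def levenshtein(a, b):
    """Edit distance computed with the usual row-by-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def two_stage_search(query, corpus, limit):
    """Keep `limit` candidates by coarse score, then re-rank them by edit distance."""
    by_coarse = sorted(corpus, key=lambda s: coarse_score(query, s), reverse=True)
    candidates = by_coarse[:limit]
    return sorted(candidates, key=lambda s: levenshtein(query, s))


rng = random.Random(0)
corpus = ["".join(rng.choice("ATGC") for _ in range(20)) for _ in range(300)]
query = "".join(rng.choice("ATGC") for _ in range(20))

small_pool = two_stage_search(query, corpus, limit=10)
large_pool = two_stage_search(query, corpus, limit=300)

# The best hit from the larger pool can never be worse, and may be better.
print(levenshtein(query, small_pool[0]), small_pool[0])
print(levenshtein(query, large_pool[0]), large_pool[0])
```

In a pipeline of this shape, the larger candidate pool always contains the smaller one, so its best edit distance is at least as good, which would match the "better for higher k" observation. If that is what is going on, requesting a larger `limit` and slicing the result to the first k entries might serve as a workaround until the maintainer weighs in.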
