
Generalized Thresholding for ColBERT Scores Across Datasets #376

Open

FaisalAliShah opened this issue Oct 31, 2024 · 0 comments

I’m currently working with ColBERT for document re-ranking and facing challenges in applying a generalized threshold to ColBERT scores across different datasets. Due to the variability in score ranges, it’s difficult to set a fixed threshold for relevance filtering. Unlike typical embedding similarity scores, ColBERT’s late interaction mechanism produces scores that can vary significantly based on query length, token distributions, and dataset characteristics.
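For context, here is a minimal sketch of the scoring behavior in question, assuming the standard late-interaction (MaxSim) formulation where the score is a sum of one max-similarity term per query token; the random embeddings are stand-ins, not real ColBERT output:

import numpy as np

rng = np.random.default_rng(0)

# 300 document-token embeddings, L2-normalized so dot products are cosines.
doc = rng.normal(size=(300, 128))
doc /= np.linalg.norm(doc, axis=1, keepdims=True)

def colbert_score(query_tokens, doc_tokens):
    # One MaxSim term per query token, then summed: more query tokens
    # means a larger raw score, all else being equal.
    sims = query_tokens @ doc_tokens.T  # (n_query, n_doc) similarities
    return float(sims.max(axis=1).sum())

for n_q in (4, 8, 16):
    q = rng.normal(size=(n_q, 128))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    print(n_q, round(colbert_score(q, doc), 2))  # raw score grows with n_q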

I tried applying min-max normalization to the scores returned for a particular query, but it turned out that even irrelevant searches produced seemingly confident results, because I was taking min_score and max_score from that query's own responses: the top hit is always normalized to 1.0, no matter how low its absolute score is.
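A minimal sketch of that pitfall; the scores below are made-up ColBERT values, but the effect is general:

def minmax_normalize(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

relevant_hits = [31.2, 28.7, 25.1]    # query with true matches
irrelevant_hits = [12.4, 11.9, 11.5]  # query with no true matches

print(minmax_normalize(relevant_hits))    # [1.0, 0.59, 0.0]
print(minmax_normalize(irrelevant_hits))  # [1.0, 0.44, 0.0] -- same range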

Here are some of the approaches I’ve considered, but each has limitations when applied generally (the first and third are sketched in code after the list):

  • Normalizing scores by query length or token count
  • Rescaling scores based on observed min-max values in different datasets
  • Z-score normalization based on empirical mean and variance across datasets
  • Using adaptive thresholds or lightweight classifiers to predict relevance
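A hedged sketch of the first and third approaches, assuming raw scores are sums of per-query-token MaxSim values; `raw_score` and `background_scores` are illustrative names, not part of any library API:

import statistics

def length_normalize(raw_score, num_query_tokens):
    # Average MaxSim per query token, so scores are comparable
    # across queries of different lengths.
    return raw_score / max(num_query_tokens, 1)

def zscore(raw_score, background_scores):
    # Standardize against an empirical sample collected offline for a
    # given dataset (e.g. scores of random query/document pairs).
    mu = statistics.mean(background_scores)
    sigma = statistics.stdev(background_scores)
    return (raw_score - mu) / sigma

background = [11.2, 12.8, 10.5, 13.1, 11.9, 12.4]  # hypothetical sample
print(length_normalize(24.0, 8))   # 3.0 average per-token similarity
print(zscore(24.0, background))    # how unusual 24.0 is for this dataset

Both helpers still depend on dataset-specific statistics, which is exactly the generalization problem described above.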

However, each approach tends to be dataset-specific, and I would like a solution that can generalize effectively across datasets. Do you have any recommended strategies for achieving a more standardized scoring range or threshold? Alternatively, is there any built-in functionality planned (or that I might have missed) for scaling or calibrating ColBERT scores in a more generalizable way?

Any guidance or suggestions would be greatly appreciated! I have attached a code snippet below showing how I am using it.

Thank you for the fantastic work on ColBERT.

from qdrant_client import models

# Prefetch both dense and sparse candidates when running in hybrid mode.
prefetch = [
    models.Prefetch(
        query=dense_embedding,
        using=dense_vector_name,
        limit=20,
    ),
    models.Prefetch(
        query=sparse_embedding,
        using=sparse_vector_name,
        limit=20,
    ),
]

# HYBRID: re-rank the prefetched candidates with the ColBERT multivector;
# DENSE: query the dense vector directly, with no prefetch stage.
search_results = self.qdrant_client.query_points(
    collection_name=kwargs["collection_name"],
    prefetch=prefetch if Config.RETRIEVAL_MODE == QdrantSearchEnums.HYBRID.value else None,
    query=dense_embedding if Config.RETRIEVAL_MODE == QdrantSearchEnums.DENSE.value else colbert_embedding,
    using=dense_vector_name if Config.RETRIEVAL_MODE == QdrantSearchEnums.DENSE.value else colbert_vector_name,
    with_payload=True,
    limit=10,
    # score_threshold=17,  # a fixed cutoff like this does not transfer across datasets
).points
return search_results
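As an alternative to the commented-out fixed score_threshold, one option is to filter the returned points post hoc against per-collection statistics. A hedged sketch, assuming `background_scores` is a sample of ColBERT scores for random query/document pairs collected offline for the collection; the 2.0 cutoff is illustrative:

import statistics

def filter_by_zscore(points, background_scores, z_cutoff=2.0):
    # Keep only points whose score is unusually high relative to the
    # background score distribution for this dataset.
    mu = statistics.mean(background_scores)
    sigma = statistics.stdev(background_scores)
    return [p for p in points if (p.score - mu) / sigma >= z_cutoff]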
