Disable IDF (inverse document frequency) per field #131

Open
mistyn8 opened this issue Oct 14, 2019 · 4 comments
Comments

@mistyn8

mistyn8 commented Oct 14, 2019

{ Category: content, LuceneQuery: (hideFromSearch:0 +(__NodeTypeAlias:dtcontenttile) +(tileContentOrigination:external^31.0 tileContentOrigination:partner^32.0 tileContentOrigination:originator^33.0)) }

So I'm trying to artificially boost page scores by type. However, because the lowest-boosted type also has the fewest nodes, its score is inflated by IDF, so it ends up first in the results instead of last. Is there any way to alter that? Ta.
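
To make the effect described above concrete, here is a minimal, self-contained sketch in plain Java (no Lucene dependency). It assumes the classic TF-IDF formula `idf = 1 + ln(numDocs / (docFreq + 1))` and a per-term contribution proportional to `boost * idf^2`, with term frequency and length norm held at 1. The document counts are hypothetical, chosen only for illustration; the boosts are the ones from the query.

```java
public class IdfDemo {
    // Classic TF-IDF style idf (assumed formula): rarer terms get a larger value.
    public static double classicIdf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    // "Disabled" idf: every term is treated as equally common.
    public static double flatIdf(long docFreq, long numDocs) {
        return 1.0;
    }

    // Simplified per-term contribution: boost * idf^2 (tf and norm assumed to be 1).
    public static double termScore(double boost, double idf) {
        return boost * idf * idf;
    }

    public static void main(String[] args) {
        long numDocs = 10_000;              // hypothetical index size
        String[] types = {"external", "partner", "originator"};
        double[] boosts = {31.0, 32.0, 33.0}; // boosts from the query in this issue
        long[] docFreqs = {10, 500, 5_000};   // hypothetical node counts per type

        for (int i = 0; i < types.length; i++) {
            double classic = termScore(boosts[i], classicIdf(docFreqs[i], numDocs));
            double flat = termScore(boosts[i], flatIdf(docFreqs[i], numDocs));
            System.out.printf("%-10s classic=%.1f flat=%.1f%n", types[i], classic, flat);
        }
    }
}
```

Under these assumptions the rare `external` type outscores `originator` with the classic idf despite its lower boost, while a constant idf leaves the ordering to the boosts alone.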

@Shazwazza
Owner

Sounds like you know more about this subject than I do ;) I'm not really sure so if you feel like debugging into the cause (prob easiest with a unit test in the solution, there's plenty of examples to get started with) that would be great.

@mistyn8
Author

mistyn8 commented Oct 15, 2019

Just a 20min google to try to understand how the scoring worked..
http://www.lucenetutorial.com/advanced-topics/scoring.html

Seems to suggest we can override the idf, though I'd have little idea how to do it.

Also found
https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/

But as Lucene.Net 3.0.3 was released about 5 years ago, I'm not sure whether that means there is no BM25 support. I can't find anything that says which native Lucene version a given Lucene.Net version corresponds to (I think BM25 started in Lucene 6?)

@bielu
Contributor

bielu commented Oct 25, 2019

@mistyn8 I looked into BM25: it was introduced in Lucene 4.0.0, so it is not available in older versions of Lucene.

@captainjackrana

captainjackrana commented Nov 29, 2023

In Solr, which is built on Lucene, you need to define a field type with a custom similarity class and use that type for the field.

Something like

<fieldType name="custom_txt" class="solr.TextField" positionIncrementGap="100">
      <similarity class="com.MySimilarityClass"/>
</fieldType>

The custom similarity class

import org.apache.lucene.search.similarities.ClassicSimilarity;

public class MySimilarityClass extends ClassicSimilarity {
    // Return a constant so that document frequency no longer affects the score.
    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f;
    }
}

The overridden similarity class can then be imported as a library via your solrconfig.xml
(build a Java jar file and place it in your Solr directory):
<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib/" regex=".*\.jar" />
