Filter search on top concepts #128

Open
vpeil opened this issue Oct 27, 2020 · 5 comments
Labels
feature Additional functionality question Further information is requested

Comments

@vpeil

vpeil commented Oct 27, 2020

This may be beyond the scope of this project, but would be very useful. I would like to filter search results by top concepts.

Any idea how this could be achieved?

@stefandesu
Member

Hi!

You mean that you want to restrict a search to concepts that are descendants of a certain top concept? I think we had other features in mind that have the same premise (looking at a particular subtree only, e.g. gbv/jskos-metrics#9), so it's certainly worth looking into. I'm wondering how we could implement this efficiently. Maybe @nichtich has an idea?

@nichtich
Member

We could generate and index the ancestors field and allow filtering with a query parameter ancestor={uri}. This only makes sense for mono-hierarchical vocabularies, or for cases where the selected ancestor is reachable via all broader paths, but we don't need to check this. Adding ancestors to the database could be tricky for arbitrary concept updates, because an updated concept might modify ancestor chains anywhere.

Maybe MongoDB's $graphLookup can help. The field to build the graph from is broader[0].uri.
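To make the ancestors idea concrete, here is a minimal in-memory sketch of how an ancestor chain could be derived by following broader[0].uri upward, and how an ancestor={uri} filter would then behave. It assumes a mono-hierarchical vocabulary; the helper names (computeAncestors, filterByAncestor) and the ex: URIs are hypothetical and not part of the project.

```javascript
// Hypothetical helper: derive an ancestor chain for every concept by walking
// broader[0].uri upward. Assumes a mono-hierarchical vocabulary.
function computeAncestors(concepts) {
  const byUri = new Map(concepts.map((c) => [c.uri, c]));
  const result = new Map();
  for (const concept of concepts) {
    const ancestors = [];
    const seen = new Set([concept.uri]); // guards against cycles in broader links
    let current = concept;
    while (current && current.broader && current.broader.length > 0) {
      const parentUri = current.broader[0] && current.broader[0].uri;
      if (!parentUri || seen.has(parentUri)) break;
      seen.add(parentUri);
      ancestors.push({ uri: parentUri }); // nearest ancestor first
      current = byUri.get(parentUri); // undefined if the parent is not loaded
    }
    result.set(concept.uri, ancestors);
  }
  return result;
}

// Hypothetical filter: keep only concepts that have ancestorUri in their chain.
function filterByAncestor(concepts, ancestorUri) {
  const ancestors = computeAncestors(concepts);
  return concepts.filter((c) =>
    ancestors.get(c.uri).some((a) => a.uri === ancestorUri)
  );
}

// Made-up sample data for illustration only.
const sampleConcepts = [
  { uri: "ex:A" },
  { uri: "ex:AB", broader: [{ uri: "ex:A" }] },
  { uri: "ex:ABC", broader: [{ uri: "ex:AB" }] },
  { uri: "ex:B" },
];

console.log(filterByAncestor(sampleConcepts, "ex:A").map((c) => c.uri));
```

The sketch recomputes every chain from scratch, which sidesteps the update problem mentioned above but would not scale to incremental updates on a large collection.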

@vpeil
Author

vpeil commented Nov 2, 2020

Yes, my use case is mono-hierarchical.

I will have a look at MongoDB's $graphLookup. I will post my findings here in any case, but this will take some time...

@stefandesu
Member

$graphLookup can definitely be used to implement this, but I'm not sure if it's possible to do it efficiently, i.e. without having to go through the whole Concepts collection.

@stefandesu
Member

I played around with $graphLookup a little bit (also because it might be useful for a different issue) and found something that could work, however only in a restricted fashion:

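db.getCollection('concepts').aggregate([
    // Start from the desired parent concept
    {
        $match: { uri: "http://rvk.uni-regensburg.de/nt/A" }
    },
    // Recursively collect concepts whose broader.uri points at a concept already found
    {
        $graphLookup: {
            from: "concepts",
            startWith: "$uri",
            connectFromField: "uri",
            connectToField: "broader.uri",
            as: "descendant",
            // Search condition applied during the traversal
            restrictSearchWithMatch: {
                _keywordsLabels: { $regex: "^BIB" }
            }
        }
    },
    // Emit one document per descendant
    {
        $unwind: "$descendant"
    },
    // Return the descendant concepts themselves instead of the parent
    {
        $replaceRoot: { newRoot: "$descendant" }
    }
])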

So we match only the desired parent concept (it doesn't have to be a top concept), then we do a graph lookup like @nichtich described, but in reverse (matching from uri to broader.uri), and use restrictSearchWithMatch to specify the search conditions. Then we unwind and replace the root.

Why did I say "restricted fashion"? The problem is that restrictSearchWithMatch doesn't seem to work with text indexes, and the query needs to be restrictive enough that the results fit in memory. For reasons I don't fully understand, MongoDB has to load ALL results into memory first, even if we only want a subset (e.g. the first 100). For example, the query above without restrictSearchWithMatch does not fit into memory. I don't see a technical reason for this; either this use case is too uncommon for MongoDB to support it well, or I'm missing something.

I'm mostly writing this down to document my findings. I still haven't fully grasped $lookup and $graphLookup and keep expecting them to do things they apparently cannot do. As mentioned somewhere else, sometimes I think a relational database would have been a better choice.

(@nichtich's first solution, i.e. generating and indexing an ancestors field, would still work and be very performant because it could use an index. The downside is, as always with these things, storage space. Having ancestors in the database for every concept takes up quite a lot of space.)
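As a rough sketch of what the indexed variant could look like on the query side: the helper below only builds a MongoDB filter document and an index specification, so it runs without a database. The field path "ancestors.uri", the helper name buildAncestorFilter, and the existence of a text index on the collection are all assumptions, not something the project provides today.

```javascript
// Hypothetical helper: build a find() filter that restricts results to
// concepts below a given ancestor. The field path "ancestors.uri" is an
// assumption about how the generated ancestors would be stored.
function buildAncestorFilter(ancestorUri, search) {
  const filter = { "ancestors.uri": ancestorUri };
  if (search) {
    // $text requires a text index on the collection (assumed to exist)
    filter.$text = { $search: search };
  }
  return filter;
}

// Index specification for the assumed ancestors field.
const ancestorIndexSpec = { "ancestors.uri": 1 };

// Intended use in the mongo shell (not executed here):
//   db.getCollection('concepts').createIndex(ancestorIndexSpec)
//   db.getCollection('concepts')
//     .find(buildAncestorFilter("http://rvk.uni-regensburg.de/nt/A", "BIB"))
//     .limit(100)
console.log(JSON.stringify(buildAncestorFilter("http://rvk.uni-regensburg.de/nt/A", "BIB")));
```

Because this is a plain find() rather than a $graphLookup stage, a limit can be applied without first materializing the whole subtree; the cost moves to storage and to keeping the ancestors field up to date.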

@nichtich nichtich added feature Additional functionality question Further information is requested labels Jul 19, 2023