add dedicated concordance field #480

Draft
wants to merge 1 commit into
base: master
Conversation

missinglink
Member

@missinglink missinglink commented Oct 13, 2021

ping! @pelias/contributors this PR is a discussion with code attached 🚀

this year has seen some work around recording and exposing 'concordances' (the WOF term for foreign key references).
these concordances are valuable to organisations who also use the foreign ID system and would like an easy way of joining Pelias GIDs with other datasets.

[Screenshot, 2021-10-13 13:53:33]

the existing implementation works great: looking at Germany in WOF, you can see it returns a treasure trove of useful concordances in the addendum.

one problem we've identified with using the addendum is that it's (by definition) only semi-structured and comes without many guarantees of correctness or availability.

what would be better is if concordances were more structured and formalised within Pelias so that they could be considered a public API which integrators could rely upon for a 'crosswalk' between datasets.

this PR could open the door for that; it could be combined with a PR to pelias/model to perform the validation.
the validation rules would need a little thought, but things like casing, delimiters, abbreviations, collisions, etc. would need to be considered.

there is also a secondary concern (beyond simply displaying the information), which is that users may also wish to search on these values; this is certainly never going to be possible with the addendum.

introducing a new parameter would need a bit more discussion, but what comes to mind is that the /v1/place endpoint could support concordance lookup, either via the existing ?ids= param or a new one.

thoughts?

@missinglink
Member Author

missinglink commented Oct 13, 2021

a bit more info on the code in this PR: the new field is called concordance and is an object type mapping with string keys (so basically it's the same sort of structure as an Object in JavaScript).

I think this would be preferable to something like how we do category, where it's more analogous to a JavaScript Array.

The dynamic_templates entry is needed because the object keys are generated dynamically and would (by default) create fields with the default mapping; we instead define a specific mapping which sets the type to keyword.
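To make the description above concrete, here is a minimal sketch of what such a mapping might look like. It is not the code from this PR: the template name, the `path_match` pattern and the sample document are assumptions for illustration; only the `concordance` field name, the `keyword` type and the disabled `doc_values` come from this thread.

```javascript
// Hypothetical sketch of an Elasticsearch mapping for a `concordance` object field.
// Not the actual PR code; names other than `concordance` are assumptions.
const mappings = {
  dynamic_templates: [
    {
      // assumed template name
      concordance: {
        // apply to every dynamically created key under `concordance`
        path_match: 'concordance.*',
        match_mapping_type: 'string',
        // exact-match only, no analysis; doc_values disabled as mentioned in this thread
        mapping: { type: 'keyword', doc_values: false }
      }
    }
  ],
  properties: {
    // object type whose keys are created dynamically at index time
    concordance: { type: 'object', dynamic: true }
  }
};

// Example document shape, using the values mentioned elsewhere in this thread.
const exampleDoc = {
  concordance: { 'gn:id': '2222', 'wk:page': 'Germany' }
};

console.log(JSON.stringify({ mappings, exampleDoc }, null, 2));
```

With a template like this, each new key under `concordance` would be indexed as its own keyword sub-field rather than falling back to the default string mapping.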

@orangejulius
Member

Yeah, this makes a lot of sense, and I really like the idea of querying for concordances on the place endpoint. What do you think would be a good query format for that?

My memory is a bit hazy, but I think we should be able to query on those keyword fields easily, right? We don't need to do anything else: aggregations, keywords, or regular full text search.

@missinglink
Member Author

missinglink commented Oct 13, 2021

Yeah exactly, so it's set to keyword which means there's no analysis (it's just full token exact matching), so no synonyms or anything like that are applied.

It's currently set to doc_values=false because it doesn't make sense to run aggregations on unique values anyway.

So yeah, basically if you write a match query and it matches exactly, it's a hit, else not. Nothing remotely fancy going on.
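As a rough illustration of the exact matching described above, the sketch below builds an Elasticsearch query body for a single concordance. The helper name `buildConcordanceQuery` is hypothetical and not part of Pelias. It uses a `term` filter, which on an un-analysed keyword field should behave like the match query mentioned above.

```javascript
// Hypothetical helper, not part of Pelias: builds a query body that matches
// documents whose concordance[key] equals value exactly.
function buildConcordanceQuery(key, value) {
  return {
    query: {
      bool: {
        // filter context: exact match only, no scoring needed
        filter: [{ term: { [`concordance.${key}`]: value } }]
      }
    }
  };
}

// Usage with the values mentioned in this thread.
console.log(JSON.stringify(buildConcordanceQuery('gn:id', '2222'), null, 2));
```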

> What do you think would be a good query format for that?

Good question. You could just use /v1/place?ids=gn:id@2222, although I'm not a big fan of mixing and matching our GID values with others. Then again, the ?ids param isn't called ?gids, so 🤷‍♂️

Otherwise we could be more explicit and say something like /v1/place?concordance=gn:id@2222,wk:page@Germany

TBH I haven't given that enough thought, neither of those sounds very nice.

[edit] because we're using an object type mapping we have key->value pairs, so it would require a convention (such as the @ in the example above) to delimit the key from the value.
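For discussion purposes, here is a minimal sketch of how a `?concordance=` value in the format floated above could be split into key/value pairs. The function name `parseConcordanceParam` is hypothetical, and the choice to split on the first `@` (so values may contain `@` but keys may not) is an assumption, not a decided convention.

```javascript
// Hypothetical parser for a value such as 'gn:id@2222,wk:page@Germany'.
// Assumptions: ',' separates pairs and the first '@' separates key from value.
function parseConcordanceParam(param) {
  return param
    .split(',')
    .filter((pair) => pair.length > 0) // ignore empty segments such as a trailing comma
    .map((pair) => {
      const idx = pair.indexOf('@');
      // reject pairs with no delimiter, an empty key or an empty value
      if (idx <= 0 || idx === pair.length - 1) {
        throw new Error(`invalid concordance: ${pair}`);
      }
      return { key: pair.slice(0, idx), value: pair.slice(idx + 1) };
    });
}

console.log(parseConcordanceParam('gn:id@2222,wk:page@Germany'));
```

Note that under this convention a value containing a comma could not be expressed without some form of escaping, which is one of the things a final format would need to settle.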

@orangejulius
Member

orangejulius commented Oct 13, 2021

I agree that reusing the ids parameter is not ideal.

A concordance= param would work, but as you described, we would have to handle both the "field" and "value" sides of the concordance query. I also think we'd really want to put some effort into making the concordance names a bit more friendly. gn:id and wk:page (and all the others as they are stored in WOF) are pretty cryptic if you don't know what they stand for.

I guess all this would complicate the /v1/place endpoint a bit, since it would support queries by ids or concordance (but not both?). That might still be worth it.

@Joxit
Member

Joxit commented Oct 15, 2021

👍 we should not use ids for concordance.

A feature like this would be very interesting, especially with the OSM data 👍
