Elastic Index config - better TEXT search (instead of keyword) #8584

Open
wants to merge 1 commit into
base: main

davidblasby
Contributor

While working on the GN5 (and gn-microservices) OGCAPI-Records "queryables" search, I found that many of the Elasticsearch-indexed properties are type:keyword. This PR changes the ones most likely to be searched to type:text.

type:keyword - These suit enum-like fields where the person searching already knows the possible values. They only support an "exact match" on the whole value.
type:text - These are free-text fields: the value is analyzed into tokens, so you can search for text INSIDE the value.

For example, an email address mapped as keyword would only match the exact email address ([email protected]). If you searched for david, it would not return the document.
If that email address is mapped as text, then it will be found by searching for david, blasby, david.blasby, geocat (etc...). Exactly which fragments match depends on the analyzer configured for the field.
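
To make the difference concrete, here is a minimal Python sketch that models the two behaviours outside of Elasticsearch. It is only an illustration: `simple_tokens` is a toy analyzer that splits on anything non-alphanumeric (an assumption, and not necessarily how the analyzer configured on a real index tokenizes), and the email address is hypothetical.

```python
import re


def keyword_match(stored: str, query: str) -> bool:
    """Model of an exact-match lookup on a keyword field: whole value or nothing."""
    return stored == query


def simple_tokens(value: str) -> set:
    """Toy analyzer (assumption): lowercase, then split on non-alphanumeric characters."""
    return {tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok}


def text_match(stored: str, query: str) -> bool:
    """Model of a full-text lookup: true if any query token is among the stored tokens."""
    return bool(simple_tokens(stored) & simple_tokens(query))


# Hypothetical address, used only for illustration.
EMAIL = "jane.doe@example.org"

keyword_hit = keyword_match(EMAIL, "jane")  # exact match only, so no hit
text_hit = text_match(EMAIL, "jane")        # "jane" is one of the stored tokens
```

With this toy analyzer, a search for `jane`, `doe` or `example` finds the address when it is treated as text, while the keyword model only matches the full string.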

This makes the index friendlier to people searching it. I expect there is a cost to doing this (such as a larger index), but I don't think it will make much of a difference in most people's installations (unless they have millions and millions of records).

The other possible issue is returning too many documents when searching, especially inside keyword-like fields. I don't think this will be a big issue, but I'm mentioning it so reviewers are aware.

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chunks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@jahow
Contributor

jahow commented Jan 7, 2025

I'm surprised that this could work at all. I thought "text" fields did not allow aggregations on them? This would be an issue for organization names for instance.

Haven't tested either the web-ui or GN-UI apps against this but I expect some things to not work anymore...
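
The concern can be sketched in Python as a small model of Elasticsearch's default behaviour: a terms aggregation works on a keyword field, while a text field rejects it unless `fielddata` is explicitly enabled in the mapping. The field name `orgName` and the helper names are hypothetical, chosen only for illustration; they are not taken from the GeoNetwork index.

```python
def is_aggregatable(field_mapping: dict) -> bool:
    """Model of the default rule for a terms aggregation.

    keyword fields can be aggregated; text fields cannot unless
    fielddata is explicitly enabled on them.
    """
    field_type = field_mapping.get("type")
    if field_type == "keyword":
        return True
    if field_type == "text":
        return bool(field_mapping.get("fielddata", False))
    # Other field types are outside the scope of this sketch.
    raise ValueError(f"unsupported type in this sketch: {field_type!r}")


def terms_agg_body(field: str, size: int = 10) -> dict:
    """Build a search body that buckets documents by the values of `field`."""
    return {"size": 0, "aggs": {"by_value": {"terms": {"field": field, "size": size}}}}


# "orgName" is a hypothetical field name.
before = {"orgName": {"type": "keyword"}}  # mapping before the change
after = {"orgName": {"type": "text"}}      # mapping if the type were simply switched
body = terms_agg_body("orgName")
```

Under this model, `before` supports the aggregation in `body` and `after` does not, which is why facets such as organization names would be at risk.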

@davidblasby
Contributor Author

> I'm surprised that this could work at all. I thought "text" fields did not allow aggregations on them? This would be an issue for organization names for instance.
>
> Haven't tested either the web-ui or GN-UI apps against this but I expect some things to not work anymore...

Oh, that's concerning.

@davidblasby
Contributor Author

OK, I've redone this PR. It's mostly the same as before; however, this time I've ADDED type:text fields to the index instead of changing the existing keyword ones. This should allow all previous apps to work unchanged.
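
One common way to add a text variant without touching the existing keyword field is an Elasticsearch multi-field (the `fields` mapping parameter). The Python sketch below builds such a mapping as a plain dict. It is an assumption about how this could be done, not a copy of this PR's diff: the field name `contactEmail`, the sub-field name `text`, and the helper `with_text_subfield` are all hypothetical.

```python
import json


def with_text_subfield(keyword_mapping: dict, subfield: str = "text") -> dict:
    """Return a copy of a field mapping with an extra analyzed sub-field.

    The top-level type is left alone, so exact matches and aggregations
    on the original field keep working.
    """
    updated = dict(keyword_mapping)
    fields = dict(updated.get("fields", {}))
    fields[subfield] = {"type": "text"}
    updated["fields"] = fields
    return updated


# Hypothetical field name, used only for illustration.
original = {"contactEmail": {"type": "keyword"}}
properties = {name: with_text_subfield(m) for name, m in original.items()}

# JSON form of the mapping fragment, as it would appear in an index config.
mapping_json = json.dumps({"mappings": {"properties": properties}}, indent=2)
```

In this layout, existing apps keep querying and aggregating on `contactEmail`, while a full-text search would target the sub-field path `contactEmail.text`.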

HOWEVER, PLEASE NOTE THAT GN5 HAS A SOMEWHAT DIFFERENT ELASTIC INDEX (it's still in a bit of flux right now).

4 participants