Assemble list of quirks with our full text search engine (esp. punctuation) #255
Comments
See HistoryAtState/frus@fc3f8b3 for an experiment with stopwords.
For more on stopwords, see:
Regarding the Related issues:
Search scope, index configuration, and plumbing issues
Search omissions
From Michael McCoyer:
The exact URL for his search that returned zero hits is: However, I found that if I changed the apostrophe in "president's" from straight ( Thus, it appears that our search engine treats the curly quote as a literal character, like a letter in a word, rather than as punctuation that should be dropped. We need to get the search engine to treat curly quotes as straight quotes.
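One way to treat curly quotes as straight quotes is to normalize them before text reaches the index and before a query is parsed. The following is a minimal sketch in Python, not the project's actual configuration; the names `QUOTE_MAP` and `normalize_quotes` are hypothetical. In a Lucene analyzer chain, the same mapping would more likely be expressed as a character filter (for example, `MappingCharFilter`) applied at both index and query time.

```python
# Hypothetical helper: map typographic quotes to their ASCII equivalents.
# The same function must be applied to both indexed text and user queries,
# otherwise the two sides will still disagree.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (curly apostrophe)
    "\u201C": '"',   # left double quotation mark
    "\u201D": '"',   # right double quotation mark
})


def normalize_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight quotes."""
    return text.translate(QUOTE_MAP)


if __name__ == "__main__":
    print(normalize_quotes("the president\u2019s \u201Cremarks\u201D"))
```

With this normalization in place on both sides, a query typed with a curly apostrophe and a document stored with a straight one would be compared as the same string.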
From @joshbotts via the mailbox:
User story: I want to search for "Goa" and exclude hits that are uppercase ("GOA").
This comment, #255 (comment), has already been filed as #289.
This issue has been split into existing and new issues (each with a backlink to this one):
Therefore closing this parent issue.
@joewiz for searching for the term
As discovered by @mcdanielhn, a search for `Banquo` will not yield documents containing `Banquo's` or `Banquo’s`. (Lucene's Standard Analyzer doesn't treat an apostrophe, straight or curly, as a word boundary, following the Unicode word boundary specification rules.)

A search for `f-16` (the airplane) is effectively identical to a search for `f 16` and will yield documents containing `f` or `16`. Making this a phrase search, `"f-16"`, helps, but it also returns results without the hyphen, like `f 16`. The same is true of the proximity search `"f 16"~0`.

A search for an acronym with a slash returns a Lucene parsing error. For example, `s/s` (Office of the Secretariat Staff, Department of State) triggers the error. A workaround is to use a phrase or proximity search, but again we would get undesired results; e.g., U.S.S.R. and U.S.S. are matches for `"s/s"`.

@HistoryAtState/editors: Please post any more examples you know of, and we'll work with @HistoryAtState/existsolutions to try out different Lucene analyzers and/or add advanced search form controls to see what combination can produce our expected results.
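The quirks above can be illustrated with a toy tokenizer. This is a rough Python approximation of the behavior described here (apostrophes kept inside a word, hyphens and slashes treated as separators), not Lucene's actual Standard Analyzer; `tokenize` and `escape_query` are hypothetical names. The escaping helper assumes the parsing error comes from `/` being a reserved character in Lucene's classic query syntax, which is an assumption and is not confirmed in this issue.

```python
import re

# Toy approximation: runs of letters/digits, where a straight or curly
# apostrophe between two alphanumeric runs does NOT split the token.
# Hyphens, slashes, and spaces all act as separators.
TOKEN = re.compile(r"[^\W_]+(?:['\u2019][^\W_]+)*")

# Characters reserved by Lucene's classic query syntax (assumed list).
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\/&|')


def tokenize(text: str) -> list[str]:
    """Lowercase tokens, keeping word-internal apostrophes."""
    return [m.group(0).lower() for m in TOKEN.finditer(text)]


def escape_query(query: str) -> str:
    """Backslash-escape reserved characters so the query parses as literal text."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in query)


if __name__ == "__main__":
    # "banquo" never equals "banquo's", so the bare term finds no possessives.
    print(tokenize("Banquo"), tokenize("Banquo's"), tokenize("Banquo\u2019s"))
    # The hyphen and the space produce the same two tokens.
    print(tokenize("f-16"), tokenize("f 16"))
    # Escaping avoids the parse error but does not change tokenization.
    print(escape_query("s/s"), tokenize("s/s"))
```

Under these assumptions, escaping only gets the query past the parser; the analyzer would still split `s/s` into two `s` tokens, so a different analyzer or a dedicated unanalyzed field would be needed for exact matches.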