Error parsing specific text within double quotes. #1328

brochington · 2022-07-19T00:45:17Z

I am running into a parsing discrepancy between very similar sentences:

Parses: "Create a folder named "Right Here"."
Doesn't parse: "Create a folder named "Something Here"."

Note the only difference is replacing the word "Right" with "Something".

Thoughts?


ampli · 2022-07-23T10:30:51Z

This happens because "Something" is not marked in en/4.0.dict with <marker-common-entity>.
@linas, would it be an improvement if words within quotes became "common-entity"?

linas · 2022-08-04T12:30:27Z

Sorry for the late reply. The correct fix is not obvious.

One possibility would be to run optional de-capitalization on the first word after an open-quote (because quoted text is often a full-fledged sentence, including capitalization).

However, in this case, "something Here" (with lower-case s but upper-case H) is not a valid sentence -- it seems that the entire sequence was meant to be a named entity. Usually, named entities have names like "Great Southern and Northern Railroad Bank Corporation" -- all caps, nouns and adjectives, and "Something Here Banking Corporation LLC" doesn't quite fit that pattern.

My calendar is very busy for the next few weeks. I'm not sure I can think clearly about this just right now. Perhaps one fix is to add an UNKNOWN_CAP_WORD regex, which would use the common-entity disjunct class. Perhaps this is the best fix? That way, anything that consists of All Cap Words and Other Stuff automatically parses as a single entity.

(My main desktop computer died ... I can't do any work just right now; I can't even run LG right now.)

brochington · 2022-08-05T00:58:14Z

@linas Thanks for the response. For my use, basically determining a touch "Something Here", the text capitalization is not too relevant, and everything within the two double quotes should be captured as a whole. I'm wondering if it's better to key off a preceding trigger word such as called | named | titled.
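As a rough illustration of the trigger-word idea, here is a minimal Python sketch that pulls out a double-quoted name only when it directly follows called, named, or titled. It is plain text preprocessing and does not touch the Link Grammar library; the pattern and the function name `quoted_names` are hypothetical, chosen only for this example.

```python
import re

# Hypothetical pattern: a trigger word (called | named | titled),
# whitespace, then a double-quoted string with no embedded quotes.
TRIGGER_QUOTED = re.compile(
    r'\b(?:called|named|titled)\s+"([^"]+)"',
    re.IGNORECASE,
)


def quoted_names(sentence):
    """Return each double-quoted string that directly follows a trigger word."""
    return TRIGGER_QUOTED.findall(sentence)


if __name__ == "__main__":
    print(quoted_names('Create a folder named "Something Here".'))
```

Quoted text that is not preceded by one of the trigger words is ignored, so ordinary quoted speech would not be treated as a name.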

Thinking a little ahead: Is there a way to pass in something like a "dictionary extension" when parsing occurs? I have a list of proper nouns that I would like to be identified, and that list can change based on context external to the parser. However, I would prefer not to have to call dictionary_create on every parse. Thoughts?

ampli · 2022-08-05T06:35:43Z

Without an extension to the library, maybe you can use the following solution (or maybe "solution"), which requires manipulating the text before parsing:

Identify words within double quotes (e.g. by regex).
Replace the blanks with a special character not in the character set used in your text (e.g. a letter from another language).
You can do this conditionally according to the particular words.

If needed, you can add a regex to the 4.0.regex file that identifies strings containing that special character, and add its name to <UNKNOWN-WORD.a> and UNKNOWNB_WORD.n (or even add a separate entry for it with <marker-common-entity>).
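The preprocessing steps above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the joiner character (the Hebrew letter alef) is assumed never to occur in the input text, and the helper names `join_quoted_words` and `restore_blanks` are made up for this example. Whether the parser then accepts the joined token as a single word still depends on the dictionary and regex setup described above.

```python
import re

# Assumption: this character never appears in the text being parsed.
JOINER = "\u05d0"  # Hebrew letter alef

# A double-quoted run with no embedded double quotes.
QUOTED = re.compile(r'"([^"]*)"')


def join_quoted_words(text, joiner=JOINER):
    """Replace blanks inside each double-quoted span with the joiner."""
    def _join(match):
        inner = re.sub(r"\s+", joiner, match.group(1).strip())
        return '"' + inner + '"'
    return QUOTED.sub(_join, text)


def restore_blanks(token, joiner=JOINER):
    """Undo the substitution on a token taken from the parse result."""
    return token.replace(joiner, " ")


if __name__ == "__main__":
    prepared = join_quoted_words('Create a folder named "Something Here".')
    print(prepared)
    print(restore_blanks("Something" + JOINER + "Here"))
```

To apply the substitution only for particular words, `_join` could check `match.group(1)` against a list of known names and return `match.group(0)` unchanged when it is not in the list.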
