Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agreement #61

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
108 changes: 108 additions & 0 deletions src/reynir_correct/pattern.py
Original file line number Diff line number Diff line change
Expand Up @@ -1707,6 +1707,96 @@ def né(self, match: SimpleTree) -> None:
)
)

def agreement_conj(self, match: SimpleTree) -> None:
"""A verb, whose subject precedes a conjunction, is not in agreement with the subject.
E.g. 'Bílarnir eru léttari og gæti verið hraðari.'"""
vp = match.first_match("VP > (so_ft|so_et)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a comment here that describes exactly what this rule is correcting, preferably with an example.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

if vp is None:
return
so = vp.first_match("so")
if so is None:
return
start, end = so.span
sbj = match.first_match("NP-SUBJ")
if sbj is None:
return
if len(sbj) > 1: # TODO: more accurate subject selection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this ever happen, with first_match()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is used in similar cases multiple times in the script, would have to check further if it's necessary

sbj = sbj[0]
variants = [f for f in so.all_variants if f != "vh"]
variants.append("fh")
suggest = self.get_wordform(so.lemma, so.cat, variants)
if not suggest:
return
text = f"Hér á sögnin '{so.lemma}' að samræmast frumlaginu '{sbj.lemma}'"
self._ann.append(
Annotation(
start=start,
end=end,
code="P_NT",
text=text,
original=so.tidy_text,
suggest=suggest,
)
)

def agreement_subpost_sing(self, match: SimpleTree) -> None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No pattern in the script triggers this function. Should I delete it or keep it in case a pattern will be added?

"""A plural verb which precedes its subject is not in agreement with the subject,
which is singular. E.g. 'Í skrúðgöngunni eru fólk klætt...'"""
vp = match.first_match("VP > (so_ft)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, a comment would be useful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using black for formatting? That line is 153 characters wide! :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using black but for some reason it's not working on a part of this script. Fixed the wide lines I saw :)

if vp is None:
return
so = vp.first_match("so")
if so is None:
return
start, end = so.span
sbj = match.first_match("NP-SUBJ > { (no_nf_et|fn_nf_et) }")
if sbj is None:
return
if len(sbj) > 1: # TODO: more accurate subject selection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this ever happen, with first_match()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, would have to look further into this

sbj = sbj[0]
variants = [f for f in so.all_variants if f != "vh"]
variants.append("fh")
suggest = self.get_wordform(so.lemma, so.cat, variants)
if not suggest:
return
text = f"Hér á sögnin '{so.lemma}' að samræmast eintölufrumlaginu '{sbj.lemma}'"
self._ann.append(
Annotation(
start=start,
end=end,
code="P_NT",
text=text,
original=so.tidy_text,
suggest=suggest,
)
)

def agreement_concord(self, match: SimpleTree) -> None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no pattern in the script triggers this function. Should I delete it or keep it in case a pattern will be added?

"""A pronoun is not in agreement with the following noun, e.g. 'Við kaupum ákveðin hluti'"""
np = match.first_match("NP")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

if np is None:
return
fn = np.first_match("fn")
if fn is None:
return
no = np.first_match("no")
start, end = np.span
suggest = self.get_wordform(fn.lemma, fn.cat, no.lemma, no.cat)
if not suggest:
return
text = f"Hér á fornafnið '{fn.lemma}' að samræmast nafnorðinu '{no.lemma}'"
self._ann.append(
Annotation(
start=start,
end=end,
code="P_NT",
text=text,
original=so.tidy_text,
suggest=suggest,
)
)

@classmethod
def add_pattern(self, p: PatternTuple) -> None:
"""Validates and adds a pattern to the class global pattern list"""
_, pattern, _, ctx = p
Expand Down Expand Up @@ -2896,6 +2986,24 @@ def dir4loc(verbs: FrozenSet[str], tree: SimpleTree) -> bool:
)
)

self.add_pattern(
(
frozenset(("og", "en", "heldur")), # Trigger lemma for this pattern
"S0 > { S-MAIN > { IP > { NP-SUBJ > { no_et_nf } } } C S-MAIN >> [ VP > { so_ft } .* ] }",
lambda match: self.agreement_conj(match),
None,
)
)

self.add_pattern(
(
"þessi", # Trigger lemma for this pattern
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hvað með "sá"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern doesn't make sense since it's triggered by 'þessi' in a possessive noun phrase, which probably wouldn't happen. It then calls agreement_conj() which relates to conjuctions like the ones here above. Maybe it would be best to delete this pattern.

"NP-POSS > { NP-POSS > { fn_et_ef_kk } no_ft_ef_kk }",
lambda self, match: self.agreement_conj(match),
None,
)
)

self.add_pattern(
(
"né", # Trigger lemma for this pattern
Expand Down
Loading