Commit

add_page_numbers.py: modified reSENTTERMINAL for better prefix tagging
tomlup committed Dec 31, 2023
1 parent 2a0f024 commit 613a8c2
Showing 3 changed files with 176 additions and 176 deletions.
2 changes: 1 addition & 1 deletion all-examples/add_page_numbers.py
@@ -14,7 +14,7 @@
reFXNS = re.compile(r'^((Head|Mod(ifier)?|P(redicator)?|Predicate|Comp[12]?|PredComp|Nucleus|Prenucleus|Subj(ect)?|O(bj(ect)?)?[12]?|Det|Det-Head|Mod-Head|Subj-det|Subj-det-Head|PredicatorPredComp|Marker|Coordinate[12]|Supplement):)+$')
reTreeHeader = re.compile(r'^\[\d+\]a\.(Clause|NP|NPinterrog|PP|VP)b\.(Clause|NP|NPinterrog|PP|VP)(c\.(Clause|NP|NPinterrog|PP|VP))?$')
reNUMERICEX = re.compile(r'^\[\d+\]')
-reSENTTERMINAL = re.compile(r'[.!?](\t|$)')
+reSENTTERMINAL = re.compile(r'(((?<!etc)\.)|[!?])(\t|$|])')

nschneid Jan 1, 2024

Contributor

Should the last ] be backslash-escaped?

tomlup Jan 2, 2024

Author Collaborator

PyCharm was telling me that the backslash was redundant since it's interpreted as a regular character in this line, but I could add it back for readability.

nschneid Jan 2, 2024

Contributor

Weird. It appears that technically ] does not need to be escaped if there is no corresponding [. But a backslash would help readability IMO.
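
The point discussed above can be checked with a small sketch. It compares the old pattern, the new pattern as committed (unescaped `]`), and a variant with `\]`, assuming standard Python `re` behavior where a `]` outside a character class is a literal. The names `OLD`, `NEW_UNESCAPED`, `NEW_ESCAPED`, `SAMPLES`, and `matches` are hypothetical and used only for illustration; the sample strings are made up, not taken from the repository's data.

```python
import re

# Old pattern from the diff: any of . ! ? followed by a tab or end of string.
OLD = re.compile(r'[.!?](\t|$)')

# New pattern as committed: a period not preceded by "etc", or ! or ?,
# followed by a tab, end of string, or a literal ] (left unescaped).
NEW_UNESCAPED = re.compile(r'(((?<!etc)\.)|[!?])(\t|$|])')

# Same pattern with the closing bracket escaped, as suggested for readability.
NEW_ESCAPED = re.compile(r'(((?<!etc)\.)|[!?])(\t|$|\])')

# Illustrative inputs (hypothetical, not from the corpus).
SAMPLES = [
    "It works.",
    "apples, pears, etc.",
    "[Is it done?]",
    "Stop!\tnext",
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


# Map each sample to (old, new unescaped, new escaped) match results.
results = {
    text: (matches(OLD, text), matches(NEW_UNESCAPED, text), matches(NEW_ESCAPED, text))
    for text in SAMPLES
}

for text, flags in results.items():
    print(repr(text), flags)
```

On these inputs the escaped and unescaped variants agree, so adding the backslash would be a readability choice rather than a behavior change. The new pattern differs from the old one in two places: it no longer treats a trailing "etc." as sentence-terminal, and it accepts a terminal mark that is followed by `]`.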


reLI = re.compile(r'(^\[[0-9A-Z]+\](?=\t))|\t[xvi]+(?=\t)|\t([a-z])\.(?=\t)')

0 comments on commit 613a8c2