-
-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Reference Citation Extractor #191
base: main
Are you sure you want to change the base?
Conversation
Add XYZ at 123 to eyecite capability Improve and recognize parallel citations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good and very well structured. I like that you added several test cases, not just what should work.
Only found one typo in a comment and some suggestions for docstrings . It can now be merged without any problems. These are very small details.
The Eyecite Report 👁️Gains and LossesThere were 0 gains and 15 losses. Click here to see details.
Time ChartGenerated Files |
I don’t think our testing files are as large as we state on the packaging. I downloaded the ten percent sample and ran it locally. It appears to only contain 7600 rows of opinions. A far cry from ten percent moniker of the 10 million opinion objects in the database. On the flip side that extrapolates to 126,000 reference citations that could be added to the citation database. Also - the auto generated markdown here appears to reverse the gains and losses columns. I'm not sure why - but locally it did not do that - seems to create the markdown correctly—identifying the gains as gains. Above, it shows these are classified as losses, but you can see from the output that this isn’t the case. I’ll add some notes to the Eyecite report issue to clarify this. On a final note, the Eyecite report did catch a regex bug that was causing a number of essentially empty citations to be found. I fixed the bug and added several additional tests to ensure this is properly handled moving forward. |
Nice to see the eyecite report finding bugs; weird that it's backwards, but I guess it must have always been that way. I don't know why the 10 percent file is the wrong size, but probably I made it using a random sample method that doesn't guarantee a particular count (and probably I had an error setting the percentage?). Seems to be work OK though, I guess.
That comes out to |
@mlissner - I think our wires are crossed here. this found 91 reference citations (excluding the much more common I suspect references to cases) in the 7,600 sample file. So unless my math is wrong (10,549,603 opinions / 7,600) * 91 ~= 126,317 reference citations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Man, I don't know this code all that well anymore, but I think this looks pretty good. I guess one thing that'd give me more confidence would be more tests. Would it be possible to add a few more, including ones where the current code isn't good enough (like, perhaps, it can't find the plaintiff, or other known failure modes)?
I can't quite suss them out, but I think it'd be helpful to have them written down, even if they're known to fail.
and isinstance(preceding, FullCaseCitation) | ||
) | ||
if is_parallel: | ||
# if parallel merge plaintiff/defendant data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# if parallel merge plaintiff/defendant data | |
# if parallel get plaintiff/defendant data from | |
# the earlier citation, since it won't be on the | |
# parallel one. |
def filter_citations(citations: List[CitationBase]) -> List[CitationBase]: | ||
"""Filter and order citations that may have reference cites out of order | ||
|
||
:param citations: List of citation` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:param citations: List of citation` | |
:param citations: List of citation |
@@ -307,6 +307,27 @@ def disambiguate_reporters( | |||
] | |||
|
|||
|
|||
def filter_citations(citations: List[CitationBase]) -> List[CitationBase]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test case for this, so I can see what it's supposed to do?
Add ReferenceCitation to find citations like Foo at 123,
Requires a full citation to be present and previous something like Foo v. Bar. 1 U.S. 1.
Also fixes the extraction of defendant/plaintiff name when parallel citations exist.