Add Reference Citation Extractor #191

Open · wants to merge 11 commits into base: main


flooie (Contributor) commented Jan 10, 2025

Adds ReferenceCitation to find short-form citations like "Foo at 123". A reference citation is only extracted when a full citation, something like "Foo v. Bar, 1 U.S. 1", has already appeared earlier in the text.

Also fixes the extraction of plaintiff/defendant names when parallel citations are present.
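A minimal Python sketch of the idea, assuming a simplified model in which each full citation already carries its plaintiff/defendant names and the character offset where it ends. `FullCite`, `RefCite`, and `find_reference_citations` are hypothetical names for illustration, not eyecite's `ReferenceCitation` implementation:

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FullCite:
    # Hypothetical stand-in for a full citation such as "Foo v. Bar, 1 U.S. 1"
    plaintiff: Optional[str]
    defendant: Optional[str]
    end: int  # character offset where the full citation ends


@dataclass
class RefCite:
    # Hypothetical stand-in for a short-form reference such as "Foo at 123"
    name: str
    pin_cite: str
    start: int


def find_reference_citations(text: str, full_cites: List[FullCite]) -> List[RefCite]:
    """Find "<party name> at <page>" references that follow a full citation."""
    found: List[RefCite] = []
    for cite in full_cites:
        for name in (cite.plaintiff, cite.defendant):
            if not name:
                continue  # skip parties the extractor could not identify
            pattern = re.compile(rf"\b{re.escape(name)} at (\d+)\b")
            # Only search after the full citation: a reference needs an antecedent
            for match in pattern.finditer(text, cite.end):
                found.append(RefCite(name, match.group(1), match.start()))
    # Return references in the order they appear in the text
    return sorted(found, key=lambda ref: ref.start)
```

For "See Foo v. Bar, 1 U.S. 1 (1800). Later, Foo at 3 and Bar at 5." with the full citation ending after "(1800)", this returns references for Foo at 3 and Bar at 5, while a "Foo at 9" placed before the full citation would be ignored.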

quevon24 (Member) left a comment

Everything looks good and very well structured. I like that you added several test cases, not just ones that should work.

I only found one typo in a comment and made some suggestions for docstrings; these are very small details. It can be merged without any problems.

tests/test_AnnotateTest.py (outdated, resolved)
eyecite/helpers.py (outdated, resolved)
eyecite/resolve.py (resolved)
Contributor

The Eyecite Report 👁️

Gains and Losses

There were 0 gains and 15 losses.

| id | Gain | Loss |
| --- | --- | --- |
| 2060699 | | Corp. at 564 |
| 2060699 | | Corp. at 565 |
| 2060699 | | Beckler at 775 |
| 2060699 | | Frohlich at 301 |
| 2829730 | | Layne at 405 |
| 2414924 | | Robinson at 1211 |
| 2414924 | | Brzonkala at 3 |
| 2414924 | | Brzonkala at 874 |
| 2414924 | | Brzonkala at 37 |
| 2414924 | | Robinson at 1210 |
| 2414924 | | Boerne at 2170 |
| 2414924 | | Brzonkala at 834 |
| 2414924 | | Brzonkala at 887 |
| 1433305 | | Gullings at 244 |
| 2267203 | | Fisher at 1347 |

[Figure: Time Chart]

Generated Files

Branch 1 Output
Branch 2 Output
Full Output CSV

flooie requested a review from mlissner January 14, 2025 19:52
flooie assigned mlissner and unassigned quevon24 Jan 14, 2025

flooie (Contributor, Author) commented Jan 14, 2025

I don’t think our testing files are as large as we state on the packaging. I downloaded the ten percent sample and ran it locally. It appears to contain only 7,600 rows of opinions, a far cry from the ten percent of the 10 million opinion objects in the database that its name implies. On the flip side, extrapolating from that sample suggests about 126,000 reference citations could be added to the citation database.

Also, the auto-generated markdown here appears to reverse the gains and losses columns. I'm not sure why; locally it creates the markdown correctly, identifying the gains as gains. Above, these are classified as losses, but you can see from the output that they are actually gains. I’ll add some notes to the Eyecite report issue to clarify this.

On a final note, the Eyecite report did catch a regex bug that was causing a number of essentially empty citations to be found. I fixed the bug and added several additional tests to ensure this is properly handled moving forward.

mlissner (Member) commented

Nice to see the eyecite report finding bugs; weird that it's backwards, but I guess it must have always been that way.

I don't know why the 10 percent file is the wrong size, but I probably made it using a random sampling method that doesn't guarantee a particular count (and probably I had an error setting the percentage?). Seems to work OK though, I guess.

> 7600 rows of opinions [...] that extrapolates to 126,000 reference citations

That comes out to 126,000 ÷ 7600 = 16.6 additional citations per case. Neat.

flooie (Contributor, Author) commented Jan 15, 2025

@mlissner

> that comes out to 126,000 ÷ 7600 = 16.6 additional citations per case. Neat.

I think our wires are crossed here. This found 91 reference citations in the 7,600-opinion sample file (excluding what I suspect are the much more common references to cases).

So, unless my math is wrong:

(10,549,603 opinions / 7,600) * 91 ~= 126,317 reference citations
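The extrapolation can be checked with a few lines of Python using only the figures quoted in this thread. It also separates the two rates that got crossed: the 16.6 figure divides the extrapolated total by the sample size, whereas the rate observed in the sample is 91 / 7,600 per opinion.

```python
TOTAL_OPINIONS = 10_549_603  # opinion count quoted in the thread
SAMPLE_OPINIONS = 7_600      # rows in the "ten percent" sample file
SAMPLE_REFERENCE_CITES = 91  # reference citations found in the sample

# Fraction of the database the sample actually covers
sample_fraction = SAMPLE_OPINIONS / TOTAL_OPINIONS

# Reference citations per opinion observed in the sample
per_opinion = SAMPLE_REFERENCE_CITES / SAMPLE_OPINIONS

# Scale the sample count up to the full database, truncating to a whole number
estimated_total = int(TOTAL_OPINIONS / SAMPLE_OPINIONS * SAMPLE_REFERENCE_CITES)

print(f"sample covers {sample_fraction:.4%} of opinions")       # 0.0720%
print(f"{per_opinion:.4f} reference citations per opinion")     # 0.0120
print(f"~{estimated_total:,} reference citations overall")      # ~126,317
```

So the sample is roughly 0.07% of the opinions rather than 10%, and it yields about 1.2 reference citations per 100 opinions.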

mlissner (Member) left a comment

Man, I don't know this code all that well anymore, but I think this looks pretty good. I guess one thing that'd give me more confidence would be more tests. Would it be possible to add a few more, including ones where the current code isn't good enough (like, perhaps, it can't find the plaintiff, or other known failure modes)?

I can't quite suss them out, but I think it'd be helpful to have them written down, even if they're known to fail.
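One way to keep known failure modes written down without failing the suite is the standard library's `unittest.expectedFailure` decorator. A minimal sketch; `extract_plaintiff` here is a hypothetical toy extractor with a deliberately known weakness, not eyecite's name-extraction code:

```python
import re
import unittest
from typing import Optional


def extract_plaintiff(text: str) -> Optional[str]:
    """Hypothetical toy extractor: returns the single word before " v. "."""
    match = re.search(r"(\w+) v\. ", text)
    return match.group(1) if match else None


class PlaintiffExtractionTest(unittest.TestCase):
    def test_single_word_plaintiff(self) -> None:
        self.assertEqual(extract_plaintiff("Foo v. Bar, 1 U.S. 1"), "Foo")

    @unittest.expectedFailure
    def test_multi_word_plaintiff(self) -> None:
        # Known failure mode: only the last word of the name ("States") is captured
        self.assertEqual(
            extract_plaintiff("United States v. Bar, 1 U.S. 1"), "United States"
        )
```

Run with `python -m unittest`, the suite passes while reporting the known failure as "expected failure"; if the extractor is later improved, the test is reported as an unexpected success, which flags that the decorator can be removed.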

and isinstance(preceding, FullCaseCitation)
)
if is_parallel:
# if parallel merge plaintiff/defendant data
Member

Suggested change
- # if parallel merge plaintiff/defendant data
+ # if parallel get plaintiff/defendant data from
+ # the earlier citation, since it won't be on the
+ # parallel one.
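To illustrate the parallel-citation fix, here is a minimal sketch assuming citations arrive in text order and that two citations separated by only a short gap (such as ", ") are parallel. That gap test is a hypothetical simplification; the PR's actual `is_parallel` condition is only partly visible above. `CaseCite` and `share_party_names` are likewise hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseCite:
    # Hypothetical stand-in for a full case citation (not eyecite's FullCaseCitation)
    reporter_cite: str
    start: int  # character offset where the citation starts
    end: int    # character offset where the citation ends
    plaintiff: Optional[str] = None
    defendant: Optional[str] = None


def share_party_names(cites: List[CaseCite], max_gap: int = 2) -> List[CaseCite]:
    """Copy party names from a citation onto the parallel citations that follow it."""
    for preceding, current in zip(cites, cites[1:]):
        # Assumption: cites separated only by a short gap such as ", " are parallel
        is_parallel = current.start - preceding.end <= max_gap
        if is_parallel:
            # The names appear before the first cite, so the parallel one borrows them;
            # chained parallels inherit them too because `preceding` is already updated
            current.plaintiff = current.plaintiff or preceding.plaintiff
            current.defendant = current.defendant or preceding.defendant
    return cites
```

For "Foo v. Bar, 1 U.S. 1, 2 S. Ct. 3", the second reporter citation would pick up Foo and Bar from the first, while a later, unrelated citation keeps its own names.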

def filter_citations(citations: List[CitationBase]) -> List[CitationBase]:
"""Filter and order citations that may have reference cites out of order

:param citations: List of citation`
Member

Suggested change
- :param citations: List of citation`
+ :param citations: List of citation

@@ -307,6 +307,27 @@ def disambiguate_reporters(
]


def filter_citations(citations: List[CitationBase]) -> List[CitationBase]:
Member

Do we have a test case for this, so I can see what it's supposed to do?
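The diff doesn't show what filter_citations does beyond its docstring ("Filter and order citations that may have reference cites out of order"). One plausible reading, offered only as a guess to make the question concrete: sort citations by position and drop any citation whose span sits entirely inside another, for example if reference citations found in a separate pass end up out of order or overlapping existing matches. `Cite` and `sort_and_dedupe_citations` are hypothetical; this is not the PR's code:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cite:
    # Hypothetical stand-in for a citation with a character span in the text
    text: str
    start: int
    end: int


def sort_and_dedupe_citations(citations: List[Cite]) -> List[Cite]:
    """Order citations by position and drop any nested inside a kept one."""
    # Earliest start first; for equal starts, the longer span comes first
    ordered = sorted(citations, key=lambda c: (c.start, -(c.end - c.start)))
    kept: List[Cite] = []
    for cite in ordered:
        # Because of the sort, a span nested in any kept citation is also
        # nested in the most recently kept one, so one comparison suffices
        if kept and cite.end <= kept[-1].end:
            continue
        kept.append(cite)
    return kept
```

A test in this spirit would feed in citations out of order, e.g. a "Foo at 3" reference before the full "Foo v. Bar, 1 U.S. 1" citation plus a duplicate match on "1 U.S. 1" inside it, and assert that the result is the full citation followed by the reference.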

Labels: None yet
Projects: Status: PRs to Review
3 participants