Customized stats command #113

laurejt · 2024-11-01T19:52:57Z

Primary: Additional command recipe for displaying progress specialized to our annotation task

Secondary: Renaming recipe files to better describe the kinds of Prodigy recipes they will contain.

codecov · 2024-11-01T19:57:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.34%. Comparing base (4f9d0ca) to head (e35a19b).
Report is 24 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #113      +/-   ##
===========================================
+ Coverage    58.62%   62.34%   +3.71%     
===========================================
  Files            7        8       +1     
  Lines          568      725     +157     
===========================================
+ Hits           333      452     +119     
- Misses         235      273      +38

rlskoeser

This reporting recipe looks really great! Asked a few questions to clarify things and made a couple of small suggestions, but I think it's fine to merge whenever you're happy with it.

Question about structure, since you've moved things around a bit: what do you think about dropping the poetry_detection directory and just putting all of this in corppa.annotation? There's nothing specific to poetry detection in the annotation recipes that I can think of.

rlskoeser · 2024-11-06T15:30:45Z

src/corppa/poetry_detection/annotation/command_recipes.py

+    # Get frequencies of page-level annotation counts
+    count_freqs = Counter()
+    total = 0
+    for count in examples_by_page.values():
+        count_freqs[count] += 1
+        total += count


What you're doing here wasn't immediately obvious based on the variable names - am I understanding correctly that you're counting the number of examples/pages that have the same number of annotations, so you can report something like 100 examples have 1 annotation each, 50 have 2 annotations each, etc?

You can let Counter do the aggregation for you by using Counter(examples_by_page.values()).

I think it would be more readable to tally this way:

Suggested change

-    # Get frequencies of page-level annotation counts
-    count_freqs = Counter()
-    total = 0
-    for count in examples_by_page.values():
-        count_freqs[count] += 1
-        total += count
+    # Get frequencies of page-level annotation counts
+    count_freqs = Counter(examples_by_page.values())
+    total = sum(examples_by_page.values())

These are the count frequencies, the frequency at which each page image has been annotated so far (like you described)

Good point on the Counter / sum usage. Not sure why I missed that.
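For reference, a minimal sketch showing that the two tallies agree. The sample `examples_by_page` mapping (page identifier to annotation count) is made up for illustration and is not data from this project.

```python
from collections import Counter

# Hypothetical sample data: page identifier -> number of annotations so far
examples_by_page = {"page_a": 1, "page_b": 2, "page_c": 1, "page_d": 3}

# Original approach: tally the frequencies and the total by hand
loop_freqs = Counter()
loop_total = 0
for count in examples_by_page.values():
    loop_freqs[count] += 1
    loop_total += count

# Suggested approach: let Counter and the built-in sum do the aggregation
count_freqs = Counter(examples_by_page.values())
total = sum(examples_by_page.values())

# Both approaches produce the same frequencies and the same total
assert count_freqs == loop_freqs
assert total == loop_total
```

With this sample, two pages have one annotation each, one page has two, and one page has three, for seven annotations in total.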

rlskoeser · 2024-11-06T15:34:13Z

src/corppa/poetry_detection/annotation/command_recipes.py

+    # Build session table
+    data = []
+    total = 0
+    for session, pages in sorted(examples_by_session.items()):


Is this sorting so you display sessions in alpha order?

I guess so? I don't remember the reason for this; it might be residual code since the other commands sorted things.
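A quick sketch of what the `sorted` call does here, assuming session names are strings; the session names and page identifiers below are invented for illustration. Sorting `dict.items()` compares the (key, value) tuples, and since dictionary keys are unique the ordering is decided by the session name alone.

```python
# Hypothetical sample data: session name -> list of annotated page identifiers
examples_by_session = {
    "session-b": ["p1", "p2"],
    "session-c": ["p1"],
    "session-a": ["p3"],
}

# Iterating over sorted items visits sessions in alphabetical order by name,
# regardless of the order in which they were inserted
ordered_sessions = [session for session, pages in sorted(examples_by_session.items())]
```

Dropping the `sorted` call would instead display sessions in dictionary insertion order.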

rlskoeser · 2024-11-06T15:34:51Z

src/corppa/poetry_detection/annotation/command_recipes.py

+    # info = {
+    #    "Session": "Session name",
+    #    "Count": "Completed annotations",
+    #    "Unique": "Unique annotations (distinct pages)",
+    #    "Total": "Total annotations collected",
+    # }
+    # msg.table(info, title="Legend")


leftover comments to be cleaned up?

No, it's a design choice. Uncommented, this prints out the legend for this table, but it's fairly verbose.

ah, I see. Maybe add a comment about the comment, then, so someone else doesn't clean it up?

I mean, I'm happy to remove it, but this was more of "do we want a legend" question that I had forgotten about.

rlskoeser · 2024-11-06T15:35:42Z

src/corppa/poetry_detection/annotation/command_recipes.py

+    for session, pages in sorted(examples_by_session.items()):
+        count = len(pages)
+        unique = len(set(pages))
+        total += count


Is the total cumulative here? Maybe worth renaming the variable to clarify

Fair enough, this is also residual code in terms of naming conventions for total.

...but yes, it's the total annotations collected as described here

I can rename the variable to cumulative_total if it helps but that seems too long for the reporting output itself

That makes sense. 👍 to renaming as cumulative_total
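A sketch of the loop after the rename, not the code as merged. The helper name `build_session_rows` and the sample data are hypothetical; the row layout follows the Session / Count / Unique / Total columns from the legend discussed above.

```python
def build_session_rows(examples_by_session):
    """Return one (session, count, unique, cumulative_total) row per session."""
    rows = []
    cumulative_total = 0  # running total of annotations across sessions
    for session, pages in sorted(examples_by_session.items()):
        count = len(pages)  # completed annotations in this session
        unique = len(set(pages))  # distinct pages annotated in this session
        cumulative_total += count
        rows.append((session, count, unique, cumulative_total))
    return rows


# Hypothetical sample data: session name -> annotated page identifiers
sample = {"bob": ["p2"], "alice": ["p1", "p2", "p1"]}
rows = build_session_rows(sample)
```

The variable carries the longer name while the table column header can stay as "Total", which keeps the reporting output compact.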

rlskoeser · 2024-11-06T15:36:27Z

src/corppa/poetry_detection/annotation/command_recipes.py

@@ -0,0 +1,87 @@
+from collections import Counter, defaultdict


Would be helpful to add a docstring either at the top or with the recipe explaining how you run this and showing some sample output.

Thanks, that's a good idea. Although the styling itself is a bit outside of my understanding (I believe it's rendered by an external library)

Co-authored-by: Rebecca Sutton Koeser <[email protected]>

rlskoeser · 2024-11-06T21:25:26Z

As part of this refactor, I suggest moving the tested part of the prodigy/annotation code out into some kind of shared utility methods file so that we can exclude the recipe code from code coverage.

Here's the syntax for excluding files from codecov reporting: https://docs.codecov.com/docs/ignoring-paths

Added command recipe & reorganized recipe files

8db1f85

laurejt requested a review from rlskoeser November 1, 2024 19:52

laurejt self-assigned this Nov 1, 2024

Fixed unit tests broken by file renaming

4be663a

rlskoeser approved these changes Nov 6, 2024

Update src/corppa/poetry_detection/annotation/command_recipes.py

e35a19b

Co-authored-by: Rebecca Sutton Koeser <[email protected]>
