-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Customized stats command #113
base: develop
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #113 +/- ##
===========================================
+ Coverage 58.62% 62.34% +3.71%
===========================================
Files 7 8 +1
Lines 568 725 +157
===========================================
+ Hits 333 452 +119
- Misses 235 273 +38 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reporting recipe looks really great! Asked a few questions to clarify things and made a couple of small suggestions, but I think it's fine to merge whenever you're happy with it.
Question about structure, since you've moved things around a bit: what do you think about dropping the poetry_detection
directory and just putting all of this in corppa.annotation
? There's nothing specific to poetry detection in the annotation recipes that I can think of.
# Get frequencies of page-level annotation counts | ||
count_freqs = Counter() | ||
total = 0 | ||
for count in examples_by_page.values(): | ||
count_freqs[count] += 1 | ||
total += count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you're doing here wasn't immediately obvious based on the variable names - am I understanding correctly that you're counting the number of examples/pages that have the same number of annotations, so you can report something like 100 examples have 1 annotation each, 50 have 2 annotation each, etc?
You can let Counter
do the aggregation for you by using Counter(examples_by_page.values())
.
I think it would be more readable to tally this way:
# Get frequencies of page-level annotation counts | |
count_freqs = Counter() | |
total = 0 | |
for count in examples_by_page.values(): | |
count_freqs[count] += 1 | |
total += count | |
# Get frequencies of page-level annotation counts | |
count_freqs = Counter(examples_by_page.values()) | |
total = Sum(examples_by_page.values()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the count frequencies, the frequency at which each page image has been annotated so far (like you described)
Good point on the Counter
/ sum
usage. Not sure why I missed that.
# Build session table | ||
data = [] | ||
total = 0 | ||
for session, pages in sorted(examples_by_session.items()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this sorting so you display sessions in alpha order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so? I don't remember the reason for this; it might be residual code since the other commands sorted things.
# info = { | ||
# "Session": "Session name", | ||
# "Count": "Completed annotations", | ||
# "Unique": "Unique annotations (distinct pages)", | ||
# "Total": "Total annotations collected", | ||
# } | ||
# msg.table(info, title="Legend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover comments to be cleaned up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's a design choice. Uncommented, this prints out the legend for this table, but it's fairly verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see. Maybe add a comment about the comment, then, so someone else doesn't clean it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, I'm happy to remove it, but this was more of "do we want a legend" question that I had forgotten about.
for session, pages in sorted(examples_by_session.items()): | ||
count = len(pages) | ||
unique = len(set(pages)) | ||
total += count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the total cumulative here? Maybe worth renaming the variable to clarify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, this is also residual code in terms of naming conventions for total.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...but yes, it's the total annotations collected as described here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can rename the variable to cumulative_total
if it helps but that seems too long for the reporting output itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. 👍 to renaming as cumulative_total
@@ -0,0 +1,87 @@ | |||
from collections import Counter, defaultdict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be helpful to add a docstring either at the top or with the recipe explaining how you run this and showing some sample output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's a good idea. Although the styling itself is a bit outside of my understanding (I believe it's rendered by an external library)
Co-authored-by: Rebecca Sutton Koeser <[email protected]>
As part of this refactor, I suggest moving out the tested part of the prodigy/annotation code into a some kind of shared utility methods file so that we can exclude the recipe code from code coverage. Here's the syntax for excluding files from codecov reporting: https://docs.codecov.com/docs/ignoring-paths |
ref: #112
Primary: Additional command recipe for displaying progress specialized to our annotation task
Secondary: Renaming recipe files to better describe the kinds of Prodigy recipes they will contain.