-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add scripts for PR metrics from github API #13
base: main
Are you sure you want to change the base?
Conversation
(Previously a personal project.)
Hardcoding the path to python doesn't play nice with venv.
Force users to check whether the PR's still open, and make sure an exception is raised if end date is used when it shouldn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that I could generally understand what the scripts are doing and that it matches what they're documented to do. I also checked that I could run the scripts on my machine and that their output looks plausible.
I left a number of comments suggesting improvements to the documentation, the coding style or the code structure. The only one I consider to be a blockers is about the script docstrings not really explaining how to run the scripts. I would prefer the other documentation improvements as well. Non-comment changes are merely nice to haves or things to do in the future, but not obstacles to merging what's already there. In the future, I would like to pass the same checks on Python code as in the mbedtls repository.
|
||
from github import Github | ||
|
||
if "GITHUB_API_TOKEN" in os.environ: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for later: make the token API (command line options, file storage, …) compatible with the official command line tool gh
. (This is not the markedly less popular python gh
.) I'd like to make my own scripts compatible too — we (you, me, and I guess at least @bensze01 as well) should coordinate and write a python module for that, if there isn't already one. (I didn't find one in a cursory search but “python github gh” are not very specific search terms.)
#!/usr/bin/env python3 | ||
# coding: utf-8 | ||
|
||
"""Produce summary or PRs pending per branch and their mergeability status.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document the requirements to run this script (run get-pr-data.py
to produce pr-data.p
). Goes for the other scripts as well.
c_open = Counter() | ||
c_mergeable = Counter() | ||
c_recent = Counter() | ||
c_recent2 = Counter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocker: is recent2
more recent than recent
or less? Are they exclusive or staggered? Better names would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer cnt_open
or even count_open
(etc), which I think help readability quite a bit
# coding: utf-8 | ||
|
||
"""Get PR data from github and pickle it.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General suggestion, not a blocker: all scripts should support --help
, and should have most of their code inside functions or classes so that they can be called from another script bypassing the command line interface.
last_q = quarter(last) | ||
|
||
cnt_all = Counter() | ||
cnt_com = Counter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could guess that cnt
stands for “count” or “counter” even without seeing = Counter()
, but what's com
? “Command”? “Communication”?
(Obviously by reading the rest of the code I can tell that it's “community”. But I wouldn't have guessed.)
And actually, as I wasn't familiar with collections.Counter
, I first thought that these were counters, but they're actually dictionaries of counters, or multisets. (I consider this a bad choice of name in the Python standard library. It's fine for this script to follow this naming choice.) It would help to know what the keys are. This goes in other scripts that use Counter
as well.
|
||
import matplotlib.pyplot as plt | ||
|
||
cnt_tot = Counter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So cnt_tot[date]
is the number of open PR at date
, right?
@@ -0,0 +1,88 @@ | |||
#!/usr/bin/env python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really do anything useful when called as a standalone program. It should probably not have a shebang line. It already isn't executable.
pr-metrics/prs.py
Outdated
|
||
def is_community(pr): | ||
"""Return False if the PR is from a team member or from inside Arm.""" | ||
# starting from 2021 we consistently label community PRs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not that consistent, actually. There's some ambiguity over whether PR from other Arm teams or from close contributors such as Silabs are considered community. I'm not sure “has the community label” is any better than the old heuristic.
|
||
|
||
def quarter(date): | ||
"""Return a string decribing this date's quarter, for example 19q3.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useful information: comparisons on the returned value implement chronological order.
(Our successors will deal with the Y21K problem.)
with open("pr-data.p", "rb") as f: | ||
prs = pickle.load(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Importing a module shouldn't have side effects. All the code here should go into a class and the loading should go into the constructor.
Labeling "size:S" based on review time (I would guess a size:M based on coding time, but this is relatively easy to review since checking what a library function does is easier than looking for a library function that does a given thing). |
- matplotlib >= 3.1 (3.0 doesn't work) | ||
- PyGithub >= 1.43 (any version should work, that was just the oldest tested) | ||
|
||
### Ubuntu 20.04 (and probaly 18.04) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: probaly -> probably
### Ubuntu 20.04 (and probaly 18.04) | |
### Ubuntu 20.04 (and probably 18.04) |
- matplotlib >= 3.1 (3.0 doesn't work) | ||
- PyGithub >= 1.43 (any version should work, that was just the oldest tested) | ||
|
||
### Ubuntu 20.04 (and probaly 18.04) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also works on 22.04 (tested by me) in case you want to note that for the future's sake.
|
||
prs = list() | ||
for p in r.get_pulls(state="all"): | ||
print(p.number) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to make this more informative - maybe changing it from p.number
to "Fetching #" + str(p.number)
Might not be necessary for such a simple script though
c_open = Counter() | ||
c_mergeable = Counter() | ||
c_recent = Counter() | ||
c_recent2 = Counter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer cnt_open
or even count_open
(etc), which I think help readability quite a bit
|
||
|
||
print(" branch: open, mergeable, <31d, <8d") | ||
for b in sorted(c_open, key=lambda b: c_open[b], reverse=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd traditionally define lambda functions as lambda x: etc
, notable here since this makes it seem like there's some relationship between the lambda b
and for b
syntactically, which there isn't.
|
||
print(" branch: open, mergeable, <31d, <8d") | ||
for b in sorted(c_open, key=lambda b: c_open[b], reverse=True): | ||
print("{:>20}: {: 10}, {: 10}, {: 10}, {:10}".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This produces hard to read tables if a branch name is > 20 chars long (see below). However, I can't immediately see a good/simple way to fix this, so it might just have to be accepted as a limitation
branch: open, mergeable, <31d, <8d
development: 177, 91, 51, 41
mbedtls-2.28: 6, 6, 3, 3
dev/gilles-peskine-arm/psa-test-op-fail: 1, 1, 1, 1
mbedtls-2.16: 1, 0, 0, 0
pr-metrics/prs.py
Outdated
|
||
|
||
first = datetime.date(2015, 1, 1) | ||
last = datetime.date(2099, 12, 31) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These names are tricky to guess the purpose of, particularly when imported into a different file so the value isn't known. Maybe date_range_start
and date_range_end
?
ax.set_ylabel("median lifetime in days of PRs created that quarter") | ||
ax.tick_params(axis="x", labelrotation=90) | ||
bot, top = ax.set_ylim() | ||
ax.set_ylim(0, min(365, top)) # we don't care about values over 1 year |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we don't care about values over a year? Several quarters (for example 16q1
) exceed this value
Otherwise we end up including too recent PRs in the quarterly reports, which introduces unwanted variability. Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Merge today's members of the mbed-tls-reviewers into the list. Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
This reverts commit e69fb3a. This was misguided attempt: it makes the result harder to read without actually solving the underlying problem that we sometimes ask for the median of a series that's still not complete enough.
The statistics module entered the standard library in 3.4, so it should be safe to use by now. Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
- default dates - defaults should correspond to typical use - one legend was not in the same place as the others Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
|
||
import matplotlib.pyplot as plt | ||
|
||
new_days = 90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a bit of a push to standardise ages for what is considered recent/old in OSS a while back. The thresholds picked were 15 and 90.
It might be useful to have a few extra ranges, e.g. <15, 15-90, 90-365, >365 to align better with this? If I have time I'll push an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking more, this is a lot like the median lifetime graph, but with a couple of thresholds which are based on age rather than percentiles. Would it be better to have a graph that shows e.g., median, 75th percentile, 95th percentile? Pros and cons either way I think, maybe worth exploring it if it's quick/easy to do but I'm not sure if it would be better or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't know about the push for standardised thresholds, indeed it would be good to align with that (and add some of our own if needed). Unfortunately the way the script is structured currently makes it a very manual change (you can't just give a list of threesholds at the top and have everything else work automagically).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding adding other percentiles to the median one, that's something I've been considering for a while, but the problem is over what set. Currently it's over "PRs created this quarter", which means, considering Q1 for the sake of concreteness we can only compute the median (or an upper bound for it) after we've closed at least 50% of PRs created in Q1. Fortunately, that's usually the case at the very beginning of Q2 when we prepare our report. (Even then, we might get only a range, as the value could still get lower if we closed a lot of PRs created near the end of Q1 at the very beginning of Q2.)
But for the 4th quartile, resp. 95th percentile, we'd need to have closed 75%, resp. 95% of PRs created in Q1 by the time we produce our report in early Q2, which realistically is not going to happen most of the time.
We could avoid the problem with incomplete data entirely by considering instead the set of PRs we closed this quarter - there all the lifetimes are known for sure and we can do stats without uncertainty. But I doesn't really tell the same thing: for example, doing at lot of historical review one quarter would raise the median age of PRs closed this quarter, but that's still a good thing. So the data might become more difficult to interpret.
So, I think there are basically three ways to select PRs over which we make stats / grouping / etc for one quarter:
- PRs created this quarter;
- PRs still open at the end of the quarter (or any specific date);
- PRs closed this quarter;
and each set will give slightly different information.
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, there isn't an easy answer or obvious better way. So let's leave as-is for now.
These scripts allow to track a number of metrics about PRs over time, using the github API.
So far they've only be used by me on my machine, but they should be a shared team resource, so here's a PR. The current versions are not intended to be final, they're a starting point that we can evolve in future PRs as needed.
(Some previous review history at https://github.com/ARMmbed/mbedtls-test-old/pull/153 (private).)