Add scripts for PR metrics from github API #13

Open · wants to merge 29 commits into base: main
Conversation

Contributor
@mpg commented Oct 18, 2021

These scripts track a number of metrics about PRs over time, using the GitHub API.

So far they've only been used by me on my machine, but they should be a shared team resource, so here's a PR. The current versions are not intended to be final; they're a starting point that we can evolve in future PRs as needed.

(Some previous review history at https://github.com/ARMmbed/mbedtls-test-old/pull/153 (private).)

Contributor

@gilles-peskine-arm left a comment

I checked that I could generally understand what the scripts are doing and that it matches what they're documented to do. I also checked that I could run the scripts on my machine and that their output looks plausible.

I left a number of comments suggesting improvements to the documentation, the coding style or the code structure. The only one I consider a blocker is about the script docstrings not really explaining how to run the scripts. I would prefer to have the other documentation improvements as well. The non-documentation changes are merely nice-to-haves or things to do in the future, not obstacles to merging what's already there. In the future, I would like this code to pass the same checks on Python code as in the mbedtls repository.


from github import Github

if "GITHUB_API_TOKEN" in os.environ:
Contributor

Suggestion for later: make the token API (command line options, file storage, …) compatible with the official command line tool gh. (This is not the markedly less popular python gh.) I'd like to make my own scripts compatible too — we (you, me, and I guess at least @bensze01 as well) should coordinate and write a python module for that, if there isn't already one. (I didn't find one in a cursory search but “python github gh” are not very specific search terms.)
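As a sketch of what gh-compatible token handling might look like: the official gh CLI honours the GH_TOKEN and GITHUB_TOKEN environment variables, so a shared helper could check those before this project's own GITHUB_API_TOKEN. The function name and lookup order here are assumptions, not taken from the PR:

```python
import os

def github_token():
    """Return a GitHub API token, or None if none is configured.

    Checks the variables honoured by the official `gh` CLI (GH_TOKEN,
    GITHUB_TOKEN) before this project's historical GITHUB_API_TOKEN.
    """
    for var in ("GH_TOKEN", "GITHUB_TOKEN", "GITHUB_API_TOKEN"):
        token = os.environ.get(var)
        if token:
            return token
    return None
```

Reading gh's own config file (~/.config/gh/hosts.yml) would be a further step, left out here.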

#!/usr/bin/env python3
# coding: utf-8

"""Produce summary or PRs pending per branch and their mergeability status."""
Contributor

Please document the requirements to run this script (run get-pr-data.py to produce pr-data.p). Goes for the other scripts as well.
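Besides documenting it in the docstring, one lightweight way to satisfy this is to fail early with a helpful message when the data file is missing. A sketch (the helper name is made up):

```python
import os
import sys

def require_data_file(path="pr-data.p"):
    """Exit with a helpful message if the pickled PR data is missing."""
    if not os.path.exists(path):
        sys.exit("Missing {}: run get-pr-data.py first (it fetches PR data "
                 "from the GitHub API and pickles it).".format(path))
```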

c_open = Counter()
c_mergeable = Counter()
c_recent = Counter()
c_recent2 = Counter()
Contributor

Non-blocker: is recent2 more recent than recent or less? Are they exclusive or staggered? Better names would help.


I'd prefer cnt_open or even count_open (etc), which I think help readability quite a bit

# coding: utf-8

"""Get PR data from github and pickle it."""

Contributor

General suggestion, not a blocker: all scripts should support --help, and should have most of their code inside functions or classes so that they can be called from another script bypassing the command line interface.
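A minimal skeleton of that structure might look like this (all names here are illustrative, not from the PR):

```python
import argparse

def summarize(data_file):
    """Core logic, importable and callable without going through the CLI."""
    ...  # placeholder for the script's actual work

def main(argv=None):
    """Command line entry point; wire this under an
    `if __name__ == "__main__":` guard in the real script."""
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--data", default="pr-data.p",
                        help="pickled PR data (from get-pr-data.py)")
    args = parser.parse_args(argv)
    summarize(args.data)
```

argparse generates --help automatically from the description and argument help strings, and another script can call summarize() directly, bypassing the CLI.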

last_q = quarter(last)

cnt_all = Counter()
cnt_com = Counter()
Contributor

I could guess that cnt stands for “count” or “counter” even without seeing = Counter(), but what's com? “Command”? “Communication”?

(Obviously by reading the rest of the code I can tell that it's “community”. But I wouldn't have guessed.)

And actually, as I wasn't familiar with collections.Counter, I first thought that these were counters, but they're actually dictionaries of counts, i.e. multisets. (I consider this a bad choice of name in the Python standard library; it's fine for this script to follow it.) It would help to know what the keys are. This goes for the other scripts that use Counter as well.
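For illustration, the pattern in question: a Counter used as a multiset, mapping each key to how many PRs fall under it (the key choice here, quarter strings, is my guess from context):

```python
from collections import Counter

# cnt_all maps a quarter string (e.g. "21q1") to the number of PRs
# created in that quarter -- a dictionary of counts, not a single counter.
cnt_all = Counter()
for q in ("21q1", "21q1", "21q2"):
    cnt_all[q] += 1

print(cnt_all["21q1"])  # -> 2
```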


import matplotlib.pyplot as plt

cnt_tot = Counter()
Contributor

So cnt_tot[date] is the number of open PRs at date, right?

@@ -0,0 +1,88 @@
#!/usr/bin/env python3
Contributor

This doesn't really do anything useful when called as a standalone program. It should probably not have a shebang line. It already isn't executable.


def is_community(pr):
"""Return False if the PR is from a team member or from inside Arm."""
# starting from 2021 we consistently label community PRs
Contributor

We're not that consistent, actually. There's some ambiguity over whether PRs from other Arm teams or from close contributors such as Silabs are considered community. I'm not sure “has the community label” is any better than the old heuristic.
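For reference, a sketch of how the two heuristics could be combined: prefer the explicit label (applied from 2021 onwards), falling back to an author-based guess for older PRs. The label name, the team list, and the attribute access on the PR object are assumptions, not taken from the PR:

```python
# Hypothetical, incomplete list of team logins.
TEAM_MEMBERS = {"mpg", "gilles-peskine-arm"}

def is_community(pr):
    """Return False if the PR is from a team member or from inside Arm.

    Checks the explicit "community" label first, then falls back to the
    author-based heuristic.
    """
    if any(label.name == "community" for label in pr.labels):
        return True
    return pr.user.login not in TEAM_MEMBERS
```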



def quarter(date):
"""Return a string decribing this date's quarter, for example 19q3."""
Contributor

Useful information: comparisons on the returned value implement chronological order.

(Our successors will deal with the Y21K problem.)
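For concreteness, an implementation consistent with that docstring (a sketch, not necessarily the PR's exact code), where lexicographic comparison of the results matches chronological order until 2100:

```python
import datetime

def quarter(date):
    """Return a string describing this date's quarter, for example 19q3."""
    return "{:02d}q{}".format(date.year % 100, (date.month - 1) // 3 + 1)

print(quarter(datetime.date(2019, 7, 1)))  # -> 19q3
```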

Comment on lines +10 to +11
with open("pr-data.p", "rb") as f:
prs = pickle.load(f)
Contributor

Importing a module shouldn't have side effects. All the code here should go into a class and the loading should go into the constructor.
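A sketch of that refactoring (the class and attribute names are invented here):

```python
import pickle

class PrData:
    """Hold the pickled PR data; loading happens at construction time,
    not as a side effect of importing the module."""

    def __init__(self, path="pr-data.p"):
        with open(path, "rb") as f:
            self.prs = pickle.load(f)
```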

@gilles-peskine-arm gilles-peskine-arm added the size-s Estimated task size: small (~2d) label Dec 16, 2021
@gilles-peskine-arm
Contributor

Labeling "size:S" based on review time (I would guess a size:M based on coding time, but this is relatively easy to review since checking what a library function does is easier than looking for a library function that does a given thing).

- matplotlib >= 3.1 (3.0 doesn't work)
- PyGithub >= 1.43 (any version should work, that was just the oldest tested)

### Ubuntu 20.04 (and probaly 18.04)

Nit: probaly -> probably

Suggested change
### Ubuntu 20.04 (and probaly 18.04)
### Ubuntu 20.04 (and probably 18.04)

- matplotlib >= 3.1 (3.0 doesn't work)
- PyGithub >= 1.43 (any version should work, that was just the oldest tested)

### Ubuntu 20.04 (and probaly 18.04)

Also works on 22.04 (tested by me) in case you want to note that for the future's sake.


prs = list()
for p in r.get_pulls(state="all"):
print(p.number)

I'd be tempted to make this more informative - maybe changing it from p.number to "Fetching #" + str(p.number)

Might not be necessary for such a simple script though




print(" branch: open, mergeable, <31d, <8d")
for b in sorted(c_open, key=lambda b: c_open[b], reverse=True):

I'd traditionally define lambda functions as lambda x: etc. It's notable here since the lambda b and for b syntax makes it look like there's some relationship between the two, which there isn't.
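As an illustration of the point, and of an alternative that avoids the lambda entirely, since Counter.most_common() already sorts by count in descending order:

```python
from collections import Counter

c_open = Counter({"development": 177, "mbedtls-2.28": 6, "mbedtls-2.16": 1})

# Renaming the bound variable makes it clear the lambda's argument is
# independent of any surrounding loop variable:
by_count = sorted(c_open, key=lambda x: c_open[x], reverse=True)

# Equivalent, with no lambda at all:
also_by_count = [branch for branch, _ in c_open.most_common()]

print(by_count)  # -> ['development', 'mbedtls-2.28', 'mbedtls-2.16']
```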


print(" branch: open, mergeable, <31d, <8d")
for b in sorted(c_open, key=lambda b: c_open[b], reverse=True):
print("{:>20}: {: 10}, {: 10}, {: 10}, {:10}".format(

This produces hard to read tables if a branch name is > 20 chars long (see below). However, I can't immediately see a good/simple way to fix this, so it might just have to be accepted as a limitation

              branch:       open,  mergeable,       <31d,        <8d
         development:        177,         91,         51,         41
        mbedtls-2.28:          6,          6,          3,          3
dev/gilles-peskine-arm/psa-test-op-fail:          1,          1,          1,          1
        mbedtls-2.16:          1,          0,          0,          0
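One simple fix (not in the PR) would be to compute the first column's width from the longest branch name instead of hard-coding 20; str.format supports nested replacement fields for exactly this:

```python
from collections import Counter

# Illustrative data, including one long branch name.
c_open = Counter({"development": 177,
                  "dev/gilles-peskine-arm/psa-test-op-fail": 1})

width = max(len(branch) for branch in c_open)
row = "{:>{w}}: {:>10}"  # the column width is itself a format argument
print(row.format("branch", "open", w=width))
for branch in sorted(c_open, key=lambda x: c_open[x], reverse=True):
    print(row.format(branch, c_open[branch], w=width))
```

The trade-off is that the table's width then varies from run to run, which may or may not be acceptable.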



first = datetime.date(2015, 1, 1)
last = datetime.date(2099, 12, 31)

These names are tricky to guess the purpose of, particularly when imported into a different file so the value isn't known. Maybe date_range_start and date_range_end?

ax.set_ylabel("median lifetime in days of PRs created that quarter")
ax.tick_params(axis="x", labelrotation=90)
bot, top = ax.set_ylim()
ax.set_ylim(0, min(365, top)) # we don't care about values over 1 year

Is there a reason why we don't care about values over a year? Several quarters (for example 16q1) exceed this value

@mpg mpg self-assigned this Apr 25, 2022
Otherwise we end up including too recent PRs in the quarterly reports,
which introduces unwanted variability.

Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Merge today's members of the mbed-tls-reviewers into the list.

Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
This reverts commit e69fb3a.

This was a misguided attempt: it makes the result harder to read without
actually solving the underlying problem that we sometimes ask for the
median of a series that's still not complete enough.
The statistics module entered the standard library in 3.4, so it should
be safe to use by now.

Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
mpg added 4 commits April 6, 2023 08:48
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>
- default dates - defaults should correspond to typical use
- one legend was not in the same place as the others

Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>

import matplotlib.pyplot as plt

new_days = 90
Contributor

There was a bit of a push to standardise ages for what is considered recent/old in OSS a while back. The thresholds picked were 15 and 90.

It might be useful to have a few extra ranges, e.g. <15, 15-90, 90-365, >365 to align better with this? If I have time I'll push an update.
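For example, bucketing by those thresholds could look like this (the function and constants are illustrative, not from the PR):

```python
import datetime

# Thresholds discussed above: 15 and 90 days, plus one year.
AGE_BUCKETS = ((15, "<15d"), (90, "15-90d"), (365, "90-365d"))

def age_bucket(created, today):
    """Classify a PR by its age in days on a given date."""
    age = (today - created).days
    for limit, name in AGE_BUCKETS:
        if age < limit:
            return name
    return ">365d"
```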

Contributor

Actually, thinking more, this is a lot like the median lifetime graph, but with a couple of thresholds which are based on age rather than percentiles. Would it be better to have a graph that shows e.g., median, 75th percentile, 95th percentile? Pros and cons either way I think, maybe worth exploring it if it's quick/easy to do but I'm not sure if it would be better or not?

Contributor Author

Oh, I didn't know about the push for standardised thresholds, indeed it would be good to align with that (and add some of our own if needed). Unfortunately, the way the script is structured currently makes it a very manual change (you can't just give a list of thresholds at the top and have everything else work automagically).

Contributor Author

Regarding adding other percentiles to the median one, that's something I've been considering for a while, but the problem is over what set. Currently it's over "PRs created this quarter", which means (considering Q1 for the sake of concreteness) we can only compute the median (or an upper bound for it) after we've closed at least 50% of the PRs created in Q1. Fortunately, that's usually the case at the very beginning of Q2 when we prepare our report. (Even then, we might get only a range, as the value could still get lower if we closed a lot of PRs created near the end of Q1 at the very beginning of Q2.)

But for the 75th, resp. 95th, percentile, we'd need to have closed 75%, resp. 95%, of the PRs created in Q1 by the time we produce our report in early Q2, which realistically is not going to happen most of the time.

We could avoid the problem of incomplete data entirely by instead considering the set of PRs we closed this quarter: there, all the lifetimes are known for sure and we can do stats without uncertainty. But it doesn't tell quite the same thing: for example, doing a lot of historical review one quarter would raise the median age of PRs closed that quarter, even though that's a good thing. So the data might become more difficult to interpret.

So, I think there are basically three ways to select the PRs over which we compute stats / groupings / etc. for one quarter:

  • PRs created this quarter;
  • PRs still open at the end of the quarter (or any specific date);
  • PRs closed this quarter;

and each set will give slightly different information.
Wdyt?
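To make the "upper bound" part of the argument concrete, here is a sketch (simplified to the upper median, and not code from the PR): treating every still-open PR as if it will live forever, the median over the full cohort is bounded above as soon as more than half the cohort is closed:

```python
def median_upper_bound(closed_lifetimes, n_open):
    """Upper bound on the full cohort's (upper) median lifetime in days,
    or None if too few PRs are closed for the median to be bounded yet.

    Still-open PRs can only get older, so pessimistically treating their
    lifetimes as infinite pushes them past every closed value."""
    n_total = len(closed_lifetimes) + n_open
    k = n_total // 2  # 0-based rank of the upper median
    ordered = sorted(closed_lifetimes)
    if k < len(ordered):
        return ordered[k]
    return None  # half or more still open: median not yet bounded
```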

Contributor

Agree, there isn't an easy answer or obvious better way. So let's leave as-is for now.

Labels
enhancement New feature or request needs: work priority-low size-s Estimated task size: small (~2d)

5 participants