Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better Link Copybara-generated PRs to the Googler who authored them #448

Open
bhack opened this issue Dec 21, 2022 Discussed in #63 · 30 comments
Open

Better Link Copybara-generated PRs to the Googler who authored them #448

bhack opened this issue Dec 21, 2022 Discussed in #63 · 30 comments
Assignees
Labels
enhancement New feature or request

Comments

@bhack
Copy link
Contributor

bhack commented Dec 21, 2022

Discussed in #63

Originally posted by bhack December 8, 2022
We are still having the same well known issues with copy-bara bot PRs.

See our full thread with @qlzh727 at Keras keras-team/keras#17183 (comment)

@GMNGeoffrey
Copy link
Contributor

It's not possible to have the PR marked as created by the original author, as that would require the bot to be able to have each user's credentials, but we can indeed make this better. In documentation for procedures for the separate XLA repo, we are calling out the requirement to list yourself in the authorship mapping. So that gets us the commit in the PR marked with the correct author, at least. I may be able to make this a hard failure if the mapping doesn't work, although there will still have to be some way to bypass that, as XLA will be required to accept changes for large migrations across the monorepo. I will investigate.

I think Copybara does not currently expose the GitHub username, which would allow us to @ mention the author in the PR as well. I vaguely recall wanting to do this with another project in the past. I will file a feature request for that.

@GMNGeoffrey GMNGeoffrey changed the title PR author/subcription/notification Better Link Copybara-generated PRs to the Googler who authored them Feb 2, 2023
@GMNGeoffrey GMNGeoffrey self-assigned this Feb 2, 2023
@bhack
Copy link
Contributor Author

bhack commented Feb 2, 2023

If I remember correctly by a comment in an old ticket not all the authors have a GitHub account (or it was just not mapped?).
If we solve this point I don't know if we could notify with:

google/copybara#155 (comment)

@GMNGeoffrey
Copy link
Contributor

Yes not all Google authors will necessarily have a GitHub account. That should only be the case for people outside the team, e.g. the large-scale cleanups I mentioned above. We can't "fix" that in that I'm not in a position to mandate that all Google engineers create a GitHub account :-)

We might be able to tag the reviewer in the case where the author doesn't have an account though, and perhaps even fall back finally to a default person to debug the issue. My understanding is that the current behavior where the commit author is set isn't sufficient because:

  1. It's less visible
  2. They aren't subscribed to PR notifications, so someone commenting on the PR has to remember to look them up and @ them.
  3. Sometimes the author may not have a GitHub account or it may not be mapped, so then the PR is orphaned

@bhack
Copy link
Contributor Author

bhack commented Feb 2, 2023

Yes not all Google authors will necessarily have a GitHub account. That should only be the case for people outside the team, e.g. the large-scale cleanups I mentioned above. We can't "fix" that in that I'm not in a position to mandate that all Google engineers create a GitHub account :-)

Unfortunately, this is a recurrent and general problem visible in many non Github first opensource projects. At least it seems different the case where authors doesn't have a Github and when they have not mapped one.

My understanding is that the current behavior where the commit author is set isn't sufficient becaus

Yes these are my same points and quite the same issues with default copybara managed repos.

Until we can all stay on the same page on a more Github friendly development, a Googler with a github account must act as a fallback between the external opensource world and the internal one before push on "anon" PRs.

@GMNGeoffrey
Copy link
Contributor

Yup makes sense to me. I agree that projects that don't have a GitHub SoT are less friendly to open development. I generally advocate for moving to a GitHub SoT for that reason (I work on the IREE project), but in cases where it's not feasible/resourced we can at least make the experience much better than it is today. The Copybara team has responded to my feature request to expose the GitHub username though I don't have an idea on timeline yet. We may actually be able to do better than just @ing the PR author and instead correctly map the author as assignee on the PR and the reviewers as reviewers.

I know there have been a number of issues with Copybara/google3 source-of-truth ergonomics in the past and I'm sure you've filed tickets for them, but would you mind linking them here or in a new issue in this repo? My current list off the top of my head:

  1. Useless "internal change" commit descriptions
  2. Confusing rollback commit descriptions
  3. Rollbacks without clear explanations
  4. Convoluted main branch git commit history (only merge commits)
  5. Opaque internal-only failures

@GMNGeoffrey
Copy link
Contributor

For Googlers, Copybara FR is http://b/267547747

@bhack
Copy link
Contributor Author

bhack commented Feb 2, 2023

The tickets are sparse and distributed over many years and repos.

Just something that I have in mind:
tensorflow/community#412
tensorflow/community#425

Another point is that the code naturally accumulate over time many inline TODOs in internal only b/ format or mentioning internal username not visible on GitHub.

@bhack
Copy link
Contributor Author

bhack commented Feb 6, 2023

@GMNGeoffrey Another general point is that the development pipeline and dev-env misalignment it is going to general impact also inclusivity metrics.

Over time dev could care more about the internal system than the external one.

One metric is:
https://chaoss.community/kb/metric-issue-label-inclusivity/

But we are generally going to impact many more interesting metrics (e.g. occasional contributors, elephant factor, etc..):
https://chaoss.community/kbtopic/all-metrics/

@theadactyl
Copy link
Member

Thanks for your feedback @bhack. So we don't lose the particular feedback/discussion on improving author attribution, let's keep this issue on that topic.

@bhack
Copy link
Contributor Author

bhack commented Feb 11, 2023

Nothing new honestly, but I hope we can do more than just the author attribution. @GMNGeoffrey has done a great work on managing IREE on Github (now migrating under OpenXLA) to put the community on the same page as internal devs.

@GMNGeoffrey
Copy link
Contributor

GMNGeoffrey commented Feb 13, 2023

I'm glad that our efforts on IREE have been appreciated. We have put quite some thought into it. We had the advantage of planning for OSS from even before we released on GitHub. XLA has much more history and stuff depending on it right now, so we're more constrained, but we are working to improve the situation. I agree with @theadactyl that it's better to keep this issue on topic, so that we can track the work required (sorry I asked tangential questions). If there are other specific problems you see, please do file issues. I'll create separate issues out of your comments here.

@GMNGeoffrey
Copy link
Contributor

FYI the Copybara team implemented some of the blocking functionality for this in google/copybara@95aff19. There's still some blockers, but we're thinking this should be functioning soon. @ddunl will be handling actually adding this to our config

@bhack
Copy link
Contributor Author

bhack commented Apr 13, 2023

Thanks for the update.

@ddunl
Copy link
Member

ddunl commented Apr 14, 2023

This should be working now. Example here: #2532 . I'm not sure there will be a commit to refer to (modification only touches internal files), but I'll update this comment if there is. It looks like we'll need to reach out to the copybara team again to be able to assign reviewers, so I'll leave this open for now

@bhack
Copy link
Contributor Author

bhack commented Apr 14, 2023

But what happens where the author is the anon tensorflower-gardener? Like in #2527

@ddunl
Copy link
Member

ddunl commented Apr 15, 2023

There isn't much we can do in that case - I'm planning on making a wider push to ensure all regular contributors to XLA have their github accounts registered so that they won't show up as anonymous, but that PR in particular is from someone who comes from a completely different team and has never committed to XLA in the past. Cases like those we could only solve by mandating that every Googler has a Github account, which is out of my control

@bhack
Copy link
Contributor Author

bhack commented Apr 15, 2023

Cases like those we could only solve by mandating that every Googler has a Github account, which is out of my control

Yes I suppose that this will really require to scale the problem up to a general policy when contributing CL to published OSS project. But it was never resolved in many years so I don't think we could solve this just for OpenXLA.

Is the gardner still the top contributor in this repo?

@ddunl
Copy link
Member

ddunl commented Apr 15, 2023

That's a good point - I'm actually a bit surprised that the gardener is still so high. Part of that is due to LLVM integrates, but there are still many more commits than I would've thought that are anonymous, including some authored by people I can reach out to directly.

@bhack
Copy link
Contributor Author

bhack commented Apr 16, 2023

As this is a global issue cause historically we have this problem in many Google opensoruce projects and we don't have a global policy and the related technical tool to pre-check/enforce that a published CL authors have a mapped GitHub Account I think that best practical and independent solution is to adopt and maintain a good GitHub code ownership configured for the repo.

In this way we are at least always sure that there is an non-anonymous reviewer/proxy for "anon PR" whit which normal GitHub users can interact.

@ddunl
Copy link
Member

ddunl commented Apr 20, 2023

This is a good point - we will probably avoid using GitHub's codeowners, but something along these lines would make sense. Some document which records who is a good person to tag for different parts of the codebase. (We'll probably avoid codeowners because everyone would get assigned to the Copybara generated PRs that originate internally, so there'd be a ton of spam).

@bhack
Copy link
Contributor Author

bhack commented Apr 20, 2023

because everyone would get assigned to the Copybara generated PRs that originate internally

Why everyone? I think only the specific component/subfolder owner/proxy.
Isn't this exactly the point to interact with a real GitHub user?

We can just exclude specific folders/file patterns related to recurrent commits (e.g. llvm updates)

@GMNGeoffrey
Copy link
Contributor

One of the issues with GitHub's codeowners system is that everyone who is a code owner for every path in the PR will get assigned. That's pretty noisy. We use it in IREE to make sure things don't fall through the cracks, but I definitely get a lot of unnecessary email.

It's even worse if you want to use it to gate submission (code owners must approve) because who can approved is tied to who must email. This makes it difficult to deal with ownership at different scopes. Like if I have general responsibility over everything in directory /a/, but someone else has it for the more narrow /a/b/.

I think David's concern is with emails when Copybara generates a PR from an internal CL. The reviewer will already know about the change and so the additional email will just be spam and would likely lead to people ignoring GitHub emails more broadly. I think we will actually have the same issue with setting reviewers to the internal reviewers. It should be relatively easy for people to only filter these emails though. They should look like:

copybara-service requested your review on

and cc'ed to [email protected]

@bhack
Copy link
Contributor Author

bhack commented Apr 21, 2023

It seems you don't like the default service that GitHub is offering with the current features/granularity.
But there are other bots to explore if you don't have your own already perfectly fitted.

Just to make an example but there are others to evaluate:
https://github.com/quarkusio/quarkus-github-bot

Also I think that as a plain alternative being subscribed to all the PRs cause we can potentially have comment to "anon" CL in PR stage it is not going to create less traffic for the owners as everybody need to be subscribed to everything or at least we are going directly into the triage loop that it is not the best solution in fast merging CL originated PRs.

I could be wrong but it seems that the point is that as the CL was already reviewed/approved internally we don't care too much about it in "Gtibub world".

But I think it is always an opportunity to involve people on GitHub that were potentially excluded in the CL (private) lifecycle phase.

@GMNGeoffrey
Copy link
Contributor

It seems you don't like the default service that GitHub is offering with the current features/granularity.
But there are other bots to explore if you don't have your own already perfectly fitted.

Yes, this is why David said that we might not use the GitHub CODEOWNERS mechanism, specifically

@bhack
Copy link
Contributor Author

bhack commented Apr 25, 2023

Do we have an edge case with this one #2656?

@GMNGeoffrey
Copy link
Contributor

Hmm thanks for pointing that out. I see an error in the logs, but I don't know what's different about this CL/PR compared to the others. @ddunl can you investigate? I'll link you the logs

@bhack
Copy link
Contributor Author

bhack commented May 1, 2023

I cannot see the log but probably are the force pushed anon. The authored commit is in the middle.

@GMNGeoffrey
Copy link
Contributor

Hmm looks like it's not a one off. This one has a problem too: #2749

@ddunl
Copy link
Member

ddunl commented May 1, 2023

Apologies for not responding to this earlier at @bhack, when you originally commented about this I looked into it briefly but couldn't find anything. It seems like this is due to the Googlers authoring the PRs not being in the openxla org and Github (for some reason) won't let them be assigned for that reason. (Even though it seems like I can assign them manually). We've already sent invites to the whole team so I guess I'll need to make sure everyone knows they got an email that will let them join!

@bhack
Copy link
Contributor Author

bhack commented Jul 20, 2023

Thanks @GMNGeoffrey for you great OSS aligment effort. Goodluck for your next project.

@penpornk penpornk added the enhancement New feature or request label Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants