Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: github orgs for /bake #41

Open
wants to merge 19 commits into
base: beta
Choose a base branch
from
Open

Conversation

mtfoley
Copy link

@mtfoley mtfoley commented Sep 1, 2023

Description

Adds optional org and archives payload params to /bake endpoint. If an org is specified in the format "https://github.com/open-sauced", the server will perform a GET request to https://api.github.com/orgs/open-sauced/repos and leverage the returned JSON to generate a list of repository URLs to process. By default, it will skip the repos that are archived, but this can be changed by setting archives=true in POST to /bake.

What type of PR is this? (check all applicable)

  • 🍕 Feature
  • 🐛 Bug Fix
  • 📝 Documentation Update
  • 🎨 Style
  • 🧑‍💻 Code Refactor
  • 🔥 Performance Improvements
  • ✅ Test
  • 🤖 Build
  • 🔁 CI
  • 📦 Chore (Release)
  • ⏩ Revert

Related Tickets & Documents

Resolves #32

Mobile & Desktop Screenshots/Recordings

N/A

Added tests?

  • 👍 yes
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need help

Added to documentation?

  • 📜 README.md
  • 📓 docs.opensauced.pizza
  • 🍕 dev.to/opensauced
  • 📕 storybook
  • 🙅 no documentation needed

[optional] Are there any post-deployment tasks we need to perform?

[optional] What gif best describes this PR or how it makes you feel?

pkg/server/server.go Outdated Show resolved Hide resolved
@mtfoley mtfoley requested a review from jpmcb September 1, 2023 13:31
@mtfoley
Copy link
Author

mtfoley commented Sep 1, 2023

hey @jpmcb thank you very much for the feedback! I think I've addressed it and I added a couple more comments to what I've got so far.

go.mod Outdated Show resolved Hide resolved
pkg/server/server.go Outdated Show resolved Hide resolved
@mtfoley mtfoley changed the base branch from main to beta September 6, 2023 00:52
@mtfoley mtfoley marked this pull request as ready for review September 15, 2023 03:59
@mtfoley mtfoley requested a review from jpmcb September 15, 2023 04:01
@mtfoley
Copy link
Author

mtfoley commented Sep 15, 2023

the review can be held off until i fix some of these linter things

@mtfoley
Copy link
Author

mtfoley commented Sep 18, 2023

@jpmcb I think I have the lint issues fixed now, should be ready for a review.

func GetGithubRepoHTMLUrls(repos []*github.Repository) []string {
var urls []string
for _, repo := range repos {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra space here could be removed.

return repoList
}

func TestFilterGithubArchivedRepos(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding tests! <3

Comment on lines +14 to +20
func NewGithubTokenClient(token string) *GithubAPIClient {
ctx := context.Background()
s := &GithubAPIClient{
client: github.NewTokenClient(ctx, token),
}
return s
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Looks like this is dead code and isn't used. Makes sense as part of a github client package if we needed a client generated from a token in the future. Curious to hear why it was included?

Comment on lines +104 to +112
/*
Accepts URL value and calls:
normalized = common.NormalizeGitURL()
endpoint = transport.NewEndpoint(normalized)
common.IsValidGitRepo(endpoint)
return endpoint

returns first encountered error, appropriate HTTP error message, and endpoint string value
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This comment is abit confusing. In this case I don't think we need to state what it's calling: simply summarizing that it normalizes, validates the endpoint, and gets the valid url is fine while returning the first encountered error.

defer wg.Done()
}(htmlURL, wg, errorChannel)
}
defer wg.Wait()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't defer here: this won't block on waiting for all the go routines to finish in the waitgroup and will continue on without finishing processing squashing any potential errors that might arrise in the errorChannel

http.Error(w, "Could not process input", http.StatusInternalServerError)
return
}
} else {
go func() {
err = p.processRepository(repoURLendpoint.String())
htmlURLs, err := p.processOrg(data.Org, data.Archives)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why re-call processOrg again here? Don't you already call it to get the cloneUrls earlier?

Comment on lines +249 to +256
go func(htmlURL string, c chan error) {
_, repoURLendpoint, err := p.getValidRepoEndpoint(htmlURL)
if err != nil {
c <- err
} else {
c <- p.processRepository(repoURLendpoint)
}
}(htmlURL, errorChannel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a wait group or some way to block on all the go routines finishing. Otherwise, these will fire off and immediately get orphaned when the top level http request thread closes.

You'll also squash errors since you immediately go into checking the errorChannel without waiting for the work in the processRepository to potentially finish.

p.Logger.Errorf("Could not process repository input: %v with error: %v", r.Body, err)
errorChannel := make(chan error)
wg := new(sync.WaitGroup)
wg.Add(len(cloneURLs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I'm not sure this is the right approach.

This could cause CPU starving if we fire off a huge number of threads at once and start work in them without concern for the underlying processes. Additionally, the top level http request thread is still servicing incoming requests which may require additional threads.

I wonder if a buffered channel that we can block on is a better approach. Open to suggestions.

@jpmcb
Copy link
Member

jpmcb commented Oct 3, 2023

Hi @mtfoley - any progress on this? Happy to help where I can!

@mtfoley
Copy link
Author

mtfoley commented Oct 4, 2023

@jpmcb Time has been fleeting lately. I've done some things with them but go routines and their trappings are definitely areas I could use some more practice with. Practical examples you know of re: buffered channels would help if you know of them. Thx for the bump!

@jpmcb
Copy link
Member

jpmcb commented Oct 7, 2023

Go by example has some good stuff and a pretty light overview of the entire Go paradigm: https://gobyexample.com/channel-buffering

And you can't go wrong with the standard library's documentation: https://pkg.go.dev/github.com/eapache/channels

Good luck!! Please feel free to tag me in specific bits of code you are wondering about

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: ability to fetch all repos in a GitHub org
2 participants