-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(api/controller): support filtering rules Git subscriptions #1477
Conversation
❌ Deploy Preview for docs-kargo-akuity-io failed.
|
Many apologies for the delay getting to this -- 👀 -- top priority for tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution @maksimstankevic! 🍏
Have made a couple of comments and suggestions, but as I am quite new to the project I would wait for the final suggestions from @krancour before making any adjustments.
internal/controller/git/git.go
Outdated
Shallow bool | ||
// Shallow indicates depth of cloning. This is useful for speeding up | ||
// cloning process. | ||
Shallow uint8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally rename this to Depth
(*uint8
) to match the actual argument to git
. Shallow
could then theoretically continue to be a shorthand for Depth = 1
.
However, as this is internal machinery. Just going with Depth
would probably be fine as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
var cloneShallowness uint8 | ||
if sub.ScanPaths != nil || sub.IgnorePaths != nil { | ||
cloneShallowness = 2 | ||
} else { | ||
cloneShallowness = 1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var cloneShallowness uint8 | |
if sub.ScanPaths != nil || sub.IgnorePaths != nil { | |
cloneShallowness = 2 | |
} else { | |
cloneShallowness = 1 | |
} | |
var cloneDepth uint8 = 1 | |
if sub.ScanPaths != nil || sub.IgnorePaths != nil { | |
cloneDepth = 2 | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
commit, err := r.getLastCommitIDFn(repo) | ||
// head of the branch unless there are scanPaths/ignorePaths configured to | ||
// handle. | ||
commit, err1 := r.getLastCommitIDFn(repo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think the error should be dealt with before further looking at the scan or ignore paths. Reason for this is that otherwise the commit
you are making use of may actually not be available.
In addition, all err<num>
can be changed to err
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
repo, err := git.Clone( | ||
sub.RepoURL, | ||
*creds, | ||
&git.CloneOptions{ | ||
Branch: sub.Branch, | ||
SingleBranch: true, | ||
Shallow: true, | ||
Shallow: cloneShallowness, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If my Depth
comment is taken into account)
Shallow: cloneShallowness, | |
Depth: &cloneDepth, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
// strings (regexList) to check if any of them match the stringToMatch string, if | ||
// a regexp fails to compile it returns false and error, if match is found it | ||
// returns true and if match is not found it returns false. | ||
func matchesRegexList(stringToMatch string, regexList []string) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite a costly function, as recompilation of regular expressions happens repeatedly for every diffPath
, while scanPaths
and ignorePaths
are static.
Given this, I would restructure the code in such a way that the lists of regular expressions are compiled once before attempting to match them against diff paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
@hiddeco thanks for your review! Agree with all points, @krancour , @kencochrane, please, give me a go if you agree as well with proposed changes. |
@maksimstankevic I'm going to look at this myself today as well, but I agree with all of @hiddeco's proposed changes. (He's a new maintainer on our team, btw. 🎉 ) |
My approach was wrong, now I'm getting it - we need to obtain diffs between the commit that produced last Freight and the HEAD as you say, so Depth of clone loses its sense probably as the mentioned Freight-producing commit is at unknown Depth... Thanks a lot for your review @krancour ! I will reiterate. |
@maksimstankevic right... that was my next bit of feedback was that depth required to figure this out correctly cannot be predicted, but want to make sure we were on the same page with what diffs we need to find first... sounds like we now are. Can't tell you how much we appreciate your effort on this! |
d8aff35
to
208b4c9
Compare
api/v1alpha1/warehouse_types.go
Outdated
@@ -216,6 +230,8 @@ type WarehouseStatus struct { | |||
// ObservedGeneration represents the .metadata.generation that this Warehouse | |||
// was reconciled against. | |||
ObservedGeneration int64 `json:"observedGeneration,omitempty"` | |||
// FreightReference refers to the last Freight produced by this Warehouse | |||
FreightReference string `json:"freightReference,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maksimstankevic there is a FreightReference
type. I propose something like this:
FreightReference string `json:"freightReference,omitempty"` | |
LastFreight *FreightReference `json:"lastFreight,omitempty"` |
The benefits of this are twofold:
- No need to make a separate call later to retrieve Freight.
- Guards against the possibility that the last Freight shipped from the Warehouse has since been deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krancour, thanks, absolutely, I'm completely new to this k8s programming so still need to wrap my mind around a lot of ways :) I will refactor that once testing is successful. But I have a couple of points to raise, during my work on this I saw 2 instances of what I believe are bugs, can you give your take on those:
- I think it should not be possible to create a Warehouse with duplicated Git type subscriptions, verification should be added I suppose, below is an excerpt from a Warehouse running in my DEV env:
uid: 46e4a742-0a97-48d5-8f9a-a719922820cb
spec:
subscriptions:
- git:
branch: helm
commitSelectionStrategy: NewestFromBranch
repoURL: https://github.com/maksimstankevic/kargo-demp-876.git
- git:
commitSelectionStrategy: NewestFromBranch
repoURL: https://github.com/maksimstankevic/kargo-demp-877.git
- git:
commitSelectionStrategy: NewestFromBranch
repoURL: https://github.com/maksimstankevic/kargo-demp-877.git
status:
freightReference: 70c665c38cf6f238e7fb0a6b7763ea902b29547d
- the interesting one: per my testing I see that in case of multiple Git supscriptions in a Warehouse a valid failure to get a new commit from one of Git repo subscriptions blocks normal progress of others, meaning new valid commits in other Git subscriptions will not be able to produce new Freight if one Git subscription fails to select a commit due to valid reasons (scanPaths/ignorePaths or Tag configurations): https://github.com/akuity/kargo/blob/c9aebcd37e20d40f75c31504297126fafc926968/internal/controller/warehouses/git.go#L58C1-L60C29
Don't you think that in such case we should just let the failing Git repo subscription to stay where it was is but if others have new valid commits - we should produce new Freight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should not be possible to create a Warehouse with duplicated Git type subscriptions, verification should be added I suppose
Additional validations seldom hurt. I would suggest opening a new issue for it and not dealing with it here.
per my testing I see that in case of multiple Git supscriptions in a Warehouse a valid failure to get a new commit from one of Git repo subscriptions blocks normal progress of others
This is by design. If you think of a piece of Freight as a box containing all the artifacts required for your application, including image, manifests, etc., a failure to fetch any one of those means you have an incomplete set.
Don't you think that in such case we should just let the failing Git repo subscription to stay where it was
I guess this suggests that if I'm looking for versions of x and y and I find x, but don't find y, we should use the last known version of y... the problem is that if there were a version of y that fit your current constraints, it would have been found, meaning the last known version of y does not fit your constraints. The main scenario where I can see this happening is when someone has changed their constraints. They previously wanted ^1.0.0 of y and now want ^2.0.0. If nothing matching ^2.0.0 was found, it's incorrect to move forward with the old y that matched ^1.0.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that makes sense, for validation of duplications I'll open an issue, that was my thought initially
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krancour I pushed the changes, please, review when you get a chance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maksimstankevic at a glance, this is looking like it's almost there. I promise a more thorough review tomorrow (Tuesday).
Let me say again how much we appreciate this. This was an ambitious feature to bite off and it's coming along nicely!
selectedCommits, err := r.selectCommitsFn( | ||
ctx, | ||
warehouse.Namespace, | ||
warehouse.Spec.Subscriptions, | ||
repoCommitMappings, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think things would be tidier and more testable overall if we passed the warehouse.Status.LastFreight
here and let selectCommits()
build the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored:
b8667c6
@maksimstankevic this looks amazing! I commented inline with just small bits of feedback -- nothing major. An unfortunate thing we do have to contend with here is a lot of merge conflicts. Most of them seem to arise from our codegen process having undergone an extreme makeover in #1515. Those are probably not resolvable by hand. If you want to resolve them yourself, the easiest approach would be:
Alternatively, I am happy to run point on resolving those because it's my slow review that put this behind #1515 in the first place. 🙏 Just let me know how you want to proceed. |
@krancour this is already being handled by "repo#branch - commit" mappings, such a new wild first commit in a reconfigured repo is not there in our LastFreight history - so it produces Freight by default, please take a look at my testing above. |
I will leave this to be answered by @krancour, because I took the concept from existing commit tag filtering rules that as well result in an error getting last commit when rules are not satisfied. |
After a good night of sleep, I can not reproduce the first issue I reported anymore either. Sorry about creating noise, as I probably messed something up due to being tired. 🤦
But it's not a black box in this case, you can know that HEAD == base commit. Which means the expectation is there to be no divergence, because there simply are not any changes which should be filtered. At present, I find the combination of last freight having a reference to the same commit the error complains about highly confusing. Because it makes me think something is wrong with my filtering configuration that I need to address (which actually is not the case, it's just waiting for a new HEAD).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the error is more of a user experience thing. I am inclined to say that this can be merged as is, and we can do refinements later.
Signed-off-by: Maksim Stankevic <[email protected]>
@hiddeco I see your point now and agree actually, I sent a tiny change, now logs are clean on reconciliations when latest commit == baseCommit |
Think this addresses it nicely, thanks 🙇 |
@maksimstankevic and @hiddeco great to sit down this morning and see you guys have hashed everything out! I want to raise one final UX thing that I am sorry to say I overlooked until just now... I get that it was likely modeled after tag filters, but are regular expressions really the best way to do path matching? I have some doubts about that. At least it doesn't feel like something I have commonly encountered. wdyt? |
I was on the fence about this myself as well. Based on experience in another project, we used a
This worked really well, except that extensive ignore rules can become complex in cases where you have patterns in what you want to include (e.g. |
My 2 cents, I am working as devops and part-time developer for 10 years now and when I saw this issue open I knew it has to absolutely be regexps in both filters right from the start. It gives me as a user the flexibility I need and if it hadn't regexp support I would bomb you with CRs for implementing it. I'm pretty sure most devops and developers know how to use regexp and that should not be an obstacle. I don't feel it's hitting performance either, but I don't pretend to be an expert here. |
@maksimstankevic I think my worry is that someone who doesn't read the docs might intuitively do something like |
What I suggest to overcome this issue (and also to potentially allow adding e.g. glob support later) is to require an identifier to be specified. For example:
Based on experience with these kind of identifiers, I would both allow |
@krancour yes I understand that perfectly. And I still need regexp here as a user. Anything you decide, I can work on it. Not before the release though, I actually need a break, can that be done as a refinement after 0.5.0? |
@maksimstankevic you've been so accommodating. We're intent on getting this into v0.5.0. At the start of our current release cycle, we were fully prepared to be implementing this ourselves and it was a surprise to see a community member step up. @hiddeco will take this the last mile and change nothing other than requiring a prefix to signal use of a regex and will otherwise assume items in the lists to be full paths or path prefixes. All credit for this PR goes to you. You're going to end up with a big thank you in the release notes. |
Signed-off-by: Hidde Beydals <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets an enthusiastic LGTM from me.
@hiddeco kept it as regex matching only, but simply requiring the "regex" or "regexp" prefix.
We can add prefix and glob based path matching in v0.6.0.
@maksimstankevic thank you so much for the extensive work on this! |
@maksimstankevic that sounds great! Thank you. |
#1270