🐛 Add race option to detect raced codes #10899

Merged
merged 6 commits into from
Oct 23, 2024

Conversation

sivchari
Copy link
Member

What this PR does / why we need it:

I added the -race option to the go test commands. This option enables Go's race detector, which reports data races hit while the tests run.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 18, 2024
@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-area PR is missing an area label size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 18, 2024
@sivchari
Copy link
Member Author

/assign

@fabriziopandini
Copy link
Member

/area testing
/test pull-cluster-api-e2e-full-main

@k8s-ci-robot k8s-ci-robot added the area/testing Issues or PRs related to testing label Aug 2, 2024
@k8s-ci-robot
Copy link
Contributor

@fabriziopandini: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-blocking-main
  • /test pull-cluster-api-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-e2e-conformance-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-upgrade-1-30-1-31-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-blocking-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/area testing
/test pull-cluster-api-e2e-full-main

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Aug 2, 2024
@fabriziopandini
Copy link
Member

/test pull-cluster-api-e2e-main

@fabriziopandini
Copy link
Member

Overall lgtm, running E2E to validate changes in the in memory provider

Makefile Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 3, 2024
@chrischdi
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 5, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 5931a1618319937c284bd75a36f1709a484a6c7e

@@ -50,10 +50,10 @@ func (c *cache) startSyncer(ctx context.Context) error {
c.syncQueue.ShutDown()
}()

-syncLoopStarted := false
+syncLoopStarted := make(chan struct{})
Copy link
Member

I think I would drop this entirely.

This now only checks that execution reached l.56. I'm not sure I understand why we are waiting for that. At this point the only guarantee is that the log line was written, which isn't a meaningful thing to wait for.

Copy link
Member Author

Sounds good to me. I fixed it.

@sbueringer
Copy link
Member

@sivchari 2 smaller findings. Sorry for the misunderstanding here: #10899 (comment)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 20, 2024
@sivchari
Copy link
Member Author

/retest

@sivchari
Copy link
Member Author

When I remove syncLoopStarted, other race errors appear.
I'll investigate later.

@sbueringer
Copy link
Member

sbueringer commented Aug 20, 2024

When I remove syncLoopStarted, other race errors appear. I'll investigate later.

Hm not sure why that is, but these errors look entirely unrelated (they are even in a different go module)

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 24, 2024
@sivchari
Copy link
Member Author

sivchari commented Sep 24, 2024

@sbueringer
Sorry for the late reply, and thank you so much for your efforts.
I updated this branch.
The tests may run a little slower, but I think detecting data races is more important than speed.
Again, thanks for your great work.

@sbueringer
Copy link
Member

@sivchari Can you re-add the -race flag to the test targets that don't have it already? Then we can compare how much longer the test job takes if we enable the race detector everywhere.

@sivchari
Copy link
Member Author

sivchari commented Oct 2, 2024

@sbueringer
Sorry, I was on vacation.

Can you re-add the -race flag to the test targets that don't have it already? Then we can compare how much longer the test job takes if we enable the race detector everywhere.

I'm not sure what you mean. -race is already added to each target (e.g. test, test-junit). Isn't that enough? If not, please tell me what you would like me to do.

@sbueringer
Copy link
Member

No worries. -race is only set on some test targets. A previous version of this PR was setting it on all. I would like to set it on all test targets again that don't have it at the moment

@sivchari
Copy link
Member Author

sivchari commented Oct 4, 2024

You mean you want to remove the !race tag from all test targets, right?

@sbueringer
Copy link
Member

sbueringer commented Oct 4, 2024

No, I want to add -race to all test Makefile targets that don't have it yet. For example test-docker-infrastructure

A previous version of your PR already had it correctly on all targets. My PR only added it to the most important test targets

@sivchari
Copy link
Member Author

sivchari commented Oct 7, 2024

Okay, I got it. Sorry for taking up your time.
I'll take care of it by the end of this week.

@sbueringer
Copy link
Member

Thx! No problem at all and no rush! 😀

Signed-off-by: sivchari <[email protected]>
@sivchari
Copy link
Member Author

I re-added -race to each target.

@sbueringer
Copy link
Member

/test pull-cluster-api-test-main

@@ -50,10 +50,8 @@ func (c *cache) startSyncer(ctx context.Context) error {
c.syncQueue.ShutDown()
}()

Copy link
Member

@sivchari I took another look. Let's keep this but make it concurrency safe
l.53

var syncLoopStarted atomic.Bool

l.56

syncLoopStarted.Store(true)

l.85

if !syncLoopStarted.Load() {

Copy link
Member Author

@SubhasmitaSw
Thank you for the comment. It may well be right, but I don't think it's necessary. At l.83 we check whether all workers have started with atomic.Load(&workers) < int64(c.syncConcurrency), and I believe that is enough to prevent the data race. Thanks.

Copy link
Member

@sbueringer sbueringer Oct 22, 2024

The check in l.83++ does not include the syncloop (l.53-l.63)

Copy link
Member Author

Thank you for your comment. Sorry, I was referring to the code after my change.

syncLoopStarted is only used in the code below:

	go func() {
		log.Info("Starting sync loop")
		syncLoopStarted = true
		for {
			select {
			case <-time.After(c.syncPeriod / 4):
				c.syncGroup(ctx)
			case <-ctx.Done():
				return
			}
		}
	}()
	if err := wait.PollUntilContextTimeout(ctx, 50*time.Millisecond, 5*time.Second, false, func(context.Context) (done bool, err error) {
		if !syncLoopStarted {
			return false, nil
		}
		return true, nil
	}); err != nil {
		return fmt.Errorf("failed to start sync loop: %v", err)
	}

c.syncGroup already runs concurrently because it is called from a goroutine, so the moment at which syncLoopStarted changes from false to true depends on Go runtime scheduling. In other words, the original code never relied on when this variable becomes true. So I think syncLoopStarted is unnecessary, and removing it keeps this code concurrency-safe.
What do you think?

Copy link
Member

@sbueringer sbueringer Oct 22, 2024

I'm not sure I understand what you are saying. Yes the old code was not correctly waiting for syncLoopStarted to be true. So it might not be that important. But I would prefer cleanly waiting for the "syncLoop" go routine to be started compared to not waiting for it.

Copy link
Member Author

What I wanted to say is that syncLoopStarted might be worth removing. But after re-reading your previous comment, I decided to keep this variable and change its type to atomic.Bool. Thanks.

Signed-off-by: sivchari <[email protected]>
@sbueringer
Copy link
Member

Thx!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 23, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 17be7766e93a71470cfcbe0052cc592aaf013aa4

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 23, 2024
@k8s-ci-robot k8s-ci-robot merged commit 214ab6d into kubernetes-sigs:main Oct 23, 2024
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Oct 23, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/testing Issues or PRs related to testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
Development

Successfully merging this pull request may close these issues.

5 participants