Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage/dataflux): increase test coverage #11093

Closed
wants to merge 62 commits into from

Conversation

akansha1812
Copy link
Contributor

@akansha1812 akansha1812 commented Nov 6, 2024

feat:increase test coverage
Dataflux fast-listing leverages worksteal algorithm to quickly list objects in a bucket by running several parallel processes.

Adding unit tests that uses storage/emulator to improve test coverage.
Follow up PR to : #10966

Fixes #10731

@akansha1812 akansha1812 requested review from a team as code owners November 11, 2024 19:22
Copy link
Contributor

@BrennaEpp BrennaEpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional tests!

Comment on lines +204 to +205
func TestNextBatchContextCancelEmulated(t *testing.T) {
func TestNextBatchContextCancelEmulated(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func TestNextBatchContextCancelEmulated(t *testing.T) {
func TestNextBatchContextCancelEmulated(t *testing.T) {
func TestNextBatchContextCancelEmulated(t *testing.T) {

Comment on lines +436 to +437
func createObject(ctx context.Context, bucket *storage.BucketHandle, numObjects int, prefix string) error {
func createObject(ctx context.Context, bucket *storage.BucketHandle, numObjects int, prefix string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func createObject(ctx context.Context, bucket *storage.BucketHandle, numObjects int, prefix string) error {
func createObject(ctx context.Context, bucket *storage.BucketHandle, numObjects int, prefix string) error {
func createObjects(ctx context.Context, bucket *storage.BucketHandle, numObjects int, prefix string) error {

Comment on lines +152 to +154
if df.method != worksteal {
t.Errorf("expected df.method to be %v, got %v", worksteal, df.method)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How sure are we that this won't be flaky and occasionally sequential wins out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both sequential and worksteal uses gcs api to list objects. Worksteal isn't flaky.
If sequential wins over worksteal, its an error and needs to be caught. For 300k+ objects worksteal should finish faster than sequential listing.

childCtx, cancel := context.WithCancel(ctx)
cancel()
result, err := c.NextBatch(childCtx)
if err != nil && !strings.Contains(err.Error(), context.Canceled.Error()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error check should use errors.Is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GRPC has context.Canceled.Error() in desc and not as error. So this succeeds for http but fails for GRPC.

t.Errorf("NextBatch() expected to fail with %v, got nil", context.Canceled)
}
if len(result) > 0 {
t.Errorf("NextBatch() got object %v, want 0 objects", len(result))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
t.Errorf("NextBatch() got object %v, want 0 objects", len(result))
t.Errorf("NextBatch() got %v objects, want 0 objects", len(result))

Comment on lines +402 to +403
// os.Setenv("STORAGE_EMULATOR_HOST", "http://localhost:7000")
// os.Setenv("STORAGE_EMULATOR_HOST_GRPC", "localhost:8888")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove.

desc string
skipDirectoryObjects bool
query storage.Query
method listingMethod
Copy link
Contributor

@BrennaEpp BrennaEpp Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see method actually being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: implement dataflux fast listing
3 participants