
feat: validate uploaded parts before completing upload #13763

Conversation

eppjame

@eppjame eppjame commented Aug 28, 2024

Description of changes

  • Simplify string size evaluation in byteLength to align it with the upload part generator. Part validation requires both to use the same size calculation; other data types are already aligned.
  • Before sending CompleteMultipartUpload, validate that we have all expected parts based on object size.

Issue #, if available

Description of how you validated changes

  • New unit tests mocking cache to create invalid parts

Checklist

  • PR description included
  • yarn test passes
  • Unit Tests are changed or added
  • Relevant documentation is changed or added (and PR referenced)

Checklist for repo maintainers

  • Verify E2E tests for existing workflows are working as expected or add E2E tests for newly added workflows
  • New source file paths included in this PR have been added to CODEOWNERS, if appropriate

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@eppjame eppjame requested a review from a team as a code owner August 28, 2024 20:10
@@ -8,16 +8,9 @@
export const byteLength = (input?: any): number | undefined => {
if (input === null || input === undefined) return 0;
if (typeof input === 'string') {
let len = input.length;
const blob = new Blob([input]);
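Based on the diff above, a minimal sketch of what the simplified string branch could look like (assuming the function simply returns the Blob's size; the other branches are elided and this is not the actual implementation):

```typescript
// Sketch only: measure a string's size in encoded UTF-8 bytes via Blob,
// matching how the part generator slices the data, rather than counting
// UTF-16 code units with String.prototype.length.
export const byteLength = (input?: any): number | undefined => {
	if (input === null || input === undefined) return 0;
	if (typeof input === 'string') {
		// Blob.size is the byte length of the encoded string.
		return new Blob([input]).size;
	}
	// ...ArrayBuffer, Blob, and File branches unchanged...
	return undefined;
};
```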
Contributor

Worried this might introduce compatibility issues on RN. What was wrong with the old logic for string inputs?

Author

@eppjame eppjame Sep 9, 2024

This is just aligning the length evaluation with getDataChunker.
I was concerned at the possibility of byteLength and getDataChunker determining different sizes.
There shouldn't be any issue with RN, because it already has to work in getDataChunker. Unless there is an expected RN limitation to singlepart uploads.
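To make the concern concrete: for non-ASCII input, `String.prototype.length` (UTF-16 code units) and the encoded byte size diverge, so a size computed from `.length` would not match data that is chunked by bytes:

```typescript
const s = 'café ☕';

// UTF-16 code units: 6 (each character here is a single code unit)
console.log(s.length);

// UTF-8 bytes: 9 ('é' encodes to 2 bytes, '☕' to 3)
console.log(new Blob([s]).size);
console.log(new TextEncoder().encode(s).length);
```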

@@ -193,6 +195,8 @@ export const getMultipartUploadHandlers = (
});
}

validateCompletedParts(inProgressUpload.completedParts, size!);
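A hedged sketch of what such a check could look like. The helper name comes from the diff; the part size constant, the CompletedPart shape, and the error message are illustrative assumptions, not the actual implementation:

```typescript
interface CompletedPart {
	PartNumber: number;
	ETag: string;
}

// Assumed fixed part size; the real uploader derives this from the object size.
const PART_SIZE = 5 * 1024 * 1024;

// Verify the cached parts cover the whole object before sending
// CompleteMultipartUpload: every part number 1..N present exactly once.
const validateCompletedParts = (
	completedParts: CompletedPart[],
	size: number,
): void => {
	const expected = Math.max(1, Math.ceil(size / PART_SIZE));
	const partNumbers = new Set(completedParts.map(p => p.PartNumber));
	const valid =
		completedParts.length === expected &&
		partNumbers.size === expected &&
		[...Array(expected).keys()].every(i => partNumbers.has(i + 1));
	if (!valid) {
		throw new Error('Missing or duplicated upload parts; refusing to complete upload.');
	}
};
```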
Contributor

Is it safe to coerce the size? Maybe we should only run the validation if size is set. Specifically, from the inline docs on byteLength: "The total size is not required for multipart upload, as it's only used in progress report." If we do want to make sure this always runs, can we just update the types so the assertion isn't required?

Author

byteLength is a bit odd, because in any case where it would return undefined the upload is rejected by getDataChunker.
For any valid multipart upload, byteLength must be able to determine size, which makes sense because we can't do a multipart upload without knowing the size.

I considered updating byteLength to throw if it can't determine the size, but that would change the behaviour of that failure case and would require more refactoring.

As for "only used in progress report": the size is actually used throughout the upload process.

Contributor

Got it, thanks for the explanation

Member

Two things here:

  1. What happens here when the size is undefined for some reason? I'm still unclear on that.
  2. Regarding "it's used throughout the upload process": can we update the inline docs accordingly?

Author

If size is undefined, the upload will default to a multipart upload. When it gets to getDataChunker, it will throw an error and the upload will fail.
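A sketch of the failure path described above, assuming the chunker rejects sources whose total size cannot be determined (the names and shapes here are illustrative, not the library's actual API):

```typescript
// Illustrative sketch: a data chunker that throws when the total byte
// length of the upload source is unknown, failing the upload early
// instead of producing malformed parts.
function* getDataChunker(data: string, totalSize?: number) {
	if (totalSize === undefined) {
		throw new Error('Cannot determine the byte length of the upload source.');
	}
	// A real implementation would yield fixed-size slices; one chunk here.
	yield { start: 0, end: totalSize };
}
```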

Member

Makes sense. I will make the total length explicitly defined. Right now the contract is implicitly required but the type is marked as optional.

@AllanZhengYP AllanZhengYP self-requested a review October 18, 2024 22:39
@AllanZhengYP
Member

Rebased the PR and fixed the unit test.

A more noticeable change I made is that the uploadData API now explicitly throws a validation error when the input data size cannot be determined. This is not a breaking change because we already implicitly required the data size before. Based on the comment: #13763 (comment)

Comment on lines 22 to 23
// If the upload source sticks to the suggested types, the byteLength can be
// determined here.
Member

What if it doesn't "stick to the suggested types"?

Member

We will throw the same validation error later, when we are chunking the data for the multipart upload.

Member

Maybe we should update the comment here; "suggested types" reads as though the upload doesn't have to be the expected shape 😅

Member

@calebpollman Addressed the comment in 052f11d (#13763)

calebpollman
calebpollman previously approved these changes Nov 4, 2024
@AllanZhengYP AllanZhengYP merged commit 4ff7b52 into aws-amplify:storage-browser/integrity Nov 5, 2024
28 checks passed