-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/cldsrv 546 post object #5601
base: epic/RING-45960-postObject-api
Are you sure you want to change the base?
Feature/cldsrv 546 post object #5601
Conversation
Hello kaztoozs,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
7948c37
to
1f132e5
Compare
1f132e5
to
084eb5b
Compare
084eb5b
to
7c42b2f
Compare
Request integration branchesWaiting for integration branch creation to be requested by the user. To request integration branches, please comment on this pull request with the following command:
Alternatively, the |
7c42b2f
to
625603a
Compare
625603a
to
d18f9ec
Compare
57f34b9
to
d3b76f0
Compare
00752ae
to
252205a
Compare
5325ca1
to
7ef48e1
Compare
7ef48e1
to
95140e3
Compare
c8e4647
to
92d9273
Compare
/* eslint-disable no-param-reassign */ | ||
let formDataParser; | ||
try { | ||
formDataParser = busboy({ headers: request.headers }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more details about multipart/form-data
, parsing by busboy
and aws behavior with some links to documentation:
Content-Type
Documentation for the Content-Type, with the boundary directive and example with body: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type
Again, busboy
will accept application/x-www-form-urlencoded
(https://github.com/fastify/busboy/blob/master/lib/types/urlencoded.js#L9)
That will unnecessarily read the whole body and trigger an error 400 InvalidArgument
instead of the 412 Precondition Failed
returned by aws and stopping the request early.
Content-Disposition
The documentation about Content-Disposition
gives you details for its usage in multipart/form-data
: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
In the body, a part will look like
--boundary
Content-Disposition: form-data; name="f1"
value1
--boundary
Here busboy
will parse this and emit a field
event with fieldname
being f1
You can send this with:
- an html form with an
<input type="text" name="f1">
- curl (https://curl.se/docs/manpage.html#-F)
curl -F f1=value1
curl -F "f1=</etc/hostname"
(this put the content of the file but not the filename)
- nodejs with
form-data
or raw
You can also have
--boundary
Content-Disposition: form-data; name="f2"; filename="example.txt"
value2
--boundary
You can send this with:
- an html form with an
<input type="file" name="f2">
- curl (https://curl.se/docs/manpage.html#-F)
curl -F "[email protected]"
(filename will be example.txt)- `curl -F "f2=value2;filename=example.txt"
- nodejs with
form-data
or raw
Notice the filename
directive, this will make busboy
emit a file
event with fieldname
being f2
.
As described in busboy documentation: https://www.npmjs.com/package/@fastify/busboy#busboy-methods with the default config isPartAFile
.
You can notice in AWS documentation about the file
form field: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html#RESTObjectPOST-requests-form-fields
The file or text content.
...
Type: File or text content
So you can either send:
Content-Disposition: form-data; name="file"
(with an input of type text)Content-Disposition: form-data; name="file"; filename="whatever.txt"
Which means by default the file
form field can be emitted either by the field
or file
busboy event.
Also by testing on AWS, it accepts other form fields like key
, acl
having a filename
directive (meaning coming from an input type="file"), example this should be possible:
Content-Disposition: form-data; name="key"; filename="whatever.txt"
Content-Disposition: form-data; name="acl"; filename="whatever.txt"
So any form field can be emitted by either the field
or file
busboy event.
The processing / validation should be performed based on the fieldname
and not the busboy event.
Note that it could be interesting to replace the default behavior of busboy and defining isPartAFile
to consider a file only the fieldname === 'file'
(with potentially presence of filename
).
When doing that there should also be some limits defined on busboy, if a form field other than file
is being a big file and now parsed as a string instead of a stream, limit would help busboy to stop reading it. (The default limit.fieldSize
is 1MiB but AWS accepts only 20KB for the form-data exluding file).
The form data and boundaries (excluding the contents of the file) cannot exceed 20KB.
Note: it looks like it's not just the field value but potentially the whole boundary withContent-Disposition
and potential headers of the part, but I'm not sure, might need to be tested.
${filename} replacement
About ${filename}
replacement, it's not only on the key, but all form fields: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-HTTPPOSTForms.html
The variable ${filename} is automatically replaced with the name of the file provided by the user and is recognized by all form fields. If the browser or client provides a full or partial path to the file, only the text following the last slash (/) or backslash () is used (for example, C:\Program Files\directory1\file.txt is interpreted as file.txt). If no file or file name is provided, the variable is replaced with an empty string.
Also the documentation defines how filename
should be sanitized (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition#filename)
Busboy should sanitize it, but you could add some tests with filename containing ../xxx
, a/b/c
, etc to ensure the ${filename}
works as aws.
And validation on form fields, like key
length, should then be performed after ${filename}
replacement as it could impact the value.
Additional validation
You can receive a Content-Disposition: form-data
without name, that would make an undefined fieldname
and crash:
TypeError: Cannot read properties of undefined (reading 'toLowerCase')
Also by testing on AWS, they seem to trim the fieldname
of whitespaces: name=" \n key \n "
will should match key
.
It's also possible to have multiple parts with the same name
.
For example with this:
<input type="text" name="key">
<input type="text" name="key">
<input type="text" name="acl">
<input type="text" name="acl">
<input type="file" name="file" multiple>
or curl -F f1=value1 -F f1=value2
We should ensure, just like AWS that we don't have multiple times the same form field, whether busboy emits it via field
or file
.
Maybe the file should not be read if the parsing failed and an error was caught.
And when reading (pipe) the file stream, we need to catch error, otherwise it can crash:
Error: Part terminated early due to unexpected end of multipart data
It seems the finish
event is triggered once the file is completely read, and there is a validation logic in there, the logic could happen earlier before writting file to disk.
There is a validation on Content-Length
as well:
<Code>EntityTooLarge</Code>
<Message>Your proposed upload exceeds the maximum allowed size</Message>
<ProposedSize>6442453100</ProposedSize>
<MaxSizeAllowed>5368730624</MaxSizeAllowed>
The key
should be validated just like it's done for regular PutObject
.
See https://github.com/scality/Arsenal/blob/7eb2701f21aba91d45c2454ee8c5c257fea1c0db/lib/s3routes/routes.ts#L65-L73
Maybe there could be some blacklisted object prefix on cloudserver
You should find attached a javascript script to write the raw body form-data easier than using `nc`.
It allows to test invalid bodies that are usually well formated by lib like `form-data` or by `curl` or `postman`.
Also you can add a delay between writes to test server timeout behavior.
const http = require('http');
const CRLF = '\r\n';
const body = `\
--test
Content-Disposition: form-data; name="key"
some_key
--test
Content-Disposition: form-data; name="file"; filename="some_name"
a
--test--
`.replace(/\n/g, CRLF);
const options = {
method: 'POST',
headers: {
'Host': 'micktestpost.localhost',
'Content-Type': 'multipart/form-data; boundary=test',
'Content-Length': Buffer.byteLength(body),
},
};
const req = http.request('http://localhost:8000', options, (res) => {
console.log(req)
console.log('RESPONSE', { statusCode: res.statusCode, message: res.statusMessage, headers: res.headers });
res.on('data', (chunk) => {
console.log(`BODY: ${chunk}`);
});
res.on('end', () => {
console.log('END');
});
});
req.on('error', (e) => {
console.error(`problem with request: ${e.toString()}`);
});
// Split body in block of 20 chars
const split = body.match(/[\s\S]{1,20}/g);
// Increase delay without writting to test server timeout
const writeDelayMs = 1_000;
function writeBodyPart(i) {
if (i < split.length) {
console.log('Writting', i, split[i])
req.write(split[i]);
setTimeout(() => writeBodyPart(i + 1), writeDelayMs);
} else {
console.log('Done writting body')
req.end();
}
}
// setTimeout(() => writeBodyPart(0), writeDelayMs)
req.write(body);
req.end();
Generating the body raw without a lib can help you test any kind of invalid body to make sure you use busboy correctly and validate all input before using them to prevent a crash of the server.
Busboy already takes care of all the parsing, we need to configure it correctly (cf busboy readme) to match AWS spec and validate inputs to prevent a crash.
Ideally you could extract the parsing / validation (parseFormData function) with busboy into its own isolated class to make it easier for testing. And if you want to be consistent with callbacks usage instead of promises and async await, you can pass a callback as input of your parsing function to call once parsing /validation is done.
request.formData[lowerFieldname] = val; | ||
} | ||
// add only the recognized fields to the formData object | ||
if (POST_OBJECT_OPTIONAL_FIELDS.some(field => lowerFieldname.startsWith(field))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startsWith
will allow things like acl abc
. Known fields should be checked with equality and the special x-amz-meta-
can check checked aside
const xml2js = require('xml2js'); | ||
const axios = require('axios'); | ||
const crypto = require('crypto'); | ||
const FormData = require('form-data'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dependency as well does not appear on package.json
.
This is present only because it's a dependency of a dependency of a dependency, and is an old version.
Make sure to install dependencies you want to use, with latest version if possible (in devDependencies
for tests)
> npm ls form-data
[email protected] /home/mickael/scality/cloudserver
└─┬ [email protected]
└─┬ [email protected]
└── [email protected]
> npm why form-data
[email protected]
node_modules/form-data
form-data@"~2.3.2" from [email protected]
node_modules/request
request@"^2.86.0" from [email protected]
node_modules/azure-storage
azure-storage@"^2.1.0" from the root project
azure-storage@"~2.10.7" from [email protected]
node_modules/arsenal
arsenal@"git+https://github.com/scality/arsenal#4ef5748c028619edff10d6d38b21df43c8d63d88" from the root project
azure-storage@"~2.10.7" from [email protected]
node_modules/bucketclient/node_modules/arsenal
azure-storage@"~2.10.7" from [email protected]
node_modules/utapi/node_modules/arsenal
azure-storage@"~2.10.7" from [email protected]
node_modules/vaultclient/node_modules/arsenal
request@"^2.79.0" from [email protected]
node_modules/google-auto-auth
google-auto-auth@"^0.9.1" from the root project
request@"^2.88.0" from [email protected]
node_modules/oas-tools
oas-tools@"^2.2.2" from [email protected]
node_modules/utapi
utapi@"git+https://github.com/scality/utapi#7.70.4" from the root project
method: 'POST', | ||
hostname: url.hostname, | ||
port: url.port, | ||
path: url.pathname + url.search, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review the documentation: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html#RESTObjectPOST-requests
POST / HTTP/1.1 Host: destinationBucket.s3.amazonaws.com
Anything (more than /
) in the path should return a 405 (see #5601 (comment) and scality/Arsenal#2248 (comment))
And any querystring (url.search
) should return a 400 (see scality/Arsenal#2248 (comment))
Don't put them in tests if you don't want them to fail once url validation is correct
|
||
req.on('response', res => { | ||
try { | ||
assert.equal(res.statusCode, 204); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer strictEqual
const req = http.request(options); | ||
|
||
req.on('response', res => { | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this request handling make the same code snippet duplicated in all tests.
Ideally, you could encapsulate all this request sending and response / error callback.
You could promisify the request to simplify the code snippet in every tests, (some most recent tests with the aws-sdk uses promise).
formData.getLength((err, length) => { | ||
if (err) { | ||
return done(err); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getLengthSync
can be interesting to not have a callback
}); | ||
}); | ||
|
||
it('should handle error when bucket does not exist', done => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could include the statusCode in description for quicker troubleshooting when it fails
}); | ||
}); | ||
|
||
it('should successfully upload an object using a POST form', done => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for organization you could have an additionnal sub describe
to regroup testing of successful request and testing of errors
const req = http.request(options); | ||
|
||
req.on('response', res => { | ||
if (res.statusCode === 404) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally it should look like assert.strictEqual(res.statusCode, 404)
so it's easy to see in the logs tha actual value vs expected
|
||
req.on('response', res => { | ||
if (res.statusCode === 204) { | ||
s3.listObjectsV2({ Bucket: bucketName }, (err, data) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a HeadObject
or GetObject
would be more appropriate
Description
Motivation and context
This API adds the ability to upload objects via HTML web forms per customer demand. This is a preliminary implementation and does not yet include Authz, POST policy checking or extra parameters.
Related issues
ARSN-422 - Adds support for postObject API
Notes