Add `php://input` and `$_FILES` to trace metadata #95

roborourke · 2023-05-11T11:06:33Z

This lets us report on REST API body payloads as well as regular form data.

rmccue · 2023-05-11T11:52:45Z

Reading this data isn't free, in terms of either memory or time, so I'm not sure it makes sense for us to add it for all requests. (Also, I seem to recall that php://input can only be read from once, but maybe that's no longer the case.)

roborourke · 2023-05-11T12:06:23Z

Yep:

> php://input is a read-only stream that allows you to read raw data from the request body. php://input is not available with enctype="multipart/form-data".

There's no way to check the type of form submission reliably that I can find. So this could be limited to just POST, PUT, PATCH, DELETE requests.

Should we include $_FILES while we're at it?
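If `$_FILES` were included, one option would be to record upload metadata only, never file contents. A minimal sketch, assuming a hypothetical helper named `summarize_uploads()` (not part of this PR) that is handed the `$_FILES` array:

```php
<?php
// Hypothetical helper: reduce a $_FILES-shaped array to metadata that is
// cheap to attach to a trace. File contents are never opened.
function summarize_uploads( array $files ): array {
    $summary = [];
    foreach ( $files as $field => $file ) {
        // Copy only the descriptive keys; tmp_name (a server-side path) is dropped.
        $summary[ $field ] = [
            'name'  => $file['name'] ?? null,
            'type'  => $file['type'] ?? null,
            'size'  => $file['size'] ?? null,
            'error' => $file['error'] ?? null,
        ];
    }
    return $summary;
}
```

Calling `summarize_uploads( $_FILES )` would give a small array per upload field. Multi-file inputs, where each key holds an array, are copied through as-is in this sketch.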

rmccue · 2023-05-11T12:12:51Z

Sorry, typo, that should have said:

> that php://input can only be read from once

That is, the stream is exhausted once it's read from. I can't see any info on whether that's still the case though.

> There's no way to check the type of form submission reliably that I can find. So this could be limited to just POST, PUT, PATCH, DELETE requests.

HTTP only defines request body semantics for POST, PUT, and PATCH; a body on GET, HEAD, or DELETE has no defined meaning.
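A rough sketch of limiting the read to those methods. The helper name `method_may_have_body()` is made up for illustration and is not taken from the PR:

```php
<?php
// Hypothetical helper: true only for the methods listed above as carrying a body.
function method_may_have_body( string $method ): bool {
    return in_array( strtoupper( $method ), [ 'POST', 'PUT', 'PATCH' ], true );
}

// Only touch php://input when the method is expected to carry a body.
$method = $_SERVER['REQUEST_METHOD'] ?? 'GET';
$body   = method_may_have_body( $method ) ? file_get_contents( 'php://input' ) : null;
```

This skips the read entirely for GET, HEAD, and DELETE, so those requests pay no extra cost.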

roborourke · 2023-05-11T12:15:49Z

Well I was close. Also kinda wild regarding php://input being a one time read. Makes zero sense to me.

roborourke · 2023-05-11T12:26:49Z

Quick test - created a file input.php:

<?php
$input1 = file_get_contents( 'php://input' );
var_dump( 'once', $input1 );
$input2 = file_get_contents( 'php://input' );
var_dump( 'twice', $input2 );

Run

php -S localhost:8090
echo -n '{"show":"me","what":"you","got":"!"}' | http POST :8090/input.php

Output:

HTTP/1.1 200 OK
Connection: close
Content-type: text/html; charset=UTF-8
Date: Thu, 11 May 2023 12:26:06 GMT
Host: localhost:8092
X-Powered-By: PHP/8.0.28

string(4) "once"
string(36) "{"show":"me","what":"you","got":"!"}"
string(5) "twice"
string(36) "{"show":"me","what":"you","got":"!"}"

rmccue · 2023-05-11T12:34:39Z

Yeah, it was an exhaustible stream so you had to cache the data; it's why the REST API code reads it once and stores it. I think it depends on which SAPI you're using though.
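The read-once-and-store pattern looks roughly like the sketch below. It is not the REST API's actual code; `get_cached_raw_body()` is a hypothetical name, and the `$source` parameter exists only so the function can be pointed at something other than php://input:

```php
<?php
// Hypothetical helper: read the raw body a single time and hand back the
// stored copy on every later call, so an exhaustible stream is only read once.
function get_cached_raw_body( string $source = 'php://input' ): string {
    static $cache = [];
    if ( ! array_key_exists( $source, $cache ) ) {
        $data             = file_get_contents( $source );
        $cache[ $source ] = ( false === $data ) ? '' : $data;
    }
    return $cache[ $source ];
}
```

The trade-off is that the whole body stays in memory for the rest of the request.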

In any case: I think this needs to be behind some kind of flag, because otherwise there's potentially a lot of data being read in, parsed, and pushed out to X-Ray.

roborourke · 2023-05-11T12:45:12Z

> In any case: I think this needs to be behind some kind of flag, because otherwise there's potentially a lot of data being read in, parsed, and pushed out to X-Ray.

See #94

rmccue · 2023-05-11T13:05:56Z

Truncation isn't a straight solution to that, because you still have the overhead of reading data in and parsing it.

Say, for example, I do curl -X POST example.com/wp-json/wp/v2/media < myfile.jpg, where myfile.jpg is a 1GB file. file_get_contents( 'php://input' ) will return a 1GB string, which will exhaust PHP's memory and cause it to error the whole request.

Even with regular-sized JSON payloads, such as those generated by page builders, reading the whole body into memory uses a decent chunk of memory, and then there's the overhead of truncating it if necessary and pushing it off to X-Ray. I can see this having an appreciable impact on request times, and causing errors.
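For illustration, the memory side of this can be bounded by reading only a prefix of the stream rather than calling `file_get_contents()` on the whole thing. A minimal sketch, assuming a hypothetical `read_body_prefix()` helper and an arbitrary 8 KB default cap:

```php
<?php
// Hypothetical helper: read at most $max_bytes from a stream instead of
// loading the entire body into a string.
function read_body_prefix( string $source = 'php://input', int $max_bytes = 8192 ): string {
    $handle = fopen( $source, 'rb' );
    if ( false === $handle ) {
        return '';
    }
    // stream_get_contents() stops once $max_bytes have been read.
    $prefix = stream_get_contents( $handle, $max_bytes );
    fclose( $handle );
    return ( false === $prefix ) ? '' : $prefix;
}
```

This caps the size of the copied string, but the open and read still cost something, and a truncated JSON prefix will not parse.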

(This is also one of the reasons that, e.g., Apache access logs don't include HTTP bodies.)

Also, the data sanitisation will need adapting for any of this, given that it otherwise will include the data we're sanitising elsewhere.

I see why you'd want to add this, but unless it's gated behind (say) an opt-in header I don't think we can enable it by default.
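An opt-in header check could be as small as the sketch below. The header name `X-Trace-Body` and the helper `body_capture_requested()` are invented for illustration; nothing in this thread settles on a name:

```php
<?php
// Hypothetical gate: capture the body only when the client explicitly opts in.
// PHP exposes a request header "X-Trace-Body" as $_SERVER['HTTP_X_TRACE_BODY'].
function body_capture_requested( array $server ): bool {
    return isset( $server['HTTP_X_TRACE_BODY'] ) && '1' === $server['HTTP_X_TRACE_BODY'];
}

$capture_body = body_capture_requested( $_SERVER );
```

With this in place, requests that don't send the header never touch php://input at all.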

roborourke · 2023-05-11T13:12:20Z

Well, we can close this one then. I was going off a throwaway comment on #80 that we should add this while I had the repo open.

The most crucial things for me right now are the truncation and being able to redact data on the initial progress.

joehoyle · 2023-07-26T12:55:32Z

OK, I think we can check the stream length on php://input to do this. I do think it's a good idea to have this.
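One possible reading of this, sketched under the assumption that the `Content-Length` header is an acceptable proxy for the stream length. The helper name `body_within_limit()` and the 64 KB default are hypothetical:

```php
<?php
// Hypothetical check: decide from the declared Content-Length whether the
// body is small enough to read. An unknown or malformed length means "skip".
function body_within_limit( array $server, int $max_bytes = 65536 ): bool {
    $length = $server['CONTENT_LENGTH'] ?? null;
    if ( ! is_string( $length ) && ! is_int( $length ) ) {
        return false;
    }
    // Accept plain non-negative integers only.
    if ( ! preg_match( '/^\d+$/', (string) $length ) ) {
        return false;
    }
    return (int) $length <= $max_bytes;
}

$should_read = body_within_limit( $_SERVER );
```

Requests sent with chunked transfer encoding may carry no `Content-Length`, so this sketch would skip them rather than guess.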

roborourke · 2023-07-26T15:21:36Z

Ah smart

Add php://input to trace metadata

a09af74

roborourke requested a review from kovshenin May 11, 2023 11:06

Only read input on requests with body allowed

e449222

roborourke changed the title ~~Add php://input to trace metadata~~ Add php://input and $_FILES to trace metadata May 11, 2023

roborourke marked this pull request as draft May 12, 2023 11:15
