Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add php://input and $_FILES to trace metadata #95

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

roborourke
Copy link
Contributor

This lets us report on REST API body payloads as well as regular form data.

@roborourke roborourke requested a review from kovshenin May 11, 2023 11:06
@rmccue
Copy link
Member

rmccue commented May 11, 2023

This isn't free to read the data, both in terms of memory and time usage, so I'm not sure it makes sense for us to add for all requests. (Also, I seem to recall that php://input can only be read from once but maybe that's no longer the case.)

@roborourke
Copy link
Contributor Author

Yep:

php://input is a read-only stream that allows you to read raw data from the request body. php://input is not available with enctype="multipart/form-data".

There's no way to check the type of form submission reliably that I can find. So this could be limited to just POST, PUT, PATCH, DELETE requests.

Should we include $_FILES while we're at it?

@rmccue
Copy link
Member

rmccue commented May 11, 2023

Sorry, typo, that should have said:

that php://input can only be read from once

That is, the stream is exhausted once it's read from. I can't see any info on whether that's still the case though.

There's no way to check the type of form submission reliably that I can find. So this could be limited to just POST, PUT, PATCH, DELETE requests.

HTTP only allows request bodies on POST, PUT, and PATCH; GET, HEAD, and DELETE can't have them.

@roborourke
Copy link
Contributor Author

Well I was close. Also kinda wild regarding php://input being a one time read. Makes zero sense to me.

@roborourke
Copy link
Contributor Author

Quick test - created a file input.php:

<?php
$input1 = file_get_contents( 'php://input' );
var_dump( 'once', $input1 );
$input2 = file_get_contents( 'php://input' );
var_dump( 'twice', $input2 );

Run

  • php -S localhost:8090
  • echo -n '{"show":"me","what":"you","got":"!"}' | http POST :8090/input.php

Output:

HTTP/1.1 200 OK
Connection: close
Content-type: text/html; charset=UTF-8
Date: Thu, 11 May 2023 12:26:06 GMT
Host: localhost:8092
X-Powered-By: PHP/8.0.28

string(4) "once"
string(36) "{"show":"me","what":"you","got":"!"}"
string(5) "twice"
string(36) "{"show":"me","what":"you","got":"!"}"

@rmccue
Copy link
Member

rmccue commented May 11, 2023

Yeah, it was an exhaustible stream so you had to cache the data; it's why the REST API code reads it once and stores it. I think it depends on which SAPI you're using though.

In any case: I think this needs to be behind some kind of flag, because otherwise there's potentially a lot of data being read in, parsed, and pushed out to X-Ray.

@roborourke
Copy link
Contributor Author

In any case: I think this needs to be behind some kind of flag, because otherwise there's potentially a lot of data being read in, parsed, and pushed out to X-Ray.

See #94

@roborourke roborourke changed the title Add php://input to trace metadata Add php://input and $_FILES to trace metadata May 11, 2023
@rmccue
Copy link
Member

rmccue commented May 11, 2023

Truncation isn't a straight solution to that, because you still have the overhead of reading data in and parsing it.

Say, for example, I do curl -X POST example.com/wp-json/wp/v2/media < myfile.jpg, where myfile.jpg is a 1GB file. file_get_contents( 'php://input' ) will return a 1GB string, which will exhaust PHP's memory and cause it to error the whole request.

Even with regular sized JSON data that can be generated from page builders/etc, reading the whole data into memory will use up a decent chunk of memory, plus has the overhead of then truncating if necessary and pushing off to X-Ray. I can see this having an appreciable impact on request times, and causing errors.

(This is one of the reasons that eg Apache access logs don't include HTTP bodies as well.)

Also, the data sanitisation will need adapting for any of this, given that it otherwise will include the data we're sanitising elsewhere.

I see the utility of why you'd want to add this, but without being flagged behind (say) an opt-in header I don't think we can enable it by default.

@roborourke
Copy link
Contributor Author

roborourke commented May 11, 2023

Well, we can close this one then. I was going off a throwaway comment on #80 that we should add this while I had the repo open.

The most crucial things for me right now are the truncation and being able to redact data on the initial progress.

@roborourke roborourke marked this pull request as draft May 12, 2023 11:15
@joehoyle
Copy link
Member

Ok think we can check the stream length on php://input to do this. I do think it's a good idea to have this

@roborourke
Copy link
Contributor Author

Ah smart

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants