Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security/EscapeOutput: use PHPCSUtils, allow for modern PHP and slew of bug fixes #2326

Merged
merged 44 commits into from
Jul 28, 2023

Conversation

jrfnl
Copy link
Member

@jrfnl jrfnl commented Jul 28, 2023

I realize this is a huge PR, so please take your time reviewing it.

Note: this PR will be easiest to review by checking each commit individually.

Also note: there is always more which can be done, so, while not perfect, this PR will make significant (and HUGE) improvements to the escape output sniff and sets the sniff up to make it more straight forward to make further improvements in the future.


Security/EscapeOutput: rename test case file

... to allow for adding additional test case files.

Security/EscapeOutput: move parse error related test to separate file

... to prevent it interfering with other tests.

Security/EscapeOutput: use Tokens::$magicConstants

The Tokens::$magicConstants array was introduced to PHPCS in version 3.5.6, so we can now defer to the upstream array.

Security/EscapeOutput: use PHPCSUtils

Security/EscapeOutput: normalize the $safe_components array

  • Use token constants instead of type strings.
  • Have the token constant as both the key and the value.

Security/EscapeOutput: bug fix - expand the "safe components" list

Operator tokens in PHP are not insecure, it is the operands which are potentially insecure.

As things were, we've seen all sorts of messages in bugs reported, where the sniff would report on && or === tokens, which is nonsensical.

The "safe components" list already contained a number of operators, most notably the arithmetic operators (though the PHP 5.6+ ** (T_POW) operator was missing).

I'm proposing to expand this list with a much larger list of operators using the predefined token arrays from PHPCS itself + one of the PHPCSUtils token arrays + the T_BOOLEAN_NOT (!) operator.

With this expansion, we shouldn't be seeing any more issues reported for the sniff flagging operators, which should make debugging the issues reported more straight forward.

The only operators I have not (yet) included in the list are the assignment operators as assignments within an output statement should be exceedingly rare anyway.

Includes unit tests, all of which would throw an error on the operator prior to this change.

Note: in all honesty, the sniff should take operator precedence into account properly and, for instance, disregard a $var or function call after a boolean operator as that is automatically cast to a boolean. Similarly, the operands before and after an arithmetic operator will always be cast to numeric. Having said that, that's a rabbit hole for a future release as there's enough to fix in the sniff without going down that path.

Security/EscapeOutput: split process_token() method [1]

Split off the code which does the actual output escaping check into a separate check_code_is_escaped() method.

No functional changes.

Security/EscapeOutput: split process_token() method [2]

Split off the code which checks for function calls into a separate process_tstring_token() method.

No functional changes.

👉 This commit will be easier to review while ignoring whitespace changes.

Security/EscapeOutput: bug fix - false positives on non-function calls

As things were, the sniff would look for T_STRING tokens with certain contents, but never do a proper check if those were actually function calls.

This would lead to false positives, but also potentially to an Internal.Exception for the $end_of_statement not being defined.

This commit is a first step towards fixing this.

Includes tests for select situations fixed by this particular commit.

Helpers/PrintingFunctionsTrait: add get_printing_functions() method

... which is needed for the switch to another base class is the next commit.

Security/EscapeOutput: bug fix - false positives on non-global function calls

As things were, the sniff would look for T_STRING tokens with certain contents, but never do a proper check if those were actually function calls to the WP/PHP native global function.

This would lead to false positives for calls to methods, function declarations etc.

By changing the parent class to the AbstractFunctionRestrictionsSniff, we can re-use the "if this a global function call" logic already in place in that abstract, while still maintaining the handling of the other tokens in the pre-existing manner.

There are three caveats to this:

  1. The AbstractFunctionRestrictionsSniff contains an exclude property, which allows to silence notices about certain function for a complete group.
    As this is a security sniff, we don't want end-users to be able to silence notices about groups of functions in that way.
    To that end, the ability for end-users to set the exclude property for this sniff is being disabled.
  2. The AbstractFunctionRestrictionsSniff class will call the process_matched_token() method once for each match in each function group.
    As the "unsafe printing functions" are also included in the "printing functions", this would lead to the process_matched_token() method being called twice for those methods, leading to duplicate error notices.
    I'm preventing this by removing the "unsafe printing functions" from the "printing functions" list prior to passing it off to the parent class.
  3. Lastly, the AbstractFunctionRestrictionsSniff class expects the list of target functions to be available when the sniff is instantiated. However, in test situations, that list can fluctuate, as the "printing functions" supports the ability to add "custom printing functions", which is done on the fly in the test case file using // phpcs:set ....
    To get round that, I'm re-registering the "function groups" every time a T_STRING (potential function call) triggers the sniff, but only when the tests are being run.
    While it is possible for end-users to use the // phpcs:set syntax, it is little known and is not mentioned anywhere in the documentation as a feature for end-users, so just re-registering the function groups for test runs would seem sufficient.

Includes tests for situations fixed by this particular commit.
Includes a test using the PHP 8.0+ nullsafe object operator.

Note: the PrintingFunctionsTrait::is_printing_function() method has now become unused and could be removed.
For consistency with other traits/helper classes, I'm electing to leave it be (for now) though.

Security/EscapeOutput: bug fix - function names are case-insensitive

While the initial function name check was previously already made case-insensitive in #2273 (commit 53309ac) and remained case-insensitive with the switch to the AbstractFunctionRestrictionsSniff base class, the special casing function name checks were not done in a case-insensitive manner.

This could lead to false positives as well as false negatives.

Fixed now.

Includes tests proving the bug.

Security/EscapeOutput: prevent false positive due to comments in param

The 'raw' key in the parameter arrays returned from the PassedParameters class contains - as per the name - the raw contents of the parameter.

Since PHPCSUtils 1.0.0-alpha4, the return array also contain a 'clean' index, which contains the contents of the parameter cleaned of comments.

By switching to using that key, a potential false positive gets fixed.

Includes unit test demonstrating the issue and safeguarding the fix.

Security/EscapeOutput: code readability improvements [1]

Minor tweaks to the code in the process_token() method for improved comprehensibility of the code.

Rename some variables, add some documentation, split up long conditions.

Includes preventing the code from changing the $stackPtr as that is just really confusing.

Security/EscapeOutput: bug fix - only examine exit/die with parentheses

exit/die are language constructs and can be used as part of an expression.

Additionally, an exit/die can be stand-alone or can pass a status code/message. If a status code/message is passed, we want it to be escaped for output, but a status code/message can only be passed within parentheses.

As things were, the sniff would not ignore exit/die statements without status and when used within an expression, would walk the tokens after the exit/die, leading to false positives.

Fixed now by using custom logic for T_EXIT tokens.
Includes parse error/live coding protection, where the sniff should stay silent.

As the method will now use a switch control structure, move the special casing for T_STRING also into the switch.

Includes tests for situations fixed by this particular commit.

👉 This commit will be easier to review while ignoring whitespace changes.

Security/EscapeOutput: bug fix - improve start/end determination for print statements

print is a language construct and can be used as part of an expression.

The sniff did not take this into account correctly. This would lead to the sniff walking too far and examining code which is not part of the print statement. This would lead to both false positives as well as false negatives.

This commit improves that situation, but doesn't completely fix it.

First off, the sniff now uses the BCFile::findEndOfStatement() method to more accurately retrieve the end token for the expression. This should improve the handling of the most common cases.
This part includes live coding/parse error protection for unfinished print statements at the end of a file.

However, the [BC]File::findEndOfStatement() method does not take ternaries into account when determining the end of the expression.

So, secondly, code has been added to special case print statements in the "then" part of a ternary and determine the end of such an expression correctly.

Includes tests for situations fixed by this particular commit.

Additional notes:

  • I can still come up with more (exotic) test cases with print in ternaries which would still not be handled correctly.
    I'm leaving those as "future scope".
    Example:
    ( $fop === 1 ) ? true && print $foo : print $bar; // The first print will now examine everything to the end of the statement, not to the end of the expression, which means that `$bar` will not be flagged as the sniff bow out before it reached $bar.

Fixes #2209

Security/EscapeOutput: efficiency tweak for parse errors in echo statements

Bow out earlier in case of a parse error for echo statements and short open echo tag statements.

While the sniff would not throw false positives/negatives as the for() loop, which does the token walking in the check_code_is_escaped() method, would not get started, as $i < false as stop condition would always return true, we can (and should) bow out earlier for parse errors/live coding.

Fixed now.

Includes switching the order of the token searches for maximum efficiency.

Includes tests safeguarding the tweak.

Security/EscapeOutput: fix up after refactor

The refactor in the previous few commits "broke" the preliminary check for a non-parenthesized ternary ìn a print statement.

There were no tests covering this.

I'm fixing this now by moving that particular code block out of the switch and executing it for both print, echo and <?= (though not for exit/die statements as those will always have parentheses, otherwise there will be nothing to check).

I'm also extracting the actual logic for finding the ternary to a separate method, as this logic will be expanded in follow-up commits.

Includes adding tests safeguarding the logic.

Security/EscapeOutput: bug fix - allow for nested statements with ternaries

As things were, the check for statements with ternaries without wrapping parentheses, did not take into account that the whole statement could be nested within parentheses if the statement is a print.

This commit updates the code block handling the skipping of the condition for a ternary without wrapping parentheses to allow for the statement being nested in parentheses.

Includes adding tests safeguarding the fix.

Security/EscapeOutput: bug fix - [improved] allow for nested statements with ternaries

While the previous commit already prevented a ternary being regarded as the "applicable" ternary when it wasn't at the right nesting level, that still disregarded the fact that ternaries can be chained and that an "applicable" ternary token could be found after a ternary which should be disregarded.

This led to false positives.

This commit changes the logic in the find_ternary() method to properly search the token group for an applicable ternary token, which fixes that issue.

Includes tests proving the bug and safeguarding the fix.

Security/EscapeOutput: bug fix - allow non-matching parentheses

As things were, the prelim check for statements with ternaries without wrapping parentheses, did not take into account that there could be a parentheses at the start and at the end of the statement, which didn't belong together.
In that case, the preliminary ternary check should still be executed.

Fixed now.

Includes adding tests safeguarding the fix.

Fixes #677

This also fixes the third case reported in #1507

Security/EscapeOutput: bug fix - fix ternary finding within the loop

As things were, when an open parenthesis was found in the loop, the sniff would attempt to find a relevant ternary operator and if found, skip over the condition.

However, this check did not take nested parentheses into account correctly, which meant that if the complete ternary was wrapped in parentheses, but the condition was not wrapped in parentheses and the "then" or the "else" part would contain parentheses, the ternary was incorrectly disregarded.

Fixed now by switching the logic used within the loop over to use the new find_ternary() method, which should handle the determination of whether the ternary is applicable correctly.

Note: it is important for the $start token passed to the find_ternary() method to be the token after the open parenthesis, otherwise, the method will not look within the parentheses.

Includes adding tests safeguarding the fix.

Fixes #1219

Security/EscapeOutput: bug fix - false negatives for short ternary

While the use of short ternaries is/will be discouraged in WP, this doesn't mean it won't be used in non-WP Core code.

When a short ternary is used, the sniff should not ignore the code before the ternary operator, but should check it for being correctly escaped as the value will be used if it is "truthy".

Fixed by changing the find_ternary() method to a find_long_ternary() method, which disregards short ternaries.

Includes tests proving the bug and safeguarding the fix.

Fixes #1617

Security/EscapeOutput: code readability improvements [2]

Minor tweaks to the code in the process_matched_token() method for improved comprehensibility of the code.

Rename some variables, simplify/split up long conditions.

Includes preventing the code from changing the $stackPtr as that is just really confusing.

Security/EscapeOutput: efficiency tweak [1]

Bow out earlier.

There is one function call related check which may bow out without checking parameters, so move that check up for efficiency.

Security/EscapeOutput: efficiency tweak [2]

If the parameter which needs to be examined doesn't exist, we can skip the complete function call.

Security/EscapeOutput: bug fix - sniff would skip too much

If a non-function call T_STRING token would be named the same as any of the escaping/formatting/array walking functions the sniff looks for, the sniff would walk to the next parenthesis opener (if found) and skip everything in between.

Fixed now.

Includes a test proving the bug and safeguarding the fix.

Note: the test shows the sniff now throwing two errors, one for the $var and one for the unrelatedFunction() function call.
The sniff does not throw an error for the (unescaped) constant being used. This may need to be looked into separately.

Security/EscapeOutput: bug fix - sniff did not handle function call parameters correctly

As things were, the sniff would examine all parameters from a function call as one long code block.

This would easily lead to false positives (and possibly also false negatives) largely related to the sniff not correctly starting/stopping the watching for each new parameter and getting confused over ternaries (again).

This commit make two closely related changes to fix this.

  1. It changes the behaviour of the sniff to examine each function call parameter individually.
  2. It moves the "should a ternary condition at the start of the code snippet be skipped ?" check, which was previously only used for echo/print/<?= statements/expressions, to the check_code_is_escaped() method, which means it will now be applied to each individual parameter.

This gives much higher accuracy when examining parameters.

As a side-effect of this change, function calls to printing functions using named parameters will not yield false positives on the parameter label.

Includes tests proving the bug(s) and safeguarding the fix.

Fixes the issue reported in #677 (comment)

Security/EscapeOutput: add tests for PHP 7.4 numeric literals + PHP 8.1 octal literals

No changes needed. Just making sure.

Security/EscapeOutput: add support for PHP 8.0+ named parameters [1]

  1. Adjusted the way the correct parameters are retrieved from a call to trigger_error()/user_error() to use the new PHPCSUtils 1.0.0-alpha4 PassedParameters::getParameterFromStack() method.
  2. The parameter names used are in line with the names as per the PHP 8.0 release.
    PHP itself renamed a lot of parameters in PHP 8.0. As named parameters did not exist before PHP 8.0, the parameter name as per PHP 8.0 (or above) are the only relevant names.

Includes additional unit tests.

ecurity/EscapeOutput: add support for PHP 8.0+ named parameters [2]

  1. Adjusted the way the correct parameters are retrieved from a call to _deprecated_file() to use the new PHPCSUtils 1.0.0-alpha4 PassedParameters::getParameterFromStack() method.
  2. Verified the parameter names used are in line with the name as per the WP 6.3 release.
    WP has been renaming parameters and is probably not done yet, but it doesn't look like those changes (so far) made it into changelog entries....
    For the purposes of this exercise, I've taken the current parameter name as the "truth" as support for named parameters hasn't officially been announced yet, so any renames after this moment are the only ones relevant.

Includes additional unit tests.

Security/EscapeOutput: add support for PHP 8.0+ match expressions [1]

PHP 8.0 introduced match expressions, which can be used inline, which means they can be used in an output generating statement.

If the match expression is wrapped inside an escaping function, it is already handled correctly.

This commit is the first step in adding additional support for match expressions to the sniff and adds logic to disregard a complete match expression if it is preceded by a "safe cast".

Includes tests.

Security/EscapeOutput: add support for PHP 8.0+ match expressions [2]

PHP 8.0 introduced match expressions, which can be used inline, which means they can be used in an output generating statement.

This second commit adds logic to correctly check inside a match expression.

In a match, there can be multiple matching conditions before the double arrow and one statement to be returned after the double arrow.
So basically, we should ignore everything before the double arrow as it is effectively a comparison.
And we should verify that everything after the double arrow - up to the first comma at the same "level" as the arrow - is escaped correctly.

Now as the conditions, when there is more than one, are also comma-separated, the parsing of a match expression is non-trivial.

With this in mind, I'm introducing a helper method to do the parsing. This helper method will pass each individually identified "return value" part to the check_code_is_escaped() method for stand-alone analysis.

Once the method is finished analyzing the match expression, it will hand back off to the original call to check_code_is_escaped() and continue analyzing the code after the match expression.

Note: I'm considering adding a utility method to PHPCSUtils for the parsing of match expressions. If and when that method is available, we should be able to simplify the code I've now added to the sniff.

Includes tests.

Security/EscapeOutput: add support for *::class and PHP 8.0+ $obj::class

The result of Name::class or the PHP 8.0+ $obj::class will always be a plain string, along the same lines as the use of __CLASS__, so this should be allowed and not need escaping.

This commit adds support for that code pattern to the sniff, which fixes some false positives.

Includes tests.

Fixes #1989

Security/EscapeOutput: add support for examining throw statements and PHP 8.0+ throw expressions

Any exception which isn't caught runs the risk of being displayed and should therefore be output escaped, along the same lines as is already required for parameters passed to trigger_error() function calls.

This commit add the T_THROW token to the tokens the sniff listens for. Parameters passed to the exception creation function call/class instantiation will be examined for being correctly escaped.

Notes:

  • As custom exceptions may expect different parameters from the PHP native parameters, all parameters will be examined, not just the $message parameter.
  • When a throw statement is wrapped within a try - catch control structure, it is presumed that the exception will be caught and the throw statement will be ignored.
  • The current logic supports a lot of different ways of exception creation, but does not (yet) support variable variables for the exception name and other exotics like that.

Includes tests.

Fixes #884

Security/EscapeOutput: improve handling of inline expressions

When the sniff would encountered one of the keywords the sniff listens to as an inline expression in the statement which was being examined, the sniff would flag the keyword as needing escaping.

This is wrong as that expression should be examined separately by the sniff (but wouldn't be as the sniff skips to the end of the examined code).

This commit fixes that by adjusting the logic in the loop in the check_code_is_escaped() method to check for keywords the sniff listens too and then recursing into the sniff again to examine that expression separately, after which the loop for the original statement will continue from the end of the examined expression.

Includes tests.

Fixes #1861

Security/EscapeOutput: improve handling of params in unsafe printing functions

As things were, the sniff will flag "unsafe printing functions" when encountered, but if the UnsafePrintingFunction error code is silenced, the sniff would examine all parameters passed to the function for output escaping, while only the first parameter will actually be used in the output.

This updates the sniff to only examine the first ($text) parameter for function calls to any of the unsafe printing functions, which should prevent some (rare) false positives.

Includes named parameter support for these functions.

Note: while the target parameter position and name in this case is the same for both functions, I have set it up to be flexible, in case additional functions with different target parameters would be added to the list.

Includes tests.

Security/EscapeOutput: very minor code readability improvements [3]

Security/EscapeOutput: bug fix - sniff did not handle arrays correctly

In particular when examining the parameters passed to a function call, the sniff may encounter parameters which are arrays.

As things were, the sniff would skip over the array opener and then continue like before.

This is buggy, in the same way as examining all function parameters as one long code block was buggy and would lead to errors throw on the incorrect token and/or false positives when the array item contains a ternary.

To fix this, I'm special casing arrays, parsing the array items and then recursively calling the check_code_is_escaped() method to examine each array item individually. After this, the original call to check_code_is_escaped() will continue from the array closer onwards.

This gives much higher accuracy when examining array items and fixes the bug.

Includes tests proving the bug(s) and safeguarding the fix.

Security/EscapeOutput: group common token check together

At various points within the check_code_is_escaped() loop, the sniff would check whether the current token is a token which can be ignored.

This commit just groups those simple checks, which are not dependent on other logic in the loop, together and places them at the top of the loop for the highest efficiency.

Security/EscapeOutput: improve handling of heredocs, include PHP 7.3+ flexible heredocs

As things were, when the sniff would encounter a heredoc, it would throw an error on the heredoc opener (<<<EOD) token and be done with it.

However, heredocs can, but don't necessarily always contain interpolated variables and expressions and when the heredoc doesn't contain any interpolation, it should be treated the same as any other plain text string.

This commit fixes the sniff to handle heredocs properly and throw errors only on heredoc lines containing interpolation.

Includes tests.

Security/EscapeOutput: improve handling of params in formatting functions

Similar to multiple other fixes made before, the parameters passed to a formatting function should be examined individually and not as one big code block.

Fixed now.

Includes tests.

Security/EscapeOutput: special case get_search_query( false )

As per the reported issue, get_search_query() is unsafe when the $escaped parameter is set, but not set to true.

This commit special cases the function and checks the parameter. If the parameter is passed, but not set to true, a custom error message will be thrown.

Includes tests.

Fixes #1354

Security/EscapeOutput: various minor docs fixes

jrfnl added 9 commits July 28, 2023 02:43
... to allow for adding additional test case files.
... to prevent it interfering with other tests.
The `Tokens::$magicConstants` array was introduced to PHPCS in version 3.5.6, so we can now defer to the upstream array.
* Use token constants instead of type strings.
* Have the token constant as both the key and the value.
Operator tokens in PHP are not insecure, it is the **_operands_** which are potentially insecure.

As things were, we've seen all sorts of messages in bugs reported, where the sniff would report on `&&` or `===` tokens, which is nonsensical.

The "safe components" list already contained a number of operators, most notably the arithmetic operators (though the PHP 5.6+ `**` (`T_POW`) operator was missing).

I'm proposing to expand this list with a much larger list of operators using the predefined token arrays from PHPCS itself + one of the PHPCSUtils token arrays + the `T_BOOLEAN_NOT` (`!`) operator.

With this expansion, we shouldn't be seeing anymore issues reported for the sniff flagging operators, which should make debugging the issues reported more straight forward.

The only operators I have not (yet) included in the list are the assignment operators as assignments within an output statement should be exceedingly rare anyway.

Note: in all honesty, the sniff should take operator precedence into account properly and, for instance, disregard a `$var` or function call after a boolean operator as that is automatically cast to a boolean. Similarly, the operands before and after an arithmetic operator will always be cast to numeric. Having said that, that's a rabbit hole for a future release as there's enough to fix in the sniff without going down that path.

Includes unit tests, all of which would throw an error on the operator prior to this change.
Split off the code which does the actual output escaping check into a separate `check_code_is_escaped()` method.

No functional changes.
Split off the code which checks for function calls into a separate `process_tstring_token()` method.

No functional changes.

:point_right: This commit will be easier to review while ignoring whitespace changes.
As things were, the sniff would look for `T_STRING` tokens with certain contents, but never do a proper check if those were actually function calls.

This would lead to false positives, but also potentially to an `Internal.Exception` for the `$end_of_statement` not being defined.

This commit is a first step towards fixing this.

Includes tests for select situations fixed by this particular commit.
jrfnl added 13 commits July 28, 2023 03:43
... which is needed for the switch to another base class is the next commit.
…on calls

As things were, the sniff would look for `T_STRING` tokens with certain contents, but never do a proper check if those were actually function calls to the WP/PHP native _global_ function.

This would lead to false positives for calls to methods, function declarations etc.

By changing the parent class to the `AbstractFunctionRestrictionsSniff`, we can re-use the "if this a global function call" logic already in place in that abstract, while still maintaining the handling of the other tokens in the pre-existing manner.

There are three caveats to this:
1. The `AbstractFunctionRestrictionsSniff` contains an `exclude` property, which allows to silence notices about certain function for a complete group.
    As this is a security sniff, we don't want end-users to be able to silence notices about groups of functions in that way.
    To that end, the ability for end-users to set the `exclude` property for this sniff is being disabled.
2. The `AbstractFunctionRestrictionsSniff` class will call the `process_matched_token()` method once for each match in each function group.
    As the "unsafe printing functions" are also included in the "printing functions", this would lead to the `process_matched_token()` method being called twice for those methods, leading to duplicate error notices.
    I'm preventing this by removing the "unsafe printing functions" from the "printing functions" list prior to passing it off to the parent class.
3. Lastly, the `AbstractFunctionRestrictionsSniff` class expects the list of target functions to be available when the sniff is instantiated. However, in test situations, that list can fluctuate, as the "printing functions" supports the ability to add "custom printing functions", which is done on the fly in the test case file using `// phpcs:set ...`.
    To get round that, I'm re-registering the "function groups" every time a `T_STRING` (potential function call) triggers the sniff, but only when the tests are being run.
    While it is possible for end-users to use the `// phpcs:set` syntax, it is little known and is not mentioned anywhere in the documentation as a feature for end-users, so just re-registering the function groups for test runs would seem sufficient.

Includes tests for situations fixed by this particular commit.
Includes a test using the PHP 8.0+ nullsafe object operator.

Note: the `PrintingFunctionsTrait::is_printing_function()` has now become unused and _could_ be removed.
For consistency with other traits/helper classes, I'm electing to leave it be (for now) though.
While the initial function name check was previously already made case-insensitive in 2273 (commit 53309ac) and remained case-insensitive with the switch to the `AbstractFunctionRestrictionsSniff` base class, the _special casing_ function name checks were **not** done in a case-insensitive manner.

This could lead to false positives as well as false negatives.

Fixed now.

Includes tests proving the bug.
The `'raw'` key in the parameter arrays returned from the `PassedParameters` class contains - as per the name - the _raw_ contents of the parameter.

Since PHPCSUtils 1.0.0-alpha4, the return array also contain a `'clean'` index, which contains the contents of the parameter cleaned of comments.

By switching to using that key, a potential false positive gets fixed.

Includes unit test demonstrating the issue and safeguarding the fix.
Minor tweaks to the code in the `process_token()` method for improved comprehensibility of the code.

Rename some variables, add some documentation, split up long conditions.

Includes preventing the code from changing the `$stackPtr` as that is just really confusing.
`exit`/`die` are language constructs and can be used as part of an expression.

Additionally, an `exit`/`die` can be stand-alone or can pass a status code/message. If a status code/message is passed, we want it to be escaped for output, but a status code/message can only be passed within parentheses.

As things were, the sniff would not ignore `exit`/`die` statements without status and when used within an expression, would walk the tokens after the `exit`/`die`, leading to false positives.

Fixed now by using custom logic for `T_EXIT` tokens.
Includes parse error/live coding protection, where the sniff should stay silent.

As the method will now use a `switch` control structure, move the special casing for `T_STRING` also into the `switch`.

Includes tests for situations fixed by this particular commit.

:point_right: This commit will be easier to review while ignoring whitespace changes.
…print statements

`print` is a language construct and can be used as part of an expression.

The sniff did not take this into account correctly. This would lead to the sniff walking too far and examining code which is not part of the `print` statement. This would lead to both false positives as well as false negatives.

This commit improves that situation, but doesn't completely fix it.

First off, the sniff now uses the `BCFile::findEndOfStatement()` method to more accurately retrieve the end token for the expression. This should improve the handling of the most common cases.
This part includes live coding/parse error protection for unfinished `print` statements at the end of a file.

However, the `[BC]File::findEndOfStatement()` method does not take ternaries into account when determining the end of the expression.

So, secondly, code has been added to special case `print` statements in the "then" part of a ternary and determine the end of such an expression correctly.

Includes tests for situations fixed by this particular commit.

Additional notes:
* I can still come up with more (exotic) test cases with `print` in ternaries which would still not be handled correctly.
    I'm leaving those as "future scope".
    Example:
    ```php
    ( $fop === 1 ) ? true && print $foo : print $bar; // The first print will now examine everything to the end of the statement, not to the end of the expression, which means that `$bar` will not be flagged as the sniff bow out before it reached $bar.
    ```

Fixes 2209
…ements

Bow out earlier in case of a parse error for echo statements and short open echo tag statements.

While the sniff would not throw false positives/negatives as the `for()` loop, which does the token walking in the `check_code_is_escaped()` method, would not get started, as `$i < false` as stop condition would always return `true`, we can (and should) bow out earlier for parse errors/live coding.

Fixed now.

Includes switching the order of the token searches for maximum efficiency.

Includes tests safeguarding the tweak.
The refactor in the previous few commits "broke" the preliminary check for a non-parenthesized ternary ìn a `print` statement.

There were no tests covering this.

I'm fixing this now by moving that particular code block out of the `switch` and executing it for both `print`, `echo` and `<?=` (though not for `exit`/`die` statements as those will always have parentheses, otherwise there will be nothing to check).

I'm also extracting the actual logic for finding the ternary to a separate method, as this logic will be expanded in follow-up commits.

Includes adding tests safeguarding the logic.
…naries

As things were, the check for statements with ternaries without wrapping parentheses, did not take into account that the whole statement could be nested within parentheses if the statement is a `print`.

This commit updates the code block handling the skipping of the condition for a ternary without wrapping parentheses to allow for the statement being _nested in_ parentheses.

Includes adding tests safeguarding the fix.
…ts with ternaries

While the previous commit already prevented a ternary being regarded as the "applicable" ternary when it wasn't at the right nesting level, that still disregarded the fact that ternaries can be chained and that an "applicable" ternary token could be found _after_ a ternary which should be disregarded.

This led to false positives.

This commit changes the logic in the `find_ternary()` method to properly search the token group for an applicable ternary token, which fixes that issue.

Includes tests proving the bug and safeguarding the fix.
As things were, the prelim check for statements with ternaries without wrapping parentheses, did not take into account that there could be a parentheses at the start and at the end of the statement, which didn't belong together.
In that case, the preliminary ternary check should still be executed.

Fixed now.

Includes adding tests safeguarding the fix.

Fixes 677

This also fixes the third case reported in 1507
As things were, when an open parenthesis was found in the loop, the sniff would attempt to find a relevant ternary operator and if found, skip over the condition.

However, this check did not take nested parentheses into account correctly, which meant that if the complete ternary was wrapped in parentheses, but the condition was _not_ wrapped in parentheses and the "then" or the "else" part _would_ contain parentheses, the ternary was incorrectly disregarded.

Fixed now by switching the logic used within the loop over to use the new `find_ternary()` method, which should handle the determination of whether the ternary is applicable correctly.

Note: it is important for the `$start` token passed to the `find_ternary()` method to be the token _after_ the open parenthesis, otherwise, the method will not look _within_ the parentheses.

Includes adding tests safeguarding the fix.

Fixes 1219
jrfnl added 22 commits July 28, 2023 03:43
While the use of short ternaries is/will be discouraged in WP, this doesn't mean it won't be used in non-WP Core code.

When a short ternary is used, the sniff should **not** ignore the code before the ternary operator, but should check it for being correctly escaped as the value will be used if it is "truthy".

Fixed by changing the `find_ternary()` method to a `find_long_ternary()` method, which disregards short ternaries.

Includes tests proving the bug and safeguarding the fix.

Fixes 1617
Minor tweaks to the code in the `process_matched_token()` method for improved comprehensibility of the code.

Rename some variables, simplify/split up long conditions.

Includes preventing the code from changing the `$stackPtr` as that is just really confusing.
Bow out earlier.

There is one function call related check which may bow out without checking parameters, so move that check up for efficiency.
If the parameter which needs to be examined doesn't exist, we can skip the complete function call.
If a non-function call `T_STRING` token would be named the same as any of the escaping/formatting/array walking functions the sniff looks for, the sniff would walk to the next parenthesis opener (if found) and skip everything in between.

Fixed now.

Includes a test proving the bug and safeguarding the fix.

Note: the test shows the sniff now throwing two errors, one for the `$var` and one for the `unrelatedFunction()` function call.
The sniff does not throw an error for the (unescaped) constant being used. This may need to be looked into separately.
…arameters correctly

As things were, the sniff would examine all parameters from a function call as one long code block.

This would easily lead to false positives (and possibly also false negatives) largely related to the sniff not correctly starting/stopping the watching for each new parameter and getting confused over ternaries (again).

This commit make two closely related changes to fix this.
1. It changes the behaviour of the sniff to examine each function call parameter individually.
2. It  moves the "should a ternary condition at the start of the code snippet be skipped ?" check, which was previously only used for `echo`/`print`/`<?=` statements/expressions, to the `check_code_is_escaped()` method, which means it will now be applied to each individual parameter.

This gives much higher accuracy when examining parameters.

As a side-effect of this change, function calls to printing functions using named parameters will not yield false positives on the parameter label.

Includes tests proving the bug(s) and safeguarding the fix.

Fixes the issue reported in WordPress/WordPress-Coding-Standards 677#issuecomment-470407780
….1 octal literals

No changes needed. Just making sure.
1. Adjusted the way the correct parameters are retrieved from a call to `trigger_error()`/`user_error()` to use the new PHPCSUtils 1.0.0-alpha4 `PassedParameters::getParameterFromStack()` method.
2. The parameter names used are in line with the names as per the PHP 8.0 release.
    PHP itself renamed a lot of parameters in PHP 8.0. As named parameters did not exist before PHP 8.0, the parameter name as per PHP 8.0 (or above) are the only relevant names.

Includes additional unit tests.
1. Adjusted the way the correct parameters are retrieved from a call to `_deprecated_file()` to use the new PHPCSUtils 1.0.0-alpha4 `PassedParameters::getParameterFromStack()` method.
2. Verified the parameter names used are in line with the name as per the WP 6.3 release.
    WP has been renaming parameters and is probably not done yet, but it doesn't look like those changes (so far) made it into changelog entries....
    For the purposes of this exercise, I've taken the _current_ parameter name as the "truth" as support for named parameters hasn't officially been announced yet, so any renames _after_ this moment are the only ones relevant.

Includes additional unit tests.
PHP 8.0 introduced match expressions, which can be used inline, which means they can be used in an output generating statement.

If the match expression is wrapped inside an escaping function, it is already handled correctly.

This commit is the first step in adding additional support for `match` expressions to the sniff and adds logic to disregard a complete match expression if it is preceded by a "safe cast".

Includes tests.
PHP 8.0 introduced match expressions, which can be used inline, which means they can be used in an output generating statement.

This second commit adds logic to correctly check _inside_ a match expression.

In a `match`, there can be multiple matching conditions before the double arrow and one statement to be returned after the double arrow.
So basically, we should ignore everything _before_ the double arrow as it is effectively a comparison.
And we should verify that everything after the double arrow - up to the first comma at the same "level" as the arrow - is escaped correctly.

Now as the conditions, when there is more than one, are also comma-separated, the parsing of a `match` expression is non-trivial.

With this in mind, I'm introducing a helper method to do the parsing. This helper method will pass each individually identified "return value" part to the `check_code_is_escaped()` method for stand-alone analysis.

Once the method is finished analyzing the `match` expression, it will hand back off to the original call to `check_code_is_escaped()` and continue analyzing the code after the `match` expression.

Note: I'm considering adding a utility method to PHPCSUtils for the parsing of `match` expressions. If and when that method is available, we should be able to simplify the code I've now added to the sniff.

Includes tests.
The result of `Name::class` or the PHP 8.0+ `$obj::class` will always be a plain string, along the same lines as the use of `__CLASS__`, so this should be allowed and not need escaping.

This commit adds support for that code pattern to the sniff, which fixes some false positives.

Includes tests.

Fixes 1989
… PHP 8.0+ throw expressions

Any exception which isn't caught runs the risk of being displayed and should therefore be output escaped, along the same lines as is already required for parameters passed to `trigger_error()` function calls.

This commit add the `T_THROW` token to the tokens the sniff listens for. Parameters passed to the exception creation function call/class instantiation will be examined for being correctly escaped.

Notes:
* As custom exceptions may expect different parameters from the PHP native parameters, *all* parameters will be examined, not just the  `$message` parameter.
* When a `throw` statement is wrapped within a `try - catch` control structure, it is presumed that the exception will be caught and the `throw` statement will be ignored.
* The current logic supports a _lot_ of different ways of exception creation, but does not (yet) support variable variables for the exception name and other exotics like that.

Includes tests.

Fixes 884
When the sniff would encountered one of the keywords the sniff listens to as an inline expression in the statement which was being examined, the sniff would flag the keyword as needing escaping.

This is wrong as that expression should be examined separately by the sniff (but wouldn't be as the sniff skips to the end of the examined code).

This commit fixes that by adjusting the logic in the loop in the `check_code_is_escaped()` method to check for keywords the sniff listens too and then recursing into the sniff again to examine that expression separately, after which the loop for the original statement will continue from the end of the examined expression.

Includes tests.

Fixes 1861
…functions

As things were, the sniff will flag "unsafe printing functions" when encountered, but if the `UnsafePrintingFunction` error code is silenced, the sniff would examine all parameters passed to the function for output escaping, while only the first parameter will actually be used in the output.

This updates the sniff to only examine the first (`$text`) parameter for function calls to any of the unsafe printing functions, which should prevent some (rare) false positives.

Includes named parameter support for these functions.

Note: while the target parameter position and name in this case is the same for both functions, I have set it up to be flexible, in case additional functions with different target parameters would be added to the list.

Includes tests.
In particular when examining the parameters passed to a function call, the sniff may encounter parameters which are arrays.

As things were, the sniff would skip over the array opener and then continue like before.

This is buggy, in the same way as examining all function parameters as one long code block was buggy and would lead to errors throw on the incorrect token and/or false positives when the array item contains a ternary.

To fix this, I'm special casing arrays, parsing the array items and then recursively calling the `check_code_is_escaped()` method to examine each array item individually. After this, the original call to `check_code_is_escaped()` will continue from the array closer onwards.

This gives much higher accuracy when examining array items and fixes the bug.

Includes tests proving the bug(s) and safeguarding the fix.
At various points within the `check_code_is_escaped()` loop, the sniff would check whether the current token is a token which can be ignored.

This commit just groups those simple checks, which are not dependent on other logic in the loop, together and places them at the top of the loop for the highest efficiency.
… flexible heredocs

As things were, when the sniff would encounter a heredoc, it would throw an error on the heredoc opener (`<<<EOD`) token and be done with it.

However, heredocs _can_, but don't necessarily always contain interpolated variables and expressions and when the heredoc doesn't contain any interpolation, it should be treated the same as any other plain text string.

This commit fixes the sniff to handle heredocs properly and throw errors only on heredoc lines containing interpolation.

Includes tests.
…ions

Similar to multiple other fixes made before, the parameters passed to a formatting function should be examined individually and not as one big code block.

Fixed now.

Includes tests.
As per the reported issue, `get_search_query()` is unsafe when the `$escaped` parameter is set, but not set to `true`.

This commit special cases the function and checks the parameter. If the parameter is passed, but not set to `true`, a custom error message will be thrown.

Includes tests.

Fixes 1354
@jrfnl jrfnl force-pushed the feature/escapeoutput-review-phpcsutils-modern-php branch from bfc0f03 to bb0b7d6 Compare July 28, 2023 01:44
Copy link
Member

@GaryJones GaryJones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! This is intense and fantastic! ✅

@dingo-d dingo-d merged commit efbaa1d into develop Jul 28, 2023
@dingo-d dingo-d deleted the feature/escapeoutput-review-phpcsutils-modern-php branch July 28, 2023 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment