Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conserve memory when resolving paths, recursively handle symlinks, check methods now more verbose and granular. #18

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

LucasFaudman
Copy link
Contributor

  • Generators used when resolving paths to avoid storing paths all in memory at once.
  • Checks performed on paths as they are generated to catch banned paths as early as possible.
  • _recursive_resolve_symlinks handles symlinks to symlinks and symlinks to directories which may contain symlinks
  • max_resolved_paths kwarg can be used to limit the number of paths resolved to prevent the case where a large directory is injected into a command causing long rglob run-times needed to accurately preform checks which could allow the injection to be used as a DDoS mechanism.
  • rglobdirs kwarg can be used to control the behavior of whether files in directories are considered when preforming checks. This is useful to reduce check run-time when a user knows the command they are using does not need to consider files in directories since it does not have recursive capabilities.

To understand why this is neccessary consider a worst case scenario where this module is user on a system with low RAM like a t2.nano or similar that contains symlinks to NFS mounted directories on other systems in its domain with very large directory trees.

If an attacker was able to get a path through the developers input filters that resolved to a symlink to a directory on the NFS mount the module would need to resolve all of the paths in the directory tree to preform checks on them which could take a very long time and consume a lot of memory potentially crashing the system.

This approach prevents this and other similar attacks against the check mechanism since it only needs to store one path at a time in memory and can stop resolving paths if the number of resolved paths exceeds the max_resolved_paths kwarg.

Additionally, once I add the ability to customize the BANNED_* sets using kwargs in the future, this approach will allow for directories to be added to SENSITIVE_FILE_PATHS and raise a SecurityException before resolving the its children.

Lucas Faudman added 20 commits February 9, 2024 17:38
…entation by properly handling globbing, chained commands via other binaries, directory traversal and missing of banned binaries accessed in unconventional ways. Added checking for owner/group and uncommon path types that should not be in args. Added typeing
… use precomputed abs_path_strings for .endswith check
…age>) not == since my execptions have slighly different message format but the check is still the exact same
…d shell syntax with recursive_shlex_split. Compatible with Python3.8
…_EXPANSION_OPERATORS.

- Handle arithmetic expansion of parameters using -+ and/or nesting and shell variables.
- Track env vars set/modified by expansion and use them in the final command.
- Use expanuser to handle ~ and ~user tilde expansion when resolving paths.
- Correctly hand alphanumeric sequences expansions
- Block process substitution via input redirection.
- More tests, refactoring and comments.

My _shell_expand implementation is still in progress but is neccessary since there are no other viable solutions in Python.
The best I have seen is https://github.com/kojiromike/parameter-expansion
but it cannot be used in a security context because they say:
"All the standard shell expansions are supported, including some level of nested expansion, as long as this is not too complex or ambiguous."
and ambigous cases are exactly what needs to be handled.
…d if shell expansion is needed.

- _shell expand is only used when shell=True or the executable is a shell. (list of shells derived from https://www.in-ulm.de/~mascheck/various/shells/) but reading /etc/shells could be used to get a more accurate/concise list on a per-system basis.
- The executable Popen kwarg is now handled correctly and takes precedence over the shell kwarg in determining if shell expansion is needed. See subprocess.py 1593-1596: When an executable is given and shell is True, the shell executable is replaced with the given executable.
- the initial venv state is a copy of the env Popen kwarg to not modify the original Popen kwargs.
- _resolve_executable_path now takes venv into account and uses the PATH env variable to find the executable if env is set in the Popen kwargs since this is how the subprocess module behaves.
- test_popen_kwargs added to test the Popen kwargs and the initial env state. Other tests adjusted accordingly and refactored EXCEPTIONS outside class so they can be used as pytest params.
…y converting _resolve_path_in_parsed_command to a generator of Path objects that takes a single cmd_part and yields all the resolved paths for it.

- Split check_* methods based on if they are checking the expanded_command, a single command part, a single path string, or single path.
- First the command is expanded and expanded_command is checked. Then the if expanded_command passes checks it is used to create a generator of cmd_parts.
- Checks are then preformed on each part and its paths as they are generated, so all the parts/paths do not need to be stored in memory at once.
- Reduces the space complexity from O(n) to O(1) where n is the number of command parts or paths.
- More loops over the BANNED_* are needed which does slightly increase the time complexity, but this trade off is worth it since:
    - This can actually reduce the overall check execution time by raising an exception sooner before fully resolving all the parts/paths.
    - The number of BANNED_* is relatively small and constant.
    - The number of command parts and more importantly paths could be very large and unpredictable.

- Recursively handle symlinks to symlinks, and/or symlinks to directories which may contain symlinks.
- _path_is_executable now will not return True for dirs since dirs can have the x bit but they are not executables.

- max_resolved_paths, and rglob_dirs kwargs can be used to control this behavior.
- SecurityException raised if the number of resolved paths exceeds max_resolved_paths.
- updated test_resolve_paths_in_parsed_command and added test_max_resolved_paths to test this behavior.
- Underlying check implmentation logic is unchanged.
Split BANNED_COMMAND_CHAINING_ARGUMENTS from BANNED_COMMAND_CHAINING_EXECUTABLES.
Add -ok -okdir system( (for awk and similar) to BANNED_COMMAND_CHAINING_ARGUMENTS
Refactor check_path_is_chaining_executable and check_path_is_shell to path checks.
…memory at once.

- Checks performed on paths as they are generated to catch banned paths as early as possible.
- _recursive_resolve_symlinks handles symlinks to symlinks and symlinks to directories which may contain symlinks
- max_resolved_paths kwarg can be used to limit the number of paths resolved to prevent the case where a large directory is injected into a command causing long rglob run-times needed to accurately preform checks which could allow the injection to be used as a DDoS mechanism.
- rglobdirs kwarg can be used to control the behavior of whether files in directories are considered when preforming checks. This is useful to reduce check run-time when a user knows the command they are using does not need to consider files in directories since it does not have recursive capabilities.
https://sonarcloud.io/project/issues?open=AY4rNwVcynvJrFFtuQvF&id=pixee_python-security
- Remove or correct this useless self-assignment.
- Use concise character class syntax '\w' instead of '[a-zA-Z0-9_]'.
- Add replacement fields or use a normal string instead of an f-string.
- Replace the unused local variable expanded_command with _.
- Rename this local variable Popen_env to match the regular expression ^[_a-z][a-z0-9_]*$.
Copy link

sonarcloud bot commented Mar 13, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant