Issues Parsing the Proxy Settings #124

Closed
thebigG opened this issue Nov 23, 2020 · 1 comment · Fixed by #129

@thebigG
Collaborator

thebigG commented Nov 23, 2020

Hi everyone!

Hope you are all doing well.

Description

While testing the proxy capabilities I discovered that the proxy parsing/validation is not quite correct.

Steps to Reproduce

  1. Edit one of the settings files like settings_USA.yaml and uncomment the bottom of it:
# Proxy settings
proxy:
 protocol: http  # NOTE: you can also set to 'http'
 ip: "168.169.146.12"
 port: 8080

NOTE: Notice how I removed the single quotes from the port field. This is something I'll fix when I push these fixes (along with unit tests for proxy.py).

  2. Run funnel load -s demo/settings_USA.yaml
  3. Wait to see the following error:

TypeError: _validate_type_ipv4address() missing 1 required positional argument: 'value'

The error stack is larger than that, but I'm trying to stay brief.

There are two issues happening here.

  1. Cerberus does not pass two arguments to custom type validators, as this function expects:
    def _validate_type_ipv4address(self, field, value):
    It passes only one. This is visible in the following snippet from cerberus/validator.py:
                type_handler = self.__get_rule_handler('validate_type', _type)
                matched = type_handler(value)
            if matched:
                return

            # TODO uncomment this block on next major release
            #      when _validate_type_* methods were deprecated:
            # type_definition = self.types_mapping[_type]
            # if isinstance(value, type_definition.included_types) \
            #         and not isinstance(value, type_definition.excluded_types):  # noqa 501
            #     return

        self._error(field, errors.BAD_TYPE)
        self._drop_remaining_rules()

Here type_handler points to _validate_type_ipv4address, and as you can see it is called with only one argument, value.

Easy fix: make _validate_type_ipv4address take 1 argument.
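
The arity mismatch can be reproduced without Cerberus. The sketch below uses a hypothetical stand-in class, MiniValidator, whose check_type method mimics the one-argument call type_handler(value) from the snippet above. It is not Cerberus code, and every name other than _validate_type_ipv4address is an assumption made for illustration.

```python
class MiniValidator:
    """Hypothetical stand-in that mimics how a type handler is called."""

    def check_type(self, type_name, value):
        # Look up the handler by name, then call it with the value only,
        # mirroring `type_handler(value)`.
        handler = getattr(self, "_validate_type_" + type_name)
        return handler(value)


class TwoArgValidator(MiniValidator):
    # Old signature: expects both `field` and `value`.
    def _validate_type_ipv4address(self, field, value):
        return True


class OneArgValidator(MiniValidator):
    # Fixed signature: takes only `value`.
    def _validate_type_ipv4address(self, value):
        return True


def call_raises_type_error(validator):
    """Return True if the one-argument call fails with a TypeError."""
    try:
        validator.check_type("ipv4address", "168.169.146.12")
    except TypeError:
        return True
    return False
```

With the two-argument signature the call fails because value is never supplied; with the one-argument signature it goes through.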

Once I fixed that, there was another issue:

Traceback (most recent call last):
  File "/home/lorenzogomez/.local/bin/funnel", line 11, in <module>
    load_entry_point('JobFunnel', 'console_scripts', 'funnel')()
  File "/home/lorenzogomez/PycharmProjects/JobFunnel/jobfunnel/__main__.py", line 15, in main
    cfg_dict = build_config_dict(args)
  File "/home/lorenzogomez/PycharmProjects/JobFunnel/jobfunnel/config/cli.py", line 317, in build_config_dict
    raise ValueError(
ValueError: Invalid Config settings yaml:
{'proxy': [{'ip': ['must be of ipv4address type']}]}

This has to do with two functions in the code.

First, this validator method:

import ipaddress

from cerberus import Validator


class JobFunnelSettingsValidator(Validator):
    """A simple JSON data validator with a custom data type for IPv4 addresses
    https://codingnetworker.com/2016/03/validate-json-data-using-cerberus/
    """
    def _validate_type_ipv4address(self, value):
        """
        checks that the given value is a valid IPv4 address
        """
        try:
            # try to create an IPv4 address object using the python3 ipaddress
            # module
            ipaddress.IPv4Address(value)
            return True
        except ValueError:
            # ipaddress raises AddressValueError (a ValueError subclass)
            # for malformed addresses
            self._error(value, "Not a valid IPv4 address")

Notice how I added return True at the end of the try statement.

The second piece of code is in cli.py:

        # Validate the config passed via YAML
        if not SettingsValidator.validate(config):
            raise ValueError(
                f"Invalid Config settings yaml:\n{SettingsValidator.errors}"
            )

This validation depends on the return value of _validate_type_ipv4address, but the current version of that method returns None. Hence the new return True in _validate_type_ipv4address.
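
As a minimal sketch of the return-value point, the helper below returns an explicit boolean in both branches, so the caller never sees None. The name is_ipv4_address is hypothetical and not part of JobFunnel or Cerberus; the sketch relies only on the standard-library ipaddress module.

```python
import ipaddress


def is_ipv4_address(value):
    """Return True if `value` parses as an IPv4 address, else False."""
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True
```

A function that falls off the end of its try block returns None, which is falsy, so a truthiness check treats a successful parse the same as a failure unless True is returned explicitly.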

Hopefully this explanation makes sense.

I will be working on fixing these issues, and will also add unit testing to proxy.py.

Cheers!

Expected behavior

JobFunnel should scrape normally with proxy settings enabled.

Actual behavior

The errors described above.

Environment

  • Build: JobFunnel 3.0.1 on commit cf80740
  • Operating system and version: Pop!_OS 20.04 LTS
  • [Linux] Desktop Environment and/or Window Manager: Cinnamon
@thebigG thebigG added the bug label Nov 23, 2020
@thebigG thebigG self-assigned this Nov 23, 2020
@PaulMcInnis
Owner

ah good find, this was some techdebt that wasn't captured - I never got around to validating this feature.

thebigG added a commit to thebigG/JobFunnel that referenced this issue Nov 26, 2020
thebigG added a commit to thebigG/JobFunnel that referenced this issue Dec 13, 2020
thebigG added a commit to thebigG/JobFunnel that referenced this issue Dec 13, 2020
thebigG added a commit to thebigG/JobFunnel that referenced this issue Dec 13, 2020
thebigG added a commit to thebigG/JobFunnel that referenced this issue Dec 14, 2020
thebigG added a commit to thebigG/JobFunnel that referenced this issue Dec 15, 2020
thebigG added a commit to thebigG/JobFunnel that referenced this issue Dec 15, 2020
@thebigG thebigG mentioned this issue Jan 5, 2021
15 tasks
EmersonCosta0915 pushed a commit to EmersonCosta0915/JobFunnel that referenced this issue Aug 6, 2024