FIX/ENH: HttpMixin refactored and various fixes #2151

Open: wants to merge 1 commit into base: develop
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -29,6 +29,12 @@ CHANGELOG
The `LogLevel` and `ReturnType` Enums were added to `intelmq.lib.datatypes`.
- `intelmq.lib.bot`:
  - Enhance behaviour if an unconfigured bot is started (PR#2054 by Sebastian Wagner).
  - Removed the `http_*` variables and moved them into HttpMixin (PR#2151 by Sebastian Waldbauer, fixes #2150).
  - Removed `set_request_parameters` in favor of HttpMixin (PR#2151 by Sebastian Waldbauer).
- `intelmq.lib.mixins.http`:
  - Added missing variable types and simplified code (PR#2151 by Sebastian Waldbauer).
- `intelmq.lib.pipeline`:
  - Removed the try/except around importing `requests`; it is a core dependency specified in `setup.py` (PR#2151 by Sebastian Waldbauer).
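
As a rough illustration of the refactoring these entries describe, the sketch below shows the general mixin pattern: HTTP settings and a retrying `http_get()` live in one class that bots inherit alongside their base class. It is a stdlib-only stand-in and not the code in `intelmq.lib.mixins.http`. `HttpMixinSketch`, `BotSketch`, `FlakyCollector`, the `_send` hook and the retry loop are all hypothetical; only the names `http_header`, `http_timeout_max_tries` and `http_get` are taken from this diff.

```python
from typing import Dict, List, Optional


class HttpMixinSketch:
    """Hypothetical stand-in for an HTTP mixin (not IntelMQ's HttpMixin)."""

    http_header: Dict[str, str] = {}
    http_timeout_max_tries: int = 3

    def _send(self, url: str, headers: Dict[str, str]) -> str:
        # Hypothetical hook: a real mixin would perform the request here.
        raise NotImplementedError

    def http_get(self, url: str, headers: Optional[Dict[str, str]] = None) -> str:
        # Per-call headers override the bot-wide defaults.
        merged = {**self.http_header, **(headers or {})}
        last_error: Optional[Exception] = None
        for _ in range(self.http_timeout_max_tries):
            try:
                return self._send(url, merged)
            except TimeoutError as exc:
                last_error = exc  # remember the failure and try again
        raise last_error or TimeoutError("no attempts were made")


class BotSketch:
    """Hypothetical stand-in for a bot base class such as CollectorBot."""


class FlakyCollector(BotSketch, HttpMixinSketch):
    """Example bot whose transport times out twice and then succeeds."""

    http_header = {"Accept": "application/json"}

    def __init__(self) -> None:
        self.attempts: List[Dict[str, str]] = []

    def _send(self, url: str, headers: Dict[str, str]) -> str:
        self.attempts.append(headers)
        if len(self.attempts) < 3:
            raise TimeoutError("simulated timeout")
        return f"payload from {url}"


bot = FlakyCollector()
result = bot.http_get("https://example.invalid/feed", headers={"X-Demo": "1"})
```

The point of the pattern is that individual bots no longer import an HTTP library or configure sessions themselves; they only declare the mixin in their base classes and call its helpers.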

### Development

@@ -39,6 +45,10 @@ CHANGELOG

#### Collectors
- `intelmq.bots.collectors.mail._lib`: Add support for unverified SSL/STARTTLS connections (PR#2055 by Sebastian Wagner).
- `intelmq.bots.collectors.github_api`: Removed requests dependency in favor of HttpMixin (PR#2151 by Sebastian Waldbauer).
- `intelmq.bots.collectors.collector_azure`: Added HttpMixin (PR#2151 by Sebastian Waldbauer).
- `intelmq.bots.collectors.shodan.collector_stream`: Removed `set_request_parameters()` in favor of HttpMixin (PR#2151 by Sebastian Waldbauer).


#### Parsers
- `intelmq.bots.parsers.alienvault.parser_otx`: Save CVE data in `extra.cve` instead of `extra.CVE` due to the field name restriction on lower-case characters (PR#2059 by Sebastian Wagner).
@@ -75,12 +85,21 @@ CHANGELOG
- `intelmq.bots.experts.truncate_by_delimiter.expert`: Cut string if its length is higher than a maximum length (PR#1967 by Marius Karotkis).
- `intelmq.bots.experts.remove_affix`: Remove prefix or postfix strings from a field (PR#1965 by Marius Karotkis).
- `intelmq.bots.experts.asn_lookup.expert`: Fixes update-database script on the last few days of a month (PR#2121 by Filip Pokorný, fixes #2088).
- `intelmq.bots.experts.do_portal`: Removed requests dependency in favor of HttpMixin (PR#2151 by Sebastian Waldbauer).
- `intelmq.bots.experts.http.*`: Using HttpMixin instead of `create_request_session` (PR#2151 by Sebastian Waldbauer).
- `intelmq.bots.experts.national_cert_contact_certat`: Using HttpMixin instead of the `requests` library (PR#2151 by Sebastian Waldbauer).
- `intelmq.bots.experts.rdap`: Removed requests dependency and `create_request_session` in favor of HttpMixin (PR#2151 by Sebastian Waldbauer).
- `intelmq.bots.experts.ripe`: Simplified code and switched to HttpMixin (PR#2151 by Sebastian Waldbauer).
- `intelmq.bots.experts.splunk_saved_search`: Simplified code and switched to HttpMixin (PR#2151 by Sebastian Waldbauer).
- `intelmq.bots.experts.tuency`: Removed `create_request_session` in favor of HttpMixin (PR#2151 by Sebastian Waldbauer).

#### Outputs
- Removed `intelmq.bots.outputs.postgresql`: this bot was marked as deprecated in 2019 and announced to be removed in version 3 of IntelMQ (PR#2045 by Birger Schacht).
- Added `intelmq.bots.outputs.rpz_file.output` to create RPZ files (PR#1962 by Marius Karotkis).
- Added `intelmq.bots.outputs.bro_file.output` to create Bro intel formatted files (PR#1963 by Marius Karotkis).
- `intelmq.bots.outputs.templated_smtp.output`: Add new function `from_json()` (which just calls `json.loads()` in the standard Python environment), meaning the Templated SMTP output bot can take strings containing JSON documents and do the formatting itself (PR#2120 by Karl-Johan Karlsson).
- `intelmq.bots.outputs.elasticsearch`: Uses HttpMixin (PR#2151 by Sebastian Waldbauer).
- `intelmq.bots.outputs.restapi`: Using HttpMixin instead of importing `requests` (PR#2151 by Sebastian Waldbauer).

### Documentation
- Feeds: Add documentation for newly supported dataplane feeds, see above (PR#2102 by Mikk Margus Möll).
@@ -95,6 +114,7 @@ CHANGELOG
- Also test on Python 3.10 (PR#2140 by Sebastian Wagner).
- Switch from nosetests to pytest, as the former does not support Python 3.10 (PR#2140 by Sebastian Wagner).
- CodeQL Github Actions `exponential backtracking on strings` fixed. (PR#2148 by Sebastian Waldbauer, fixes #2138)
- Replaced `MagicMock` & `patch` with `requests_mock` (PR#2151 by Sebastian Waldbauer).

### Tools

4 changes: 0 additions & 4 deletions intelmq/bots/collectors/github_api/REQUIREMENTS.txt

This file was deleted.

16 changes: 5 additions & 11 deletions intelmq/bots/collectors/github_api/_collector_github_api.py
@@ -7,13 +7,10 @@
GITHUB API Collector bot
"""
import base64
from requests import exceptions

from intelmq.lib.bot import CollectorBot

try:
import requests
except ImportError:
requests = None
from intelmq.lib.mixins import HttpMixin

static_params = {
'headers': {
@@ -22,14 +19,11 @@
}


class GithubAPICollectorBot(CollectorBot):
class GithubAPICollectorBot(CollectorBot, HttpMixin):
basic_auth_username = None
basic_auth_password = None

def init(self):
if requests is None:
raise ValueError('Could not import requests. Please install it.')

self.__user_headers = static_params['headers']
if self.basic_auth_username is not None and self.basic_auth_password is not None:
self.__user_headers.update(self.__produce_auth_header(self.basic_auth_username, self.basic_auth_password))
@@ -47,13 +41,13 @@ def process_request(self):

def github_api(self, api_path: str, **kwargs) -> dict:
try:
response = requests.get(f"{api_path}", params=kwargs, headers=self.__user_headers)
response = self.http_get(api_path, headers=self.__user_headers, params=kwargs)
if response.status_code == 401:
# bad credentials
raise ValueError(response.json()['message'])
else:
return response.json()
except requests.RequestException:
except exceptions.RequestException:
raise ValueError(f"Unknown repository {api_path!r}.")

@staticmethod
@@ -14,17 +14,14 @@
'regex': file regex (DEFAULT = '*.json')
"""
import re
from requests import exceptions

from intelmq.lib.exceptions import InvalidArgument
from intelmq.bots.collectors.github_api._collector_github_api import GithubAPICollectorBot
from intelmq.lib.mixins import HttpMixin

try:
import requests
except ImportError:
requests = None


class GithubContentsAPICollectorBot(GithubAPICollectorBot):
class GithubContentsAPICollectorBot(GithubAPICollectorBot, HttpMixin):
"Collect files from a GitHub repository via the API. Optionally with GitHub credentials."
regex: str = None # TODO: could be re
repository: str = None
@@ -62,7 +59,7 @@ def process_request(self):
if item['extra'] != {}:
report.add('extra.file_metadata', item['extra'])
self.send_message(report)
except requests.RequestException as e:
except exceptions.RequestException as e:
raise ConnectionError(e)

def __recurse_repository_files(self, base_api_url: str, extracted_github_files: list = None) -> list:
@@ -75,7 +72,7 @@ def __recurse_repository_files(self, base_api_url: str, extracted_github_files:
elif github_file['type'] == 'file' and bool(re.search(self.regex, github_file['name'])):
extracted_github_file_data = {
'download_url': github_file['download_url'],
'content': requests.get(github_file['download_url']).content,
'content': self.http_get(github_file['download_url']).content,
'extra': {}
}
for field_name in self.__extra_fields:
3 changes: 2 additions & 1 deletion intelmq/bots/collectors/mail/collector_mail_url.py
@@ -8,6 +8,7 @@
"""
import io
import re
from requests import exceptions

from intelmq.lib.mixins import HttpMixin
from intelmq.lib.splitreports import generate_reports
@@ -50,7 +51,7 @@ def process_message(self, uid, message):
self.logger.info("Downloading report from %r.", url)
try:
resp = self.http_get(url)
except requests.exceptions.Timeout:
except exceptions.Timeout:
self.logger.error("Request timed out %i times in a row." %
self.http_timeout_max_tries)
erroneous = True
4 changes: 2 additions & 2 deletions intelmq/bots/collectors/microsoft/collector_azure.py
@@ -11,7 +11,7 @@

from intelmq.lib.bot import CollectorBot
from intelmq.lib.exceptions import MissingDependencyError
from intelmq.lib.mixins import CacheMixin
from intelmq.lib.mixins import CacheMixin, HttpMixin

try:
from azure.storage.blob import ContainerClient
@@ -23,7 +23,7 @@
create_configuration = None # noqa


class MicrosoftAzureCollectorBot(CollectorBot, CacheMixin):
class MicrosoftAzureCollectorBot(CollectorBot, CacheMixin, HttpMixin):
Review comment (Contributor):

Is it used here? The bot uses the azure library, and the parameters http_proxy and https_proxy are used by direct access.

"Fetch data blobs from a Microsoft Azure container"
connection_string: str = "<insert your connection string here>"
container_name: str = "<insert the container name>"
5 changes: 3 additions & 2 deletions intelmq/bots/collectors/shodan/collector_stream.py
@@ -19,6 +19,7 @@
from typing import List

from intelmq.lib.bot import CollectorBot
from intelmq.lib.mixins import HttpMixin

try:
import shodan
@@ -27,7 +28,7 @@
shodan = None


class ShodanStreamCollectorBot(CollectorBot):
class ShodanStreamCollectorBot(CollectorBot, HttpMixin):
"Collect the Shodan stream from the Shodan API"
api_key: str = "<INSERT your API key>"
countries: List[str] = []
@@ -36,7 +37,7 @@ def init(self):
if shodan is None:
raise ValueError("Library 'shodan' is needed but not installed.")

self.set_request_parameters()
self.setup()
if tuple(int(v) for v in pkg_resources.get_distribution("shodan").version.split('.')) <= (1, 8, 1):
if self.proxy:
raise ValueError('Proxies are given but shodan-python > 1.8.1 is needed for proxy support.')
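
The version check in this hunk relies on Python comparing tuples element by element. A minimal sketch of that idea follows; `version_tuple` and `supports_proxies` are hypothetical helper names, not part of the bot, and the parsing assumes purely numeric dotted versions (a string such as `1.9rc1` would raise `ValueError`).

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as '1.8.1' into (1, 8, 1)."""
    return tuple(int(part) for part in version.split("."))


def supports_proxies(installed: str, last_unsupported: str = "1.8.1") -> bool:
    # Hypothetical helper mirroring the `<= (1, 8, 1)` comparison in the hunk:
    # proxy support is assumed only for versions strictly newer than 1.8.1.
    return version_tuple(installed) > version_tuple(last_unsupported)


# Comparing the raw strings would order "1.10.0" before "1.8.1",
# because "1" sorts before "8" character by character.
string_comparison = "1.10.0" > "1.8.1"
tuple_comparison = supports_proxies("1.10.0")
```

Converting to integer tuples first is what makes `1.10.0` count as newer than `1.8.1`.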
20 changes: 3 additions & 17 deletions intelmq/bots/experts/do_portal/expert.py
@@ -8,40 +8,26 @@
a "502 Bad Gateway" status code is treated the same as a timeout,
i.e. will be retried instead of a fail.
"""
try:
import requests
except ImportError:
requests = None

from intelmq.lib.mixins import HttpMixin
import intelmq.lib.utils as utils
from intelmq.lib.bot import ExpertBot


class DoPortalExpertBot(ExpertBot):
class DoPortalExpertBot(ExpertBot, HttpMixin):
"""Retrieve abuse contact information for the source IP address from a do-portal instance"""
mode: str = "append"
portal_api_key: str = None
portal_url: str = None

def init(self):
if requests is None:
raise ValueError("Library 'requests' could not be loaded. Please install it.")

self.set_request_parameters()

self.url = self.portal_url + '/api/1.0/ripe/contact?cidr=%s'
self.http_header.update({
"Content-Type": "application/json",
"Accept": "application/json",
"API-Authorization": self.portal_api_key
})

self.session = utils.create_request_session(self)
retries = requests.urllib3.Retry.from_int(self.http_timeout_max_tries)
retries.status_forcelist = [502]
adapter = requests.adapters.HTTPAdapter(max_retries=retries)
self.session.mount('http://', adapter)
self.session.mount('https://', adapter)
self.session = self.http_session()

def process(self):
event = self.receive_message()
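
The module docstring in this hunk says a "502 Bad Gateway" response is treated like a timeout and retried. The lines removed here configured that through a `Retry` object with `status_forcelist = [502]` mounted on an `HTTPAdapter`; whether `http_session()` keeps that behaviour is not visible in this diff. The sketch below only illustrates the retry rule itself, using the standard library; `get_with_retry`, `fake_fetch` and the `(status, body)` response shape are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Response = Tuple[int, str]  # (status code, body): hypothetical minimal response


def get_with_retry(fetch: Callable[[str], Response], url: str,
                   max_tries: int = 3,
                   retry_statuses: Iterable[int] = (502,)) -> Response:
    """Call fetch(url) up to max_tries times, retrying on timeouts and on
    the given status codes. The last attempt's outcome is returned or raised."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    retry_on = set(retry_statuses)
    for attempt in range(1, max_tries + 1):
        try:
            status, body = fetch(url)
        except TimeoutError:
            if attempt == max_tries:
                raise  # out of attempts: surface the timeout
            continue
        if status in retry_on and attempt < max_tries:
            continue  # e.g. 502: treat like a timeout and try again
        return status, body
    raise AssertionError("unreachable")


calls: List[str] = []


def fake_fetch(url: str) -> Response:
    # Simulated endpoint: two 502 responses, then success.
    calls.append(url)
    return (502, "Bad Gateway") if len(calls) < 3 else (200, "ok")


status, body = get_with_retry(fake_fetch, "https://portal.example.invalid/api")
```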
3 changes: 2 additions & 1 deletion intelmq/bots/experts/geohash/expert.py
@@ -9,6 +9,7 @@
https://github.com/joyanujoy/geolib
'''
from intelmq.lib.bot import ExpertBot
from intelmq.lib.exceptions import MissingDependencyError

try:
from geolib import geohash
@@ -23,7 +24,7 @@ class GeohashExpertBot(ExpertBot):

def init(self):
if not geohash:
raise ValueError("Library 'geolib' is required, please install it.")
raise MissingDependencyError("geolib")

def process(self):
event = self.receive_message()
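
The hunk above swaps a generic `ValueError` for a dedicated dependency error. A minimal, stdlib-only sketch of the optional-import pattern behind it follows. `MissingDependencySketch`, `optional_import` and `BotWithOptionalDependency` are hypothetical stand-ins; the real `intelmq.lib.exceptions.MissingDependencyError` may take further arguments and word its message differently.

```python
import importlib
from types import ModuleType
from typing import Optional


class MissingDependencySketch(Exception):
    """Hypothetical stand-in for a dedicated missing-dependency exception."""

    def __init__(self, dependency: str) -> None:
        self.dependency = dependency
        super().__init__(f"Could not load dependency {dependency!r}, please install it.")


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


class BotWithOptionalDependency:
    """Hypothetical bot that defers the failure from import time to init()."""

    def __init__(self, module_name: str) -> None:
        self.module_name = module_name
        self.module = optional_import(module_name)

    def init(self) -> None:
        if self.module is None:
            raise MissingDependencySketch(self.module_name)
```

A dedicated exception type lets callers tell "library not installed" apart from other configuration errors without parsing the message text.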
7 changes: 3 additions & 4 deletions intelmq/bots/experts/http/expert_content.py
@@ -7,10 +7,10 @@
from typing import List

from intelmq.lib.bot import ExpertBot
from intelmq.lib.utils import create_request_session
from intelmq.lib.mixins import HttpMixin


class HttpContentExpertBot(ExpertBot):
class HttpContentExpertBot(ExpertBot, HttpMixin):
"""
Test if a given string is part of the content for a given URL

@@ -29,8 +29,7 @@ class HttpContentExpertBot(ExpertBot):
__session = None

def init(self):
self.set_request_parameters()
self.__session = create_request_session(self)
self.__session = self.http_session()

def process(self):
event = self.receive_message()
6 changes: 3 additions & 3 deletions intelmq/bots/experts/http/expert_status.py
@@ -8,9 +8,10 @@

from intelmq.lib.bot import ExpertBot
from intelmq.lib.utils import create_request_session
from intelmq.lib.mixins import HttpMixin


class HttpStatusExpertBot(ExpertBot):
class HttpStatusExpertBot(ExpertBot, HttpMixin):
"""
Fetch the HTTP Status for a given URL

@@ -31,8 +32,7 @@ def process(self):
event = self.receive_message()

if self.field in event:
self.set_request_parameters()
session = create_request_session(self)
session = self.http_session()

try:
response = session.get(event[self.field])
14 changes: 3 additions & 11 deletions intelmq/bots/experts/national_cert_contact_certat/expert.py
@@ -20,30 +20,22 @@
"""

from intelmq.lib.bot import ExpertBot
from intelmq.lib.mixins import HttpMixin
from intelmq.lib.utils import create_request_session
from intelmq.lib.exceptions import MissingDependencyError

try:
import requests
except ImportError:
requests = None


URL = 'https://contacts.cert.at/cgi-bin/abuse-nationalcert.pl'


class NationalCERTContactCertATExpertBot(ExpertBot):
class NationalCERTContactCertATExpertBot(ExpertBot, HttpMixin):
"""Add country and abuse contact information from the CERT.at national CERT Contact Database. Set filter to true if you want to filter out events for Austria. Set overwrite_cc to true if you want to overwrite an existing country code value"""
filter: bool = False
http_verify_cert: bool = True
overwrite_cc: bool = False

def init(self):
if requests is None:
raise MissingDependencyError("requests")

self.set_request_parameters()
self.session = create_request_session(self)
self.session = self.http_session()

def process(self):
event = self.receive_message()
19 changes: 5 additions & 14 deletions intelmq/bots/experts/rdap/expert.py
@@ -3,18 +3,13 @@
# SPDX-License-Identifier: AGPL-3.0-or-later

# -*- coding: utf-8 -*-
import requests
from intelmq.lib.bot import ExpertBot
from intelmq.lib.utils import create_request_session
from intelmq.lib.exceptions import MissingDependencyError
from intelmq.lib.mixins import CacheMixin
from intelmq.lib.mixins import CacheMixin, HttpMixin

try:
import requests
except ImportError:
requests = None


class RDAPExpertBot(ExpertBot, CacheMixin):
class RDAPExpertBot(ExpertBot, CacheMixin, HttpMixin):
""" Get RDAP data"""
rdap_order: list = ['abuse', 'technical', 'administrative', 'registrant', 'registrar']
rdap_bootstrapped_servers: dict = {}
@@ -30,11 +25,7 @@ class RDAPExpertBot(ExpertBot, CacheMixin):
__session: requests.Session

def init(self):
if requests is None:
raise MissingDependencyError("requests")

self.set_request_parameters()
self.__session = create_request_session(self)
self.__session = self.http_session()

# get overall rdap data from iana
resp = self.__session.get('https://data.iana.org/rdap/dns.json')
@@ -73,7 +64,7 @@ def process(self):
if result:
event.add('source.abuse_contact', result, overwrite=self.overwrite)
else:
self.__session = create_request_session(self)
self.__session = self.http_session()
domain_parts = url.split('.')
domain_suffix = None
while domain_suffix is None: