Added HTML parsing for content from Threatmatch #2846
base: master
I have reviewed and tested part of this implementation, and it works well overall.
I left a few suggestions that I believe would help make the code simpler.
Thank you for adding information and error logs.
The connector still follows an older version of the connector template. Newer features, such as `pycti.OpenCTIConnectorHelper.helper.schedule_iso`, could help you manage the connector's scheduled runs.
from html.parser import HTMLParser

def remove_html_tags(self, text):
    # Collect only the text nodes of the HTML fragment, dropping every tag
    class HTMLTagRemover(HTMLParser):
        def __init__(self):
            super().__init__()
            self.fed = []

        def handle_data(self, data):
            self.fed.append(data)

        def get_data(self):
            return "".join(self.fed)

    parser = HTMLTagRemover()
    parser.feed(text)
    parser.close()  # flush any trailing text the parser is still buffering
    return parser.get_data()
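A minimal standalone sketch of the helper above, lifted out of the connector class so it can be tried on its own (the module-level function and the sample strings are illustrative, not part of the PR). It shows why calling `close()` matters: with the default `convert_charrefs=True`, `HTMLParser` holds back trailing text that ends in something that looks like an unfinished entity (such as `AT&T`) until `close()` is called.

```python
from html.parser import HTMLParser


class HTMLTagRemover(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities such as &amp; to "&"
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, data):
        self.fed.append(data)

    def get_data(self):
        return "".join(self.fed)


def remove_html_tags(text):
    parser = HTMLTagRemover()
    parser.feed(text)
    # close() flushes any text still buffered at the end of the input;
    # without it, "Targets AT&T" would come back as an empty string
    parser.close()
    return parser.get_data()


print(remove_html_tags("<p>Phishing <b>campaign</b> &amp; C2</p>"))  # Phishing campaign & C2
```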
I don't think this is necessary: you could use the popular BeautifulSoup library later in the code to remove all the tags.
Suggested change: delete the remove_html_tags method and strip the tags inline instead:

import bs4

object["description"] = bs4.BeautifulSoup(object["description"], "html.parser").get_text()
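A minimal sketch of the reviewer's suggestion applied to a batch of objects, assuming each object is a plain dict with an optional `"description"` key. The names `html_to_text`, `clean_descriptions`, and the sample `reports` are hypothetical, not part of the connector. It uses BeautifulSoup (`bs4`) when it is installed and otherwise falls back to the standard library's `html.parser`, so the sketch runs either way.

```python
from html.parser import HTMLParser

try:
    import bs4  # BeautifulSoup 4, as suggested in the review
except ImportError:
    bs4 = None  # fall back to the standard library parser below


class _TextOnly(HTMLParser):
    """Stdlib fallback: keeps text nodes, drops tags, decodes entities."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    """Return the plain-text content of an HTML fragment."""
    if bs4 is not None:
        return bs4.BeautifulSoup(markup, "html.parser").get_text()
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()  # flush text still buffered at the end of the input
    return "".join(parser.parts)


def clean_descriptions(objects):
    """Strip HTML from the "description" field of each dict, in place."""
    for obj in objects:
        # hypothetical guard: some objects may have no description at all
        if obj.get("description"):
            obj["description"] = html_to_text(obj["description"])
    return objects


reports = [
    {"name": "Report A", "description": "<p>Spear-phishing <i>wave</i></p>"},
    {"name": "Report B"},
]
clean_descriptions(reports)
print(reports[0]["description"])  # Spear-phishing wave
```

Passing `"html.parser"` keeps BeautifulSoup on Python's built-in parser, so no extra dependency such as lxml is needed.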
Commits updated from 416a305 to 982a01c.
Co-authored-by: flavienSindou <[email protected]>
@pietrocapece Thank you for your contribution. Could you resolve the conflicts?
Proposed changes
Related issues
Checklist
Further comments