Workflow from Phishing => Phishing.Database => VirusTotal #395

Open
spirillen opened this issue May 3, 2024 · 15 comments
Labels
question Further information is requested todo 🗒️

Comments

@spirillen
Collaborator

spirillen commented May 3, 2024

Copy of #391 (comment) by @g0d33p3rsec


I'll merge them, then PyFunceble will remove the dead ones

Thanks! I wonder if PyFunceble may be causing the false negatives when I add entries as a domain or wildcard. When I first added entries by individual URI, VirusTotal would return a positive once the commit was merged upstream. Since I've been adding them as domains or wildcards, the sites seem to be dropped by the time this repo is merged upstream, resulting in subsequent false negatives on VT from the Phishing Database even though the upstream repo showed recent merges. That's why I tried testing both a few commits ago, but the results were inconclusive. I should have more time to dig into it after the semester ends next week. If you want to compare output, I've been trying to track the group using a VT collection, which can be found at https://www.virustotal.com/gui/collection/5b7e996c553034dddc8c690ea6be0adb3182b0fa96ce6a8b29627e165fb47f38/iocs

Here's an example from a recent addition: https://www.virustotal.com/gui/url/0503dbd260648c364c10793657cdebe883da30554b3c9cbed639025ea45e58e7 Most of the detections shown come from hand-feeding the domain to the individual EDR vendors, which can be a bit laborious.
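To make the URI vs. domain vs. wildcard distinction concrete, here is a minimal sketch of how a consumer could match a scanned URL against each entry type. The `entry_matches` helper and its rules (exact URI match ignoring a trailing slash, exact host match for a domain, subdomains only for `*.`) are assumptions for illustration; they are not how Phishing.Database, PyFunceble or VirusTotal actually match entries.

```python
from urllib.parse import urlsplit


def entry_matches(entry: str, url: str) -> bool:
    """Hypothetical matcher: does a blocklist entry cover this URL?"""
    raw = entry.strip()
    if "://" in raw:
        # URI entry: only the exact URL matches (paths are case-sensitive).
        return url.rstrip("/") == raw.rstrip("/")
    # urlsplit().hostname is already lower-cased.
    host = (urlsplit(url).hostname or "").rstrip(".")
    domain = raw.lower().rstrip(".")
    if domain.startswith("*."):
        # Wildcard entry: any subdomain matches; in this sketch the apex does not.
        return host.endswith(domain[1:])
    # Domain entry: the URL's host must equal the domain exactly.
    return host == domain


print(entry_matches("evil.example.com", "https://evil.example.com/login"))
print(entry_matches("*.example.com", "https://example.com/"))
```

A URI entry only speaks for one path on the host, while a domain entry covers every path, so which form a downstream scanner looks up can decide whether it gets a hit.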

@spirillen
Collaborator Author

Copy of: #391 (comment) by @spirillen


That is an interesting observation, and certainly something this project should follow up on. But where should the thread run?... Allow me to think about this one for a while and I'll try to find the right place for this question. As for your observation about @PyFunceble, it should not be the issue when we are adding records; that could be the case while testing for removal of outdated records.

@spirillen
Collaborator Author

Copy of #391 (comment) by @g0d33p3rsec


That is an interesting observation, and certainly something this project should follow up on. But where should the thread run?... Allow me to think about this one for a while and I'll try to find the right place for this question. As for your observation about @PyFunceble, it should not be the issue when we are adding records; that could be the case while testing for removal of outdated records.

Awesome, thanks for following up. I'll try to look into the upstream workflow and automation more once my schedule lightens up next week. I think the conversation would probably belong in an issue if the discussion needs to be in the open. On the other hand, I could also see a reason for treating it as a vulnerability, since something is preventing tactical intelligence from making its way upstream.

@spirillen
Collaborator Author

Copy of #391 (comment) by @funilrys


Interesting ... I'll have to investigate this too ...


@githubbot remind me.

@spirillen
Collaborator Author

Copy of #391 (comment) by @g0d33p3rsec


I'll have to investigate this too

Thanks! I wasn't sure what to make of it when I first noticed, as I was also observing some challenges with scanning the sites, which I interpreted at the time as anti-forensic attempts. There were a couple of domains where I had to use a particular user-agent and referrer, and others where I seemed to encounter geofencing. Now I'm leaning more towards a bug somewhere between the domain addition and automatic validation on our end. If there's anything I can help with, feel free to reach out.
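For the user-agent and referrer cloaking described above, here is a minimal sketch of preparing a request that presents specific `User-Agent` and `Referer` headers using only Python's standard library. The helper name, URL and header values are placeholders chosen for illustration. Geofencing cannot be handled this way; it would need a scan from a different network location.

```python
import urllib.request


def build_cloaked_request(url: str, user_agent: str, referrer: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying a chosen User-Agent and Referer."""
    # Some kits only serve the phishing page when these headers look "right".
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Referer": referrer},
    )


req = build_cloaked_request(
    "https://phish.example/login",
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    referrer="https://mail.example/",
)
# urllib stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
print(req.get_header("Referer"))
```

Passing `req` to `urllib.request.urlopen(req, timeout=10)` would actually send it.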

@spirillen
Collaborator Author

Copy of #391 (comment) by @g0d33p3rsec


that could be the case while testing for removal of outdated records

I'm wondering what endpoints it tries to test if it only has the domain to work with and no specific URIs. For most hosts, the root domain has been returning a default Apache/Nginx page of the sort that comes with a fresh install. The only exception I can think of offhand is the defacement that was done to westernautomobileassembly.com.
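The question of which endpoint gets tested can be sketched as follows: a full URI pins the exact path hosting the kit, while a bare domain leaves only the site root to probe, which on these hosts is the default Apache/Nginx page. The `candidate_endpoints` helper is hypothetical and is not a description of PyFunceble's actual request logic.

```python
def candidate_endpoints(entry: str) -> list[str]:
    """Hypothetical: which URLs an availability checker could request for an entry."""
    if "://" in entry:
        # A full URI: the checker can request the exact page that hosted the kit.
        return [entry]
    # A bare domain: only the site root is known, so try it over both schemes.
    host = entry.strip().lower().rstrip(".")
    return [f"http://{host}/", f"https://{host}/"]


print(candidate_endpoints("Example.com"))
print(candidate_endpoints("https://example.com/wp-admin/x.php"))
```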

@spirillen
Collaborator Author

Copy of #391 (comment) by @spirillen


that could be the case while testing for removal of outdated records

I'm wondering what endpoints it tries to test if it only has the domain to work with and no specific URIs. For most hosts, the root domain has been returning a default Apache/Nginx page of the sort that comes with a fresh install.

This is one of many reasons that PyFunceble by default leaves a record as ACTIVE if any test is positive, and then disregards all other test results.
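The "any positive test wins" rule can be sketched in a few lines. The test names and the `aggregate` helper are hypothetical, only two statuses are modelled here, and this is not PyFunceble's real API.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


def aggregate(results: dict[str, bool]) -> Status:
    """Combine per-test outcomes: a single positive result is enough for ACTIVE."""
    # Negative results from the other tests cannot override a positive one.
    return Status.ACTIVE if any(results.values()) else Status.INACTIVE


print(aggregate({"dns": True, "http": False}))
```

The trade-off is deliberate: a record is only dropped when every test agrees it is gone, which favours keeping stale entries over losing live ones.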

This is of course not the Holy Grail for how this should be handled, but as there aren't enough human resources to maintain and catch every scumbag URI out there, we have to cut some corners. Also, since RFC 952 (the hosts file format) is limited to FQDNs, it doesn't make a whole lot of sense to keep a URI list for all those running on the ~60-year-old hosts file system, or even the newer RPZ. The only places you can really use URI lists are browser add-ons like uBlock Origin and proxy servers like Squid.
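Why a URI cannot go into a hosts file can be shown with a hostname check. This is a minimal sketch of the RFC 952 hostname rules as relaxed by RFC 1123 (labels may start with a digit), with the total length capped at 253 characters as in DNS; the `is_hosts_hostname` helper is hypothetical.

```python
import re

# One label: 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_hosts_hostname(name: str) -> bool:
    """Return True if `name` could appear as a hostname in a hosts file.

    A URI such as "https://x.example/login" fails because ":" and "/"
    are not allowed in hostname labels.
    """
    name = name.rstrip(".")
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


print(is_hosts_hostname("evil.example.com"))
print(is_hosts_hostname("https://evil.example.com/login"))
```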

@/githubbot remind me

@spirillen
Collaborator Author

Copy of #391 (comment) by @g0d33p3rsec


This is one of many reasons that PyFunceble by default leaves a record as ACTIVE if any test is positive, and then disregards all other test results.

Oh, that makes total sense, and it makes for an interesting problem. I should only have a few more days of having to think in C++ before I can get back to thinking in Python and take a closer look at both projects.

@spirillen
Collaborator Author

Copy of #391 (comment) by @spirillen


URIs on this domain have been returning 404s for a couple of days now. I'll leave the PR open for the maintainers to do with as they please. The activity group has moved to another host and can now be found at jestertunes.com (#393).

@g0d33p3rsec As I think about the 404 URIs from your list above, I can't think of any current process that would actually remove them from the project. The reason: we treat a 404 as a temporary break in something bad, as you can see here:

So is there any chance that your code could run an automated test for this, or do I have to write something up (which I suck at)?
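One way such an automated test could respect "a 404 is a temporary break" is to flag a URI only after a run of consecutive 404s, and then hand it to a human rather than delete it. This is a hypothetical policy sketch: the `UriHistory` record, the `removal_candidates` helper and the default streak of 7 checks are all assumptions, not an existing process in this project.

```python
from dataclasses import dataclass, field


@dataclass
class UriHistory:
    """Hypothetical record of HTTP status codes from periodic checks of one URI."""
    uri: str
    codes: list[int] = field(default_factory=list)


def removal_candidates(histories: list[UriHistory], streak: int = 7) -> list[str]:
    """Flag URIs whose last `streak` checks all returned 404, for human review.

    A shorter run of 404s is treated as a temporary break and ignored.
    """
    flagged = []
    for history in histories:
        recent = history.codes[-streak:]
        if len(recent) == streak and all(code == 404 for code in recent):
            flagged.append(history.uri)
    return flagged


checks = [
    UriHistory("https://a.example/x", [200, 404, 404, 404]),
    UriHistory("https://b.example/y", [404, 404, 200]),
]
print(removal_candidates(checks, streak=3))
```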

@spirillen
Collaborator Author

Copy of #391 (comment) by @g0d33p3rsec


that could be the case while testing for removal of outdated records

So is there any chance that your code could run an automated test for this, or do I have to write something up (which I suck at)?

I'll have to study the issue more in depth. For reference, I just did a public scan of the page with the mentioned 404, and while the initial request returned a 404 response, there was also a stack of 200s from the site's host, which would make it even more challenging to automate the removal of this sort of false positive. The requests/responses can be seen at https://urlscan.io/result/d161ba51-3f51-4613-b539-b6555819dc9c/#transactions. I'll try to revisit some of the other previous hosts next week to see what the responses from their servers were like after the actor had moved on to the next host. At least as far as this group is concerned, most of the hosts that I've observed are in a shared hosting environment, and when I've gotten a response from the hosting companies, they tend to indicate that it was their responders who handled the remediation and not the victim responsible for the website. I may also be a bit of an edge case, as I'm tracking a single activity group in my reports, which is allowing me to build some familiarity with their tactics as they move from host to host in a very linear fashion.
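The "404 on the page, 200s from the host" pattern can be detected from a list of request/response pairs. In this sketch the input is assumed to be `(url, status)` tuples in request order with the page request first, for example transcribed by hand from a scan's transactions view; this is not urlscan.io's API format, and `summarize_scan` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def summarize_scan(transactions: list[tuple[str, int]]) -> dict:
    """Summarise (url, status) pairs; the first pair is the page request."""
    if not transactions:
        raise ValueError("no transactions to summarise")
    page_url, page_code = transactions[0]
    host = urlsplit(page_url).hostname
    # Count successful sub-requests served by the same host as the page.
    same_host_2xx = sum(
        1
        for url, code in transactions[1:]
        if urlsplit(url).hostname == host and 200 <= code < 300
    )
    return {
        "page_code": page_code,
        "same_host_2xx": same_host_2xx,
        # A 404 page with live assets on the same host is too ambiguous to auto-remove.
        "ambiguous": page_code == 404 and same_host_2xx > 0,
    }


print(summarize_scan([
    ("https://host.example/kit/index.php", 404),
    ("https://host.example/style.css", 200),
    ("https://cdn.example/lib.js", 200),
]))
```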

The false negatives are more of a concern, as I'm still not seeing any of the domains from the merged commits make their way upstream. When the new records are being merged from this repo upstream, what checks are done to validate the current status? Does it just convert the domain to an http or https request and evaluate the response, looking for any non-500 code? https://urlscan.io/result/9f8ec4ee-e738-4cac-979f-a78dfdb78550/ is an example from the domain that is currently hosting the kit. The only other thing I can think of offhand is that the commits seemed to stop making their way upstream after the merge conflict (I can't find which one offhand) removed an entry from the list.
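The heuristic asked about here ("any non-500 code means up") can be sketched with an injected fetcher so it runs without network access. This illustrates the question, not a confirmed description of PyFunceble or of the upstream automation; `looks_reachable` and the `Fetcher` type are hypothetical.

```python
from typing import Callable, Optional

# A fetcher takes a URL and returns an HTTP status code, or None if the
# connection failed. Injecting it keeps the sketch free of network access.
Fetcher = Callable[[str], Optional[int]]


def looks_reachable(domain: str, fetch: Fetcher) -> bool:
    """Hypothetical: up if the http or https root answers with any status below 500."""
    for url in (f"http://{domain}/", f"https://{domain}/"):
        code = fetch(url)
        if code is not None and code < 500:
            return True
    return False


responses = {"http://a.example/": None, "https://a.example/": 404}
print(looks_reachable("a.example", responses.get))
```

Under this heuristic a fresh-install Apache/Nginx default page (200) or even a 404 counts as reachable, which says nothing about whether the kit is still on the host.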

@spirillen
Collaborator Author

When the new records are being merged from this repo upstream, what checks are done to validate the current status?

To my knowledge, none, as I keep this one as clean as time allows.

Which reminds me to set up a new test...

Does it just convert the domain to an http or https request and evaluate the response, looking for any non-500 code?

Regarding "Does it": what does "it" refer to here? The answer may depend on who "it" is.

@g0d33p3rsec
Contributor

g0d33p3rsec commented May 3, 2024

Regarding "Does it": what does "it" refer to here? The answer may depend on who "it" is.

By "it" I was referring to PyFunceble, since that is what we were discussing earlier when speaking of the response codes. I'm trying to figure out which endpoint it tests for a response code and whether that could be related to the issue. Once a commit is merged in this repo, what happens between then and when the Phishing Database is updated? I know that when I made my first commits by URI and they were merged upstream, the results were almost immediately visible on VT and also propagated to other vendors.

@spirillen
Collaborator Author

spirillen commented May 3, 2024

OT, but I'd like to share this little meme from https://matrix.rocks/notes/9ssmc8s00z


For your detailed question about:

Once a commit is merged in this repo, what happens between then and when the Phishing Database is updated?...

Only @mitchellkrogza and maybe @funilrys know.


@g0d33p3rsec
Contributor

OT, but I'd like to share this little meme

Love it! All too true, unfortunately.

@g0d33p3rsec
Contributor

I've noticed that the activity group I've been tracking has recently begun reusing previous hosts that should be protected by the list, but the entries don't appear to be making it upstream from this repo to the Phishing Database. A couple of days ago, the group was observed reusing a domain that should have been protected by #381 (comment). Today, I noticed another reused domain, technowide[.]com[.]tr, which should have been blocked by #396 (https://urlscan.io/result/16d88492-6993-4fa0-9afb-7cceb751e0d2/). I remember a merge conflict a few months ago that prevented one of my entries from making it upstream, but I'm currently failing to find it in the commit history. It seems that whatever was changed at that time has prevented my subsequent contributions from making their way upstream, leading to a loss of valuable tactical intelligence.
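Checking whether entries actually made it upstream comes down to a set difference between the two lists. This sketch assumes both are plain domain lists, one entry per line with `#` comments; the `missing_upstream` helper is hypothetical, and entries are lower-cased, which is fine for domains but would be wrong for URI paths.

```python
from typing import Iterable


def _normalize(lines: Iterable[str]) -> set[str]:
    """Lower-case domain entries, skipping blank lines and # comments."""
    entries = set()
    for line in lines:
        line = line.strip().lower()
        if line and not line.startswith("#"):
            entries.add(line)
    return entries


def missing_upstream(local: Iterable[str], upstream: Iterable[str]) -> list[str]:
    """Domains present in the local list but absent from the upstream list."""
    return sorted(_normalize(local) - _normalize(upstream))


print(missing_upstream(["reused.example", "# note", "A.example"], ["a.example", "b.example"]))
```

Since iterating an open file yields its lines, two open file objects (one for this repo's list, one for the upstream list) can be passed in directly.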

spirillen added a commit to external-sources/hosts-sources that referenced this issue Jul 2, 2024
Due to a bug in Phishing.Database, we are not able to do a full search in the active files. For that reason, we now import `ALL-phishing-links.txt` and strip it down to a domain-only list in `data/phishing_database/`
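Stripping a links file down to a domain-only list can be sketched as below. It assumes one link per line with optional `#` comments. This is not the actual `import.sh` (which uses Perl): the `links_to_domains` helper is hypothetical, and it keeps the full hostname without consulting the Public Suffix List.

```python
from typing import Iterable
from urllib.parse import urlsplit


def links_to_domains(lines: Iterable[str]) -> list[str]:
    """Reduce phishing links to a sorted, de-duplicated list of hostnames."""
    domains = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # urlsplit only finds the host when a scheme or a leading "//" is present.
        host = urlsplit(line if "://" in line else "//" + line).hostname
        if host:
            domains.add(host.rstrip("."))
    return sorted(domains)


print(links_to_domains([
    "https://Evil.example.com/a",
    "http://evil.example.com/b",
    "other.example/x",
]))
```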

Related issues:
- mitchellkrogza/Phishing.Database#840
- mitchellkrogza/Phishing.Database#881
- mitchellkrogza/phishing#381 (comment)
- mitchellkrogza/phishing#396
- mitchellkrogza/phishing#407
- mitchellkrogza/phishing#395
- mypdns/matrix#624
- blocklistproject/Lists#1252
- mitchellkrogza/Phishing.Database#722

Trying to use @main for the php installer and using php version 8.4

Added `libdomain-publicsuffix-perl` to the dependencies.sh script, as it is required by Perl in import.sh. It turns out Perl just annoyingly does it again... 😏

2 participants