
Add score-based spam detection besides blacklist #3

Open
ktos opened this issue Oct 2, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@ktos

ktos commented Oct 2, 2020

  • I'm submitting a ...
    [ ] spammer report
    [ ] bug report
    [X] feature request
    [ ] question about the decisions made in the repository
    [ ] question about how to use this project

  • Summary

It seems that this tool is currently only a simple blacklist, but I think some kind of negative scoring system could be introduced.

  • Other information (e.g. detailed explanation, stack traces, related issues, suggestions how to fix, links for us to have context, eg. StackOverflow, personal fork, etc.)

I believe we can score PRs negatively (and positively) and mark them as spam if a defined threshold is met. For example, some things that could deduct points:

  • Changes only in text files (.md, .html),
  • Changes only in one file (or removal of a single file),
  • Changes only in one line,
  • Changes containing the words "awesome" or "amazing" ;) (aka: blacklisting words in commit messages and the diffs themselves),
  • Empty descriptions,
  • "patch-1" as the name of the remote branch.

Of course, it's not the best solution, as it won't be 100% bulletproof, but what do you think?
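As a rough sketch of the idea above (in TypeScript; the `PullRequestInfo` shape, the point values, and the threshold are all hypothetical illustrations, not part of this project):

```typescript
// Hypothetical PR summary shape — assumed for illustration only.
interface PullRequestInfo {
  changedFiles: string[];
  changedLines: number;
  description: string;
  branchName: string;
  commitMessages: string[];
}

const TEXT_EXTENSIONS = [".md", ".html"];
const SUSPICIOUS_WORDS = ["awesome", "amazing"];

// Each matching criterion deducts one point; the weights are arbitrary.
function spamScore(pr: PullRequestInfo): number {
  let score = 0;
  if (
    pr.changedFiles.length > 0 &&
    pr.changedFiles.every(f => TEXT_EXTENSIONS.some(ext => f.endsWith(ext)))
  ) score -= 1;                                    // only text files touched
  if (pr.changedFiles.length === 1) score -= 1;    // single file changed
  if (pr.changedLines === 1) score -= 1;           // single line changed
  const text = [pr.description, ...pr.commitMessages].join(" ").toLowerCase();
  if (SUSPICIOUS_WORDS.some(w => text.includes(w))) score -= 1; // blacklisted words
  if (pr.description.trim() === "") score -= 1;    // empty description
  if (pr.branchName === "patch-1") score -= 1;     // default web-editor branch name
  return score;
}

// A PR is flagged only when enough criteria accumulate, so a legitimate
// one-line typo fix alone stays above the (hypothetical) threshold.
function isLikelySpam(pr: PullRequestInfo, threshold = -4): boolean {
  return spamScore(pr) <= threshold;
}
```

With a threshold like this, no single criterion is enough on its own; several of them have to coincide before a PR is flagged.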

@StefanJanssen95

That is not really possible. Someone changing four words with actual typos can be making a sincere pull request, and if I made a sincere pull request and got labelled as spam right away, I'm not sure I would spend my time on a project like that.

@ktos
Author

ktos commented Oct 2, 2020

In my mind, most (or all, or a configurable number of) checks must be met for a PR to be marked as spam, so legitimately correcting typos shouldn't trigger anything.

@maximelafarie
Owner

Thank you for your contribution @ktos! As said by @StefanJanssen95, we need to refine the criteria to distinguish sincere PRs from spam PRs as accurately as possible.

As planned in #1, if we detach the blacklist from the build and make it an external JSON file, we can absolutely add some more details and indicators attached to a user.

This implies defining a new, more complete model based on the criteria we would use to compute a trust score.
In addition, it would be nice to let users configure their own minimum allowed threshold in the GitHub Action. Feel free to make some suggestions, propose some code and make some PRs.
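One way the Action could pick up a user-configured threshold (a TypeScript sketch: GitHub Actions does expose `with:` inputs to the action process as `INPUT_<NAME>` environment variables, but the `spam-threshold` input name and the default value of -4 are assumptions, not this Action's actual API):

```typescript
// Read a hypothetical `spam-threshold` input. GitHub Actions passes
// workflow inputs as INPUT_<NAME> environment variables; the input name
// and the -4 default here are illustrative assumptions.
function getThreshold(): number {
  const raw = process.env["INPUT_SPAM-THRESHOLD"] ?? "";
  const parsed = Number(raw);
  // Fall back to the assumed default when the input is missing or non-numeric.
  return raw.trim() !== "" && Number.isFinite(parsed) ? parsed : -4;
}
```

Users who want a stricter or looser check would then only need to change one value in their workflow file.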

@maximelafarie maximelafarie added the enhancement New feature or request label Oct 2, 2020
@aminya

aminya commented Jan 22, 2021

I think this is a cool idea, but hard to implement. Someone could train a neural network on the database to find relations between this kind of information and the spamminess of a PR.

If implemented, the intelligent algorithm should not close PRs automatically, but instead label them as "possibly not following the standards".
