chore: implement method to validate suspicious packages for malicious… #851

Draft: wants to merge 1 commit into base branch staging


Yao-Wen-Chang
Contributor

This PR refers to issue #810.

This PR implements a validator that confirms whether a suspicious PyPI package is malware. It analyzes data flow by walking the AST and resolving each variable to its actual value.

For example:

malicious_endpoint = "https://malicious.com"
requests.get(malicious_endpoint)

The new method should be able to detect https://malicious.com as the argument of the call. We will also analyze historical malware data and define the suspicious patterns in a .yaml file.
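The variable resolution described above can be sketched roughly as follows. This is only an illustration using the standard `ast` module, not the code in this PR: the function name `resolve_call_arguments` is hypothetical, and the sketch handles only plain `name = "literal"` assignments without tracking scope or statement order.

```python
from __future__ import annotations

import ast


def resolve_call_arguments(source: str) -> dict[int, list[str]]:
    """Map each call's line number to the string constants its variable arguments resolve to."""
    tree = ast.parse(source)

    # First pass: record simple `name = "literal"` assignments anywhere in the module.
    constants: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if isinstance(node.value.value, str):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        constants[target.id] = node.value.value

    # Second pass: substitute the recorded values for variable arguments of calls.
    resolved: dict[int, list[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            values = [
                constants[arg.id]
                for arg in node.args
                if isinstance(arg, ast.Name) and arg.id in constants
            ]
            if values:
                resolved[node.lineno] = values
    return resolved


sample = 'malicious_endpoint = "https://malicious.com"\nrequests.get(malicious_endpoint)\n'
print(resolve_call_arguments(sample))  # {2: ['https://malicious.com']}
```

Because the assignments are collected before the calls are inspected, the sketch finds the URL regardless of where the assignment appears in the file. A real data-flow analysis would also need to handle reassignment, function scopes, and string concatenation.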

The suspicious_setup heuristic will be removed since it overlaps with our new method.

The tasks for implementing this method are:

  • Analyze historical malware and define the malicious patterns in a .yaml file
  • Remove the suspicious_setup heuristic
  • Implement the method to analyze the data flow
  • Provide unit tests
  • Test the method on the latest packages on PyPI to confirm that the detector is more accurate

oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) on Sep 5, 2024
Yao-Wen-Chang force-pushed the 801-extend-pypi-malware-detector branch from 9ebfdc9 to f2a0694 on October 13, 2024 15:56
Yao-Wen-Chang force-pushed the 801-extend-pypi-malware-detector branch from f2a0694 to f2998f5 on October 13, 2024 16:14
@Yao-Wen-Chang
Contributor Author

Hi @behnazh-w @tromai,

The latest update to the PR introduces additional suspicious patterns based on a review of over 30 malware samples. The analyzer can now identify malicious code sections and store them in a dictionary. However, the high false positive rate still needs to be addressed.
The result from scanning argsreq (each entry maps a line number to a code snippet):

{
  'OS Detection': {},
  'Code Execution': {271: 'subprocess.call()'},
  'Information Collecting': {},
  'Remote Connection': {265: 'requests.get(url)'},
  'Custom Setup': {},
  'Suspicious Constant': {263: ['https://cdn.discordapp.com/attachments/1227878114533572611/1228362698920562828/ConsoleApplication2.exe?ex=662bc4e9&is=66194fe9&hm=4520192e5a1190c319246c81bf958c1d3e9bb6b4cb69f43a94ccaf7fbdf35fa6&', 'windows.exe']},
  'Obfuscation': {}
}
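For readers unfamiliar with this report shape, here is a minimal sketch of how an AST pass could produce a category-to-`{line: snippet}` dictionary. The rule table `SUSPICIOUS_CALLS` and the helpers `dotted_name` and `scan` are hypothetical and cover only two of the categories above; the actual patterns in this PR come from the .yaml definitions and may differ. It assumes Python 3.9+ for `ast.unparse`.

```python
from __future__ import annotations

import ast

# Hypothetical rule table: category -> dotted call names. Not the PR's real patterns.
SUSPICIOUS_CALLS = {
    "Code Execution": {"subprocess.call", "os.system", "eval", "exec"},
    "Remote Connection": {"requests.get", "urllib.request.urlopen"},
}


def dotted_name(node: ast.expr) -> str | None:
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def scan(source: str) -> dict[str, dict[int, str]]:
    """Group suspicious calls by category, keyed by line number."""
    report: dict[str, dict[int, str]] = {category: {} for category in SUSPICIOUS_CALLS}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        for category, names in SUSPICIOUS_CALLS.items():
            if name in names:
                # Store the call's source text as the snippet for this line.
                report[category][node.lineno] = ast.unparse(node)
    return report


sample = "import subprocess, requests\nr = requests.get(url)\nsubprocess.call(['windows.exe'])\n"
report = scan(sample)
print(report)
```

Matching on call names alone is what tends to produce false positives, since a benign `requests.get` is flagged the same way as a malicious one.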

To improve accuracy, the malware analyzer should implement two key enhancements:

  1. Analyze data flow to retrieve constant values.
  2. Detect and decode encoded source code.

Implementing these methods is expected to reduce false positives.
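As one possible illustration of the second enhancement, the sketch below looks for string literals that decode cleanly as Base64. Base64 is an assumption here, since the PR does not say which encodings will be handled, and the function name `decode_base64_literals` and the `min_length` threshold are hypothetical.

```python
from __future__ import annotations

import ast
import base64
import binascii


def decode_base64_literals(source: str, min_length: int = 16) -> dict[int, str]:
    """Map line numbers to the decoded text of literals that are valid Base64."""
    decoded: dict[int, str] = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Constant) and isinstance(node.value, (str, bytes))):
            continue
        try:
            raw = node.value.encode() if isinstance(node.value, str) else node.value
            if len(raw) < min_length:
                continue  # Short literals are too ambiguous to treat as payloads.
            # validate=True rejects characters outside the Base64 alphabet.
            text = base64.b64decode(raw, validate=True).decode("utf-8")
        except (binascii.Error, ValueError):
            # Not Base64, or the decoded bytes are not UTF-8 text.
            continue
        decoded[node.lineno] = text
    return decoded


payload = base64.b64encode(b"import os; os.system('whoami')").decode()
sample_source = f'exec(__import__("base64").b64decode("{payload}"))\n'
print(decode_base64_literals(sample_source))
```

The decoded text could then be fed back into the same AST analysis. Ordinary strings that happen to be valid Base64 would still be reported, so this check alone does not remove false positives.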

Currently, the malware validator uses AST analysis and runs only when the heuristic analysis detects suspicious behavior in the package.

Please let me know if you have any concerns, and feel free to share any ideas for the validator! Thanks
