chore: implement method to validate suspicious packages for malicious… #851

Yao-Wen-Chang · 2024-09-05T02:02:09Z

This PR refers to issue #810.

This PR implements a validator that confirms whether a suspicious package on PyPI is actually malware. It analyzes the data flow by walking the AST and resolving the actual value of each variable.

For example:

import requests
malicious_endpoint = "https://malicious.com"
requests.get(malicious_endpoint)

The new method should be able to resolve malicious_endpoint to https://malicious.com. Furthermore, we will analyze historical malware data to define the suspicious patterns in a .yaml file.
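To illustrate the idea, here is a minimal sketch of resolving call arguments to string constants with Python's built-in ast module. It is not the code in this PR: the function name resolve_call_arguments is hypothetical, and the analysis is deliberately simple (flow-insensitive, string constants only, and the last recorded assignment wins), so it resolves the variable regardless of where the assignment appears relative to the call.

```python
import ast


def resolve_call_arguments(source: str) -> dict[int, list[str]]:
    """Map each call's line number to the string constants its arguments resolve to.

    Hypothetical helper for illustration; assumes simple `name = "literal"` assignments.
    """
    tree = ast.parse(source)

    # First pass: record every simple `name = "constant"` assignment.
    constants: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if isinstance(node.value.value, str):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        constants[target.id] = node.value.value

    # Second pass: substitute the recorded constants into call arguments.
    resolved: dict[int, list[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            values = []
            for arg in node.args:
                if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                    values.append(arg.value)
                elif isinstance(arg, ast.Name) and arg.id in constants:
                    values.append(constants[arg.id])
            if values:
                resolved[node.lineno] = values
    return resolved


sample = (
    "import requests\n"
    'malicious_endpoint = "https://malicious.com"\n'
    "requests.get(malicious_endpoint)\n"
)
print(resolve_call_arguments(sample))  # {3: ['https://malicious.com']}
```

A real data flow analysis would also need to handle scopes, reassignment, string concatenation, and values passed through function calls, which this sketch ignores.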

The suspicious_setup heuristic will be removed since it overlaps with our new method.

The following tasks are needed to implement this method:

Analyze the historical malware and define the malicious pattern in .yaml
Remove suspicious_setup heuristic
Implement the method to analyze the data flow
Provide unit tests
Test the method on the latest packages on PyPI to ensure the detector is more accurate

… behavior

Yao-Wen-Chang · 2024-10-13T16:24:18Z

Hi @behnazh-w @tromai,

The latest update to the PR introduces additional suspicious patterns based on a review of over 30 malware samples. The analyzer can now identify malicious code sections and store them in a dictionary. However, the false positive rate is still high and needs to be addressed.
Here is the result of scanning argsreq (each entry maps a line number to a code snippet):
{
    'OS Detection': {},
    'Code Execution': {271: 'subprocess.call()'},
    'Information Collecting': {},
    'Remote Connection': {265: 'requests.get(url)'},
    'Custom Setup': {},
    'Suspicious Constant': {263: ['https://cdn.discordapp.com/attachments/1227878114533572611/1228362698920562828/ConsoleApplication2.exe?ex=662bc4e9&is=66194fe9&hm=4520192e5a1190c319246c81bf958c1d3e9bb6b4cb69f43a94ccaf7fbdf35fa6&', 'windows.exe']},
    'Obfuscation': {}
}
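For readers unfamiliar with this kind of report, the sketch below shows one way a dictionary of this shape could be built. It is an assumption-laden illustration, not the analyzer in this PR: the SUSPICIOUS_CALLS table and the categorize_calls function are hypothetical, only two of the categories are covered, and the patterns are hard-coded here rather than loaded from a .yaml file. It relies on ast.unparse, which requires Python 3.9 or later.

```python
import ast

# Hypothetical pattern table for illustration only; the PR plans to keep
# the real patterns in a .yaml file.
SUSPICIOUS_CALLS = {
    "Code Execution": {"subprocess.call", "os.system", "exec", "eval"},
    "Remote Connection": {"requests.get", "requests.post"},
}


def categorize_calls(source: str) -> dict[str, dict[int, str]]:
    """Group matching calls by category as {category: {line number: snippet}}."""
    report: dict[str, dict[int, str]] = {category: {} for category in SUSPICIOUS_CALLS}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Turn the callee back into source text, e.g. "requests.get".
        name = ast.unparse(node.func)
        for category, names in SUSPICIOUS_CALLS.items():
            if name in names:
                report[category][node.lineno] = ast.unparse(node)
    return report


# The sample is only parsed, never executed, so `url` need not be defined.
sample = (
    "import requests, subprocess\n"
    "requests.get(url)\n"
    "subprocess.call(['windows.exe'])\n"
)
print(categorize_calls(sample))
```

Matching on the textual callee name alone is what makes this approach prone to false positives: a benign requests.get call is reported just like a malicious one.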

To improve accuracy, the malware analyzer should implement two key enhancements:

Analyze data flow to retrieve constant values.
Detect and decode encoded (obfuscated) source code.

Implementing these methods is expected to reduce false positives.
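As a rough illustration of the second enhancement, the sketch below looks for string constants that are valid Base64 and decode to UTF-8 text. This is only one possible heuristic and an assumption on my part: the function name decode_base64_constants and the min_length threshold are hypothetical, and real malware may use other encodings or several layers of encoding.

```python
import ast
import base64


def decode_base64_constants(source: str, min_length: int = 16) -> dict[int, str]:
    """Return {line number: decoded text} for string constants that look like Base64.

    Hypothetical helper; `min_length` is an arbitrary cutoff to skip short strings.
    """
    decoded: dict[int, str] = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            continue
        text = node.value
        if len(text) < min_length:
            continue
        try:
            # validate=True rejects characters outside the Base64 alphabet.
            raw = base64.b64decode(text, validate=True)
            decoded[node.lineno] = raw.decode("utf-8")
        except ValueError:
            # Covers invalid Base64 (binascii.Error) and non-UTF-8 payloads
            # (UnicodeDecodeError), both of which subclass ValueError.
            continue
    return decoded


payload = base64.b64encode(b"import os; os.system('whoami')").decode("ascii")
sample = f"import base64\nexec(base64.b64decode('{payload}'))\n"
print(decode_base64_constants(sample))  # {2: "import os; os.system('whoami')"}
```

The decoded text could then be fed back into the same AST analysis, so that patterns hidden inside an encoded payload are checked as well.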

Currently, the malware validator uses AST analysis, which activates only when the heuristic analysis detects suspicious behavior in the package.

Please let me know if you have any concerns, and feel free to share any ideas for the validator! Thanks

oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Sep 5, 2024

Yao-Wen-Chang force-pushed the 801-extend-pypi-malware-detector branch from 9ebfdc9 to f2a0694 October 13, 2024 15:56

chore: implement method to validate suspicious packages for malicious…

f2998f5

… behavior

Yao-Wen-Chang force-pushed the 801-extend-pypi-malware-detector branch from f2a0694 to f2998f5 October 13, 2024 16:14
