Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement 3: simplify queries #195

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

msm-code
Copy link
Contributor

@msm-code msm-code commented Dec 15, 2022

I'm a bit disasppointed in this PR - TBH I expected another big win, while it turned out to be mediocre.

Nevertheless there are some improvements here, and the code is not too complex, so I think it's worth merging (I rejected a few ideas that turned out to be complex to implement for almost zero benefit).

As usual, results can be seen in this repo: https://github.com/msm-code/ursa-bench temporarily hosted here: http://65.21.130.153:8000/hdd_all.html

Ok, but what's going on in the code?

Basically we flatten queries:

AND(AND("A", "B"), AND("C", "D")) becomes AND("A", "B", "C", "D").
OR(OR("A", "B"), OR("C", "D")) becomes OR("A", "B", "C", "D").

And we handle minof special cases:

1 of ("A", "B") becomes OR("A", "B")
2 of ("A", "B") becomes AND("A", "B")

(should be merged after improvements 2)

@msm-code
Copy link
Contributor Author

After my adventures in #197, I've decided to check performance of this PR again. Again, let's use time because we don't have a proper counter in place (#198).

This PR with warm cache:

(venv) root@mquery ~/ursa-bench # time python3 ursabench.py rules/signature-base/yara/* > /dev/null
real    1m18.954s
user    0m13.420s
sys     0m0.700s

This PR after dropping page caches:

real    70m7.672s
user    0m17.272s
sys     0m0.832s

This is a bit faster than #197, but the difference is minimal. In the current form it doesn't bring enough benefits to merge, I think.

On the other hand, maybe we could prioritise ngrams depending on their use count. But I'm not yet ready to implement this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant