Engine: Regex: Avoid compiling and storing the same regex multiple times #98
base: master
Previously, the engine compiled the same regex multiple times during execution. This redundant compilation caused unnecessary performance overhead and increased resource consumption. With this change, we store the compiled regex in a `HashSet`. If the same regex is used more than once, the engine reuses the previously compiled instance.

Signed-off-by: Jon Doron <[email protected]>
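The reuse described above amounts to memoizing compilation by pattern string. The PR's actual code isn't shown here, so the following is only a minimal, std-only sketch under stated assumptions. It uses a `HashMap` from pattern to an `Arc`-shared compiled value. It is generic over the compiled type so that it doesn't depend on the regex crate. In real use, `T` would be `regex::Regex` and the compile closure would be `Regex::new`. `PatternCache` and `get_or_compile` are hypothetical names, not part of wirefilter.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical cache: each distinct pattern string is compiled once.
/// `T` stands in for the compiled type (e.g. `regex::Regex`).
struct PatternCache<T> {
    compiled: HashMap<String, Arc<T>>,
}

impl<T> PatternCache<T> {
    fn new() -> Self {
        PatternCache { compiled: HashMap::new() }
    }

    /// Returns the cached instance for `pattern`, running `compile` only on
    /// the first request. Compilation errors are returned, not cached.
    fn get_or_compile<E>(
        &mut self,
        pattern: &str,
        compile: impl FnOnce(&str) -> Result<T, E>,
    ) -> Result<Arc<T>, E> {
        if let Some(existing) = self.compiled.get(pattern) {
            return Ok(Arc::clone(existing));
        }
        let fresh = Arc::new(compile(pattern)?);
        self.compiled.insert(pattern.to_owned(), Arc::clone(&fresh));
        Ok(fresh)
    }

    fn len(&self) -> usize {
        self.compiled.len()
    }
}

fn main() {
    let mut cache: PatternCache<String> = PatternCache::new();
    let mut compile_calls = 0;

    // A stand-in "compiler" that counts how often it actually runs.
    for pattern in ["^a+$", "b|c", "^a+$", "^a+$"] {
        let compiled = cache
            .get_or_compile(pattern, |p| {
                compile_calls += 1;
                Ok::<String, ()>(format!("compiled({p})"))
            })
            .unwrap();
        println!("{pattern} -> {compiled}");
    }

    // Four lookups, but only two distinct patterns were compiled.
    assert_eq!(compile_calls, 2);
    assert_eq!(cache.len(), 2);
}
```

Handing out `Arc<T>` lets every filter that uses a given pattern share one compiled instance without copying it.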
Hello, and thank you for your proposal. Unfortunately, we aren't going to merge this approach, because we would prefer consumers of wirefilter to be in control of the cache they use. That said, your use case is perfectly reasonable :) We have been working internally on a large refactor in which regex patterns would only be parsed (using regex-syntax) during filter parsing. Regexes would then be compiled during filter compilation, which is customizable through the `Compiler` trait. This is our current plan, but we don't have an ETA at the moment.
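The idea of leaving the cache to the consumer can be illustrated with a hook the consumer implements. The trait below, `RegexCompiler`, is **hypothetical**. It is not wirefilter's real `Compiler` trait, whose signature isn't given here. Compiled regexes are stood in for by `Rc<String>`. The point is only ownership: a caching implementation lives outside any single filter, so unrelated filters that use the same pattern can share it.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// HYPOTHETICAL hook (not wirefilter's API): where a consumer-chosen
/// regex compilation strategy could plug into filter compilation.
trait RegexCompiler {
    fn compile_regex(&mut self, pattern: &str) -> Rc<String>;
}

/// Compiles every occurrence afresh (no caching).
struct Uncached {
    compilations: usize,
}

impl RegexCompiler for Uncached {
    fn compile_regex(&mut self, pattern: &str) -> Rc<String> {
        self.compilations += 1;
        Rc::new(format!("compiled({pattern})"))
    }
}

/// Consumer-owned cache, reusable across many filters.
#[derive(Default)]
struct Cached {
    cache: HashMap<String, Rc<String>>,
    compilations: usize,
}

impl RegexCompiler for Cached {
    fn compile_regex(&mut self, pattern: &str) -> Rc<String> {
        if let Some(hit) = self.cache.get(pattern) {
            return Rc::clone(hit);
        }
        self.compilations += 1;
        let compiled = Rc::new(format!("compiled({pattern})"));
        self.cache.insert(pattern.to_owned(), Rc::clone(&compiled));
        compiled
    }
}

/// Hypothetical "filter compilation": compiles each regex a filter uses.
fn compile_filter(patterns: &[&str], compiler: &mut impl RegexCompiler) -> Vec<Rc<String>> {
    patterns.iter().map(|p| compiler.compile_regex(p)).collect()
}

fn main() {
    // Two unrelated filters that happen to share the `\.php$` pattern.
    let filters: [&[&str]; 2] = [&["^admin", "\\.php$"], &["\\.php$"]];

    let mut uncached = Uncached { compilations: 0 };
    let mut cached = Cached::default();
    for patterns in filters {
        compile_filter(patterns, &mut uncached);
        compile_filter(patterns, &mut cached);
    }
    // prints "uncached: 3, cached: 2"
    println!("uncached: {}, cached: {}", uncached.compilations, cached.compilations);
}
```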
Thank you so much for the fast reply! I started going over all the changes in this latest sync and noticed you added wildcard and wildcard strict. Is there perhaps a document that summarizes the major changes? As for this PR, I understand, and I'm looking forward to working with what you described. In the meantime, would it help if I put this change behind a feature flag? There are a few other changes we were hoping to present to you; we will be working on them in the coming week and will send you a PR to review.
I'm very sorry, but no: we aren't going to merge changes that we won't support and use internally.
It's probably best to open issues about what you want or need first, so that we can discuss the best way forward ahead of time.
I don't think it does. As long as you store and reuse the
That's true, but it's fairly common to use the same regex in different, unrelated filters.
Ah, yes: in those scenarios, caching is definitely up to the user.
In our use case we have many different rules, and each one is a filter. From your conversation, though, it sounds like I might be using Wirefilter wrong. Isn't it one rule per filter? One could OR all the different conditions together into a single giant filter. But then how would you know which rules were triggered? And as soon as a single "sub-rule" matched, evaluation would stop. Am I missing something about the way I currently use Wirefilter?
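For reference, the one-filter-per-rule setup described above can be sketched as follows. Each rule is evaluated independently, and every rule that matches is reported. A single OR-ed filter, by contrast, yields one boolean and can stop at the first match. `Request`, `Rule`, and `triggered_rules` are **hypothetical** names. The boxed closure stands in for executing a compiled filter and is not wirefilter's API.

```rust
/// Stand-in for the fields a filter would inspect (hypothetical).
struct Request {
    path: String,
    method: String,
}

/// One rule = one independently compiled filter (closure as stand-in).
struct Rule {
    id: &'static str,
    matches: Box<dyn Fn(&Request) -> bool>,
}

/// Evaluates every rule and returns the ids of all rules that fired.
fn triggered_rules(rules: &[Rule], req: &Request) -> Vec<&'static str> {
    rules
        .iter()
        .filter(|rule| (rule.matches)(req))
        .map(|rule| rule.id)
        .collect()
}

fn main() {
    let rules = vec![
        Rule { id: "block-php", matches: Box::new(|r: &Request| r.path.ends_with(".php")) },
        Rule { id: "admin-area", matches: Box::new(|r: &Request| r.path.starts_with("/admin")) },
        Rule { id: "no-delete", matches: Box::new(|r: &Request| r.method == "DELETE") },
    ];
    let req = Request { path: "/admin/index.php".into(), method: "GET".into() };
    // prints ["block-php", "admin-area"]
    println!("{:?}", triggered_rules(&rules, &req));
}
```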
Regardless, a small issue I ran into a few days ago is that the signature of `get_field_value` uses the wrong lifetime.
Fixing it would allow things like "copying" one field's value to another field.
We ran into a performance issue in our end-to-end tests with the new wirefilter. Would it be possible for you to publish a branch with all the individual git commits rather than a single squashed one, so that we can bisect more easily and pinpoint what is causing the issue? Thanks, and happy new year!
Unfortunately, no: it's too tedious to analyze each commit, check what information it contains, and verify whether it's publicly shareable. Alternatively, if you can reproduce the problem with some standalone code or script, we can probably run it internally and try to figure out the problem. That said, if it's related to regexes, Wirefilter is probably not the problem, since we are just a very simple consumer of the regex crate. The regex crate has been substantially rewritten over the last year and a half, and we have found some performance regressions in it. However, we have never had time to reproduce them in isolation as publicly shareable test cases.