Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improved performance of LIKE matching #1049

Merged
merged 5 commits into from
May 14, 2024

Conversation

Cali0707
Copy link
Contributor

@Cali0707 Cali0707 commented May 1, 2024

This PR attempts to improve the performance of CESQL LIKE matches by moving away from building a regex and instead taking a simple greedy algorithm approach (which guarantees O(n+m) time complexity to consume the full string, where n is the length of the input string, and m is the length of the pattern).

This change has the added benefit of simplifying the code - we are no longer building and then evaluating a regex, simply looping over a string.

@Cali0707 Cali0707 marked this pull request as ready for review May 3, 2024 20:08
@Cali0707 Cali0707 requested a review from a team as a code owner May 3, 2024 20:08
@Cali0707
Copy link
Contributor Author

Cali0707 commented May 3, 2024

Using the benchmarks in #1050 , I got the following results for this improvement over the original implementation:

benchmark                                                                             old ns/op     new ns/op     delta
BenchmarkTCK/Like_expression/Exact_match_(1)_parse-8                                  14758         8544          -42.11%
BenchmarkTCK/Like_expression/Exact_match_(1)_evaluate-8                               54.0          16.6          -69.32%
BenchmarkTCK/Like_expression/Exact_match_(2)_parse-8                                  12648         9543          -24.55%
BenchmarkTCK/Like_expression/Exact_match_(2)_evaluate-8                               53.9          18.3          -66.09%
BenchmarkTCK/Like_expression/Exact_match_(negate)_parse-8                             14208         9991          -29.68%
BenchmarkTCK/Like_expression/Exact_match_(negate)_evaluate-8                          62.8          24.3          -61.33%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_parse-8                          14502         10052         -30.69%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_evaluate-8                       138           19.4          -85.97%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_parse-8                          13726         9677          -29.50%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_evaluate-8                       142           20.8          -85.41%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_parse-8                          15831         9393          -40.67%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_evaluate-8                       226           27.3          -87.94%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_parse-8                          13356         9529          -28.65%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_evaluate-8                       168           18.7          -88.87%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_parse-8                          14551         8331          -42.75%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_evaluate-8                       14.9          15.4          +3.08%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_parse-8                          11908         8704          -26.91%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_evaluate-8                       14.9          13.6          -9.30%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_parse-8                          13463         9115          -32.30%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_evaluate-8                       220           25.7          -88.29%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_parse-8                          13717         8474          -38.22%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_evaluate-8                       54.1          14.2          -73.69%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_parse-8                          14725         9084          -38.31%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_evaluate-8                       14.9          16.3          +9.37%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_parse-8                          14657         8456          -42.31%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_evaluate-8                       69.3          18.2          -73.73%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_parse-8                          12694         8659          -31.79%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_evaluate-8                       15.1          16.3          +7.93%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_parse-8                          14553         9358          -35.70%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_evaluate-8                       14.9          17.3          +15.60%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_parse-8                          13478         9348          -30.64%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_evaluate-8                       70.8          18.2          -74.29%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_parse-8                          17317         9626          -44.41%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_evaluate-8                       69.3          18.2          -73.74%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_parse-8                          13191         8579          -34.96%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_evaluate-8                       41.4          14.2          -65.56%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_parse-8                 13441         9493          -29.37%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_evaluate-8              55.8          18.9          -66.10%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_parse-8                 14298         9841          -31.17%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_evaluate-8              64.1          26.6          -58.45%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_parse-8                 14786         9259          -37.38%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_evaluate-8              40.1          16.1          -59.84%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_parse-8                 13357         9316          -30.25%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_evaluate-8              14.9          16.1          +7.90%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_parse-8                 14233         9091          -36.13%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_evaluate-8              14.9          15.9          +6.22%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_parse-8                 12699         9316          -26.64%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_evaluate-8              55.4          18.7          -66.28%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_parse-8                 16482         8654          -47.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_evaluate-8              40.1          15.9          -60.44%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_parse-8                 13331         9264          -30.51%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_evaluate-8              14.9          15.9          +6.22%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_parse-8                  16142         9377          -41.91%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_evaluate-8               916           517           -43.54%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_parse-8        18162         10281         -43.39%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_evaluate-8     889           480           -46.02%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_parse-8                  14798         9458          -36.09%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_evaluate-8               125           69.2          -44.72%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_parse-8                  12326         8368          -32.11%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_evaluate-8               155           74.1          -52.23%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_parse-8                  12217         8298          -32.08%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_evaluate-8               129           74.0          -42.73%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_parse-8                 12822         9170          -28.48%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_evaluate-8              84.4          19.2          -77.18%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_parse-8                 13436         8809          -34.44%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_evaluate-8              122           20.6          -83.18%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_parse-8                 12214         10246         -16.11%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_evaluate-8              43.0          15.2          -64.67%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_parse-8                 13674         8945          -34.58%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_evaluate-8              82.5          20.3          -75.39%

benchmark                                                                             old allocs     new allocs     delta
BenchmarkTCK/Like_expression/Exact_match_(1)_parse-8                                  137            94             -31.39%
BenchmarkTCK/Like_expression/Exact_match_(1)_evaluate-8                               0              0              +0.00%
BenchmarkTCK/Like_expression/Exact_match_(2)_parse-8                                  139            94             -32.37%
BenchmarkTCK/Like_expression/Exact_match_(2)_evaluate-8                               0              0              +0.00%
BenchmarkTCK/Like_expression/Exact_match_(negate)_parse-8                             142            99             -30.28%
BenchmarkTCK/Like_expression/Exact_match_(negate)_evaluate-8                          0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_parse-8                          137            94             -31.39%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_parse-8                          136            93             -31.62%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_parse-8                 153            99             -35.29%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_parse-8                  162            93             -42.59%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_evaluate-8               2              2              +0.00%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_parse-8        167            98             -41.32%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_evaluate-8     2              2              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_parse-8                  136            93             -31.62%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_evaluate-8               2              2              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_parse-8                  141            94             -33.33%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_evaluate-8               2              2              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_parse-8                  137            94             -31.39%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_evaluate-8               2              2              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_parse-8                 138            91             -34.06%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_parse-8                 134            91             -32.09%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_parse-8                 138            91             -34.06%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_parse-8                 142            91             -35.92%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_evaluate-8              0              0              +0.00%

benchmark                                                                             old bytes     new bytes     delta
BenchmarkTCK/Like_expression/Exact_match_(1)_parse-8                                  7240          4688          -35.25%
BenchmarkTCK/Like_expression/Exact_match_(1)_evaluate-8                               0             0             +0.00%
BenchmarkTCK/Like_expression/Exact_match_(2)_parse-8                                  7376          4720          -36.01%
BenchmarkTCK/Like_expression/Exact_match_(2)_evaluate-8                               0             0             +0.00%
BenchmarkTCK/Like_expression/Exact_match_(negate)_parse-8                             7536          4984          -33.86%
BenchmarkTCK/Like_expression/Exact_match_(negate)_evaluate-8                          0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_parse-8                          9328          4712          -49.49%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_parse-8                          9328          4720          -49.40%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_parse-8                          9344          4736          -49.32%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_parse-8                          9328          4720          -49.40%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_parse-8                          7240          4688          -35.25%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_parse-8                          7224          4672          -35.33%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_parse-8                          9344          4736          -49.32%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_parse-8                          9344          4736          -49.32%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_parse-8                          8944          4712          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_parse-8                 8416          4736          -43.73%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_parse-8                 8712          5032          -42.24%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_parse-8                 8416          4736          -43.73%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_parse-8                 8384          4712          -43.80%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_parse-8                 8384          4712          -43.80%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_parse-8                 8416          4736          -43.73%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_parse-8                 8416          4736          -43.73%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_parse-8                 8384          4712          -43.80%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_parse-8                  11912         4752          -60.11%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_evaluate-8               341           336           -1.47%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_parse-8        12208         5048          -58.65%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_evaluate-8     341           336           -1.47%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_parse-8                  7336          4672          -36.31%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_evaluate-8               19            19            +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_parse-8                  7576          4680          -38.23%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_evaluate-8               20            20            +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_parse-8                  7336          4680          -36.21%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_evaluate-8               20            20            +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_parse-8                 7544          4648          -38.39%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_parse-8                 7632          4648          -39.10%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_parse-8                 7544          4648          -38.39%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_parse-8                 8392          4672          -44.33%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_evaluate-8              0             0             +0.00%

@dgeorgievski are these improvements good enough for your use case? There are further performance improvements I could make if it is really needed, but they would make the code more complex so I'd prefer not to if there isn't a need

@Cali0707
Copy link
Contributor Author

Cali0707 commented May 3, 2024

cc @pierDipi @duglin

chunks = append(chunks, ".*")
for textIdx < textLen {
// handle escaped characters -> pattern needs to increment two places here
if patternIdx < patternLen-1 && pattern[patternIdx] == '\\' &&
Copy link
Contributor

@duglin duglin May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if moving the pattern[patternIdx] == '\\' to be first would speed things up slightly since patternIdx < patternLen-1 will be true most of the time, and I think doing the check for \ is the biggest determining factor, so let's stop on that check as quickly as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh crap - never mind, I mixed up text* and pattern* variables. Although, I do wonder if you should add a check for the rest of the pattern being empty and then by-pass all of these if's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, an empty pattern is an interesting case. I guess currently you would just end up in the final else clause and return false. We could definitely add a quick check at the top to check the textLen and patternLen before entering the for loop and all the if statemenets

lastMatchIdx = textIdx
patternIdx += 1
// greedy match didn't work, try again from the last known match
} else if lastWildcardIdx != -1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably wrong, and it's probably due to the nature of % being so greedy, but are we sure that we don't need a stack of "last known wildcard indexes" instead of just 1? I'm wondering about a case where we've seen multiple %'s and then we don't match something and we need to back-up to the first % not just the 2nd %. I can't seem to write a testcase to show the issue, hence why I'm probably wrong, but I thought I'd mention it/ask....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the trick to understanding this is that wildcard matching has optimal substructure: if we have already matched part of the input string against part of the pattern string, we only need to use the remaining parts of the pattern string and input string to determine if the entire string matches the pattern or not.

The only slight "exception" to the optimal substructure is that the % character can consume more than one element of the input. So, if at some point our pattern no longer matches the input text, we need to go back to the last wildcard character we saw, as well as the place we were at in the input text when we saw it and try again, consuming one character from the input text in the process.

The key is that we know that we have a valid match up until the last wildcard character we saw, and that apart from going back to that wildcard and consuming more characters from the input text there we can not otherwise move backwards in the pattern or the input text. So, storing an older index would not end up being used, since it would be in part of the input text we had already matched.

@Cali0707
Copy link
Contributor Author

Cali0707 commented May 9, 2024

@duglin @pierDipi since this code is pretty technical, I also ran it (with modifications for % -> * and _ -> ?, since the wildcard characters were different) through the Leetcode pattern matching test suite, and it passed all of their test cases too:

https://leetcode.com/problems/wildcard-matching/submissions/1253803571

We have reasonable coverage of the LIKE expression in the tck tests, but I bet LeetCode has more edge cases than we do, so this makes me feel a bit better about the accuracy of this

@Cali0707
Copy link
Contributor Author

cc @embano1

@duglin
Copy link
Contributor

duglin commented May 14, 2024

@duglin @pierDipi since this code is pretty technical, I also ran it (with modifications for % -> * and _ -> ?, since the wildcard characters were different) through the Leetcode pattern matching test suite, and it passed all of their test cases too:

https://leetcode.com/problems/wildcard-matching/submissions/1253803571

We have reasonable coverage of the LIKE expression in the tck tests, but I bet LeetCode has more edge cases than we do, so this makes me feel a bit better about the accuracy of this

Good thinking!

@duglin duglin merged commit 0988325 into cloudevents:main May 14, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants