Find without build html tree #534

Merged: 3 commits, Feb 9, 2024

ypconstante
Contributor

We finally got to this PR, after all of the preparation PRs.

With this PR's changes, `find` checks whether it can run the search directly on the raw HTML nodes instead of building an HTML tree first. To keep it simple for now, this applies only when `find` is called with a single selector that is neither a composite selector nor uses pseudo-classes.
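
The idea can be sketched as follows. This is a minimal illustration in Python, not Floki's Elixir implementation: it assumes raw nodes are `(tag, attributes, children)` tuples with plain strings as text nodes, and the names `SimpleSelector`, `matches`, and `find_in_raw_nodes` are hypothetical. A single simple selector (tag, id, classes) can be checked against each node in isolation, so a plain depth-first walk is enough and no intermediate tree needs to be built.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SimpleSelector:
    """Hypothetical simple selector: no combinators, no pseudo-classes."""
    tag: Optional[str] = None
    id: Optional[str] = None
    classes: Tuple[str, ...] = ()


def matches(node, selector: SimpleSelector) -> bool:
    """Check one (tag, attrs, children) node using only its own data."""
    tag, attrs, _children = node
    attr_map = dict(attrs)
    if selector.tag is not None and selector.tag != tag:
        return False
    if selector.id is not None and attr_map.get("id") != selector.id:
        return False
    node_classes = attr_map.get("class", "").split()
    return all(name in node_classes for name in selector.classes)


def find_in_raw_nodes(nodes: List, selector: SimpleSelector) -> List:
    """Pre-order walk over the raw nodes, returning matches in document order."""
    found = []
    stack = list(reversed(nodes))
    while stack:
        node = stack.pop()
        if isinstance(node, str):  # text node: nothing to match
            continue
        if matches(node, selector):
            found.append(node)
        # Push children reversed so the first child is visited next.
        stack.extend(reversed(node[2]))
    return found


doc = [
    ("html", [], [
        ("body", [], [
            ("p", [("id", "intro"), ("class", "lead big")], ["Hi"]),
            ("p", [("class", "lead")], ["there"]),
        ]),
    ]),
]

by_id = find_in_raw_nodes(doc, SimpleSelector(id="intro"))
by_class = find_in_raw_nodes(doc, SimpleSelector(classes=("lead",)))
by_tag = find_in_raw_nodes(doc, SimpleSelector(tag="p"))
```

Composite selectors such as `div > p` need to know a node's parent or siblings, which is why this sketch, like the restriction described above, only covers the single simple selector case.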

Composite selectors will need more work, but they look feasible. Some pseudo-classes, such as :disabled, don't require the tree and should be easy to enable. The others will be a lot more complicated, and it will take some investigation to see how hard they would be to implement.

Benchmark results are a lot better than I expected, especially on big documents.

```
##### With input big #####
Name                             ips        average  deviation         median         99th %
id (pr)                       544.93        1.84 ms    ±12.73%        1.77 ms        2.70 ms
class (pr)                    413.43        2.42 ms     ±9.11%        2.36 ms        3.31 ms
tag name (type) (pr)          183.38        5.45 ms     ±7.29%        5.37 ms        6.99 ms
id (main)                      53.18       18.81 ms    ±18.93%       18.21 ms       29.85 ms
class (main)                   51.17       19.54 ms    ±19.50%       18.92 ms       30.66 ms
tag name (type) (main)         35.62       28.08 ms    ±11.25%       27.36 ms       35.57 ms

Comparison:
id (pr)                       544.93
class (pr)                    413.43 - 1.32x slower +0.58 ms
tag name (type) (pr)          183.38 - 2.97x slower +3.62 ms
id (main)                      53.18 - 10.25x slower +16.97 ms
class (main)                   51.17 - 10.65x slower +17.71 ms
tag name (type) (main)         35.62 - 15.30x slower +26.24 ms

Memory usage statistics:

Name                      Memory usage
id (pr)                        1.87 MB
class (pr)                     1.89 MB - 1.01x memory usage +0.0155 MB
tag name (type) (pr)           2.32 MB - 1.24x memory usage +0.44 MB
id (main)                      9.46 MB - 5.05x memory usage +7.58 MB
class (main)                   9.47 MB - 5.05x memory usage +7.60 MB
tag name (type) (main)        12.90 MB - 6.89x memory usage +11.03 MB

**All measurements for memory usage were the same**

##### With input medium #####
Name                             ips        average  deviation         median         99th %
id (pr)                      1723.26        0.58 ms    ±13.85%        0.55 ms        0.87 ms
class (pr)                    764.11        1.31 ms    ±11.60%        1.27 ms        1.81 ms
tag name (type) (pr)          565.07        1.77 ms     ±9.08%        1.73 ms        2.26 ms
id (main)                     306.59        3.26 ms     ±8.71%        3.22 ms        4.76 ms
class (main)                  249.58        4.01 ms     ±8.21%        3.95 ms        5.84 ms
tag name (type) (main)        143.08        6.99 ms    ±11.33%        6.86 ms        9.61 ms

Comparison:
id (pr)                      1723.26
class (pr)                    764.11 - 2.26x slower +0.73 ms
tag name (type) (pr)          565.07 - 3.05x slower +1.19 ms
id (main)                     306.59 - 5.62x slower +2.68 ms
class (main)                  249.58 - 6.90x slower +3.43 ms
tag name (type) (main)        143.08 - 12.04x slower +6.41 ms

Memory usage statistics:

Name                      Memory usage
id (pr)                        0.61 MB
class (pr)                     0.65 MB - 1.08x memory usage +0.0478 MB
tag name (type) (pr)           0.75 MB - 1.24x memory usage +0.147 MB
id (main)                      2.74 MB - 4.51x memory usage +2.13 MB
class (main)                   2.79 MB - 4.59x memory usage +2.18 MB
tag name (type) (main)         3.70 MB - 6.09x memory usage +3.09 MB

**All measurements for memory usage were the same**

##### With input small #####
Name                             ips        average  deviation         median         99th %
id (pr)                       7.96 K      125.56 μs    ±17.45%      116.91 μs      218.21 μs
class (pr)                    4.02 K      249.01 μs    ±14.21%      232.77 μs      398.61 μs
tag name (type) (pr)          2.80 K      357.10 μs    ±11.23%      344.87 μs      553.38 μs
id (main)                     1.67 K      599.25 μs    ±17.86%      616.19 μs      901.97 μs
class (main)                  1.25 K      800.95 μs    ±14.96%      845.50 μs     1060.31 μs
tag name (type) (main)        0.84 K     1196.96 μs    ±14.09%     1158.25 μs     1811.77 μs

Comparison:
id (pr)                       7.96 K
class (pr)                    4.02 K - 1.98x slower +123.45 μs
tag name (type) (pr)          2.80 K - 2.84x slower +231.54 μs
id (main)                     1.67 K - 4.77x slower +473.69 μs
class (main)                  1.25 K - 6.38x slower +675.39 μs
tag name (type) (main)        0.84 K - 9.53x slower +1071.39 μs

Memory usage statistics:

Name                           average  deviation         median         99th %
id (pr)                      127.27 KB     ±0.00%      127.27 KB      127.27 KB
class (pr)                   135.13 KB     ±0.00%      135.13 KB      135.13 KB
tag name (type) (pr)         156.77 KB     ±0.00%      156.77 KB      156.77 KB
id (main)                    620.36 KB     ±0.00%      620.36 KB      620.36 KB
class (main)                 625.55 KB     ±0.00%      625.55 KB      625.59 KB
tag name (type) (main)       801.92 KB     ±0.00%      801.92 KB      801.92 KB

Comparison:
id (pr)                      127.27 KB
class (pr)                   135.13 KB - 1.06x memory usage +7.87 KB
tag name (type) (pr)         156.77 KB - 1.23x memory usage +29.50 KB
id (main)                    620.36 KB - 4.87x memory usage +493.09 KB
class (main)                 625.55 KB - 4.92x memory usage +498.28 KB
tag name (type) (main)       801.92 KB - 6.30x memory usage +674.66 KB
```
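
Each comparison block ranks every scenario against the fastest one, `id (pr)`, so the 15.30x figure for the big input compares `tag name (type)` on main with `id` on the PR branch. A like-for-like reading divides the main average by the PR average for the same selector. The short Python sketch below does that for the big input; the numbers are copied from the table, and because they are rounded averages the ratios may differ slightly from ratios computed on the raw measurements.

```python
# Average run times in ms for the "big" input: (main, pr), copied from the table.
big_input_ms = {
    "id": (18.81, 1.84),
    "class": (19.54, 2.42),
    "tag name (type)": (28.08, 5.45),
}

# Like-for-like speedup: how many times faster the PR is for the same selector.
speedups = {
    name: round(main_ms / pr_ms, 2)
    for name, (main_ms, pr_ms) in big_input_ms.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor}x faster on the PR branch")
```

On these rounded averages the per-selector improvement for the big input is roughly 5x to 10x, with the id selector benefiting the most.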

Owner

@philss philss left a comment

> Benchmark results are a lot better than I expected, especially on big documents.

OMG! 15x faster!! Congrats, @ypconstante!! 🚀

This one is really awesome! :D

Review thread on lib/floki/selector.ex (outdated, resolved)
@philss philss merged commit c688e2a into philss:main Feb 9, 2024
9 checks passed
@ypconstante ypconstante deleted the find-without-html-tree branch February 17, 2024 16:43