Call self_closing_tags only once on raw_html #517

Merged
3 commits merged on Jan 4, 2024

Contributor

@ypconstante ypconstante commented Dec 30, 2023

self_closing_tags is not a slow function, but it is currently called thousands of times while generating the raw HTML, which has a significant performance impact.

This PR modifies the RawHTML module to call self_closing_tags() only once in raw_html, passing the value as a parameter to the other functions. raw_html also converts the self-closing tags to a MapSet to speed up the membership check.
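
To illustrate the idea, here is a minimal sketch of computing a value once at the entry point and threading it through the recursion. It is not the actual Floki.RawHTML code: the module name RawHTMLSketch, the build/2 helper, and the short tag list are all hypothetical, and attributes are ignored to keep it small.

```elixir
defmodule RawHTMLSketch do
  # Hypothetical stand-in for the library's self-closing tag list (assumption).
  defp self_closing_tags, do: ["br", "hr", "img", "input", "meta"]

  # Entry point: compute the tag list once, then pass it down the recursion
  # instead of calling self_closing_tags() again for every node.
  def raw_html(tree) do
    tags = self_closing_tags()

    tree
    |> build(tags)
    |> IO.iodata_to_binary()
  end

  # A list of nodes: build each node with the same precomputed tag list.
  defp build(nodes, tags) when is_list(nodes), do: Enum.map(nodes, &build(&1, tags))

  # A text node is emitted as-is (no escaping in this sketch).
  defp build(text, _tags) when is_binary(text), do: text

  # An element node: attributes are ignored here for brevity.
  defp build({tag, _attrs, children}, tags) do
    if children == [] and tag in tags do
      ["<", tag, "/>"]
    else
      ["<", tag, ">", build(children, tags), "</", tag, ">"]
    end
  end
end

IO.puts(RawHTMLSketch.raw_html([{"div", [], ["a", {"br", [], []}]}]))
```

The extra parameter costs nothing per node, while the repeated function call it replaces was paid once for every element in the tree.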

##### With input big #####
Name                 ips        average  deviation         median         99th %
bench (pr)         61.60       16.23 ms    ±14.47%       15.99 ms       21.96 ms
bench              39.96       25.03 ms     ±9.18%       24.86 ms       31.23 ms

Comparison: 
bench (pr)         61.60
bench              39.96 - 1.54x slower +8.79 ms

Memory usage statistics:

Name          Memory usage
bench (pr)         8.47 MB
bench              9.49 MB - 1.12x memory usage +1.02 MB

**All measurements for memory usage were the same**

##### With input medium #####
Name                 ips        average  deviation         median         99th %
bench (pr)        197.42        5.07 ms    ±24.27%        4.80 ms        9.11 ms
bench             124.30        8.05 ms    ±14.06%        7.77 ms       11.08 ms

Comparison: 
bench (pr)        197.42
bench             124.30 - 1.59x slower +2.98 ms

Memory usage statistics:

Name          Memory usage
bench (pr)         2.95 MB
bench              3.31 MB - 1.12x memory usage +0.36 MB

**All measurements for memory usage were the same**

##### With input small #####
Name                 ips        average  deviation         median         99th %
bench (pr)        1.26 K        0.79 ms    ±10.28%        0.77 ms        1.10 ms
bench             0.74 K        1.35 ms    ±10.38%        1.30 ms        1.83 ms

Comparison: 
bench (pr)        1.26 K
bench             0.74 K - 1.70x slower +0.55 ms

Memory usage statistics:

Name          Memory usage
bench (pr)       645.78 KB
bench            720.79 KB - 1.12x memory usage +75.01 KB

**All measurements for memory usage were the same**
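
# Benchmark script: parses each fixture once, then times Floki.raw_html/1 on the parsed tree.
read_file = fn name ->
  # Resolve the fixture relative to this script's directory.
  __ENV__.file
  |> Path.dirname()
  |> Path.join(name)
  |> File.read!()
  |> Floki.parse_document!()
end

# Parsed documents of three sizes, keyed by the input name shown in the results.
inputs = %{
  "big" => read_file.("big.html"),
  "medium" => read_file.("medium.html"),
  "small" => read_file.("small.html")
}

Benchee.run(
  %{
    "bench" => &Floki.raw_html/1
  },
  time: 10,
  inputs: inputs,
  memory_time: 2
)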

@ypconstante ypconstante force-pushed the optimize-self-closing-tags branch from d821500 to 5339fa3 Compare December 30, 2023 16:34
Owner

@philss philss left a comment

Awesome! 🚀

@@ -47,35 +47,59 @@ defmodule Floki.RawHTML do
_ -> :noop
end

IO.iodata_to_binary(build_raw_html(html_tree, [], encoder, padding))
self_closing_tags = MapSet.new(self_closing_tags())
Owner

Just curious: did you find MapSet faster than using a List with unique elements?

Contributor Author

I ran some tests, and performance seems to be the same in this case, so I reverted to a list in 4d53e0f
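
For anyone who wants to repeat this kind of comparison, here is a rough sketch that times membership checks against a small list and a MapSet built from it. The module name MembershipSketch is hypothetical, the tag list is a set of HTML void elements used as a stand-in (not necessarily the list Floki uses), and :timer.tc gives only coarse numbers compared with a proper Benchee run.

```elixir
defmodule MembershipSketch do
  # Stand-in tag list (assumption): a handful of HTML void elements.
  @tags ~w(area base br col embed hr img input link meta source track wbr)

  def tags, do: @tags

  # Run `runs` membership checks against `collection` and return the
  # elapsed wall-clock time in microseconds.
  def time(collection, tag, runs \\ 100_000) do
    {micros, _result} =
      :timer.tc(fn ->
        Enum.each(1..runs, fn _ -> Enum.member?(collection, tag) end)
      end)

    micros
  end
end

list = MembershipSketch.tags()
set = MapSet.new(list)

# "wbr" is the last element, so this is the worst case for the list scan.
IO.puts("list:   #{MembershipSketch.time(list, "wbr")} microseconds")
IO.puts("mapset: #{MembershipSketch.time(set, "wbr")} microseconds")
```

With only a dozen or so short strings, a linear scan has very little work to do, which fits the observation that the two performed about the same here.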

Owner

philss commented Jan 3, 2024

@ypconstante thank you! 💜

@philss philss merged commit 725e530 into philss:main Jan 4, 2024
9 checks passed
@ypconstante ypconstante deleted the optimize-self-closing-tags branch January 4, 2024 17:15