Replies: 5 comments
-
Where do you intend to use bstr? We already use crates to extract links and traverse the nodes of HTML. What else would we gain from using bstr?
-
We read the entire file into memory here and we pass that object around:
lychee/lychee-lib/src/types/input.rs, line 51 in 6d56c6b
lychee/lychee-lib/src/types/input.rs, lines 279 to 281 in 6d56c6b
I was thinking of ways to speed that up.
-
The consumer of InputContent is the lychee-lib::extract module. Most of the time the content is passed directly to html5gum or pulldown_cmark. A performance improvement is possible, but I doubt it would be large. A bigger gain could be had if html5gum started to allow unsafe code.
-
We could test it and run a benchmark. It probably also depends on the platform.
https://github.com/BurntSushi/ripgrep/blob/master/crates/searcher/src/searcher/mmap.rs
There are some general caveats mentioned in this article, but none we should be worried about for lychee. The article mentions that mmap was around 30% faster than
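To make the benchmark idea concrete, here is a rough sketch using only the standard library. It compares reading a whole file into a `String` with reading it in fixed-size chunks. The function names (`read_all`, `read_chunked`, `time_it`) and the 64 KiB chunk size are made up for illustration and are not part of lychee. A memory-mapped variant would need an extra crate (for example `memmap2`) and is left out here.

```rust
use std::fs::{self, File};
use std::hint::black_box;
use std::io::{self, Read};
use std::path::Path;
use std::time::{Duration, Instant};

/// Read the whole file into one String (the "everything in memory" approach).
/// Returns the number of bytes read.
fn read_all(path: &Path) -> io::Result<usize> {
    let content = fs::read_to_string(path)?;
    Ok(content.len())
}

/// Read the file in fixed-size chunks, never holding more than one chunk.
/// Returns the number of bytes read. The 64 KiB size is an arbitrary choice.
fn read_chunked(path: &Path) -> io::Result<usize> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; 64 * 1024];
    let mut total = 0;
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        total += n;
    }
    Ok(total)
}

/// Run `f` `runs` times and return the average wall-clock time per run.
/// `runs` must be greater than zero.
fn time_it<F>(runs: u32, mut f: F) -> io::Result<Duration>
where
    F: FnMut() -> io::Result<usize>,
{
    let start = Instant::now();
    for _ in 0..runs {
        // black_box keeps the compiler from discarding the result.
        black_box(f()?);
    }
    Ok(start.elapsed() / runs)
}
```

Calling `time_it(100, || read_all(path))` and `time_it(100, || read_chunked(path))` on the same file gives two averages to compare. Results will depend on the platform, the file size, and the OS page cache, so a proper benchmark harness would be the better tool for real numbers.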
-
Converting this issue to a discussion, since it doesn't track any kind of planned work. The benchmark would still be valuable, but this is not fleshed out enough to be tackled as an actionable item.
-
After reading this comment I was wondering what would be the downsides of testing bstr for reading inputs.
The way I see it:
This would probably be the fastest way to read inputs if there were a way to stream the input to the extractor (which would be a bigger change).
Another alternative: memory maps.
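For illustration, streaming could look roughly like the sketch below. It reads from any `BufRead` source line by line and reuses one buffer instead of loading the whole input. The function `extract_urls_streaming` is hypothetical and only a naive stand-in for a real extractor; lychee's actual extractors (html5gum, pulldown_cmark) work differently. It also assumes valid UTF-8 input, which is exactly the restriction a byte-string approach like bstr would lift.

```rust
use std::io::{self, BufRead};

/// Hypothetical, naive URL finder that processes the input line by line.
/// It splits each line on whitespace and a few delimiters and keeps tokens
/// that start with "http://" or "https://". Not a real HTML/Markdown parser.
fn extract_urls_streaming<R: BufRead>(mut reader: R) -> io::Result<Vec<String>> {
    let mut urls = Vec::new();
    let mut line = String::new(); // reused for every line

    loop {
        line.clear();
        // read_line returns Ok(0) at end of input and errors on invalid UTF-8.
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        let tokens = line.split(|c: char| {
            c.is_whitespace() || matches!(c, '"' | '\'' | '<' | '>' | '(' | ')')
        });
        for token in tokens {
            if token.starts_with("http://") || token.starts_with("https://") {
                urls.push(token.to_string());
            }
        }
    }
    Ok(urls)
}
```

Wrapping a file in `std::io::BufReader::new(File::open(path)?)` and passing it in means only one line is held in memory at a time. A link that spans a line break would be missed, which is one reason streaming into a real parser is the bigger change.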
This is just a thought for now. Would love to get people's opinions.