Acting after on_end_tag #108

Open
mitsuhiko opened this issue Nov 28, 2021 · 5 comments

@mitsuhiko
Contributor

I'm not sure if this is a feature request, but I have tried using on_end_tag to do something after a tag has been closed. Unfortunately the handler is invoked before the tag is written into the sink. This is clearly intentional, as it lets the handler modify things like the tag name or append content after the tag, but it also means that you cannot easily communicate with the sink.

My idea was to instruct the sink to output or not output content outside of an element of interest (e.g. to "select" a certain element exclusively). I am thus flipping a flag on enter/leave. The result is that my closing tag is no longer emitted, because the flag is already off by the time the end tag reaches the sink.

I believe there are use cases where one wants to have code run after the tag has been closed and emitted to the sink, and I'm not sure if this is at all possible at the moment.
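
The ordering problem described above can be reproduced without the library. The following is a minimal sketch using only the standard library; `Token`, `select`, `page` and the `flip_after_emit` switch are hypothetical names made up for illustration and are not part of lol_html. It only models the order of events: the end-tag handler either runs before the end tag is handed to the sink (the behaviour described here) or after it (the behaviour being asked for).

```rust
use std::cell::Cell;

// Toy token stream; a stand-in for what a streaming rewriter would see.
#[derive(Clone, Copy)]
enum Token {
    Start(&'static str),
    Text(&'static str),
    End(&'static str),
}

// Emits only the `wanted` element. When `flip_after_emit` is false, the flag
// is cleared BEFORE the end tag reaches the sink, as described in the issue.
fn select(tokens: &[Token], wanted: &str, flip_after_emit: bool) -> String {
    let can_write = Cell::new(false);
    let mut out = String::new();
    // The sink drops every chunk while the flag is off.
    let mut sink = |chunk: &str| {
        if can_write.get() {
            out.push_str(chunk);
        }
    };
    for token in tokens {
        match *token {
            Token::Start(name) => {
                // Element handler: runs before the start tag is emitted.
                if name == wanted {
                    can_write.set(true);
                }
                sink(&format!("<{}>", name));
            }
            Token::Text(text) => sink(text),
            Token::End(name) => {
                let is_wanted = name == wanted;
                if is_wanted && !flip_after_emit {
                    can_write.set(false); // handler fires first: tag is lost
                }
                sink(&format!("</{}>", name));
                if is_wanted && flip_after_emit {
                    can_write.set(false); // hypothetical "after emit" hook
                }
            }
        }
    }
    out
}

// Simplified version of a page containing one link.
fn page() -> Vec<Token> {
    vec![
        Token::Start("html"),
        Token::Text("This "),
        Token::Start("a"),
        Token::Text("link"),
        Token::End("a"),
        Token::Text(" is an example."),
        Token::End("html"),
    ]
}

fn main() {
    println!("{}", select(&page(), "a", false)); // prints "<a>link"
    println!("{}", select(&page(), "a", true)); // prints "<a>link</a>"
}
```

With the flag flipped before emission the closing tag is swallowed by the sink; flipping it afterwards keeps the element intact.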

@jongiddy
Collaborator

I see you've proposed #109 to add this capability.

Curious as to why this didn't seem to work using the existing on_end_tag, I had a go at getting it to work. This is my code to display only the a tags in a document, including the start and end tags. I assume that you've got something similar to the can_write flag in your code. Adding the extra string was the only additional step needed. It might be considered slightly hacky that I construct the end tag string manually, but end tags are pretty simple.

I don't object to your proposed change. I just wanted to understand why it was a problem. Have I understood it correctly, or have I missed an aspect of the problem you're describing?

use lol_html::{element, HtmlRewriter, Settings};
use std::{cell::RefCell, error::Error, rc::Rc};

const PAGE: &str = "
<html>
This <a href=\"http://example.com\">link</a> is an example.
</html>
";

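// State shared between the element handlers and the output sink.
struct OutputHandler {
    // Whether the sink should currently pass chunks through.
    can_write: bool,
    // Text to print before the next chunk (holds the hand-built end tag).
    extra: String,
}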
impl OutputHandler {
    fn on(&mut self) {
        self.can_write = true;
    }
    fn off(&mut self) {
        self.can_write = false;
    }
    fn push(&mut self, extra: &str) {
        self.extra.push_str(extra)
    }
}

fn main() -> Result<(), Box<dyn Error>> {
    let output = Rc::new(RefCell::new(OutputHandler {
        can_write: false,
        extra: String::new(),
    }));
    let element_content_handlers = vec![element!("a", |a| {
        output.borrow_mut().on();
        let output = output.clone();
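        a.on_end_tag(move |tag| {
            // This runs before the end tag reaches the sink, so queue the
            // end tag text by hand and only then switch output off.
            let mut handler = output.borrow_mut();
            handler.push(&format!("</{}>", tag.name()));
            handler.off();
            Ok(())
        })?;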
        Ok(())
    })];

    let output = output.clone();
    let mut rewriter = HtmlRewriter::new(
        Settings {
            element_content_handlers,
            ..Settings::default()
        },
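        |chunk: &[u8]| {
            let mut handler = output.borrow_mut();
            // Print any queued end tag before handling this chunk.
            if !handler.extra.is_empty() {
                print!("{}", handler.extra);
                handler.extra.clear();
            }
            // Pass the chunk through only while inside a selected element.
            if handler.can_write {
                print!("{}", String::from_utf8_lossy(chunk))
            }
        },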
    );
    rewriter.write(PAGE.as_ref())?;
    rewriter.end()?;
    Ok(())
}

@mitsuhiko
Contributor Author

You're right in that it can be somewhat emulated, but it's quite inconvenient. This solution also always inserts a closing tag, even if one did not exist in the original document. For me the biggest issue was actually that I was trying to maintain a reasonably accurate tag stack to make more meaningful decisions, and having on_end_tag fire "within" the stack level creates a lot of complexity.

However, right now all of this is entirely blocked on #110 anyway. A solution to that might change the situation somewhat.

@jongiddy
Collaborator

Would this be easier if, rather than having the on_end_tag method on the Element, there was a separate end tag handler like there is a separate element text handler? That was my original idea, but the on_end_tag callback was easier to implement.

@mitsuhiko
Contributor Author

Potentially. The current nice aspect of this on_end_tag business is that you can pass state from the start tag to the end tag somehow, but with the need to maintain a stack anyways that might not be necessary.
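
To illustrate the trade-off, here is a rough sketch of how an explicit stack could carry state from a start tag to its end tag even when the two handlers are registered separately. It uses only the standard library; `Frame`, `Event` and `run` are hypothetical names for this example, not lol_html types, and the input is assumed to be well nested.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// State recorded when an element opens (hypothetical, for illustration).
#[derive(Debug)]
struct Frame {
    name: String,
    selected: bool,
}

#[derive(Clone, Copy)]
enum Event {
    Start(&'static str),
    End(&'static str),
}

// Runs independent start and end handlers over `events`. For every end tag
// it returns the tag name and the `selected` flag recorded at its start tag.
fn run(events: &[Event], wanted: &str) -> Vec<(String, bool)> {
    let stack: Rc<RefCell<Vec<Frame>>> = Rc::new(RefCell::new(Vec::new()));
    let mut seen = Vec::new();

    // Start handler: pushes a frame, knows nothing about the end handler.
    let on_start = {
        let stack = Rc::clone(&stack);
        let wanted = wanted.to_string();
        move |name: &str| {
            stack.borrow_mut().push(Frame {
                name: name.to_string(),
                selected: name == wanted.as_str(),
            });
        }
    };

    // End handler: recovers the start-tag state by popping the stack
    // instead of capturing it in a per-element closure.
    let on_end = {
        let stack = Rc::clone(&stack);
        move |name: &str| -> Option<Frame> {
            let mut stack = stack.borrow_mut();
            if stack.last().map_or(false, |f| f.name.as_str() == name) {
                stack.pop()
            } else {
                None // unmatched end tag: ignored in this toy model
            }
        }
    };

    for event in events {
        match *event {
            Event::Start(name) => on_start(name),
            Event::End(name) => {
                if let Some(frame) = on_end(name) {
                    seen.push((frame.name, frame.selected));
                }
            }
        }
    }
    seen
}

fn main() {
    let events = [
        Event::Start("div"),
        Event::Start("a"),
        Event::End("a"),
        Event::End("div"),
    ];
    for (name, selected) in run(&events, "a") {
        println!("</{}> selected={}", name, selected);
    }
}
```

The closure-based on_end_tag gets the same effect for free by capturing per-element state, which is the convenience mentioned above; the stack version trades that for handlers that do not depend on each other.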

@mitsuhiko
Contributor Author

I came across this today and decided to close down #109 because too much has changed in the meantime. The problem still exists and #110 is still open.
