Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update transitive.rb #230

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Update transitive.rb #230

wants to merge 3 commits into from

Conversation

Beagle123
Copy link

@Beagle123 Beagle123 commented Jan 7, 2025

GitHub: fix GH-228

This makes Transitive.rb do exactly what the docs say it should do. It outputs Pretty xml except for text nodes which are left untouched. If you read the issue that I posted, you can see that the current Transitive output is garbled. Also, this has the benefit of being initialized using Prestty.rb, initialize method for simplicity.

This makes Transitive.rb do exactly what the docs say it should do.  It outputs Pretty xml except for text nodes which are left untouched.
This formatter was adding newline characters so the Text nodes grew every time you saved/wrote them.  Adding #strip fixes it.  I've been working with it and it works perfectly.
@naitoh
Copy link
Contributor

naitoh commented Jan 9, 2025

It seems that this is a backward incompatible change.

@kou
What do you think?

See: #228

@tompng
Copy link
Member

tompng commented Jan 9, 2025

Formatting this html content

html = "<html><title> foo </title><body><div> bar </div></body></html>"
REXML::Document.new(html).write(STDOUT, 2, true)

In the document

This means that no extra whitespace nodes are inserted, and whitespace within text nodes is preserved.

This is transitive output. It looks weird, but it's indented, it's valid, and the content is preserved.
Just like the document, title.text == ' foo ' and div.text == ' bar ' is unchanged and no whitespace text node is inserted between </title> and <body>.
This kind of formatting is the only way to indent without inserting newline nor whitespace between > and <.

<html
><title
  > foo </title
  ><body
  ><div
    > bar </div
    ></body
  ></html
>

This is not transitive. It looks pretty but the content is not preserved.
title.text == "\nfoo\n " and div.text == "\nbar\n " is changed from the original.
Text node "\n " is inserted between </title> and <body>.

<html>
  <title>
foo
  </title>
  <body>
    <div>
bar
    </div>
  </body>
</html>

@Beagle123
Copy link
Author

Beagle123 commented Jan 9, 2025

If you guys want to preserve Transitive the way that it is for backward compatibility, that's fine with me. My issue is that I need a user friendly (pretty) output that doesn't change my text nodes.

I don't see any scenario where having the formatter change the text is advantageous. When I write my xml with Pretty, it adds a bunch of whitespace so when I read it back it's wrong. Its a different string that doesn't work. This makes no sense to me becuase the purpose of XML is to be read by a program, not a human. Pretty makes it human readable but now program readable.

I really like Pretty.rb. I'd love to see it have an option to preserve text nodes. That would be the best imho.

In fact, I would suggest making the default behavior of Pretty be to preserve text nodes, and have an option to indent with a bunch of whitespace. That makes sense to me.

Are you guys receptive to that instead?

@kou
Copy link
Member

kou commented Jan 10, 2025

In fact, I would suggest making the default behavior of Pretty be to preserve text nodes, and have an option to indent with a bunch of whitespace. That makes sense to me.

What does "text nodes" mean?

Could you provide a sample XML and an expected pretty output for it?

@naitoh
Copy link
Contributor

naitoh commented Jan 11, 2025

@tompng

In the document

This means that no extra whitespace nodes are inserted, and whitespace within text nodes is preserved.

This is transitive output. It looks weird, but it's indented, it's valid, and the content is preserved. Just like the document, title.text == ' foo ' and div.text == ' bar ' is unchanged and no whitespace text node is inserted between </title> and <body>. This kind of formatting is the only way to indent without inserting newline nor whitespace between > and <.

Thanks for your comment.
I understand the purpose of REXML::Formatters::Transitive.

REXML::Formatters::Transitive is implemented as documented, and I don't think REXML::Formatters::Transitive needs to be changed.

@Beagle123

#228

In fact, I would suggest changing the Pretty formater to output the unaltered text.

Depending on your answer below, I think it would be more appropriate to extend REXML::Formatters::Pretty.

In fact, I would suggest making the default behavior of Pretty be to preserve text nodes, and have an option to indent with a bunch of whitespace. That makes sense to me.

What does "text nodes" mean?
Could you provide a sample XML and an expected pretty output for it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

REXML::Formatters::Transitive not found.
4 participants