Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MoveChild doesn't set parent of childs correctly #574

Open
dartrax opened this issue Dec 3, 2024 · 3 comments
Open

MoveChild doesn't set parent of childs correctly #574

dartrax opened this issue Dec 3, 2024 · 3 comments
Assignees

Comments

@dartrax
Copy link

dartrax commented Dec 3, 2024

Let's say I have an HtmlDocument that has some child nodes in DocumentNode like <div></div><div></div><div></div>. I want to encapsulate the existing childs into a new <html></html> Node:

private void InsertHtmlRootElement(HtmlDocument doc) {
    HtmlNode HtmlTag = doc.DocumentNode.ChildNodes.FindFirst("html");
    if (HtmlTag is null) {
        // Create new <html></html> Node
        HtmlTag = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "html" };
        // Move all childs from root into the new HTML Node
        HtmlTag.MoveChildren(doc.DocumentNode.ChildNodes);
        // Append new HTML Node into Document
        doc.DocumentNode.AppendChild(HtmlTag);
        // HtmlTag.ChildNodes.ForEach(c => c.SetParent(HtmlTag));
    }
}

This works but all children of HtmlTag have their ParentNode set to null. I guessed that I first need to add the HtmlTag to the document and then insert the childs to fix it:

private void InsertHtmlRootElement(HtmlDocument doc) {
    HtmlNode HtmlTag = doc.DocumentNode.ChildNodes.FindFirst("html");
    if (HtmlTag is null) {
        // Append new <html></html> Node
        HtmlTag = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "html" };
        doc.DocumentNode.AppendChild(HtmlTag);
        // Move all childs from root into the new HTML Node
        doc.DocumentNode.ChildNodes.Except([HtmlTag]).ToArray().ForEach( c => {
            HtmlTag.MoveChild(c);
            // c.SetParent(HtmlTag);
        });
    }
}

As it turns out, the children still have no parent. To work around that, I have to call c.SetParent(HtmlTag); explicitly.

@JonathanMagnan JonathanMagnan self-assigned this Dec 3, 2024
@JonathanMagnan
Copy link
Member

JonathanMagnan commented Dec 11, 2024

Hello @dartrax ,

The method MoveChilden seems to have been created to move a node from one document to another document, not within the same document.

The MoveChildren method calls the RemoveChildren method: https://github.com/zzzprojects/html-agility-pack/blob/master/src/HtmlAgilityPack.Shared/HtmlNode.cs#L1770. This causes an issue because the child is fully removed from the document after being correctly added with a parent.

I don't know if that's a bug or if this works as intended by the original author (I would have expected the same behavior as you). What I would recommend you is probably this code:

public void InsertHtmlRootElement(HtmlDocument doc)
{
	HtmlNode HtmlTag = doc.DocumentNode.ChildNodes.FindFirst("html");
	if (HtmlTag is null)
	{
		// Create new <html></html> Node
		HtmlTag = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "html" };

		var childNodes = doc.DocumentNode.ChildNodes.ToList();
		childNodes.ForEach(x => doc.DocumentNode.RemoveChild(x)); 

		// Append new HTML Node into Document
		doc.DocumentNode.AppendChild(HtmlTag);

		// Move all childs from root into the new HTML Node
		childNodes.ForEach(x => HtmlTag.AppendChild(x));
	}
}

All childs get removed from the document and after added to the HTML node.

Best Regards,

Jon

@dartrax
Copy link
Author

dartrax commented Dec 21, 2024

Hi Jon,
thanks for your reply. Your recommended code works, thank you!

I would suggest to add the MoveChildren method to the docs, because currently I couldn't find any documentation for it. Then the intended behavior could be documented.

Ideally, of course this use case should be added to make it work in both cases ;-)

@JonathanMagnan
Copy link
Member

Thank you for your suggestion.

Both methods have been added with a note about the "different document": https://html-agility-pack.net/traversing

Unfortunately, .NET Fiddle is currently down so I was not able to add the example

Best Regards,

Jon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants