Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix XmlStringBuilder caching. #569

Closed

Conversation

bgrozev
Copy link
Contributor

@bgrozev bgrozev commented Oct 25, 2023

XmlStringBuilder.append(Element) calls into the generic append(CharSequence) method which breaks the caching.

As a result we see IQ.toString() taking a very long time (over 8 seconds for some ~60KB IQs from a real system). Here's a test to demonstrate (see post below)

Note that PacketWriter uses XmlStringBuilder.write() and never calls toString() and its performance isn't affected.

XmlStringBuilder.append(Element) calls into the generic
append(CharSequence) method which breaks the caching.
@Flowdalic
Copy link
Member

Thanks. Took me a while to figure out what I assume now the issue is, but I am still not sure I understand.

You want that

public LazyStringBuilder append(LazyStringBuilder lsb) {
list.addAll(lsb.list);
invalidateCache();
return this;
}

is used (at least that is what your change appears to do), right?

But I am not sure how this helps with caching? Could you maybe elaborate on the "breaks the caching" part? I am sure missing something obvious. BTW, the output from the code snipped you posted would be beneficial.

I think the thing you want to show with the snippet is that the run time for l1-l5 always stay identical. If so, then I am not sure how your suggested change helps reducing them, since every LazyToStringBuilder.toString() invocation has to deal with the same number of lazy string elements. That appears to be the case without your change, and with your change.

@bgrozev
Copy link
Contributor Author

bgrozev commented Oct 26, 2023

Sorry, that was a bad example. Here's a better one with the output below:

import org.jivesoftware.smack.util.*;

class Test
{
    static int COUNT_OUTER = 500;
    static int COUNT_INNER = 50;
    public static void main(String[] args) throws Exception
    {
        test1();
        test2();
        test3();
    }
    public static void test1() throws Exception
    {
        System.err.println("Test 1");
        XmlStringBuilder parent = new XmlStringBuilder();
        XmlStringBuilder child = new XmlStringBuilder();
        XmlStringBuilder child2 = new XmlStringBuilder();

        for (int i = 1 ; i < COUNT_OUTER; i++)
        {
            XmlStringBuilder cs = new XmlStringBuilder();
            for (int j = 0; j < COUNT_INNER; j++)
            {
                cs.append("abc");
            }
            child2.append((CharSequence) cs);
        }

        child.append((CharSequence) child2);
        parent.append((CharSequence) child);

        time("test1: parent", () -> "len="+parent.toString().length());
        time("test1: child", () -> "len="+child.toString().length());
        time("test1: child2", () -> "len="+child2.toString().length());
    }
    public static void test2() throws Exception
    {
        System.err.println("Test 2: evaluate children first");
        XmlStringBuilder parent = new XmlStringBuilder();
        XmlStringBuilder child = new XmlStringBuilder();
        XmlStringBuilder child2 = new XmlStringBuilder();

        for (int i = 1 ; i < COUNT_OUTER; i++)
        {
            XmlStringBuilder cs = new XmlStringBuilder();
            for (int j = 0; j < COUNT_INNER; j++)
            {
                cs.append("abc");
            }
            child2.append((CharSequence) cs);
        }

        child.append((CharSequence) child2);
        parent.append((CharSequence) child);

        time("test2: child2", () -> "len="+child2.toString().length());
        time("test2: child", () -> "len="+child.toString().length());
        time("test2: parent", () -> "len="+parent.toString().length());
    }
    public static void test3() throws Exception
    {
        System.err.println("Test 3: use append(XmlStringBuilder)");
        XmlStringBuilder parent = new XmlStringBuilder();
        XmlStringBuilder child = new XmlStringBuilder();
        XmlStringBuilder child2 = new XmlStringBuilder();

        for (int i = 1 ; i < COUNT_OUTER; i++)
        {
            XmlStringBuilder cs = new XmlStringBuilder();
            for (int j = 0; j < COUNT_INNER; j++)
            {
                cs.append("abc");
            }
            child2.append(cs);
        }

        child.append(child2);
        parent.append(child);

        time("test3: parent", () -> "len="+parent.toString().length());
        time("test3: child", () -> "len="+child.toString().length());
        time("test3: child2", () -> "len="+child2.toString().length());
    }



    static void time(String name, Supplier<String> block) {
        long start = System.currentTimeMillis();
        String result = block.get();
        long end = System.currentTimeMillis();

        System.err.println(name + " took " + (end-start) + "ms: " + result);
    }
}

Output:

Test 1
test1: parent took 9122ms: len=74850
test1: child took 4425ms: len=74850
test1: child2 took 6ms: len=74850
Test 2: evaluate children first
test2: child2 took 5ms: len=74850
test2: child took 1ms: len=74850
test2: parent took 0ms: len=74850
Test 3: use append(XmlStringBuilder)
test3: parent took 3ms: len=74850
test3: child took 2ms: len=74850
test3: child2 took 1ms: len=74850

I'm not sure exactly what's going on, but it gets worse with more levels of nesting. I think it has to do with LazyStringBuilder.length() not being cached separate from the string, and being used in charAt, which ends up iterating every time. This is what the profile of "test1" looks like:
Screenshot 2023-10-26 at 15 50 49

@bgrozev
Copy link
Contributor Author

bgrozev commented Nov 15, 2023

Bump. Note that this is not just a contrived example, it's based on a real IQ with nested ExtensionElements that we use. With the packet debugger enabled we were starting to see timeouts.

@Flowdalic
Copy link
Member

I have it on the radar. But you are aware that your PR currently targets the main dev branch and not the stable release branch. If it is important for you that this hits a release early, then you may want to issue a pull request against the current stable branch.

@bgrozev
Copy link
Contributor Author

bgrozev commented Nov 15, 2023

I didn't mean to pressure you. We have worked around it (by using XmlStringBuilder.write() instead of toString()), so we don't need it in a release soon.

@Flowdalic
Copy link
Member

Flowdalic commented Nov 26, 2023

I finally found the time to investigate this. \o/

The main cause of the slow performance was that LazyStringBuilder did not cache the length alone. I fixed this with 70e4830. The times of your test then go down from ~12000ms to ~100ms on my machine. I also added your approach in 6322f4f, which can be optionally enabled (via the FLAT_APPEND option) and brings the times down to ~10ms.

I was hesitant to make FLAT_APPEND the default. I first want to gather some experience with it.

In any case, thanks for reporting this and providing additional data! :)

@Flowdalic Flowdalic closed this Nov 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants