Select | Performance issue with large file #97
Thank you for looking into the location of the performance bottleneck. We already have an open issue #87 about this, so I am going to close this one.
@JLRishe FYI, it isn't related to version 27. Parsing took the same time in versions 23 and 24 as well.
@markb-trustifi Thank you for clarifying. I will reopen this issue for now.
Hi @markb-trustifi, I've tested your use case with the file you attached. It didn't take 3 minutes on my machine, but it did take 30 seconds, which I still think is far too long. With the changes I'm proposing in #108, it takes 1.5 seconds on my machine. You may want to give the modified code a try.
I am seeing a similar severe performance issue with a large file. My file is relatively simple: a few levels deep there is a very large number of self-closing child nodes with a few attributes, and I'm selecting them quite directly. Both of these queries:
seemingly hang the process. I don't know whether it's stuck in an endless busy loop or will eventually exit, but as I write this the process has been running for 12 minutes on a fairly fast machine and hasn't finished. Meanwhile,
I'm also experiencing some serious performance issues. My application used to take 30 seconds to load XML files on startup, and now that the source files have grown, it's taking about 30 minutes. Almost all of that time is spent in XPath queries. My XML files are relatively flat, and my two largest files are 6 MB and 32 MB. I could be doing something wrong, but I've been seeing worrying and inconsistent performance in my benchmarks. I tested with the flatter 6 MB file, and
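Inconsistent timings like the ones described above can be a measurement problem as much as a library problem. As a generic aid (not part of this library, and with hypothetical names like `benchmark`), here is a minimal Node sketch that warms up before timing and reports a median, which tends to be more stable than a single run:

```javascript
// Hypothetical micro-benchmark helper: warm up first to reduce JIT-related
// variance, then report the median of several timed runs in milliseconds.
function benchmark(fn, { warmup = 3, runs = 7 } = {}) {
  for (let i = 0; i < warmup; i++) fn();
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    fn();
    times.push(Number(process.hrtime.bigint() - start) / 1e6); // ns -> ms
  }
  times.sort((x, y) => x - y);
  return times[Math.floor(times.length / 2)]; // median
}

// Stand-in workload; in practice you would wrap your actual xpath select here.
const medianMs = benchmark(() => {
  let s = 0;
  for (let i = 0; i < 1e6; i++) s += i;
  return s;
});
console.log(typeof medianMs === "number"); // true
```

Comparing medians before and after a change (or between library versions) gives a fairer picture than one-off wall-clock timings.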
We also have an app that is experiencing performance issues with XPath selects using this library. We're dealing with ~10-60 MB files with a fairly complex node structure. The most complicated files we process (~25 MB, with tens of thousands of repeated nodes) are not the largest we handle, but they take the longest; over repeated runs, our app takes ~800 seconds on average to process an 18 MB file using the current version of this library. We are currently using a modified version of this library that incorporates the changes in the unmerged PR #107 (PR #107 has merge conflicts, but the same change has been redone as PR #120, which has none), and using that fork reduces the processing time to 200-250 seconds. So PR #120 gives a ~75% performance improvement in at least one real-world use case. For us, 200-250 seconds is still far too long, and we're hitting timeout issues, so we're considering our options. Are there any updates on whether the unmerged but mergeable performance fix that drops `unshift` will be merged? It would not be ideal to modify even further one of the forks of this library that already incorporates the performance gain from dropping `unshift`.
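For context on why dropping `unshift` helps so much: `Array.prototype.unshift` moves every existing element on each call, so accumulating n results that way costs O(n²), while `push` followed by a single `reverse` is O(n). This is a minimal standalone sketch of that difference, not the library's actual code:

```javascript
// Accumulate 0..n-1 in reverse order via unshift: every call shifts all
// existing elements, so the total work is quadratic in n.
function buildWithUnshift(n) {
  const out = [];
  for (let i = 0; i < n; i++) {
    out.unshift(i); // O(i) element moves on iteration i
  }
  return out;
}

// Same result via push + one reverse: amortized O(1) per append,
// plus a single O(n) pass at the end.
function buildWithPush(n) {
  const out = [];
  for (let i = 0; i < n; i++) {
    out.push(i);
  }
  return out.reverse();
}

const n = 20000;
const a = buildWithUnshift(n);
const b = buildWithPush(n);
console.log(a.length === b.length && a[0] === b[0] && a[n - 1] === b[n - 1]); // true
```

With tens of thousands of matched nodes, the quadratic variant easily dominates the runtime, which is consistent with the large speedups reported above.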
@nick-hunter @simon-20 Sorry to hear that you are both experiencing performance issues. I can try to get the `unshift` change merged and published in the next week or so. One question: what are you using for your XML DOM? If it's @xmldom/xmldom, please note that a change has been made to that package that should offer significant performance benefits when querying it from this package, but it looks like those changes are still only in a beta release. So if you are using @xmldom/xmldom, I would suggest trying the latest beta version of that package to see if it makes a difference.
@JLRishe thanks for the info! I am using
@nick-hunter Thank you for checking on that. I guess I had assumed that the newly added implementation of
Thanks @JLRishe, we are using
XPath versions: 23, 24, 27.
Selecting from a large file (~70,000 records) takes about 3 minutes, and the thread is blocked the whole time.
bigxmlfile.xml.zip
Most of the time is spent in the loop in this function: