Windows Build Number
Win32NT 10.0.22000.0 Microsoft Windows NT 10.0.22000.0
Processor Architecture
AMD64
Memory
64 GB
Storage Type, free / capacity
SSD 80/512 GB
Relevant apps installed
Traces collected via Feedback Hub
We collected profile traces with both Visual Studio and VTune, we can provide these via a private channel, if needed.
Issue description
Recently, my employer started measuring the multithreaded performance of a commercial application. We were interested both in raw performance numbers and in scalability with respect to CPU core count. We were surprised to see that some operations scale terribly: their durations actually increase with the core count instead of decreasing, as one would usually expect. Profiling revealed that in most of the problematic cases the biggest bottleneck was the NT heap and its poor scalability. We also measured with other heaps (Intel's TBB and the Segment Heap, to name a few), and none of them exhibited the same behavior.
Here's a chart plotting some of our measurements; lower is better (Y-axis: time to perform a given operation, in seconds; X-axis: number of CPUs):
Here's a second chart comparing the NT heap and the Segment Heap; a value below 100% means that the Segment Heap performed better (Y-axis: Segment Heap time relative to NT heap time, as a percentage; X-axis: number of CPUs):
I'm aware that this is a bit vague; I can provide the whole dataset through a private channel if required.
We opened a PSfD support case (we can provide the case number if needed), as we believed we might be hitting a pathological path in the NT heap implementation that should be fixed on Microsoft's side. We were basically told that:
the "classic" NT heap is a legacy heap, and won't be further improved
we should migrate to the Segment Heap, as "that's the future"
That's all well and good; we wouldn't mind switching to the Segment Heap per se. However, there are many cases where the Segment Heap performs worse than the "classic" one. The chart below includes every data point of every measurement we did, shown as relative performance; a value above 100% means that the Segment Heap performed worse:
Trading away some performance (about 10% on average) in many cases for better scalability in others does not seem like a very good deal.
Is this expected? We would prefer to stay on a heap that's part of the operating system (either the "classic" NT heap or the Segment Heap), but trade-offs like these make that hard to justify.
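For reference, the relative metric used in these charts is Segment Heap time divided by NT heap time, times 100%, so 110% means the Segment Heap took 10% longer on that measurement. The report does not say whether "about 10% on average" is an arithmetic or a geometric mean, so the minimal sketch below computes both; the function names are hypothetical helpers, not part of any tool used in the measurements.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative time as a percentage: 100 * T_segment / T_nt.
// Below 100% the Segment Heap was faster; above 100% it was slower.
double relative_percent(double segment_seconds, double nt_seconds) {
    assert(segment_seconds > 0.0 && nt_seconds > 0.0);
    return 100.0 * segment_seconds / nt_seconds;
}

// Plain arithmetic mean of per-measurement percentages.
double arithmetic_mean_percent(const std::vector<double>& percents) {
    assert(!percents.empty());
    double sum = 0.0;
    for (double p : percents) sum += p;
    return sum / static_cast<double>(percents.size());
}

// Geometric mean of the underlying ratios, reported as a percentage.
// A 2x slowdown (200%) and a 2x speedup (50%) cancel out to 100% here,
// whereas the arithmetic mean of the same two values is 125%.
double geometric_mean_percent(const std::vector<double>& percents) {
    assert(!percents.empty());
    double log_sum = 0.0;
    for (double p : percents) log_sum += std::log(p / 100.0);
    return 100.0 * std::exp(log_sum / static_cast<double>(percents.size()));
}
```

With ratio data the geometric mean is usually the safer summary, because the arithmetic mean over-weights slowdowns relative to equally large speedups.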
Steps to reproduce
No easy repro (the phenomenon in question was reproduced in a commercial application that requires a license and some setup/installation steps).
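Since the application itself can't be shared, a synthetic allocation microbenchmark might be a starting point for probing the same scaling shape. The sketch below is hypothetical and is not the workload from this report, and it is untested whether it triggers the same NT heap behavior. It splits a fixed number of malloc/free rounds across N threads (strong scaling, like the X-axis of the charts), so ideal wall-clock time falls as threads are added, while an allocator that serializes on shared state would show flat or rising times. As far as we know, the MSVC Universal CRT's malloc is backed by the process heap, so running the same binary with and without the Segment Heap enabled for the process would compare the two heaps.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical synthetic workload (not the application from this report).
// Splits `total_rounds` across `threads` workers; each round allocates
// 64 blocks of varying size and then frees them. Returns wall-clock seconds.
double run_alloc_benchmark(unsigned threads, std::size_t total_rounds) {
    assert(threads > 0);
    const std::size_t rounds_per_thread = total_rounds / threads;

    auto worker = [rounds_per_thread](unsigned seed) {
        std::vector<void*> blocks(64);
        std::size_t size = 16 + seed;
        for (std::size_t r = 0; r < rounds_per_thread; ++r) {
            for (void*& p : blocks) {
                // Cheap deterministic size sequence in [16, 1039] bytes.
                size = (size * 33 + 7) % 1024 + 16;
                p = std::malloc(size);
                // Touch the block so the compiler cannot elide the allocation.
                if (p != nullptr) static_cast<volatile char*>(p)[0] = 1;
            }
            for (void* p : blocks) std::free(p);  // free(nullptr) is a no-op
        }
    };

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    pool.reserve(threads);
    for (unsigned t = 0; t < threads; ++t) pool.emplace_back(worker, t);
    for (std::thread& th : pool) th.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Prints timings for 1, 2, 4, ... threads up to the hardware thread count.
void sweep_thread_counts(std::size_t total_rounds) {
    unsigned max_threads = std::thread::hardware_concurrency();
    if (max_threads == 0) max_threads = 1;  // the value may be unknown
    for (unsigned n = 1; n <= max_threads; n *= 2) {
        std::printf("%2u threads: %.3f s\n", n,
                    run_alloc_benchmark(n, total_rounds));
    }
}
```

Calling `sweep_thread_counts(200000)` once per heap configuration yields one time-versus-thread-count series per heap, the same shape as the first chart. Real allocation patterns (sizes, lifetimes, cross-thread frees) matter a lot for heap behavior, so a flat result here would not rule out the issue.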
Expected Behavior
The NT heap scales acceptably, or the Segment Heap performs at least as well as the "classic" NT heap in every case.
Actual Behavior
The NT heap scales horrendously in some cases. The Segment Heap scales well but performs worse in many cases.
Hey! Thanks for reporting and giving such detailed descriptions of the issue🙂. I'm working on routing this issue to the right team and will report back soon.