Plugin consumes even more CPU when idle #3
Comments
This is a Linux Here are some screenshots: |
Hey @AnClark, thank you for the detailed report. This will take some dedicated time to figure out. I haven't seen this issue before and I can't immediately think of what could cause it. It might be related to the framework we use. Have you used any other DPF-based plugins that show a similar load increase when transport is stopped? |
Hmm, so from your perf inspection it seems that the biquad filters take a lot of time on your machine. I'm quickly trying this with the v1.0 release in REAPER on my AMD Ryzen 5 (quite a bit more performant than your ancient C2D), and I don't see any such discrepancy. Occasionally I see a tiny "jump" to 0.03% when stopping, but it quickly goes down to 0.02% again. I am considering enabling SSE4.1 for all plugins this year, which should give a near 4x performance increase. This instruction set is supported on the C2D at least. Maybe we can do some preliminary tests to see if this improves things a bit for you. |
Btw it seems your perf.data file is incompatible with my system, so I cannot read the output myself. I'm guessing those visual stats are also an extra feature of that version, which I don't seem to have. |
Here's another way you can reproduce the issue:
WSTD EQ still consumes more CPU after I stop playback. It's strange that the biquad filter works perfectly during processing. Only when I stop the transport does the filter begin to consume CPU. Is it possible that some inappropriate samples were processed by the DSP, which made it misbehave? |
Yes. I'm porting some LV2 plugins to DPF. Both of the following plugins used to have a similar issue: Both of them have a Moog-style filter. If I stop the transport, the filter increases the CPU load tremendously. I haven't figured out why it happens yet, so I just made a workaround: bypass the filters if the oscillators do not send samples to them. |
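For illustration, the bypass workaround described above could look roughly like the sketch below. The `OnePole` filter and the `blockIsSilent` helper are hypothetical stand-ins invented for this example, not code from either plugin, and the coefficient value is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when every sample in the block is exactly zero,
// i.e. the oscillators produced nothing for this block.
static bool blockIsSilent(const float* buffer, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (buffer[i] != 0.0f) return false;
    }
    return true;
}

// Hypothetical one-pole low-pass standing in for the Moog-style filter.
struct OnePole {
    float coeff = 0.99f;  // feedback coefficient (assumed value)
    float state = 0.0f;   // previous output sample

    void process(float* buffer, std::size_t count) {
        assert(buffer != nullptr || count == 0);
        if (blockIsSilent(buffer, count)) {
            // Workaround: skip the filter and discard its decaying tail.
            state = 0.0f;
            return;
        }
        for (std::size_t i = 0; i < count; ++i) {
            state = (1.0f - coeff) * buffer[i] + coeff * state;
            buffer[i] = state;
        }
    }
};
```

The trade-off of this kind of bypass is that the filter tail is cut off abruptly as soon as a fully silent block arrives, which may or may not be audible depending on the filter.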
I tried following these instructions. I have a midi section, |
I’ve checked. It seems that newer CPUs like your Ryzen 5 enable solutions optimized with AVX or SSE4.1, while my ancient C2D only supports SSE and SSE2, so it falls back to this simple solution:
const float y = bIn*bX0 + o->xm1*bX1 + o->xm2*bX2 - o->ym1*bY1 - o->ym2*bY2;
o->xm2 = o->xm1; o->xm1 = bIn;
o->ym2 = o->ym1; o->ym1 = y;
*bOut = y;
However, it's still strange: this solution performs quite well while the transport is running, but CPU load increases when the transport stops.
Full `__hv_biquad_f()` code in WSTD EQ:
#if _WIN32 && !_WIN64
void __hv_biquad_f_win32(SignalBiquad *o, hv_bInf_t *_bIn, hv_bInf_t *_bX0, hv_bInf_t *_bX1, hv_bInf_t *_bX2, hv_bInf_t *_bY1, hv_bInf_t *_bY2, hv_bOutf_t bOut) {
hv_bInf_t bIn = *_bIn;
hv_bInf_t bX0 = *_bX0;
hv_bInf_t bX1 = *_bX1;
hv_bInf_t bX2 = *_bX2;
hv_bInf_t bY1 = *_bY1;
hv_bInf_t bY2 = *_bY2;
#else
void __hv_biquad_f(SignalBiquad *o, hv_bInf_t bIn, hv_bInf_t bX0, hv_bInf_t bX1, hv_bInf_t bX2, hv_bInf_t bY1, hv_bInf_t bY2, hv_bOutf_t bOut) {
#endif
#if HV_SIMD_AVX
__m256 x = _mm256_permute_ps(bIn, _MM_SHUFFLE(2,1,0,3)); // [3 0 1 2 7 4 5 6]
__m256 y = _mm256_permute_ps(o->x, _MM_SHUFFLE(2,1,0,3)); // [d a b c h e f g]
__m256 n = _mm256_permute2f128_ps(y,x,0x21); // [h e f g 3 0 1 2]
__m256 xm1 = _mm256_blend_ps(x, n, 0x11); // [h 0 1 2 3 4 5 6]
x = _mm256_permute_ps(bIn, _MM_SHUFFLE(1,0,3,2)); // [2 3 0 1 6 7 4 5]
y = _mm256_permute_ps(o->x, _MM_SHUFFLE(1,0,3,2)); // [c d a b g h e f]
n = _mm256_permute2f128_ps(y,x,0x21); // [g h e f 2 3 0 1]
__m256 xm2 = _mm256_blend_ps(x, n, 0x33); // [g h 0 1 2 3 4 5]
__m256 a = _mm256_mul_ps(bIn, bX0);
__m256 b = _mm256_mul_ps(xm1, bX1);
__m256 c = _mm256_mul_ps(xm2, bX2);
__m256 d = _mm256_add_ps(a, b);
__m256 e = _mm256_add_ps(c, d); // bIn*bX0 + o->x1*bX1 + o->x2*bX2
float y0 = e[0] - o->ym1*bY1[0] - o->ym2*bY2[0];
float y1 = e[1] - y0*bY1[1] - o->ym1*bY2[1];
float y2 = e[2] - y1*bY1[2] - y0*bY2[2];
float y3 = e[3] - y2*bY1[3] - y1*bY2[3];
float y4 = e[4] - y3*bY1[4] - y2*bY2[4];
float y5 = e[5] - y4*bY1[5] - y3*bY2[5];
float y6 = e[6] - y5*bY1[6] - y4*bY2[6];
float y7 = e[7] - y6*bY1[7] - y5*bY2[7];
o->x = bIn;
o->ym1 = y7;
o->ym2 = y6;
*bOut = _mm256_set_ps(y7, y6, y5, y4, y3, y2, y1, y0);
#elif HV_SIMD_SSE
__m128 n = _mm_blend_ps(o->x, bIn, 0x7); // [a b c d] [e f g h] = [e f g d]
__m128 xm1 = _mm_shuffle_ps(n, n, _MM_SHUFFLE(2,1,0,3)); // [d e f g]
__m128 xm2 = _mm_shuffle_ps(o->x, bIn, _MM_SHUFFLE(1,0,3,2)); // [c d e f]
__m128 a = _mm_mul_ps(bIn, bX0);
__m128 b = _mm_mul_ps(xm1, bX1);
__m128 c = _mm_mul_ps(xm2, bX2);
__m128 d = _mm_add_ps(a, b);
__m128 e = _mm_add_ps(c, d);
const float *const bbe = (float *) &e;
const float *const bbY1 = (float *) &bY1;
const float *const bbY2 = (float *) &bY2;
float y0 = bbe[0] - o->ym1*bbY1[0] - o->ym2*bbY2[0];
float y1 = bbe[1] - y0*bbY1[1] - o->ym1*bbY2[1];
float y2 = bbe[2] - y1*bbY1[2] - y0*bbY2[2];
float y3 = bbe[3] - y2*bbY1[3] - y1*bbY2[3];
o->x = bIn;
o->ym1 = y3;
o->ym2 = y2;
*bOut = _mm_set_ps(y3, y2, y1, y0);
#elif HV_SIMD_NEON
float32x4_t xm1 = vextq_f32(o->x, bIn, 3);
float32x4_t xm2 = vextq_f32(o->x, bIn, 2);
float32x4_t a = vmulq_f32(bIn, bX0);
float32x4_t b = vmulq_f32(xm1, bX1);
float32x4_t c = vmulq_f32(xm2, bX2);
float32x4_t d = vaddq_f32(a, b);
float32x4_t e = vaddq_f32(c, d);
float y0 = e[0] - o->ym1*bY1[0] - o->ym2*bY2[0];
float y1 = e[1] - y0*bY1[1] - o->ym1*bY2[1];
float y2 = e[2] - y1*bY1[2] - y0*bY2[2];
float y3 = e[3] - y2*bY1[3] - y1*bY2[3];
o->x = bIn;
o->ym1 = y3;
o->ym2 = y2;
*bOut = (float32x4_t) {y0, y1, y2, y3};
#else
const float y = bIn*bX0 + o->xm1*bX1 + o->xm2*bX2 - o->ym1*bY1 - o->ym2*bY2;
o->xm2 = o->xm1; o->xm1 = bIn;
o->ym2 = o->ym1; o->ym1 = y;
*bOut = y;
#endif
} |
As I said we do not build with SIMD optimizations yet (only on ARM). Your CPU should support SSE4.1 which I might enable later this year. You could try this optimization by adding |
I have a newer ThinkPad X201 Tablet. It has a Core 1st Gen processor (Core i7 L 640). I enabled Sounds like we have something to do with the algorithm. |
For reference, here's a Moog-style filter from RaffoSynth, which has the same problem I described:
// does the same as the asm version
void equalizer(float* buffer, float* prev_vals, uint32_t sample_count, float psuma0, float psuma2, float psuma3, float ssuma0, float ssuma1, float ssuma2, float ssuma3, float factorSuma2){
float psuma1 = psuma0 *2;
for (int i = 0; i < sample_count; i++) {
//low-pass filter
float temp = buffer[i];
buffer[i] *= psuma0; //psuma0 == factorsuma1
buffer[i] += psuma0 * prev_vals[0] + psuma1 * prev_vals[1]
+ psuma2 * prev_vals[2] + psuma3* prev_vals[3];
prev_vals[0] = prev_vals[1];
prev_vals[1] = temp;
// peaking EQ (resonance)
float temp2 = buffer[i];
buffer[i] *= factorSuma2;
buffer[i] += ssuma0 * prev_vals[2] + ssuma1 * prev_vals[3]
+ ssuma2 * prev_vals[4] + ssuma3 * prev_vals[5];
prev_vals[2] = prev_vals[3];
prev_vals[3] = temp;
prev_vals[4] = prev_vals[5];
prev_vals[5] = buffer[i];
}
} |
I got a hint from FalkTX on what could be going on. Add the following include at the top of the file:
#include "extra/ScopedDenormalDisable.hpp"
And in the run function, set the following:
const ScopedDenormalDisable sdd;
const TimePosition& timePos(getTimePosition());
Rebuild and try again. |
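To make the idea behind a scoped denormal guard more concrete, here is a minimal sketch of how such an RAII object can work on x86-64. It is not the DPF implementation (the real one lives in `ScopedDenormalDisable.hpp`); the class name `ScopedFlushToZero`, the constant `kSketchHasMxcsr`, and the helper `halfOfSmallestNormal` are hypothetical names made up for this example, and on other architectures the guard is simply a no-op.

```cpp
#include <cassert>
#include <cfloat>

#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr
constexpr bool kSketchHasMxcsr = true;
#else
constexpr bool kSketchHasMxcsr = false;
#endif

// Hypothetical RAII guard: enables flush-to-zero for the current thread while
// it is alive and restores the previous floating-point control state afterwards.
class ScopedFlushToZero {
public:
    ScopedFlushToZero() {
#if defined(__x86_64__) || defined(_M_X64)
        saved_ = _mm_getcsr();
        // MXCSR bit 15 = FTZ (subnormal results become zero),
        // MXCSR bit 6  = DAZ (subnormal inputs are treated as zero).
        _mm_setcsr(saved_ | 0x8000u | 0x0040u);
#endif
    }
    ~ScopedFlushToZero() {
#if defined(__x86_64__) || defined(_M_X64)
        _mm_setcsr(saved_);
#endif
    }
    ScopedFlushToZero(const ScopedFlushToZero&) = delete;
    ScopedFlushToZero& operator=(const ScopedFlushToZero&) = delete;

private:
    unsigned int saved_ = 0;
};

// Halves the smallest normal float at run time; the exact result is subnormal.
// The volatile variables keep the compiler from folding or moving the multiply.
static float halfOfSmallestNormal() {
    volatile float tiny = FLT_MIN;
    volatile float half = 0.5f;
    volatile float result = tiny * half;
    assert(result >= 0.0f);
    return result;
}
```

Declaring the guard as the first statement of the audio callback, as in the two lines above, means the mode is active for the whole processing block and is restored automatically when the function returns, so the host's own floating-point settings are left untouched.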
@dromer OK. I'll try it tonight (BJT) and report back. |
@AnClark you can try this build when it finishes: https://github.com/Wasted-Audio/wstd-eq/actions/runs/7431093334 |
Great! By adding those lines, and build with |
I've also tested your build. It performs better than mine: CPU usage does not go beyond 0.5% when idle. So disabling denormal numbers really works. |
Cool! Thank you for confirming. I guess on older systems such as yours this really makes a difference. Now comes the question of how best to apply this, as setting this option can potentially break things as well... |
My pleasure! It would be better if there were some documentation for Also, you could do more tests on other platforms, including Apple Silicon. None of my machines is newer than a 5th-gen Core i5. |
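A small experiment can show why silence is the bad case. The sketch below reuses the arithmetic of the scalar fallback in `__hv_biquad_f()` on plain floats, feeds it one impulse followed by silence, and checks what is left in the feedback state. The coefficients (a single pole at 0.99) and the names `ScalarBiquad` and `tailAfterSilence` are assumptions chosen for illustration, not values from WSTD EQ. With default floating-point settings the tail does not decay to zero: rounding keeps it parked at a tiny subnormal value, so every later silent sample still does subnormal arithmetic, which is slow on many CPUs. The manual flush in the loop is a portable substitute for the hardware flush-to-zero mode, used here only so the effect can be compared.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Same arithmetic as the scalar fallback of __hv_biquad_f(), on plain floats.
struct ScalarBiquad {
    float xm1 = 0.0f, xm2 = 0.0f, ym1 = 0.0f, ym2 = 0.0f;

    float tick(float in, float bX0, float bX1, float bX2, float bY1, float bY2) {
        const float y = in*bX0 + xm1*bX1 + xm2*bX2 - ym1*bY1 - ym2*bY2;
        xm2 = xm1; xm1 = in;
        ym2 = ym1; ym1 = y;
        return y;
    }
};

// Feeds one unit impulse followed by `silentSamples` zeros and returns the
// last output. Coefficients are assumed: bX0 = 1, a single pole at 0.99.
static float tailAfterSilence(int silentSamples, bool flushManually) {
    assert(silentSamples >= 0);
    ScalarBiquad f;
    float y = f.tick(1.0f, 1.0f, 0.0f, 0.0f, -0.99f, 0.0f);
    for (int i = 0; i < silentSamples; ++i) {
        y = f.tick(0.0f, 1.0f, 0.0f, 0.0f, -0.99f, 0.0f);
        if (flushManually && std::fabs(f.ym1) < FLT_MIN) {
            f.ym1 = 0.0f;  // manual flush of the feedback state
            y = 0.0f;
        }
    }
    return y;
}
```

Calling `tailAfterSilence(50000, false)` and classifying the result with `std::fpclassify` should report `FP_SUBNORMAL` on a build without fast-math flags, while `tailAfterSilence(50000, true)` returns exactly zero.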
I do not own any Windows or MacOS machines, so doing "proper" testing on those is not possible. |
Btw the only documentation for this class is in the code: https://github.com/DISTRHO/DPF/blob/main/distrho/extra/ScopedDenormalDisable.hpp |
I've found a solution: add a new entry in the HVCC JSON metadata (e.g. What's more, we could also provide two builds of WSTD EQ starting with the next release: one applies this fix, and the other stays as-is. |
I don't see any reason to provide two completely separate builds of the same plugin; that doesn't make any sense. Having it as a configurable option in the JSON is a nice idea, so it won't be put there automatically for all DPF builds. |
Maybe I can help test on Windows (as well as Wine). I have a Hewlett-Packard Pavilion with Windows 11 and Msys2 installed (though it uses an i7-5500U). What's more, if WSTD and HVCC had unit tests (or benchmarks), it would also help a lot. |
HVCC does have some testing in place (although not everything works), but that's a discussion for a different project :) |
So how could we do tests? Maybe we can make a roadmap for testing plugins (maybe not limited to WSTD EQ). For example, specify test cases and target DAWs. |
Hi Wasted Audio Team,
I've encountered a strange issue when using WSTD EQ in REAPER for Linux. While the plugin is processing audio, CPU usage is below 1.0% on average. However, when I click "Stop" in REAPER, CPU usage jumps to 7.0%.
See the following screenshots:
My system environment: