
fluid.transientslice~: audio thread crashes at higher order #270

Open
weefuzzy opened this issue Jul 3, 2024 · 13 comments · May be fixed by #290
Labels
bug Something isn't working

Comments

@weefuzzy (Member) commented Jul 3, 2024

Please tell us what you were doing! You can include code and files by drag and dropping them into the text area.

Report via a user: sporadic crashes with transient slice; the stack trace indicated that these were in the detection function.

Can reproduce with @order 196 simply by turning audio on in the following patch.


----------begin_max5_patcher----------
441.3ocsT1rSDCBDG+b6SAgil5lB8y0S9dXLFjhJltPCP0UMtO6VFZ0UypV0
5EZloCy+4Gy.OEGguTuUXwnSPmghhdJNJBb4cDMZGg2v1xaYVHLr3wFFeGNI
7KU+FopU3f+QGc1wb7ajpquvH3tPtyKnqRSPzpZ+m7hoUz4ukHcuaJSoidkM
fl5Ku83Zr20yww9kjYVrJw8C6cpXchsP4f800NT5phChQ1miAcc8phDDoL2C
PQIvD8SwfL5M3x8PmHjFrUdsh0heca6wY0Bx4Q+39DsFZPTRncs1S6BCX4uA
PKW2I90zrFngjk95m4O2ksf8C9C7VwNDIM8GSRUXvKqxiP4+wfGcAA8p1dYy
JmgorRgxYak7AtOUaZDFDYc4Awm78Mxb.eR3Ujufe52veBBevy.x9mAP.3Vo
5iOPBEn2+6OXr5dCeRnP1RPuUgMBqSpXNoVsWLYgXN3Q+b0obF5PV.cplgN9
Zg7G0gNScdOOg1Eqq6NgwNFLHwvD6sZi2rNALkpfYAXZD2ImhGlavLyv7maX
3q2.kEdaYNNrU8v.rpWBiqwd3FjDtMnXaD1NVfC3RS7ywuvVYaKm
-----------end_max5_patcher-----------

Need to narrow down why. Possibilities include, in order of likelihood:

  • buffer overrun because we've messed up the allocation needed to account for blocksize vs order vs padding
  • screw up in the autocorrelation fn
  • badness due to Eigen's solver deciding to allocate behind the scenes

What was the expected result?

Not crashing

What was the actual result?

crashing

What operating system were you using?

Mac

Operating system version

13.5.2

FluCoMa Version

1.0.6

@weefuzzy weefuzzy added the bug Something isn't working label Jul 3, 2024
@weefuzzy (Member, Author) commented Jul 3, 2024

Pretty sure it's a blocksize vs padding vs order issue; specifically, the object bombs if order > blocksize - padding.

@tremblap (Member):

I'll investigate. For info, this crashes too:

{FluidTransientSlice.ar(SinOsc.ar(100,mul: LFPulse.ar(0.5)), order: 196)}.play

@tremblap (Member):

OK, it is the resize of the mFFT in the autocorrelation (ARModel line 308) that throws an assert. At line 42 the FFT's mMaxSize is allocated at twice the blocksize, but I never see where we call it!
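The failure mode being described can be sketched as follows; the class and member names here are hypothetical simplifications for illustration, not FluCoMa's actual FFT implementation:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of an FFT object that is allocated once to a maximum size and
// must never be resized past it on the audio thread (hypothetical names).
class FFTSketch {
public:
  explicit FFTSketch(std::size_t maxSize) : mMaxSize(maxSize), mSize(maxSize) {}

  void resize(std::size_t newSize) {
    // This is the kind of check that fires: resizing beyond the size
    // allocated at construction is not allowed once audio is running
    assert(newSize <= mMaxSize && "FFT resized beyond allocated maximum");
    mSize = newSize;
  }

  std::size_t size() const { return mSize; }
  std::size_t maxSize() const { return mMaxSize; }

private:
  std::size_t mMaxSize; // fixed at construction
  std::size_t mSize;    // current logical size
};
```

Passing an input longer than the construction-time maximum down to such an object is what trips the assert in a debug build, and corrupts memory in a release build.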

@tremblap (Member) commented Dec 16, 2024

Also, I don't know if it is related, but TransientSlice does not give the same results between platforms, and it crashes with the unit tests on Linux with simple settings that do not crash on Windows and macOS. This code crashes:

b = Buffer.read(s, FluidFilesPath("Nicol-LoopE-M.wav"))
c = Buffer(s)
FluidBufTransientSlice.process(s,b,indices: c)

@tremblap (Member):

OK, we get the same assert with:

{FluidTransients.ar(SinOsc.ar(100,mul: LFPulse.ar(0.5)), order: 196)}.play

@AlexHarker (Contributor):

Please be specific about the assert that fires, permalinking to the relevant line of code.

@AlexHarker (Contributor):

I would suggest as a first step that ARModel should include an appropriate assert to check that the estimate methods are not called with an input that exceeds the largest expected size (which is currently not stored directly, so would either need to be stored or derived). This would help to isolate issues in code calling the ARModel from those in the model itself. If it turns out that ARModel is being called with incorrectly-sized input the task will then be to figure out why that is happening.
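A minimal sketch of the guard being suggested, assuming a hypothetical interface (FluCoMa's actual ARModel class differs):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ARModel: the largest expected input size is
// stored at construction so estimate() can check its input against it,
// isolating caller-side sizing bugs from bugs inside the model itself.
class ARModelSketch {
public:
  explicit ARModelSketch(std::size_t maxInputSize)
      : mMaxInputSize(maxInputSize) {}

  void estimate(const double* input, std::size_t size) {
    // The suggested assert: fail loudly if the caller passes more
    // samples than this instance was allocated for
    assert(size <= mMaxInputSize && "estimate called with oversized input");
    (void)input; // autocorrelation / solver would run here
  }

  std::size_t maxInputSize() const { return mMaxInputSize; }

private:
  std::size_t mMaxInputSize; // largest input this instance can process
};
```

If the assert fires, the caller is at fault; if the crash occurs without it firing, the model itself is the place to look.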

@tremblap (Member):

Or you could just run the test code posted above in debug 😄. In all cases, the assert thrown is when resizing the FFT, as described above:

OK, it is the resize of the mFFT in the autocorrelation (ARModel line 308) that throws an assert. At line 42 the FFT's mMaxSize is allocated at twice the blocksize, but I never see where we call it!

That code calls resize on the mFFT with in.size, which is larger than what has been allocated (the assert itself is at line 86 of FFT.hpp, but that is not very helpful!). I assumed the algorithm was right (it took a long time to find the actual value calculation) and corrected the allocation so we have enough RAM.

@AlexHarker (Contributor):

The assert() on estimate should be added as it guards against other cases of misuse (there are a lot of asserts for other uses of ARModel), and then we trace back from there (or, if it doesn't fire for this crash, then we know that ARModel is likely the place to look). The assert in the FFT is far from "not very helpful", as it is the point of failure: that's where we would trace from. I'm not set up to easily test, so my starting point for a review will always be reading the code. At the moment, my feeling is that the issue is not yet fully understood, and that should be the aim before merging any fix (regardless of whether the fix stops a crash).

Merging a change that stops a crash (assert or otherwise), but isn't fully understood risks making it harder to implement the correct fix in future, and my current reading of the code is that the assumptions of the transient extractor code in terms of sizing appear to be consistent.

@weefuzzy (Member, Author) commented Dec 18, 2024

This confusion is down to me, plus ça change. When I retrofitted the runtime allocation to all the algorithms, I don't think I paid the complexities of the ARModel and Transient<X> classes quite enough heed, and I certainly didn't do due diligence on the clients.

So, I think what's biting us is that the slight shift in idiom towards all algorithms being instantiated with inviolable maximum storage requirements isn't yet observed by the respective clients, but also I should have instrumented ARModel and TransientExtraction with more consistency asserts where it still calls resize on sub-objects.

The concrete problem in this issue, then, is actually that the clients neither actually pass in proper maximal values nor consider the invariants of TransientExtraction when the objects are created (namely that the blockSize and padSize must be >= model order). For this particular case: the actual padding size being used ends up being 196 (the model order), because that's bigger than the default of 128. Hence kaboom.

So, no change to ARModel is needed: if kosher numbers are passed in the first place, then all is well. I'll do a PR to fix up the clients, but also enforce the invariants in TransientExtraction.
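The sizing arithmetic from the comment above can be sketched as follows. The effective-pad rule and the default of 128 come from the comment; the function names themselves are hypothetical, not FluCoMa's API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// The padding actually used is the larger of the requested pad and the
// model order; with order 196 and the default pad of 128, this becomes
// 196, so storage sized from the default is too small. Hence kaboom.
std::size_t effectivePadSize(std::size_t order, std::size_t requestedPad = 128) {
  return std::max(order, requestedPad);
}

// The invariant to enforce when the objects are created: blockSize and
// padSize must both be >= the model order.
bool sizesAreValid(std::size_t blockSize, std::size_t padSize,
                   std::size_t order) {
  return blockSize >= order && padSize >= order;
}
```

With kosher numbers checked up front like this, the downstream resize calls never exceed what was allocated.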

@AlexHarker (Contributor):

Thanks @weefuzzy - I think the assert I suggested above would be useful for anyone using ARModel directly, and similar asserts should maybe be used for the issues you describe. From what you've put here, I am guessing that the fix PA proposes happens to fix it in certain scenarios but isn't actually the correct thing to do? In that case, the merge shouldn't be made, in my opinion.

If I can help by testing further I will as I have just set up to compile the flucoma-sc repo if needed, but if all is clear I'm happy to await better fixes.

@weefuzzy (Member, Author):

Yeah, an assert there will be useful too. And, yes, we don't need the change in #288.

I'll put something together once I've un-broken my own build system, which seems to have gone bonkers.

@weefuzzy (Member, Author):

Lol. I did just realise that I dropped the ball in ARModel in quite a spectacular way, though, because the code largely assumes that the size of the mParameters buffer is always the same as the current model order, which is no longer the case 🤦

Because I'd also been lazy with updating the client, this probably wouldn't yet have been experienced in the wild unless someone was adjusting the model order a lot on a single instance.
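The stale assumption described here can be illustrated with a sketch (hypothetical names): once the parameter buffer is allocated to a maximum at construction, the current order has to be tracked separately rather than read back from the buffer's size:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ARParamsSketch {
public:
  explicit ARParamsSketch(std::size_t maxOrder)
      : mOrder(maxOrder), mParameters(maxOrder, 0.0) {}

  void setOrder(std::size_t order) {
    assert(order <= mParameters.size() && "order exceeds allocated maximum");
    mOrder = order; // storage stays at maxOrder; only the logical size changes
  }

  // Wrong (the old assumption): using mParameters.size() as the order,
  // which is now the maximum, not the current value.
  // Right: the explicitly tracked current order.
  std::size_t order() const { return mOrder; }
  std::size_t allocatedOrder() const { return mParameters.size(); }

private:
  std::size_t mOrder;               // current model order
  std::vector<double> mParameters;  // sized once, at construction
};
```

This is exactly the situation that only surfaces if someone changes the order repeatedly on a single instance, as noted above.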

@weefuzzy weefuzzy linked a pull request Dec 18, 2024 that will close this issue