Instacrash with fluid.mlpclassifier when trying to fit something #364

Open
rconstanzo opened this issue Apr 22, 2023 · 19 comments

@rconstanzo

As mentioned on the Discourse thread, I got an (unrepeated) crash when trying to fit some data with fluid.mlpclassifier.

I'm attaching the isolated bit of the patch in question, along with the data/labels I was using at the time, plus the crash report.

[screenshot attachment: Screenshot 2023-04-22 at 2 50 07 PM]

This is, I believe, the crash-y bit:

12  fluid.libmanipulation         	       0x1320628dc Eigen::DenseStorage<double, -1, -1, -1, 1>::resize(long, long, long) + 80
13  fluid.libmanipulation         	       0x1322763d8 Eigen::internal::product_evaluator<Eigen::Product<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> const>, Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> >, 0>, 8, Eigen::DenseShape, Eigen::DenseShape, double, double>::product_evaluator(Eigen::Product<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> const>, Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> >, 0> const&) + 108
14  fluid.libmanipulation         	       0x132276048 void Eigen::internal::call_dense_assignment_loop<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Transpose<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Product<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> const>, Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> >, 0> const, Eigen::Replicate<Eigen::Matrix<double, -1, 1, 0, -1, 1>, 1, -1> const> >, Eigen::internal::assign_op<double, double> >(Eigen::Matrix<double, -1, -1, 0, -1, -1>&, Eigen::Transpose<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Product<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> const>, Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> >, 0> const, Eigen::Replicate<Eigen::Matrix<double, -1, 1, 0, -1, 1>, 1, -1> const> > const&, Eigen::internal::assign_op<double, double> const&) + 40
15  fluid.libmanipulation         	       0x132275b34 fluid::algorithm::NNLayer::forward(Eigen::Ref<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::Ref<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >) const + 136
16  fluid.libmanipulation         	       0x132275880 fluid::algorithm::MLP::forward(Eigen::Ref<Eigen::Array<double, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::Ref<Eigen::Array<double, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, long, long) const + 344
17  fluid.libmanipulation         	       0x1322743bc fluid::algorithm::SGD::train(fluid::algorithm::MLP&, fluid::FluidTensorView<double, 2ul>, fluid::FluidTensorView<double, 2ul>, long, long, double, double, double) + 2060
18  fluid.libmanipulation         	       0x13229ec8c fluid::client::mlpclassifier::MLPClassifierClient::fit(fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>) + 1748
19  fluid.libmanipulation         	       0x1322b29f8 auto fluid::client::makeMessage<fluid::client::MessageResult<double>, fluid::client::mlpclassifier::MLPClassifierClient, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const> >(char const*, fluid::client::MessageResult<double> (fluid::client::mlpclassifier::MLPClassifierClient::*)(fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>))::'lambda'(fluid::client::mlpclassifier::MLPClassifierClient&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>)::operator()('lambda'(fluid::client::mlpclassifier::MLPClassifierClient&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>), fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>) const + 96
20  fluid.libmanipulation         	       0x1322b27e0 fluid::client::Message<auto fluid::client::makeMessage<fluid::client::MessageResult<double>, fluid::client::mlpclassifier::MLPClassifierClient, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const> >(char const*, fluid::client::MessageResult<double> (fluid::client::mlpclassifier::MLPClassifierClient::*)(fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>))::'lambda'(fluid::client::mlpclassifier::MLPClassifierClient&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>), fluid::client::MessageResult<double>, fluid::client::mlpclassifier::MLPClassifierClient, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const> >::operator()(auto fluid::client::makeMessage<fluid::client::MessageResult<double>, fluid::client::mlpclassifier::MLPClassifierClient, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const> >(char const*, fluid::client::MessageResult<double> (fluid::client::mlpclassifier::MLPClassifierClient::*)(fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>))::'lambda'(fluid::client::mlpclassifier::MLPClassifierClient&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>), fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>) const + 80
21  fluid.libmanipulation         	       0x1322b255c _ZNK5fluid6client10MessageSetINSt3__15tupleIJNS0_7MessageIZNS0_11makeMessageINS0_13MessageResultIdEENS0_13mlpclassifier19MLPClassifierClientEJNS0_15SharedClientRefIKNS0_7dataset13DataSetClientEEENSA_IKNS0_8labelset14LabelSetClientEEEEEEDaPKcMT0_FT_DpT1_EEUlRS9_SE_SI_E_S7_S9_JSE_SI_EEENS4_IZNS5_INS6_IvEES9_JSE_NSA_ISG_EEEEESJ_SL_SR_EUlSS_SE_SW_E_SV_S9_JSE_SW_EEENS4_IZNS5_INS6_INS2_12basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEEES9_JNS2_10shared_ptrIKNS0_13BufferAdaptorEEEEEESJ_SL_SR_EUlSS_S19_E_S15_S9_JS19_EEENS4_IZNS5_ISV_NS0_10DataClientINS8_17MLPClassifierDataEEEJEEESJ_SL_SR_EUlRS1E_E_SV_S1E_JEEENS4_IZNS0_11makeMessageINS6_IlEES1E_JEEESJ_SL_MSM_KFSN_SP_EEUlS1F_E_S1J_S1E_JEEES1N_NS4_IZNS5_INS6_INS3_IJNSZ_IcS11_N9foonathan6memory13std_allocatorIcNS_17FallbackAllocatorEEEEENS_11FluidTensorIlLm1EEEllddldEEEEES9_JS14_EEESJ_SL_SR_EUlSS_S14_E_S1X_S9_JS14_EEENS4_IZNS5_IS15_S1E_JEEESJ_SL_SR_EUlS1F_E_S15_S1E_JEEENS4_IZNS5_ISV_S1E_JS14_EEESJ_SL_SR_EUlS1F_S14_E_SV_S1E_JS14_EEES1Z_EEEE6invokeILm0EJRNS0_24NRTSharedInstanceAdaptorIS9_E12SharedClientERSE_RSI_EEEDcDpOT0_ + 144
22  fluid.libmanipulation         	       0x1322b1ef8 decltype(auto) fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> >::invoke<0ul, fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> >, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>&, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>&>(fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> >&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>&, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>&) + 360
23  fluid.libmanipulation         	       0x1322b17a0 void fluid::client::FluidMaxWrapper<fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> > >::invokeMessageImpl<0ul, 0ul, 1ul>(fluid::client::FluidMaxWrapper<fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> > >*, symbol*, long, atom*, std::__1::integer_sequence<unsigned long, 0ul, 1ul>) + 172

crashbits.zip

@rconstanzo
Author

Got a second crash (also with a really long @hiddenlayers network).

Also, I've narrowed down when it happens: it seems I get the crash when I pause the training (i.e. toggle off the toggle in the loop) and then toggle it back on. The instacrash happens when toggling it back on.

crash2.zip

@tremblap
Member

A few observations:

  • it is a huge network. Have you PCA'd the datasets first to reduce the number of dimensions? I'm asking because I get the spinning wheel of death here when I start the patch, but no crash (see the sketch after this list);
  • the first crash seems Chromium-related;
  • the second crash is memory-allocation related, but maybe not FluCoMa...
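
For illustration, the dimension reduction being suggested would look something like this in SuperCollider (a hedged sketch mirroring the SC session further down this thread; the FluidPCA usage, the placeholder path and numDimensions: 20 are assumptions, not the actual settings):

~raw = FluidDataSet(s).read("/path/to/dataset.json");     // source dataset (placeholder path)
~reduced = FluidDataSet(s);                               // destination for the reduced data
~pca = FluidPCA(s, numDimensions: 20);                    // keep e.g. 20 principal components
~pca.fitTransform(~raw, ~reduced, { "PCA done".postln });
// then fit the MLP classifier on ~reduced instead of ~raw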

On my machine there is no memory leak after running it for 15 minutes without a crash. I thought of checking this since both crashes are linked to memory allocation... and I start and stop it, to no avail.

So that brings us to how we can help you help us help you: are you set up for compilation? If so, you would gain two things:

  • in dev mode, you could use objects that are a little more explicit when they crash (line numbers in the code), which helps us volunteer coders know where to look for problems in our ginormous code base;
  • in gig mode, you could have super-optimised versions of the objects, tailored to your actual hardware.

This comes at the expense of having to compile, and perhaps of some confusion over which version you are actually using. I have scripts to swap them at OS level, but that might not be exciting for you. In all cases, I'm happy to help.

Anyway, as I am unable to reproduce it, we are stalled. Let us know if you find something more reproducible.

@tremblap
Member

Now running for 45 minutes in 'test' compile mode, starting and stopping and resetting: still no crash.

@rconstanzo
Author

I'll see if I can get it to crash again.

I'm not saying the network is useful; I was just testing different structures to see what type/direction/style was better (maybe changing structures often, via the attrui, is a component of this?).

It could just be coincidence, but happening twice with the same object/process seems unlikely.

@tremblap
Member

If you don't mind, try this version of the object (keep the other one you have for real-life use), so if it crashes we'll know better whether it is fluid.verse-related and where from...
fluid.libmanipulation.mxo.zip

@tremblap
Member

tremblap commented May 4, 2023

@rconstanzo any more crash with my magic custom compile?

@rconstanzo
Author

I was in the UK teaching; I'll give it a test now. I haven't gotten any new crashes since, though (but I haven't been testing super long network structures either).

@tremblap
Member

tremblap commented Oct 9, 2024

any luck on this?

@rconstanzo
Author

I'll test the patch above a bit more, but I'm not actively using that large structure in any patches (partly from being scared off by this issue).

@rconstanzo
Author

rconstanzo commented Oct 9, 2024

Turns out I hadn't tested the new version you posted.

Running the above patch again (with delay 500 on my laptop, since it's much slower than my desktop), I got crashes with both the old and new versions of libmanipulation. And both pretty quickly too (as in, I let it do about 5 ticks of crunching, toggled off, then toggled back on to an instacrash).

Here are both crash reports (both are the same kind of thing as above):
crash with default.zip

crash with new.zip

I will add, perhaps unhelpfully, that the patch does not crash in a beta version of Max...

@tremblap
Member

OK, reading the log, it might be a memory thing again, only in Max. I'll recode it in debug SC and see if I can crash it at all. Stay tuned.

@tremblap
Member

OK, I'm running it in Max first, and I cannot make it crash... even with the default!

But, as I look at your code: you know that you are running in the high-priority thread, right? "But I put a deferlow", you will say... but not at the right place! [delay 60] promotes the post-process bang back to the scheduler!

To deal with Max's (awful) threading promotion and demotion, I am usually very disciplined now (thanks to @weefuzzy) and put the defer right after the potential promotion objects, [delay] in this instance (see the sketch below).
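
To make that ordering concrete, here is a rough patching sketch (illustrative only; everything other than [delay 60] and [deferlow] is a placeholder for whatever drives the training loop):

post-process bang
    |
[delay 60]      <- promotes the bang back to the scheduler (high-priority) thread
    |
[deferlow]      <- demote here, downstream of the delay, so the next fit starts at low priority
    |
next fit / counter logic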

I will run it in SC in the hope that I manage to crash it, but I think it might still be a Max memory/threading thing that we are discussing in another bug (and many others, actually) with @AlexHarker, only happening in Max.

@tremblap
Member

Also, a few NN hints: momentum that low is not helpful; it is a huge network; and reset does not reset the network, clear does (as per the manual).

@rconstanzo
Author

My thinking with the defer is that I wanted to wait until the process was "done" before moving on, with little interest in what thread was passed forward elsewhere (i.e. my intention was not to start the process in low priority, but to end it there).

Also, these specific settings aren't really working well (for a variety of reasons), but it shouldn't crash regardless.

I have managed to get the patch to crash on both of my computers (Intel and ARM).

@tremblap
Member

Yet it doesn't crash here; let's see if SC crashes.

@tremblap
Member

[mxj WhichThread] is your friend if you are in doubt. Always try to send long jobs to the low-priority thread.

@tremblap
Member

OK, I'm running it in Max (no crash so far) and SC (no crash so far) at the same time; my i9 is in full leafblower mode 🤣
I'll leave it for an hour, clearing the network once in a while.

sc code:

// load the dataset and labelset from the attached crash files
x = FluidDataSet(s).read("/Users/pa/Downloads/crashbits/dataset.json").print
y = FluidLabelSet(s).read("/Users/pa/Downloads/crashbits/labelset.json").print
// same huge, symmetric hidden-layer structure as the Max patch
z = FluidMLPClassifier(s, hiddenLayers: [95,85,75,65,55,45,35,25,15,25,35,45,55,65,75,85,95], activation: 3, learnRate: 0.1, momentum: 0.1, validation: 0, maxIter: 100)
z.fit(x, y, {|loss| loss.postln})

// refit in a loop: each fit hangs on a Condition until the previous one reports its loss
fork{var cond = Condition.new; {z.fit(x, y, {|loss| loss.postln; cond.unhang}); cond.hang}.loop}
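
For reference, the "clearing the network once in a while" step would look something like this in the session above (a minimal sketch; it assumes FluidMLPClassifier exposes the same clear message in SC as the Max object, and as noted earlier it is clear, not reset, that reinitialises the trained weights):

z.clear;                             // forget the trained weights
z.fit(x, y, {|loss| loss.postln});   // training restarts from freshly initialised weights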

@rconstanzo
Author

rconstanzo commented Oct 10, 2024

Wait, are you clear-ing each time you restart?

I was getting a crash by not doing that. As in, I would toggle on the crunching for a bit, then toggle it off, then toggle it back on again (boom crash).

I don't think I ever got a crash from just letting it run.

@tremblap
Member

OK: removing the [defer] I added, then starting and stopping, this crashes, in Max only. This points to @AlexHarker's investigations of thread safety in Max, which is piling up many other instacrashes... The good news for you, for now, is that if you put a defer in the right place, you get a stable patch. A solution for the bigger issue is going to happen, don't worry.
