
MEG computation with FEM: memory, time, sensors, integration points & computation per block #350

Open
tmedani opened this issue Oct 28, 2020 · 10 comments

tmedani (Member) commented Oct 28, 2020

Hi @ftadel @juangpc @jcmosher, et al,

I just want to discuss with you one of the issues we observe when computing the MEG forward model with FEM at high mesh densities.

With about 300 sensors, 8 integration points per sensor, and a high mesh density model (>~1M nodes),
the transfer matrix is around ~2.5k x 1M, which in most cases crashes the computation.
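
As a rough order-of-magnitude check (a back-of-the-envelope sketch; the exact row count depends on the sensor definition), a dense transfer matrix of that size in double precision is already close to 20 GB:

```python
# Rough memory estimate for the dense MEG transfer matrix (double precision).
# Figures are the approximate numbers quoted in this issue.
n_rows = 300 * 8          # ~2400 rows: ~300 sensors x 8 integration points
n_cols = 1_000_000        # ~1M FEM mesh nodes
size_gib = n_rows * n_cols * 8 / 1024**3
print(f"{n_rows} x {n_cols} doubles ~ {size_gib:.1f} GiB")  # ~17.9 GiB
```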

Here is some user feedback on this: https://neuroimage.usc.edu/forums/t/duneuro-fem-doesnt-finish/21977/24?u=tmedani

Here are the possible solutions:

-1- Disable the use of the cache memory; however, in this case the computation can be very long (~3 days) on a regular computer (see Yvonne's response in the forum thread linked above).
This solution is already implemented (locally), but it is too slow on a regular computer.

-2- Restrict the MEG computation to the inner FEM layers, which reduces the number of nodes by ~30-50%. This solution can work, but may still crash in some cases if memory is low, and can take a long time if the cache memory is disabled (~1 day).
This solution is also possible with the current implementation (master).

-3- Avoid the use of the integration points and use one point per sensor, which reduces the number of sensor rows from ~2.5k to ~600 (gradiometers, as in the CTF MEG).
I have implemented this approach and the computation is much faster: in a basic example (10k nodes and 300 sensors),
the time drops from ~8 minutes to ~2 minutes, and the solution is still correct.
@jcmosher, do you have an estimate of the error on the solution (inverse problem)
between these two configurations (sensors with or without the integration points)?

-4- Use all the integration points, but divide these ~2500 sensor rows into blocks of ~250, call the DUNEuro app ~10 times, and finally concatenate the sub-results to build the full final leadfield (see the sketch after this list).
With this option, we can use the cache memory without errors and the computation "can be faster".
This last solution is not implemented yet.

I will try to implement and test it; with this option we may need to read/write the i/o files multiple times.

-5- There is also a fifth solution, but it requires further development in the C++ duneuro code, so it will be for the next version of bst-duneuro.
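
Here is a minimal sketch of the blocking idea in option -4- (in Python for illustration; the real implementation would be on the MATLAB side, and run_duneuro_block is a hypothetical wrapper that writes the block-specific sensor/ini files, calls the DUNEuro app, and reads the partial result back):

```python
import numpy as np

def compute_leadfield_by_blocks(sensor_rows, block_size, run_duneuro_block):
    """Compute the lead field block by block to bound the memory peak."""
    partial = []
    for start in range(0, len(sensor_rows), block_size):
        block = sensor_rows[start:start + block_size]
        # Each call only needs a (len(block) x n_nodes) transfer matrix,
        # so the cache memory can stay enabled without exhausting RAM.
        partial.append(run_duneuro_block(block))
    # Concatenate the sub-results into the full final lead field.
    return np.vstack(partial)
```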

@juangpc what do you think?

@ftadel, we may need to add some options to the duneuro_panel to allow advanced users to activate these options,
and also decide whether to include all of these options or only a subset
(some options can also be combined, which can be even faster).

Thanks in advance for your help & feedback.

A+,
Takfarinas

Moo-Marc (Collaborator) commented:

Hi @tmedani, I checked the forward solution change for different numbers of integration points for a CTF system a little while ago. There was a 2% relative change between 2 and 8 integration points, and a 0.2% change between 8 and 14 points. This was with a multi-sphere model, but it gives an idea of the order of magnitude. I think with realistic head models we can get better than 2% error, so we should probably keep the 8 points.
[Attached figure: TestForwardIntegrationPoints]

tmedani (Member, Author) commented Oct 28, 2020

Hi @Moo-Marc

Thanks for this clear answer, very helpful.

There is also this master's thesis (from Carsten's group) where they investigate the same problem and found a similar error range (for RDM and MAG): http://www.sci.utah.edu/~wolters/PaperWolters/2019/MasterDachwitz.pdf
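
For reference, a small sketch of how RDM and MAG are typically computed per source between a reference and a test lead-field topography (the exact conventions used in the thesis may differ slightly):

```python
import numpy as np

def rdm(l_ref, l_test):
    """Relative difference measure between two lead-field columns (0 = identical topography)."""
    return np.linalg.norm(l_test / np.linalg.norm(l_test)
                          - l_ref / np.linalg.norm(l_ref))

def mag(l_ref, l_test):
    """Magnitude ratio between two lead-field columns (1 = identical amplitude)."""
    return np.linalg.norm(l_test) / np.linalg.norm(l_ref)
```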

So there is a tradeoff between the feasibility of this computation on a regular computer (FEM with integration points: slower and may crash) and the use of plain sensors with a less accurate solution (faster and "no crashes").

From the Brainstorm side, maybe it's better to offer all of these options to the users.

juangpc (Collaborator) commented Oct 30, 2020

Hi, I think this could make sense.

Even if we use only 2 integration points, it could still make sense, depending on the amount of memory the user has (as already mentioned). It would allow process-level parallelism, regardless of whether multithreading is used underneath the process level, which is worth considering given the difficulties found with multithreading in cross-compilation. Depending on the amount of overlap (probably quite big, but still), it could make sense.
This could save the day.

Modifying the system call for each platform could make each call asynchronous, depending on whether you want to launch each set of sensors asynchronously or sequentially. As a bonus, it kind of "democratizes" high-definition FEM computations :) making them achievable for users with old/inexpensive hardware, at a computational-time cost of course.

To make asynchronous system calls, we could modify the application so that there is an easy way for MATLAB to know that the execution has finished without problems, perhaps by creating a file, and then use a MATLAB timer to poll the execution output. On Unix-based systems you just add '&' after the command, and on Windows you can use the "start" command. All the ini files would be specific to each call. For sequential calls, we would only need to loop over the different sensor sets with the code we have now.
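
As an illustration of this polling approach (a Python sketch under the assumptions above; in Brainstorm this would be done from MATLAB with a timer, and the duneuro_app/ini paths and the "*.done" flag-file convention are hypothetical):

```python
import glob
import os
import subprocess
import time

def launch_block(duneuro_app, ini_file):
    # Non-blocking launch, equivalent to appending '&' on Unix or using "start" on Windows.
    return subprocess.Popen([duneuro_app, ini_file])

def wait_for_blocks(out_dir, n_blocks, poll_seconds=5.0):
    # Poll until each call has created its "<block>.done" flag file.
    while len(glob.glob(os.path.join(out_dir, "*.done"))) < n_blocks:
        time.sleep(poll_seconds)
```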

ftadel (Member) commented Mar 13, 2021

@tmedani Is this issue still up-to-date?

tmedani (Member, Author) commented Mar 13, 2021

Yes, it is. I have a lot of changes on my local computer that I need to clean up and then push to the main branch.
Maybe in 2 weeks I can do it.

ftadel (Member) commented Mar 14, 2021

No rush, it's just to know which issues I can already close.
No worries if this one remains open :)

ftadel (Member) commented Sep 19, 2021

Any update on this issue?

tmedani (Member, Author) commented Sep 22, 2021

Still in the same state; it still requires input and improvements.

juangpc (Collaborator) commented Sep 27, 2021

@tmedani If there is a specific C++-related change, maybe I could help? (Or even if it is not related to C++ at all.)

ftadel (Member) commented Jun 8, 2022

Any updates?
