Gong, Weikang, Christian F. Beckmann, and Stephen M. Smith. "Phenotype Discovery from Population Brain Imaging." Medical Image Analysis (2021). https://www.sciencedirect.com/science/article/pii/S1361841521000967
Python 3.6+ (Anaconda 3 recommended), spams (pip install spams==2.6.1), numpy (pip install numpy==1.20.3), scipy, and joblib (https://joblib.readthedocs.io/en/latest/, used for parallel processing of dictionary learning). The copy and multiprocessing modules are also used; both are part of the Python standard library and need no installation.
CentOS Linux 7 / macOS Big Sur
The main function is BigFLICA, defined in the script BigFLICA_cpu.py. Place this module in a folder named BigFLICA, in a location where Python can find it (e.g., under /home/weikanggong, which gives you the folder /home/weikanggong/BigFLICA/).
You can add the parent directory to Python's module search path with the following code:
import sys
sys.path.append("/home/weikanggong/")
Then prepare the data as .npy files, one per modality, each containing a subject-by-voxel (or subject-by-feature) matrix, and store them on disk. Here we assume two modalities, stored as /home/weikanggong/mod1.npy and /home/weikanggong/mod2.npy.
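As a minimal sketch of this preparation step, the snippet below builds a subject-by-voxel matrix from per-subject 3D volumes using a binary mask and saves it with numpy. The helper name volumes_to_matrix, the synthetic data, and the temporary output folder are assumptions made for illustration; they are not part of BigFLICA.

```python
import os
import tempfile

import numpy as np


def volumes_to_matrix(volumes, mask):
    """Stack per-subject 3D volumes into a subject-by-voxel matrix.

    Hypothetical helper (not part of BigFLICA).
    volumes: array of shape (n_subjects, X, Y, Z)
    mask:    boolean array of shape (X, Y, Z); True marks voxels to keep
    """
    volumes = np.asarray(volumes)
    mask = np.asarray(mask, dtype=bool)
    if volumes.shape[1:] != mask.shape:
        raise ValueError("mask shape must match the spatial shape of the volumes")
    # Boolean indexing over the three spatial axes keeps only in-mask voxels,
    # giving an array of shape (n_subjects, n_voxels_in_mask).
    return volumes[:, mask]


# Synthetic stand-in data: 5 subjects, 4x4x4 volumes.
rng = np.random.default_rng(0)
volumes = rng.standard_normal((5, 4, 4, 4))
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True  # 8 in-mask voxels

mod1 = volumes_to_matrix(volumes, mask)

# Save to a temporary folder here; in practice this would be a path
# such as /home/weikanggong/mod1.npy.
out_dir = tempfile.mkdtemp()
mod1_path = os.path.join(out_dir, "mod1.npy")
np.save(mod1_path, mod1)
```

The same mask has to be applied to every subject of a modality so that columns refer to the same voxels across rows.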
Finally, suppose the output directory is /home/weikanggong/bigflica_output.
Example code for running BigFLICA looks like the following:
from BigFLICA import BigFLICA_cpu
data_loc = ['/home/weikanggong/mod1.npy',
'/home/weikanggong/mod2.npy']
output_dir = '/home/weikanggong/bigflica_output/'
nlat = 10
migp_dim = 100
dicl_dim = 500
ncore = 1
BigFLICA_cpu.BigFLICA(data_loc, nlat, output_dir, migp_dim, dicl_dim, ncore)
- data_loc: a list whose length equals the number of modalities. Each element is the absolute path of the data matrix of one modality, in .npy format. The data matrix is assumed to be of size subjects * voxels (this can be generated by applying a binary mask and vectorizing the voxel dimension). The number of subjects must be equal across modalities. (A subject with a missing modality can be imputed with the mean of the other subjects.)
- nlat: Number of components to extract in BigFLICA.
- output_dir: the absolute path of the directory in which all BigFLICA results are stored.
- migp_dim: Number of components to extract in the MIGP step (migp_dim > nlat).
- dicl_dim: Number of components to extract in the dictionary learning step.
- ncore: Number of CPU cores used for dictionary learning on each modality.
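The constraints in the list above (equal number of subjects across modalities, migp_dim > nlat, mean imputation for a missing modality) can be checked before calling BigFLICA. The following is a rough sketch only: the helper names are hypothetical, and marking a missing subject as an all-NaN row is an assumed convention, not something BigFLICA defines.

```python
import numpy as np


def impute_missing_subjects(data):
    """Fill missing subjects with the mean of the observed subjects.

    Hypothetical helper. Assumes a missing subject is marked by a row
    that is entirely NaN (an illustrative convention).
    """
    data = np.array(data, dtype=float)  # work on a copy
    missing = np.isnan(data).all(axis=1)
    if missing.all():
        raise ValueError("no observed subjects to compute a mean from")
    # Column-wise mean over observed subjects, written into each missing row.
    data[missing] = data[~missing].mean(axis=0)
    return data


def check_inputs(matrices, nlat, migp_dim):
    """Raise ValueError if the documented constraints are violated."""
    n_subjects = {m.shape[0] for m in matrices}
    if len(n_subjects) != 1:
        raise ValueError("number of subjects differs across modalities")
    if not migp_dim > nlat:
        raise ValueError("migp_dim must be larger than nlat")


# Toy modality with 3 subjects and 2 features; subject 1 is missing.
mod1 = np.array([[1.0, 2.0],
                 [np.nan, np.nan],
                 [3.0, 4.0]])
mod1_filled = impute_missing_subjects(mod1)

# A second toy modality with the same number of subjects.
mod2 = np.zeros((3, 5))
check_inputs([mod1_filled, mod2], nlat=10, migp_dim=100)
```

Running such checks on the matrices before saving them avoids discovering a shape mismatch only after the decomposition has started.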
In the specified output directory,
- subj_course.npy is the subject course (the H matrix in the paper), of size subject-by-FLICA_component. This is the matrix to correlate with behavioural variables, or to use for predicting them.
- flica_mod_Z.npy is the Z-score normalized spatial maps of each modality, of size voxel-by-FLICA_component.
- mod_contribution.npy is the relative contribution of each modality to each FLICA component, of size modality-by-FLICA_component. Within each FLICA component, the modalities can be ranked by these values.
- Functions support NIfTI, CIFTI and FreeSurfer inputs.
- Plotting the spatial maps.
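As an illustration of how the outputs described above might be used, the sketch below correlates each column of the subject course with one behavioural variable and ranks the modalities within each component. Synthetic arrays stand in for subj_course.npy and mod_contribution.npy (in practice they would be read with np.load from the output directory); the function names and the behavioural variable are assumptions for illustration.

```python
import numpy as np


def correlate_with_behaviour(subj_course, behaviour):
    """Pearson correlation of each component (column) with one variable.

    subj_course: array of shape (n_subjects, n_components)
    behaviour:   array of shape (n_subjects,)
    """
    h = subj_course - subj_course.mean(axis=0)
    b = behaviour - behaviour.mean()
    return (h.T @ b) / (np.linalg.norm(h, axis=0) * np.linalg.norm(b))


def rank_modalities(mod_contribution):
    """Per component, modality indices from largest to smallest contribution.

    mod_contribution: array of shape (n_modalities, n_components)
    """
    return np.argsort(-mod_contribution, axis=0)


# Synthetic stand-in for subj_course.npy: 50 subjects, 3 components.
rng = np.random.default_rng(0)
subj_course = rng.standard_normal((50, 3))

# Hypothetical behavioural variable, built as a linear function of component 1.
behaviour = 2.0 * subj_course[:, 1] + 1.0
r = correlate_with_behaviour(subj_course, behaviour)

# Synthetic stand-in for mod_contribution.npy: 2 modalities, 2 components.
mod_contribution = np.array([[0.2, 0.7],
                             [0.8, 0.3]])
order = rank_modalities(mod_contribution)
```

Here order[:, k] lists the modalities for component k, with the largest contributor first.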