Understanding how crossnobis distance metric is derived #423

ahachisuka · 2023-12-20T20:58:12Z

Hello,

I have a question about the crossnobis distance metric from the toolbox. I am actually not using the covariance matrix, so "noise" is just the identity matrix and the distance is simply cross-validated Euclidean distance.

Conceptually, I understand the crossvalidation procedure to mean: If we have k runs, average across (k-1) runs and calculate the distance with the left-out kth run. Repeat iteratively and average across runs.
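A minimal sketch of that leave-one-run-out procedure, assuming the identity matrix as noise and a hypothetical array layout of `(n_runs, n_conditions, n_channels)`. The function name `cv_euclidean_rdm` and the simulated data are illustrative, not part of rsatoolbox. In each fold the "distance" is the inner product between the pattern difference in the training average and the same difference in the left-out run, which matches the kernel expression in the `_calc_rdm_crossnobis_single` function quoted in this post.

```python
import numpy as np

def cv_euclidean_rdm(betas):
    """Leave-one-run-out cross-validated squared Euclidean RDM.

    betas: array of shape (n_runs, n_conditions, n_channels) (assumed layout).
    """
    n_runs, n_cond, n_chan = betas.shape
    rdm = np.zeros((n_cond, n_cond))
    for test_run in range(n_runs):
        # average the remaining (k - 1) runs; keep the left-out run separate
        train = np.delete(betas, test_run, axis=0).mean(axis=0)
        test = betas[test_run]
        # pattern differences for every condition pair: (n_cond, n_cond, n_chan)
        diff_train = train[:, None, :] - train[None, :, :]
        diff_test = test[:, None, :] - test[None, :, :]
        # inner product of the two differences, normalised by channel count
        rdm += np.sum(diff_train * diff_test, axis=-1) / n_chan
    return rdm / n_runs  # average across folds

rng = np.random.default_rng(0)
betas = rng.normal(size=(4, 3, 50))  # 4 runs, 3 conditions, 50 channels
rdm = cv_euclidean_rdm(betas)
```

Because each entry is a product of two differences rather than the squared norm of one, off-diagonal entries can be negative.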

Code where this is implemented (in rdm/calc.py):

    for i_fold, fold in enumerate(cv_folds):
        data_test = datasetCopy.subset_obs(cv_descriptor, fold)
        data_train = datasetCopy.subset_obs(
            cv_descriptor,
            np.setdiff1d(cv_folds, fold)
        )
        measurements_train, _, _ = \
            average_dataset_by(data_train, descriptor)
        measurements_test, _, _ = \
            average_dataset_by(data_test, descriptor)
        rdm = _calc_rdm_crossnobis_single(
            measurements_train, measurements_test, noise)

The _calc_rdm_crossnobis_single function is as follows:

    def _calc_rdm_crossnobis_single(meas1, meas2, noise) -> NDArray:
        kernel = meas1 @ noise @ meas2.T
        rdm = np.expand_dims(np.diag(kernel), 0) + \
            np.expand_dims(np.diag(kernel), 1) - kernel - kernel.T
        return extract_triu(rdm) / meas1.shape[1]

How does this map onto the Euclidean distance formula, which seems to be in this form: $d_{Euc} = \lVert x \rVert^2 + \lVert y \rVert^2 - 2 x y^T$?

Because cross-validation takes the distance between the average of (k-1) runs and the left-out kth run, and fMRI noise differs across runs, I had not expected the distances between identical images to be 0, but they are.

I have also written out the matrix multiplication as for-loops to help with my understanding, but this only confirms that, somehow, the code gives me zero values on the diagonal.

    # expanded for-loops:
    kernel = np.zeros((meas1.shape[0], meas2.shape[0]))
    for i in range(meas1.shape[0]):
        for j in range(meas2.shape[0]):
            kernel[i, j] = np.dot(np.dot(meas1[i], noise), meas2[j])

    rdm = np.zeros((meas1.shape[0], meas2.shape[0]))
    for i in range(meas1.shape[0]):
        for j in range(meas2.shape[0]):
            rdm[i, j] = kernel[i, i] + kernel[j, j] - kernel[i, j] - kernel[j, i]

Which distance metric is being used for crossnobis/cross-validated Euclidean? How does cross-validation achieve zero values in the diagonal; shouldn't it contain non-zero values due to fMRI noise?

Thank you.

JasperVanDenBosch · 2023-12-20T21:46:33Z

Maintainer

The diagonal is always set to 0, as the interpretation of noise on the diagonal is not obvious.

jdiedrichsen · 2023-12-21T00:55:14Z

Maintainer

Yes, the distance between condition i and condition i is always zero, even for cross-validated estimates (we do not set the diagonal artificially to zero).

It's pretty easy to see from the math. Check the following paper:
https://www.diedrichsenlab.org/pubs/NBDT_2021.pdf

For the Euclidean distances (Eq. 1), the $\delta$ (the difference between conditions) is always zero if the condition is the same.
So, even if you do the cross-validated distance (Eq. 3), it remains 0.
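
As a sketch of how this plays out in the code quoted in the question (the symbols $a_i$, $b_i$ and $N$ are notation assumed here for the rows of `meas1`, the rows of `meas2`, and the `noise` matrix; they are not taken from the paper): with $K_{ij} = a_i N b_j^T$, each entry of `rdm` is

$$ K_{ii} + K_{jj} - K_{ij} - K_{ji} = (a_i - a_j) \, N \, (b_i - b_j)^T $$

Setting $a = b$ and $N = I$ recovers the ordinary form $\lVert x \rVert^2 + \lVert y \rVert^2 - 2 x y^T = (x - y)(x - y)^T$. For $i = j$ both differences are exactly zero within their own partition, so the product is zero whatever the noise is.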

Joern

ahachisuka · 2023-12-21T03:57:42Z

Author

Thank you, that cleared things up!

ahachisuka · 2024-01-04T17:00:57Z

Author

Hi Joern and Jasper,

I hope you had a wonderful new year!

From the section of your paper that describes cross-validated distances, I understand now that the distance between two identical images (diagonal values) must be 0, by definition.

However, I'm wondering what your thoughts are on estimating the distances across partitions, as follows:

Originally, the equation for the c.v. Euclidean distances is written as

$\ d_k = \dfrac{1}{M^2} \sum_{m}^{M} \sum_{n}^{M} \delta_{k,m} \delta_{k,n}^T $

where $\ \delta_{k,m} = b_{i,m} - b_{j,m}; \delta_{k,n} = b_{i,n} - b_{j,n} $

If i = j, then $\ \delta_{k,m} =\delta_{k,n} = 0 $

But, if we take differences across partitions, we can rewrite the equation as:

$\ d_k = \dfrac{1}{M^2} \sum_{m}^{M} \sum_{n}^{M} ( b_{i,m} - b_{j,n}) ( b_{i,m} - b_{j,n})^T $

Note that I swapped the positions of m and n.

While the distance between identical images should still be small, it is now non-zero by definition.
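
To make the difference between the two equations concrete, here is a minimal numerical sketch. The array layout `(M, P)` (partitions by voxels), the function names, and the simulated data are illustrative assumptions, not rsatoolbox code. Both functions follow the equations above as written, summing over all m and n.

```python
import numpy as np

def within_partition(b_i, b_j):
    """First equation: products of differences taken within each partition."""
    delta = b_i - b_j                          # delta[m] = b_{i,m} - b_{j,m}
    M = delta.shape[0]
    return np.sum(delta @ delta.T) / M**2      # sum of delta_m . delta_n over all (m, n)

def across_partitions(b_i, b_j):
    """Second equation: squared norm of b_{i,m} - b_{j,n} over all (m, n)."""
    diff = b_i[:, None, :] - b_j[None, :, :]   # diff[m, n] = b_{i,m} - b_{j,n}
    M = b_i.shape[0]
    return np.sum(diff ** 2) / M**2

rng = np.random.default_rng(0)
M, P = 8, 50                                   # hypothetical: 8 partitions, 50 voxels
pattern = rng.normal(size=P)                   # one true pattern
b_i = pattern + rng.normal(size=(M, P))        # noisy estimates of the same condition

d_same_within = within_partition(b_i, b_i)     # zero: every delta is exactly 0
d_same_across = across_partitions(b_i, b_i)    # positive: sum of squared noise differences
```

In this simulation the true pattern is identical in every partition, so the second quantity is driven entirely by the noise: it scales with the noise variance and the number of voxels rather than with any difference between conditions.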

Our hope is that this will capture the variability in fMRI noise. Then for statistical testing, we could do a t-test against this non-zero diagonal value, instead of against 0 as rsatoolbox documentation suggests (https://rsatoolbox.readthedocs.io/en/stable/distances.html#crossnobis-dissimilarity).

What do you think about this alternative approach to generate a "lower ceiling bound" for neural distance estimation?

I have tried this, and the RDM (even the off-diagonal values) is less clear/interpretable. Why do you think this might be? Intuitively the math seems similar to me, other than implementing a cross-validation between partitions of unequal measurement repetitions.

Thank you so much for your help.

JasperVanDenBosch · 2024-11-07T16:57:56Z

Maintainer

Hi @ahachisuka, I've converted this issue to a "discussion" as it is more about RSA methodology than a software issue.
If you have found the answer to your question, please describe it here or mark the correct answer so that others can learn from this. Thanks!

1 reply

ahachisuka Nov 12, 2024
Author

Hi Jasper, thanks for your message. After re-reading Walther et al. (2016), we realized that calculating an unbiased distance requires cross-validation across independent partitions, so that the noise in the two terms of each product is independent. So, it probably wouldn't make sense to mix partitions (i.e., swap m and n in the equation) if our goal is to average out the within-partition noise. Am I understanding this correctly? It still feels a little odd, because intuitively we wouldn't expect an exact zero distance between identical images, given that data are inherently noisy. A zero distance would seem to imply that this approach removes noise completely.
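
One way to probe this is a small simulation, sketched here under stated assumptions: Gaussian noise that is independent across partitions and conditions, two different conditions that share the same true pattern (true distance 0), and a cross-validated sum restricted to pairs of distinct partitions. The function names and sizes are hypothetical and not taken from rsatoolbox.

```python
import numpy as np

def crossvalidated_distance(b_i, b_j):
    """Average of delta_m . delta_n over pairs of distinct partitions (m != n)."""
    delta = b_i - b_j
    M, P = delta.shape
    gram = delta @ delta.T                     # gram[m, n] = delta_m . delta_n
    return (gram.sum() - np.trace(gram)) / (M * (M - 1) * P)

def plain_distance(b_i, b_j):
    """Squared Euclidean distance between partition-averaged patterns."""
    diff = b_i.mean(axis=0) - b_j.mean(axis=0)
    return np.sum(diff ** 2) / diff.size

rng = np.random.default_rng(1)
M, P, n_sim = 6, 100, 1000                     # hypothetical sizes
pattern = rng.normal(size=P)                   # shared true pattern: true distance is 0

cv = np.empty(n_sim)
plain = np.empty(n_sim)
for s in range(n_sim):
    b_i = pattern + rng.normal(size=(M, P))    # condition i, fresh noise per partition
    b_j = pattern + rng.normal(size=(M, P))    # condition j, same pattern, other noise
    cv[s] = crossvalidated_distance(b_i, b_j)
    plain[s] = plain_distance(b_i, b_j)

mean_cv = cv.mean()                            # close to the true distance of 0
mean_plain = plain.mean()                      # inflated above 0 by the noise
frac_negative = np.mean(cv < 0)                # share of negative cross-validated estimates
```

In this sketch the individual cross-validated estimates still scatter around zero, and a substantial fraction of them are negative, so the noise is not removed; it only averages out in expectation. The exact zero on the diagonal is a separate matter: it comes from subtracting a pattern from itself within each partition.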
