
Bug in MultiSimilarityMiner? #723

Open
kazimpal87 opened this issue Oct 21, 2024 · 4 comments

Comments


kazimpal87 commented Oct 21, 2024

Hi,

Is it expected that MultiSimilarityMiner will produce positive pairs that don't actually have the same label?

For example, one of my batches has items with the following labels (a small batch size of only 8, just to illustrate the problem):
tensor([ 15, 15, 15, 15, 169, 169, 169, 169], device='mps:0')

I use MultiSimilarityMiner to mine pairs for MultiSimilarityLoss. If I print out the values of mat, pos_mask, and neg_mask in the compute_loss function of MultiSimilarityLoss, they are:

tensor([[1.0000, 0.9996, 0.9975, 0.9994, 0.9948, 0.9836, 0.9968, 0.9975],
        [0.9996, 1.0000, 0.9952, 0.9981, 0.9919, 0.9798, 0.9950, 0.9963],
        [0.9975, 0.9952, 1.0000, 0.9991, 0.9977, 0.9879, 0.9993, 0.9980],
        [0.9994, 0.9981, 0.9991, 1.0000, 0.9974, 0.9876, 0.9979, 0.9975],
        [0.9948, 0.9919, 0.9977, 0.9974, 1.0000, 0.9960, 0.9947, 0.9917],
        [0.9836, 0.9798, 0.9879, 0.9876, 0.9960, 1.0000, 0.9823, 0.9767],
        [0.9968, 0.9950, 0.9993, 0.9979, 0.9947, 0.9823, 1.0000, 0.9992],
        [0.9975, 0.9963, 0.9980, 0.9975, 0.9917, 0.9767, 0.9992, 1.0000]],
       device='mps:0', grad_fn=<MmBackward0>)

tensor([[0., 1., 1., 1., 1., 1., 1., 1.],
        [1., 0., 1., 1., 1., 1., 1., 1.],
        [1., 1., 0., 1., 1., 1., 1., 1.],
        [1., 1., 1., 0., 1., 1., 1., 1.],
        [1., 1., 1., 1., 0., 1., 1., 1.],
        [1., 1., 1., 1., 1., 0., 1., 1.],
        [1., 1., 1., 1., 1., 1., 0., 1.],
        [1., 1., 1., 1., 1., 1., 1., 0.]], device='mps:0')

tensor([[0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1.],
        [1., 1., 1., 1., 0., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0., 0.]], device='mps:0')

This is right at the beginning of training, so the similarity scores in mat are garbage, but the pos_mask looks wrong to me: it has marked every off-diagonal pair as positive, including pairs that don't share the same label. Is that expected for some reason?
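For reference, here is what I'd expect the masks to look like, derived directly from the labels in plain PyTorch (a sketch, not the library's internal code):

```python
import torch

labels = torch.tensor([15, 15, 15, 15, 169, 169, 169, 169])

same_label = labels.unsqueeze(1) == labels.unsqueeze(0)  # (8, 8) label-equality matrix
self_pairs = torch.eye(len(labels), dtype=torch.bool)    # exclude (i, i) pairs

pos_mask = same_label & ~self_pairs  # positives: same label, different index
neg_mask = ~same_label               # negatives: different labels

print(pos_mask.float())
print(neg_mask.float())
```

With these labels, pos_mask should be block-diagonal (two 4x4 blocks with zeros on the diagonal), which matches the neg_mask I actually got but not the pos_mask.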

@KevinMusgrave (Owner)

Can you paste in the embeddings tensor here? I'd like to reproduce the bug.

@kazimpal87 (Author)

It seems to happen with any embeddings tensor. I think I have traced the problem to here:

mat_pos_sorting[a2, n] = pos_ignore
mat_neg_sorting[a1, p] = neg_ignore

after this, for some reason mat_pos_sorting looks like this

tensor([[ 1.0000e+00,  9.7298e-01,  9.6208e-01,  9.5423e-01, -3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38],
        [ 9.7298e-01,  1.0000e+00,  9.8334e-01,  9.5774e-01, -3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38],
        [ 9.6208e-01,  9.8334e-01,  1.0000e+00,  9.4227e-01, -3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38],
        [ 9.5423e-01,  9.5774e-01,  9.4227e-01,  1.0000e+00, -3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38],
        [-3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38,  1.0000e+00,  9.7043e-01,  9.6518e-01,  9.6306e-01],
        [-3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38,  9.7043e-01,  1.0000e+00,  9.7086e-01,  9.7068e-01],
        [-3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38,  9.6518e-01,  9.7086e-01,  1.0000e+00,  9.7823e-01],
        [-3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38,  9.6306e-01,  9.7068e-01,  9.7823e-01,  1.0000e+00]], device='mps:0')

i.e., it has neg_ignore filled into the negative cells instead of pos_ignore. I don't think this is an overflow issue, because the problem persists even if I change pos_ignore and neg_ignore to 10 and -10.
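One way to narrow it down might be to compare the advanced-index assignment against an equivalent boolean-mask assignment, since they go through different kernels (sketch with placeholder sentinel values; on CPU the two agree, and the question is whether they diverge on mps):

```python
import torch

mat = torch.rand(8, 8)
labels = torch.tensor([15, 15, 15, 15, 169, 169, 169, 169])

neg_mask = labels.unsqueeze(1) != labels.unsqueeze(0)
a2, n = torch.where(neg_mask)  # negative-pair index tensors, as the miner uses

pos_ignore = -10.0  # placeholder sentinel, as in the experiment above

# Advanced-index assignment: the pattern that misbehaves on mps
via_index = mat.clone()
via_index[a2, n] = pos_ignore

# Boolean-mask assignment: same result on CPU, different underlying kernel
via_mask = mat.clone()
via_mask[neg_mask] = pos_ignore

print(torch.equal(via_index, via_mask))  # True on CPU
```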

@kazimpal87 (Author)

To make things even stranger, if I do

mat_pos_sorting[a2, n] = pos_ignore
print(mat_pos_sorting)
mat_neg_sorting[a1, p] = neg_ignore

everything works fine and the tensors contain the expected values.
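If the print is really what fixes it, one hypothesis (unverified) is that printing forces the pending MPS kernels to complete before the second in-place write runs. An explicit synchronize between the two assignments would test that theory without the print (placeholder index tensors here, just to sketch the pattern; the real a1/p/a2/n come from the miner):

```python
import torch

device = "mps" if torch.backends.mps.is_available() else "cpu"

mat = torch.rand(8, 8, device=device)
mat_pos_sorting = mat.clone()
mat_neg_sorting = mat.clone()

# Placeholder indices for the sketch: just the diagonal cells
a2 = n = torch.arange(8, device=device)
a1 = p = torch.arange(8, device=device)

mat_pos_sorting[a2, n] = -10.0
if device == "mps":
    torch.mps.synchronize()  # flush the MPS command queue before the next write
mat_neg_sorting[a1, p] = 10.0
```

If inserting the synchronize also makes the results correct, that would point at an ordering/synchronization bug in the mps backend rather than in the miner itself.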

@KevinMusgrave (Owner)

Hmm, I'm not able to reproduce it yet. Can you paste in the minimal code for reproducing it?

Also, what OS are you running this on, and which versions of PyTorch and pytorch-metric-learning are you using?
