
Bug in MultiSimilarityMiner? #723

Open
kazimpal87 opened this issue Oct 21, 2024 · 4 comments

Comments


kazimpal87 commented Oct 21, 2024

Hi,

Is it expected that MultiSimilarityMiner will produce positive pairs that don't actually have the same label?

For example, one of my batches has items with the following labels (a small batch size of only 8, just to illustrate the problem):
tensor([ 15, 15, 15, 15, 169, 169, 169, 169], device='mps:0')

I use MultiSimilarityMiner to mine pairs for MultiSimilarityLoss. If I print out the values of mat, pos_mask, and neg_mask in the compute_loss function of MultiSimilarityLoss, they are:

tensor([[1.0000, 0.9996, 0.9975, 0.9994, 0.9948, 0.9836, 0.9968, 0.9975],
        [0.9996, 1.0000, 0.9952, 0.9981, 0.9919, 0.9798, 0.9950, 0.9963],
        [0.9975, 0.9952, 1.0000, 0.9991, 0.9977, 0.9879, 0.9993, 0.9980],
        [0.9994, 0.9981, 0.9991, 1.0000, 0.9974, 0.9876, 0.9979, 0.9975],
        [0.9948, 0.9919, 0.9977, 0.9974, 1.0000, 0.9960, 0.9947, 0.9917],
        [0.9836, 0.9798, 0.9879, 0.9876, 0.9960, 1.0000, 0.9823, 0.9767],
        [0.9968, 0.9950, 0.9993, 0.9979, 0.9947, 0.9823, 1.0000, 0.9992],
        [0.9975, 0.9963, 0.9980, 0.9975, 0.9917, 0.9767, 0.9992, 1.0000]],
       device='mps:0', grad_fn=<MmBackward0>)

tensor([[0., 1., 1., 1., 1., 1., 1., 1.],
        [1., 0., 1., 1., 1., 1., 1., 1.],
        [1., 1., 0., 1., 1., 1., 1., 1.],
        [1., 1., 1., 0., 1., 1., 1., 1.],
        [1., 1., 1., 1., 0., 1., 1., 1.],
        [1., 1., 1., 1., 1., 0., 1., 1.],
        [1., 1., 1., 1., 1., 1., 0., 1.],
        [1., 1., 1., 1., 1., 1., 1., 0.]], device='mps:0')

tensor([[0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1.],
        [1., 1., 1., 1., 0., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0., 0.]], device='mps:0')

This is right at the beginning of training, so the similarity scores in mat are garbage, but the pos_mask looks wrong to me: it has marked every off-diagonal pair as positive, including pairs that don't share the same label. Is that expected for some reason?
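For reference, here is what I'd expect the masks to look like, derived directly from the labels in plain PyTorch (a sketch, not the library's internal code):

```python
import torch

labels = torch.tensor([15, 15, 15, 15, 169, 169, 169, 169])

same_label = labels.unsqueeze(1) == labels.unsqueeze(0)  # (8, 8) label-equality matrix
self_pairs = torch.eye(len(labels), dtype=torch.bool)    # exclude (i, i) pairs

pos_mask = same_label & ~self_pairs  # positives: same label, different index
neg_mask = ~same_label               # negatives: different labels

print(pos_mask.float())
print(neg_mask.float())
```

With these labels, pos_mask should be block-diagonal (two 4x4 blocks with zeros on the diagonal), which matches the neg_mask I actually got but not the pos_mask.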

@KevinMusgrave (Owner)

Can you paste in the embeddings tensor here? I'd like to reproduce the bug.

@kazimpal87 (Author)

It seems to happen with any embeddings tensor. I think I have traced the problem to here:

mat_pos_sorting[a2, n] = pos_ignore
mat_neg_sorting[a1, p] = neg_ignore

after this, for some reason mat_pos_sorting looks like this

tensor([[ 1.0000e+00,  9.7298e-01,  9.6208e-01,  9.5423e-01, -3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38],
        [ 9.7298e-01,  1.0000e+00,  9.8334e-01,  9.5774e-01, -3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38],
        [ 9.6208e-01,  9.8334e-01,  1.0000e+00,  9.4227e-01, -3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38],
        [ 9.5423e-01,  9.5774e-01,  9.4227e-01,  1.0000e+00, -3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38],
        [-3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38,  1.0000e+00,  9.7043e-01,  9.6518e-01,  9.6306e-01],
        [-3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38,  9.7043e-01,  1.0000e+00,  9.7086e-01,  9.7068e-01],
        [-3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38,  9.6518e-01,  9.7086e-01,  1.0000e+00,  9.7823e-01],
        [-3.4028e+38, -3.4028e+38, -3.4028e+38, -3.4028e+38,  9.6306e-01,  9.7068e-01,  9.7823e-01,  1.0000e+00]], device='mps:0')

i.e., it has neg_ignore filled into the negative cells instead of pos_ignore. I don't think this is an overflow issue, because the problem persists even if I change pos_ignore and neg_ignore to 10 and -10.
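One way to narrow it down might be to compare the advanced-index assignment against an equivalent boolean-mask assignment, since they go through different kernels (sketch with placeholder sentinel values; on CPU the two agree, and the question is whether they diverge on mps):

```python
import torch

mat = torch.rand(8, 8)
labels = torch.tensor([15, 15, 15, 15, 169, 169, 169, 169])

neg_mask = labels.unsqueeze(1) != labels.unsqueeze(0)
a2, n = torch.where(neg_mask)  # negative-pair index tensors, as the miner uses

pos_ignore = -10.0  # placeholder sentinel, as in the experiment above

# Advanced-index assignment: the pattern that misbehaves on mps
via_index = mat.clone()
via_index[a2, n] = pos_ignore

# Boolean-mask assignment: same result on CPU, different underlying kernel
via_mask = mat.clone()
via_mask[neg_mask] = pos_ignore

print(torch.equal(via_index, via_mask))  # True on CPU
```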

@kazimpal87 (Author)

To make things even stranger, if I do

mat_pos_sorting[a2, n] = pos_ignore
print(mat_pos_sorting)
mat_neg_sorting[a1, p] = neg_ignore

everything works fine and the tensors contain the expected values.
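If the print is really what fixes it, one hypothesis (unverified) is that printing forces the pending MPS kernels to complete before the second in-place write runs. An explicit synchronize between the two assignments would test that theory without the print (placeholder index tensors here, just to sketch the pattern; the real a1/p/a2/n come from the miner):

```python
import torch

device = "mps" if torch.backends.mps.is_available() else "cpu"

mat = torch.rand(8, 8, device=device)
mat_pos_sorting = mat.clone()
mat_neg_sorting = mat.clone()

# Placeholder indices for the sketch: just the diagonal cells
a2 = n = torch.arange(8, device=device)
a1 = p = torch.arange(8, device=device)

mat_pos_sorting[a2, n] = -10.0
if device == "mps":
    torch.mps.synchronize()  # flush the MPS command queue before the next write
mat_neg_sorting[a1, p] = 10.0
```

If inserting the synchronize also makes the results correct, that would point at an ordering/synchronization bug in the mps backend rather than in the miner itself.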

@KevinMusgrave (Owner)

Hmm, I'm not able to reproduce it yet. Can you paste in the minimal code for reproducing it?

Also, what OS are you running this on, and which versions of PyTorch and pytorch-metric-learning are you using?
