Why is DSMI(A;A) far different from DSE(A)? #4

Open
nanguoyu opened this issue Dec 5, 2024 · 2 comments
nanguoyu commented Dec 5, 2024

Hi,

Thank you for your nice paper and released code!

I have recently been looking at how DSMI and DSE behave on some simple datasets, but I found that DSMI(A;A) is not close to DSE(A). Is this expected?

BR,

import numpy as np
from dsmi import diffusion_spectral_mutual_information
from dse import diffusion_spectral_entropy

# If DSMI were exact, DSMI(A; A) should equal DSE(A) for each sample size.
for n_samples in [100, 500, 1000, 2000]:
    embedding_vectors = np.random.uniform(0, 1, (n_samples, 10))
    DSMI, _ = diffusion_spectral_mutual_information(
        embedding_vectors=embedding_vectors,
        reference_vectors=embedding_vectors)
    DSE = diffusion_spectral_entropy(embedding_vectors=embedding_vectors)
    print(f'num of samples = {n_samples}, DSMI[embedding, embedding] = {DSMI}, DSE[embedding] = {DSE}')

Then I got

num of samples = 100, DSMI[embedding, embedding] = 0.02843197210459075, DSE[embedding] = 0.0908556597236652
num of samples = 500, DSMI[embedding, embedding] = 0.019353783018266492, DSE[embedding] = 0.09478050957948245
num of samples = 1000, DSMI[embedding, embedding] = 0.019112679773918215, DSE[embedding] = 0.0969059254282217
num of samples = 2000, DSMI[embedding, embedding] = 0.015199623500426241, DSE[embedding] = 0.09653039901116785
ChenLiu-1996 (Owner) commented Dec 5, 2024

It's a very good question!

First of all, you are right that the mutual information I(A; A) and the entropy H(A) should be equal, since I(A; A) = H(A) - H(A | A) and H(A | A) = 0.

Based on the simulation results you provided, I suspect the discrepancy is caused by the following:

On the DSMI formulation side, we use a clustering-based approximation of the conditional entropy DSE(X | Y) [the equation was shown as a screenshot in the original comment].

This is not an exact equality but an approximation: we quantize Y into a finite number of clusters and subset X based on these clusters. As a result, when you compute DSMI(X; X), the estimated DSE(X | X) is not 0, so DSMI(X; X) does not reduce to DSE(X).
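
Roughly, the approximation behaves like the following sketch (a minimal illustration, not the repository's exact implementation; the approx_conditional_dse helper, the KMeans quantization, and the cluster weighting are assumptions made only for illustration):

import numpy as np
from sklearn.cluster import KMeans
from dse import diffusion_spectral_entropy

def approx_conditional_dse(X, Y, n_clusters=10):
    # Quantize the reference variable Y into a finite number of clusters ...
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Y)
    # ... then average DSE over the subsets of X selected by each cluster
    # (the exact weighting in the paper may differ from this sketch).
    cond_dse = 0.0
    for c in np.unique(cluster_ids):
        subset = X[cluster_ids == c]
        cond_dse += (len(subset) / len(X)) * diffusion_spectral_entropy(
            embedding_vectors=subset)
    return cond_dse

X = np.random.uniform(0, 1, (1000, 10))
dse_x = diffusion_spectral_entropy(embedding_vectors=X)
cond = approx_conditional_dse(X, X)
# The estimated DSE(X | X) is not 0, so DSE(X) - DSE(X | X) != DSE(X).
print(f'DSE(X) = {dse_x}, approx DSE(X | X) = {cond}, difference = {dse_x - cond}')

Even though Y = X here, each cluster still contains many distinct points, so the within-cluster DSE stays well above zero.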

Due to this limitation, in the paper we only used DSMI for cases where Y is a set of ground-truth class labels, and we have not explored the case where Y is an intermediate embedding. I believe some altered definition of DSMI would be necessary for the latter case.
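
For reference, the intended usage with class labels looks roughly like the sketch below (assuming reference_vectors also accepts a column of integer labels; the expected shape and any label-specific arguments should be checked against dsmi.py):

import numpy as np
from dsmi import diffusion_spectral_mutual_information

n_samples, n_classes = 1000, 5
labels = np.random.randint(0, n_classes, n_samples)

# Toy embeddings whose first dimension carries the class signal.
embedding_vectors = np.random.uniform(0, 1, (n_samples, 10))
embedding_vectors[:, 0] += labels

dsmi_value, _ = diffusion_spectral_mutual_information(
    embedding_vectors=embedding_vectors,
    reference_vectors=labels.reshape(-1, 1))
print(f'DSMI(embedding; label) = {dsmi_value}')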

nanguoyu (Author) commented Dec 6, 2024


Thank you for your reply. Yes, a novel DSMI for intermediate embeddings would be a very interesting tool for exploring information propagation between layers/neurons and/or during training dynamics. I am looking forward to seeing further steps after this paper. Nice work!
