Why is DSMI(A;A) far different from DSE(A)? #4

Open
nanguoyu opened this issue Dec 5, 2024 · 2 comments
nanguoyu commented Dec 5, 2024

Hi,

Thank you for your nice paper and released code!

I have recently been looking at how DSMI and DSE behave on some simple datasets, but I found that DSMI(A;A) is not close to DSE(A). Is this expected?

BR,

import numpy as np
from dsmi import diffusion_spectral_mutual_information
from dse import diffusion_spectral_entropy

# If DSMI were exact, DSMI(A; A) should equal DSE(A) for each sample size.
for n_samples in [100, 500, 1000, 2000]:
    embedding_vectors = np.random.uniform(0, 1, (n_samples, 10))
    DSMI, _ = diffusion_spectral_mutual_information(
        embedding_vectors=embedding_vectors,
        reference_vectors=embedding_vectors)
    DSE = diffusion_spectral_entropy(embedding_vectors=embedding_vectors)
    print(f'num of samples = {n_samples}, DSMI[embedding, embedding] = {DSMI}, DSE[embedding] = {DSE}')

Then I got

num of samples = 100, DSMI[embedding, embedding] = 0.02843197210459075, DSE[embedding] = 0.0908556597236652
num of samples = 500, DSMI[embedding, embedding] = 0.019353783018266492, DSE[embedding] = 0.09478050957948245
num of samples = 1000, DSMI[embedding, embedding] = 0.019112679773918215, DSE[embedding] = 0.0969059254282217
num of samples = 2000, DSMI[embedding, embedding] = 0.015199623500426241, DSE[embedding] = 0.09653039901116785
ChenLiu-1996 (Owner) commented Dec 5, 2024

It's a very good question!

First of all, you are right that the mutual information I(A; A) and the entropy H(A) should be equal, since I(A; A) = H(A) - H(A | A) and H(A | A) = 0.

Based on the simulation results you provided, I suspect the discrepancy is caused by the following:

On the DSMI formulation side, we use a clustering-based approximation of the conditional entropy DSE(X | Y) [the equation was shown as a screenshot in the original comment].

This is not an exact equality but an approximation: we quantize Y into a finite number of clusters and subset X based on these clusters. As a result, when you compute DSMI(X; X), the estimated DSE(X | X) is not 0, so DSMI(X; X) does not reduce to DSE(X).
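
Roughly, the approximation behaves like the following sketch (a minimal illustration, not the repository's exact implementation; the approx_conditional_dse helper, the KMeans quantization, and the cluster weighting are assumptions made only for illustration):

import numpy as np
from sklearn.cluster import KMeans
from dse import diffusion_spectral_entropy

def approx_conditional_dse(X, Y, n_clusters=10):
    # Quantize the reference variable Y into a finite number of clusters ...
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Y)
    # ... then average DSE over the subsets of X selected by each cluster
    # (the exact weighting in the paper may differ from this sketch).
    cond_dse = 0.0
    for c in np.unique(cluster_ids):
        subset = X[cluster_ids == c]
        cond_dse += (len(subset) / len(X)) * diffusion_spectral_entropy(
            embedding_vectors=subset)
    return cond_dse

X = np.random.uniform(0, 1, (1000, 10))
dse_x = diffusion_spectral_entropy(embedding_vectors=X)
cond = approx_conditional_dse(X, X)
# The estimated DSE(X | X) is not 0, so DSE(X) - DSE(X | X) != DSE(X).
print(f'DSE(X) = {dse_x}, approx DSE(X | X) = {cond}, difference = {dse_x - cond}')

Even though Y = X here, each cluster still contains many distinct points, so the within-cluster DSE stays well above zero.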

Due to this limitation, in the paper we only used DSMI for cases where Y is a set of ground-truth class labels, and we have not explored the case where Y is an intermediate embedding. I believe some altered definition of DSMI would be necessary for the latter case.
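
For reference, the intended usage with class labels looks roughly like the sketch below (assuming reference_vectors also accepts a column of integer labels; the expected shape and any label-specific arguments should be checked against dsmi.py):

import numpy as np
from dsmi import diffusion_spectral_mutual_information

n_samples, n_classes = 1000, 5
labels = np.random.randint(0, n_classes, n_samples)

# Toy embeddings whose first dimension carries the class signal.
embedding_vectors = np.random.uniform(0, 1, (n_samples, 10))
embedding_vectors[:, 0] += labels

dsmi_value, _ = diffusion_spectral_mutual_information(
    embedding_vectors=embedding_vectors,
    reference_vectors=labels.reshape(-1, 1))
print(f'DSMI(embedding; label) = {dsmi_value}')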

nanguoyu (Author) commented Dec 6, 2024


Thank you for your reply. Yes, a novel DSMI for intermediate embeddings would be a very interesting tool for exploring information propagation between layers/neurons and/or during training dynamics. I am looking forward to seeing further steps after this paper. Nice work!
