2D visualization of crowded cluster with ivis #119

Open
yuxiaokang-source opened this issue Jan 31, 2023 · 1 comment

Comments

@yuxiaokang-source

I tested the 2D visualization of ivis on the MNIST dataset. The points in the 2D figure are very crowded. My code is below:

import os
import random

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from ivis import Ivis
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import MinMaxScaler

os.environ["PYTHONHASHSEED"] = "0"

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
#tf.random.set_seed(1234)

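# as_frame=False returns NumPy arrays, so the boolean masks used for plotting index cleanly.
X, Y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
# Scale every pixel to the [0, 1] range before fitting.
X = MinMaxScaler().fit_transform(X)
target = Y.copy()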

model = Ivis(embedding_dims=2, k=15)

embeddings = model.fit_transform(X)
print(embeddings.shape)

fig = plt.figure(figsize=(20, 12))
# Plot one scatter series per digit so that each class gets its own legend entry.
for label in np.unique(target):
    mask = target == label
    plt.scatter(embeddings[mask, 0], embeddings[mask, 1], label=label)
plt.legend(loc="upper left")
plt.show()

The result is below:
image

It is very different from the UMAP visualization:
image

I have tested k=10, 20, 50, 100, and 150 with ivis; the clusters of all digits remain very crowded, with little difference between settings. In other words, I think the ideal result would have few overlapping points, as in UMAP. Could you give some advice on how to solve this crowding problem? For example, the digit 1 should be far away from every other digit, but ivis loses this information in this sense.

@idroz
Collaborator

idroz commented Jan 31, 2023

Hi there,

Generally speaking, in dimensionality reduction (DR) there is a trade-off between cluster separability and fidelity to the actual data structure. Ivis was developed to prioritise the latter.

For example, consider this figure:

Ivis does a much better job at preserving L1 and L2 distances between observations across high- and low-dimensional space. This is not the case for many other DR techniques.
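
One way to check this on your own embedding is to compare pairwise distances before and after reduction. The sketch below is a minimal illustration and not the evaluation from the paper: it samples random pairs of points, computes their L1 or L2 distance in both spaces, and reports the Pearson correlation between the two sets of distances. The function name `distance_preservation` and all of its parameters are assumptions made for this example; nothing here is part of the ivis API.

```python
import numpy as np


def distance_preservation(X_high, X_low, n_pairs=10000, ord=2, seed=0):
    """Pearson correlation between pairwise distances in two spaces.

    Hypothetical helper for illustration; not part of ivis.
    ord=1 uses L1 (Manhattan) distance, ord=2 uses L2 (Euclidean).
    """
    X_high = np.asarray(X_high, dtype=float)
    X_low = np.asarray(X_low, dtype=float)
    rng = np.random.default_rng(seed)
    n = X_high.shape[0]
    # Sample random index pairs and drop pairs that compare a point with itself.
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j
    i, j = i[keep], j[keep]
    # Distances for the same pairs, measured in each space.
    d_high = np.linalg.norm(X_high[i] - X_high[j], ord=ord, axis=1)
    d_low = np.linalg.norm(X_low[i] - X_low[j], ord=ord, axis=1)
    return float(np.corrcoef(d_high, d_low)[0, 1])
```

With the arrays from the script in the question, this could be called as `distance_preservation(X, embeddings, ord=2)`, and in the same way on the output of any other DR method. Values closer to 1 indicate that distances in the 2D plot track distances in the original space more closely.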

Also, this:

image

In other words, the separability between data points (or lack of it) is a real phenomenon of the dataset and is not artificially optimised for by the ivis algorithm.
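
To get a rough sense of how much separation the data itself contains, one can compute a simple ratio in any space: the mean distance between points with different labels divided by the mean distance between points with the same label. The sketch below is written under my own assumptions (randomly sampled pairs, Euclidean distance, a hypothetical function name `separation_ratio`) and is not a metric used by ivis.

```python
import numpy as np


def separation_ratio(X, labels, n_pairs=20000, seed=0):
    """Mean between-class distance divided by mean within-class distance.

    Hypothetical helper for illustration. Values near 1 mean the classes
    overlap heavily; larger values mean the classes sit further apart
    relative to their own spread.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Sample random index pairs and drop pairs that compare a point with itself.
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j
    i, j = i[keep], j[keep]
    dist = np.linalg.norm(X[i] - X[j], axis=1)
    same = labels[i] == labels[j]
    if not same.any() or same.all():
        raise ValueError("need sampled pairs from both the same and different classes")
    return float(dist[~same].mean() / dist[same].mean())
```

For the script in the question, `separation_ratio(X, target)` gives the ratio on the scaled pixels and `separation_ratio(embeddings, target)` gives it on the 2D embedding. Ratios computed in spaces of different dimensionality are not strictly comparable, so treat a large jump after embedding only as a hint that the method is widening the gaps between classes beyond what the input distances suggest.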

We find this property of ivis much more useful in downstream tasks, from classification and regression to metric learning. We discuss it in greater depth in our paper: https://www.nature.com/articles/s41598-019-45301-0
