2D visualization of crowded cluster with ivis #119

yuxiaokang-source · 2023-01-31T08:27:08Z

I tested 2D visualization with ivis on the MNIST dataset. The points in the resulting 2D figure are very crowded. My code is below:

import os
import random

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from ivis import Ivis
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import MinMaxScaler

# PYTHONHASHSEED only affects hashing when it is set before the interpreter
# starts, so setting it here may have no effect on the current process.
os.environ["PYTHONHASHSEED"] = "0"

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
#tf.random.set_seed(1234)

X, Y = fetch_openml('mnist_784', version=1, return_X_y=True)
X = MinMaxScaler().fit_transform(X)
target = Y.copy()

model = Ivis(embedding_dims=2, k=15)

embeddings = model.fit_transform(X)
print(embeddings.shape)
                                 
fig = plt.figure(figsize=(20, 12))
for label in np.unique(target):
    plt.scatter(embeddings[label == target, 0], embeddings[label == target, 1], label=label)
plt.legend(loc="upper left")
plt.show()

The result is below:

It is very different from the UMAP visualization:

I have tested k=10, 20, 50, 100, and 150 with ivis. In every case the clusters for all digits are very crowded, with little difference between runs. In other words, I think the ideal result would have few overlapping points, as with UMAP. Could you give some advice on how to solve this crowding problem? For example, the digit 1 should be far away from every other digit, but ivis loses this information in this sense.


idroz · 2023-01-31T09:12:03Z

Hi there,

Generally speaking, in dimensionality reduction (DR) there is a trade-off between cluster separability and fidelity to actual data structure. Ivis was developed to prioritise the latter.

For example, consider this figure:

Ivis does a much better job at preserving L1 and L2 distances between observations across high- and low-dimensional space. This is not the case for many other DR techniques.
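
One way to check this kind of claim on your own data is to compare pairwise distances before and after embedding. What follows is only a minimal sketch and is not part of ivis: the helper names `pairwise_distances` and `pairwise_distance_agreement`, the subsample size, and the synthetic demo data are all assumptions made for illustration. It draws a random subsample, computes all pairwise L1 or L2 distances in both spaces, and reports their Pearson correlation (values closer to 1 suggest distances are better preserved).

```python
import numpy as np


def pairwise_distances(points, order=2):
    """Return all pairwise distances as a flat vector (order=1 for L1, 2 for L2)."""
    points = np.asarray(points, dtype=float)
    # Work row by row to avoid building an n x n x d array in memory.
    rows = [
        np.linalg.norm(points[i + 1:] - points[i], ord=order, axis=1)
        for i in range(len(points) - 1)
    ]
    return np.concatenate(rows)


def pairwise_distance_agreement(X_high, X_low, order=2, n_samples=500, seed=0):
    """Pearson correlation between pairwise distances in the two spaces.

    Hypothetical helper for illustration; the subsample keeps the cost manageable.
    """
    X_high = np.asarray(X_high, dtype=float)
    X_low = np.asarray(X_low, dtype=float)
    rng = np.random.default_rng(seed)
    size = min(n_samples, len(X_high))
    idx = rng.choice(len(X_high), size=size, replace=False)
    d_high = pairwise_distances(X_high[idx], order)
    d_low = pairwise_distances(X_low[idx], order)
    return float(np.corrcoef(d_high, d_low)[0, 1])


# Synthetic demo (assumed data, not MNIST): 2D points zero-padded into 10 dimensions.
demo_rng = np.random.default_rng(0)
low = demo_rng.normal(size=(300, 2))
high = np.hstack([low, np.zeros((300, 8))])

# A perfect embedding keeps every distance; a shuffled one keeps none of them.
faithful_score = pairwise_distance_agreement(high, low, order=2)
shuffled_score = pairwise_distance_agreement(high, low[demo_rng.permutation(300)], order=2)
print(faithful_score, shuffled_score)
```

With the variables from the original post, this would be called as `pairwise_distance_agreement(X, embeddings, order=1)` or with `order=2`, and the same call could be repeated on a UMAP embedding for comparison.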

Also, this:

In other words, the separability between data points (or lack of it) is a real phenomenon of the dataset and is not artificially optimised for by the ivis algorithm.

We find this property of ivis much more useful in downstream tasks, from classification and regression to metric learning. We discuss this property in greater depth in our paper: https://www.nature.com/articles/s41598-019-45301-0
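
To judge an embedding by downstream usefulness rather than by how separated it looks, one rough option is to score a simple classifier on the embedded coordinates. The sketch below is an assumption-laden illustration and not the evaluation used in the paper: `knn_holdout_accuracy` is a hypothetical helper, the hold-out split and `k=5` are arbitrary choices, and the demo data are synthetic blobs rather than MNIST.

```python
import numpy as np


def knn_holdout_accuracy(embeddings, labels, k=5, test_fraction=0.2, seed=0):
    """Hold-out accuracy of a k-nearest-neighbour vote on embedded points."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(embeddings))
    n_test = max(1, int(len(embeddings) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]

    correct = 0
    for i in test_idx:
        # Euclidean distance from the test point to every training point.
        dists = np.linalg.norm(embeddings[train_idx] - embeddings[i], axis=1)
        nearest = train_idx[np.argsort(dists)[:k]]
        # Majority vote among the k nearest training labels.
        values, counts = np.unique(labels[nearest], return_counts=True)
        correct += int(values[np.argmax(counts)] == labels[i])
    return correct / n_test


# Synthetic demo (assumed data): two well-separated 2D blobs.
demo_rng = np.random.default_rng(0)
points = np.vstack([
    demo_rng.normal(loc=0.0, size=(100, 2)),
    demo_rng.normal(loc=10.0, size=(100, 2)),
])
classes = np.array(["a"] * 100 + ["b"] * 100)
demo_accuracy = knn_holdout_accuracy(points, classes)
print(demo_accuracy)
```

With the variables from the original post, the call would be `knn_holdout_accuracy(embeddings, target)`. On all 70,000 MNIST points the loop is slow, so a random subsample is probably the practical choice.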
