Skip to content

CodyQin/SGtSNEpiPy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

81 Commits
 
 
 
 
 
 

Repository files navigation

SGtSNEpiPy

3d_karate_club_animation

Overview

SGtSNEpiPy is a Python interface to SG-t-SNE-П, a Julia package. It enables Python users to utilize and take full advantage of the SG-t-SNE-П functionalities within the Python ecosystem. With SGtSNEpiPy, users can swiftly and directly embed and visualize a sparse graph in a $d$-dimensional space, $d=1,2,3$. The node adjacency on the graph is translated to spatial near-neighbor proximity, as illustrated in the plots above.

The input is a sparse graph $G(V,E)$ represented by its adjacency matrix A in sparse formats. The graph at input is either directed or undirected, with the edges weighted or unweighted. The output is the array of the $d$-dimensional vertex coordinates in the embedding space.

Introduction

SG-t-SNE extends t-SNE 4,5,6,7 from point feature data to graph data, especially sparse graphs represented by their adjacency matrices in sparse formats. Here, SNE stands for stochastic near-neighbor embedding, and SG stands for sparse graph. SG-t-SNE makes a direct translation of the node adjacency on the graph to spatial proximity in the embedding space. In comparison, other graph embedding methods first embed the vertices by spectral decomposition or learning, followed by SNE. By our empirical tests and by user feedback in SCARF [3], the SG-t-SNE mapping has higher fidelity and efficiency.

The SG-t-SNE algorithm was first introduced in 2019 in the paper[1]. A software SG-t-SNE-П in C/C++ was released in 2019 in [2] and then made accessible via Julia in 2021 on GitHub. This package SGtSNEpiPy makes SG-t-SNE-Pi deployable to the Python ecosystem. This Python wrapper seamlessly converts the data formats and translates all input and output arguments between Julia and Python.

Installation

To install SGtSNEpiPy through Python from PyPi, issue

$ pip install SGtSNEpiPy

The installation is straightforward: import SGtSNEpiPy and issue the command

from SGtSNEpiPy.SGtSNEpiPy import sgtsnepipy

Warning: The current version of SGtSNEpiPy requires WSL2 on Windows and Rosetta 2 on Apple ARM hardware.

More details can be found in the full documentation.

Usage

SGtSNEpiPy.sgtsnepipy

The calling sequence is simple,

Y = sgtsnepi(A, kwargs**)

where

  • A: the adjacency matrix of a graph $G(V, E)$, in the compressive sparse row (CSR) format. Matrix A has $n=|V|$ rows, $n$ columns and the $m$ nonzero elements represent the edges in $E$. The graph is directed or undirected, with weighted or unweighted edges. The graph may represent a real-world network or the graph is synthetically generated. In the former case, the vertices may represent data feature vectors/points, and the edges represent the pairwise similarities of the feature vectors.

    Data type: scipy.sparse.csr.csr_matrix

  • Y: the n x d array of vertex coordinates in the $d$-dimensional embedding space.

    Data type: numpy.ndarray.

Key arguments

  • d (Integer): positive, the dimension of the embedding space. Default value: 2

  • λ (Integer or Float): positive, SG-t-SNE scaling factor. Default Value: 10

Optional and advanced SNE arguments (for control of search area and pace)

  • max_iter (Integer): the maximum number of iterations for the SNE optimization process. Default Value: 1000

  • early_exag (Integer): the number of early exaggeration iterations. Default Value: 250

  • alpha (Integer or Float): exaggeration strength for the first early_exag iterations. Default Value: 12

  • Y0: an n x d numpy array for the initial embedding configuration in the embedding space. Default setting: None (the initial configuration is generated randomly). For reproducibility, the user is advised to set and save Y0.

  • eta (Integer or Float): the learning parameter. Default Value: 200.0

  • drop_leaf (Boolean): if True, remove leaf nodes. Default Value: False

Optional and advanced SG-t-SNE arguments (for performance tuning)

  • np (Integer): number of threads (set to 0 to use all available cores). Default Value: threading.active_count(), which returns the number of active threads in the current process.

  • h (Float): grid step length. Default Value: 1.0

  • list_grid_size (a list of integers): the list of FFT grid sizes for data interpolation. Default Value: False. The FFT module tends to be more efficient if the transform size can be factored into small prime numbers.

  • profile (Boolean): if True, the function returns performance profile in a 3-tuple : (Y, t, g), where Y is the embedding coordinate array, t is the table of the execution times of each module per iteration (size 6 x max_iter), and g consists of the grid size, the embedding domain size (maximum(Y) - minimum(Y)), and the scaling factor s_k for the band-limited version, per dimension (size 3 x max_iter). Default Value: False

  • fftw_single (Boolean): if True, use the FFTW (Fast Fourier Transform) in single precision. Default Value: False

Examples

2D Embedding of the social network

Zachary's Karate Club

This simple example illustrates the use of SGtSNEpiPy for spatial embedding and visualization of Zachary's Karate Club network, which is readily available in NetworkX. The vertex coordinate array returned by SGtSNEpiPy is passed to the plot function matplotlib.pyplot for visualization. The vertices represent the club members, they are colored according to the membership types, either 'Mr. Hi' or 'Officer'. The scatter plot is a spatial portrait of the social network's structure and patterns.

The 2D embedding code

   from SGtSNEpiPy.SGtSNEpiPy import sgtsnepipy
   import networkx as nx
   import numpy as np
   import matplotlib.pyplot as plt

   # 'G' is the Zachary's Karate Club graph with 'club' attribute for each node
   
   G = nx.karate_club_graph()
   G_sparse_matrix = nx.to_scipy_sparse_matrix(G) 
   y = sgtsnepipy(G_sparse_matrix,d=2)
   
   # Separate the X and Y coordinates from the embedding 'y'
   X = y[:, 0]
   Y = y[:, 1]
   
   # Get the color for each node based on the 'club' attribute
   node_colors = ['red' if G.nodes[node]['club'] == 'Mr. Hi' else 'blue' for node in G.nodes]
   
   # Create a scatter plot to visualize the embedding and color the nodes
   plt.scatter(X, Y, c=node_colors, alpha=0.7)
   
   # Label the nodes with their numbers (node names)
   for node, (x, y) in enumerate(zip(X, Y)):
       plt.text(x, y, str(node))
   
   plt.title("2D SG-t-SNE-П Embedding of Zachary's Karate Club")
   plt.xlabel("Dimension 1")
   plt.ylabel("Dimension 2")
   plt.show()
2D SG-t-SNE-Π Embedding of Zachary’s Karate CLub

3D Embedding of Zachary's Karate Club

For the 3D embedding, SGtSNEpiPy is used the same way as for the 2D embedding. For the 3D visualization, the graph can be viewed from various viewpoints via rotation. We created an animated gif file with matplotlib.pyplot, matplotlib.animation, and mpl_toolkits.mplot3d.axes3d.Axes3D in matplotlib.

In order to save the animation to a gif file, install Pillow through Python from PyPi by issuing the command

$ pip install pillow

The 3D embedding code

from SGtSNEpiPy.SGtSNEpiPy import sgtsnepipy
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation
from mpl_toolkits.mplot3d import axes3d

G = nx.karate_club_graph()
G_sparse_matrix = nx.to_scipy_sparse_matrix(G) 
y = sgtsnepipy(G_sparse_matrix,d=3)

# Get the color for each node based on the 'club' attribute
node_colors = ['red' if G.nodes[node]['club'] == 'Mr. Hi' else 'blue' for node in G.nodes]

# Separate the X, Y, and Z coordinates from the 3D embedding 'y'
X = y[:, 0]
Y = y[:, 1]
Z = y[:, 2]

# Create the 3D scatter plot to visualize the embedding
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(X, Y, Z, c=node_colors, cmap='coolwarm')   # One may choose other colormaps 

# Label the nodes with their names/indices  
for node, (x, y, z) in zip(G.nodes, zip(X, Y, Z)):
    ax.text(x, y, z, node)

ax.set_title("3D SG-t-SNE-П Embedding of Zachary's Karate Club")
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_zlabel('Z-axis')

# Function to initialize the animation
def init():
    scatter.set_offsets(np.column_stack([X, Y, Z]))  # Update the scatter plot data
    return scatter,

# Function to update the plot for each frame of the animation
def animate(i):
    ax.view_init(elev=30., azim=3.6*i)
    return scatter,

# Create the animation
ani = animation.FuncAnimation(fig, animate, init_func=init,
                              frames=100, interval=100, blit=True)

# Save the animation to a gif file
ani.save('3d_karate_club_animation.gif', writer='pillow')

3d_karate_club_animation

To see the spatial embedding of larger networks/graphs, visit the website SG-t-SNE-Pi

Contact

Cody (Chenshuhao) Qin: [email protected]

Aaron (Yihua) Zhong: [email protected]

Acknowledgment

Dr. N. Pitsianis, Dr. D. Floros, and Dr. X. Sun have offered valuable introductions to SG-t-SNE, sparse data formats, and Foreign Language Interfaces.

Citation

[1] Nikos Pitsianis, Alexandros-Stavros Iliopoulos, Dimitris Floros, Xiaobai Sun, Spaceland Embedding of Sparse Stochastic Graphs, In IEEE HPEC Conference, (2019).

[2] Nikos Pitsianis, Dimitris Floros, Alexandros-Stavros Iliopoulos, Xiaobai Sun, SG-t-SNE-Π: Swift Neighbor Embedding of Sparse Stochastic Graphs, JOSS, 4(39), 1577, 2019.

[3] Dhapola, P., Rodhe, J., Olofzon, R. et al. Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data, Nat Commun 13, 4616 (2022).

[4] G. Hinton, S. Roweis, Stochastic Neighbor Embedding, NIPS, 857–864, (2002).

[5] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, JMLR, 9(11) 2579−2605, (2008).

[6] L. van der Maaten, Accelerating t-SNE using tree-based algorithms, JMLR, 15(1) 3221-3245, (2014).

[7] Linderman, G.C., Rachh, M., Hoskins, J.G. et al. Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data, Nat Methods 16, 243–245 (2019)..

If you use this software, please cite the following paper:

author = {Pitsianis, Nikos and Iliopoulos, Alexandros-Stavros and Floros,
          Dimitris and Sun, Xiaobai},
	  doi = {10.1109/HPEC.2019.8916505},
	  booktitle = {IEEE High Performance Extreme Computing Conference},
	  month = {11},
	  title  = {Spaceland Embedding of Sparse Stochastic Graphs}},
	  year = {2019} }