This repository implements tools for analysing the latent spaces of Neural Algorithmic Reasoners (NARs) and their symmetries, as described in the paper

**Latent Space Representations of Neural Algorithmic Reasoners**

by Vladimir V. Mirjanić, Razvan Pascanu, and Petar Veličković (ICML 2023 Workshop).
Neural Algorithmic Reasoning (NAR) is a research area focused on designing neural architectures that can reliably capture classical computation, usually by learning to execute algorithms. A typical approach is to rely on Graph Neural Network (GNN) architectures, which encode inputs in high-dimensional latent spaces that are repeatedly transformed during the execution of the algorithm. In this work we perform a detailed analysis of the structure of the latent space induced by the GNN when executing algorithms.
We identify two possible failure modes:
- loss of resolution, making it hard to distinguish similar values;
- inability to deal with values outside the range observed during training.
We propose to solve the first issue by relying on a softmax aggregator, and propose to decay the latent space in order to deal with out-of-range values. We show that these changes lead to improvements on the majority of algorithms in the standard CLRS-30 benchmark when using the state-of-the-art Triplet-GMPNN processor.
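For intuition, the sketch below shows one way a softmax aggregator and latent decay can look in JAX. It is a minimal sketch, not the repository's implementation (which lives in `processors.py` and `nets.py`); the function names, shapes, and the explicit temperature parameter are our assumptions.

```python
# Minimal, illustrative JAX sketch of the two fixes; function names, shapes,
# and the temperature parameter are assumptions, not the repository's API.
import jax
import jax.numpy as jnp


def softmax_aggregate(messages: jnp.ndarray, temperature: float = 1.0) -> jnp.ndarray:
    """Soft, differentiable stand-in for max aggregation.

    messages: [num_nodes, num_neighbours, hidden_dim].
    A hard elementwise max keeps only the largest message per feature, which
    loses resolution between close values; a softmax-weighted sum over the
    neighbour axis keeps nearby values distinguishable while approaching the
    hard max as temperature -> 0.
    """
    weights = jax.nn.softmax(messages / temperature, axis=1)
    return jnp.sum(weights * messages, axis=1)


def decayed_step(processor_output: jnp.ndarray, decay: float = 0.9) -> jnp.ndarray:
    """Shrink latents after each processor step (decay < 1), which keeps
    activations closer to the range the network saw during training."""
    return decay * processor_output
```

As the temperature goes to zero the weighted sum approaches a hard max, so the aggregator trades a little sharpness for the ability to keep close values apart.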
Trajectory-wise PCA of random samples in the latent space:
Step-wise PCA of random samples in the latent space:
16 clusters under the reweighting symmetry of the Bellman-Ford algorithm:
Latent spaces under scaling symmetry, for one equivalence class:
Latent spaces under scaling symmetry, for many equivalence classes:
This repository is based on Google DeepMind's CLRS Algorithmic Reasoning Benchmark, `deepmind/clrs`.
A standard way to run the code is to first train the NAR with `run.py`, then to extract trajectories by passing the `test` flag, and finally to use the trajectories in various Jupyter notebooks to generate graphical visualisations.
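To give a flavour of what those notebooks do, here is a minimal sketch of a trajectory-wise PCA plot. The file name and the `[steps, nodes, hidden_dim]` layout are assumptions for illustration; the actual loading code lives in the notebooks.

```python
# Minimal sketch of a trajectory-wise PCA plot; the file name and the
# [steps, nodes, hidden_dim] layout are assumptions for illustration.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

trajectory = np.load("trajectory.npy")  # hypothetical dump from run.py
steps, nodes, hidden_dim = trajectory.shape

# Trajectory-wise: one PCA fitted on the latents of every step at once.
proj = PCA(n_components=2).fit_transform(
    trajectory.reshape(steps * nodes, hidden_dim)
).reshape(steps, nodes, 2)

for step in range(steps):
    # One colour per algorithm step (matplotlib cycles colours per call).
    plt.scatter(proj[step, :, 0], proj[step, :, 1], s=5)
plt.show()
```

The step-wise variant differs only in fitting a separate PCA on each step's latents before plotting.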
The features this repository adds on top of the original can be summarised as follows:
- In the `clrs/` directory, which contains the original DeepMind code, we make the following changes:
  - We edit `examples/run.py` to provide a mechanism for extracting trajectories from samples, as well as the following flags:
    - `noise_injection_strategy`: enum that sets the type of noise added to the hiddens after each processor step. Some strategies (`Directional`, etc.) require principal components to be generated beforehand, while others (`Uniform` and `Corrupt`) do not. Default is `Noisefree`.
    - `softmax_reduction`: boolean that toggles softmax aggregation on or off. Default is off.
    - `decay`: float that specifies the amount of decay applied to the processor. Default is `1`.
    - `test`: boolean that, when set, restores the trained model and uses it to generate trajectories.
    - `sample_strat`: only applicable to the Bellman-Ford algorithm; controls how graphs are sampled.
  - We edit `nets.py` to add support for softmax, decay, and noise, as well as to enable feature extraction.
  - We implement softmax aggregation and a linear PGN in `processors.py`.
  - We modify `samplers.py` to support custom sampling methods, remaining backwards compatible by wrapping sampling in a stream. We implement multiple sampling strategies for Bellman-Ford that target its scaling, permutation, and reweighting symmetries (sketched below).
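For intuition about those symmetries, here is a minimal NumPy sketch of the corresponding input transforms on a dense weight matrix; the function names and representation are illustrative assumptions, not the samplers' actual interface.

```python
# Minimal NumPy sketch of input transforms matching the three symmetries;
# names and representation are assumptions, not the samplers' interface.
import numpy as np


def scale(weights: np.ndarray, c: float) -> np.ndarray:
    """Scaling symmetry: every shortest-path distance scales by c (c > 0)."""
    return c * weights


def permute(weights: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Permutation symmetry: relabelling the nodes permutes the solution."""
    return weights[np.ix_(perm, perm)]


def reweight(weights: np.ndarray, potentials: np.ndarray) -> np.ndarray:
    """Reweighting symmetry (Johnson-style): w'(u, v) = w(u, v) + h(u) - h(v)
    changes every s-t path weight by the constant h(s) - h(t), so the
    shortest-path tree (Bellman-Ford's predecessor output) is unchanged."""
    return weights + potentials[:, None] - potentials[None, :]
```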
- In the `lsr/` directory we provide tools for studying the latent space representations:
  - In `figure_generators/` are the notebooks used to generate the figures from the paper. These notebooks also produce plots that did not make it into the paper or the appendix, but that are useful in their own right.
  - We are unable to provide the trajectories themselves due to GitHub's size limit of 2 GB per file; however, we provide ample figures in the `plots/` directory.
  - In `alternative_ford.py` we provide several faulty implementations of Bellman-Ford against which the correct algorithm is evaluated (a flavour of such a comparison is sketched after this list).
  - In `noise_injection.py` we implement multiple methods of adding noise to the embeddings.
  - In `experment.py` we automate many of the plotting functions used to generate figures.
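To illustrate the kind of comparison this enables, below is a minimal sketch of a correct Bellman-Ford next to one hypothetical faulty variant; the actual faulty implementations in `alternative_ford.py` may differ.

```python
# Illustrative only: the actual faulty variants in alternative_ford.py may differ.
import numpy as np


def bellman_ford(weights: np.ndarray, source: int) -> np.ndarray:
    """Correct: relax every edge n - 1 times (weights[u, v] = inf if no edge)."""
    n = weights.shape[0]
    dist = np.full(n, np.inf)
    dist[source] = 0.0
    for _ in range(n - 1):
        for u in range(n):
            for v in range(n):
                dist[v] = min(dist[v], dist[u] + weights[u, v])
    return dist


def bellman_ford_one_round(weights: np.ndarray, source: int) -> np.ndarray:
    """Faulty: a single relaxation round only finds one-hop paths."""
    n = weights.shape[0]
    dist = np.full(n, np.inf)
    dist[source] = 0.0
    for u in range(n):
        for v in range(n):
            dist[v] = min(dist[v], dist[u] + weights[u, v])
    return dist
```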
- In
- In the `results/` directory we provide `runner.sh`, which was used to run the entire test suite (Triplet-GMPNN processor, 30,000 train steps, seeds in range (42, 47), softmax aggregation and processor decay each either enabled or disabled). The raw test results used to generate the tables in the paper are in `clrs_summary.txt`, and the code that generates the LaTeX tables is in `summarise.ipynb`.