PyG 2.1.0: Principled aggregations, link-level and temporal samplers, data pipe support, ...
We are excited to announce the release of PyG 2.1.0!
PyG 2.1.0 is the culmination of work from over 60 contributors who have worked on features and bug-fixes for a total of over 320 commits since torch-geometric==2.0.4.
Highlights
Principled Aggregations
See here for the accompanying tutorial.
Aggregation functions play an important role in the message passing framework and the readout functions of Graph Neural Networks. Specifically, many works in the literature (Hamilton et al. (2017), Xu et al. (2018), Corso et al. (2020), Li et al. (2020), Tailor et al. (2021), Bartunov et al. (2022)) demonstrate that the choice of aggregation functions contributes significantly to the representational power and performance of the model.
To facilitate further experimentation and unify the concepts of aggregation within GNNs across both MessagePassing and global readouts, we have made the concept of Aggregation a first-class principle in PyG (#4379, #4522, #4687, #4721, #4731, #4762, #4749, #4779, #4863, #4864, #4865, #4866, #4872, #4927, #4934, #4935, #4957, #4973, #4973, #4986, #4995, #5000, #5021, #5034, #5036, #5039, #4522, #5033, #5085, #5097, #5099, #5104, #5113, #5130, #5098, #5191). As of now, PyG provides support for various aggregations, from simple ones (e.g., mean, max, sum), to advanced ones (e.g., median, var, std), learnable ones (e.g., SoftmaxAggregation, PowerMeanAggregation), and exotic ones (e.g., LSTMAggregation, SortAggregation, EquilibriumAggregation). Furthermore, multiple aggregations can be combined and stacked together:
from torch_geometric.nn import MessagePassing, SoftmaxAggregation

class MyConv(MessagePassing):
    def __init__(self, ...):
        # Combines a set of aggregations and concatenates their results.
        # The interface also supports automatic resolution.
        super().__init__(aggr=['mean', 'std', SoftmaxAggregation(learn=True)])
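To make the idea of combining aggregations concrete, here is a minimal sketch in plain Python (no PyG involved) of what a list of aggregations conceptually computes: each aggregation reduces the messages arriving at a target node, and the per-node results are concatenated. The helper names (`group_by_target`, `mean_aggr`, `softmax_aggr`, `multi_aggr`) and the fixed temperature `t` are assumptions made for illustration; PyG's own classes operate on tensors and may differ in detail.

```python
import math
from collections import defaultdict


def group_by_target(messages, index):
    """Group scalar messages by the target node they are sent to."""
    groups = defaultdict(list)
    for value, target in zip(messages, index):
        groups[target].append(value)
    return groups


def mean_aggr(values):
    return sum(values) / len(values)


def softmax_aggr(values, t=1.0):
    """Softmax-weighted sum; `t` is a temperature (fixed here, not learned)."""
    shift = max(t * v for v in values)  # subtract the max for numerical stability
    weights = [math.exp(t * v - shift) for v in values]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))


def multi_aggr(messages, index, aggrs):
    """Apply every aggregation per target node and concatenate the results."""
    groups = group_by_target(messages, index)
    return {
        node: [aggr(vals) for aggr in aggrs]
        for node, vals in sorted(groups.items())
    }


messages = [1.0, 3.0, 2.0, 2.0]  # one scalar message per edge
index = [0, 0, 1, 1]             # target node of each message
out = multi_aggr(messages, index, [mean_aggr, max, softmax_aggr])
print(out)
```

Each node ends up with one value per aggregation, which is the analogue of the concatenated feature vector produced when a list is passed as `aggr`.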
Link-level Neighbor Loader
We added a new LinkNeighborLoader class for training scalable GNNs that perform edge-level predictions on giant graphs (#4396, #4439, #4441, #4446, #4508, #4509, #4868). LinkNeighborLoader comes with automatic support for both homogeneous and heterogeneous data, and supports link prediction via automatic negative sampling as well as edge-level classification and regression models:
from torch_geometric.loader import LinkNeighborLoader
loader = LinkNeighborLoader(
    data,
    num_neighbors=[30] * 2,  # Sample 30 neighbors for each node for 2 iterations
    batch_size=128,  # Use a batch size of 128 for sampling training links
    edge_label_index=data.edge_index,  # Use the entire graph for supervision
    negative_sampling_ratio=1.0,  # Sample negative edges
)
sampled_data = next(iter(loader))
print(sampled_data)
>>> Data(x=[1368, 1433], edge_index=[2, 3103], edge_label_index=[2, 256], edge_label=[256])
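The shapes in the printed output follow from the arguments: 128 positive links plus 128 sampled negatives (a ratio of 1.0) give 256 entries in edge_label_index and edge_label. The sketch below reproduces that bookkeeping in plain Python. The helper `sample_link_batch` is hypothetical, and rejecting candidates that collide with the positive links is an assumption of this sketch, not a statement about how the loader samples.

```python
import random


def sample_link_batch(pos_edges, num_nodes, neg_ratio=1.0, seed=0):
    """Return (edges, labels): positives labeled 1 plus sampled negatives labeled 0."""
    rng = random.Random(seed)
    known = set(pos_edges)
    num_neg = int(len(pos_edges) * neg_ratio)
    negatives = []
    while len(negatives) < num_neg:
        candidate = (rng.randrange(num_nodes), rng.randrange(num_nodes))
        # Assumption: reject pairs that are positive links of this batch.
        if candidate not in known:
            negatives.append(candidate)
    edges = list(pos_edges) + negatives
    labels = [1] * len(pos_edges) + [0] * num_neg
    return edges, labels


pos_edges = [(i, i + 1) for i in range(128)]  # a toy batch of 128 positive links
edges, labels = sample_link_batch(pos_edges, num_nodes=1000)
print(len(edges), sum(labels))
```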
Neighborhood Sampling based on Temporal Constraints
Both NeighborLoader and LinkNeighborLoader now support temporal sampling via the time_attr argument (#4025, #4877, #4908, #5137, #5173). If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e., neighbors have an earlier timestamp than the center node:
import torch
from torch_geometric.loader import NeighborLoader

# Assign one timestamp per paper node.
data['paper'].time = torch.arange(data['paper'].num_nodes)
loader = NeighborLoader(
    data,
    input_nodes='paper',
    time_attr='time',  # Only sample papers that appeared before the seed paper
    num_neighbors=[30] * 2,
    batch_size=128,
)
Note that this feature requires torch-sparse>=0.6.14.
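The temporal constraint can be illustrated with a short plain-Python sketch: before sampling, the candidate neighbors of a seed node are filtered down to those with an earlier timestamp. The helper `sample_temporal_neighbors` and the toy graph are hypothetical. The strict comparison follows the wording above; how ties are treated is an assumption here.

```python
import random


def sample_temporal_neighbors(adj, time, seed_node, num_neighbors, rng):
    """Sample up to `num_neighbors` neighbors that are older than `seed_node`."""
    # Keep only neighbors whose timestamp is strictly earlier (assumption on ties).
    valid = [n for n in adj.get(seed_node, []) if time[n] < time[seed_node]]
    if len(valid) <= num_neighbors:
        return valid
    return rng.sample(valid, num_neighbors)


# Toy citation graph: adj[v] lists the papers linked to paper v.
adj = {3: [0, 1, 2, 4, 5], 1: [0, 3]}
time = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}  # e.g., publication order

rng = random.Random(0)
picked = sample_temporal_neighbors(adj, time, seed_node=3, num_neighbors=2, rng=rng)
older_only = sample_temporal_neighbors(adj, time, seed_node=1, num_neighbors=2, rng=rng)
print(picked, older_only)
```

Papers 4 and 5 can never be drawn for seed paper 3, because they appeared after it.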
Functional DataPipes
See here for the accompanying example.
PyG now fully supports data loading using the newly introduced concept of DataPipes in PyTorch for easily constructing flexible and performant data pipelines (#4302, #4345, #4349). PyG provides DataPipe support for batching multiple PyG data objects together and for applying any PyG transform:
from torchdata.datapipes.iter import FileLister, FileOpener

# Pipeline 1: build molecular graphs from SMILES strings stored in a CSV file.
datapipe = FileOpener(['SMILES_HIV.csv'])
datapipe = datapipe.parse_csv_as_dict()
datapipe = datapipe.parse_smiles(target_key='HIV_active')
datapipe = datapipe.in_memory_cache()  # Cache graph instances in-memory.
datapipe = datapipe.shuffle()
datapipe = datapipe.batch_graphs(batch_size=32)

# Pipeline 2: read meshes from the `*.off` files found under `root_dir`.
datapipe = FileLister([root_dir], masks='*.off', recursive=True)
datapipe = datapipe.read_mesh()
datapipe = datapipe.in_memory_cache()  # Cache graph instances in-memory.
datapipe = datapipe.sample_points(1024)  # Use PyG transforms from here.
datapipe = datapipe.knn_graph(k=8)
datapipe = datapipe.shuffle()
datapipe = datapipe.batch_graphs(batch_size=32)
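For readers new to this chaining style, the following plain-Python sketch mimics the pattern: every step wraps the previous one and returns a new iterable, so nothing is computed until the pipeline is iterated. `ToyPipe` and its methods are hypothetical stand-ins written for illustration; they are not the torchdata or PyG API.

```python
import random


class ToyPipe:
    """A toy iterable that supports chained, lazily evaluated steps."""

    def __init__(self, source):
        self._source = source  # a callable returning a fresh iterator

    def __iter__(self):
        return self._source()

    def map(self, fn):
        return ToyPipe(lambda: (fn(item) for item in self))

    def shuffle(self, seed=0):
        def gen():
            items = list(self)  # materialize, then shuffle deterministically
            random.Random(seed).shuffle(items)
            return iter(items)
        return ToyPipe(gen)

    def batch(self, batch_size):
        def gen():
            chunk = []
            for item in self:
                chunk.append(item)
                if len(chunk) == batch_size:
                    yield chunk
                    chunk = []
            if chunk:  # keep the final, smaller batch
                yield chunk
        return ToyPipe(gen)


pipe = ToyPipe(lambda: iter(range(10)))
pipe = pipe.map(lambda i: i * i)  # stands in for parsing or a transform
pipe = pipe.shuffle()
pipe = pipe.batch(batch_size=4)
batches = list(pipe)
print(batches)
```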
Breaking Changes
- The torch_geometric.utils.metric package has been removed. We now recommend using the torchmetrics package instead.
- len(batch) of the data.Batch class will now return the number of graphs inside the batch, not the number of attributes (#4931)
Deprecations
- The usage of the torch_geometric.nn.glob package is now deprecated in favor of torch_geometric.nn.aggr
- The usage of RandomTranslate is now deprecated in favor of RandomJitter (#4828)
Features
Layers, Models and Examples
- Added the GroupAddRev module with support for reducing training GPU memory (#4671, #4701, #4715, #4730) [Example]
- Added the MaskLabel module for performing masked label propagation (#4197) [Example]
- Added the DimeNetPlusPlus module (#4432, #4699, #4700, #4800)
- Added the MeanSubtractionNorm module (#5068)
- Added the DynamicBatchSampler for filling a mini-batch with a variable number of samples up to a maximum size (#4972)
- Added PyTorch Lightning support in GraphGym (#4511, #4516, #4531, #4689, #4843)
- Added an example of using PyG with PyTorch Ignite (#4487)
- Added an example for unsupervised heterogeneous graph learning (#3189)
- Added an example for unsupervised GraphSAGE on the PPI dataset (#4416)
- Added the EdgeCNN model (#4991)
- Added an example to load a trained PyG model in C++ (#4307)
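The idea behind filling a mini-batch with a variable number of samples up to a maximum size can be sketched in a few lines of plain Python: samples are added greedily until the next one would exceed the budget. The function `dynamic_batches` is a hypothetical illustration, and skipping samples that can never fit is an assumption of this sketch; neither describes the exact behavior of DynamicBatchSampler.

```python
def dynamic_batches(sizes, max_num):
    """Yield lists of sample indices whose total size stays within `max_num`."""
    batch, total = [], 0
    for idx, size in enumerate(sizes):
        if size > max_num:
            # Assumption: samples that can never fit are skipped.
            continue
        if batch and total + size > max_num:
            yield batch  # budget exhausted, start a new batch
            batch, total = [], 0
        batch.append(idx)
        total += size
    if batch:
        yield batch


num_nodes_per_graph = [30, 50, 40, 120, 10, 60]
batches = list(dynamic_batches(num_nodes_per_graph, max_num=100))
print(batches)  # [[0, 1], [2, 4], [5]]
```

Batches hold a different number of graphs each, but none exceeds the budget of 100 nodes.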
Transformations
- Added the AddPositionalEncoding transforms with two implementations: AddLaplacianEigenvectorPE and AddRandomWalkPE (#4521)
- Added the Rooted transform with two implementations: RootedEgoNets and RootedRWSubgraph (#3926)
- Added support for computing weighted metapaths in AddMetaPaths (#5049)
Datasets
- Added the Genius and Wiki datasets to the LINKXDataset (#4570, #4600)
- Added the AQSOL dataset (#4626)
- Added Geom-GCN splits to the Planetoid datasets (#4442)
General Improvements
- Added support for GATv2Conv in the GAT model (#4357)
- Added support for projecting features before propagation in SAGEConv (#4437)
- Added a MessagePassing.explain_message() method to customize making explanations on messages (#4278, #4448)
- Added the MLP.plain_last = False option (#4652)
- Added support for graph-level attributes in networkx conversion (#4343)
- Added Data.validate() and HeteroData.validate() functionality to validate the correctness of the data (#4885)
- Added TorchScript support to the JumpingKnowledge module (#4805)
- Added predict() support to the LightningNodeData module (#4884)
- Added support for renaming node types via HeteroData.rename() (#4329)
- Added HeteroData.num_features functionality (#4504)
- Added HeteroData.subgraph, HeteroData.node_type_subgraph and HeteroData.edge_type_subgraph functionality (#4243)
- Added HeteroData support to the RemoveIsolatedNodes transform (#4479)
- Added support for graph-level outputs in to_hetero (#4582)
- Added HeteroData.is_undirected() support (#4604)
- Added HeteroData.node_items() and HeteroData.edge_items() functionality (#4644)
- Added HeteroData.subgraph() support (#4635)
- Added node-wise normalization mode in LayerNorm (#4944)
- Added utils.unbatch and utils.unbatch_edge_index functionality for splitting an edge_index tensor according to a batch vector (#4628, #4903)
- Added scalable inference mode in BasicGNN with layer-wise neighbor loading (#4977)
- Added fine-grained options for setting bias and dropout per layer in the MLP model (#4981)
- Added support for BasicGNN models within to_hetero (#5091)
- Let ImbalancedSampler accept torch.Tensor as input (#5138)
- Allow edge_type == rev_edge_type argument in RandomLinkSplit (#4757)
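As an illustration of what splitting an edge_index according to a batch vector means, here is a minimal plain-Python sketch. The function `split_edge_index` is a hypothetical stand-in that works on nested lists. It assumes that the batch vector is sorted, that every edge stays within one graph, and that node indices are shifted so each graph starts at 0.

```python
def split_edge_index(edge_index, batch):
    """Split a [2, E] edge list into one locally indexed edge list per graph."""
    num_graphs = max(batch) + 1
    # Offset of each graph = index of its first node (batch is assumed sorted).
    offsets = [batch.index(g) for g in range(num_graphs)]
    out = [([], []) for _ in range(num_graphs)]
    for src, dst in zip(*edge_index):
        g = batch[src]  # an edge belongs to the graph of its source node
        out[g][0].append(src - offsets[g])
        out[g][1].append(dst - offsets[g])
    return out


# Two graphs in one batch: nodes 0-2 form graph 0, nodes 3-4 form graph 1.
edge_index = [[0, 1, 1, 2, 3, 4],
              [1, 0, 2, 1, 4, 3]]
batch = [0, 0, 0, 1, 1]
graphs = split_edge_index(edge_index, batch)
print(graphs)
```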
Bugfixes
- Fixed a bug in RGATConv that produced device mismatches for "f-scaled" mode (#5187)
- Fixed a bug in GINEConv for non-Sequential neural network layers (#5154)
- Fixed a bug in HGTLoader which produced outputs with missing edge types; the fix requires torch-sparse>=0.6.15 (#5067)
- Fixed a bug in load_state_dict for Linear with strict=False mode (#5094)
- Fixed data.num_node_features computation for sparse matrices (#5089)
- Fixed a bug in which GraphGym did not create new non-linearity functions but re-used existing ones (#4978)
- Fixed BasicGNN for num_layers=1, which now respects a desired number of out_channels (#4943)
- Fixed a bug in data.subgraph for 0-dim tensors (#4932)
- Fixed a bug in InMemoryDataset inferring wrong length for lists of tensors (#4837)
- Fixed a bug in TUDataset where pre_filter was not applied whenever pre_transform was present (#4842)
- Fixed access of edge types in HeteroData via two node types when there exist multiple relations between them (#4782)
- Fixed a bug in HANConv in which destination node features rather than source node features were propagated (#4753)
- Fixed a ranking protocol bug in the RGCN link prediction example (#4688)
- Fixed the interplay between TUDataset and pre_transform transformations that modify node features (#4669)
- The bias argument in TAGConv is now correctly applied (#4597)
- Fixed filtering of attributes in samplers in case __cat_dim__ != 0 (#4629)
- Fixed SparseTensor support in NeighborLoader (#4320)
- Fixed average degree handling in PNAConv (#4312)
- Fixed a bug in from_networkx in case some attributes are PyTorch tensors (#4486)
- Fixed a missing clamp in the DimeNet model (#4506, #4562)
- Fixed the download link in DBP15K (#4428)
- Fixed an autograd bug in DimeNet when resetting parameters (#4424)
- Fixed bipartite message passing in case flow="target_to_source" (#4418)
- Fixed a bug in which num_nodes was not properly updated in the FixedPoints transform (#4394)
- Fixed a bug in which GATConv was not jittable (#4347)
- Fixed a bug in which nn.models.GAT did not produce out_channels-many output channels (#4299)
- Fixed a bug in mini-batching with empty lists as attributes (#4293)
- Fixed a bug in which GCNConv could not be combined with to_hetero on heterogeneous graphs with one node type (#4279)
Full Changelog
Added
- Added edge_label_time argument to LinkNeighborLoader (#5137, #5173)
- Let ImbalancedSampler accept torch.Tensor as input (#5138)
- Added flow argument to gcn_norm to correctly normalize the adjacency matrix in GCNConv (#5149)
- NeighborSampler supports graphs without edges (#5072)
- Added the MeanSubtractionNorm layer (#5068)
- Added pyg_lib.segment_matmul integration within RGCNConv (#5052, #5096)
- Support SparseTensor as edge label in LightGCN (#5046)
- Added support for BasicGNN models within to_hetero (#5091)
- Added support for computing weighted metapaths in AddMetaPaths (#5049)
- Added inference benchmark suite (#4915)
- Added a dynamically sized batch sampler for filling a mini-batch with a variable number of samples up to a maximum size (#4972)
- Added fine-grained options for setting bias and dropout per layer in the MLP model (#4981)
- Added EdgeCNN model (#4991)
- Added scalable inference mode in BasicGNN with layer-wise neighbor loading (#4977)
- Added inference benchmarks (#4892, #5107)
- Added PyTorch 1.12 support (#4975)
- Added unbatch_edge_index functionality for splitting an edge_index tensor according to a batch vector (#4903)
- Added node-wise normalization mode in LayerNorm (#4944)
- Added support for normalization_resolver (#4926, #4951, #4958, #4959)
- Added notebook tutorial for torch_geometric.nn.aggr package to documentation (#4927)
- Added support for follow_batch for lists or dictionaries of tensors (#4837)
- Added Data.validate() and HeteroData.validate() functionality (#4885)
- Added LinkNeighborLoader support to LightningDataModule (#4868)
- Added predict() support to the LightningNodeData module (#4884)
- Added time_attr argument to LinkNeighborLoader (#4877, #4908)
- Added a filter_per_worker argument to data loaders to allow filtering of data within sub-processes (#4873)
- Added a NeighborLoader benchmark script (#4815, #4862)
- Added support for FeatureStore and GraphStore in NeighborLoader (#4817, #4851, #4854, #4856, #4857, #4882, #4883, #4929, #4992, #4962, #4968, #5037, #5088)
- Added a normalize parameter to dense_diff_pool (#4847)
- Added size=None explanation to jittable MessagePassing modules in the documentation (#4850)
- Added documentation to the DataLoaderIterator class (#4838)
- Added GraphStore support to Data and HeteroData (#4816)
- Added FeatureStore support to Data and HeteroData (#4807, #4853)
- Added FeatureStore and GraphStore abstractions (#4534, #4568)
- Added support for dense aggregations in global_*_pool (#4827)
- Added Python version requirement (#4825)
- Added TorchScript support to JumpingKnowledge module (#4805)
- Added a max_sample argument to AddMetaPaths in order to tackle very dense metapath edges (#4750)
- Test HANConv with empty tensors (#4756, #4841)
- Added the bias vector to the GCN model definition in the "Create Message Passing Networks" tutorial (#4755)
- Added transforms.RootedSubgraph interface with two implementations: RootedEgoNets and RootedRWSubgraph (#3926)
- Added ptr vectors for follow_batch attributes within Batch.from_data_list (#4723)
- Added torch_geometric.nn.aggr package (#4687, #4721, #4731, #4762, #4749, #4779, #4863, #4864, #4865, #4866, #4872, #4934, #4935, #4957, #4973, #4973, #4986, #4995, #5000, #5034, #5036, #5039, #4522, #5033, #5085, #5097, #5099, #5104, #5113, #5130, #5098, #5191)
- Added the DimeNet++ model (#4432, #4699, #4700, #4800)
- Added an example of using PyG with PyTorch Ignite (#4487)
- Added GroupAddRev module with support for reducing training GPU memory (#4671, #4701, #4715, #4730)
- Added benchmarks via wandb (#4656, #4672, #4676)
- Added unbatch functionality (#4628)
- Confirm that to_hetero() works with custom functions, e.g., dropout_adj (#4653)
- Added the MLP.plain_last=False option (#4652)
- Added a check in HeteroConv and to_hetero() to ensure that MessagePassing.add_self_loops is disabled (#4647)
- Added HeteroData.subgraph() support (#4635)
- Added the AQSOL dataset (#4626)
- Added HeteroData.node_items() and HeteroData.edge_items() functionality (#4644)
- Added PyTorch Lightning support in GraphGym (#4511, #4516, #4531, #4689, #4843)
- Added support for returning embeddings in MLP models (#4625)
- Added faster initialization of NeighborLoader in case edge indices are already sorted (via is_sorted=True) (#4620, #4702)
- Added AddPositionalEncoding transform (#4521)
- Added HeteroData.is_undirected() support (#4604)
- Added the Genius and Wiki datasets to nn.datasets.LINKXDataset (#4570, #4600)
- Added nn.aggr.EquilibriumAggregation implicit global layer (#4522)
- Added support for graph-level outputs in to_hetero (#4582)
- Added CHANGELOG.md (#4581)
- Added HeteroData support to the RemoveIsolatedNodes transform (#4479)
- Added HeteroData.num_features functionality (#4504)
- Added support for projecting features before propagation in SAGEConv (#4437)
- Added Geom-GCN splits to the Planetoid datasets (#4442)
- Added a LinkNeighborLoader for training scalable link prediction models (#4396, #4439, #4441, #4446, #4508, #4509)
- Added an unsupervised GraphSAGE example on PPI (#4416)
- Added support for LSTM aggregation in SAGEConv (#4379)
- Added support for floating-point labels in RandomLinkSplit (#4311, #4383)
- Added support for torch.data DataPipes (#4302, #4345, #4349)
- Added support for the cosine argument in the KNNGraph/RadiusGraph transforms (#4344)
- Added support for graph-level attributes in networkx conversion (#4343)
- Added support for renaming node types via HeteroData.rename (#4329)
- Added an example to load a trained PyG model in C++ (#4307)
- Added a MessagePassing.explain_message method to customize making explanations on messages (#4278, #4448)
- Added support for GATv2Conv in the nn.models.GAT model (#4357)
- Added HeteroData.subgraph functionality (#4243)
- Added the MaskLabel module and a corresponding masked label propagation example (#4197)
- Added temporal sampling support to NeighborLoader (#4025)
- Added an example for unsupervised heterogeneous graph learning based on "Deep Multiplex Graph Infomax" (#3189)
Changed
- Changed docstring for RandomLinkSplit (#5190)
- Switched to PyTorch scatter_reduce implementation - experimental feature (#5120)
- Fixed RGATConv device mismatches for f-scaled mode (#5187)
- Allow for multi-dimensional edge_labels in LinkNeighborLoader (#5186)
- Fixed GINEConv bug with non-sequential input (#5154)
- Improved error message (#5095)
- Fixed HGTLoader bug which produced outputs with missing edge types (#5067)
- Fixed dynamic inheritance issue in data batching (#5051)
- Fixed load_state_dict in Linear with strict=False mode (#5094)
- Fixed typo in MaskLabel.ratio_mask (#5093)
- Fixed data.num_node_features computation for sparse matrices (#5089)
- Fixed torch.fx bug with torch.nn.aggr package (#5021)
- Fixed GenConv test (#4993)
- Fixed packaging tests for Python 3.10 (#4982)
- Changed act_dict (part of graphgym) to create individual instances instead of reusing the same ones everywhere (#4978)
- Fixed issue where one-hot tensors were passed to F.one_hot (#4970)
- Fixed bool arguments in argparse in benchmark/ (#4967)
- Fixed BasicGNN for num_layers=1, which now respects a desired number of out_channels (#4943)
- len(batch) will now return the number of graphs inside the batch, not the number of attributes (#4931)
- Fixed data.subgraph generation for 0-dim tensors (#4932)
- Removed unnecessary inclusion of self-loops when sampling negative edges (#4880)
- Fixed InMemoryDataset inferring wrong len for lists of tensors (#4837)
- Fixed Batch.separate when using it for lists of tensors (#4837)
- Correct docstring for SAGEConv (#4852)
- Fixed a bug in TUDataset where pre_filter was not applied whenever pre_transform was present (#4842)
- Renamed RandomTranslate to RandomJitter - the usage of RandomTranslate is now deprecated (#4828)
- Do not allow accessing edge types in HeteroData with two node types when there exist multiple relations between these types (#4782)
- Allow edge_type == rev_edge_type argument in RandomLinkSplit (#4757)
- Fixed a numerical instability in the GeneralConv and neighbor_sample tests (#4754)
- Fixed a bug in HANConv in which destination node features rather than source node features were propagated (#4753)
- Fixed versions of checkout and setup-python in CI (#4751)
- Fixed protobuf version (#4719)
- Fixed the ranking protocol bug in the RGCN link prediction example (#4688)
- Math support in Markdown (#4683)
- Allow for setter properties in Data (#4682, #4686)
- Allow for optional edge_weight in GCN2Conv (#4670)
- Fixed the interplay between TUDataset and pre_transform transformations that modify node features (#4669)
- Make use of the pyg_sphinx_theme documentation template (#4664, #4667)
- Refactored reading molecular positions from sdf file for qm9 datasets (#4654)
- Fixed MLP.jittable() bug in case return_emb=True (#4645, #4648)
- The generated node features of StochasticBlockModelDataset are now ordered with respect to their labels (#4617)
- Fixed typos in the documentation (#4616, #4824, #4895, #5161)
- The bias argument in TAGConv is now actually applied (#4597)
- Fixed subclass behaviour of process and download in Dataset (#4586)
- Fixed filtering of attributes for loaders in case __cat_dim__ != 0 (#4629)
- Fixed SparseTensor support in NeighborLoader (#4320)
- Fixed average degree handling in PNAConv (#4312)
- Fixed a bug in from_networkx in case some attributes are PyTorch tensors (#4486)
- Added a missing clamp in DimeNet (#4506, #4562)
- Fixed the download link in DBP15K (#4428)
- Fixed an autograd bug in DimeNet when resetting parameters (#4424)
- Fixed bipartite message passing in case flow="target_to_source" (#4418)
- Fixed a bug in which num_nodes was not properly updated in the FixedPoints transform (#4394)
- PyTorch Lightning >= 1.6 support (#4377)
- Fixed a bug in which GATConv was not jittable (#4347)
- Fixed a bug in which the GraphGym config was not stored in each specific experiment directory (#4338)
- Fixed a bug in which nn.models.GAT did not produce out_channels-many output channels (#4299)
- Fixed mini-batching with empty lists as attributes (#4293)
- Fixed a bug in which GCNConv could not be combined with to_hetero on heterogeneous graphs with one node type (#4279)
Removed
- Remove internal metrics in favor of torchmetrics (#4287)
Full commit list: 2.0.4...2.1.0