diff --git a/docs/source/_figures/dist_part.png b/docs/source/_figures/dist_part.png
new file mode 100644
index 000000000000..d79cd6a9c808
Binary files /dev/null and b/docs/source/_figures/dist_part.png differ
diff --git a/docs/source/_figures/dist_sampling.png b/docs/source/_figures/dist_sampling.png
new file mode 100644
index 000000000000..01e6bc5f607a
Binary files /dev/null and b/docs/source/_figures/dist_sampling.png differ
diff --git a/docs/source/_figures/intel_kumo.png b/docs/source/_figures/intel_kumo.png
new file mode 100644
index 000000000000..750b48b7dd16
Binary files /dev/null and b/docs/source/_figures/intel_kumo.png differ
diff --git a/docs/source/_static/thumbnails/distributed_pyg.png b/docs/source/_static/thumbnails/distributed_pyg.png
new file mode 100644
index 000000000000..624b49ae28ab
Binary files /dev/null and b/docs/source/_static/thumbnails/distributed_pyg.png differ
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 0923c37dc3f8..95b20e2a2894 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -61,6 +61,8 @@
         '_static/thumbnails/explain.png',
     'tutorial/shallow_node_embeddings':
         '_static/thumbnails/shallow_node_embeddings.png',
+    'tutorial/distributed_pyg':
+        '_static/thumbnails/distributed_pyg.png',
     'tutorial/multi_gpu_vanilla':
         '_static/thumbnails/multi_gpu_vanilla.png',
     'tutorial/multi_node_multi_gpu_vanilla':
diff --git a/docs/source/tutorial/distributed_pyg.rst b/docs/source/tutorial/distributed_pyg.rst
index 0a80c8dd6f8e..7564a9550f0b 100644
--- a/docs/source/tutorial/distributed_pyg.rst
+++ b/docs/source/tutorial/distributed_pyg.rst
@@ -1,6 +1,9 @@
 Distributed Training in PyG
 ===========================
 
+.. figure:: ../_figures/intel_kumo.png
+   :width: 400px
+
 .. note::
    We are thrilled to announce the first **in-house distributed training solution** for :pyg:`PyG` via :class:`torch_geometric.distributed`, available from version 2.5 onwards.
    Developers and researchers can now take full advantage of distributed training on large-scale datasets which cannot be fully loaded in memory of one machine at the same time.
@@ -15,11 +18,11 @@ Key Advantages
 --------------
 
 #. **Balanced graph partitioning** via METIS ensures minimal communication overhead when sampling subgraphs across compute nodes.
-#. Utilizing **DDP for model training in conjunction with RPC for remote sampling and feature fetching routines** (with TCP/IP protocol and `gloo `_ communication backend) allows for data parallelism with distinct data partitions at each node.
+#. Utilizing **DDP for model training** in conjunction with **RPC for remote sampling and feature fetching routines** (with TCP/IP protocol and `gloo `_ communication backend) allows for data parallelism with distinct data partitions at each node.
 #. The implementation via custom :class:`~torch_geometric.data.GraphStore` and :class:`~torch_geometric.data.FeatureStore` APIs provides a flexible and tailored interface for distributing large graph structure information and feature storage.
-#. Distributed neighbor sampling is capable of sampling in both local and remote partitions through RPC communication channels.
-   All advanced functionality of single-node sampling are also applicable for distributed training, *e.g.*, heterogeneous sampling, link-level sampling, temporal sampling, *etc*..
-#. Distributed data loaders offer a high-level abstraction for managing sampler processes, ensuring simplicity and seamless integration with standard :pyg:`PyG` data loaders..
+#. **Distributed neighbor sampling** is capable of sampling in both local and remote partitions through RPC communication channels.
+   All advanced functionality of single-node sampling is also applicable to distributed training, *e.g.*, heterogeneous sampling, link-level sampling, temporal sampling, *etc*.
+#. **Distributed data loaders** offer a high-level abstraction for managing sampler processes, ensuring simplicity and seamless integration with standard :pyg:`PyG` data loaders.
 #. Incorporating the Python `asyncio `_ library for asynchronous processing on top of :pytorch:`PyTorch`-based RPCs further enhances the system's responsiveness and overall performance.
 
 Architecture Components
@@ -57,6 +60,12 @@ This ensures that the resulting partitions provide maximal local access of neigh
 Through this partitioning approach, every edge receives a distinct assignment, while "halo nodes" (1-hop neighbors that fall into a different partition) are replicated.
 Halo nodes ensure that neighbor sampling for a single node in a single layer stays purely local.
 
+.. figure:: ../_figures/dist_part.png
+   :align: center
+   :width: 100%
+
+   Graph partitioning with halo nodes.
+
 In our distributed training example, we prepared the `partition_graph.py `_ script to demonstrate how to apply partitioning on a selected subset of both homogeneous and heterogeneous graphs.
 The :class:`~torch_geometric.distributed.Partitioner` can also preserve node features, edge features, and any temporal attributes at the level of nodes and edges.
 Later on, each node in the cluster then owns a single partition of this graph.
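+
+A minimal sketch of what this partitioning step can look like is shown below (the dataset choice as well as the exact method and argument names are illustrative assumptions and may differ from the packaged script):
+
+.. code-block:: python
+
+    from torch_geometric.datasets import Planetoid
+    from torch_geometric.distributed import Partitioner
+
+    # Load a small homogeneous graph to keep the sketch self-contained:
+    data = Planetoid(root='./data/Cora', name='Cora')[0]
+
+    # Split the graph into two parts and save them to disk, so that each
+    # machine in the cluster can later load exactly one partition:
+    partitioner = Partitioner(data, num_parts=2, root='./partitions/Cora')
+    partitioner.generate_partition()
+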
@@ -174,6 +183,12 @@ A batch of seed nodes follows three main steps before it is made available for t
 #. **Data conversion:** Based on the sampler output and the acquired node (or edge) features, a :pyg:`PyG` :class:`~torch_geometric.data.Data` or :class:`~torch_geometric.data.HeteroData` object is created.
    This object forms a batch used in subsequent computational operations of the model.
 
+.. figure:: ../_figures/dist_sampling.png
+   :align: center
+   :width: 450px
+
+   Local and remote neighbor sampling.
+
 Distributed Data Loading
 ~~~~~~~~~~~~~~~~~~~~~~~~
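+
+A rough sketch of how a single compute node might wire these components together is shown below; the class and argument names used here (``LocalGraphStore``, ``LocalFeatureStore``, ``DistContext`` and ``DistNeighborLoader``) are assumptions for illustration and may differ from the released API:
+
+.. code-block:: python
+
+    from torch_geometric.distributed import (
+        DistContext,
+        DistNeighborLoader,
+        LocalFeatureStore,
+        LocalGraphStore,
+    )
+
+    # Each machine loads only its own partition, produced by the partitioning step above:
+    graph_store = LocalGraphStore.from_partition('./partitions/Cora', pid=0)
+    feature_store = LocalFeatureStore.from_partition('./partitions/Cora', pid=0)
+
+    # Metadata describing where this machine sits inside the training cluster:
+    ctx = DistContext(rank=0, global_rank=0, world_size=2, global_world_size=2,
+                      group_name='distributed-sampling')
+
+    # The loader manages sampler processes that resolve local neighbors directly
+    # and remote neighbors via RPC, and yields standard PyG mini-batches:
+    train_idx = ...  # Seed node indices owned by this partition.
+    loader = DistNeighborLoader(
+        data=(feature_store, graph_store),
+        input_nodes=train_idx,
+        num_neighbors=[15, 10],
+        batch_size=1024,
+        master_addr='localhost',
+        master_port=11111,
+        current_ctx=ctx,
+    )
+
+    for batch in loader:
+        ...  # Forward/backward pass of a DDP-wrapped GNN on this batch.
+
+In this setup, mini-batches produced by the loader can be consumed by a model wrapped in ``torch.nn.parallel.DistributedDataParallel``, just as in single-machine training.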