diff --git a/docs/source/_figures/dist_part.png b/docs/source/_figures/dist_part.png
new file mode 100644
index 000000000000..d79cd6a9c808
Binary files /dev/null and b/docs/source/_figures/dist_part.png differ
diff --git a/docs/source/_figures/dist_sampling.png b/docs/source/_figures/dist_sampling.png
new file mode 100644
index 000000000000..01e6bc5f607a
Binary files /dev/null and b/docs/source/_figures/dist_sampling.png differ
diff --git a/docs/source/_figures/intel_kumo.png b/docs/source/_figures/intel_kumo.png
new file mode 100644
index 000000000000..750b48b7dd16
Binary files /dev/null and b/docs/source/_figures/intel_kumo.png differ
diff --git a/docs/source/_static/thumbnails/distributed_pyg.png b/docs/source/_static/thumbnails/distributed_pyg.png
new file mode 100644
index 000000000000..624b49ae28ab
Binary files /dev/null and b/docs/source/_static/thumbnails/distributed_pyg.png differ
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 0923c37dc3f8..95b20e2a2894 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -61,6 +61,8 @@
         '_static/thumbnails/explain.png',
     'tutorial/shallow_node_embeddings':
         '_static/thumbnails/shallow_node_embeddings.png',
+    'tutorial/distributed_pyg':
+        '_static/thumbnails/distributed_pyg.png',
     'tutorial/multi_gpu_vanilla':
         '_static/thumbnails/multi_gpu_vanilla.png',
     'tutorial/multi_node_multi_gpu_vanilla':
diff --git a/docs/source/tutorial/distributed_pyg.rst b/docs/source/tutorial/distributed_pyg.rst
index 0a80c8dd6f8e..7564a9550f0b 100644
--- a/docs/source/tutorial/distributed_pyg.rst
+++ b/docs/source/tutorial/distributed_pyg.rst
@@ -1,6 +1,9 @@
 Distributed Training in PyG
 ===========================
 
+.. figure:: ../_figures/intel_kumo.png
+   :width: 400px
+
 .. note::
    We are thrilled to announce the first **in-house distributed training solution** for :pyg:`PyG` via :class:`torch_geometric.distributed`, available from version 2.5 onwards.
    Developers and researchers can now take full advantage of distributed training on large-scale datasets which cannot be fully loaded in memory of one machine at the same time.
@@ -15,11 +18,11 @@ Key Advantages
 --------------
 
 #. **Balanced graph partitioning** via METIS ensures minimal communication overhead when sampling subgraphs across compute nodes.
-#. Utilizing **DDP for model training in conjunction with RPC for remote sampling and feature fetching routines** (with TCP/IP protocol and `gloo `_ communication backend) allows for data parallelism with distinct data partitions at each node.
+#. Utilizing **DDP for model training** in conjunction with **RPC for remote sampling and feature fetching routines** (with TCP/IP protocol and `gloo `_ communication backend) allows for data parallelism with distinct data partitions at each node.
 #. The implementation via custom :class:`~torch_geometric.data.GraphStore` and :class:`~torch_geometric.data.FeatureStore` APIs provides a flexible and tailored interface for distributing large graph structure information and feature storage.
-#. Distributed neighbor sampling is capable of sampling in both local and remote partitions through RPC communication channels.
-   All advanced functionality of single-node sampling are also applicable for distributed training, *e.g.*, heterogeneous sampling, link-level sampling, temporal sampling, *etc*..
-#. Distributed data loaders offer a high-level abstraction for managing sampler processes, ensuring simplicity and seamless integration with standard :pyg:`PyG` data loaders..
+#. **Distributed neighbor sampling** is capable of sampling in both local and remote partitions through RPC communication channels.
+   All advanced functionality of single-node sampling is also applicable to distributed training, *e.g.*, heterogeneous sampling, link-level sampling, temporal sampling, *etc*.
+#. **Distributed data loaders** offer a high-level abstraction for managing sampler processes, ensuring simplicity and seamless integration with standard :pyg:`PyG` data loaders.
 #. Incorporating the Python `asyncio `_ library for asynchronous processing on top of :pytorch:`PyTorch`-based RPCs further enhances the system's responsiveness and overall performance.
 
 Architecture Components
@@ -57,6 +60,12 @@ This ensures that the resulting partitions provide maximal local access of neigh
 Through this partitioning approach, every edge receives a distinct assignment, while "halo nodes" (1-hop neighbors that fall into a different partition) are replicated.
 Halo nodes ensure that neighbor sampling for a single node in a single layer stays purely local.
 
+.. figure:: ../_figures/dist_part.png
+   :align: center
+   :width: 100%
+
+   Graph partitioning with halo nodes.
+
 In our distributed training example, we prepared the `partition_graph.py `_ script to demonstrate how to apply partitioning on a selected subset of both homogeneous and heterogeneous graphs.
 The :class:`~torch_geometric.distributed.Partitioner` can also preserve node features, edge features, and any temporal attributes at the level of nodes and edges.
 Later on, each node in the cluster then owns a single partition of this graph.
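+
+A minimal sketch of what this partitioning step can look like is shown below (the dataset choice as well as the exact method and argument names are illustrative assumptions and may differ from the packaged script):
+
+.. code-block:: python
+
+    from torch_geometric.datasets import Planetoid
+    from torch_geometric.distributed import Partitioner
+
+    # Load a small homogeneous graph to keep the sketch self-contained:
+    data = Planetoid(root='./data/Cora', name='Cora')[0]
+
+    # Split the graph into two parts and save them to disk, so that each
+    # machine in the cluster can later load exactly one partition:
+    partitioner = Partitioner(data, num_parts=2, root='./partitions/Cora')
+    partitioner.generate_partition()
+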
@@ -174,6 +183,12 @@ A batch of seed nodes follows three main steps before it is made available for t
 #. **Data conversion:** Based on the sampler output and the acquired node (or edge) features, a :pyg:`PyG` :class:`~torch_geometric.data.Data` or :class:`~torch_geometric.data.HeteroData` object is created.
    This object forms a batch used in subsequent computational operations of the model.
 
+.. figure:: ../_figures/dist_sampling.png
+   :align: center
+   :width: 450px
+
+   Local and remote neighbor sampling.
+
 Distributed Data Loading
 ~~~~~~~~~~~~~~~~~~~~~~~~
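+
+A rough sketch of how a single compute node might wire these components together is shown below; the class and argument names used here (``LocalGraphStore``, ``LocalFeatureStore``, ``DistContext`` and ``DistNeighborLoader``) are assumptions for illustration and may differ from the released API:
+
+.. code-block:: python
+
+    from torch_geometric.distributed import (
+        DistContext,
+        DistNeighborLoader,
+        LocalFeatureStore,
+        LocalGraphStore,
+    )
+
+    # Each machine loads only its own partition, produced by the partitioning step above:
+    graph_store = LocalGraphStore.from_partition('./partitions/Cora', pid=0)
+    feature_store = LocalFeatureStore.from_partition('./partitions/Cora', pid=0)
+
+    # Metadata describing where this machine sits inside the training cluster:
+    ctx = DistContext(rank=0, global_rank=0, world_size=2, global_world_size=2,
+                      group_name='distributed-sampling')
+
+    # The loader manages sampler processes that resolve local neighbors directly
+    # and remote neighbors via RPC, and yields standard PyG mini-batches:
+    train_idx = ...  # Seed node indices owned by this partition.
+    loader = DistNeighborLoader(
+        data=(feature_store, graph_store),
+        input_nodes=train_idx,
+        num_neighbors=[15, 10],
+        batch_size=1024,
+        master_addr='localhost',
+        master_port=11111,
+        current_ctx=ctx,
+    )
+
+    for batch in loader:
+        ...  # Forward/backward pass of a DDP-wrapped GNN on this batch.
+
+In this setup, mini-batches produced by the loader can be consumed by a model wrapped in ``torch.nn.parallel.DistributedDataParallel``, just as in single-machine training.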