This repository contains code for performing data valuation on graph-structured data using the Precedence-Constrained Winter (PC-Winter) value method. The PC-Winter algorithm is designed to identify influential graph elements in graph datasets by considering the hierarchical structure of node contributions.
The image below illustrates our PC-Winter value estimation for graph data valuation.
-
Input Graphs (Inductive Setting): This includes three mutually exclusive graph structures:
- Training Graph: Contains the elements to be evaluated. In this graph:
- Nodes labeled as
$$l$$ ($$l_0$$ ,$$l_1$$ ) represent labeled nodes - Nodes labeled as
$$w$$ ($$w_0$$ -$$w_3$$ ) represent 1-distance neighbors of labeled nodes - Nodes labeled as
$$u$$ ($$u_0$$ -$$u_6$$ ) represent 2-distance neighbors of labeled nodes
- Nodes labeled as
- Validation Graph: Used for calculating accuracy as a utility function
- Testing Graph: Reserved for downstream tasks such as node dropping
- Training Graph: Contains the elements to be evaluated. In this graph:
-
PC-Winter Value Estimation Steps:
- Perform a Preorder Traversal with hierarchical truncations on the Computational Tree
$$\mathcal{T}$$ - Retrieve a permissive Permutation
$$\pi$$ - Calculate the marginal contribution of each player based on validation accuracy difference
- Estimate PC-Winter Value by averaging marginal contributions of each player over multiple permutations
- Perform a Preorder Traversal with hierarchical truncations on the Computational Tree
-
pc_winter_run.py
: Main script to run PC-Winter and calculate PC values for graph elements. This script implements an online version of the framework shown in the image, including local propagation, preorder traversal of the contribution tree, and hierarchical truncation. -
node_drop_large_cora.py
: Script to aggregate node values calculated by PC-Winter and evaluate performance by sequentially dropping high-value nodes. This script demonstrates the effectiveness of the calculated PC values. -
plot.ipynb
: Jupyter notebook to visualize the drop in model performance as high-value nodes are removed. This notebook helps in analyzing the results of the node dropping experiments. -
value/
: Directory to store the sampled PC values output bypc_winter_run.py
. -
res/
: Directory to store the model performance results from the node dropping experiments conducted bynode_drop_large_cora.py
.
To calculate PC values for graph elements, use the following command:
python pc_winter_run.py --dataset <dataset> --seed <seed> --num_perm <num_permutations> --group_trunc_ratio_hop_1 <ratio1> --group_trunc_ratio_hop_2 <ratio2>
Parameters:
<dataset>
: Name of the dataset (e.g., 'Cora', 'Citeseer')<seed>
: Random seed for reproducibility<num_permutations>
: Number of permutations for PC-Winter algorithm<ratio1>
: Truncation ratio for 1-hop neighbors<ratio2>
: Truncation ratio for 2-hop neighbors
This will sample PC values and store them in the value/
directory. The output files will be named according to the input parameters.
To evaluate the effectiveness of the PC values, run:
python node_drop_large_cora.py
This script performs the following steps:
- Loads and aggregates the PC values calculated by
pc_winter_run.py
- Ranks nodes based on their aggregated PC values
- Sequentially drops high-value nodes from the graph
- Retrains a Simple Graph Convolution (SGC) model after each node removal
- Saves the model performance results in the
res/
directory
The output files in res/
will contain the test and validation accuracies at each step of the node dropping process.
Use the plot.ipynb
notebook to visualize the results of the node dropping experiment:
- Open the notebook in Jupyter:
jupyter notebook plot.ipynb
- Run the cells to load the performance results from the
res/
directory - The notebook will generate plots showing the drop in model accuracy as high-value nodes are removed
A steeper drop in the accuracy curve indicates that the PC values more effectively identify influential nodes in the graph. This visualization helps in assessing the quality of the data valuation performed by the PC-Winter method.