PAC-MAN

This repository contains the PyTorch implementation for the IEEE Access paper: PAC-MAN: Multi-Relation Network in Social Community for Personalized Hashtag Recommendation

Padungkiatwattana, U., & Maneeroj, S. (2022). PAC-MAN: Multi-Relation Network in Social Community for Personalized Hashtag Recommendation. IEEE Access, 10, 131202-131228.

🌈 Introduction

PAC-MAN is a novel integral model for personalized hashtag recommendation, which has three main contributions:

💡 First, to derive fruitful user and hashtag representations from higher-order multiple relations, we propose the Multi-relational Attentive Network (MAN), which applies a GNN to jointly capture relations in three communities: (1) user-hashtag interaction (e.g., post, retweet, like); (2) user-user social (e.g., follow); and (3) hashtag-hashtag co-occurrence.

💡 Second, to personalize content at the word level, Person-And-Content based BERT (PAC) extends BERT to take as input not only the word representations from the microblog but also the fruitful user representation from MAN, allowing each word to be fused with user aspects.

💡 Third, to capture sequenceless hashtag correlations, the fruitful hashtag representations from MAN that contain the hashtag’s community perspectives are inserted into BERT to integrate with the hashtag’s word-semantic perspectives, and a hashtag prediction task is then conducted under the mask concept for the recommendation.

📖 Dependencies

The scripts have been tested with the following dependency versions:

torch==2.0.0
transformers==4.27.4
tensorboard==2.12.1
numpy==1.24.2
scipy==1.10.1
omegaconf==2.3.0
tqdm==4.65.0

Install all dependencies:

pip install -r requirements.txt

Repository

mangnn/ contains code for Multi-relational Attentive Network (MANGNN).
pacbert/ contains code for Person-And-Content based BERT (PACBERT).

⚙️ Configuration

Manage configuration for the model at:

MANGNN: mangnn/config/config.yaml.
PACBERT: pacbert/config/config.yaml.

📈 Dataset

MANGNN:
Prepare datasets and organize them as follows:

data
└─ twitter
    └─ twitter_train.npy
    └─ twitter_val.npy
    └─ twitter_test.npy
    └─ networks
        └─ networks.json
        └─ follow.npy
        └─ post.npy
        └─ like.npy
        └─ retweet.npy
        └─ cooccur.npy

Here are the details of the dataset:

twitter_train.npy contains a NumPy array of the dataset. Here is an example of the data structure:

# Format: [{user_id}, {tag_id}, {label}]
# Label: '1' means the user uses the hashtag, and '0' means otherwise.
array([[0, 1, 1], ..., [0, 1, 0]])
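
As a minimal sketch of working with this format, the snippet below builds a tiny array in the documented `[user_id, tag_id, label]` layout, saves it, loads it back, and splits the columns. The file name `twitter_train_example.npy` and the toy values are illustrative assumptions, not files or data from this repository.

```python
import os
import tempfile

import numpy as np

# Toy interactions in the documented layout: [user_id, tag_id, label].
# The values are made up for illustration only.
samples = np.array([[0, 1, 1], [0, 2, 0], [1, 1, 1]])

# Hypothetical file name; the real files live under data/twitter/.
path = os.path.join(tempfile.mkdtemp(), "twitter_train_example.npy")
np.save(path, samples)

data = np.load(path)
user_ids, tag_ids, labels = data[:, 0], data[:, 1], data[:, 2]

# Keep only positive pairs (label == 1), i.e. hashtags the user actually used.
positive_pairs = data[labels == 1][:, :2]
print(positive_pairs.tolist())
```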

networks.json contains a list of structures, one for each network. Here is an example of the data structure:

[
    {
        "name": "post",
        "src_type": "tag",
        "tgt_type": "user",
        "adj": "mangnn/data/twitter/networks/post.npy",
        "agg_src": true,
        "agg_tgt": true
    }
]
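
One way to read such a file is sketched below: each entry is parsed into a small dataclass so that a missing or misspelled key fails early. `NetworkSpec` and `parse_networks` are hypothetical helpers written for this illustration and are not part of the repository; the field names are copied from the example above, and their exact semantics are defined by the MANGNN code.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class NetworkSpec:
    """One entry of networks.json (field names taken from the example)."""
    name: str
    src_type: str
    tgt_type: str
    adj: str
    agg_src: bool
    agg_tgt: bool


def parse_networks(text: str) -> List[NetworkSpec]:
    """Parse the JSON list; raises TypeError on missing or unexpected keys."""
    return [NetworkSpec(**entry) for entry in json.loads(text)]


example = """
[
    {
        "name": "post",
        "src_type": "tag",
        "tgt_type": "user",
        "adj": "mangnn/data/twitter/networks/post.npy",
        "agg_src": true,
        "agg_tgt": true
    }
]
"""

specs = parse_networks(example)
print(specs[0].name, specs[0].src_type, "->", specs[0].tgt_type)
```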

post.npy contains a NumPy array holding the indices, values, and size used to create a sparse tensor for the adjacency matrix that represents the connections in the network. Here is an example of the data structure:

# Format: [{indices}, {values}, {size}]
array([array([[0, 1], [0, 3], [1, 0], [1, 3]]), # indices
       array([1, 1, 1, 1]),                     # values
       array([5, 5])])                          # size
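
The following sketch shows how such a triple could be stored and turned into a PyTorch sparse tensor. It assumes that each row of the indices array is one `(row, col)` pair, so the array is transposed to the `(ndim, nnz)` layout that `torch.sparse_coo_tensor` expects; the repository's own loader may handle this differently. The file name `post_example.npy` is hypothetical.

```python
import os
import tempfile

import numpy as np
import torch

# Rebuild the example triple. The three parts have different shapes, so they
# are stored in an object array, filled element by element so that NumPy
# does not try to combine them into one regular array.
triple = np.empty(3, dtype=object)
triple[0] = np.array([[0, 1], [0, 3], [1, 0], [1, 3]])  # indices: one (row, col) per entry
triple[1] = np.array([1, 1, 1, 1])                      # values
triple[2] = np.array([5, 5])                            # size

# Hypothetical file name for illustration.
path = os.path.join(tempfile.mkdtemp(), "post_example.npy")
np.save(path, triple)

# Object arrays are pickled, so loading needs allow_pickle=True.
# Only load pickled files from sources you trust.
indices, values, size = np.load(path, allow_pickle=True)

# Assumption: indices are (nnz, 2), so transpose to (2, nnz) for PyTorch.
adj = torch.sparse_coo_tensor(
    torch.tensor(indices.T, dtype=torch.long),
    torch.tensor(values, dtype=torch.float32),
    size=tuple(int(s) for s in size),
)
print(adj.to_dense())
```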

PACBERT:
Prepare datasets and organize them as follows:

data
└─ twitter
    └─ twitter_train.json
    └─ twitter_val.json
    └─ twitter_test.json
    └─ tag.txt

Here is an example of the data structure of the JSON files:

[
    {
        "user": 0,
        "text": "the way to get started is to quit talking and begin doing.",
        "tag": ["life", "inspire", "goal"]
    }
]
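
As a rough illustration of consuming these records, the sketch below parses the example, checks the three documented fields, and derives a multi-hot hashtag target. The vocabulary here is built from the records themselves purely for demonstration; how the repository uses tag.txt for this purpose is not shown here and may differ.

```python
import json

# One record in the documented layout (same content as the example).
raw = """
[
    {
        "user": 0,
        "text": "the way to get started is to quit talking and begin doing.",
        "tag": ["life", "inspire", "goal"]
    }
]
"""

records = json.loads(raw)

# Basic structural checks: integer user ID, string text, list of hashtag strings.
for rec in records:
    assert isinstance(rec["user"], int)
    assert isinstance(rec["text"], str)
    assert all(isinstance(t, str) for t in rec["tag"])

# Hypothetical hashtag vocabulary built from the records (sorted for stability).
all_tags = sorted({t for rec in records for t in rec["tag"]})
vocab = {tag: idx for idx, tag in enumerate(all_tags)}

# Multi-hot target per record: one slot per hashtag in the vocabulary.
targets = [[1 if tag in rec["tag"] else 0 for tag in vocab] for rec in records]
print(vocab)
print(targets)
```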

🚀 Model Training

✨ Non-Distributed Training

You can train the model by using run.sh.

MANGNN:
```
mangnn/scripts/run.sh
```
PACBERT:
```
pacbert/scripts/run.sh
```

You can also pass arguments to the script:

{MODEL_NAME}/scripts/run.sh [$CONFIG]

where:

$CONFIG - Configuration path.

✨ Distributed Training on Single Node

You can perform distributed training on a single node by using run_single.sh.

MANGNN:
```
mangnn/scripts/run_single.sh
```
PACBERT:
```
pacbert/scripts/run_single.sh
```

You can also pass arguments to the script:

{MODEL_NAME}/scripts/run_single.sh [$NUM_TRAINERS] [$CONFIG]

where:

$NUM_TRAINERS - Number of GPUs/CPUs.
$CONFIG - Configuration path.

✨ Distributed Training on Multiple Nodes

You can perform distributed training on multiple nodes by using run_multi.sh.

MANGNN:
```
mangnn/scripts/run_multi.sh
```
PACBERT:
```
pacbert/scripts/run_multi.sh
```

You can also pass arguments to the script:

{MODEL_NAME}/scripts/run_multi.sh [$NUM_NODES] [$NUM_TRAINERS] [$NODE_RANK] [$MASTER_ADDR] [$MASTER_PORT] [$CONFIG]

where:

$NUM_NODES - Number of machines (nodes).
$NUM_TRAINERS - Number of GPUs/CPUs on the current node.
$NODE_RANK - Rank of the current node (0 for the master node).
$MASTER_ADDR - Master address.
$MASTER_PORT - Master port.
$CONFIG - Configuration path.

For example, to run on multiple GPUs across 2 nodes:

On master node with 2 GPUs:

pacbert/scripts/run_multi.sh 2 2 0 123.456.789 1234

On worker node with 4 GPUs:

pacbert/scripts/run_multi.sh 2 4 1 123.456.789 1234

📌 Checkpoint and Logging

After training, the following files are created in the folder {MODEL_NAME}/outputs/:

ckpt.pt - Model checkpoint.
logs/ - Tensorboard logs.
result.json - Training results containing train_loss, val_loss, and metrics.

⭐ Citation

If you find our work useful for your research, please cite the following paper:

@ARTICLE{9984162,
  author={Padungkiatwattana, Umaporn and Maneeroj, Saranya},
  journal={IEEE Access}, 
  title={PAC-MAN: Multi-Relation Network in Social Community for Personalized Hashtag Recommendation}, 
  year={2022},
  volume={10},
  number={},
  pages={131202-131228},
  doi={10.1109/ACCESS.2022.3229082}}
