RecLearn (Recommender Learning), which summarizes the contents of the master branch of Recommender System with TF2.0, is a recommender learning framework based on Python and TensorFlow 2.x for students and beginners. If you are more comfortable with the master branch, you can instead clone the entire package, run the algorithms in example, and update or modify the contents of model and layer. The implemented recommendation algorithms are grouped by the two stages in which they are applied in industry:
- matching recommendation stage (Top-k Recommendation)
- ranking recommendation stage (CTR prediction model)
04/23/2022: updated all matching models.
RecLearn is on PyPI, so you can use pip to install it.
pip install reclearn
Dependencies:
- Python 3.8+
- TensorFlow 2.5+ (GPU or CPU)
- scikit-learn 0.23+
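To confirm that an environment meets these minimums, a small check like the following can help. It is only a sketch: the distribution names `tensorflow` and `scikit-learn` are assumed (a GPU or CPU build may be published under a different name), and `parse_version`, `meets_minimum` and `check_environment` are hypothetical helpers, not part of RecLearn.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '2.5.0' or '2.5.0rc1' into a tuple of leading integers, e.g. (2, 5, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings by their numeric components."""
    return parse_version(installed) >= parse_version(minimum)


# Assumed distribution names; adjust them if your build uses a different one.
MINIMUMS = {"tensorflow": "2.5", "scikit-learn": "0.23"}


def check_environment(minimums=MINIMUMS):
    """Return {name: bool} telling whether each requirement is satisfied."""
    report = {"python": sys.version_info >= (3, 8)}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = False  # not installed under this name
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

A `False` entry only means the package was not found under the assumed name or is older than the minimum, so it is a hint rather than a definitive diagnosis.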
Clone RecLearn to your local machine:
git clone -b reclearn git@github.com:ZiyaoGeng/RecLearn.git
The example directory contains a demo for each of the recommendation models.
1. Divide the dataset.
Set the path of the raw dataset:
file_path = 'data/ml-1m/ratings.dat'
Divide the dataset into a training set, a validation set, and a test set. If you use movielens-1m, Amazon-Beauty, Amazon-Games, or STEAM, you can call the methods in data/datasets/* of RecLearn directly:
train_path, val_path, test_path, meta_path = ml.split_seq_data(file_path=file_path)
meta_path is the path of the metafile, which stores the maximum user and item indexes.
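How `split_seq_data` divides the data is not spelled out here. As an illustration only, the sketch below uses a leave-one-out scheme that is common for sequential recommendation: for each user, the last interaction goes to the test set, the second-to-last to the validation set, and the rest to the training set. The function name `leave_one_out_split`, the `(user, item, timestamp)` tuple layout, and the keys of the returned `meta` dictionary are assumptions, not RecLearn's API.

```python
from collections import defaultdict


def leave_one_out_split(interactions):
    """Split (user, item, timestamp) tuples into train/val/test per user.

    Returns (train, val, test, meta), where meta holds the largest
    user index and item index seen in the data.
    """
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))

    train, val, test = {}, {}, {}
    max_user, max_item = 0, 0
    for user, events in by_user.items():
        events.sort()  # chronological order (ties broken by item id)
        items = [item for _, item in events]
        max_user = max(max_user, user)
        max_item = max(max_item, max(items))
        if len(items) < 3:
            # Too short to hold anything out; keep it all for training.
            train[user] = items
            continue
        train[user] = items[:-2]
        val[user] = items[-2]
        test[user] = items[-1]
    meta = {"max_user_num": max_user, "max_item_num": max_item}
    return train, val, test, meta


logs = [(1, 10, 100), (1, 12, 300), (1, 11, 200), (1, 13, 400), (2, 20, 50)]
train, val, test, meta = leave_one_out_split(logs)
```

Holding out by time rather than at random keeps the evaluation from using a user's future clicks to predict earlier ones.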
2. Load the dataset.
Load the training, validation, and test datasets, and generate several negative samples (by random sampling) for each positive sample. The data is formatted as a dictionary:
data = {'pos_item':, 'neg_item': , ['user': , 'click_seq': ,...]}
If you are building a sequential recommendation model, you also need to include click sequences. RecLearn provides methods for loading the data of the four datasets above:
# general recommendation model
train_data = ml.load_data(train_path, neg_num, max_item_num)
# sequential recommendation model that also uses the user feature
train_data = ml.load_seq_data(train_path, "train", seq_len, neg_num, max_item_num, contain_user=True)
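The dictionary format and the random negative sampling described in this step can be illustrated with a small, library-independent sketch. It uses plain Python lists instead of arrays, and `build_samples` is a hypothetical helper: RecLearn's own `load_data` may sample differently, for example in whether it excludes items the user has already interacted with.

```python
import random


def build_samples(user_items, neg_num, max_item_num, seed=0):
    """Pair every positive (user, item) with `neg_num` randomly drawn negatives.

    user_items: dict mapping user id -> list of item ids the user interacted with.
    Item ids are assumed to lie in 1..max_item_num.
    """
    rng = random.Random(seed)
    data = {"user": [], "pos_item": [], "neg_item": []}
    for user, items in user_items.items():
        seen = set(items)
        if len(seen) >= max_item_num:
            raise ValueError(f"user {user} has interacted with every item")
        for pos in items:
            negs = []
            while len(negs) < neg_num:
                candidate = rng.randint(1, max_item_num)
                if candidate not in seen:  # reject items the user already has
                    negs.append(candidate)
            data["user"].append(user)
            data["pos_item"].append(pos)
            data["neg_item"].append(negs)
    return data


data = build_samples({1: [1, 2, 3], 2: [4]}, neg_num=2, max_item_num=10)
```

Each entry of `data["neg_item"]` is a list of `neg_num` item ids aligned with the positive item at the same position.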
3. Set hyper-parameters.
Each model needs its hyperparameters to be specified. Take the BPR model as an example:
model_params = {
    'user_num': max_user_num + 1,
    'item_num': max_item_num + 1,
    'embed_dim': FLAGS.embed_dim,
    'use_l2norm': FLAGS.use_l2norm,
    'embed_reg': FLAGS.embed_reg
}
4. Build and compile the model.
Select or build the model you need and compile it. Take 'BPR' as an example:
model = BPR(**model_params)
model.compile(optimizer=Adam(learning_rate=FLAGS.learning_rate))
If you are unsure about the structure of the model, you can call the summary method after compilation to print it out:
model.summary()
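For readers new to BPR, the objective it optimizes can be written in a few lines. The sketch below is a plain-Python illustration of the pairwise loss `-log(sigmoid(score_pos - score_neg))` with dot-product scores. It is not RecLearn's implementation: options such as `use_l2norm` and `embed_reg` are omitted, and `dot` and `bpr_loss` are hypothetical helper names.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bpr_loss(user_vec, pos_vec, neg_vec):
    """Pairwise BPR loss for one (user, positive item, negative item) triple."""
    diff = dot(user_vec, pos_vec) - dot(user_vec, neg_vec)
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); the branch below
    # evaluates it without overflowing for large negative diff.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


user = [1.0, 0.0]
good = bpr_loss(user, [2.0, 0.0], [0.0, 0.0])  # positive scored higher
bad = bpr_loss(user, [0.0, 0.0], [2.0, 0.0])   # negative scored higher
```

The loss shrinks toward zero as the positive item's score pulls ahead of the negative item's, which is what pushes observed items above sampled ones in the ranking.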
5. Train the model and evaluate it on the test dataset.
for epoch in range(1, epochs + 1):
    t1 = time()
    model.fit(
        x=train_data,
        epochs=1,
        validation_data=val_data,
        batch_size=batch_size
    )
    t2 = time()
    eval_dict = eval_pos_neg(model, test_data, ['hr', 'mrr', 'ndcg'], k, batch_size)
    print('Iteration %d Fit [%.1f s], Evaluate [%.1f s]: HR = %.4f, MRR = %.4f, NDCG = %.4f'
          % (epoch, t2 - t1, time() - t2, eval_dict['hr'], eval_dict['mrr'], eval_dict['ndcg']))
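The three metrics printed by the loop can be illustrated for a single test case in which one positive item is ranked against a set of sampled negatives. This is a sketch of the standard definitions, not the code of `eval_pos_neg`: the helper names `rank_metrics` and `average_metrics` are hypothetical, and resolving score ties in favor of the positive item is an assumption.

```python
import math


def rank_metrics(pos_score, neg_scores, k):
    """HR@k, MRR@k and NDCG@k for one positive item scored against negatives."""
    # 0-based rank of the positive: number of negatives scored strictly higher.
    rank = sum(1 for s in neg_scores if s > pos_score)
    if rank >= k:
        return {"hr": 0.0, "mrr": 0.0, "ndcg": 0.0}
    return {
        "hr": 1.0,                          # the positive is inside the top k
        "mrr": 1.0 / (rank + 1),            # reciprocal of the 1-based rank
        "ndcg": 1.0 / math.log2(rank + 2),  # ideal DCG is 1 with one relevant item
    }


def average_metrics(cases, k):
    """Average the per-case metrics over (pos_score, neg_scores) pairs."""
    totals = {"hr": 0.0, "mrr": 0.0, "ndcg": 0.0}
    for pos_score, neg_scores in cases:
        for name, value in rank_metrics(pos_score, neg_scores, k).items():
            totals[name] += value
    return {name: total / len(cases) for name, total in totals.items()}


cases = [(0.9, [0.1, 0.95, 0.3]), (0.1, [0.5, 0.6])]
summary = average_metrics(cases, k=2)
```

All three metrics lie between 0 and 1, and higher is better. HR only asks whether the positive made the top k, while MRR and NDCG also reward placing it nearer the top.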
Waiting......
The experimental setup used by RecLearn differs from that of some papers, so the results may deviate from the published ones. Please refer to Experiment for details.
Model | ml-1m HR@10 | ml-1m MRR@10 | ml-1m NDCG@10 | Beauty HR@10 | Beauty MRR@10 | Beauty NDCG@10 | STEAM HR@10 | STEAM MRR@10 | STEAM NDCG@10 |
---|---|---|---|---|---|---|---|---|---|
BPR | 0.5768 | 0.2392 | 0.3016 | 0.3708 | 0.2108 | 0.2485 | 0.7728 | 0.4220 | 0.5054 |
NCF | 0.5834 | 0.2219 | 0.3060 | 0.5448 | 0.2831 | 0.3451 | 0.7768 | 0.4273 | 0.5103 |
DSSM | 0.5498 | 0.2148 | 0.2929 | - | - | - | - | - | - |
YoutubeDNN | 0.6737 | 0.3414 | 0.4201 | - | - | - | - | - | - |
MIND(Error) | 0.6366 | 0.2597 | 0.3483 | - | - | - | - | - | - |
GRU4Rec | 0.7969 | 0.4698 | 0.5483 | 0.5211 | 0.2724 | 0.3312 | 0.8501 | 0.5486 | 0.6209 |
Caser | 0.7916 | 0.4450 | 0.5280 | 0.5487 | 0.2884 | 0.3501 | 0.8275 | 0.5064 | 0.5832 |
SASRec | 0.8103 | 0.4812 | 0.5605 | 0.5230 | 0.2781 | 0.3355 | 0.8606 | 0.5669 | 0.6374 |
AttRec | 0.7873 | 0.4578 | 0.5363 | 0.4995 | 0.2695 | 0.3229 | - | - | - |
FISSA | 0.8106 | 0.4953 | 0.5713 | 0.5431 | 0.2851 | 0.3462 | 0.8635 | 0.5682 | 0.6391 |
Model | 500w(Criteo) Log Loss | 500w(Criteo) AUC | Criteo Log Loss | Criteo AUC |
---|---|---|---|---|
FM | 0.4765 | 0.7783 | 0.4762 | 0.7875 |
FFM | - | - | - | - |
WDL | 0.4684 | 0.7822 | 0.4692 | 0.7930 |
Deep Crossing | 0.4670 | 0.7826 | 0.4693 | 0.7935 |
PNN | - | 0.7847 | - | - |
DCN | - | 0.7823 | 0.4691 | 0.7929 |
NFM | 0.4773 | 0.7762 | 0.4723 | 0.7889 |
AFM | 0.4819 | 0.7808 | 0.4692 | 0.7871 |
DeepFM | - | 0.7828 | 0.4650 | 0.8007 |
xDeepFM | 0.4690 | 0.7839 | 0.4696 | 0.7919 |
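
For reference, the two metrics reported for the ranking models can be computed as in the following sketch. It follows the textbook definitions of binary log loss and AUC. The function names are hypothetical, the AUC uses a simple quadratic pairwise count that suits small examples only, and this is not the evaluation code used to produce the numbers above.

```python
import math


def log_loss(labels, probs, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def auc(labels, probs):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.4, 0.6]
loss_value = log_loss(labels, probs)
auc_value = auc(labels, probs)
```

Lower log loss is better, while higher AUC is better. Log loss depends on how well calibrated the predicted probabilities are, whereas AUC depends only on their ordering.
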
Paper | Model | Published | Author |
---|---|---|---|
BPR: Bayesian Personalized Ranking from Implicit Feedback | MF-BPR | UAI, 2009 | Steffen Rendle |
Neural Collaborative Filtering | NCF | WWW, 2017 | Xiangnan He |
Learning Deep Structured Semantic Models for Web Search using Clickthrough Data | DSSM | CIKM, 2013 | Po-Sen Huang |
Deep Neural Networks for YouTube Recommendations | YoutubeDNN | RecSys, 2016 | Paul Covington |
Session-based Recommendations with Recurrent Neural Networks | GRU4Rec | ICLR, 2016 | Balázs Hidasi |
Self-Attentive Sequential Recommendation | SASRec | ICDM, 2018 | UCSD |
Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding | Caser | WSDM, 2018 | Jiaxi Tang |
Next Item Recommendation with Self-Attentive Metric Learning | AttRec | AAAI, 2019 | Shuai Zhang |
FISSA: Fusing Item Similarity Models with Self-Attention Networks for Sequential Recommendation | FISSA | RecSys, 2020 | Jing Lin |
Paper | Model | Published | Author |
---|---|---|---|
Factorization Machines | FM | ICDM, 2010 | Steffen Rendle |
Field-aware Factorization Machines for CTR Prediction | FFM | RecSys, 2016 | Criteo Research |
Wide & Deep Learning for Recommender Systems | WDL | DLRS, 2016 | Google Inc. |
Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features | Deep Crossing | KDD, 2016 | Microsoft Research |
Product-based Neural Networks for User Response Prediction | PNN | ICDM, 2016 | Shanghai Jiao Tong University |
Deep & Cross Network for Ad Click Predictions | DCN | ADKDD, 2017 | Stanford University, Google Inc. |
Neural Factorization Machines for Sparse Predictive Analytics | NFM | SIGIR, 2017 | Xiangnan He |
Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | AFM | IJCAI, 2017 | Zhejiang University, National University of Singapore |
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | DeepFM | IJCAI, 2017 | Harbin Institute of Technology, Noah’s Ark Research Lab, Huawei |
xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems | xDeepFM | KDD, 2018 | University of Science and Technology of China |
Deep Interest Network for Click-Through Rate Prediction | DIN | KDD, 2018 | Alibaba Group |
- If you have any suggestions or questions about the project, you can leave a comment on Issue.
- wechat: