SemEval2023_QUST

He is an Assistant Professor at the Qingdao University of Science and Technology (QUST).

The system built with this code ranked 2nd in Italian and Spanish on subtask 1; see the public leaderboard.

Preparation / Requirements

The Utils folder contains the data preprocessing scripts, one per subtask. For example, train_data_task1.py prepares the training and dev data for subtask 1. (The training and dev data are merged because we use 10-fold cross-validation.)
The train_pred folder contains the training and prediction scripts, one per subtask. For example, t1_kfold.py trains on the preprocessed data from the previous step in a 10-fold cross-validation setup. We also apply early stopping and save only the best model checkpoints across the 10 folds.
After training, the prediction scripts combine the top 3 checkpoints into an average ensemble for the test data. For example, t1_pred.py loads the top 3 checkpoints (selected manually by checking the training log once training is done) and generates the prediction .txt file for each language in each subtask.
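
The workflow above (10-fold split over the merged train and dev data, then an average ensemble of three checkpoints) can be illustrated with a minimal, dependency-free sketch. It is not the code in t1_kfold.py or t1_pred.py: the function names `kfold_indices` and `average_ensemble`, the shuffling seed, the sample sizes, and the probability values are all hypothetical, and averaging per-class probabilities before taking the argmax is an assumption about what "average ensemble" means here.

```python
import random
from typing import List, Sequence, Tuple


def kfold_indices(
    n_samples: int, n_splits: int = 10, seed: int = 42
) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once and return n_splits (train, validation) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold k takes every n_splits-th shuffled index, starting at position k.
    folds = [indices[k::n_splits] for k in range(n_splits)]
    splits = []
    for k, val_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train_idx, val_idx))
    return splits


def average_ensemble(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities over checkpoints; return the argmax per example.

    prob_sets[m][i][c] is the probability checkpoint m assigns to class c for example i.
    """
    n_models = len(prob_sets)
    predictions = []
    for per_example in zip(*prob_sets):  # outputs of all checkpoints for one example
        mean = [sum(vals) / n_models for vals in zip(*per_example)]
        predictions.append(max(range(len(mean)), key=mean.__getitem__))
    return predictions


# Merged train + dev pool; the size is made up for illustration.
splits = kfold_indices(n_samples=25, n_splits=10)

# Hypothetical class probabilities from three checkpoints for two test examples.
checkpoint_probs = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    [[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]],
]
labels = average_ensemble(checkpoint_probs)
```

In this sketch every sample lands in exactly one validation fold, and the ensemble can pick a label that no single checkpoint is most confident about on average, which is the usual motivation for averaging over several checkpoints rather than relying on one.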