This project comes from my research group, and it also serves as a study of TensorFlow and deep learning (FastText, CNN, LSTM, RCNN, etc.).
The main objective of the project is to solve the multi-label text classification problem based on Convolutional Neural Networks. Because a single sample can carry several labels at once, each label is stored as a 0/1 vector such as [0, 1, 0, ..., 1, 1], with one position per class.
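As a small illustration of that label format, the sketch below turns a list of label indices into a 0/1 vector. The function name `to_multi_hot` and the class count of 6 are assumptions made for this example and are not taken from the repository code.

```python
def to_multi_hot(label_indices, num_classes):
    """Return a 0/1 list of length num_classes with 1 at each given index."""
    vector = [0] * num_classes
    for index in label_indices:
        if not 0 <= index < num_classes:
            raise ValueError("label index %d out of range" % index)
        vector[index] = 1
    return vector


# A sample tagged with labels 1, 4 and 5 out of 6 possible classes.
print(to_multi_hot([1, 4, 5], num_classes=6))  # [0, 1, 0, 0, 1, 1]
```

A vector of this shape can be compared element by element with the per-label scores that a sigmoid output layer would produce.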
- Python 3.6
- TensorFlow 1.8+
- Numpy
- Gensim
Research data may be protected by copyright under Chinese law, so only the code is provided.
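The experimental data belongs to a joint project between our lab and a company and involves trade secrets, so it is not provided here. Thank you for your understanding.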
- Support both Chinese and English data (which seems easy with `jieba`).
- Support your own pre-trained word vectors (which seems easy with `gensim`).
- Add embedding visualization based on TensorBoard.
- Add the correct L2 loss calculation.
- Add gradient clipping to prevent gradient explosion.
- Add learning rate decay (exponential decay).
- Add a new Highway layer (which proved useful in terms of performance).
- Add a Batch Normalization layer.
- Can choose to train the model directly or restore it from a checkpoint in `train.py`.
- Can predict the labels via threshold and topK in `train.py` and `test.py`.
- Add `test.py`, the model test code, which shows the predicted value of each label for the data in the test set when it creates the final prediction file.
- Add other useful data preprocessing functions in `data_helpers.py`.
- Use `logging` to record all run information (parameter display, model training info, etc.).
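The two prediction modes mentioned in the list, threshold and topK, can be sketched as follows. This is a minimal pure-Python illustration, not the code in `train.py` or `test.py`; the function names, the default values, and the fallback to the single best label when no score passes the threshold are all assumptions of this sketch.

```python
def predict_by_threshold(scores, threshold=0.5):
    """Return indices of all labels whose score is at least `threshold`.

    Falls back to the single best label when nothing passes, so every
    sample gets at least one prediction (an assumption of this sketch).
    """
    picked = [i for i, s in enumerate(scores) if s >= threshold]
    if not picked:
        picked = [max(range(len(scores)), key=lambda i: scores[i])]
    return picked


def predict_by_topk(scores, k=2):
    """Return indices of the k highest-scoring labels, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


scores = [0.10, 0.85, 0.30, 0.60]  # hypothetical per-label sigmoid outputs
print(predict_by_threshold(scores, threshold=0.5))  # [1, 3]
print(predict_by_topk(scores, k=2))                 # [1, 3]
```

Thresholding lets the number of predicted labels vary per sample, while topK always returns a fixed number, which is convenient when the evaluation metric expects exactly K labels.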
It depends on what your data and task are.
You can use the `jieba` package if you are going to deal with Chinese text data.
- Use the `gensim` package to pre-train word vectors.
- Use the `glove` tools to pre-train word vectors.
- You can even use a fastText network to pre-train word vectors.
References:
- Convolutional Neural Networks for Sentence Classification
- A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification
Warning: Model can be used but is not finished yet 🤪!
- Add BN-LSTM cell unit.
- Add attention.
References:
Warning: Not finished yet 🤪!
References:
Warning: Not finished yet 🤪!
References:
Warning: Not finished yet 🤪!
References:
Warning: Model can be used but is not finished yet 🤪!
- Add attention penalization loss.
- Add visualization.
References:
黄威,Randolph
SCU SE Bachelor; USTC CS Master
Email: [email protected]
My Blog: randolph.pro
LinkedIn: randolph's linkedin