Deep Learning for Multi-Label Text Classification

This repository is my research group's project, and also a study of TensorFlow and deep learning models (FastText, CNN, LSTM, RCNN, etc.).

The main objective of the project is to solve the multi-label text classification problem with Convolutional Neural Networks. Because a sample can belong to several classes at once, each label is a multi-hot vector such as [0, 1, 0, ..., 1, 1].
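
As an illustration of that label format, here is a minimal sketch that turns a list of class indices into a multi-hot vector. The function name `to_multi_hot` and its parameters are hypothetical and not taken from this repository.

```python
def to_multi_hot(label_ids, num_classes):
    """Return a 0/1 list with a 1 at every index listed in label_ids."""
    vector = [0] * num_classes
    for idx in label_ids:
        if not 0 <= idx < num_classes:
            raise ValueError(f"label index {idx} is out of range")
        vector[idx] = 1
    return vector


# A sample tagged with classes 1, 4 and 5 out of 6 possible classes.
print(to_multi_hot([1, 4, 5], num_classes=6))  # [0, 1, 0, 0, 1, 1]
```

With labels in this form, the output layer is usually trained with one independent sigmoid per class rather than a single softmax.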

Requirements

Python 3.6
TensorFlow 1.8+
NumPy
Gensim

Data

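The research data may be protected by copyright under Chinese law, so this repository contains only code.

The experimental data belongs to a joint project between our lab and a company and involves trade secrets, so it is not provided here. Thank you for your understanding.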

Innovation

Data part

Support both Chinese and English data (jieba makes Chinese word segmentation easy).
Support your own pre-trained word vectors (gensim makes this easy).
Add embedding visualization with TensorBoard.

Model part

Add the correct L2 loss calculation.
Add gradient clipping to prevent exploding gradients.
Add exponential learning rate decay.
Add a new Highway layer (which proved useful, judging by model performance).
Add a Batch Normalization layer.
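
The following plain-Python sketch illustrates three of the items above: exponential learning rate decay, gradient clipping by global norm, and the gating step of a Highway layer. It follows the standard formulas and is not the TensorFlow code used in this repository. All function names are hypothetical, and the Highway helper assumes the transform output `h` and the gate logits have already been computed.

```python
import math


def exponential_decay(base_lr, global_step, decay_steps, decay_rate, staircase=False):
    """lr = base_lr * decay_rate ** (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        # Decay in discrete jumps instead of continuously.
        exponent = math.floor(exponent)
    return base_lr * decay_rate ** exponent


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so their joint L2 norm is at most clip_norm.

    grads is a list of gradient vectors (lists of floats).
    Returns the clipped gradients and the norm measured before clipping.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm


def highway_combine(x, h, gate_logits):
    """Highway gating: y = t * h + (1 - t) * x, with t = sigmoid(gate_logits)."""
    out = []
    for xi, hi, zi in zip(x, h, gate_logits):
        t = 1.0 / (1.0 + math.exp(-zi))
        out.append(t * hi + (1.0 - t) * xi)
    return out


clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(norm)  # 5.0
```

Clipping by the global norm keeps the direction of the overall update unchanged, which is why it is often preferred over clipping each gradient value separately.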

Code part

train.py can either train a model from scratch or restore it from a checkpoint.
train.py and test.py can predict labels by threshold or by top-K.
Add test.py, the model test script, which reports the predicted value of each label for every sample in the test set when it creates the final prediction file.
Add other useful data preprocessing functions to data_helpers.py.
Use logging to record run information (parameter settings, model training info, etc.).
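
The two prediction strategies mentioned above can be sketched as follows. This is a simplified illustration on a single list of per-label scores. The function names, the default threshold of 0.5, and the use of `>=` are assumptions, and the actual helpers in this repository may differ.

```python
def predict_by_threshold(scores, threshold=0.5):
    """Return the indices of all labels whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def predict_by_topk(scores, k=1):
    """Return the indices of the k highest-scoring labels, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


scores = [0.1, 0.8, 0.3, 0.6]
print(predict_by_threshold(scores))  # [1, 3]
print(predict_by_topk(scores, k=1))  # [1]
```

Thresholding lets the number of predicted labels vary per sample, while top-K always returns exactly K labels, so the right choice depends on how the predictions are evaluated.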

Data Preprocessing

The preprocessing you need depends on your data and task.

Text Segment

You can use the jieba package to segment Chinese text.
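
A minimal segmentation sketch using `jieba.lcut`, which returns the tokens as a list. The `segment` wrapper and the whitespace fallback used when jieba is not installed are assumptions added for illustration. The fallback only suits space-delimited languages such as English.

```python
try:
    import jieba  # third-party package: pip install jieba
except ImportError:
    jieba = None


def segment(text):
    """Split text into tokens and drop whitespace-only tokens."""
    if jieba is not None:
        tokens = jieba.lcut(text)
    else:
        # Fallback assumption: plain whitespace split (not suitable for Chinese).
        tokens = text.split()
    return [tok for tok in tokens if tok.strip()]


print(segment("hello world"))
```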

Pre-trained Word Vectors

Use the gensim package to pre-train word vectors on your data.
Use the GloVe tools to pre-train word vectors on your data.
You can even use a FastText network to pre-train word vectors.
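
Whichever tool produces the vectors, they are often exchanged in the word2vec text format: a header line with the vocabulary size and dimension, then one word and its values per line. The sketch below reads that format into a vocabulary and an embedding matrix using only the standard library. The function name `load_word2vec_text` is hypothetical, and this is not how the repository itself loads vectors.

```python
import io


def load_word2vec_text(handle):
    """Parse word2vec text format into (word -> row index, list of vectors)."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocab_size> <dim>"
    vocab, matrix = {}, []
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vocab[parts[0]] = len(matrix)
        matrix.append([float(v) for v in parts[1:]])
    return vocab, matrix


# An in-memory file stands in for a real vector file here.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vocab, matrix = load_word2vec_text(sample)
print(vocab["world"], matrix[vocab["world"]])  # 1 [0.4, 0.5, 0.6]
```

The resulting matrix can then be used to initialize the embedding layer, with the row index of each word serving as its token id.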

Network Structure

FastText

References:

Bag of Tricks for Efficient Text Classification

TextANN

TextCNN

References:

TextRNN

Warning: The model is usable but not finished yet 🤪!

TODO

Add BN-LSTM cell unit.
Add attention.

References:

Recurrent Neural Network for Text Classification with Multi-Task Learning

TextRCNN

Warning: Not finished yet 🤪!

References:

Recurrent Convolutional Neural Networks for Text Classification

TextCRNN

TextHAN

Warning: Not finished yet 🤪!

References:

Hierarchical Attention Networks for Document Classification

TextMANN

Warning: Not finished yet 🤪!

References:

TextSANN

Warning: The model is usable but not finished yet 🤪!

TODO

Add attention penalization loss.
Add visualization.
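
For orientation, the penalization term in the structured self-attention paper referenced below is the squared Frobenius norm of A·Aᵀ − I, where A is the attention matrix with one row per attention hop. The plain-Python sketch below computes that quantity. Since this item is still a TODO, it is only an illustration of the formula and not code from this repository, and the function name is hypothetical.

```python
def attention_penalty(attention):
    """Squared Frobenius norm of (A @ A.T - I) for a row-major matrix A."""
    hops = len(attention)
    total = 0.0
    for i in range(hops):
        for j in range(hops):
            dot = sum(a * b for a, b in zip(attention[i], attention[j]))
            target = 1.0 if i == j else 0.0
            total += (dot - target) ** 2
    return total


# Two hops attending to different tokens: no penalty.
print(attention_penalty([[1.0, 0.0], [0.0, 1.0]]))  # 0.0
# Two hops attending to the same token: penalized.
print(attention_penalty([[1.0, 0.0], [1.0, 0.0]]))  # 2.0
```

The term grows when different hops attend to the same tokens, so adding it to the loss encourages the hops to focus on different parts of the sentence.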

References:

A Structured Self-Attentive Sentence Embedding

About Me

黄威，Randolph

SCU SE Bachelor; USTC CS Master

Email: [email protected]

My Blog: randolph.pro

LinkedIn: randolph's linkedin

LeiShenVictoria/Multi-Label-Text-Classification
