DbPedia-contents-classification

Classifying content in the DBpedia dataset using a VGG-based network

This repo classifies DBpedia sentences. The DBpedia dataset is a large, freely available dataset of ontology content comprising more than 630k samples in 14 categories: Company, EducationalInstitution, Artist, Athlete, OfficeHolder, MeanOfTransportation, Building, NaturalPlace, Village, Animal, Plant, Album, Film, WrittenWork.
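As a small illustration of the label space, the 14 categories can be held in an ordered list and looked up by label. This is only a sketch: `CLASS_NAMES` and `label_to_name` are hypothetical names, not taken from this repo, and the 1-based label indexing is an assumption about how the CSV files number their classes.

```python
# Category names, in the order they are listed in this README.
CLASS_NAMES = [
    "Company", "EducationalInstitution", "Artist", "Athlete", "OfficeHolder",
    "MeanOfTransportation", "Building", "NaturalPlace", "Village", "Animal",
    "Plant", "Album", "Film", "WrittenWork",
]


def label_to_name(label, one_based=True):
    """Return the category name for an integer label.

    Assumption: labels are 1-based (1..14); pass one_based=False for 0..13.
    """
    index = label - 1 if one_based else label
    if not 0 <= index < len(CLASS_NAMES):
        raise ValueError("label out of range: {}".format(label))
    return CLASS_NAMES[index]


print(len(CLASS_NAMES))   # 14
print(label_to_name(1))   # Company
print(label_to_name(14))  # WrittenWork
```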

The dataset is already split into training and test sets. To classify DBpedia text, a new VGG-based CNN model was designed. Because the model is a CNN, each sentence is represented as a fixed-length sequence of 1014 characters and converted to a 1014 × 16 matrix by an embedding layer from the TensorFlow library. The model classifies the test set samples with an accuracy of 94.25%. It consists of one embedding layer, nine convolution layers, and three fully connected layers. The figure below shows the model architecture:
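Separately from the architecture figure, the fixed-length character encoding described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code in `data_preparing.py`: the alphabet, the use of index 0 for padding and unknown characters, and the names `ALPHABET`, `PAD_ID`, and `encode_text` are all hypothetical. Only the length 1014 comes from this README.

```python
import string

MAX_LEN = 1014  # fixed input length stated in this README

# Hypothetical alphabet; the character set actually used by the repo may differ.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
PAD_ID = 0  # assumption: 0 is used for both padding and unknown characters
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
VOCAB_SIZE = len(ALPHABET) + 1  # +1 for the padding/unknown index


def encode_text(text, max_len=MAX_LEN):
    """Map text to a list of exactly max_len integer character ids."""
    # Lowercase, truncate to max_len characters, then look up each character.
    ids = [CHAR_TO_ID.get(ch, PAD_ID) for ch in text.lower()[:max_len]]
    # Right-pad short inputs so every sample has the same length.
    return ids + [PAD_ID] * (max_len - len(ids))


encoded = encode_text("Abc Records is a company.")
print(len(encoded))  # 1014
print(encoded[:3])   # [1, 2, 3]
```

Feeding such a sequence to an embedding layer with an output dimension of 16 (for example `tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=16)`) would yield the 1014 × 16 matrix per sample mentioned above.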

Requirements

Python 3

TensorFlow == 1.15

NLTK

Folders and files

dbpedia_csv

LICENSE

README.md

data_preparing.py

evaluation.py

model.py

train.py

sajadtavakoli/DbPedia-contents-classification
