This project covers two tasks: Sentiment Analysis and Offensive Language Identification on Dravidian languages (Tamil and Malayalam). Both tasks have been popular for several years. The dataset we used consists largely of code-mixed text, which makes extracting sentiment and detecting offensive content from a sentence challenging. Our first aim is to clean the noisy text and remove unnecessary content. We then built baseline models by training traditional Machine Learning (ML) models, namely Support Vector Machine (SVM), Naïve Bayes Classifier, Logistic Regression, and Random Forest, on feature vectors extracted with the TF-IDF method. Next, we built Neural Network models, namely Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM), with features extracted using the Word2Vec one-hot method. The same methods are applied to both Sentiment Analysis and Offensive Language Identification. As we will see, the traditional machine learning algorithms perform much better than the Neural Network approaches in terms of F1 score.
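The cleaning and TF-IDF steps described above can be illustrated with a minimal sketch in plain Python. This is not the project's actual code: the cleaning rules (lowercasing, dropping URLs, @mentions, and ASCII punctuation), the smoothed IDF formula, the function names `clean_text` and `tfidf`, and the sample comments are all assumptions chosen for illustration.

```python
import math
import re
import string
from collections import Counter

# Map every ASCII punctuation character to a space (assumed cleaning rule).
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_text(text):
    """Lowercase the text and strip URLs, @mentions, and ASCII punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.translate(_PUNCT_TABLE)        # drop ASCII punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def tfidf(docs):
    """Return (vectors, idf): one L2-normalised {token: weight} dict per document."""
    tokenized = [clean_text(d).split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors, idf


# Made-up romanized comments, not taken from the project's dataset.
sample_docs = [
    "Padam SEMMA mass!! https://example.com",
    "@user padam mokka...",
    "Trailer semma",
]
vectors, idf = tfidf(sample_docs)
```

In a full pipeline, vectors like these would be passed to a classifier such as an SVM or Logistic Regression model; that training step is outside the scope of this sketch.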
rahulponnusamy/Miniproject1-SA
Sentiment Analysis on code-mixed Tamil-English text