A curated list of research papers and resources on code-switching
Updated Dec 18, 2024
CodeSwitch is an NLP tool that can be used for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.
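Language identification at the word level is usually the first step for code-mixed text. The simplest baseline is dictionary lookup. The sketch below is a minimal illustration of that baseline, not CodeSwitch's method: the two lexicons (`EN_WORDS`, `HI_WORDS`) and the function `tag_languages` are hypothetical, and real systems use large word lists or trained classifiers.

```python
import re

# Hypothetical toy lexicons for romanized Hindi-English text; a real system
# would use much larger word lists or a trained classifier instead.
EN_WORDS = {"i", "this", "movie", "is", "very", "good", "to", "the"}
HI_WORDS = {"mujhe", "yeh", "bahut", "accha", "laga", "hai", "to", "nahi"}


def tag_languages(sentence):
    """Word-level language ID by dictionary lookup.

    Returns (token, tag) pairs where tag is 'en', 'hi', 'ambiguous'
    (found in both lexicons) or 'other' (unknown words, punctuation)."""
    tags = []
    # Split into word tokens and single punctuation characters.
    for token in re.findall(r"\w+|[^\w\s]", sentence.lower()):
        in_en, in_hi = token in EN_WORDS, token in HI_WORDS
        if in_en and in_hi:
            tag = "ambiguous"
        elif in_en:
            tag = "en"
        elif in_hi:
            tag = "hi"
        else:
            tag = "other"
        tags.append((token, tag))
    return tags


print(tag_languages("Yeh movie bahut good hai!"))
```

The `ambiguous` tag matters in practice: romanized Hindi "to" (roughly "then" or "so") is spelled the same as English "to", which is one reason lookup alone is a weak baseline for Latin-script code-mixed data.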
The main aim of the project is to develop a sentiment analyzer that classifies Twitter data as positive or negative. Our project addresses the challenge of bilingual comments, where people tweet in two languages (in this case, Hindi and English) written in the Latin alphabet.
CalBERT - Code-mixed Adaptive Language representations using BERT, published at AAAI-MAKE 2022
Unsupervised Sentiment Analysis for Code-mixed Data
Implementation for the paper "Data-Augmentation for Bangla-English Code-Mixed Sentiment Analysis: Enhancing Cross Linguistic Contextual Understanding", IEEE Access, 2023
This repository contains the system description and the code we implemented for our participation in the EACL-2021 shared tasks.
This repository implements a Hidden Markov Model (HMM) for Part-of-Speech (POS) tagging of Assamese-English code-mixed texts.
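An HMM tagger assigns tags by finding the most probable tag sequence given transition and emission probabilities, usually with Viterbi decoding. The sketch below shows that decoding step under stated assumptions: the three-tag set, all probabilities, and the small vocabulary (English words plus the romanized tokens "ghar" and "jao") are hypothetical, and the repository's actual tag set and trained parameters are not reproduced here.

```python
import math

# Hypothetical toy HMM; tags, probabilities and vocabulary are illustrative only.
TAGS = ["NOUN", "VERB", "ADP"]
START = {"NOUN": 0.6, "VERB": 0.2, "ADP": 0.2}
TRANS = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.4, "ADP": 0.3},
    "VERB": {"NOUN": 0.5, "VERB": 0.2, "ADP": 0.3},
    "ADP": {"NOUN": 0.8, "VERB": 0.1, "ADP": 0.1},
}
EMIT = {
    "NOUN": {"school": 0.5, "ghar": 0.4},
    "VERB": {"going": 0.6, "jao": 0.3},
    "ADP": {"to": 0.7},
}
UNK = 1e-6  # fallback probability for unseen (tag, word) pairs


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []

    def emit(tag, word):
        return math.log(EMIT[tag].get(word, UNK))

    # best[i][t] = (log-prob of best path ending in tag t at position i, previous tag)
    best = [{t: (math.log(START[t]) + emit(t, words[0]), None) for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            # Pick the predecessor tag that maximizes path score into t.
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit(t, words[i]), prev)
        best.append(column)

    # Trace back from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["school", "jao"]))
```

Working in log space avoids numerical underflow on long sentences; the `UNK` fallback is a crude stand-in for the smoothing a trained tagger would apply to unseen words.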
302-Person-Hindi-and-English-Bilingual-Spontaneous-Monologue-smartphone-speech-dataset
This repository implements a Multilingual BERT (mBERT) model for Part-of-Speech (POS) tagging of Assamese-English code-mixed texts.
Code for an SOP project on abuse detection in multilingual code-switched and code-mixed language using federated learning.
This repository implements a Bidirectional Long Short-Term Memory (BiLSTM) network for Part-of-Speech (POS) tagging of Assamese-English code-mixed texts.
This repository implements a Conditional Random Field (CRF) for Part-of-Speech (POS) tagging of Assamese-English code-mixed texts.
Tweet IDs for code-mixed Russian-German and Russian-Hebrew tweets
A code-mixed annotation tool aimed at increasing the annotation quality whilst reducing the annotation time and various overheads associated with code-mixed data.
This repository contains a simple rule-based model for Part-of-Speech tagging of Assamese-English code-mixed texts.
This repository implements a Long Short-Term Memory (LSTM) network for Part-of-Speech (POS) tagging of Assamese-English code-mixed texts.
A Python-based machine translation system for English-to-Hinglish, leveraging models like mBART and T5-small to address the complexities of code-mixed languages, achieving state-of-the-art performance with a BLEU score of 43.23.
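The 43.23 figure is a BLEU score. To show what that number measures, here is a minimal corpus-level BLEU sketch (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty), assuming whitespace tokenization and one reference per sentence. It is not the implementation that project used; published scores usually come from tools such as sacreBLEU, whose tokenization and smoothing choices change the result, so numbers from this sketch are not directly comparable.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis, uniform weights
    and no smoothing. Inputs are lists of token lists; returns 0-100."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: a hypothesis n-gram is credited at most as many
            # times as it occurs in the reference (Counter & takes the min).
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


hyp = "the cat sat on the mat".split()
ref = "the cat sat on a mat".split()
print(round(corpus_bleu([hyp], [ref]), 2))
```

Precision counts are pooled over the whole corpus before taking the geometric mean, which is why corpus BLEU is not the average of per-sentence scores.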