This repository contains a simple rule-based model for part-of-speech (PoS) tagging of Assamese-English code-mixed text.
PoS tagging identifies and labels the grammatical role of each word in a text, supporting applications like machine translation and sentiment analysis. While different languages may have their own PoS tag sets, I have defined my own custom PoS tags for this model. The table below defines the custom PoS tags used in this model:
- The code starts by importing all the necessary libraries.
- Following that, I added the "dictionaries", which are CSV files containing words and their respective part-of-speech tags. I made two dictionaries: one for English (containing English words and their PoS tags) and one for Assamese.
- After that, I simply run the code, which then asks for input in the form of either a "CSV file" or a "sentence" to tag.
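The dictionary-lookup idea described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the column names (`word`, `tag`), the tag labels, the sample entries, the `UNK` fallback, and the helper names `load_dictionary` and `tag_sentence` are all assumptions made for the example. The in-memory strings stand in for the two dictionary CSV files.

```python
import csv
import io

# Hypothetical stand-ins for the two dictionary CSV files.
# Column names and tag labels are placeholders, not the custom tag set of this project.
ENGLISH_CSV = "word,tag\nmovie,NOUN\ngood,ADJ\n"
ASSAMESE_CSV = "word,tag\nmoi,PRON\nbhal,ADJ\n"


def load_dictionary(csv_file):
    """Build a word -> tag mapping from an open CSV file with 'word' and 'tag' columns."""
    reader = csv.DictReader(csv_file)
    return {row["word"].strip().lower(): row["tag"].strip() for row in reader}


def tag_sentence(sentence, dictionaries, unknown_tag="UNK"):
    """Tag each whitespace-separated token using the first dictionary that contains it."""
    tagged = []
    for token in sentence.split():
        key = token.lower()
        tag = unknown_tag  # fallback when no dictionary knows the word
        for dictionary in dictionaries:
            if key in dictionary:
                tag = dictionary[key]
                break
        tagged.append((token, tag))
    return tagged


english = load_dictionary(io.StringIO(ENGLISH_CSV))
assamese = load_dictionary(io.StringIO(ASSAMESE_CSV))

# "paisu" is deliberately absent from both sample dictionaries.
result = tag_sentence("moi movie bhal paisu", [assamese, english])
print(result)
```

With real files, `io.StringIO(...)` would be replaced by `open("your_dictionary.csv", newline="", encoding="utf-8")`, using whatever file names and paths your dictionaries have.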
I used Google Colab for this model.
- Simply create a new notebook (or file) on Google Colab.
- Paste the code.
- Upload your dictionaries to Google Colab.
- Please make sure that you update the "dictionaries" part of the code to match your CSV file names and file paths.
- Run the code.
- Enter your preferred input type (CSV or sentence).
- The output will be displayed and saved to a separate CSV file.
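For the CSV input path, the read-tag-save flow might look something like the sketch below. It is only an illustration under assumptions: the input column name `sentence`, the output column `tagged`, the `word/TAG` output format, the file names, and the tiny `LOOKUP` table are all hypothetical and may differ from what the actual code expects or produces.

```python
import csv
import os
import tempfile

# Hypothetical word -> tag lookup standing in for the loaded dictionaries.
LOOKUP = {"movie": "NOUN", "bhal": "ADJ"}


def tag_tokens(sentence):
    """Return the sentence with each token rewritten as token/TAG (UNK if not found)."""
    return " ".join(
        f"{token}/{LOOKUP.get(token.lower(), 'UNK')}" for token in sentence.split()
    )


def tag_csv(input_path, output_path, column="sentence"):
    """Read sentences from one CSV file and write them, with tags, to another."""
    with open(input_path, newline="", encoding="utf-8") as src, \
         open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=[column, "tagged"])
        writer.writeheader()
        for row in reader:
            writer.writerow({column: row[column], "tagged": tag_tokens(row[column])})


# Create a small sample input file in a temporary folder, then tag it.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "input.csv")
output_path = os.path.join(workdir, "tagged_output.csv")
with open(input_path, "w", newline="", encoding="utf-8") as f:
    f.write("sentence\nmovie bhal\n")

tag_csv(input_path, output_path)

with open(output_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
print(rows)
```

Writing to a separate output file keeps the original input CSV untouched, which matches the behaviour described in the steps above.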
You can also use VS Code or any other platform (this code is just plain Python code).
- In this case, you will have to make sure you have the necessary libraries installed and dictionaries loaded correctly.
- Simply run the program for the output.
If you need any help or have any queries, you can reach out to me in the comments or via my socials. My socials are:
- Discord: jessicasaikia
- Instagram: jessicasaikiaa
- LinkedIn: jessicasaikia (www.linkedin.com/in/jessicasaikia-787a771b2)
Additionally, you can find the custom dictionaries that I have used in this project and the dataset in their respective repositories on my profile. Have fun coding and good luck! :D