


jessicasaikia/rule-based


Rule-Based Model

This repository contains a simple Rule-Based Model for Parts-of-Speech tagging in Assamese-English code mixed texts.

Introduction to Parts-of-Speech Tagging (PoS Tagging)

PoS tagging is the process of identifying and labelling the grammatical role of each word in a text. It supports applications such as machine translation and sentiment analysis. While different languages may have their own PoS tag sets, I have used my own custom PoS tags for this model. The table below defines them:

Table

How does this work?

  1. The code starts by importing all the necessary libraries.
  2. Next, it loads the "dictionaries", which are CSV files containing words and their respective parts-of-speech tags. I made two dictionaries: one for English (English words and their PoS tags) and one for Assamese.
  3. When I run the code, it asks for input in the form of a "CSV file" or a "sentence" and tags it.
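The steps above can be sketched roughly as follows. This is a minimal illustration of dictionary lookup, not the repository's actual code: the column names `word` and `tag`, the function names, the `UNK` fallback, the lookup order, and the sample entries and tags are all assumptions made for the example.

```python
import csv
import io


def load_dictionary(csv_file):
    """Build a word -> tag mapping from a CSV file object.

    Assumes (hypothetically) a header row with 'word' and 'tag' columns.
    """
    reader = csv.DictReader(csv_file)
    return {row["word"].strip().lower(): row["tag"].strip() for row in reader}


def tag_sentence(sentence, dictionaries, unknown_tag="UNK"):
    """Tag each whitespace-separated token using the first dictionary that knows it."""
    tagged = []
    for token in sentence.split():
        key = token.lower()
        tag = unknown_tag  # fallback when no dictionary contains the token
        for dictionary in dictionaries:
            if key in dictionary:
                tag = dictionary[key]
                break
        tagged.append((token, tag))
    return tagged


# In-memory stand-ins for the two dictionary CSV files.
# The entries and tags are placeholders, not the author's custom tag set.
english = load_dictionary(io.StringIO("word,tag\nbook,NOUN\nread,VERB\n"))
assamese = load_dictionary(io.StringIO("word,tag\nmoi,PRON\n"))

print(tag_sentence("Moi book read", [assamese, english]))
```

With real files, `load_dictionary` would be called on `open(path, newline="", encoding="utf-8")` instead of `io.StringIO`.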

Where should you run this code?

I used Google Colab for this Model.

  1. Simply create a new notebook (or file) on Google Colab.
  2. Paste the code.
  3. Upload your dictionaries to Google Colab.
  4. Please make sure that you update the "dictionaries" part of the code based on your CSV file names and file path.
  5. Run the code.
  6. Enter your preferred input type (CSV or Sentence).
  7. The output will be displayed and saved to a separate CSV file.
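As a rough sketch of the last two steps, the snippet below tags either a single sentence or every row of an input CSV and writes the result to a new CSV file. It is an illustration under stated assumptions, not the repository's code: the `sentence` input column, the `tagged` output column, the `token/TAG` output format, and the function names are all hypothetical.

```python
import csv


def tag_tokens(sentence, lexicon, unknown_tag="UNK"):
    """Look up each whitespace-separated token in a word -> tag mapping."""
    return [(tok, lexicon.get(tok.lower(), unknown_tag)) for tok in sentence.split()]


def tag_csv(input_path, output_path, lexicon):
    """Tag the (assumed) 'sentence' column of input_path and save to output_path."""
    with open(input_path, newline="", encoding="utf-8") as src, \
         open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["sentence", "tagged"])
        for row in reader:
            pairs = tag_tokens(row["sentence"], lexicon)
            # Join the pairs as token/TAG so each sentence stays on one row.
            writer.writerow([row["sentence"], " ".join(f"{t}/{g}" for t, g in pairs)])


# Sentence mode: tag one sentence directly (placeholder lexicon and tags).
print(tag_tokens("moi read", {"moi": "PRON", "read": "VERB"}))
```

For CSV mode, a call such as `tag_csv("input.csv", "tagged_output.csv", lexicon)` would be used, with both file names replaced by your own paths.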

You can also use VS Code or any other platform (this is just Python code).

  1. In this case, make sure that the necessary libraries are installed and the dictionaries are loaded correctly.
  2. Simply run the program for the output.

Additional Notes from me

If you need help or have any queries, you can reach out to me in the comments or via my socials. My socials are:

Additionally, you can find the custom dictionaries that I have used in this project and the dataset in their respective repositories on my profile. Have fun coding and good luck! :D