A class project for CS585: Introduction to Natural Language Processing. A Structured Perceptron Linear Classifier and Viterbi Decoder for Part of Speech Tagging.

morsecodist/CS585_Structured_Perceptron_Viterbi_POS_tagger

Structured Perceptron and Viterbi Based Part of Speech Tagger

Assignment Description (Viterbi, Perceptron)

Description

Trains a Structured Perceptron Linear Classifier to tag parts of speech, using the Viterbi algorithm for decoding. The assignment code has been cleaned up and streamlined to make it easier to read and use. As a result, the complete solution to the assignment is not here, only the parts I deemed most relevant to share.

Instructor Implementations

baseline.py

All

vit.py

  • dict_argmax
  • goodness_score
  • exhaustive
  • randomized_test

structperc.py

  • dict_subtract
  • dict_argmax
  • dict_dotprod
  • read_tagging_file
  • do_evaluation
  • fancy_eval
  • show_predictions
  • greedy_decode
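
The sparse-vector helpers listed above (dict_subtract, dict_argmax, dict_dotprod) are not shown in this README. As a rough sketch of what helpers with these names typically do, assuming feature vectors are plain dicts mapping feature names to numbers (the signatures and bodies below are assumptions, not the instructor's code):

```python
from collections import defaultdict


def dict_subtract(vec1, vec2):
    """Return vec1 - vec2 as a new sparse dict; missing keys count as 0."""
    out = defaultdict(float)
    out.update(vec1)
    for key, value in vec2.items():
        out[key] -= value
    return dict(out)


def dict_argmax(dct):
    """Return the key whose value is largest."""
    return max(dct, key=dct.get)


def dict_dotprod(d1, d2):
    """Sparse dot product; iterates over the smaller of the two dicts."""
    small, large = (d1, d2) if len(d1) <= len(d2) else (d2, d1)
    return sum(value * large.get(key, 0.0) for key, value in small.items())
```

Keeping vectors as dicts means only the features that actually fire for a sentence are ever touched, which matters when the feature space includes every (tag, word) pair.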

Modifications to Instructor Implementations

structperc.py

  • local_emission_features: Added suffix features
  • train: Implemented the inner loop, the core of the training algorithm. The instructor code was just a skeleton.
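
To illustrate what adding suffix features to local_emission_features might look like, here is a minimal sketch. The signature (t, tag, tokens), the feature-name format, and the choice of suffix lengths 1 to 3 are all assumptions made for illustration; the repository's actual features may differ.

```python
def local_emission_features(t, tag, tokens):
    """Sparse features for assigning `tag` to the token at position t."""
    word = tokens[t].lower()
    feats = {
        "tag=%s_biasterm" % tag: 1,
        "tag=%s_curword=%s" % (tag, word): 1,
    }
    # Suffix features: the last 1-3 characters, only when the word is
    # longer than the suffix (so the suffix is not the whole word).
    for n in (1, 2, 3):
        if len(word) > n:
            feats["tag=%s_suffix%d=%s" % (tag, n, word[-n:])] = 1
    return feats
```

Suffixes such as "ing" or "ly" let the tagger generalize to words it never saw in training, which word-identity features alone cannot do.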

Implementations I provided

vit.py

  • viterbi
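
For readers unfamiliar with the decoder, the following is a minimal sketch of Viterbi decoding over additive factor scores. It assumes A is a dict mapping (prev_tag, cur_tag) to a transition score, B is a list with one dict of tag scores per position, and output_vocab is the list of tags; these argument shapes are assumptions and this is not the code in vit.py.

```python
def viterbi(A, B, output_vocab):
    """Return the highest-scoring tag sequence as a list.

    A: dict, (prev_tag, cur_tag) -> transition score
    B: list of dicts, B[t][tag] -> emission score at position t
    """
    N = len(B)
    if N == 0:
        return []
    # V[t][tag]: best score of any tag sequence ending in `tag` at position t
    V = [{tag: B[0][tag] for tag in output_vocab}]
    # back[t][tag]: the previous tag on that best sequence
    back = [{}]
    for t in range(1, N):
        V.append({})
        back.append({})
        for cur in output_vocab:
            best_prev = max(output_vocab,
                            key=lambda prev: V[t - 1][prev] + A[prev, cur])
            V[t][cur] = V[t - 1][best_prev] + A[best_prev, cur] + B[t][cur]
            back[t][cur] = best_prev
    # Pick the best final tag, then follow the backpointers to the start.
    path = [max(output_vocab, key=lambda tag: V[N - 1][tag])]
    for t in range(N - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

This runs in time proportional to N times the square of the number of tags, whereas an exhaustive search over all tag sequences grows exponentially with N.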

structperc.py

  • get_averaged_weights
  • predict_seq
  • features_for_seq
  • calc_factor_scores
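
As a rough picture of how these pieces fit together during training, the sketch below shows a structured perceptron loop with naive weight averaging. The names perceptron_update and train_sketch, and the decode and featurize callables, are hypothetical stand-ins (for example, for predict_seq and features_for_seq); the real train function and get_averaged_weights may well use a more efficient averaging scheme.

```python
from collections import defaultdict


def perceptron_update(weights, gold_feats, pred_feats, stepsize=1.0):
    """In place: w += stepsize * (f(x, y_gold) - f(x, y_pred))."""
    for feat, value in gold_feats.items():
        weights[feat] += stepsize * value
    for feat, value in pred_feats.items():
        weights[feat] -= stepsize * value


def train_sketch(examples, decode, featurize, epochs=10):
    """Structured perceptron with naive averaging (hypothetical helper).

    examples:  list of (tokens, gold_tags) pairs
    decode:    callable (tokens, weights) -> predicted tag list
    featurize: callable (tokens, tags) -> sparse feature dict
    """
    weights = defaultdict(float)
    weight_sums = defaultdict(float)
    steps = 0
    for _ in range(epochs):
        for tokens, gold_tags in examples:
            pred_tags = decode(tokens, weights)
            if pred_tags != gold_tags:
                perceptron_update(weights,
                                  featurize(tokens, gold_tags),
                                  featurize(tokens, pred_tags))
            # Naive averaging: add the full weight vector after every example.
            steps += 1
            for feat, value in weights.items():
                weight_sums[feat] += value
    return {feat: total / steps for feat, total in weight_sums.items()}
```

Averaging the weight vector over all steps usually gives a more stable tagger than the final weights alone, at the cost here of an extra pass over the weights per example.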

Demo

To train a tagger with 10 iterations of the structured perceptron, using Viterbi decoding:

python structperc.py

baseline.py checks the accuracy of assuming every word has the same tag. To check this baseline:

python baseline.py
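
The idea behind such a baseline can be sketched in a few lines. This version assumes the single tag is the most frequent one in the training data and that the data is a list of (tokens, tags) pairs; both are assumptions, and most_common_tag_accuracy is a hypothetical name, not a function in baseline.py.

```python
from collections import Counter


def most_common_tag_accuracy(train_data, dev_data):
    """Tag every dev token with the most frequent training tag.

    train_data, dev_data: lists of (tokens, tags) pairs.
    Returns (chosen_tag, accuracy on dev_data).
    """
    tag_counts = Counter(tag for _, tags in train_data for tag in tags)
    best_tag = tag_counts.most_common(1)[0][0]
    total = sum(len(tags) for _, tags in dev_data)
    correct = sum(tag == best_tag for _, tags in dev_data for tag in tags)
    return best_tag, correct / total
```

Any trained tagger should comfortably beat this number; if it does not, something is likely wrong with the features or the training loop.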

Usage

# Import the trainer and the data reader
from structperc import train, read_tagging_file

# read_tagging_file reads tagging files in the format of oct27.train and oct27.dev

# Train with averaging on the oct27.train data, evaluating with oct27.dev data
train(read_tagging_file('oct27.train'), do_averaging=True, devdata=read_tagging_file('oct27.dev'))
