Hidden Markov Model Tagger
Use the Pomegranate library to build a hidden Markov model for part-of-speech tagging with the universal tagset. Hidden Markov models have achieved over 96% tag accuracy with larger tagsets on realistic text corpora.
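At decoding time, an HMM tagger picks the most likely tag sequence with the Viterbi algorithm. Below is a minimal pure-Python sketch of that step, using tiny hand-set probabilities rather than parameters learned with Pomegranate; the states, words, and probability values are all hypothetical.

```python
# Minimal Viterbi decoding sketch for an HMM POS tagger.
# Toy hand-set probabilities (hypothetical, not learned from a corpus);
# the real project estimates these from tagged data with Pomegranate.

states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
emit_p = {
    "NOUN": {"dogs": 0.5, "runs": 0.1, "bark": 0.4},
    "VERB": {"dogs": 0.1, "runs": 0.5, "bark": 0.4},
}

def viterbi(words):
    # best[t][s] = max probability of any tag path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][words[0]] for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((ps, best[t - 1][ps] * trans_p[ps][s]) for ps in states),
                key=lambda x: x[1],
            )
            best[t][s] = p * emit_p[s][words[t]]
            back[t][s] = prev
    # follow back-pointers from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

print(viterbi(["dogs", "bark"]))
```

The same dynamic program scales to the full universal tagset; a library handles it in log space to avoid underflow on long sentences.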
Topic Modeling
Use LDA (Latent Dirichlet Allocation) to classify the text of a document into a topic category. LDA builds a topics-per-document model and a words-per-topic model, both modeled as Dirichlet distributions. This project uses a dataset of over one million news headlines published over 15 years.
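The mechanics of LDA inference can be sketched with a tiny collapsed Gibbs sampler over toy "headlines"; the four documents, the two topics, and the hyperparameter values below are all made up for illustration, and a real run would use a library such as gensim on the full headline corpus.

```python
import random
from collections import Counter

# Toy headlines with two obvious themes (sports vs. finance); hypothetical data.
docs = [
    "match goal team win".split(),
    "team match score goal".split(),
    "stocks market shares bank".split(),
    "bank market profit stocks".split(),
]
K, ALPHA, BETA = 2, 0.1, 0.01  # topics and Dirichlet hyperparameters (assumed)
vocab = sorted({w for d in docs for w in d})

random.seed(0)
# Random initial topic assignment for every word token.
z = [[random.randrange(K) for _ in d] for d in docs]
doc_topic = [Counter(zd) for zd in z]
topic_word = [Counter() for _ in range(K)]
topic_count = [0] * K
for d, zd in zip(docs, z):
    for w, t in zip(d, zd):
        topic_word[t][w] += 1
        topic_count[t] += 1

for _ in range(200):  # Gibbs sweeps
    for i, d in enumerate(docs):
        for j, w in enumerate(d):
            t = z[i][j]
            # Remove the current assignment before resampling it.
            doc_topic[i][t] -= 1
            topic_word[t][w] -= 1
            topic_count[t] -= 1
            # P(topic k) ∝ (doc-topic count + alpha) * smoothed word likelihood
            weights = [
                (doc_topic[i][k] + ALPHA)
                * (topic_word[k][w] + BETA)
                / (topic_count[k] + BETA * len(vocab))
                for k in range(K)
            ]
            t = random.choices(range(K), weights)[0]
            z[i][j] = t
            doc_topic[i][t] += 1
            topic_word[t][w] += 1
            topic_count[t] += 1

for k in range(K):
    print(k, [w for w, _ in topic_word[k].most_common(4)])
```

After the sweeps, each topic's most frequent words summarize its theme, and each document's topic counts give its topic mixture.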
Spam Classifier
Use the Naive Bayes algorithm to create a model that classifies SMS messages as spam or non-spam, a binary classification problem. This is also a supervised learning problem: we feed a labelled dataset into the model so that it can learn from it and make future predictions.
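The classifier itself is small enough to sketch from scratch. The four training messages below are a hypothetical stand-in for the real SMS dataset; the model counts words per class and scores new messages in log space with Laplace (add-one) smoothing.

```python
import math
from collections import Counter

# Tiny labelled SMS set (hypothetical stand-in for the real dataset).
train = [
    ("win cash prize now", "spam"),
    ("free prize claim now", "spam"),
    ("are we meeting today", "ham"),
    ("see you at lunch today", "ham"),
]

word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def classify(text):
    # Log-space Naive Bayes with Laplace (add-one) smoothing.
    scores = {}
    for label in word_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("claim your free prize"))
```

Working in log space avoids multiplying many small probabilities into floating-point underflow, and smoothing keeps unseen words (like "your" above) from zeroing out a class.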
Attention Basics
This project focuses on implementing attention in isolation from a larger model, covering attention scoring as well as computing an attention context vector. Studying it in isolation helps because, when implementing attention in a real-world model, much of the effort goes into piping the data and juggling the various vectors rather than the concepts of attention themselves.
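Both pieces named above, scoring and the context vector, fit in a few lines. This is a dot-product attention sketch on toy two-dimensional vectors; the specific keys, values, and query are made up for illustration.

```python
import math

# Dot-product attention in isolation (toy vectors, no surrounding model).
def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    # 1) score each key against the query (dot product)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # 2) normalise the scores into attention weights
    weights = softmax(scores)
    # 3) weighted sum of values = the attention context vector
    context = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, context

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
weights, context = attend([1.0, 0.0], keys, values)
print(weights, context)
```

Because the query aligns with the first key, the first attention weight is larger and the context vector leans toward the first value.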
Deciphering Code with a Character-Level RNN
Build a recurrent neural network and train it to decipher strings encrypted with a certain cipher. The dataset consists of 10,000 encrypted phrases and the plaintext version of each. The preprocessing and model-building techniques used here come in handy when building more advanced models for machine translation, text summarization, and beyond.
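The character-level preprocessing can be sketched end to end: generate cipher/plaintext pairs, map characters to integer ids, and pad each sequence to a fixed length, which is the input shape a recurrent model expects. A simple Caesar shift stands in for the project's unspecified cipher, and the phrases and sequence length are made up.

```python
import string

# Caesar-style substitution cipher (assumed for illustration; the project's
# actual cipher is fixed but not specified here).
SHIFT = 3
ALPHABET = string.ascii_lowercase

def encrypt(text):
    return "".join(
        ALPHABET[(ALPHABET.index(c) + SHIFT) % 26] if c in ALPHABET else c
        for c in text
    )

# Character-level preprocessing: map each character to an integer id,
# then pad every sequence to a fixed length.
def build_vocab(texts):
    chars = sorted({c for t in texts for c in t})
    return {c: i + 1 for i, c in enumerate(chars)}  # id 0 reserved for padding

def encode(text, vocab, length):
    ids = [vocab[c] for c in text]
    return ids + [0] * (length - len(ids))

plain = ["hello world", "secret text"]
cipher = [encrypt(p) for p in plain]
vocab = build_vocab(cipher + plain)
X = [encode(c, vocab, 12) for c in cipher]  # model inputs
y = [encode(p, vocab, 12) for p in plain]   # model targets
print(cipher[0], X[0])
```

The RNN then learns the character-to-character mapping from many such (X, y) pairs; the identical pipeline, with words instead of characters, carries over to machine translation.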
RNN Sentiment Analysis
Use recurrent neural networks, and in particular LSTMs, to perform sentiment analysis on IMDb movie reviews in Keras. The data comes from Keras' built-in IMDb movie reviews dataset.
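The core of the LSTM's advantage over a plain RNN is its gated cell update. The sketch below runs one LSTM cell step at a time over a tiny scalar "sequence" with hand-picked weights; real models use learned weight matrices over whole embedding vectors, which Keras handles internally, so everything here (sizes, weights, inputs) is a toy assumption.

```python
import math

# One step of an LSTM cell in plain Python (toy scalar sizes, hand-set
# weights; a real model learns these and Keras runs them vectorised).
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    # Each gate sees the current input x and the previous hidden state h.
    def gate(name, squash):
        w_x, w_h, b = W[name]
        return squash(w_x * x + w_h * h + b)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new information to write
    g = gate("candidate", math.tanh)  # the candidate values to write
    o = gate("output", sigmoid)       # how much cell state to expose
    c_new = f * c + i * g             # updated long-term cell state
    h_new = o * math.tanh(c_new)      # updated hidden state (the output)
    return h_new, c_new

W = {  # hypothetical weights: (w_x, w_h, bias) per gate
    "forget": (0.5, 0.1, 0.0),
    "input": (0.6, 0.2, 0.0),
    "candidate": (1.0, 0.3, 0.0),
    "output": (0.4, 0.1, 0.0),
}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.8]:  # a tiny input "sequence"
    h, c = lstm_step(x, h, c, W)
print(h, c)
```

For sentiment analysis, the hidden state after the last review token feeds a sigmoid output layer that predicts positive versus negative.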