
Natural Language Processing (NLP) Nanodegree

Projects

Part of Speech Tagging
- Purpose: Tag each word in a sentence with its part of speech (verb, noun, etc.).
- Library: Pomegranate.
- Algorithm: HMM and Supervised Learning.
- Main Program
Machine Translation
- Purpose: Translate English texts to French texts.
- Framework: Keras.
- Algorithm: Encoder-Decoder RNN (recurrent neural network).
- Dataset: Subset of WMT
- Report
- Main Program
DNN Speech Recognizer
- Purpose: Implement a VUI (Voice User Interface).
- Framework: Keras
- Algorithm: 2-Dimensional CNN + RNN + Dense Layer.
- Dataset: LibriSpeech
- Report
- Main Program

Labs

Part 1: Introduction to Natural Language
- Text Processing
  - Purpose: Tokenize articles
  - Libraries: Pandas and NLTK
  - Key APIs:
    - Tokenize: nltk.tokenize.word_tokenize(text)
    - Stopwords: nltk.corpus.stopwords.words('english')
    - Stem/Lemmatize:
      - Stem: nltk.stem.PorterStemmer().stem(word)
      - Lemmatize: nltk.stem.WordNetLemmatizer().lemmatize(word, pos='v')
- Spam Classifier
  - Purpose: Classify spam email.
  - Libraries: Pandas and Scikit-Learn.
  - Algorithm: Apply naive Bayes to BOW (Bag of Words).
  - Key Concept:
    - Bag of Words: A count-based statistic of the corpus that ignores word order. For example, "chicago bulls" might be treated as a city and an animal, rather than the basketball team.
  - Key APIs:
    - Pre-process + Vectorize + BOW: sklearn.feature_extraction.text.CountVectorizer().fit_transform(text)
    - Split train/test set: sklearn.model_selection.train_test_split() (formerly sklearn.cross_validation, which has been removed from recent Scikit-Learn releases)
    - Naive Bayes: sklearn.naive_bayes.MultinomialNB().fit()
    - F1 score, recall score, ...:
      - sklearn.metrics.f1_score()
      - sklearn.metrics.accuracy_score()
      - sklearn.metrics.precision_score()
      - sklearn.metrics.recall_score()
- IBM Bookworm
  - Purpose: A simple question-answering system built using IBM Watson's NLP services.
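
The Bag of Words and naive Bayes ideas from the Spam Classifier lab can be sketched without any library. This is a minimal pure-Python illustration, not the lab's Scikit-Learn code: the helper names (`bag_of_words`, `train_naive_bayes`, `predict`), the add-one smoothing, and the toy messages are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercase the text and count its tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def train_naive_bayes(texts, labels, alpha=1.0):
    """Fit multinomial naive Bayes with add-alpha (Laplace) smoothing."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(bag_of_words(text))
    vocab = {w for counts in word_counts.values() for w in counts}

    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_likelihood = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, text):
    """Return the label with the highest log posterior; unseen words are skipped."""
    bow = bag_of_words(text)
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        scores[label] = log_prior + sum(
            count * log_likelihood[w]
            for w, count in bow.items()
            if w in log_likelihood
        )
    return max(scores, key=scores.get)


# Made-up toy messages, for illustration only.
train_texts = [
    "win a free prize now",
    "free cash offer click now",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(train_texts, train_labels)

# Both orderings produce the same bag, which is why BOW loses phrase meaning.
same_bag = bag_of_words("chicago bulls") == bag_of_words("bulls chicago")
prediction = predict(model, "free prize offer")
```

In the lab itself, `CountVectorizer` and `MultinomialNB` play the roles of `bag_of_words` and `train_naive_bayes`; the sketch only shows what those steps compute conceptually.
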
Part 2: Computing with Natural Language
- Topic Modeling
  - Purpose: Assign each text to a topic.
  - Libraries: Gensim and Pandas.
  - Algorithm: LDA (Latent Dirichlet Allocation) using TF-IDF (Term Frequency-Inverse Document Frequency).
  - Key concept:
    - TF-IDF: Consider a document containing 100 words in which the word 'tiger' appears 3 times.
      - TF: The term frequency (TF) of 'tiger' is TF = 3 / 100 = 0.03.
      - IDF: Now assume there are 10 million documents and 'tiger' appears in 1,000 of them. The inverse document frequency (IDF), using a base-10 logarithm, is IDF = log10(10,000,000 / 1,000) = 4.
      - TF-IDF: The TF-IDF weight is the product of these quantities: TF-IDF = 0.03 * 4 = 0.12.
  - Key APIs:
    - Normalize and Tokenize: gensim.utils.simple_preprocess(text)
    - Stopwords: gensim.parsing.preprocessing.STOPWORDS
    - Lemmatize/Stem:
      - Lemmatize: nltk.stem.WordNetLemmatizer().lemmatize(word)
      - Stem: nltk.stem.SnowballStemmer('english').stem(word)
    - Create Dictionary: gensim.corpora.Dictionary(docs)
    - Filter rare/common words: gensim.corpora.Dictionary(docs).filter_extremes()
    - BOW/TF-IDF:
      - BOW: bow_corpus = [dictionary.doc2bow(doc) for doc in docs], where dictionary = gensim.corpora.Dictionary(docs)
      - TF-IDF: tfidf_corpus = gensim.models.TfidfModel(bow_corpus)[bow_corpus]
    - LDA:
      - gensim.models.LdaMulticore(bow_corpus, num_topics)
      - gensim.models.LdaMulticore(tfidf_corpus, num_topics)
- Sentiment Analysis
  - Purpose: Predict whether a comment expresses positive or negative sentiment.
  - Libraries: Sklearn.
  - Algorithm: Naive Bayes and Gradient-Boosted Decision Tree classifier.
- Attention Basic
  - Purpose: Implement the basic building block of the attention mechanism.
  - Algorithm: Attention
- RNN Keras Lab
  - Purpose: Decipher strings encrypted with a certain cipher.
  - Framework: Keras.
  - Algorithm: Char-level RNN using GRU.
  - Key APIs:
    - Char-level Tokenize: tokenizer = keras.preprocessing.text.Tokenizer(char_level=True); tokenizer.fit_on_texts(text); tokenizer.texts_to_sequences(text) (fit_on_texts returns None, so the calls cannot be chained)
    - Padding: keras.preprocessing.sequence.pad_sequences(tokens, maxlen, padding='post')
    - Keras:
      - keras.models.Model
      - keras.layers
        - keras.layers.Input
        - keras.layers.GRU
        - keras.layers.Dense
        - keras.layers.TimeDistributed
        - keras.layers.Activation
      - keras.optimizers.Adam
      - keras.losses.sparse_categorical_crossentropy
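
The TF-IDF arithmetic from the Topic Modeling lab ('tiger' appearing 3 times in a 100-word document, and in 1,000 of 10 million documents) can be reproduced in a few lines. This is a minimal sketch of the textbook formula with a base-10 logarithm; the function names are hypothetical, and library implementations such as Gensim's `TfidfModel` may use a different log base and normalise the resulting vectors, so their numbers will generally differ.

```python
import math


def term_frequency(term_count, total_terms):
    """Share of the document's words that are the given term."""
    return term_count / total_terms


def inverse_document_frequency(num_docs, docs_with_term):
    """Base-10 log of (all documents / documents containing the term)."""
    return math.log10(num_docs / docs_with_term)


# Numbers taken from the 'tiger' example in the Topic Modeling lab.
tf = term_frequency(3, 100)
idf = inverse_document_frequency(10_000_000, 1_000)
tfidf = tf * idf

print(f"TF={tf:.2f}  IDF={idf:.2f}  TF-IDF={tfidf:.2f}")
```

A term that appears in nearly every document gets an IDF close to 0, so its TF-IDF weight stays small no matter how often it occurs in a single document.
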
Part 3: Communicating with Natural Language
- Voice Data
  - Purpose: Explore the LibriSpeech data set and format
