
Different Types of Word Embeddings

  1. Frequency-based Embedding - Count Vector, TF-IDF Vector
  2. Prediction-based Embedding - CBOW (Continuous Bag of Words), Skip-gram model

Word Embedding Algorithms - word2vec, GloVe

Bag-of-Words and N-gram Models

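A bag-of-words model counts how often each word (or each n-gram of adjacent words) appears in a document. A minimal sketch using scikit-learn's `CountVectorizer` on a toy corpus made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (illustrative)
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Unigram bag-of-words: one column per distinct word
bow = CountVectorizer()
X_bow = bow.fit_transform(corpus)

# Bigram model: one column per pair of adjacent words
bigram = CountVectorizer(ngram_range=(2, 2))
X_bigram = bigram.fit_transform(corpus)

print(sorted(bow.vocabulary_))     # unigram vocabulary
print(sorted(bigram.vocabulary_))  # bigram vocabulary
```

Setting `ngram_range=(1, 2)` instead would keep unigrams and bigrams in one matrix.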

CountVectorizer (CV) with KNN, Random Forest, and Multinomial Naive Bayes


Label using KNN


Balance the imbalanced data with SMOTE

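The usual tool is `SMOTE` from the `imbalanced-learn` package (`from imblearn.over_sampling import SMOTE`). To keep dependencies minimal, the sketch below reimplements only the core idea with NumPy and scikit-learn: synthesize new minority samples by interpolating between a minority point and one of its k nearest minority neighbours. Function name and toy data are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

def smote_sketch(X_min, n_new, k=3):
    """Generate n_new synthetic minority samples (core SMOTE idea)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)           # idx[:, 0] is the point itself
    samples = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))        # pick a random minority point
        j = idx[i, rng.integers(1, k + 1)]  # pick one of its k neighbours
        gap = rng.random()                  # interpolation factor in [0, 1)
        samples.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(samples)

X_minority = rng.normal(size=(10, 2))       # 10 minority points, 2 features
X_synth = smote_sketch(X_minority, n_new=40)
print(X_synth.shape)  # (40, 2)
```

Unlike naive random oversampling, the synthetic points are new vectors between existing minority samples rather than exact duplicates.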

TF-IDF


One hot encoding

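One-hot encoding maps each vocabulary word to a vector with a single 1 at that word's index. In practice scikit-learn's `OneHotEncoder` or `keras.utils.to_categorical` does this; a dependency-free sketch (toy vocabulary for illustration):

```python
# Build a vocabulary index from a toy sentence
vocab = sorted(set("the cat sat on the mat".split()))
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[word_to_idx[word]] = 1
    return vec

print(vocab)
print(one_hot("cat"))
```

The obvious drawback, which motivates the dense embeddings above, is that every word is equidistant from every other word and the vectors grow with vocabulary size.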

Tokenizer

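The Keras `Tokenizer` (`tf.keras.preprocessing.text.Tokenizer`) builds a word-to-index map, most frequent word first, and converts texts to integer sequences. A minimal pure-Python sketch of that behaviour (the class name and details here are illustrative, not the Keras implementation):

```python
from collections import Counter

class WordTokenizer:
    """Keras-style tokenizer sketch: word -> index map
    (most frequent word = index 1; 0 is reserved for padding)."""
    def __init__(self, num_words=None):
        self.num_words = num_words
        self.word_index = {}

    def fit_on_texts(self, texts):
        counts = Counter(w for t in texts for w in t.lower().split())
        ranked = [w for w, _ in counts.most_common(self.num_words)]
        self.word_index = {w: i + 1 for i, w in enumerate(ranked)}

    def texts_to_sequences(self, texts):
        # Unknown words are simply dropped in this sketch
        return [[self.word_index[w] for w in t.lower().split()
                 if w in self.word_index]
                for t in texts]

tok = WordTokenizer()
tok.fit_on_texts(["the cat sat", "the dog sat on the mat"])
seqs = tok.texts_to_sequences(["the cat sat"])
print(tok.word_index)
print(seqs)
```

A tokenizer like this is often wrapped in a custom scikit-learn transformer (hence the TokenizerTransformer below) so it can sit inside a Pipeline.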

TokenizerTransformer


pad_sequences

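`pad_sequences` brings variable-length integer sequences to a fixed length, truncating and padding at the front by default so the most recent tokens survive. A dependency-free sketch of the Keras-style behaviour:

```python
def pad_sequences(seqs, maxlen, padding="pre", value=0):
    """Sketch of Keras-style pad_sequences: truncate then pad
    each sequence to exactly maxlen entries."""
    out = []
    for s in seqs:
        # "pre" keeps the tail of long sequences, "post" keeps the head
        s = s[-maxlen:] if padding == "pre" else s[:maxlen]
        pad = [value] * (maxlen - len(s))
        out.append(pad + s if padding == "pre" else s + pad)
    return out

padded = pad_sequences([[1, 2], [3, 4, 5, 6, 7]], maxlen=4)
print(padded)  # [[0, 0, 1, 2], [4, 5, 6, 7]]
```

Fixed-length sequences are what the Embedding/RNN layers below expect as input.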

Skip-gram works well with small amounts of training data and represents even rare words or phrases well. CBOW trains several times faster than skip-gram and gives slightly better accuracy for frequent words.


  1. CBOW (Continuous Bag of Words): predicts the target word from its surrounding context words. Reference: http://www.claudiobellei.com/2018/01/07/backprop-word2vec-python/

  2. Skip-gram model: predicts the surrounding context words from the target word. Reference: http://www.claudiobellei.com/2018/01/07/backprop-word2vec-python/

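Both architectures train on (center word, context word) pairs drawn from a sliding window; skip-gram predicts the context from the center word, CBOW the reverse. A sketch of the pair generation (in practice gensim's `Word2Vec` exposes the two variants via `sg=1` and `sg=0`):

```python
def training_pairs(tokens, window=2):
    """Generate (center, context) pairs as used by skip-gram;
    CBOW groups the same pairs the other way (context words -> center)."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = training_pairs("the quick brown fox".split(), window=1)
print(pairs)
```

With `window=1` each interior word yields two pairs (left and right neighbour), the boundary words one each.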

GloVe (Global Vectors for Word Representation)

https://github.com/vg11072001/NLP-with-Python/blob/master/Toxic%20Comments%20LSTM%20GloVe.ipynb
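Pretrained GloVe files (e.g. glove.6B.50d.txt) store one word per line followed by its vector components. A sketch of a loader, exercised here on two fabricated lines in the same format since the real file is large:

```python
import numpy as np

def load_glove(lines):
    """Parse GloVe's text format: word, then space-separated floats."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# Two fake lines mimicking the GloVe format (illustrative values only)
sample = ["cat 0.1 0.2 0.3",
          "dog 0.2 0.1 0.4"]
emb = load_glove(sample)
print(emb["cat"])
```

With the real file you would pass `open("glove.6B.50d.txt", encoding="utf-8")` as `lines`, then use the dictionary to build an embedding matrix for the vocabulary of your tokenizer.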

fastText

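fastText extends word2vec by representing each word as a bag of character n-grams (plus the whole word), so rare and unseen words still get vectors built from their subwords. A sketch of the n-gram extraction step:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """fastText's key idea: a word's representation is the set of its
    character n-grams, with < and > marking the word boundaries."""
    w = f"<{word}>"
    grams = {w}                      # the full word is kept as well
    for n in range(n_min, n_max + 1):
        for i in range(len(w) - n + 1):
            grams.add(w[i:i + n])
    return grams

grams = char_ngrams("where")
print(sorted(grams))
```

The word vector is then the sum of the vectors of these n-grams, which is why fastText can embed out-of-vocabulary words.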

Models

SimpleRNN

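A SimpleRNN keeps one hidden state, updated at each step as h_t = tanh(Wx x_t + Wh h_{t-1} + b). A NumPy sketch of the forward pass with toy sizes and random weights (in Keras this is `keras.layers.SimpleRNN`):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 4, 3                       # toy input and hidden sizes
Wx = rng.normal(scale=0.1, size=(d_hid, d_in))
Wh = rng.normal(scale=0.1, size=(d_hid, d_hid))
b = np.zeros(d_hid)

def simple_rnn(xs):
    """Vanilla RNN: h_t = tanh(Wx @ x_t + Wh @ h_{t-1} + b)."""
    h = np.zeros(d_hid)
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h

seq = rng.normal(size=(5, d_in))         # sequence of 5 embedded tokens
h_final = simple_rnn(seq)
print(h_final.shape)  # (3,)
```

Because the same tanh update is applied at every step, gradients shrink over long sequences, which is the vanishing-gradient problem LSTM addresses.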

LSTM

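An LSTM adds a cell state and three gates (forget, input, output) that control what the cell forgets, stores, and exposes at each step. A NumPy sketch of the forward pass with toy sizes and random weights (in Keras this is `keras.layers.LSTM`):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid = 4, 3
# One weight matrix per gate: forget, input, output, candidate
W = {g: rng.normal(scale=0.1, size=(d_hid, d_in + d_hid)) for g in "fioc"}
b = {g: np.zeros(d_hid) for g in "fioc"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm(xs):
    """LSTM forward pass over a sequence of input vectors."""
    h = np.zeros(d_hid)                   # hidden state
    c = np.zeros(d_hid)                   # cell state
    for x in xs:
        z = np.concatenate([x, h])        # gates see input and previous h
        f = sigmoid(W["f"] @ z + b["f"])  # forget gate
        i = sigmoid(W["i"] @ z + b["i"])  # input gate
        o = sigmoid(W["o"] @ z + b["o"])  # output gate
        c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell update
        c = f * c + i * c_tilde           # keep some old state, add some new
        h = o * np.tanh(c)                # expose a gated view of the cell
    return h, c

h_final, c_final = lstm(rng.normal(size=(5, d_in)))
print(h_final.shape, c_final.shape)
```

The additive cell-state update `c = f * c + i * c_tilde` is what lets gradients flow over long sequences better than in a SimpleRNN.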

Named Entity Recognition (NER)

POS Tagging

Ref:

https://github.com/codebasics/nlp-tutorials
https://github.com/siddiquiamir/Python-Data-Preprocessing
https://medium.com/@diegoglozano/building-a-pipeline-for-nlp-b569d51db2d1
https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
https://towardsdatascience.com/deep-learning-pipeline-for-natural-language-processing-nlp-c6f4074897bb
https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-skip-gram.html
https://www.analyticsvidhya.com/blog/2021/06/practical-guide-to-word-embedding-system/