Welcome to the Lexicon! This repository contains a comprehensive collection of Jupyter notebooks and datasets focused on various Natural Language Processing (NLP) tasks.
1_Regex_for_information_extraction.ipynb
- Regular expressions for information extraction.
2_Spacy_vs_Nltk.ipynb
- Comparison between spaCy and NLTK for tokenization.
3_Spacy_Tokenize.ipynb
- Tokenization techniques using spaCy.
4_Spacy_Pipelines.ipynb
- Pipelines in spaCy: stemming and lemmatization.
5_Stemming_Lemmatization.ipynb
- Stemming and lemmatization methods.
5_Stemming_Lemmatization_2.ipynb
- Continuation of stemming, lemmatization, and POS tagging.
6_Parts_of_Speech_2.ipynb
- POS tagging, Bag of Words, and NER with spaCy.
6_Parts_of_Speech_in_Spacy.ipynb
- Detailed POS tagging with spaCy.
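The regular-expression notebook above deals with information extraction. As a minimal sketch of that idea (not the notebook's own code), Python's standard `re` module can pull email addresses and phone numbers out of free text. The sample text and both patterns are simplified assumptions chosen for illustration; real-world email and phone formats need more permissive patterns.

```python
import re

# Hypothetical sample text used only for illustration
text = "Contact support at help@example.com or call 555-123-4567. Sales: sales@example.org, 555-987-6543."

# Simplified email pattern (assumption): local part, '@', then a domain with at least one dot
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Simplified phone pattern (assumption): three, three, and four digits separated by hyphens
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def extract_info(s):
    """Return all email addresses and phone numbers found in s, in order of appearance."""
    return {"emails": EMAIL_RE.findall(s), "phones": PHONE_RE.findall(s)}

print(extract_info(text))
```

Because both patterns contain no capturing groups, `findall` returns the full matched strings rather than sub-groups.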
7_NER.ipynb
- Named entity recognition with spaCy.
7_NER_2.ipynb
- Additional NER tasks and implementations.
8_Bag_of_Words_2_SentimentAnalysis.ipynb
- Sentiment analysis using Bag of Words.
8_Bag_of_Words_SpamClassifier.ipynb
- Spam classification with Bag of Words.
9_Stop_Words.ipynb
- Handling stop words in text preprocessing.
9_Stop_Words_2.ipynb
- Further exploration of stop words, Bag of Words, and N-grams.
10_Bag_of_N_Grams_2_Fake_News_Prediction.ipynb
- Fake news prediction using N-grams.
10_Bag_of_N_Grams_News_Classification.ipynb
- News classification with N-grams.
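The Bag of Words and N-gram notebooks above represent text as counts of tokens and token sequences. A minimal pure-Python sketch of a bag of n-grams follows, assuming lowercasing and whitespace tokenization; it is not the notebooks' implementation. In practice, scikit-learn's `CountVectorizer(ngram_range=(1, 2))` builds the same kind of unigram-plus-bigram features, though its default tokenizer differs (for example, it drops single-character tokens).

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams in tokens, each joined into a single space-separated string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(text, n_range=(1, 2)):
    """Count every n-gram for n in n_range (inclusive) after lowercasing and whitespace splitting."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_range[0], n_range[1] + 1):
        counts.update(ngrams(tokens, n))
    return counts

counts = bag_of_ngrams("Thor ate pizza and Loki ate pizza")
print(counts)
```

The bigram `ate pizza` is counted twice, capturing word-order information that unigram counts alone would lose.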
11_TF_IDF_2_EmotionDetection.ipynb
- Emotion detection using TF-IDF.
11_TF_IDF_TextClassification_Ecommerce_Goods.ipynb
- E-commerce goods classification using TF-IDF.
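The TF-IDF notebooks weight each term by how often it appears in a document, discounted by how many documents contain it. The sketch below uses the plain textbook variant tf = count / document length and idf = ln(N / df), with lowercasing and whitespace tokenization as assumptions; the three product titles are hypothetical. Library implementations differ in detail: scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf and L2-normalizes each row by default, so its numbers will not match these.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute per-document TF-IDF weights with tf = count / doc_length and idf = ln(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (c / len(tokens)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights

docs = ["cheap phone cover", "phone charger cable", "cotton shirt"]
w = tf_idf(docs)
print(w[0])
```

A term present in every document gets idf = ln(1) = 0 and therefore weight 0, while `cheap` and `cover`, which occur only in the first document, outweigh `phone`, which also appears in the second.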
12_Overview_Spacy_Word_Vectors.ipynb
- Overview of word vectors using spaCy and Gensim.
13_Spacy_Word_Embeddings_News_Category_Classification.ipynb
- News category classification using spaCy word embeddings.
14_Nlp_Word_Vectors_Gensim_Overview.ipynb
- Overview of word vectors using Gensim.
15_Gensim_w2v_Google_Fake_News_Detection.ipynb
- Fake news detection with Gensim.
16_Fasttext_Indian_Food_Receipe_Classification.ipynb
- Classification of Indian food recipes using fastText.
17_Fasttext_Ecommerce_Classification.ipynb
- E-commerce classification using fastText.
cosine_similarity.ipynb
- Computing cosine similarity between text vectors.
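Cosine similarity measures the angle between two vectors, cos θ = (A · B) / (‖A‖ ‖B‖), so it compares direction rather than magnitude. A minimal pure-Python sketch follows; the three count vectors and their vocabulary are hypothetical. For matrices of document vectors, `sklearn.metrics.pairwise.cosine_similarity` computes all pairwise scores at once.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical bag-of-words count vectors over the vocabulary ["iphone", "galaxy", "price"]
doc1 = [3, 1, 2]
doc2 = [6, 2, 4]  # same direction as doc1, twice the magnitude
doc3 = [0, 5, 0]
print(cosine_similarity(doc1, doc2))
print(cosine_similarity(doc1, doc3))
```

`doc1` and `doc2` score 1 (up to floating-point rounding) because one is a scaled copy of the other, even though their raw counts differ; `doc3` shares only the `galaxy` dimension and scores about 0.27.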
Cleaned_Indian_Food_Dataset.csv
- Dataset for Indian food recipe classification.
Fake_Real_Data.csv
- Dataset containing fake and real news.
news_story.txt
- Text file with a sample news story.
spam.csv
- Spam dataset for classification tasks.
students.txt
- Additional text file for experimentation.