This repository contains the code and notebooks for the NLP Workshop organized by ML India on October 10-11.
The first notebook, Notebook 1, covers the material for Day 1 of the session. Quora binary classification was chosen as the main topic.
The contents include:
- Statistical Analysis of Words
- Word-Based Frequency / N-Gram Analysis
- Vectorization
- Statistical Models (a baseline sketch follows this list)
- Dimensionality Reduction
- Embeddings
- Neural Network architectures
- LSTM/CNN-Based Models
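
As a quick companion to the vectorization and statistical-model items above, here is a minimal baseline sketch. It assumes a Quora-style training CSV with `question_text` and `target` columns (these names are illustrative and may differ from the file used in the notebook):

```python
# Minimal TF-IDF + logistic regression baseline for binary question classification.
# The CSV path and column names are assumptions; adjust them to the actual dataset.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

df = pd.read_csv("train.csv")
X_train, X_val, y_train, y_val = train_test_split(
    df["question_text"], df["target"], test_size=0.2, random_state=42
)

# Word-level TF-IDF features over unigrams and bigrams.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=50000)
X_train_vec = vectorizer.fit_transform(X_train)
X_val_vec = vectorizer.transform(X_val)

# Simple linear model as a statistical baseline.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train)
print("Validation F1:", f1_score(y_val, clf.predict(X_val_vec)))
```

A linear model over TF-IDF features is a common strong baseline for short-text binary classification before moving on to embeddings and neural architectures.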
The second notebook, Notebook 2, covers the material for Day 2 of the session. It focuses mainly on Transformer models.
The contents include:
- Encoder-Decoder Architecture
- Disadvantages of Encoder-Decoder Models
- Transformer Architectures
- Attention Mechanism
- Bahdanau and Luong Attention
- Self-Attention and Multi-Head Attention
- Designing a Keras Transformer
- Finetuning BERT (a fine-tuning sketch follows this list)
- Finetuning and Training DistilBERT, RoBERTa, XLM-RoBERTa
- Finetuning GPT-2, BART, and Transformer-XL
- Evaluating via TPU Clusters
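
As a quick reference for the fine-tuning items above, here is a minimal sketch using the Hugging Face Transformers library with Keras. The two example sentences and labels are placeholders, not part of the workshop data:

```python
# Minimal sketch of fine-tuning BERT for binary classification with
# Hugging Face Transformers and Keras; the texts and labels are placeholders.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["How do I learn NLP?", "Why is the earth flat?"]  # illustrative examples
labels = [0, 1]

# Tokenize and build a small tf.data pipeline.
enc = tokenizer(texts, padding=True, truncation=True, max_length=64, return_tensors="tf")
dataset = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(2)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(dataset, epochs=1)
```

The same pattern carries over to DistilBERT, RoBERTa, and XLM-RoBERTa by swapping `model_name`.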
This code has been released under the Apache License. The resources for the notebooks, particularly the embedding files, are hosted on Kaggle. They can either be downloaded from Kaggle manually for local use, or attached to Kaggle notebooks via the "Add Data" tab.
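
For the text-format embedding files, a minimal loading sketch might look like the following. The file name and the 300-dimensional size are assumptions; adjust them to whichever embedding file you download or attach:

```python
# Minimal sketch of loading a GloVe-style text embedding file into a
# {word: vector} dictionary. Path and dimension are illustrative.
import numpy as np

def load_embeddings(path, dim=300):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            values = line.rstrip().split(" ")
            # The last `dim` fields are the vector; everything before is the token
            # (some tokens in large embedding files contain spaces).
            word = " ".join(values[:-dim])
            embeddings[word] = np.asarray(values[-dim:], dtype="float32")
    return embeddings

# Example: embeddings = load_embeddings("glove.840B.300d.txt")
```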