Active Learning in NLP

The aim of this project is to determine whether active learning can help build better models from a smaller amount of good-quality data in the NLP domain. The project was undertaken as part of the Full Stack Deep Learning Course, 2021. The code is not production-ready yet and is still at an experimental stage.

Initial Plan

The initial plan is to build an end-to-end project with the following components:

Active learning (using custom code or available library)
Multi-class classification model using Transformers
Experiment tracking using MLflow or wandb
Labeling using Label Studio
App using Streamlit/Dash
Explainability for NLP models (stretch goal)
Unit testing using pytest/unittest
CI/CD using GitHub Actions

Components completed

Currently, the following components are present (some of them are not fully complete yet):

Active learning is performed using random sampling and uncertainty-based sampling (least confidence, entropy)
Multi-class classification of news articles is done using the Simple Transformers library
Experiments are tracked using wandb
CLI-based annotation tool
GUI-based Dash app for annotation (functionality is not complete yet)
Pytest and coverage (does not cover all the code yet)
CI/CD using GitHub Actions (in progress)
Explainability of NLP models (will be done in the future)
Expose API using FastAPI/Flask (will be done in the future)
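
The sampling strategies named in the list above can be illustrated with a minimal sketch. This is not the project's actual implementation: the function names (`least_confidence`, `entropy`, `select_samples`) and the method strings are assumptions made for this example, and the input is assumed to be a list of per-sample class probabilities produced by the classifier.

```python
import math
import random
from typing import List, Sequence

Probs = Sequence[float]  # class probabilities for one sample, summing to 1


def least_confidence(probs: Probs) -> float:
    """Uncertainty as 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def entropy(probs: Probs) -> float:
    """Shannon entropy (in nats) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_samples(
    pool_probs: Sequence[Probs], method: str, n_samples: int, seed: int = 0
) -> List[int]:
    """Return indices of the pool items to annotate next.

    The method names below are hypothetical, chosen for this sketch only.
    """
    indices = list(range(len(pool_probs)))
    n_samples = min(n_samples, len(indices))
    if method == "random":
        # Baseline: ignore the model and pick uniformly at random.
        return random.Random(seed).sample(indices, n_samples)
    scorers = {"least_confidence": least_confidence, "entropy": entropy}
    if method not in scorers:
        raise ValueError(f"Unknown sampling method: {method}")
    score = scorers[method]
    # Most uncertain samples first; ties keep their original pool order.
    ranked = sorted(indices, key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:n_samples]
```

For example, given a pool with predictions `[0.98, 0.01, 0.01]`, `[0.40, 0.35, 0.25]` and `[0.70, 0.20, 0.10]`, both uncertainty-based methods rank the second item first, because its probability mass is spread most evenly across the classes.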

Authors

Demo

Insert gif or link to demo (will be provided later)

Documentation

[Documentation] (will be updated later)

Environment Variables

To run this project, add the project root directory to your PYTHONPATH environment variable:

export PYTHONPATH="${PYTHONPATH}:<path_to_the_project_root_dir>"

Running the CLI tool

To run the CLI tool, use the following command:

python scripts/annotator.py <path_to_data_to_be_annotated> \
                            <sampling_method> \
                            <no_of_samples> \
                            <path_to_store_the_annotated_data>
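
The command above takes four positional arguments. As a rough illustration of how a script with this interface could parse them, here is a hypothetical sketch using `argparse`. It is not the code in `scripts/annotator.py`: the argument names, the accepted sampling method spellings and the helper functions are all assumptions.

```python
import argparse
from typing import List, Optional

# Assumed spellings of the sampling methods; the real script may differ.
SAMPLING_METHODS = ("random", "least_confidence", "entropy")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four positional arguments shown above."""
    parser = argparse.ArgumentParser(
        description="Annotate samples chosen by active learning (illustrative sketch)."
    )
    parser.add_argument("data_path", help="path to the data to be annotated")
    parser.add_argument(
        "sampling_method",
        choices=SAMPLING_METHODS,
        help="strategy used to pick the samples to annotate",
    )
    parser.add_argument("n_samples", type=int, help="number of samples to annotate")
    parser.add_argument("output_path", help="path to store the annotated data")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)
```

Restricting `sampling_method` with `choices` makes an unsupported strategy fail at the command line instead of partway through an annotation session.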

Running the GUI annotation tool

To run the GUI tool, use the following command:

python app/index.py

Train model

To train the model on the news corpus (downloaded directly from HuggingFace Datasets), run the following commands:

python scripts/download_data.py
python scripts/train.py

If you have downloaded the data from the Kaggle competition instead, use the following commands:

# To prepare the data
python scripts/data.py

# To train the model
jupyter notebook

Running Tests

To run the tests, use the following command:

make test

Running Styling

To run styling on this project, use the following command:

make style

Roadmap

Fix the issues with the annotation app (Dash)
Add test cases for all the modules
Add training and inference features to the Dash app
Fix GitHub Actions
Dockerize the application
Add documentation

Screenshots

CLI

Dash Tool

Acknowledgement

@katherinepeterson for development and design of README.so.
@GokuMohandas for the course on MLOps (https://github.com/GokuMohandas/MadeWithML).

License

MIT

Appendix

Any additional information goes here

AbinayaM02/Active_Learning_in_NLP
