diff --git a/entity_recognition/entity_recognition_training.ipynb b/entity_recognition/entity_recognition_training.ipynb
index 0883942..30f2098 100644
--- a/entity_recognition/entity_recognition_training.ipynb
+++ b/entity_recognition/entity_recognition_training.ipynb
@@ -23,6 +23,8 @@
     "source": [
      "This notebook demonstrates how to train a NLP model for entity recognition and use it to produce out-of-sample predicted probabilities for each token. These are required inputs to find label issues in token classification datasets with cleanlab. The specific token classification task we consider here is Named Entity Recognition with the [CoNLL-2003 dataset](https://deepai.org/dataset/conll-2003-english), and we train a Transformer network from [HuggingFace's transformers library](https://github.com/huggingface/transformers). This notebook demonstrates how to produce the `pred_probs`, using them to find label issues is demonstrated in cleanlab's [Token Classification Tutorial](https://docs.cleanlab.ai/stable/tutorials/token_classification.html). \n",
     "\n",
+    "Note: running this notebook requires the helper **.py** files from the **entity_recognition/** parent folder. If running in Colab or locally, make sure you've copied these **.py** files to your environment as well. \n",
+    "\n",
     "**Overview of what we'll do in this notebook:** \n",
     "- Read and process text datasets with per-token labels in the CoNLL format. \n",
     "- Compute out-of-sample predicted probabilities by training a BERT Transformer network via cross-validation. \n",
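
Since the added note asks Colab users to copy the helper **.py** files over manually, here is a minimal sketch of a setup cell that automates that step. It is not part of this PR, and the repository URL (`https://github.com/cleanlab/examples`) is an assumption inferred from the file paths in this diff; point it at wherever the notebook actually lives.

```python
# Sketch of a hypothetical Colab/local setup cell -- not part of this PR.
# It clones the repository so the helper .py files from entity_recognition/
# sit in the working directory and their modules can be imported.
import os
import subprocess

REPO_URL = "https://github.com/cleanlab/examples.git"  # assumed repo location

if not os.path.isdir("examples"):
    # Shallow clone: we only need the current helper files, not the history.
    subprocess.run(["git", "clone", "--depth", "1", REPO_URL], check=True)

# In a notebook, the current working directory is on the import path,
# so the helper .py files become importable after this chdir.
os.chdir("examples/entity_recognition")
```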