Fine-tuning a base model for a token classification task (postal address parsing)
Steps
- Step 0: get millions of postal addresses from openaddresses.io (loading sketch after this list)
- Step 1: select the columns of interest and create a train / validation / test split (split sketch below)
- Step 2: "de-normalize" the postal addresses so that they better resemble the free-form addresses we get in real life (noising sketch below)
- Step 3: fine-tune a fill-mask base model for token classification; e.g. base_model = google-bert/bert-base-multilingual-uncased (training sketch below)
- Step 4: demo time on Hugging Face Spaces: Didier/Postal_address_canada_parsing (inference sketch below)
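A minimal loading sketch for Step 0. OpenAddresses distributes data as CSV files; the file path below is hypothetical, and the column names reflect the usual OpenAddresses CSV layout:

```python
import pandas as pd

# Hypothetical path to one OpenAddresses CSV export.
# OpenAddresses CSVs typically expose: LON, LAT, NUMBER, STREET, UNIT,
# CITY, DISTRICT, REGION, POSTCODE, ID, HASH.
df = pd.read_csv("data/ca/on/toronto.csv", dtype=str)

print(f"{len(df)} addresses loaded")
print(df.head())
```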
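A split sketch for Step 1 using the `datasets` library; the kept columns and the 80/10/10 proportions are illustrative choices:

```python
from datasets import Dataset, DatasetDict

# Keep only the address fields needed for parsing; drop incomplete rows.
columns = ["NUMBER", "STREET", "UNIT", "CITY", "REGION", "POSTCODE"]
df = df[columns].dropna(subset=["STREET", "CITY"])

ds = Dataset.from_pandas(df, preserve_index=False)

# 80% train, 10% validation, 10% test.
split = ds.train_test_split(test_size=0.2, seed=42)
valid_test = split["test"].train_test_split(test_size=0.5, seed=42)
dataset = DatasetDict({
    "train": split["train"],
    "validation": valid_test["train"],
    "test": valid_test["test"],
})
```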
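A noising sketch for Step 2. The exact perturbations are not spelled out above, so the ones below (street-type abbreviations, casing changes, dropped postal codes) are assumptions about what "de-normalize" means; the function also emits per-word tags so the result feeds directly into token classification:

```python
import random

# Illustrative street-type abbreviations; a real mapping would be much larger.
ABBREV = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}

def denormalize(example, rng=random):
    """Turn a clean record into (tokens, tags) resembling a typed address."""
    tokens, tags = [], []
    for field in ["NUMBER", "STREET", "UNIT", "CITY", "REGION", "POSTCODE"]:
        value = example.get(field)
        if not value:
            continue
        # Postal codes are often missing from real input: drop some.
        if field == "POSTCODE" and rng.random() < 0.3:
            continue
        for word in value.split():
            # Randomly abbreviate street types ("STREET" -> "ST").
            if rng.random() < 0.5:
                word = ABBREV.get(word, word)
            # Vary the casing: OpenAddresses records are often all-caps.
            if rng.random() < 0.5:
                word = word.title()
            tokens.append(word)
            tags.append(field)
    return {"tokens": tokens, "tags": tags}

dataset = dataset.map(denormalize, remove_columns=columns)
```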
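A condensed training sketch for Step 3, consuming the (tokens, tags) format above; hyperparameters are placeholders. The one essential detail is aligning word-level tags to BERT sub-word tokens via `word_ids()`, with special tokens and sub-word continuations labeled -100 so the loss ignores them:

```python
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          Trainer, TrainingArguments)

base_model = "google-bert/bert-base-multilingual-uncased"
# "O" is reserved for words outside any field.
labels = ["O", "NUMBER", "STREET", "UNIT", "CITY", "REGION", "POSTCODE"]
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForTokenClassification.from_pretrained(
    base_model, num_labels=len(labels), label2id=label2id, id2label=id2label)

def tokenize_and_align(batch):
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    all_labels = []
    for i, tags in enumerate(batch["tags"]):
        prev, ids = None, []
        for wid in enc.word_ids(batch_index=i):
            if wid is None or wid == prev:
                ids.append(-100)  # special token or sub-word continuation
            else:
                ids.append(label2id[tags[wid]])
            prev = wid
        all_labels.append(ids)
    enc["labels"] = all_labels
    return enc

tokenized = dataset.map(tokenize_and_align, batched=True,
                        remove_columns=["tokens", "tags"])

trainer = Trainer(
    model=model,
    args=TrainingArguments("address-parser", learning_rate=2e-5,
                           per_device_train_batch_size=32, num_train_epochs=2),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```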
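For Step 4, the Space wraps the fine-tuned model in a demo UI; locally, the trained checkpoint can be exercised with a standard token-classification pipeline (the model path below is the output directory from the training sketch above, not the Space's actual model repo):

```python
from transformers import pipeline

# "address-parser" is the output directory from the training sketch;
# the public demo lives at the Space Didier/Postal_address_canada_parsing.
parser = pipeline("token-classification", model="address-parser",
                  aggregation_strategy="simple")

print(parser("123 main st toronto on m5v 2t6"))
```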