Skip to content

Latest commit

 

History

History
9 lines (8 loc) · 792 Bytes

README.md

File metadata and controls

9 lines (8 loc) · 792 Bytes

postal_address_canada_parsing

Fine-tuning a base model for a token classification task (postal address parsing)

Steps

  • Step 0: get millions of postal addresses from openaddresses.io
  • Step 1: select columns of interests and create train / validation / split
  • Step 2: attempt to "de-normalize" the postal addressess so that they are more likely to represent what we get in real life.
  • Step 3: fine-tune a fill-mask base model for a token classification; e.g. base_model = google-bert/bert-base-multilingual-uncased
  • Step 4: demo time on HuggingFace Spaces: Didier/Postal_address_canada_parsing