Commit

solved merge
davebulaval committed Oct 6, 2023
2 parents 3391f94 + 9d2426e commit 0679de0
Showing 13 changed files with 138 additions and 31 deletions.
9 changes: 9 additions & 0 deletions .github/workflows/docker.yml
@@ -14,6 +14,15 @@ jobs:
    runs-on: ubuntu-latest

    steps:
      # The runner appears to run out of disk space, so, as recommended in this
      # thread (https://github.com/actions/runner-images/issues/2840#issuecomment-790492173),
      # we clean the runner before starting the tests to free some space.
      - name: Remove unnecessary files
        run: |
          sudo rm -rf /usr/share/dotnet
          sudo rm -rf /opt/ghc
          sudo rm -rf "/usr/local/share/boost"
          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
      - uses: actions/checkout@v3
      - name: Build the Docker image
        run: |
18 changes: 18 additions & 0 deletions .github/workflows/tests.yml
@@ -10,6 +10,15 @@ jobs:
        python-version: [ "3.8", "3.9", "3.10", "3.11" ]

    steps:
      # The runner appears to run out of disk space, so, as recommended in this
      # thread (https://github.com/actions/runner-images/issues/2840#issuecomment-790492173),
      # we clean the runner before starting the tests to free some space.
      - name: Remove unnecessary files
        run: |
          sudo rm -rf /usr/share/dotnet
          sudo rm -rf /opt/ghc
          sudo rm -rf "/usr/local/share/boost"
          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
@@ -29,6 +38,15 @@ jobs:
        python-version: [ "3.8", "3.9", "3.10", "3.11" ]

    steps:
      # The runner appears to run out of disk space, so, as recommended in this
      # thread (https://github.com/actions/runner-images/issues/2840#issuecomment-790492173),
      # we clean the runner before starting the tests to free some space.
      - name: Remove unnecessary files
        run: |
          sudo rm -rf /usr/share/dotnet
          sudo rm -rf /opt/ghc
          sudo rm -rf "/usr/local/share/boost"
          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -338,4 +338,6 @@
- Add a Dockerfile and a `docker-compose.yml` to build a Docker container for the API.
- Fix a bug where only the last default pre-processor was applied instead of all of them.

## dev
## dev

- Improve documentation
10 changes: 5 additions & 5 deletions README.md
@@ -224,14 +224,14 @@ address_parser = AddressParser(
address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6")
```

### Parse Address With Our Out-Of-The-Box FastAPI Parse Model
### Parse Address With Our Out-Of-The-Box API

You can use Out-Of-The-Box RESTAPI to parse addresses:
We also offer an out-of-the-box REST API, built with FastAPI, to parse addresses.

#### Installation:

First, ensure that you have Docker Engine and Docker Compose installed on your machine.
if not, you can install them using the following documentations in the following order:
If not, you can install them by following this documentation, in order:

1. [Docker Engine](https://docs.docker.com/engine/install/)
2. [Docker Compose](https://docs.docker.com/compose/install/linux/#install-using-the-repository)
@@ -246,7 +246,7 @@ docker compose up app

#### Sentry:
Also, you can monitor your application usage with [Sentry](https://sentry.io) by setting the environment variable `SENTRY_DSN` to your Sentry's project
DSN. There is an example of the .env file in the root of the project named `.env_example`. you can just copy it using the following command:
DSN. There is an example of the `.env` file in the project's root named `.env_example`. You can copy it using the following command:

```shell
cp .env_example .env
@@ -259,7 +259,7 @@ the `.env` file will also work. The application will run without any problem if

#### Request Examples:

Once the application is up and running and the port 8000 is exported on your localhost, you can send request with one
Once the application is up and running and port `8000` is exposed on your localhost, you can send a request with one
of the following methods:

##### cURL POST request:
2 changes: 1 addition & 1 deletion deepparse/parser/address_parser.py
@@ -348,7 +348,7 @@ def __call__(
replaced as ``'3 305'`` for the parsing. Where ``'3'`` is the unit, and ``'305'`` is the street number.
We use a regular expression to replace alphanumerical characters separated by a hyphen at
the start of the string. We do so since some cities use hyphens in their names. The default
is ``False``. If True, it adds the :func:`~deepparse.pre_processing.pre_processor.hyphen_cleaning`
is ``False``. If True, it adds the :func:`~deepparse.pre_processing.address_cleaner.hyphen_cleaning`
pre-processor **at the end** of the pre-processor list to apply.
pre_processors (Union[None, List[Callable]]): A list of functions (callable) to apply pre-processing on
all the addresses to parse before parsing. See :ref:`pre_processor_label` for examples of
14 changes: 7 additions & 7 deletions deepparse/pre_processing/address_cleaner.py
@@ -3,7 +3,7 @@

def double_whitespaces_cleaning(address: str) -> str:
"""
Pre-processor to remove double whitespace by one whitespace.
Pre-processor to replace double whitespace (``"  "``) with one whitespace (``" "``).
The regular expression used to clean multiple whitespaces is the following ``" {2,}"``.
Args:
Expand All @@ -17,10 +17,10 @@ def double_whitespaces_cleaning(address: str) -> str:

def trailing_whitespace_cleaning(address: str) -> str:
"""
Pre-processor to remove trailing whitespace.
Pre-processor to remove trailing whitespace (``" "``).
Args:
address: The address to apply trailing whitespace cleaning on.
address: The address to apply trailing whitespace (``" "``) cleaning on.
Return:
The trailing whitespace cleaned address.
@@ -64,16 +64,16 @@ def hyphen_cleaning(address: str) -> str:
"""
Pre-processor to clean hyphen between the street number and unit in an address. Since some addresses use the
hyphen to split the unit and street address, we replace the hyphen with whitespaces to allow a
proper splitting of the address. For example, the proper parsing of the address 3-305 street name is
Unit: 3, StreetNumber: 305, StreetName: street name.
proper splitting of the address. For example, the proper parsing of the address ``"3-305 street name"`` is
``"Unit": "3", "StreetNumber": "305", "StreetName": "street name"``.
See `issue 137 <https://github.com/GRAAL-Research/deepparse/issues/137>`_ for more details.
The regular expression used to clean the hyphen is the following ``"^([0-9]*[a-z]?)-([0-9]*[a-z]?) "``.
The first group is the unit, and the second is the street number. Both include letters since they can include
letters in some countries. For example, unit 3a or address 305a.
letters in some countries. For example, ``unit 3a`` or address ``305a``.
Note: the hyphen is also used in some cities' names, such as Saint-Jean; thus, we use regex to detect
Note: the hyphen is also used in some cities' names, such as ``"Saint-Jean"``; thus, we use regex to detect
the proper hyphen to replace.
Args:
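The behaviour described in the `hyphen_cleaning` docstring can be sketched as follows. This is a minimal illustration, not the library's code: only the pattern is quoted from the docstring, while the function name `hyphen_cleaning_sketch` and the replacement string are assumptions.

```python
import re

# Pattern quoted from the docstring: a unit and a street number, each optionally
# followed by one lowercase letter, separated by a hyphen at the start of the string.
HYPHEN_PATTERN = re.compile(r"^([0-9]*[a-z]?)-([0-9]*[a-z]?) ")


def hyphen_cleaning_sketch(address: str) -> str:
    """Replace the leading unit/street-number hyphen with a whitespace (assumed replacement)."""
    return HYPHEN_PATTERN.sub(r"\1 \2 ", address)


print(hyphen_cleaning_sketch("3-305 street name"))    # 3 305 street name
print(hyphen_cleaning_sketch("3a-305a street name"))  # 3a 305a street name
print(hyphen_cleaning_sketch("305 rue saint-jean"))   # 305 rue saint-jean
```

Because the pattern is anchored at the start of the string, a hyphen inside a city name such as `saint-jean` is left untouched.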
68 changes: 68 additions & 0 deletions docs/source/api.rst
@@ -0,0 +1,68 @@
.. role:: hidden
    :class: hidden-section

Parse Address With Our Out-Of-The-Box API
=========================================

We also offer an out-of-the-box REST API, built with FastAPI, to parse addresses.

Installation
************

First, ensure that you have Docker Engine and Docker Compose installed on your machine.
If not, you can install them by following this documentation, in order:

1. `Docker Engine <https://docs.docker.com/engine/install/>`_
2. `Docker Compose <https://docs.docker.com/compose/install/>`_

Once you have Docker Engine and Docker Compose installed, you can run the following command to start the FastAPI application:

.. code-block:: sh

    docker compose up app

Sentry
******

Also, you can monitor your application usage with `Sentry <https://sentry.io>`_ by setting the environment variable ``SENTRY_DSN`` to your Sentry's project
DSN. There is an example of the ``.env`` file in the project's root named ``.env_example``. You can copy it using the following command:

.. code-block:: sh

    cp .env_example .env

Request Examples
----------------

Once the application is up and running and port ``8000`` is exposed on your localhost, you can send a request with one
of the following methods:

cURL POST request
~~~~~~~~~~~~~~~~~

.. code-block:: shell

    curl -X POST --location "http://127.0.0.1:8000/parse/bpemb-attention" --http1.1 \
        -H "Host: 127.0.0.1:8000" \
        -H "Content-Type: application/json" \
        -d "[
              {\"raw\": \"350 rue des Lilas Ouest Quebec city Quebec G1L 1B6\"},
              {\"raw\": \"2325 Rue de l'Université, Québec, QC G1V 0A6\"}
            ]"

Python POST request
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import requests

    url = 'http://localhost:8000/parse/bpemb'
    addresses = [
        {"raw": "350 rue des Lilas Ouest Quebec city Quebec G1L 1B6"},
        {"raw": "2325 Rue de l'Université, Québec, QC G1V 0A6"}
    ]

    response = requests.post(url, json=addresses)
    parsed_addresses = response.json()
    print(parsed_addresses)
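The two request examples share the same shape: a `/parse/<model>` path and a JSON list of `{"raw": ...}` objects. A small sketch of building that pair is shown here; the helper name `build_parse_request` and the `BASE_URL` constant are hypothetical, and only the endpoint path and payload shape are taken from the examples.

```python
import json

# Assumption: the API is reachable on localhost port 8000, as in the examples.
BASE_URL = "http://localhost:8000"


def build_parse_request(raw_addresses, model_type="bpemb"):
    """Return the (url, payload) pair for a POST to the parsing endpoint (hypothetical helper)."""
    url = f"{BASE_URL}/parse/{model_type}"
    # Each address is wrapped in an object with a single "raw" key.
    payload = [{"raw": address} for address in raw_addresses]
    return url, payload


url, payload = build_parse_request(
    ["350 rue des Lilas Ouest Quebec city Quebec G1L 1B6"],
    model_type="bpemb-attention",
)
print(url)
print(json.dumps(payload, ensure_ascii=False))
```

The resulting pair can then be passed to `requests.post(url, json=payload, timeout=10)`, and calling `response.raise_for_status()` before `response.json()` surfaces HTTP errors early.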
3 changes: 2 additions & 1 deletion docs/source/cli.rst
@@ -106,6 +106,7 @@ We do not handle the ``seq2seq_params`` fine-tuning argument for now.

Test
****

This command allows a user to test the ``base_parsing_model`` (or the retrained one using the
``--path_to_retrained_model``) on the ``train_dataset_path`` dataset.
For the testing, the CSV or Pickle dataset is loaded in a specific dataloader (see
@@ -136,4 +137,4 @@ Command to pre-download model weights and requirements. Here is the list of argu
- ``model_type``: The parsing module to download. The possible choices are ``'fasttext'``, ``'fasttext-attention'``, ``'fasttext-light'``, ``'bpemb'`` and ``'bpemb-attention'``.
- ``--saving_cache_dir``: To change the default saving cache directory (defaults to ``None``, i.e. the default path).

.. autofunction:: deepparse.cli.download.main
.. autofunction:: deepparse.cli.download_model.main
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -73,7 +73,7 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = 'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
25 changes: 17 additions & 8 deletions docs/source/index.rst
@@ -653,32 +653,40 @@ class name) when reloading it.
address_parser.retrain(training_container, train_ratio=0.8, epochs=5, batch_size=8, name_of_the_retrain_parser="MyNewParser")
Parse Address With Our Out-Of-The-Box FastAPI Parse Model
*********************************************************
You can use Out-Of-The-Box RESTAPI to parse addresses:
Parse Address With Our Out-Of-The-Box API
*****************************************
We also offer an out-of-the-box REST API, built with FastAPI, to parse addresses.

Installation
------------
First, ensure that you have Docker Engine and Docker Compose installed on your machine.
if not, you can install them using the following documentations in the following order:
If not, you can install them by following this documentation, in order:


1. `Docker Engine <https://docs.docker.com/engine/install/>`_

2. `Docker Compose <https://docs.docker.com/compose/install/>`_

Also, you can monitor your application usage with `Sentry <https://sentry.io>`_ by setting the environment variable SENTRY_DSN to your Sentry's project DSN. There is an example of the .env file in the root of the project named .env_example.

Once you have Docker Engine and Docker Compose installed, you can run the following command to start the FastAPI application:

.. code-block:: shell

    docker compose up app

Sentry
******

Also, you can monitor your application usage with `Sentry <https://sentry.io>`_ by setting the environment variable ``SENTRY_DSN`` to your Sentry's project
DSN. There is an example of the ``.env`` file in the project's root named ``.env_example``. You can copy it using the following command:

.. code-block:: sh

    cp .env_example .env

Request Examples
----------------

Once the application is up and running and the port 8000 is exported on your localhost, you can send request with one of the following methods:
Once the application is up and running and port ``8000`` is exposed on your localhost, you can send a request with one
of the following methods:

cURL POST request
~~~~~~~~~~~~~~~~~
@@ -828,6 +836,7 @@ API Reference
dataset_container
comparer
cli
api

.. toctree::
:glob:
10 changes: 5 additions & 5 deletions docs/source/pre_processor.rst
@@ -9,8 +9,8 @@ Pre-Processors
Here are the available pre-processors in Deepparse. The first four are used as default settings when parsing
addresses.

.. autofunction:: deepparse.pre_processing.pre_processor.coma_cleaning
.. autofunction:: deepparse.pre_processing.pre_processor.lower_cleaning
.. autofunction:: deepparse.pre_processing.pre_processor.trailing_whitespace_cleaning
.. autofunction:: deepparse.pre_processing.pre_processor.double_whitespaces_cleaning
.. autofunction:: deepparse.pre_processing.pre_processor.hyphen_cleaning
.. autofunction:: deepparse.pre_processing.address_cleaner.coma_cleaning
.. autofunction:: deepparse.pre_processing.address_cleaner.lower_cleaning
.. autofunction:: deepparse.pre_processing.address_cleaner.trailing_whitespace_cleaning
.. autofunction:: deepparse.pre_processing.address_cleaner.double_whitespaces_cleaning
.. autofunction:: deepparse.pre_processing.address_cleaner.hyphen_cleaning
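A list of pre-processors is meant to be applied one after the other, each receiving the output of the previous one. The sketch below illustrates that chaining with hypothetical stand-in functions (`remove_commas`, `to_lower`, `strip_whitespace`, `collapse_spaces`); they are not Deepparse's implementations, and their order here is an assumption.

```python
import re
from typing import Callable, List


# Hypothetical stand-ins for the four default pre-processors (not the library's code).
def remove_commas(address: str) -> str:
    return address.replace(",", "")


def to_lower(address: str) -> str:
    return address.lower()


def strip_whitespace(address: str) -> str:
    return address.strip(" ")


def collapse_spaces(address: str) -> str:
    # Same pattern as the one quoted in the double-whitespace docstring.
    return re.sub(r" {2,}", " ", address)


def apply_pre_processors(address: str, pre_processors: List[Callable[[str], str]]) -> str:
    """Apply every pre-processor in order, feeding each one the previous result."""
    for pre_processor in pre_processors:
        address = pre_processor(address)
    return address


default_pre_processors = [remove_commas, to_lower, strip_whitespace, collapse_spaces]
cleaned = apply_pre_processors("350  Rue des Lilas, Québec ", default_pre_processors)
print(cleaned)  # 350 rue des lilas québec
```

Reassigning `address` inside the loop is what makes every pre-processor take effect, rather than only the last one.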
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -3,7 +3,7 @@ target-version = ['py38', 'py39', 'py310', 'py311']

line-length = 120
skip-string-normalization = true
required-version = "23.3.0"
required-version = "23.9.1"
extend-exclude = "/(slides)/"

[tool.pylint.ini_options]
2 changes: 1 addition & 1 deletion styling_requirements.txt
@@ -1,4 +1,4 @@
black==23.3.0
black==23.9.1
pylint==2.16.2
pylint-django[with_django]==2.5.3
pre-commit==3.3.3
