Web page scraping and parsing for data extraction, for the following projects.
The project is based on Apache Airflow and can be deployed with Docker.
NB: The user, password and key must be set in docker-compose.yml
(replace the placeholders <REPLACE_BY_AIRFLOW_USER>, <REPLACE_BY_AIRFLOW_PASSWORD> and <REPLACE_BY_RANDOM_STRING>).
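One possible way to produce a value for <REPLACE_BY_RANDOM_STRING> is sketched below. It is only an illustration: the function name generate_secret_key is hypothetical, and this README does not say what format the key must have. The sketch emits 32 random bytes as URL-safe base64 text, which is also the shape of a Fernet key, in case that is what the key is used for.

```python
import base64
import os


def generate_secret_key() -> str:
    """Return 32 random bytes encoded as URL-safe base64 text (44 characters)."""
    # os.urandom draws from the operating system's secure random source.
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


# Print one key; paste it into docker-compose.yml in place of the placeholder.
print(generate_secret_key())
```

Generate the value once and keep it stable between deployments.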
The initialization script is ./db/init/utils/init_db_sources.py.
The DAG is described in ./dags/grab_rss.py.
The result can be accessed in Redis DB #1.
No initialization is required.
The DAG is described in ./dags/grab_currency_rate.py.
The result can be accessed in Redis DB #2.
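The results in either Redis database can be inspected with a short script. The following is a minimal sketch, assuming the redis-py package is installed, that Redis listens on localhost:6379, and that at least some results are stored as plain string values. None of this is stated in this README, and the helper names connect and dump_strings are hypothetical.

```python
from typing import Any, Dict


def connect(db: int, host: str = "localhost", port: int = 6379) -> Any:
    """Open a client for one numbered Redis database (needs `pip install redis`)."""
    import redis  # imported lazily so dump_strings works with any compatible client

    # decode_responses=True makes the client return str instead of bytes.
    return redis.Redis(host=host, port=port, db=db, decode_responses=True)


def dump_strings(client: Any) -> Dict[str, str]:
    """Return every plain string key with its value from the selected database."""
    result: Dict[str, str] = {}
    # scan_iter walks the keyspace incrementally instead of blocking like KEYS.
    for key in client.scan_iter():
        # Skip hashes, lists and other types; the real key layout is not documented here.
        if client.type(key) == "string":
            result[key] = client.get(key)
    return result
```

For example, dump_strings(connect(1)) would list string values from DB #1 and dump_strings(connect(2)) those from DB #2. Adjust the host and port to match the Redis service in docker-compose.yml.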