A repository for exploring the Pinochet dataset [freire2019pinochet] and, hopefully, more data. Read more about this dataset at https://github.com/danilofreire/pinochet.
- dbt documentation
- Elementary report
- FastAPI docs hosted on fly.io
- GraphQL endpoint using strawberry-graphql
This project follows a monorepo structure. The main services are:
- pinochet-rettig-fastapi: a FastAPI service that serves the data as a REST API.
- pinochet-rettig-linked-data: a virtual knowledge graph for querying the data with SPARQL.
- pinochet-rettig-dbt: a dbt project that transforms the raw data into actionable data.
- postgis: a PostGIS instance that stores the data.
- metabase: a Metabase instance for exploring the data visually.
- pinochet-rettig-streamlit: a Streamlit app for exploring the data visually.
Each service has its own Dockerfile and docker-compose.yml file for running it locally. You can explore the available commands in the Makefile in the root directory. For the projects that use Python, you can use Poetry to manage the dependencies.
Install Poetry with pipx, then install the dependencies:
# install pipx
python3 -m pip install --user pipx
python3 -m pipx ensurepath
# install poetry
pipx install poetry
# install dependencies
poetry install
After installing the dependencies, configure your dbt profile at ~/.dbt/profiles.yml:
# ~/.dbt/profiles.yml
dbt_pinochet:
  outputs:
    dev:
      type: postgres
      threads: 1
      host: 0.0.0.0
      port: 5433
      user: postgres
      pass: postgres
      dbname: pinochet
      schema: api
  target: dev
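To check that the `dev` output points at a reachable database before running dbt, it can help to turn the profile fields into a libpq connection URI and pass it to `psql`. The sketch below assumes the field names used in the profile above; passing `schema` as a `search_path` option is an illustrative choice, not something dbt itself requires.

```python
from urllib.parse import quote, urlencode


def profile_to_uri(output: dict) -> str:
    """Build a libpq connection URI from a dbt postgres output block.

    Assumes the keys used in ~/.dbt/profiles.yml (host, port, user, pass,
    dbname, schema). The schema is sent as a search_path option.
    """
    user = quote(output["user"], safe="")
    password = quote(output["pass"], safe="")
    query = urlencode({"options": f"-csearch_path={output['schema']}"})
    return (
        f"postgresql://{user}:{password}@{output['host']}:{output['port']}"
        f"/{output['dbname']}?{query}"
    )


# Values copied from the dev output of the profile.
dev = {
    "host": "0.0.0.0",
    "port": 5433,
    "user": "postgres",
    "pass": "postgres",
    "dbname": "pinochet",
    "schema": "api",
}
print(profile_to_uri(dev))
# postgresql://postgres:postgres@0.0.0.0:5433/pinochet?options=-csearch_path%3Dapi
```

Running `psql "<printed URI>"` while the PostGIS container is up should open a session with `api` first on the search path.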
See the credentials used in the settings.py file and those set for the PostGIS instance in docker-compose.postgis.yml. You can now run the dbt models with:
dbt build --target dev
Serve the dbt documentation locally with:
make dbt_docs.devserver
Check the other available commands with make help.
The services are deployed locally with Docker Compose. You can start each one with docker compose -f <docker-compose-file>.yml up -d.
You need to provide a valid .env file; see example.env for an example.
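Several services below are started by stacking the PostGIS compose file with a service-specific one (several `-f` flags, with later files merged on top of earlier ones). A minimal sketch of building such a command, assuming you want to script it from Python; `compose_up_cmd` is a hypothetical helper, not part of the repository, and it only prints the command rather than running it.

```python
import shlex


def compose_up_cmd(*compose_files: str, detach: bool = True) -> list[str]:
    """Build a `docker compose ... up` command that stacks several compose files."""
    cmd = ["docker", "compose"]
    for path in compose_files:
        cmd += ["-f", path]  # each file is passed with its own -f flag
    cmd.append("up")
    if detach:
        cmd.append("-d")  # run containers in the background
    return cmd


# Print instead of executing, so the sketch has no side effects.
print(shlex.join(compose_up_cmd("docker-compose.postgis.yml", "docker-compose.ontop.yml")))
# docker compose -f docker-compose.postgis.yml -f docker-compose.ontop.yml up -d
```

To actually start the containers, pass the list to `subprocess.run(..., check=True)`.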
The postgis service provides a database with the PostGIS extension enabled. The database is empty by default and is populated by dbt. Start it and build the models with:
docker compose -f docker-compose.postgis.yml up -d
dbt build --target dev
We use dbt to transform the raw CSV files into actionable data. The models are in ./models. You can run them with:
dbt deps
dbt build --target dev
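If you want to run these two steps from a script (for example in CI), a minimal sketch is to call the same dbt commands in order and stop at the first failure. This wrapper is hypothetical and simply shells out to the `dbt` executable; the injectable `runner` exists only to make it testable without dbt installed.

```python
import subprocess
from typing import Callable, Optional, Sequence

# The same two commands as above, in order.
DBT_STEPS = [
    ["dbt", "deps"],
    ["dbt", "build", "--target", "dev"],
]


def run_dbt(
    steps: Sequence[Sequence[str]] = DBT_STEPS,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> int:
    """Run each dbt step in order; return the first non-zero exit code, else 0."""
    if runner is None:
        # Default runner: execute the command and report its exit code.
        runner = lambda cmd: subprocess.run(list(cmd)).returncode
    for cmd in steps:
        code = runner(cmd)
        if code != 0:
            return code  # stop: later steps depend on earlier ones
    return 0
```

Call `run_dbt()` from the dbt project directory, with the PostGIS container running.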
We use Ontop to serve a virtual knowledge graph for querying the database with SPARQL. Copy the example configuration files and start the services:
cp .env.example .env
cp ./pinochet-rettig-linked-data/ontop/input/mapping.protege.properties.example ./pinochet-rettig-linked-data/ontop/input/mapping.protege.properties
docker compose -f docker-compose.postgis.yml -f docker-compose.ontop.yml up -d
Then go to <0.0.0.0:8083> and run an example query against the virtual knowledge graph:
PREFIX : <http://example.org/pinochet-rettig#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?victim ?lastName {
  ?victim a :Victim ; foaf:lastName ?lastName .
}
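The same query can also be sent programmatically using the SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter). This sketch assumes Ontop exposes its endpoint at `/sparql` on port 8083; check the compose file if your mapping differs. It uses only the standard library and flattens the JSON results into plain dictionaries.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Ontop's SPARQL endpoint path is /sparql on the published port.
ENDPOINT = "http://0.0.0.0:8083/sparql"

QUERY = """
PREFIX : <http://example.org/pinochet-rettig#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?victim ?lastName {
  ?victim a :Victim ; foaf:lastName ?lastName .
}
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} dictionaries."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query: str = QUERY) -> list[dict[str, str]]:
    with urlopen(build_request(query)) as resp:
        return rows(json.load(resp))
```

With the Ontop container running, `run_query()` returns one dictionary per victim, keyed by `victim` and `lastName`.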
We use FastAPI to serve the data as a REST API. You can start the service with:
make api.upd
See the docker compose file for the required configuration parameters.
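FastAPI serves a generated OpenAPI schema at `/openapi.json` by default, so a quick way to discover the available routes without reading the code is to list the schema's paths. This sketch assumes the API listens on `localhost:8000` (the same host as the GraphQL endpoint) and that the default `openapi_url` has not been changed.

```python
import json
from urllib.request import urlopen

# Assumption: the API listens on the same host and port as /graphql.
BASE_URL = "http://localhost:8000"


def fetch_openapi(base_url: str = BASE_URL) -> dict:
    """Download the OpenAPI schema that FastAPI generates by default."""
    with urlopen(f"{base_url}/openapi.json") as resp:
        return json.load(resp)


def list_paths(schema: dict) -> list[str]:
    """Return the sorted route paths declared in an OpenAPI schema."""
    return sorted(schema.get("paths", {}))
```

For example, `for path in list_paths(fetch_openapi()): print(path)` prints every route; the interactive docs at `/docs` show the same information.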
Install the devcontainer CLI with
npm install -g devcontainer-cli
and run devcontainer open to start a Visual Studio Code instance attached to the fastapi service, opened at /app/.
Note: the whole repository is mounted in /workspaces/, but the application runs from /app/. I have yet to figure out a clean workflow, but so far this setup works.
Authentication uses JWT. You can obtain a token using the credentials of a default user created by a migration script.
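A minimal client-side sketch of this flow, assuming the API follows FastAPI's OAuth2 password flow: credentials are posted as form fields `username` and `password`, and the JSON response contains an `access_token`. The `/token` path is hypothetical; check the OpenAPI docs for the real route. The default user's credentials are not reproduced here, so pass your own.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical token route; verify it against the API's /docs page.
TOKEN_URL = "http://localhost:8000/token"


def token_request(username: str, password: str, url: str = TOKEN_URL) -> Request:
    """Build a form-encoded POST with the credentials (OAuth2 password flow)."""
    body = urlencode({"username": username, "password": password}).encode()
    return Request(
        url,
        data=body,  # supplying a body makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def get_token(username: str, password: str) -> str:
    with urlopen(token_request(username, password)) as resp:
        return json.load(resp)["access_token"]  # assumed response field


def jwt_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (inspection only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def auth_header(token: str) -> dict:
    """Header to attach to authenticated requests."""
    return {"Authorization": f"Bearer {token}"}
```

`jwt_claims` is only for inspecting a token you already trust (for example, to see its expiry); the server remains responsible for verifying signatures.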
We use strawberry-graphql to serve the data as a GraphQL API. It is available at localhost:8000/graphql.
Note: the GraphQL endpoint is a work in progress and currently supports only a few queries.
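Because the schema is still small, a safe way to check that the endpoint responds is a query that every GraphQL schema accepts, such as `{ __typename }`. This sketch sends a standard GraphQL-over-HTTP POST with a JSON body; it does not assume any project-specific fields, and whether the endpoint requires a JWT is not covered here.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

GRAPHQL_URL = "http://localhost:8000/graphql"


def graphql_request(query: str, variables: Optional[dict] = None,
                    url: str = GRAPHQL_URL) -> Request:
    """Build a POST request with the query and variables as a JSON body."""
    payload = {"query": query, "variables": variables or {}}
    return Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


def run(query: str, variables: Optional[dict] = None) -> dict:
    """Execute a query and return its data, raising if the server reports errors."""
    with urlopen(graphql_request(query, variables)) as resp:
        result = json.load(resp)
    if result.get("errors"):
        raise RuntimeError(result["errors"])
    return result["data"]
```

With the service running, `run("{ __typename }")` should return the name of the root query type.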
I'm using fly.io to deploy the services. The configuration is in the fly.toml files inside ./pinochet-rettig-fastapi and ./postgis.
You must install the flyctl CLI to deploy the services; see https://fly.io/docs/getting-started/installing-flyctl/ for details. Once it is installed, you can deploy the services with:
cd pinochet-rettig-fastapi
flyctl deploy
and
cd postgis
flyctl deploy
You will need to set the POSTGRES_PASSWORD secret for the fastapi deployment with flyctl secrets set. Note that the other secrets were set automatically by fly.io; you can check them with flyctl secrets list:
$ cd pinochet-rettig-fastapi
$ flyctl secrets list
NAME              DIGEST            CREATED AT
DATABASE_URL      xxxxxxxxxxxxxxxx  23h7m ago
POSTGRES_PASSWORD xxxxxxxxxxxxxxxx  22h49m ago
SECRET_KEY        xxxxxxxxxxxxxxxx  22h30m ago
SENTRY_DSN        xxxxxxxxxxxxxxxx  21h37m ago
Set the POSTGRES_PASSWORD secret with:
flyctl secrets set POSTGRES_PASSWORD=password_from_flyio_db
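Fly.io exposes an app's secrets to the running service as environment variables. The repository's settings.py is not reproduced here, so the following is a hypothetical sketch of reading the secrets listed above and failing fast when one is missing; treating SENTRY_DSN as optional is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical: names taken from the `flyctl secrets list` output above.
REQUIRED = ("DATABASE_URL", "POSTGRES_PASSWORD", "SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    postgres_password: str
    secret_key: str
    sentry_dsn: Optional[str] = None  # assumed optional


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read secrets from the environment, raising if any required one is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing secrets: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        postgres_password=env["POSTGRES_PASSWORD"],
        secret_key=env["SECRET_KEY"],
        sentry_dsn=env.get("SENTRY_DSN"),
    )
```

Failing at startup makes a forgotten `flyctl secrets set` show up immediately in the deployment logs instead of at the first database query.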
You will also need to enable the PostGIS extension by connecting to the database with flyctl pg connect and running the statement CREATE EXTENSION postgis; as in this session:
❯ fly pg connect --app pinochet-api-prod --database pinochet_api
Connecting to fdaa:3:d37e:a7b:1be:de4c:ba0b:2... complete
psql (15.3 (Debian 15.3-1.pgdg120+1))
Type "help" for help.
pinochet_api=# CREATE EXTENSION postgis;
We can use Metabase to explore the data visually. You can start Metabase with
docker compose -f docker-compose.postgis.yml -f docker-compose.metabase.yml up -d
Then go to localhost:3010 to explore the data. Metabase will ask you to create an admin user and set up the database connection.
We use Streamlit to explore the data visually. You can start the service with:
docker compose -f docker-compose.postgis.yml -f docker-compose.streamlit.yml up -d