diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml new file mode 100644 index 00000000..ad167922 --- /dev/null +++ b/.github/workflows/docs.yml @@ -0,0 +1,31 @@ +name: docs - build, deploy +on: + push: + branches: + - main + - develop +permissions: + contents: write +jobs: + deploy: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Configure Git Credentials + run: | + git config user.name github-actions[bot] + git config user.email 41898282+github-actions[bot]@users.noreply.github.com + - uses: actions/setup-python@v5 + with: + python-version: 3.x + - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV + - uses: actions/cache@v4 + with: + key: mkdocs-material-${{ env.cache_id }} + path: .cache + restore-keys: | + mkdocs-material- + # We need to install not just the doc dependencies but also the package itself + # since it has the docs/tools/asset_type_replacer plugin. + - run: python -m pip install -e ".[docs]" + - run: python -m mkdocs gh-deploy --force \ No newline at end of file diff --git a/docs/Contributing.md b/docs/Contributing.md index 854139a3..8ed78f7d 100644 --- a/docs/Contributing.md +++ b/docs/Contributing.md @@ -1 +1,97 @@ # Contributing + +You're looking to contribute to the AI-on-Demand metadata catalogue, that's great! +AI-on-Demand is an open project and we welcome contributions by all, no matter their experience or background. + +## Types of Contributions +There are many types of contributions to be made to the metadata catalogue, and this section +covers a non-exhaustive list. If you find something that's not outlined here, just reach out to us! 
+
+### No Code
+Good examples for contributions that do not require any coding include:
+
+ - Giving [our GitHub repository](https://github.com/aiondemand/AIOD-rest-api) a star ⭐️
+ - [Helping people on the GitHub issue tracker](#helping-people-on-github) 🤝
+ - [Reporting bugs](#reporting-bugs) 🐛
+
+### Documentation
+You can help improve our documentation by correcting mistakes, such as typos, adding clarifications,
+or even adding new sections. For small changes, documentation can be edited directly on GitHub by
+finding the related markdown file and clicking the pencil icon in the top-right corner (✐).
+
+If you intend to make bigger changes, please first open a new issue that documents the suggested change.
+That way, we can give you feedback before you write, and verify that it is actually a change we would be interested in adopting.
+For big changes, we also recommend following the instructions on ["Setting up a development environment"](#setting-up-a-development-environment)
+so that you can render the changed documentation locally.
+
+### Code
+For code changes, please first coordinate with the developers on what you will work on.
+You can do this by either leaving a comment on an existing issue that you would like to help with,
+or by opening a new issue proposing your change. By communicating with the developers first, they
+can let you know ahead of time whether or not the change is wanted, make sure they have time to
+support you, and provide any feedback. We really want to avoid a scenario where you work hard on a contribution
+only to find out that it is not in line with the vision of the project and thus will not be accepted.
+When starting your first code contribution, visit the ["Setting up a development environment"](#setting-up-a-development-environment)
+section for more information on how to get started.
+
+## Helping people on GitHub
+Helping people on the [GitHub issue tracker](https://github.com/aiondemand/AIOD-rest-api/issues) just requires a GitHub account.
+You can help people by answering their questions, or by weighing in on discussions.
+Even if you do not have an answer, you can verify that you can reproduce the behavior they report or
+ask clarifying questions to make their bug report better (see also ["Reporting Bugs"](#reporting-bugs)).
+This helps the core contributors resolve the issue with more ease and is hugely appreciated.
+As always, please be kind and patient with others.
+
+## Reporting Bugs
+When you find a bug and want to report it, the first step is to check that it has not already been reported.
+Use the search functionality of the GitHub issue tracker to find identical or related issues.
+If your issue or bug has already been reported, please do not open a new issue.
+Instead, either "upvote" the original report by leaving a 👍 reaction or,
+if you have additional information which may be relevant to the discussion, leave a comment on that issue.
+
+If your bug isn't reported yet, create a new issue.
+Provide a clear explanation of the expected behavior and the observed behavior, and explain why you think this is a bug.
+Add instructions on how to reproduce the problem through a [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example).
+Without it, it may be very hard for contributors to solve the issue (or they may not even understand it).
+
+## Setting up a development environment
+
+### Cloning
+First, make sure you can get the local metadata catalogue up and running by following the ["Hosting" instructions](Hosting.md).
+During the installation step, use `git` to clone the repository.
+If you have write access to this repository, you can follow the instructions as-is.
+If you do not have write access to this repository, you must [fork it](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo).
+After forking the repository, your clone command will be (substituting `USERNAME` for your GitHub username):
+
+```commandline
+git clone https://github.com/USERNAME/AIOD-rest-api.git
+```
+
+### Installing Dependencies
+Always make sure to install your dependencies in a local environment, for example with the built-in `venv` module:
+
+```commandline
+python -m venv venv
+source venv/bin/activate
+```
+
+and then install the Python dependencies:
+
+```commandline
+python -m pip install -e ".[dev, docs]"
+```
+
+We install the optional `dev` (developer) and `docs` (documentation) dependencies so that we can
+build documentation and run tests locally.
+
+### Configuration
+
+It is also generally useful to add an `override.env` file to the project's root directory with
+the line `USE_LOCAL_DEV=true`. This allows the utility scripts `./scripts/up.sh` and `./scripts/down.sh`
+to start docker containers in a way that reflects local changes.
+
+## Making Code Changes
+See the ["Developer Documentation"](developer/index.md) for the technical documentation of this project.
+More to be added.
+
+[//]: # (## Setting up a pull request)
diff --git a/docs/Hosting.md b/docs/Hosting.md
index 3c54a7d0..ccae0138 100644
--- a/docs/Hosting.md
+++ b/docs/Hosting.md
@@ -1,6 +1,7 @@
 # Hosting the Metadata Catalogue
 This page has information on how to host your own metadata catalogue.
-If you plan to locally develop the REST API, please follow the installation procedure in ["Contributing"](../contributing) instead.
+If you plan to locally develop the REST API, please follow the installation procedure in ["Contributing"](../contributing)
+after following the instructions on this page.
 
 ## Prerequisites
 The platform is tested on Linux, but should also work on Windows and MacOS.
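For reference, the `override.env` file described in the Contributing "Configuration" section above might contain just the following (a minimal sketch based on that section; only the `USE_LOCAL_DEV` line is taken from the docs):

```bash {title='override.env'}
# Opt in to local-development mode, so that ./scripts/up.sh and ./scripts/down.sh
# start the docker containers with your local changes mounted.
USE_LOCAL_DEV=true
```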
@@ -17,6 +18,10 @@ However, we do need to fetch files the latest release of the repository: git clone https://github.com/aiondemand/AIOD-rest-api.git ``` + It is also possible to clone using [SSH](https://docs.github.com/en/authentication/connecting-to-github-with-ssh). + If you plan to develop the metadata catalogue, check the ["Contributing"](Contributing.md#cloning) page + for more information on this step. + === "UI (browser)" * Navigate to the project page [aiondemand/AIOD-rest-api](https://github.com/aiondemand/AIOD-rest-api). @@ -146,4 +151,4 @@ Make sure that the specified `--network` is the docker network that has the `sql The alembic directory is mounted to ensure the latest migrations are available, the src directory is mounted so the migration scripts can use defined classes and variable from the project. -[//]: # (TODO: Write documentation for when some of the migrations are not applicable. E.g., when a database was created in a new release.) +[//]: # (TODO: Write documentation for when some of the migrations are not applicable. E.g., when a table was created in a new release.) diff --git a/docs/README.md b/docs/README.md index 33a4cd65..099ceeca 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,322 +1,25 @@ -# AIoD API +# 🇪🇺AI-on-Demand Metadata Catalogue -This repository contains the AI on Demand (AIoD) REST API. It is built with -[FastAPI](https://fastapi.tiangolo.com/) -that interacts with a database ([MySQL](https://hub.docker.com/_/mysql)) -and [OpenML's REST API](https://www.openml.org/apis). -Both the database and the REST API are run from docker in separate containers. +This repository contains code and configurations for the AI-on-Demand Metadata Catalogue. +The metadata catalogue provides a unified view of AI assets and resources stored across the AI landscape. 
+It collects metadata from platforms such as [_Zenodo_](https://zenodo.org) and [_OpenML_](https://openml.org),
+and is connected to European projects like [Bonsapps](https://bonsapps.eu) and [AIDA](https://www.i-aida.org).
+Metadata of datasets, models, papers, news, and more from all of these sources is available through a REST API at [api.aiod.eu](https://api.aiod.eu/).
-The AIoD REST API will allow any kind of service to interact with the AIoD portal to discover,
-retrieve, and share AI resources. It forms the connection between user-facing components, such
-as the AIoD website or Python Client API, and the backend. The metadata for datasets, models
-and other resources can be accessed, added, updated and deleted through this API.
+
+**🧑‍🔬 For most users:**
+Many users will only use the REST API indirectly, for example:
+through [My Resources](https://github.com/aiondemand/AIOD-marketplace-frontend/) to browse assets,
+through [RAIL](https://github.com/aiondemand/aiod-rail) to conduct ML experiments,
+or through the [Python SDK](https://github.com/aiondemand/aiondemand) to access the metadata in Python scripts.
+For documentation on how to use the REST API directly, visit the ["Using the API"](Using.md) documentation.
-## Architecture
-All metadata is stored in the AIoD metadata database. For every instance of the API (there will
-be multiple running instances, for reliability and performance), an instance of this database
-will run on the same machine (on the same node). The type of database is not yet determined, for
-now we use a simple MySQL database.
+
+**🧑‍💻 For service developers:**
+To use the metadata catalogue from your service, use the [Python SDK](https://github.com/aiondemand/aiondemand)
+or use the REST API directly as detailed in the ["Using the API"](Using.md) documentation.
-The metadata is stored in AIoD format. When the user requests an item, such as a dataset, it can
-be returned in AIoD format, or converted to any supported format, as requested by the user.
For -datasets, we will for instance support schema.org and DCAT-AP. +**🌍 Hosting:** For information on how to host the metadata catalogue, see the ["Hosting" documentation](Hosting.md). -Requesting a dataset will therefore be simply: +**🧑‍🔧 API Development:** The ["Developer Guide"](developer/index.md) has information about the code in this repository and how to make contributions. -![Get dataset UML](media/GetDatasetUML.png) - -To fill the database, a synchronization process must be running continuously for every platform -(e.g. HuggingFace or OpenML). This synchronization service of a platform will be deployed at a -single node. The synchronization service queries its platform for updates, converts the metadata -to the AIoD format and updates the database. - -Note that this synchronization process between the platform and the database, is different from -the synchronization between database instances. The latter is under discussion in the AIoD -Synchronization Meetings. - -### AIoD Metadata - -The models are found in `src/database/model`. The AIoD Metadata team is responsible for -determining the fields of the metadata, whereafter the classes are implemented in this metadata -catalogue. To check the existing fields, the easiest way it to start this application (see -"Using Docker Compose") and check the automatically generated swagger documentation -(http://localhost:8000/docs). - -We use inheritance to make sure that generic fields, such as name and description, are present -and consistent over all resources. 
A partial overview of the metadata model can be found in the -following figure: - -![AIoD Metadata model](media/AIoD_Metadata_Model.drawio.png) - -## Prerequisites -- Linux/MacOS/Windows (should all work) -- [Docker](https://docs.docker.com/get-docker/) -- [Docker Compose](https://docs.docker.com/compose/install/) version 2.21.0 or higher - -For development: -- `Python3.11` with `python3.11-dev` (`sudo apt install python3.11-dev` on Debian) -- Additional 'mysqlclient' dependencies. Please have a look at [their installation instructions](https://github.com/PyMySQL/mysqlclient#install). - -## Production environment - -For production environments elasticsearch recommends -Xss4G and -Xmx8G for the JVM settings.\ -This parameters can be defined in the .env file. -See the [elasticsearch guide](https://www.elastic.co/guide/en/logstash/current/jvm-settings.html). - -For Keycloak, the `--http-enabled=true` and `--hostname-strict-https=false` should be omitted -from the docker-compose file. - -## Installation - -This repository contains two systems; the database and the REST API. -As a database we use a containerized MySQL server (through Docker), the REST API can be run locally or containerized. -Information on how to install Docker is found in [their documentation](https://docs.docker.com/desktop/). - -### Using docker compose -```bash -docker compose --profile examples up -d -``` - -starts the MYSQL Server, the REST API, Keycloak for Identity and access management and Nginx for reverse proxying. 
\ -Once started, you should be able to visit the REST API server at: http://localhost and Keycloak at http://localhost/aiod-auth \ -To authenticate to the REST API swagger interface the predefined user is: user, and password: password \ -To authenticate as admin to Keycloak the predefined user is: admin and password: password \ -To use a different DNS hostname, refer to the ["Changing the configuration"](#changing-the-configuration) section below for instructions on how to ovverride `HOSTNAME` in `.env` and `opendid_connect_url` in `config.toml`. \ -This configuration is intended for development, DO NOT use it in production. - -To turn if off again, use -```bash -docker compose --profile examples down -``` - -To connect to the database use `./scripts/database-connect.sql`. - -```bash -mysql> SHOW DATABASES; -+--------------------+ -| Database | -+--------------------+ -| information_schema | -| mysql | -| performance_schema | -| sys | -+--------------------+ -4 rows in set (0.03 sec) -``` - -Now, you can visit the server from your browser at `localhost:8000/docs`. - - -### Changing the configuration -You may need to change the configuration locally, for example if you want different ports to be used. -Do not change files, instead add overrides. - -#### Docker Compose -For docker compose, the environment variables are defined in the `.env` file. -To override variables, for example `AIOD_LOGSTASH_PORT`, add a new file called `override.env`: -```bash {title='override.env'} -AIOD_LOGSTASH_PORT=5001 -``` -Then also specify this when you invoke docker compose, e.g.: -`docker compose --env-file=.env --env-file=override.env up` -Note that **order is important**, later environment files will override earlier ones. -You may also use the `./scripts/up.sh` script to achieve this (see ["Shorthands"](#shorthands) below). - -#### Config.toml -The main application supports configuration options through a `toml` file. -The defaults can be found at `src/config.default.toml`. 
-To override them, add a `src/config.override.toml` file. -It follows the same structure as the default file, but you only need to specify the variables to override. - -#### Using connectors -You can specify different connectors using - -```bash -docker compose --profile aibuilder --profile examples --profile huggingface-datasets --profile openml --profile zenodo-datasets up -d -docker compose --profile aibuilder --profile examples --profile huggingface-datasets --profile openml --profile zenodo-datasets down -``` - -Make sure you use the same profile for `up` and `down`, or use `./scripts/down.sh` (see below), -otherwise some containers might keep running. - -##### Configuring AIBuilder connector -To access the AIBuilder API you need to provide a valid API token through the `AIBUILDER_API_TOKEN` variable. \ -Use the `override.env` file for that as explained above. \ -Please note that for using the url of the `same_as` field of the AIBuilder models, you will need to substitute `AIBUILDER_API_TOKEN` on the url for your actual API token value. - -### Shorthands -We provide two auxiliary scripts for launching docker containers and bringing them down. -The first, `./scripts/up.sh` invokes `docker compose up -d` and takes any number of profiles to launch as parameters. -It will also ensure that the changes of the configurations (see above) are observed. -If `USE_LOCAL_DEV` is set to `true` (e.g., in `override.env`) then your local source code will be mounted on the containers, -this is useful for local development but should not be used in production. -E.g., with `USE_LOCAL_DEV` set to `true`, `./scripts/up.sh` resolves to: -`docker compose --env-file=.env --env-file=override.env -f docker-compose.yaml -f docker-compose.dev.yaml --profile examples up -d` - -The second script is a convenience for bringing down all services, including all profiles: `./scripts/down.sh` - -#### Local Installation - -If you want to run the server locally, you need **Python 3.11**. 
-We advise creating a virtual environment first and install the dependencies there: - -```bash -python3.11 -m venv venv -source venv/bin/activate -python -m pip install . -``` - -For development, you will need to install the optional dependencies as well: - -```bash -source venv/bin/activate -python -m pip install ".[dev]" -``` - -Moreover, you are encouraged to install the pre-commit hooks, so that black, mypy and the unittests -run before every commit: -```bash -pre-commit install -``` -You can run -```bash -pre-commit run --all-files -``` -To run pre-commit manually. - -After installing the dependencies you can start the server. You have 3 options: - -1. Run from your machine: -```bash -cd src -python main.py --reload -``` -The `--reload` argument will automatically restart the app if changes are made to the source files. -2. Run using docker. For instance using `scripts/run_apiserver.sh` -3. Run using DevContainer (see next subsection) - -### Authentication -Currently, the code is by default running using the local Keycloak. To make -this work, you need to set an environment variable. You can do this by setting the -`KEYCLOAK_CLIENT_SECRET` in `src/.env`. - -```bash -# src/.env -KEYCLOAK_CLIENT_SECRET=[SECRET] -``` - -Alternatively, you can connect to a different keycloak instance by modifying `src/.env`. EGI -Checkin can for instance be used on a deployed instance - not on local host. Marco Rorro is the -go-to person to request the usage of the EGI Checkin. - -The reason that EGI Checkin doesn't work on localhost, is that the redirection url of EGI -Checkin is strict - as it should be. On our development keycloak, any redirection url is -accepted, so that it works on local host or wherever you deploy. This should never be the case -for a production instance. - -See [authentication README](developer/auth.md) for more information. - -### Creating the Database - -By default, the app will create a database on the provided MySQL server. 
-You can change this behavior through the **build-db** command-line parameter, -it takes the following options: - * never: *never* creates the database, not even if there does not exist one yet. - Use this only if you expect the database to be created through other means, such - as MySQL group replication. - * if-absent: Creates a database only if none exists. (default) - * drop-then-build: Drops the database on startup to recreate it from scratch. - **THIS REMOVES ALL DATA PERMANENTLY. NO RECOVERY POSSIBLE.** - -### Populating the Database -To populate the database with some examples, run the `connectors/fill-examples.sh` script. -When using `docker compose` you can easily do this by running the "examples" profile: -`docker compose --profile examples up` - -## Usage - -Following the installation instructions above, the server may be reached at `127.0.0.1:8000`. -REST API documentation is automatically built and can be viewed at `127.0.0.1:8000/docs`. - - -#### Automatically Restart on Change - -If you want to automatically restart the server when a change is made to a file in the project, use the `--reload` -parameter. -It is important to realize that this also re-initializes the connection to the database, and possibly will do any -start-up work (e.g., populating the database). - -#### Database Structure - -The Python classes that define the database tables are found in [src/database/model/](../src/database/model/). -The structure is based on the -[metadata schema](https://github.com/aiondemand/metadata-schema). - - -## Adding resources - -See [src/README.md](developer/code.md). - -## Backups and Restoration - -We provide several scripts to facilitate the scheduling of backups and the manual restoration of files. For details on these scripts and others, please see [scripts/README.md](scripts/README.md). 
- -## Releases - -### Breaking changes -Breaking changes of a resource include deleting a field, changing the name of an existing field, -or changing the datatype of a field. Adding new fields is not a breaking change. - -On a breaking change for a resource (e.g. for Dataset), a new router with a new version should -be created. The existing router should be deprecated, and rewritten so that it can handle the -new metadata of the database. This deprecation of a router will be visible in the Swagger -documentation. Calls to a deprecated router will still work, but a response header "Deprecated" -will be added with the deprecation date. The deprecated router will then be deleted on the next -release. - -On non-breaking changes of a resource, a new version is not needed for the corresponding router. - -Example: -- Start www.aiod.eu/api/datasets/v0 -- Release 1: www.aiod.eu/api/datasets/v0 (no breaking changes) -- Release 2: - - www.aiod.eu/api/datasets/v0 (deprecated) - - www.aiod.eu/api/datasets/v1 -- Release 3: www.aiod.eu/api/datasets/v1 - -### Database migration - -The database should always be up-to-date with the latest version of the metadata. As database -migration tool, [Alembic](https://alembic.sqlalchemy.org/en/latest/) is the default choice for -SQLAlchemy. The setup of this db migration for AIOD remains a TODO for now. - -### Changelog - -As changelog we use the Github tags. For each release, a release branch should be created with a -bumped version in the pyproject.toml, and merged with the master. The tag should contain a -message detailing all the breaking and non-breaking changes. This message should adhere to the -guiding principles as described in https://keepachangelog.com/. - -- Show all tags: https://github.com/aiondemand/AIOD-rest-api/tags -- Show a specific tag: https://github.com/aiondemand/AIOD-rest-api/releases/tag/0.3.20220501 - -This information can also be extracted using the Github REST API. 
- - -### Create a release -To create a new release, -1. Make sure all requested functionality is merged with the `develop` branch. -2. From develop: `git checkout -b release/[VERSION]`. Example of version: `1.1.20231129` -3. Update the version in `pyproject.toml`. -4. Test all (most of) the functionality. Checkout the project in a new directory and remove all - your local images, and make sure it works out-of-the box. -5. Go to https://github.com/aiondemand/AIOD-rest-api/releases and draft a new release from the - release branch. Look at all closed PRs and create a changelog -6. Create a PR from release branch to master -7. After that's merged, create a PR from master to develop -8. Deploy on the server(s): - - Check which services currently work (before the update). It's a sanity check for if a service _doesn't_ work later. - - Update the code on the server by checking out the release - - Merge configurations as necessary - - Make sure the latest database migrations are applied: see ["Schema Migrations"](developer/migration.md#update-the-database) -9. Notify everyone (e.g., in the API channel in Slack). +### Acknowledgement +Funded by the European Union 🇪🇺 diff --git a/docs/developer/index.md b/docs/developer/index.md new file mode 100644 index 00000000..6314e232 --- /dev/null +++ b/docs/developer/index.md @@ -0,0 +1,253 @@ +# Metadata Catalogue API + +!!! note + + This page was the old readme. Re-organizing and updating it into our structured documentation + pages is work in progress. This page will serve as an overview page that serves as a + "getting started" page and quick reference, with references to pages with in-depth information. + +This repository contains the AI on Demand (AIoD) REST API. It is built with +[FastAPI](https://fastapi.tiangolo.com/) +that interacts with a database ([MySQL](https://hub.docker.com/_/mysql)) +and [OpenML's REST API](https://www.openml.org/apis). +Both the database and the REST API are run from docker in separate containers. 
+ +The AIoD REST API will allow any kind of service to interact with the AIoD portal to discover, +retrieve, and share AI resources. It forms the connection between user-facing components, such +as the AIoD website or Python Client API, and the backend. The metadata for datasets, models +and other resources can be accessed, added, updated and deleted through this API. + +## Architecture +All metadata is stored in the AIoD metadata database. For every instance of the API (there will +be multiple running instances, for reliability and performance), an instance of this database +will run on the same machine (on the same node). The type of database is not yet determined, for +now we use a simple MySQL database. + +The metadata is stored in AIoD format. When the user requests an item, such as a dataset, it can +be returned in AIoD format, or converted to any supported format, as requested by the user. For +datasets, we will for instance support schema.org and DCAT-AP. + +Requesting a dataset will therefore be simply: + +![Get dataset UML](../media/GetDatasetUML.png) + +To fill the database, a synchronization process must be running continuously for every platform +(e.g. HuggingFace or OpenML). This synchronization service of a platform will be deployed at a +single node. The synchronization service queries its platform for updates, converts the metadata +to the AIoD format and updates the database. + +Note that this synchronization process between the platform and the database, is different from +the synchronization between database instances. The latter is under discussion in the AIoD +Synchronization Meetings. + + +## Prerequisites +- Linux/MacOS/Windows (should all work) +- [Docker](https://docs.docker.com/get-docker/) +- [Docker Compose](https://docs.docker.com/compose/install/) version 2.21.0 or higher + +For development: +- `Python3.11` with `python3.11-dev` (`sudo apt install python3.11-dev` on Debian) +- Additional 'mysqlclient' dependencies. 
Please have a look at [their installation instructions](https://github.com/PyMySQL/mysqlclient#install).
+
+## Production environment
+
+For production environments, Elasticsearch recommends `-Xss4G` and `-Xmx8G` for the JVM settings.\
+These parameters can be defined in the .env file.
+See the [elasticsearch guide](https://www.elastic.co/guide/en/logstash/current/jvm-settings.html).
+
+For Keycloak, the `--http-enabled=true` and `--hostname-strict-https=false` options should be omitted
+from the docker-compose file.
+
+## Installation
+
+This repository contains two systems: the database and the REST API.
+As a database we use a containerized MySQL server (through Docker); the REST API can be run locally or containerized.
+Information on how to install Docker is found in [their documentation](https://docs.docker.com/desktop/).
+
+### Using docker compose
+```bash
+docker compose --profile examples up -d
+```
+
+starts the MySQL server, the REST API, Keycloak for identity and access management, and Nginx for reverse proxying. \
+Once started, you should be able to visit the REST API server at http://localhost and Keycloak at http://localhost/aiod-auth \
+To authenticate to the REST API swagger interface, the predefined user is: user, and password: password \
+To authenticate as admin to Keycloak, the predefined user is: admin and password: password \
+To use a different DNS hostname, refer to the ["Changing the configuration"](#changing-the-configuration) section below for instructions on how to override `HOSTNAME` in `.env` and `openid_connect_url` in `config.toml`. \
+This configuration is intended for development, DO NOT use it in production.
+
+To turn it off again, use
+```bash
+docker compose --profile examples down
+```
+
+To connect to the database use `./scripts/database-connect.sql`.
+
+```bash
+mysql> SHOW DATABASES;
++--------------------+
+| Database           |
++--------------------+
+| information_schema |
+| mysql              |
+| performance_schema |
+| sys                |
++--------------------+
+4 rows in set (0.03 sec)
+```
+
+Now, you can visit the server from your browser at `localhost:8000/docs`.
+
+
+### Changing the configuration
+You may need to change the configuration locally, for example if you want different ports to be used.
+Do not change these files; instead, add overrides.
+
+#### Docker Compose
+For docker compose, the environment variables are defined in the `.env` file.
+To override variables, for example `AIOD_LOGSTASH_PORT`, add a new file called `override.env`:
+```bash {title='override.env'}
+AIOD_LOGSTASH_PORT=5001
+```
+Then also specify this file when you invoke docker compose, e.g.:
+`docker compose --env-file=.env --env-file=override.env up`
+Note that **order is important**: later environment files override earlier ones.
+You may also use the `./scripts/up.sh` script to achieve this (see ["Shorthands"](#shorthands) below).
+
+#### Config.toml
+The main application supports configuration options through a `toml` file.
+The defaults can be found at `src/config.default.toml`.
+To override them, add a `src/config.override.toml` file.
+It follows the same structure as the default file, but you only need to specify the variables to override.
+
+#### Using connectors
+You can enable different connectors by specifying their profiles:
+
+```bash
+docker compose --profile examples --profile huggingface-datasets --profile openml --profile zenodo-datasets up -d
+docker compose --profile examples --profile huggingface-datasets --profile openml --profile zenodo-datasets down
+```
+
+Make sure you use the same profiles for `up` and `down`, or use `./scripts/down.sh` (see below),
+otherwise some containers might keep running.
+
+### Shorthands
+We provide two auxiliary scripts for launching docker containers and bringing them down.
+The first, `./scripts/up.sh`, invokes `docker compose up -d` and takes any number of profiles to launch as parameters.
+It will also ensure that any configuration overrides (see above) are applied.
+If `USE_LOCAL_DEV` is set to `true` (e.g., in `override.env`), your local source code will be mounted in the containers;
+this is useful for local development but should not be used in production.
+E.g., with `USE_LOCAL_DEV` set to `true`, `./scripts/up.sh` resolves to:
+`docker compose --env-file=.env --env-file=override.env -f docker-compose.yaml -f docker-compose.dev.yaml --profile examples up -d`
+
+The second script is a convenience for bringing down all services, including all profiles: `./scripts/down.sh`
+
+#### Local Installation
+
+If you want to run the server locally, you need **Python 3.11**.
+We advise creating a virtual environment first and installing the dependencies there:
+
+```bash
+python3.11 -m venv venv
+source venv/bin/activate
+python -m pip install .
+```
+
+For development, you will need to install the optional dependencies as well:
+
+```bash
+source venv/bin/activate
+python -m pip install ".[dev]"
+```
+
+Moreover, you are encouraged to install the pre-commit hooks, so that black, mypy and the unittests
+run before every commit:
+```bash
+pre-commit install
+```
+You can run
+```bash
+pre-commit run --all-files
+```
+to run pre-commit manually.
+
+After installing the dependencies you can start the server. You have 3 options:
+
+1. Run from your machine:
+```bash
+cd src
+python main.py --reload
+```
+The `--reload` argument will automatically restart the app if changes are made to the source files.
+2. Run using docker, for instance using `scripts/run_apiserver.sh`.
+3. Run using a DevContainer (see next subsection).
+
+### Authentication
+By default, the code runs using the local Keycloak. To make
+this work, you need to set an environment variable.
You can do this by setting the
+`KEYCLOAK_CLIENT_SECRET` in `src/.env`:
+
+```bash
+# src/.env
+KEYCLOAK_CLIENT_SECRET=[SECRET]
+```
+
+Alternatively, you can connect to a different Keycloak instance by modifying `src/.env`. EGI
+Checkin can, for instance, be used on a deployed instance, but not on localhost. Marco Rorro is the
+go-to person for requesting the usage of EGI Checkin.
+
+The reason EGI Checkin doesn't work on localhost is that its redirect URL configuration is
+strict, as it should be. On our development Keycloak, any redirect URL is
+accepted, so that it works on localhost or wherever you deploy. This should never be the case
+for a production instance.
+
+See the [authentication README](developer/auth.md) for more information.
+
+### Creating the Database
+
+By default, the app will create a database on the provided MySQL server.
+You can change this behavior through the **build-db** command-line parameter,
+which takes the following options:
+ * `never`: *never* creates the database, not even if one does not yet exist.
+   Use this only if you expect the database to be created through other means, such
+   as MySQL group replication.
+ * `if-absent`: creates a database only if none exists (default).
+ * `drop-then-build`: drops the database on startup to recreate it from scratch.
+   **THIS REMOVES ALL DATA PERMANENTLY. NO RECOVERY POSSIBLE.**
+
+### Populating the Database
+To populate the database with some examples, run the `connectors/fill-examples.sh` script.
+When using `docker compose`, you can easily do this by running the "examples" profile:
+`docker compose --profile examples up`
+
+## Usage
+
+Following the installation instructions above, the server may be reached at `127.0.0.1:8000`.
+REST API documentation is automatically built and can be viewed at `127.0.0.1:8000/docs`.
+
+
+#### Automatically Restart on Change
+
+If you want to automatically restart the server when a change is made to a file in the project, use the `--reload`
+parameter.
+It is important to realize that this also re-initializes the connection to the database, and may redo any
+start-up work (e.g., populating the database).
+
+#### Database Structure
+
+The Python classes that define the database tables are found in [src/database/model/](../src/database/model/).
+The structure is based on the
+[metadata schema](https://github.com/aiondemand/metadata-schema).
+
+
+## Adding resources
+
+See [src/README.md](developer/code.md).
+
+## Backups and Restoration
+
+We provide several scripts to facilitate the scheduling of backups and the manual restoration of files. For details on these scripts and others, please see [scripts/README.md](scripts/README.md).
+
+## Releases
+
diff --git a/docs/developer/releases.md b/docs/developer/releases.md
new file mode 100644
index 00000000..7dac2894
--- /dev/null
+++ b/docs/developer/releases.md
@@ -0,0 +1,63 @@
+# Releasing and Versioning
+
+The project loosely follows [Semantic Versioning](https://semver.org), where the patch/micro number matches the release date.
+
+## Breaking changes
+
+!!! note "Work in Progress"
+
+    The guidelines in "Breaking Changes" describe the desired workflow, but in practice we do not always follow them,
+    as 1) the metadata model had not yet matured and 2) the infrastructure for this still needs to be
+    developed. For now, we make sure all URLs are at least under a version suffix, which makes
+    support in the future possible.
+
+Breaking changes to a resource include deleting a field, changing the name of an existing field,
+or changing the datatype of a field. Adding new fields is not a breaking change.
+
+On a breaking change for a resource (e.g., for Dataset), a new router with a new version should
+be created.
The existing router should be deprecated, and rewritten so that it can handle the
+new metadata of the database. This deprecation of a router will be visible in the Swagger
+documentation. Calls to a deprecated router will still work, but a response header "Deprecated"
+will be added with the deprecation date. The deprecated router will then be deleted on the next
+release.
+
+On non-breaking changes to a resource, a new version is not needed for the corresponding router.
+
+Example:
+
+- Start: www.aiod.eu/api/datasets/v0
+- Release 1: www.aiod.eu/api/datasets/v0 (no breaking changes)
+- Release 2:
+    - www.aiod.eu/api/datasets/v0 (deprecated)
+    - www.aiod.eu/api/datasets/v1
+- Release 3: www.aiod.eu/api/datasets/v1
+
+## Changelog
+
+We use GitHub tags as our changelog. For each release, a release branch should be created with a
+bumped version in the `pyproject.toml`, and merged into master. The tag should contain a
+message detailing all the breaking and non-breaking changes. This message should adhere to the
+guiding principles described at https://keepachangelog.com/.
+
+- Show all tags: https://github.com/aiondemand/AIOD-rest-api/tags
+- Show a specific tag: https://github.com/aiondemand/AIOD-rest-api/releases/tag/0.3.20220501
+
+This information can also be extracted using the GitHub REST API.
+
+
+## Creating a release
+To create a new release:
+
+1. Make sure all requested functionality is merged into the `develop` branch.
+2. From develop: `git checkout -b release/[VERSION]`. Example of a version: `1.1.20231129`.
+3. Update the version in `pyproject.toml`.
+4. Test all (or at least most of) the functionality: check out the project in a new directory, remove all
+   your local images, and make sure it works out of the box.
+5. Go to https://github.com/aiondemand/AIOD-rest-api/releases and draft a new release from the
+   release branch. Look at all closed PRs and create a changelog.
+6. Create a PR from the release branch to master.
+7.
After that's merged, create a PR from master to develop.
+8. Deploy on the server(s):
+    - Check which services currently work (before the update). This is a sanity check in case a service _doesn't_ work later.
+    - Update the code on the server by checking out the release.
+    - Merge configurations as necessary.
+    - Make sure the latest database migrations are applied: see ["Schema Migrations"](migration.md#update-the-database).
+9. Notify everyone (e.g., in the API channel in Slack).
diff --git a/docs/developer/schema/attributes.md b/docs/developer/schema/attributes.md
new file mode 100644
index 00000000..9c00b46b
--- /dev/null
+++ b/docs/developer/schema/attributes.md
@@ -0,0 +1,120 @@
+# Working with Attributes
+In the metadata schema, every object has attributes that are represented by simple types like strings and numbers.
+For example, `News` objects have a `source` and `Publication` objects have a `title`.
+This page details how to change attributes on existing metadata types.
+
+Here is an example of the code that defines the `start_date` attribute of the `Event` class:
+```python
+class EventBase(AIResourceBase):
+    start_date: datetime | None = Field(
+        description="The start date and time of the event, formatted using the ISO 8601 date-time "
+        "format.",
+        default=None,
+        schema_extra={"example": "2021-02-03T15:15:00"},
+    )
+```
+Let's unpack this statement.
+The `start_date: datetime | None` annotation defines the name of the attribute (`start_date`) and its type (`datetime | None`, an optional datetime object).
+[Python type hints](https://docs.python.org/3/library/typing.html) are used by `Pydantic` to do input validation, and by `SQLAlchemy` to infer column types in the database - this is all wired together "under the hood" by `SQLModel`.
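As a quick aside: the ISO 8601 date-time format used in the example value above can be checked with nothing but Python's standard library (an illustration only; in the API itself, validation is handled by Pydantic):

```python
from datetime import datetime

# The example value from the schema above, in ISO 8601 date-time format.
value = "2021-02-03T15:15:00"

# datetime.fromisoformat parses this format; a malformed value would
# raise a ValueError, much like Pydantic rejects invalid input.
parsed = datetime.fromisoformat(value)
print(parsed.year, parsed.month, parsed.hour)  # → 2021 2 15
```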
+The `Field` object allows you to define additional information about the attribute:
+
+ - The content of `description` will be shown on the generated documentation pages, and should clear up any ambiguity on how to interpret the attribute.
+ - The `default` parameter may be specified to provide a default value that will be used for the attribute if no explicit value is provided.
+ - The `schema_extra` parameter may specify additional information, the most important of which is `example`. The `example` value is used in the example responses generated by the Swagger and ReDoc pages.
+
+## Adding an attribute to an existing asset type
+
+This guide explains how to add a simple attribute, i.e., an attribute which does not refer to any other table, to an existing asset type.
+
+Adding an attribute to an asset requires three things:
+
+ - An update to the schema in Python, i.e., a change to the SQLModel object.
+ - Added or updated tests that reflect the change.
+ - The addition of a migration script, which specifies how the database schema should be updated.
+
+This guide will discuss them in order, using the example of adding a `source` attribute to the `News` asset (see also this [pull request](https://github.com/aiondemand/AIOD-rest-api/pull/395)).
+
+### Updating the schema in Python
+
+Navigate to the asset definition you want to change. For simple attributes, this is the class whose name ends in "Base", e.g., "NewsBase". There, add the attribute and define its metadata by adding a type hint and assigning it a [SQLModel Field](https://sqlmodel.tiangolo.com/tutorial/create-db-and-table/#define-the-fields-columns). The code base already contains examples of many different types of constraints (e.g., string minimum or maximum lengths, specifying defaults, and so on).
+
+???- info "Working with Strings"
+
+    In Python, there is no limit to the number of characters in a string.
+    However, when working with a database, it may be wise to put constraints on the length of the string.
+    This is more efficient and may in some cases help avoid unwanted mistakes.
+    When picking a string length, please choose from the pre-existing options in `src/database/model/field_length.py`.
+    If in doubt, go for a larger size.
+
+    ???- note "Example: event name"
+
+        Assume that we wanted to add a "name" field to `Event` to store its name.
+        We might consider a few example names, such as "Hacktoberfest" or "Forty-Second International Conference on Machine Learning".
+        Both fall under 64 characters (`SHORT`), but the latter is already cutting it close.
+        In that case, it's probably smart to go one bigger: 256 characters (`NORMAL`).
+
+        ```python
+        from database.model.field_length import NORMAL
+
+        class Event(...):
+            name: str = Field(max_length=NORMAL, schema_extra={"example": "Forty-Second International Conference on Machine Learning"})
+        ```
+
+
+### Adding or updating tests that reflect the change
+Most of the time, it suffices to navigate to the tests of the resource router for the resource you are editing. In many cases, there is only one such test, found in a file named `test_router_ASSET_TYPE.py` (where `ASSET_TYPE` is, e.g., "news") and called "test_happy_path". There, you may add or update a line that tests setting the specific attribute that was added.
+
+### Adding a migration script
+We start off by having [Alembic](https://alembic.sqlalchemy.org/en/latest/) generate a new migration script.
+Follow the ["Using Alembic"]() instructions to generate a template migration script.
+In this script, you will find two empty functions:
+
+
+```python
+def upgrade() -> None:
+    pass
+
+
+def downgrade() -> None:
+    pass
+```
+
+All you have to do to finish the migration script is implement these two functions.
+The `upgrade` function specifies the instructions to go from the old database schema (without the added attribute) to the new one (with the added attribute); `downgrade` specifies the reverse.
+Below, you'll find some examples.
+
+=== "Nullable String"
+
+    ```python
+    def upgrade() -> None:
+        op.add_column(
+            table_name="news",
+            column=Column("source", String(LONG), nullable=True),
+        )
+
+
+    def downgrade() -> None:
+        op.drop_column(table_name="news", column_name="source")
+    ```
+
+=== "Required Integer"
+
+    ```python
+    def upgrade() -> None:
+        op.add_column(
+            table_name="event",
+            column=Column("max_participants", Integer, nullable=False),
+        )
+
+
+    def downgrade() -> None:
+        op.drop_column(table_name="event", column_name="max_participants")
+    ```
+
+
+Pay attention to:
+
+ - Specifying whether or not the column is nullable. `sqlalchemy.Column`s are nullable by default; we strongly encourage you to make this explicit.
+ - The name of the column. Getting the name of the table wrong should result in an error during migration, but getting the name of the column wrong will simply lead to unexpected errors in the REST API.
+ - Any other constraints, such as the maximum length of a string.
+
+Note that downgrading procedures may lead to a loss of data. This is unavoidable.
diff --git a/docs/developer/schema/index.md b/docs/developer/schema/index.md
new file mode 100644
index 00000000..c7375522
--- /dev/null
+++ b/docs/developer/schema/index.md
@@ -0,0 +1,51 @@
+# The AI-on-Demand Metadata Schema
+The conceptual AI-on-Demand metadata schema is defined in its own dedicated repository, [aiondemand/metadata-schema](https://github.com/aiondemand/metadata-schema).
+Questions about the conceptual metadata schema and requests for changes should be directed to that repository instead.
+
+In the REST API, we have an implementation of the schema defined in our [`src/database/model`](https://github.com/aiondemand/AIOD-rest-api/tree/develop/src/database/model) directory.
+For the model implementation we make use of [SQLModel](https://sqlmodel.tiangolo.com/), a layer
+on top of the ORM framework [SQLAlchemy](https://www.sqlalchemy.org/) and the serialization,
+validation, and documentation (Swagger-generating) framework [pydantic](https://docs.pydantic.dev/).
+SQLModel was created by the developer of FastAPI, the framework we use for routing.
+
+SQLModel makes it possible to define only a single model instead of defining the database layer
+(SQLAlchemy) and the logic layer (Pydantic) separately.
+Our implementation relies on inheritance to follow the same class hierarchy as defined in the [metadata schema](https://github.com/aiondemand/metadata-schema).
+This makes sure that generic fields, such as name and description, are present and consistent across all resources,
+and that changes to the conceptual model and to the model implementation remain similar.
+
+A partial overview of the metadata model can be found in the
+following figure:
+
+![AIoD Metadata model](../../media/AIoD_Metadata_Model.drawio.png)
+
+
+## Reading the Conceptual Metadata Schema
+Tools and documentation on how to read the conceptual metadata model are currently being written.
+This section will be updated at a later date (as of 16-12-2024).
+
+## Reading the Metadata Schema Implementation
+This section will be updated at a later date (as of 16-12-2024) and will describe:
+
+ - The use of the various class variants, such as `XBase`, `XORM`, and `XCreate`, with a link to the ["objects"](objects.md) page.
+ - A brief discussion on how to read an attribute definition, with a link to the ["attributes"](attributes.md) page.
+ - A brief discussion on how to read a relationship definition, with a link to the ["relationships"](relationships.md) page.
+
+## Changing the Metadata Schema Implementation
+On a high level, changes to the metadata schema implementation consist of three steps:
+
+ * updating the schema implementation in [`src/database/model`](https://github.com/aiondemand/AIOD-rest-api/tree/develop/src/database/model),
+ * updating or adding tests which cover those changes, and
+ * adding a [database migration script]() which updates the database accordingly.
+
+This last step isn't needed during development, where you may recreate the database at any time to reflect model changes.
+However, to deploy the changed schema in production, we need to be able to change the database,
+both its schema and its content, to match the schema defined by the Python classes.
+For this reason, a migration script is also _required_ when making changes to the metadata schema implementation.
+
+The subsections in the sidebar document how to execute these steps depending on the type of change you want to make (work in progress):
+
+ - [Attributes](attributes.md) explains how to work with attributes that do not refer to any external tables. For example, adding a field which stores a URL.
+ - [Relationships](relationships.md) explains how to work with attributes which define relationships between objects. For example, an asset's creator, which is represented as a link to an `Agent`.
+ - [Objects](objects.md) explains how to work with objects as a whole. For example, adding an entirely new entity to the schema.
+ + diff --git a/docs/developer/schema/objects.md b/docs/developer/schema/objects.md new file mode 100644 index 00000000..d8086bff --- /dev/null +++ b/docs/developer/schema/objects.md @@ -0,0 +1 @@ +to be added \ No newline at end of file diff --git a/docs/developer/schema/relationships.md b/docs/developer/schema/relationships.md new file mode 100644 index 00000000..d8086bff --- /dev/null +++ b/docs/developer/schema/relationships.md @@ -0,0 +1 @@ +to be added \ No newline at end of file diff --git a/mkdocs.yaml b/mkdocs.yaml index 57bc4bcc..2043a60d 100644 --- a/mkdocs.yaml +++ b/mkdocs.yaml @@ -6,9 +6,17 @@ theme: - content.code.copy nav: + - Home: README.md - Using the API: Using.md - Hosting the API: Hosting.md - - 'Developer Resources': README.md + - 'Developer Resources': + - developer/index.md + - 'Metadata Schema': + - developer/schema/index.md + - 'Attributes': developer/schema/attributes.md + - 'Relationships': developer/schema/relationships.md + - 'Objects': developer/schema/objects.md + - 'Contributing': Contributing.md - 'Unorganized Docs': - 'Code Advice': developer/code.md - 'Keycloak': developer/auth.md @@ -22,4 +30,7 @@ markdown_extensions: - pymdownx.details - pymdownx.superfences - pymdownx.tabbed: - alternate_style: true \ No newline at end of file + alternate_style: true + +plugins: + - section-index \ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml index 3428db81..a8a8ac54 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -51,6 +51,10 @@ dev = [ "responses==0.25.3", "freezegun==1.5.1", ] +docs = [ + "mkdocs-material", + "mkdocs-section-index", +] [tool.setuptools] py-modules = []