From 919b91961fdfc93bc858b21fca02fec415d9fefa Mon Sep 17 00:00:00 2001 From: Mitchell Shiell <59712867+MitchellShiell@users.noreply.github.com> Date: Wed, 14 Feb 2024 09:13:32 -0500 Subject: [PATCH 1/3] Initial Docs commit --- docs/contribution/contribution.md | 59 +++++++++++ docs/index.md | 47 +++++++++ docs/operation/operation.md | 169 ++++++++++++++++++++++++++++++ 3 files changed, 275 insertions(+) create mode 100644 docs/contribution/contribution.md create mode 100644 docs/index.md create mode 100644 docs/operation/operation.md diff --git a/docs/contribution/contribution.md b/docs/contribution/contribution.md new file mode 100644 index 00000000..50a1ec27 --- /dev/null +++ b/docs/contribution/contribution.md @@ -0,0 +1,59 @@ + +**Navigation** + +- [Background](../index.md) +- [Operation](../operation/operation.md) + +--- + +# Contribution Guidelines + +We appreciate your interest in contributing to Overture! Please take the time to read and follow these guidelines before submitting your contributions. By participating in this project, you are expected to abide by the [Code of Conduct](https://github.com/overture-stack/SONG/blob/readme-update/code_of_conduct.md). Please read it carefully before contributing. + +## Running Song from Source Code + +To contribute to Song, you'll need to run the application from the source code. Here's how you can do that: + +1. **Clone the Repository**: First, clone the SONG repository to your local machine using `git clone https://github.com/overture-stack/SONG.git`. + +2. **Install Dependencies**: Navigate to the cloned directory and build the project, which installs all required dependencies. + +3. **Run the Application Locally**: After building, start the application locally. For Song, this means starting the server with `mvn spring-boot:run`, as described in the [Operation](../operation/operation.md) docs. + +4. 
**Test Your Changes**: Before submitting a pull request, make sure to test your changes thoroughly. Run the automated tests (if available) and perform manual testing to ensure that your contribution does not introduce regressions. + +## Issues and Feature Requests + +Before opening a new issue or feature request, please check if a similar one already exists. If so, please add a comment to the existing issue instead of creating a new one. When submitting an issue or feature request, please include a detailed description of the problem or feature you would like to see, along with any relevant code, error messages, or screenshots. This will help us better understand the issue and respond more efficiently. + +## Pull Requests + +We welcome and encourage pull requests from the community. To submit a pull request, please follow these steps: + +1. **Fork the Repository**: Fork the Song repository on GitHub. +2. **Clone Your Fork**: Clone your forked repository to your local machine. +3. **Create a New Branch**: Create a new branch for your changes. +4. **Make Your Changes**: Implement your changes and commit them to your branch. +5. **Push Your Changes**: Push your changes to your forked repository. +6. **Submit a Pull Request**: Open a pull request against the main repository. + +Please ensure your code adheres to the following guidelines before submission: + +- The code should be well-documented and readable. +- The code should be tested. +- Include a clear description of the changes made and the reason for the changes in the pull request. + +For any questions or discussions regarding your pull request, please refer to the GitHub issues system or join our [Slack channel](http://slack.overture.bio/) for further assistance. + +We use GitHub issues and pull requests for communication related to code changes. For general discussion, feel free to join our [Slack channel](http://slack.overture.bio/). + +Thank you for contributing to Overture-Stack! 
+ +--- + +**Navigation** + +- [Background](../index.md) +- [Operation](../operation/operation.md) + +--- \ No newline at end of file diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 00000000..c7ffa3fb --- /dev/null +++ b/docs/index.md @@ -0,0 +1,47 @@ +# Song Developer Documentation + +--- + +**Navigation** +- [Operation](./operation/operation.md) +- [Contribution](./contribution/contribution.md) + +--- + +# Background + +Song is a metadata validation and tracking tool designed to streamline the management of genomics data across multiple cloud storage systems. With Song, users can create high-quality and reliable metadata repositories with minimal human intervention. As a metadata management system, Song does not handle file transfer and object storage. Song interacts with a required companion application, Score, which manages file transfers and object storage. + +## Data Submission + +**Analysis Files:** An analysis is a description of a set of one or more files plus the metadata describing that collection of files. + +**Metadata Validation:** Analyses get validated against the administrator's Dynamic Schema. That defines the vocabulary and structure of the analysis document. + +**Tracking of Metadata to File Data:** Once validated, the analysis document is stored in the Song repository and given an automated analysis ID. The analysis ID is then used when uploading all associated file data through Score. Analysis IDs associate the metadata stored in Song with the file data being transferred by score and stored in the cloud. + +## Data Administration + +**Dynamic Schemas:** The data administrator creates the Dynamic Schema, which again, provides the vocabulary for the structural validation of JSON formatted data (Analysis documents), for example, ensuring that required fields are present or that the contents of a field match the desired data type or allowed values. 
+ +**Data Lifecycle Management:** Analyses uploaded to a Song repository are `UNPUBLISHED` by default. When data is ready for search and download, administrators can make it available by updating it to a `PUBLISHED` state. If data is no longer relevant, the data administrators can set it to a `SUPPRESSED` state, making it unavailable for search and download through downstream services. + +We created the `song-client` command line tool to streamline interactions with Songs REST API endpoints. For more information on what the `song-client` can do, see our [Song client command reference documentation](/documentation/song/reference/commands/). + +## Integrations + +As part of the larger Overture.bio software suite, Song can be optionally used with additional integrations, including: + +- **Event Streaming:** Built-in support for Apache Kafka event streaming allows other services to respond when analyses are registered and published. + + +- **Maestro Indexing:** Song is built to natively integrate with Maestro, which will easily index data into a configurable Elasticsearch index, to be used for convenient searching of data. + +--- + +**Navigation** + +- [Operation](./operation/operation.md) +- [Contribution](./contribution/contribution.md) + +--- \ No newline at end of file diff --git a/docs/operation/operation.md b/docs/operation/operation.md new file mode 100644 index 00000000..b2d3ecb9 --- /dev/null +++ b/docs/operation/operation.md @@ -0,0 +1,169 @@ + +**Navigation** + +- [Background](../index.md) +- [Contribution](../contribution/contribution.md) + +--- + +# Operational Docs + +Welcome to the Operational Documentation for setting up and managing the Score Server in a development environment. This document provides detailed steps and configurations needed to get your development environment up and running with either a standalone setup or using Docker. This page also includes instructions for integrating Keycloak for authentication and authorization. 
+ +## On this page + +- [Setting up the Development Environment](#setting-up-the-development-environment) + - [Standalone song-server](#standalone-song-server) + - [Clone the Song Repository](#clone-the-song-repository) + - [Build](#build) + - [Start the Server](#start-the-server) + - [Docker for Song](#docker-for-song) + - [Start Song-server and all dependencies](#start-song-server-and-all-dependencies) + - [Start the Song-server (Mac M1 Users)](#start-the-song-server-mac-m1-users) + - [Stop Song-server and clean up](#stop-song-server-and-clean-up) +- [Integrating Keycloak](#integrating-keycloak) + - [Standalone](#standalone) + +# Developer Setup + +## Setting up the development environment + +There are two ways to set up a song-server in a development environment: + +- As a **[Standalone Server](#standalone-song-server)** (requires dependent services) +- Or in a **[Docker environment](#docker-for-song)** + +## Standalone song-server + +### Clone the Song Repository + +Clone the Song repository to your local computer: + +```bash +git clone https://github.com/overture-stack/SONG.git +``` + +### Build + +[JDK11](https://www.oracle.com/ca-en/java/technologies/downloads/) and [Maven3](https://maven.apache.org/download.cgi) are required to set up this service. + +To build the song-server, run the following command from the Song directory: + +```bash +./mvnw clean install -DskipTests +``` + +### Start the server + +Before running your song-server, ensure that the following dependent services are running and reachable from your local machine: + + +- Song database (default localhost:5432) +- Kafka (default localhost:9092). Required only if the _kafka_ Spring profile is enabled +- Ego or Keycloak server is required +- Score server is required only if the _score-client-cred_ Spring profile is enabled + + +Configure the above dependent services in **song-server/src/main/resources/application.yml** and make sure to use the profiles according to your needs. 
+ +**Profiles** +| Profile | Description | +| - | - | +| _secure_ | Required to secure endpoints | +| _noSecurityDev_ | To not secure endpoints | +| _kafka_ | Required to send messages to Kafka | +| _default_ | To not send messages to Kafka | +| _score-client-cred_ | Required to set Score server credentials | + +Run the following command to start the song-server: + + +```bash +cd song-server/ +mvn spring-boot:run -Dspring-boot.run.profiles=noSecurityDev,default,score-client-cred +``` + +> **Warning:** +> This guide is meant to demonstrate the configuration and usage of Song for development purposes and is **_not intended for production_**. If you ignore this warning and use this in any public or production environment, please remember to use Spring profiles accordingly. For production, use the following profiles: **kafka,secure,score-client-cred**. + +## Docker for Song + +Several _make_ targets are provided for locally deploying the dependent services using Docker. As the developer, you can replicate a live environment for **song-server** and **song-client**. Using Docker allows you to develop locally, test submissions, create manifests, publish, unpublish, and test Score uploads/downloads in an isolated environment. + +For more information on the different targets, run `make help` or read the comments above each target for a description. + + +> **Note:** +> An internet connection is required for the _make_ command, which may take several minutes to build. No external services are required for the _make_ command. + +### Start song-server and all dependencies + +To start song-server and all dependencies, use the following command: + + +```bash +make clean start-song-server +``` + +### Start the song-server (Mac M1 Users) + +On a Mac M1, you must disable Docker BuildKit by setting the `DOCKER_BUILDKIT` environment variable to `0`, which falls back to the legacy builder. 
+ + +```bash +DOCKER_BUILDKIT=0 make clean start-song-server +``` + +### Stop song-server and clean up + +To clean up everything, including stopping all services, running a Maven clean, and removing generated files and directories, use the following command: + + +```bash +make clean +``` + +> **Warning:** +> Docker for Song is meant to demonstrate the configuration and usage of Song, and is **_not intended for production_**. If you ignore this warning and use this in any public or production environment, please remember to change the passwords, accessKeys, and secretKeys. + +## Integrating Keycloak + +[Keycloak](https://www.keycloak.org/) is an open-source identity and access management solution that can be used to manage users and application permissions. You can find basic information on integrating Song and Keycloak using Docker in our user docs [located here](https://www.overture.bio/documentation/song/configuration/authentication/). For a comprehensive guide on installing and configuring Keycloak, refer to the [Keycloak documentation](https://www.keycloak.org/documentation). 
+ +### Standalone + +If you’re building Song from the source code, the following configuration is required in _song-server/src/main/resources/application.yml_: + +```yaml +auth: + server: + # check API Key endpoint + url: http://localhost/realms/myrealm/apikey/check_api_key/ + tokenName: apiKey + clientID: song + clientSecret: songsecret + provider: keycloak + # Keycloak config + keycloak: + host: http://localhost + realm: "myrealm" + +spring: + security: + oauth2: + resourceserver: + jwt: + # EGO public key + #public-key-location: "http://localhost:9082/oauth/token/public_key" + # Keycloak JWK + jwk-set-uri: http://localhost/realms/myrealm/protocol/openid-connect/certs +``` + +--- + +**Navigation** + +- [Background](../index.md) +- [Contribution](../contribution/contribution.md) + +--- \ No newline at end of file From 1c13f03525baa8ae805acdb1eca342a2f615e343 Mon Sep 17 00:00:00 2001 From: Mitchell Shiell <59712867+MitchellShiell@users.noreply.github.com> Date: Wed, 28 Feb 2024 10:26:29 -0500 Subject: [PATCH 2/3] Update docs/index.md Co-authored-by: Jon Eubank --- docs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index c7ffa3fb..f63e99aa 100644 --- a/docs/index.md +++ b/docs/index.md @@ -16,7 +16,7 @@ Song is a metadata validation and tracking tool designed to streamline the manag **Analysis Files:** An analysis is a description of a set of one or more files plus the metadata describing that collection of files. -**Metadata Validation:** Analyses get validated against the administrator's Dynamic Schema. That defines the vocabulary and structure of the analysis document. +**Metadata Validation:** Analyses get validated against a metadata schema that defines the vocabulary and structure of the analysis document. Song allows administrators to define custom schemas that describe the Analyses they intend to manage. 
**Tracking of Metadata to File Data:** Once validated, the analysis document is stored in the Song repository and given an automated analysis ID. The analysis ID is then used when uploading all associated file data through Score. Analysis IDs associate the metadata stored in Song with the file data being transferred by score and stored in the cloud. From 1b37b8c8fb92954c87e15a9b6f3d52bd2b698899 Mon Sep 17 00:00:00 2001 From: Mitchell Shiell <59712867+MitchellShiell@users.noreply.github.com> Date: Wed, 28 Feb 2024 10:47:18 -0500 Subject: [PATCH 3/3] updated based on PR feedback --- docs/index.md | 26 ++++++++++++++++---------- docs/operation/operation.md | 2 +- 2 files changed, 17 insertions(+), 11 deletions(-) diff --git a/docs/index.md b/docs/index.md index c7ffa3fb..1dc3a713 100644 --- a/docs/index.md +++ b/docs/index.md @@ -10,21 +10,29 @@ # Background -Song is a metadata validation and tracking tool designed to streamline the management of genomics data across multiple cloud storage systems. With Song, users can create high-quality and reliable metadata repositories with minimal human intervention. As a metadata management system, Song does not handle file transfer and object storage. Song interacts with a required companion application, Score, which manages file transfers and object storage. +Song is a metadata validation and tracking tool designed to streamline the management of genomics data across multiple cloud storage systems. It functions as a file catalog tracking files and managing their metadata, without handling file transfer and object storage itself. Song interacts with a required companion application, [Score](https://github.com/overture-stack/score), which manages file transfers and object storage. ## Data Submission -**Analysis Files:** An analysis is a description of a set of one or more files plus the metadata describing that collection of files. 
+**Analyses:** An analysis is a collection of one or more files, along with the metadata that describes this collection. -**Metadata Validation:** Analyses get validated against the administrator's Dynamic Schema. That defines the vocabulary and structure of the analysis document. +**Metadata Validation:** Analyses get validated against a metadata schema that defines the vocabulary and structure of the analysis document. Song allows administrators to define custom schemas that describe the Analyses they intend to manage. -**Tracking of Metadata to File Data:** Once validated, the analysis document is stored in the Song repository and given an automated analysis ID. The analysis ID is then used when uploading all associated file data through Score. Analysis IDs associate the metadata stored in Song with the file data being transferred by score and stored in the cloud. +**Tracking of Metadata to File Data:** Once validated, the analysis document is stored in the Song repository and assigned an automated analysis ID. This ID is then used when uploading all associated file data through Score. The analysis ID links the metadata stored in Song with the file data being transferred by Score and stored in the cloud. ## Data Administration -**Dynamic Schemas:** The data administrator creates the Dynamic Schema, which again, provides the vocabulary for the structural validation of JSON formatted data (Analysis documents), for example, ensuring that required fields are present or that the contents of a field match the desired data type or allowed values. +**Dynamic Schemas:** With Song, data administrators can create Dynamic Schemas for multiple types of analyses. These schemas define the vocabulary for the structural validation of JSON formatted data. -**Data Lifecycle Management:** Analyses uploaded to a Song repository are `UNPUBLISHED` by default. When data is ready for search and download, administrators can make it available by updating it to a `PUBLISHED` state. 
If data is no longer relevant, the data administrators can set it to a `SUPPRESSED` state, making it unavailable for search and download through downstream services. +This ensures that: + +- All required fields are included upon submission. +- The contents of each field match the expected data type. +- Only allowed values (enums) are used. + +**Data Lifecycle Management:** Analyses uploaded to a Song repository are `UNPUBLISHED` by default. To make data available for search and download, administrators update it to a `PUBLISHED` state. If data is no longer relevant, it can be set to a `SUPPRESSED` state, making it unavailable for search and download through downstream services. + +**Note on File Availability:** An analysis cannot be published unless all its associated files have been uploaded to and are available with Score. This ensures that all published analyses have their files available for download through Score. We created the `song-client` command line tool to streamline interactions with Songs REST API endpoints. For more information on what the `song-client` can do, see our [Song client command reference documentation](/documentation/song/reference/commands/). @@ -32,15 +40,13 @@ Song is a metadata validation and tracking tool designed to streamline the manag As part of the larger Overture.bio software suite, Song can be optionally used with additional integrations, including: -- **Event Streaming:** Built-in support for Apache Kafka event streaming allows other services to respond when analyses are registered and published. +- **Event Streaming:** Built-in support for [Apache Kafka](https://kafka.apache.org/) event streaming allows other services to respond when analyses are registered and published. - -- **Maestro Indexing:** Song is built to natively integrate with Maestro, which will easily index data into a configurable Elasticsearch index, to be used for convenient searching of data. 
+- **Maestro Indexing:** Song is built to natively integrate with [Maestro](https://github.com/overture-stack/maestro), which will easily index data into a configurable Elasticsearch index, to be used for convenient searching of data. --- **Navigation** - - [Operation](./operation/operation.md) - [Contribution](./contribution/contribution.md) diff --git a/docs/operation/operation.md b/docs/operation/operation.md index b2d3ecb9..1fac1985 100644 --- a/docs/operation/operation.md +++ b/docs/operation/operation.md @@ -8,7 +8,7 @@ # Operational Docs -Welcome to the Operational Documentation for setting up and managing the Score Server in a development environment. This document provides detailed steps and configurations needed to get your development environment up and running with either a standalone setup or using Docker. This page also includes instructions for integrating Keycloak for authentication and authorization. +This page provides detailed steps and configurations required to set up your development environment with Song, either through a standalone setup or using Docker. Additionally, it includes instructions for integrating Keycloak for authentication and authorization. ## On this page