Song Dev Docs Update #838

Open
wants to merge 4 commits into
base: develop
Changes from 3 commits
26 changes: 16 additions & 10 deletions docs/index.md
@@ -10,37 +10,43 @@

# Background

Song is a metadata validation and tracking tool designed to streamline the management of genomics data across multiple cloud storage systems. With Song, users can create high-quality and reliable metadata repositories with minimal human intervention. As a metadata management system, Song does not handle file transfer and object storage. Song interacts with a required companion application, <a href="https://github.com/overture-stack/score" target="_blank" rel="noopener noreferrer">Score</a>, which manages file transfers and object storage.
Song is a metadata validation and tracking tool designed to streamline the management of genomics data across multiple cloud storage systems. It functions as a file catalog that tracks files and manages their metadata; it does not handle file transfer or object storage itself. Song interacts with a required companion application, [Score](https://github.com/overture-stack/score), which manages file transfers and object storage.

## Data Submission

**Analysis Files:** An analysis is a description of a set of one or more files plus the metadata describing that collection of files.
**Analyses:** An analysis is a collection of one or more files, along with the metadata that describes this collection.

**Metadata Validation:** Analyses get validated against the administrator's Dynamic Schema. That defines the vocabulary and structure of the analysis document.
**Metadata Validation:** Analyses are validated against a metadata schema that defines the vocabulary and structure of the analysis document. Song allows administrators to define custom schemas that describe the analyses they intend to manage.

**Tracking of Metadata to File Data:** Once validated, the analysis document is stored in the Song repository and given an automated analysis ID. The analysis ID is then used when uploading all associated file data through Score. Analysis IDs associate the metadata stored in Song with the file data being transferred by score and stored in the cloud.
**Tracking of Metadata to File Data:** Once validated, the analysis document is stored in the Song repository and assigned an automated analysis ID. This ID is then used when uploading all associated file data through Score. The analysis ID links the metadata stored in Song with the file data being transferred by Score and stored in the cloud.

## Data Administration

**Dynamic Schemas:** The data administrator creates the Dynamic Schema, which again, provides the vocabulary for the structural validation of JSON formatted data (Analysis documents), for example, ensuring that required fields are present or that the contents of a field match the desired data type or allowed values.
**Dynamic Schemas:** With Song, data administrators can create Dynamic Schemas for multiple types of analyses. These schemas define the vocabulary for the structural validation of JSON-formatted data.
Contributor

Don't know if "With Song" adds any info. Can probably omit.


**Data Lifecycle Management:** Analyses uploaded to a Song repository are `UNPUBLISHED` by default. When data is ready for search and download, administrators can make it available by updating it to a `PUBLISHED` state. If data is no longer relevant, the data administrators can set it to a `SUPPRESSED` state, making it unavailable for search and download through downstream services.
Schema validation ensures that:

- All required fields are included upon submission.
- The contents of each field match the expected data type.
- Only allowed values (enums) are used.
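
The three checks listed above can be illustrated with a minimal sketch. This is not Song's implementation or schema format: the `SCHEMA` layout, the field names (`analysisType`, `experimentalStrategy`, `fileCount`), and the `validate` helper are all hypothetical, chosen only to show what required-field, type, and enum checks look like on a JSON-like document.

```python
# Hypothetical, simplified schema. The layout and field names are
# illustrative assumptions and are not taken from Song.
SCHEMA = {
    "required": ["analysisType", "experimentalStrategy"],
    "types": {"analysisType": str, "experimentalStrategy": str, "fileCount": int},
    "enums": {"experimentalStrategy": ("WGS", "WXS", "RNA-Seq")},
}


def validate(analysis, schema):
    """Return a list of validation error messages (empty when valid)."""
    errors = []
    # 1. All required fields are present.
    for field in schema["required"]:
        if field not in analysis:
            errors.append(f"missing required field: {field}")
    # 2. Field contents match the expected data type.
    for field, expected in schema["types"].items():
        if field in analysis and not isinstance(analysis[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    # 3. Only allowed values (enums) are used.
    for field, allowed in schema["enums"].items():
        if field in analysis and analysis[field] not in allowed:
            errors.append(f"value not allowed for {field}: {analysis[field]!r}")
    return errors


if __name__ == "__main__":
    ok = {"analysisType": "sequencingRead", "experimentalStrategy": "WGS"}
    bad = {"experimentalStrategy": "PCR", "fileCount": "3"}
    print(validate(ok, SCHEMA))
    print(validate(bad, SCHEMA))
```

Collecting every error instead of stopping at the first lets a submitter fix a document in one pass.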

**Data Lifecycle Management:** Analyses uploaded to a Song repository are `UNPUBLISHED` by default. To make data available for search and download, administrators update it to a `PUBLISHED` state. If data is no longer relevant, it can be set to a `SUPPRESSED` state, making it unavailable for search and download through downstream services.

**Note on File Availability:** An analysis cannot be published unless all its associated files have been uploaded to, and are available through, Score. This ensures that all published analyses have their files available for download through Score.
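
As a rough illustration of the lifecycle and the file-availability rule, the sketch below models the three states and refuses to publish while any file is missing. The `Analysis` class, its `files` mapping, and the method names are assumptions made for this example and do not reflect Song's actual API. Restrictions on other transitions are not modeled here because the text above does not describe them.

```python
from enum import Enum


class State(Enum):
    UNPUBLISHED = "UNPUBLISHED"
    PUBLISHED = "PUBLISHED"
    SUPPRESSED = "SUPPRESSED"


class Analysis:
    """Hypothetical in-memory model of an analysis and its files."""

    def __init__(self, analysis_id, files):
        self.analysis_id = analysis_id
        # Maps file name -> True once the file is uploaded and available.
        self.files = dict(files)
        # Analyses are UNPUBLISHED by default.
        self.state = State.UNPUBLISHED

    def publish(self):
        # Publishing is refused while any associated file is unavailable.
        missing = [name for name, available in self.files.items() if not available]
        if missing:
            raise ValueError(f"cannot publish, files not available: {missing}")
        self.state = State.PUBLISHED

    def suppress(self):
        # Suppressed data is no longer offered for search and download.
        self.state = State.SUPPRESSED


if __name__ == "__main__":
    analysis = Analysis("example-id", {"sample.bam": True, "sample.bam.bai": False})
    try:
        analysis.publish()
    except ValueError as err:
        print(err)
    analysis.files["sample.bam.bai"] = True
    analysis.publish()
    print(analysis.state.value)
```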

<Note title="The Song Client">We created the `song-client` command line tool to streamline interactions with Song's REST API endpoints. For more information on what the `song-client` can do, see our [Song client command reference documentation](/documentation/song/reference/commands/).</Note>

## Integrations

As part of the larger Overture.bio software suite, Song can optionally be used with additional integrations, including:

- **Event Streaming:** Built-in support for <a href="https://kafka.apache.org/" target="_blank" rel="noopener noreferrer">Apache Kafka</a> event streaming allows other services to respond when analyses are registered and published.
- **Event Streaming:** Built-in support for [Apache Kafka](https://kafka.apache.org/) event streaming allows other services to respond when analyses are registered and published.


- **Maestro Indexing:** Song is built to natively integrate with <a href="https://github.com/overture-stack/maestro" target="_blank" rel="noopener noreferrer">Maestro</a>, which will easily index data into a configurable Elasticsearch index, to be used for convenient searching of data.
- **Maestro Indexing:** Song is built to natively integrate with [Maestro](https://github.com/overture-stack/maestro), which indexes data into a configurable Elasticsearch index for convenient searching.

---

**Navigation**

- [Operation](./operation/operation.md)
- [Contribution](./contribution/contribution.md)

2 changes: 1 addition & 1 deletion docs/operation/operation.md
@@ -8,7 +8,7 @@

# Operational Docs

Welcome to the Operational Documentation for setting up and managing the Score Server in a development environment. This document provides detailed steps and configurations needed to get your development environment up and running with either a standalone setup or using Docker. This page also includes instructions for integrating Keycloak for authentication and authorization.
This page provides detailed steps and configurations required to set up your development environment with Song, either through a standalone setup or using Docker. Additionally, it includes instructions for integrating Keycloak for authentication and authorization.

## On this page
