- Introduction
- Installation
- Other Contribution Guidelines
- Command Line Usage
- Testing
- Code Style
- Contributors
SCHEMATIC is an acronym for Schema Engine for Manifest Ingress and Curation. This Python-based infrastructure provides a novel schema-based metadata ingress ecosystem. It is meant to streamline biomedical dataset annotation, metadata validation, and submission to a data repository for data contributors.
- Python 3.7.1 or higher
Note: You need to be a registered and certified user on synapse.org, and you also need the right permissions to download the Google credentials files from Synapse.
Create and activate a virtual environment within which you can install the package:
python3 -m venv .venv
source .venv/bin/activate
Note: Python 3 has built-in support for virtual environments (venv), so you no longer need to install virtualenv.
Install and update the package using pip:
python3 -m pip install schematicpy
If you run into the error "Failed building wheel for numpy", upgrading pip may resolve it. To upgrade pip, run:
pip3 install --upgrade pip
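Before installing, you can confirm that the interpreter meets the Python 3.7.1 minimum and that a virtual environment is active. The following is a minimal sketch that uses only the standard library; the helper names python_version_ok and in_virtualenv are hypothetical and are not part of schematic.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MIN_PYTHON = (3, 7, 1)


def python_version_ok(version_info=sys.version_info):
    """Return True if the (major, minor, micro) version is at least MIN_PYTHON."""
    return tuple(version_info[:3]) >= MIN_PYTHON


def in_virtualenv():
    """Return True when running inside a venv.

    Inside an environment created by `python3 -m venv`, sys.prefix points at
    the environment while sys.base_prefix still points at the base interpreter.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Inside a virtual environment:", in_virtualenv())
```

Running the file with python3 after `source .venv/bin/activate` should report True for both checks.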
When contributing to this repository, please first discuss the change you wish to make with the owners of this repository via an issue, email, or any other method before making a change.
Please note that we have a code of conduct; please follow it in all your interactions with the project.
- Clone the schematic package repository.
git clone https://github.com/Sage-Bionetworks/schematic.git
- Install poetry (version 1.2 or later) using either the official installer or pipx. If you have an older installation of Poetry, we recommend uninstalling it first.
- Start the virtual environment by doing:
poetry shell
- Install the dependencies by doing:
poetry install
This command installs the dependencies specified in poetry.lock. If this step takes a long time, go back to step 2 and check your version of Poetry. Alternatively, you could delete the lock file and regenerate it by running poetry install (please note that this method should be used as a last resort, because it forces other developers to change their development environment).
- Fill in credential files. Note: If you won't interact with Synapse, you can skip this section.
There are two main configuration files that need to be edited: config.yml and .synapseConfig.
Configure .synapseConfig File
Download a copy of the .synapseConfig file, open it in the editor of your choice, and edit the username and authtoken attributes under the authentication section.
Note: You can also consult the configparser documentation to see the format that .synapseConfig must follow. For instance:
[authentication]
username = ABC
authtoken = abc
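Because .synapseConfig uses the INI format that Python's configparser reads, you can sanity-check your edits before running schematic. This is a minimal sketch; read_synapse_auth is a hypothetical helper, not a schematic or Synapse client function.

```python
import configparser


def read_synapse_auth(text):
    """Parse .synapseConfig content and return (username, authtoken).

    Raises KeyError if the [authentication] section or either attribute
    is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["authentication"]  # KeyError if the section is absent
    return section["username"], section["authtoken"]
```

For example, passing the contents of the file shown above, read via `pathlib.Path.home().joinpath(".synapseConfig").read_text()` (assuming the file sits in your home directory), returns `("ABC", "abc")`.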
Configure config.yml File
Note: Below is only a brief explanation of some attributes in config.yml. Please use the link here to get the latest version of config.yml in the develop branch.
Description of config.yml attributes
definitions:
  synapse_config: "~/path/to/.synapseConfig"
  service_acct_creds: "~/path/to/service_account_creds.json"
synapse:
  master_fileview: "syn23643253" # fileview of project with datasets on Synapse
  manifest_folder: "~/path/to/manifest_folder/" # manifests will be downloaded to this folder
  manifest_basename: "filename" # base name of the manifest file in the project dataset, without extension
  service_acct_creds: "syn25171627" # synapse ID of service_account_creds.json file
manifest:
  title: "example" # title of metadata manifest file
  # to make all manifests enter only 'all manifests'
  data_type:
    - "Biospecimen"
    - "Patient"
model:
  input:
    location: "data/schema_org_schemas/example.jsonld" # path to JSON-LD data model
    file_type: "local" # only type "local" is supported currently
style: # configuration of google sheet
  google_manifest:
    req_bg_color:
      red: 0.9215
      green: 0.9725
      blue: 0.9803
    opt_bg_color:
      red: 1.0
      green: 1.0
      blue: 0.9019
    master_template_id: '1LYS5qE4nV9jzcYw5sXwCza25slDfRA1CIg3cs-hCdpU'
    strict_validation: true
Note: Paths can be specified relative to the config.yml file or as absolute paths.
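The note above means a relative path in config.yml is interpreted against the directory that contains config.yml, not against the current working directory. A minimal sketch of that rule, assuming a leading ~ is expanded to the home directory as the example values suggest; resolve_config_path is a hypothetical helper and not schematic's actual implementation.

```python
import os
from pathlib import Path


def resolve_config_path(value, config_file):
    """Resolve a path value taken from config.yml.

    Absolute paths (after expanding a leading ~) are returned unchanged;
    relative paths are joined to the directory containing config_file.
    """
    path = Path(value).expanduser()
    if path.is_absolute():
        return path
    # abspath normalizes the config location without resolving symlinks.
    config_dir = Path(os.path.abspath(config_file)).parent
    return config_dir / path
```

With config.yml stored in /srv/project, the model location "data/schema_org_schemas/example.jsonld" would resolve to /srv/project/data/schema_org_schemas/example.jsonld.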
- Log in to Synapse using the command line. In your virtual environment, run the following command:
synapse login -u <synapse username> -p <synapse password> --rememberMe
Please make sure that you run this command before running schematic init below.
- Obtain Google credential files
To obtain schematic_service_account_creds.json, please run:
schematic init --config ~/path/to/config.yml
As of schematic v22.12.1, the token mode of authentication (in other words, using token.pickle and credentials.json) is no longer supported, due to Google's decision to move away from the OAuth out-of-band (OOB) flow. Click here to learn more.
Note: Use the schematic_service_account_creds.json file for the service account mode of authentication (for Google services/APIs). Service accounts are special Google accounts that applications can use to access Google APIs programmatically via OAuth 2.0, with the advantage that they do not require human authorization.
Background: schematic uses Google's API to generate Google Sheets templates that users fill in to provide (meta)data. Most Google Sheets functionality can be authenticated with a service account. However, more complex Google Sheets functionality requires token-based authentication. As browser support for token-based authentication diminishes, we hope to deprecate it and keep only service account authentication in the future.
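Google service account key files are JSON documents whose "type" field is "service_account" and which include fields such as client_email and private_key. The sketch below uses only the standard library to check that schematic_service_account_creds.json has that shape; check_service_account_key is a hypothetical helper, and it does not verify the key with Google.

```python
import json

# Fields present in Google service account key files (not an exhaustive list).
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_service_account_key(text):
    """Return the client_email of a service account key given as JSON text.

    Raises ValueError (json.JSONDecodeError is a subclass) if the text is
    not valid JSON or does not look like a service account key.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if data["type"] != "service_account":
        raise ValueError("expected type 'service_account', got %r" % data["type"])
    return data["client_email"]
```

Passing the file contents (for example, from `open("schematic_service_account_creds.json").read()`) returns the service account's email address, which is the identity that needs access to any Google resources schematic uses.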
For new features, bugs, and enhancements:
- Pull the latest code from the develop branch in the upstream repo
- Check out a new branch develop-<feature/fix-name> from the develop branch
- Do development on branch develop-<feature/fix-name>
  a. You may need to ensure that the schematic pyproject.toml and poetry.lock files are compatible with your local environment
- Add changed files for tracking and commit changes using best practices
- Make granular commits: avoid changing too many files or hundreds of lines of code in a single commit
- Commits with work in progress are encouraged
  a. Add WIP to the beginning of the commit message for “Work In Progress” commits
- Keep commit messages descriptive but less than a page long; see best practices
- Push code to develop-<feature/fix-name> in the upstream repo
- Branch off develop-<feature/fix-name> if needed to work on multiple features associated with the same code base
- After feature work is complete and before creating a PR to the develop branch in upstream:
  a. Ensure that the code runs locally
  b. Test for logical correctness locally
  c. Wait for the GitHub workflow to complete (e.g. tests are run)
- Create a PR from develop-<feature/fix-name> into the develop branch of the upstream repo
- Request a code review on the PR
- Once the code is approved, merge it into the develop branch
- Delete the develop-<feature/fix-name> branch
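The branch naming and WIP conventions above can be checked mechanically, for example from a local Git hook. A minimal sketch, assuming branch names are "develop-" followed by letters, digits, dots, underscores, hyphens, or slashes; that character set and the helpers is_valid_branch_name and is_wip_commit are assumptions for illustration, not project tooling.

```python
import re

# Assumed pattern: "develop-" followed by a non-empty feature/fix name.
BRANCH_PATTERN = re.compile(r"^develop-[A-Za-z0-9._/-]+$")

# "WIP" as a whole word at the start of the message, so "WIPE" does not match.
WIP_PATTERN = re.compile(r"WIP\b")


def is_valid_branch_name(name):
    """Return True if name follows the develop-<feature/fix-name> convention."""
    return BRANCH_PATTERN.match(name) is not None


def is_wip_commit(message):
    """Return True if a commit message is marked as work in progress."""
    return WIP_PATTERN.match(message) is not None
```

A pre-push hook could call is_valid_branch_name on the output of `git rev-parse --abbrev-ref HEAD` and warn before pushing a misnamed branch.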
Note: Make sure you have the latest version of the develop branch on your local machine.
- Install Docker from https://www.docker.com/.
- Identify the Docker image of interest from Schematic DockerHub. For example, run docker pull sagebionetworks/schematic:latest from the CLI, or run docker compose up after cloning the schematic GitHub repo. In this case, sagebionetworks/schematic:latest is the name of the image chosen.
- Run a schematic command with:
docker run <flags> <schematic command and args>
- For more information on flags for docker run and what they do, visit the Docker Documentation.
- These example commands assume that you have navigated to the directory you want to run schematic from. To specify your working directory, use $(pwd) on macOS/Linux or %cd% on Windows.
- If not using the latest image, specify the full image name, e.g. sagebionetworks/schematic:commit-e611e4a
- If using the local image created by docker compose up, change the Docker image name accordingly, e.g. schematic_schematic
- The --name flag sets the name of the container running locally on your machine.
docker run --rm -p 3001:3001 \
-v $(pwd):/schematic -w /schematic --name schematic \
-e SCHEMATIC_CONFIG=/schematic/config.yml \
-e GE_HOME=/usr/src/app/great_expectations/ \
sagebionetworks/schematic \
python /usr/src/app/run_api.py
Use the contents of config.yml and schematic_service_account_creds.json as environment variables to run API endpoints:
- Save the content of config.yml to the environment variable SCHEMATIC_CONFIG_CONTENT by running:
export SCHEMATIC_CONFIG_CONTENT="$(cat /path/to/config.yml)"
- Similarly, save the content of schematic_service_account_creds.json as SERVICE_ACCOUNT_CREDS by running:
export SERVICE_ACCOUNT_CREDS="$(cat /path/to/schematic_service_account_creds.json)"
- Pass SCHEMATIC_CONFIG_CONTENT and SERVICE_ACCOUNT_CREDS as environment variables using docker run:
docker run --rm -p 3001:3001 \
-v $(pwd):/schematic -w /schematic --name schematic \
-e GE_HOME=/usr/src/app/great_expectations/ \
-e SCHEMATIC_CONFIG_CONTENT="$SCHEMATIC_CONFIG_CONTENT" \
-e SERVICE_ACCOUNT_CREDS="$SERVICE_ACCOUNT_CREDS" \
sagebionetworks/schematic \
python /usr/src/app/run_api.py
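Passing file contents through environment variables means a process inside the container can read its settings without any mounted files. The sketch below is a hypothetical illustration of that pattern using only the standard library, not schematic's actual code: read_env_or_file prefers the environment variable when it is set and non-empty and otherwise falls back to a file path, and the fallback file names are assumptions.

```python
import json
import os


def read_env_or_file(var_name, fallback_path, env=None):
    """Return the text stored in var_name if it is set and non-empty;
    otherwise return the contents of the file at fallback_path."""
    env = os.environ if env is None else env
    content = env.get(var_name)
    if content:
        return content
    with open(fallback_path, encoding="utf-8") as handle:
        return handle.read()


def load_service_account_creds(env=None):
    """Parse service account credentials into a dict (hypothetical helper)."""
    text = read_env_or_file(
        "SERVICE_ACCOUNT_CREDS", "schematic_service_account_creds.json", env
    )
    return json.loads(text)
```

The same read_env_or_file call with "SCHEMATIC_CONFIG_CONTENT" returns the raw YAML text of config.yml; parsing it would need a YAML library, which this sketch deliberately leaves out. Quoting "$SCHEMATIC_CONFIG_CONTENT" in the shell matters here, since the multi-line content would otherwise be split into separate arguments.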
To run the example below, first clone schematic into your home directory:
git clone https://github.com/sage-bionetworks/schematic ~/schematic
Then update .synapseConfig with your credentials.
docker run \
-v ~/schematic:/schematic \
-w /schematic \
-e SCHEMATIC_CONFIG=/schematic/config.yml \
-e GE_HOME=/usr/src/app/great_expectations/ \
sagebionetworks/schematic schematic model \
-c /schematic/config.yml validate \
-mp /schematic/tests/data/mock_manifests/Valid_Test_Manifest.csv \
-dt MockComponent \
-js /schematic/tests/data/example.model.jsonld
On Windows (Command Prompt), which uses ^ rather than \ for line continuation:
docker run -v %cd%:/schematic ^
-w /schematic ^
-e GE_HOME=/usr/src/app/great_expectations/ ^
sagebionetworks/schematic ^
schematic model ^
-c config.yml validate ^
-mp tests/data/mock_manifests/inValid_Test_Manifest.csv ^
-dt MockComponent ^
-js /schematic/data/example.model.jsonld
To update the documentation, navigate to the docs folder:
cd docs
- After making relevant changes, run the make html command to regenerate the build folder.
- Please contact the dev team to publish your updates.
Other helpful resources:
If you install external libraries using poetry add <name of library>, please make sure that you include the pyproject.toml and poetry.lock files in your commit.
You can use the Issues tab to create bug reports and feature requests. Providing the developers with enough detail to verify and troubleshoot your issue is paramount:
- Provide a clear and descriptive title as well as a concise summary of the issue to identify the problem.
- Describe the exact steps that reproduce the problem in as much detail as possible.
- Describe the behavior you observed after following the steps and point out what exactly is the problem with that behavior.
- Explain which behavior you expected to see instead and why.
- Provide screenshots of the expected or actual behavior where applicable.
Please visit the documentation here for more information.
All code added to the client must have tests. The Python client uses pytest to run tests. The test code is located in the tests subdirectory.
You can run the test suite in the following way:
pytest -vs tests/
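A new test is an ordinary function in a file under tests/ whose name starts with test_; pytest collects it and reports any failing plain assert with a detailed comparison. The sketch below is a hypothetical, self-contained example: the file name tests/test_example.py and normalize_component_name are stand-ins for illustration, not schematic code.

```python
# tests/test_example.py (hypothetical file name)
# pytest collects functions whose names start with "test_" and rewrites
# plain assert statements to show the compared values on failure.


def normalize_component_name(name):
    """Hypothetical function under test: trim whitespace and drop spaces."""
    return name.strip().replace(" ", "")


def test_normalize_component_name_strips_whitespace():
    assert normalize_component_name("  Mock Component ") == "MockComponent"


def test_normalize_component_name_keeps_clean_names():
    assert normalize_component_name("Patient") == "Patient"
```

To run only these tests, use pytest -vs tests/test_example.py, or select them by name with pytest -vs tests/ -k normalize.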
- Duplicate the entity being updated (or folder if applicable).
- Edit the duplicates (e.g. annotations, contents, name).
- Update the test suite in your branch to use these duplicates, including the expected values in the test assertions.
- Open a PR as per the usual process (see above).
- Once the PR is merged, leave the original copies on Synapse to maintain support for feature branches that were forked from develop before your update.
  - If the old copies are problematic and need to be removed immediately (e.g. contain sensitive data), proceed with the deletion and alert the other contributors that they need to merge the latest develop branch into their feature branches for their tests to work.
- Please consult the Google Python style guide prior to contributing code to this project.
- Be consistent and follow existing code conventions and spirit.
Main contributors and developers: