From 0c9c87662eeebea572cc04011896d5305fb3f566 Mon Sep 17 00:00:00 2001
From: Robert H Stolz <7143905+garden-of-delete@users.noreply.github.com>
Date: Thu, 12 May 2022 17:32:08 -0700
Subject: [PATCH] [docs] README.md (#14)

---
 README.md | 193 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 190 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 57b6335..61227b9 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,195 @@
-# orchard
+# Orchard
 
 > An intentional plantation of trees or shrubs that is maintained for food production.
 
 [![CircleCI](https://circleci.com/gh/salesforce/orchard.svg?style=svg)](https://circleci.com/gh/salesforce/orchard)
 
-Orchard is an orchestration service that manages workflows and activities
-inspired by AWS Data Pipeline
+Orchard is an orchestration service that manages data pipelines, compute workflows, and associated ETL/ELT activities, along with the underlying resource lifecycle (provisioning, monitoring, termination).
+Inspired by AWS' [Data Pipeline](https://aws.amazon.com/datapipeline/) service, Orchard is designed for enterprise use cases that demand security, extreme concurrency, granular control over the resource lifecycle, and flexible integration with a cloud-based microservice architecture.
+
+## Design
+Like Apache Spark, Orchard is written in functional Scala. This gives Orchard access to Scala's well-developed concurrency features, in particular the actor model as implemented by the [Akka](https://github.com/akka/akka) library.
+
+## Setup
+Orchard is designed to be deployed into a cloud environment as a service, but it can also be set up locally for exploration and development. To do so, follow these steps.
+
+**Install Apps (OSX)**
+```sh
+# clone orchard into a local directory
+git clone git@github.com:salesforce/orchard.git
+
+# use sdkman to install Scala Build Tool (SBT) (if needed); open a new shell after installing sdkman so that `sdk` is available
+curl -s "https://get.sdkman.io" | bash
+sdk install sbt
+
+# use brew to install postman (for API calls) and docker (if needed)
+brew install --cask postman
+brew install --cask docker
+```
+
+**Configure Postgres Database**
+
+Orchard uses a Postgres database in the docker-compose stack to store the state of each active task. Set the password for this database by adding a [.env file](https://docs.docker.com/compose/environment-variables/#the-env-file) to the project's root containing `ORCHARD_PG_SECRET=orchardsecret`, replacing `orchardsecret` with your own secret.
+
+Alternatively, set it directly in the environment:
+```sh
+export ORCHARD_PG_SECRET=orchardsecret
+```
+
+**Start the Docker Compose stack**
+```sh
+docker-compose up
+```
+
+This will start the database container, provision the required tables, and start the Orchard web service.
+
+**Authentication**
+
+By default, Orchard runs a development configuration in which authentication is disabled. To enable API authentication, set `orchard.auth.enabled = true` in [application.conf](https://github.com/salesforce/orchard/blob/master/orchard-ws/conf/application.conf). Orchard will then pull the keys specified in
+```
+hashed-keys = {
+  user = [ ${?MCE_ENV_X_API_USER}, ${?MCE_ENV_X_API_ADMIN_USER} ]
+  admin = [ ${?MCE_ENV_X_API_ADMIN}, ${?MCE_ENV_X_API_ADMIN_USER} ]
+}
+```
+which must match the key provided in the header of any inbound API request.
+
+## Using Orchard
+Once the setup is complete, Orchard is ready to receive a number of different instructions via API request.
+
+If deployed into a cloud environment like AWS, Orchard will need a role with a set of permissions appropriate for the activities it runs.
+
+Orchard allows the definition and execution of **workflows**, where each workflow consists of a number of **activities**. Activities can depend on other activities, forming a directed acyclic graph (DAG). Orchard will execute activities concurrently whenever possible.
+
+Below is an example workflow that defines a number of activities to be executed in an AWS VPC environment:
+
+```json
+{
+  "name": "workflowTestName",
+  "activities": [
+    {
+      "id": "activityId_1",
+      "name": "first_activity",
+      "activityType": "mock.activity.StubActivity",
+      "activitySpec": {
+        "steps": [
+          {
+            "jar": "command-runner.jar",
+            "args": [
+              "spark-submit",
+              "s3://s3bucket/submit/spark_submit.py",
+              "--data_source",
+              "s3://s3bucket/data/data_source.csv",
+              "--output_uri",
+              "s3://s3bucket/data/output"
+            ]
+          }
+        ]
+      },
+      "resourceId": "resourceId_1",
+      "maxAttempt": 2
+    },
+    {
+      "id": "activityId_2",
+      "name": "second_activity",
+      "activityType": "mock.activity.StubActivity",
+      "activitySpec": {
+        "steps": [
+          {
+            "jar": "command-runner.jar",
+            "args": [
+              "step started",
+              "modeling in progress",
+              "canceled"
+            ]
+          }
+        ]
+      },
+      "resourceId": "resourceId_2",
+      "maxAttempt": 2
+    }
+  ],
+  "resources": [
+    {
+      "id": "resourceId_1",
+      "name": "emr cluster",
+      "resourceType": "mock.resource.StubResource",
+      "resourceSpec": {
+        "releaseLabel": "emr-6.3.0",
+        "applications": [
+          "Spark"
+        ],
+        "serviceRole": "EMR_Role",
+        "resourceRole": "emr-resource-role",
+        "instancesConfig": {
+          "subnetId": "subnet-0000ab0a",
+          "ec2KeyName": "orchard",
+          "instanceCount": 2,
+          "masterInstanceType": "m5.xlarge",
+          "slaveInstanceType": "m5.xlarge"
+        }
+      },
+      "maxAttempt": 2
+    },
+    {
+      "id": "resourceId_2",
+      "name": "emr cluster",
+      "resourceType": "mock.resource.StubResource",
+      "resourceSpec": {
+        "releaseLabel": "emr-6.3.0",
+        "applications": [
+          "Spark"
+        ],
+        "serviceRole": "EMR_Role",
+        "resourceRole": "emr-resource-role",
+        "instancesConfig": {
+          "subnetId": "subnet-0000ab0a",
+          "ec2KeyName":
"orchard", + "instanceCount": 2, + "masterInstanceType": "m5.xlarge", + "slaveInstanceType": "m5.xlarge" + } + }, + "maxAttempt": 2 + } + ], + "dependencies": { + "activityId_2": [ + "activityId_1" + ] + } +} +``` + +To submit this request to Orchard: +```html +POST http://localhost:9000/v1/workflow +``` +Which returns a workflow_id. For example: `wf-f231a08f-60e4-480a-b845-e53e06918f77` + +Once defined, activate a workflow using the workflow id like so: +```html +PUT http://localhost:9000/v1/workflow/wf-f231a08f-60e4-480a-b845-e53e06918f77 +``` + +**Resource and Activity Types** + +In the above example workflow, the activities and resources used are **stubs**. In an actual deployment, Orchard will be using resources and activities specific to the chosen cloud provider's environment, like AWS' EC2 or EMI. Each activity has its own `activitySpec`, which contains configuration needed to carry out that activity. + +Currently, Orchard supports: +- AWS EC2 activities / resources +- AWS EMR activities / resources +- AWS S3 resources +- AWS SSM resources +- Shell script activity +- Shell command activity + +The project is actively seeking contributions for other activity and resource types, including those relevant to GCP and Azure cloud. A guide to adding new resources and activities will be linked here at a later date for those interested in contributing. + +## Contributing +To contribute to the project, please check issues, fork, and submit a pull request. + +## License +Orchard is an open-source project licensed under BSD 3-Clause "New" or "Revised" License. + +Go [here](https://github.com/salesforce/orchard/blob/master/LICENSE.txt) to read the full text of Orchard's license.