Commit 0c9c876 (parent b61c144): 1 changed file with 190 additions and 3 deletions.
# Orchard

> An intentional plantation of trees or shrubs that is maintained for food production.

[![CircleCI](https://circleci.com/gh/salesforce/orchard.svg?style=svg)](https://circleci.com/gh/salesforce/orchard)

Orchard is an orchestration service that manages data pipelines, compute workflows, and their associated ETL/ELT activities. It also manages the lifecycle of the underlying resources (provisioning, monitoring, and termination).
Inspired by AWS's [Data Pipeline](https://aws.amazon.com/datapipeline/) service, Orchard is designed for enterprise use cases that demand security, extreme concurrency, granular control over the resource lifecycle, and flexible integration with a cloud-based microservice architecture.

## Design
Like Apache Spark, Orchard is written in functional Scala. This gives Orchard access to Scala's mature concurrency features, in particular the actor model provided by the [Akka](https://github.com/akka/akka) library.

## Setup
Orchard is designed to be deployed into a cloud environment as a service, but it can also be set up locally for exploration and development. To do so, follow these steps.

**Install Apps (macOS)**
```sh
# clone orchard into a local directory
git clone git@github.com:salesforce/orchard.git

# use sdkman to install Scala Build Tool (SBT) (if needed)
curl -s "https://get.sdkman.io" | bash
# load sdkman into the current shell so the `sdk` command is available
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk install sbt

# use brew to install postman (for API calls) and docker (if needed)
brew install --cask postman
brew install --cask docker
```
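
To confirm that the tools used in the following steps are on your `PATH`, a quick check like the one below can help. It is a convenience sketch, not part of Orchard: the helper name `missing_tools` is hypothetical, and newer Docker Desktop installs may provide only the `docker compose` plugin rather than a standalone `docker-compose` binary.

```sh
# Hypothetical helper: print each named command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the tools used during setup; prints nothing if all of them are installed.
missing_tools git sbt docker docker-compose
```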

**Configure Postgres Database**

Orchard uses a Postgres database in the docker-compose stack to store the state of each active task. Set the password for this database by adding a [.env file](https://docs.docker.com/compose/environment-variables/#the-env-file) to the project's root containing `ORCHARD_PG_SECRET=orchardsecret`, substituting your own secret for `orchardsecret`.

Alternatively, set it directly in the environment:
```sh
export ORCHARD_PG_SECRET=orchardsecret
```
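
The `.env` file can also be created from the command line. The sketch below is one way to do it, assuming it is run from the project root; `orchardsecret` is only a placeholder, and the script leaves any other entries already present in `.env` untouched.

```sh
# Run from the Orchard project root. Uses $ORCHARD_PG_SECRET if it is already
# exported, otherwise the placeholder "orchardsecret" (replace it with your own).
secret="${ORCHARD_PG_SECRET:-orchardsecret}"

touch .env
# If an existing .env does not end with a newline, add one before appending.
if [ -s .env ] && [ -n "$(tail -c 1 .env)" ]; then
  echo >> .env
fi
# Append the variable only if .env does not already define it.
if ! grep -q '^ORCHARD_PG_SECRET=' .env; then
  printf 'ORCHARD_PG_SECRET=%s\n' "$secret" >> .env
fi
```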

**Start the Docker Compose stack**
```sh
docker-compose up
```

This starts the database container, provisions the required tables, and starts the Orchard web service.
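
If you start the stack in the background with `docker-compose up -d`, a small polling loop can tell you when the web service is ready to receive requests. This is a sketch that assumes the service listens on `http://localhost:9000`, the address used in the API examples in this README; the function name `wait_for_orchard` and the `ORCHARD_URL` variable are hypothetical.

```sh
# Base URL of the Orchard web service; override by exporting ORCHARD_URL.
ORCHARD_URL="${ORCHARD_URL:-http://localhost:9000}"

# Hypothetical helper: poll until the service answers HTTP requests, trying up to
# max_attempts times (default 30) at two-second intervals. curl runs without
# --fail, so any HTTP response, even a 404, counts as "up".
wait_for_orchard() {
  local max_attempts="${1:-30}"
  local attempt=1
  until curl -s -o /dev/null --max-time 5 "$ORCHARD_URL"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Orchard did not respond at $ORCHARD_URL" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 2
  done
  echo "Orchard is up at $ORCHARD_URL"
}
```

For example, `docker-compose up -d && wait_for_orchard` returns once the service responds, or fails after about a minute.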

**Authentication**

By default, Orchard runs a development configuration in which authentication is disabled. To enable API authentication, set `orchard.auth.enabled = true` in [application.conf](https://github.com/salesforce/orchard/blob/master/orchard-ws/conf/application.conf). Orchard then reads the accepted keys from the `hashed-keys` setting:
```
hashed-keys = {
  user = [ ${?MCE_ENV_X_API_USER}, ${?MCE_ENV_X_API_ADMIN_USER} ]
  admin = [ ${?MCE_ENV_X_API_ADMIN}, ${?MCE_ENV_X_API_ADMIN_USER} ]
}
```
These keys must match the key provided in the header of any inbound API request. Each `${?VAR}` is an optional HOCON substitution: it is typically filled from the environment variable of the same name and is dropped from the list if that variable is unset. A key supplied through `MCE_ENV_X_API_ADMIN_USER` therefore appears in both the `user` and `admin` lists.

## Using Orchard
Once setup is complete, Orchard is ready to receive a number of different instructions via API requests.

If deployed into a cloud environment like AWS, Orchard needs a role (on AWS, an IAM role) with the permissions required by the activities and resources it manages.

Orchard allows the definition and execution of **workflows**, where each workflow consists of a number of **activities**. Activities can depend on other activities, forming a directed acyclic graph (DAG). Orchard executes activities concurrently whenever their dependencies allow.
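
To make the concurrency point concrete, the sketch below groups the activities of a small hypothetical workflow into levels: each activity sits one level after its latest prerequisite, so activities on the same level do not depend on each other and could run at the same time. The activity names and the `activity_levels` helper are made up. This illustrates the DAG idea only and is not Orchard's scheduler, which may start each activity as soon as its own dependencies finish rather than level by level.

```sh
# Hypothetical helper: read "<prerequisite> <dependent>" edges on stdin and
# print "<level> <activity>" lines sorted by level, then by name.
# Each dependent is pushed to at least one level past each of its
# prerequisites until nothing changes; this terminates only for an acyclic graph.
activity_levels() {
  awk '
    { pre[NR] = $1; post[NR] = $2; level[$1] += 0; level[$2] += 0 }
    END {
      changed = 1
      while (changed) {
        changed = 0
        for (i = 1; i <= NR; i++) {
          if (level[post[i]] < level[pre[i]] + 1) {
            level[post[i]] = level[pre[i]] + 1
            changed = 1
          }
        }
      }
      for (node in level) print level[node], node
    }' | sort -k1,1n -k2,2
}

# A hypothetical four-activity workflow: extract runs first, the two
# transforms depend only on extract, and load waits for both transforms.
edges='extract transform_a
extract transform_b
transform_a load
transform_b load'

printf '%s\n' "$edges" | activity_levels
```

This prints `0 extract`, `1 transform_a`, `1 transform_b`, and `2 load` on separate lines, so the two transforms form one concurrent level. In the JSON format of the example workflow in this README, each key of `dependencies` appears to name a dependent activity and its array lists that activity's prerequisites, so `"activityId_2": ["activityId_1"]` corresponds to the single edge `activityId_1 activityId_2`.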

Below is an example workflow that defines a number of activities to be executed in an AWS VPC environment:

```json
{
  "name": "workflowTestName",
  "activities": [
    {
      "id": "activityId_1",
      "name": "first_activity",
      "activityType": "mock.activity.StubActivity",
      "activitySpec": {
        "steps": [
          {
            "jar": "command-runner.jar",
            "args": [
              "spark-submit",
              "s3://s3bucket/submit/spark_submit.py",
              "--data_source",
              "s3://s3bucket/data/data_source.csv",
              "--output_uri",
              "s3://s3bucket/data/output"
            ]
          }
        ]
      },
      "resourceId": "resourceId_1",
      "maxAttempt": 2
    },
    {
      "id": "activityId_2",
      "name": "second_activity",
      "activityType": "mock.activity.StubActivity",
      "activitySpec": {
        "steps": [
          {
            "jar": "command-runner.jar",
            "args": [
              "step started",
              "modeling in progress",
              "canceled"
            ]
          }
        ]
      },
      "resourceId": "resourceId_2",
      "maxAttempt": 2
    }
  ],
  "resources": [
    {
      "id": "resourceId_1",
      "name": "emr cluster",
      "resourceType": "mock.resource.StubResource",
      "resourceSpec": {
        "releaseLabel": "emr-6.3.0",
        "applications": [
          "Spark"
        ],
        "serviceRole": "EMR_Role",
        "resourceRole": "emr-resource-role",
        "instancesConfig": {
          "subnetId": "subnet-0000ab0a",
          "ec2KeyName": "orchard",
          "instanceCount": 2,
          "masterInstanceType": "m5.xlarge",
          "slaveInstanceType": "m5.xlarge"
        }
      },
      "maxAttempt": 2
    },
    {
      "id": "resourceId_2",
      "name": "emr cluster",
      "resourceType": "mock.resource.StubResource",
      "resourceSpec": {
        "releaseLabel": "emr-6.3.0",
        "applications": [
          "Spark"
        ],
        "serviceRole": "EMR_Role",
        "resourceRole": "emr-resource-role",
        "instancesConfig": {
          "subnetId": "subnet-0000ab0a",
          "ec2KeyName": "orchard",
          "instanceCount": 2,
          "masterInstanceType": "m5.xlarge",
          "slaveInstanceType": "m5.xlarge"
        }
      },
      "maxAttempt": 2
    }
  ],
  "dependencies": {
    "activityId_2": [
      "activityId_1"
    ]
  }
}
```

To submit this workflow to Orchard, send the JSON above as the body of a request to:
```http
POST http://localhost:9000/v1/workflow
```
This returns a workflow ID, for example `wf-f231a08f-60e4-480a-b845-e53e06918f77`.

Once the workflow is defined, activate it using its workflow ID:
```http
PUT http://localhost:9000/v1/workflow/wf-f231a08f-60e4-480a-b845-e53e06918f77
```
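
The same two calls can be made with `curl`. The sketch below wraps them in small shell functions. The function names and the `ORCHARD_URL` and `ORCHARD_API_KEY` variables are hypothetical, and the `x-api-key` header name used when authentication is enabled is an assumption, so check the `orchard-ws` source for the header Orchard actually expects. The response body is printed exactly as returned.

```sh
# Base URL of the Orchard web service; override by exporting ORCHARD_URL.
ORCHARD_URL="${ORCHARD_URL:-http://localhost:9000}"

# Print the workflow endpoint, or a specific workflow's URL when an id is given.
workflow_url() {
  printf '%s/v1/workflow%s\n' "$ORCHARD_URL" "${1:+/$1}"
}

# Run curl, adding an API-key header only when ORCHARD_API_KEY is set.
# ASSUMPTION: the header name "x-api-key" is hypothetical; verify it against orchard-ws.
orchard_curl() {
  if [ -n "${ORCHARD_API_KEY:-}" ]; then
    curl -sS -H "x-api-key: $ORCHARD_API_KEY" "$@"
  else
    curl -sS "$@"
  fi
}

# POST a workflow definition read from a JSON file and print the response body.
submit_workflow() {
  orchard_curl -X POST -H 'Content-Type: application/json' \
    --data-binary "@$1" "$(workflow_url)"
}

# PUT to a workflow's URL to activate it.
activate_workflow() {
  orchard_curl -X PUT "$(workflow_url "$1")"
}
```

For example, after saving the example workflow as `workflow.json`, run `submit_workflow workflow.json` and pass the returned workflow ID to `activate_workflow`.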

**Resource and Activity Types**

In the example workflow above, the activities and resources are **stubs**. In an actual deployment, Orchard uses resources and activities specific to the chosen cloud provider's environment, such as AWS EC2 or EMR. Each activity has its own `activitySpec`, and each resource its own `resourceSpec`, containing the configuration needed to carry it out.

Currently, Orchard supports:
- AWS EC2 activities / resources
- AWS EMR activities / resources
- AWS S3 resources
- AWS SSM resources
- Shell script activity
- Shell command activity

The project is actively seeking contributions for other activity and resource types, including those relevant to Google Cloud (GCP) and Azure. A guide to adding new resources and activities will be linked here at a later date for those interested in contributing.

## Contributing
To contribute to the project, check the open issues, fork the repository, and submit a pull request.

## License
Orchard is an open-source project licensed under the BSD 3-Clause "New" or "Revised" License.

Read the full text of Orchard's license in [LICENSE.txt](https://github.com/salesforce/orchard/blob/master/LICENSE.txt).