Pyplyn (meaning pipeline in Afrikaans) is an open source tool that extracts data from various sources, transforms and gives it meaning, and sends it to other systems for consumption.
Pyplyn was written to allow teams in Salesforce to power real-time health dashboards that go beyond one-to-one mapping of time-series data to visual artifacts.
One example of such a use case is ingesting time-series data stored in Argus and Refocus, processing multiple metrics together (context), and displaying the result in Refocus (as red, yellow, or green lights).
- Simple and reliable data pipeline with support for various transformations
- No code required, JSON-based syntax
- Flexible multi-stage source/transformation/destination logic
- Developed with support for extension via easy-to-grasp Java code
- Highly available and scalable (the pipeline can be partitioned across multiple nodes)
- Configurations can be added/updated/removed without restarting the process
- Publishes operational metrics (errors, p95, etc.) for monitoring service health
- Faster processing speed with the use of RxJava (4.3x faster, tested on our reference dataset)
- Cleaner code, mainly after converting models to Immutables-annotated abstract classes
- Support mutual TLS authentication for endpoints, by specifying a Java keystore and password
- Connect, read, and write timeouts can now be specified for each connector
- All Jackson-based models can now be serialized (with the type specifier field)
- AppConfig.Global.minRepeatIntervalMillis was deprecated (replaced with AppConfig.Global.runOnce)
- Bash script for managing the service's lifecycle (start, stop, restart, logs, etc.)
- Since 10.0.0, Pyplyn releases follow Semantic versioning guidelines.
- Extract source: Execute a Salesforce SOQL query
- Load destination: Email (via SMTP MTA)
- Load destination: Post a Salesforce Chatter Feed Item
- Load destination: Create a Salesforce Record
- API for managing configurations
- API for managing connectors
- Multi-tenancy (support configurations and connectors belonging to different users)
- Include Pyplyn in the Maven central repository
We welcome ideas for improvement and bug reports, and we encourage you to submit them by opening new issues on GitHub!
Pyplyn uses Maven for its build lifecycle. At a minimum, you will need Maven and Java 8 installed on your host OS.
Consult the full prerequisites section to find out more.
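Before building, it can help to confirm that the required tools are on your PATH. The following is a minimal sketch, not part of Pyplyn itself; the function names `have_cmd` and `check_prereqs` are hypothetical, and `git` is included only because the steps below clone the repository.

```shell
#!/usr/bin/env bash

# Returns success if the given command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Reports each missing tool on stderr; returns non-zero if any is absent.
check_prereqs() {
  local missing=0
  for cmd in git mvn java; do
    if ! have_cmd "$cmd"; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_prereqs; then
  # Prints the Maven version and the Java version Maven will use
  mvn -version
fi
```

This only checks that the commands exist; verify from the `mvn -version` output that the reported Java version is 8 or later.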
# Clone the Pyplyn repository
git clone https://github.com/salesforce/pyplyn /tmp/pyplyn
# Build the project with Maven
cd /tmp/pyplyn
mvn clean package
# Navigate to Pyplyn's build location
cd target/
# Create a new directory for your configurations (leave empty for now)
mkdir configurations
# Rename app-config.example.json and make the required changes
mv config/app-config.example.json config/pyplyn-config.json
# Rename connectors.example.json and make the required changes (see below)
mv config/connectors.example.json config/connectors.json
# Update the _connectors.json_ file and configure your endpoints
#
# Edit bin/pyplyn.sh and set _LOCATION_ to the absolute path of the build directory
# LOCATION=/tmp/pyplyn/target
# Start pyplyn and check logs
bash bin/pyplyn.sh start
# Check that the program started without throwing any exceptions
bash bin/pyplyn.sh logs
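The manual edit of bin/pyplyn.sh described above could also be scripted. This is a sketch under the assumption that the script contains an uncommented line beginning with `LOCATION=`; check the file first, since its actual layout is not shown here. The helper name `set_location` is hypothetical.

```shell
#!/usr/bin/env bash

# Rewrites the LOCATION= line of a script to point at the given directory.
# Assumes the script has a line that starts with "LOCATION=" (unverified).
set_location() {
  local script="$1" dir="$2"
  # -i.bak edits in place and keeps a backup copy; this form works with
  # both GNU and BSD sed. The directory must not contain "|" or "&".
  sed -i.bak "s|^LOCATION=.*|LOCATION=${dir}|" "$script"
}

# Example, run from the build directory (e.g. /tmp/pyplyn/target):
# set_location bin/pyplyn.sh "$(pwd)"
```

The original file is preserved as bin/pyplyn.sh.bak, so the change is easy to revert if the assumption about the file does not hold.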
A full step-by-step explanation (including how to write configurations) can be found in the Pyplyn documentation.
Consult the Pyplyn Documentation for an in-depth explanation of Pyplyn's features.
Generate Javadocs by running the following Maven target: mvn package.
If you would like to contribute to Pyplyn, please read the contributor guide!