Rayvens makes it easy to incorporate Apache Camel sources and sinks into application code. Sources and sinks are attached to Stream objects:
stream = rayvens.Stream('name')
The addition of a new source or sink to a stream requires a configuration to be passed into the appropriate API function. A configuration is a dictionary containing the source or sink kind along with several of its options.
The general usage is to first create the configuration:
sink_configuration = dict(kind='slack-sink',
                          channel='channel_name',
                          webhook_url='valid_webhook_url')
followed by the creation of the sink based on the configuration:
stream.add_sink(sink_configuration)
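Putting the two steps together, a minimal end-to-end sketch looks as follows (the channel name and webhook URL are placeholders):
import rayvens

# Initialize Rayvens (and Ray, if it is not already running).
rayvens.init()

# Create a stream and attach a Slack sink to it.
stream = rayvens.Stream('slack')
sink_configuration = dict(kind='slack-sink',
                          channel='channel_name',
                          webhook_url='valid_webhook_url')
stream.add_sink(sink_configuration)

# Any event appended to the stream is forwarded to the sink.
stream << 'Hello from Rayvens!'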
Sources publish events to the stream they are attached to. Sinks subscribe to the stream they are attached to. Events are propagated from the sources via the stream to all the subscribers attached to the stream. If an operator is attached to the stream, it processes the events in the order in which they are pushed onto the stream and forwards them to all the sinks once processing ends.
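To illustrate these propagation semantics, here is a minimal sketch; the >> operator used below to attach a subscriber function follows the Rayvens API, and the print subscriber is purely illustrative:
stream = rayvens.Stream('example')

# Attach a subscriber; it receives every event published to the stream.
stream >> (lambda event: print('received:', event))

# Append two events; each is delivered to the subscriber in order.
stream << 'first' << 'second'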
Locally, sources and sinks run as separate processes; this also holds when the source or sink runs inside a container in the cloud. When running in operator mode, the source or sink is launched as a service.
Rayvens supports several sources and sinks out of the box, listed below by their kind field value.
Sources:
http-source
kafka-source
telegram-source
binance-source
cloud-object-storage-source
file-source
file-source-raw
file-watch-source
Sinks:
slack-sink
kafka-sink
telegram-sink
cloud-object-storage-sink
file-sink
In addition to the sources and sinks above, Rayvens also allows the user to specify any other source or sink supported by Apache Camel by using the following source/sink kinds:
generic-source
generic-periodic-source
generic-sink
An HTTP source is a periodic source, meaning that a request will be sent to the source URL every period milliseconds. The source supports the following fields:
- url: the URL being polled every period milliseconds
- period: (optional, default 1000 ms) request frequency in milliseconds
source_config = dict(kind='http-source',
                     url=<url_value>,
                     period=3000)
Example of a URL value: https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL
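A complete sketch attaching this source to a stream and printing each event (assuming Rayvens has been initialized as shown earlier; the >> subscriber is illustrative):
source_config = dict(kind='http-source',
                     url='https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL',
                     period=3000)
stream = rayvens.Stream('http')
stream.add_source(source_config)

# Print every response body produced by the source.
stream >> (lambda event: print(event))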
The Kafka source allows the application to receive the events published on a specific Kafka topic. It supports the following fields:
- topic: the Kafka topic the events will be coming from
- brokers: list of Kafka brokers of the following form: host1:port1,host2:port2,...
- SASL_password: (optional) if interacting with a cloud-based Kafka service, this is the password used to access the service
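For example, a minimal configuration for a topic hosted on a local broker (the topic and broker values are placeholders):
source_config = dict(kind='kafka-source',
                     topic='mytopic',
                     brokers='localhost:9092')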
The Telegram source receives all the incoming messages from a Telegram Bot. It supports the following fields:
- authorization_token: the authorization token used to communicate with the Telegram Bot
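For example (the token value is a placeholder):
source_config = dict(kind='telegram-source',
                     authorization_token=<authorization_token>)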
The Binance source uses the Binance API to retrieve cryptocurrency prices. It supports the following fields:
- coin: the official cryptocurrency symbol; for example, the symbol for Bitcoin is BTC
- period: (optional, default 3000 milliseconds) price request frequency in milliseconds
Example:
source_config = dict(kind='binance-source', coin='BTC')
This source supports multiple input routes based on the number of coins being queried:
source_config = dict(kind='binance-source', coin=['BTC', 'ETH'])
An individual event will be received with the price of each coin in the list.
This source manages the receiving of objects from AWS S3 or IBM Cloud Object Storage. It supports the following fields:
- bucket_name: the name of the bucket
- access_key_id: found in the configuration of the Cloud Object Storage service
- secret_access_key: found in the configuration of the Cloud Object Storage service
- endpoint: the name of the public endpoint from the bucket configuration, qualified by the URI scheme (for example, https://)
- region: (optional) the region of the bucket; if left empty, the region will be automatically parsed by Rayvens from the endpoint
- move_after_read: (optional) moves any file read from the bucket specified by bucket_name to a new bucket whose name is the value of this field. If using Cloud Object Storage, the new bucket has to be created ahead of time
- meta_event_only: (optional) propagate only the filename, not the file contents
Each file event is formatted as a JSON string that includes body and filename keys. This source fetches and then deletes the files in the bucket unless the move_after_read option is enabled, in which case the read files are moved to the bucket specified by the move_after_read field value.
Example of the simplest configuration, where the source consumes any file written to the bucket specified by bucket_name:
source_config = dict(kind='cloud-object-storage-source',
                     bucket_name="input_bucket_name",
                     access_key_id=access_key_id,
                     secret_access_key=secret_access_key,
                     endpoint=endpoint)
Example of a configuration with move_after_read enabled:
source_config = dict(kind='cloud-object-storage-source',
                     bucket_name="input_bucket_name",
                     access_key_id=<access_key_id>,
                     secret_access_key=<secret_access_key>,
                     endpoint=<endpoint>,
                     move_after_read="new-bucket-name")
The event returned by this source is a JSON string of the form:
{"filename": <filename_value>, "body": <body_value>}
Parse the event as JSON when handling it:
import json
json_event = json.loads(event)
and then access its fields like so:
json_event['filename']
json_event['body']
This source reads files from a file system directory as they become available. The event is triggered automatically whenever files show up in the source directory. Each file event is formatted as a JSON string that includes body and filename keys. The file event is returned by the source to be processed further by the application. This source has the following options:
- path: the path to the directory being monitored
- keep_files: (optional) by default the files are deleted, i.e. the value is False
- move_after_read: (optional) a directory to move processed files to; mutually exclusive with keep_files=True
- recursive: (optional) by default files are not extracted from sub-directories, i.e. the value is False
Example configuration for this source:
source_config = dict(kind='file-source',
                     path='test_files',
                     move_after_read='test_files_dest')
The returned result is a JSON message with the following structure:
{"filename": <filename_value>, "body": <body_value>}
Parse the event as JSON when handling it:
import json
json_event = json.loads(event)
and then access its fields like so:
json_event['filename']
json_event['body']
This source reads a file from the file system, triggering an event automatically when the file becomes available. The file content is read into memory and returned by the source to be processed further by the application. This source has the following options:
- path: the path to the file being monitored
- keep_files: (optional) by default the file is deleted after being read, i.e. the value is False
Example configuration for this source:
source_config = dict(kind='file-source-raw',
                     path='test_files/test.txt',
                     keep_files=True)
This source monitors a given directory for file-related events: creation, modification and deletion. The source has the following configuration fields:
- path: the path to the directory being monitored
- events: the list of events being monitored: DELETE, CREATE and MODIFY. By default all three events are enabled. The user can restrict monitoring to a subset of events by setting the value of this field to a comma-separated list, for example: events=DELETE,CREATE
Example configuration for this source:
source_config = dict(kind='file-watch-source',
                     path='test_files',
                     events='DELETE,CREATE')
The returned result is a JSON message with the following structure:
{"filename": <filename_value>, "event_type": <event_type>}
where event_type is one of DELETE, CREATE or MODIFY.
For example, when the file test.txt is deleted from the watched folder, the event returned is:
{
    "filename": "/absolute/path/to/test.txt",
    "event_type": "DELETE"
}
Parse the event as JSON when handling it:
import json
json_event = json.loads(event)
and then access its fields like so:
json_event['filename']
json_event['event_type']
The Slack sink outputs a message to a Slack channel provided by the user. It supports the following fields:
- channel: the output channel name
- webhook_url: the webhook URL belonging to the channel to which the message is being sent
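For example, reusing the sink configuration from the beginning of this document, sending a message amounts to appending it to the stream:
stream.add_sink(sink_configuration)
stream << 'Hello Slack!'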
The Kafka sink allows the application to publish events to a specific Kafka topic. It supports the following fields:
- topic: the Kafka topic to which events will be sent
- brokers: list of Kafka brokers of the following form: host1:port1,host2:port2,...
- SASL_password: (optional) if interacting with a cloud-based Kafka service, this is the password used to access the service
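For example (the topic and broker values are placeholders):
sink_config = dict(kind='kafka-sink',
                   topic='mytopic',
                   brokers='localhost:9092')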
The Telegram sink sends messages to a Telegram Bot. It supports the following fields:
- authorization_token: the authorization token used to communicate with the Telegram Bot
- chat_id: (optional) if present, enables the Telegram bot to send the message to another Telegram chat
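For example (the token and chat ID values are placeholders):
sink_config = dict(kind='telegram-sink',
                   authorization_token=<authorization_token>,
                   chat_id=<chat_id>)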
This sink manages the uploading of objects to AWS S3 or IBM Cloud Object Storage. It supports the following fields:
- bucket_name: the name of the bucket
- access_key_id: found in the configuration of the Cloud Object Storage service
- secret_access_key: found in the configuration of the Cloud Object Storage service
- endpoint: the name of the public endpoint from the bucket configuration, qualified by the URI scheme (for example, https://)
- region: (optional) the region of the bucket; if left empty, the region will be automatically parsed by Rayvens from the endpoint
- upload_type: (optional) the special type of the upload. Possible value: upload_type="multi-part", in which case the input must be a file which will be split into multiple parts. Related option:
  - part_size: (optional, only used when upload_type is set to multi-part) the size in bytes of the parts
- file_name: (optional when upload_type is multi-part) the name of the file under which the data will be stored in the Cloud Object Store. If specified in conjunction with upload_type being multi-part, the name of the uploaded file will be overwritten
- from_file: (optional) uploads a file to Cloud Object Storage. The upload event is triggered when the file becomes available. The file is deleted after the upload unless the keep_file option is specified. When this option is enabled, upload_type defaults to multi-part and the file will be uploaded in multiple parts of size part_size. Related option:
  - keep_file: (optional) signals that the file specified by from_file, or the files dropped inside the from_directory, should not be deleted after upload
- from_directory: (optional) uploads the files dropped into a file system directory to Cloud Object Storage. The upload event is triggered whenever a file is dropped inside the directory. The file is deleted after the upload unless the keep_file option above is specified. When this option is enabled, upload_type defaults to multi-part and the file will be uploaded in multiple parts of size part_size
Without using any of the optional fields, this sink will upload application data to Cloud Object Storage and put it in a file with a name specified by the user. The data becomes the file contents in this case. See example cos_sink.py.
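A sketch of this default mode (credentials are placeholders; file_name, described in the field list above, names the object created in the bucket):
sink_config = dict(kind='cloud-object-storage-sink',
                   bucket_name=<cos_bucket>,
                   access_key_id=<cos_access_key_id>,
                   secret_access_key=<cos_secret_access_key>,
                   endpoint=<cos_endpoint>,
                   file_name='output.txt')
stream.add_sink(sink_config)

# The event data becomes the contents of output.txt in the bucket.
stream << 'contents of the uploaded file'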
In addition to the default mode described above, this sink type also supports uploading files in multiple parts. To enable this mode, the upload_type field must be set to multi-part. The configuration also accepts an optional part_size argument that specifies the size of the parts in bytes; the default value is 25 MB (i.e. 26214400 bytes).
For example, to specify a multi-part upload using 2 MB chunks:
sink_config = dict(kind='cloud-object-storage-sink',
                   bucket_name=<cos_bucket>,
                   access_key_id=<cos_access_key_id>,
                   secret_access_key=<cos_secret_access_key>,
                   endpoint=<cos_endpoint>,
                   upload_type="multi-part",
                   part_size=2 * 1024 * 1024)
The file to be uploaded can be specified by the user at runtime and passed to a Rayvens stream containing a multi-part Cloud Object Storage sink in the following way:
from pathlib import Path
stream << Path("test_files/test.txt")
This will ensure that the input is treated as the path to a file on the target system. Unless overridden by the file_name option, the name of the uploaded file will be the original file name (in this case test.txt).
When the from_file or from_directory fields are used, the input will be a file system file which is uploaded in multi-part mode. Since multi-part mode is the only one currently supported, it is selected by default. To adjust the part size, use the part_size option.
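A sketch of the directory-based upload (paths and credentials are placeholders; keep_file is included to preserve the uploaded files):
sink_config = dict(kind='cloud-object-storage-sink',
                   bucket_name=<cos_bucket>,
                   access_key_id=<cos_access_key_id>,
                   secret_access_key=<cos_secret_access_key>,
                   endpoint=<cos_endpoint>,
                   from_directory='upload_dir',
                   keep_file=True)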
This sink manages the writing of objects to a file system directory. It supports the following fields:
- path: the path to the sink directory
Example configuration for this sink:
sink_config = dict(kind='file-sink',
                   path='test_files')
To support the rest of the Apache Camel sources, Rayvens provides two generic source types.
- Non-periodic generic source with kind="generic-source"
- Periodic generic source with kind="generic-periodic-source"
Each generic source type requires either a spec field or a uri field. If both are specified, the spec field will be used and the uri field will be ignored.
The spec field contains the YAML specification of the source; the expected input type for spec is string. The uri field expects a string representing just the endpoint specification.
Depending on the source, the specification can be minimal, consisting of just the URI specifying the source configuration. For example, creating a Kafka source using a topic named "newTopic" and a local broker localhost:9092 is as simple as:
source_config = dict(kind='generic-source',
                     spec="uri: kafka:newTopic?brokers=localhost:9092")
For an even simpler configuration, we can specify the above using the uri field instead:
source_config = dict(kind='generic-source',
                     uri="kafka:newTopic?brokers=localhost:9092")
The spec string is sensitive to indentation when the configuration is more complicated. For example, if the source is a periodic source, the configuration code may look like this:
source_config = dict(kind='generic-source',
                     spec="""
- from:
    uri: timer:tick?period=2000
    steps:
      - to: https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL
""")
Every two seconds, a request is sent to the source URL requesting the price of an AAPL share. The response is forwarded back to Rayvens.
To simplify the support of periodic patterns such as the one for the generic HTTP source above, Rayvens provides a periodic generic source. Periodic generic sources use a timer to send requests to the source every period milliseconds, in a similar manner to the code above. The request triggers the source to produce an event, which is then forwarded to Rayvens. Periodic sources have an optional period field.
Using the generic-periodic-source type instead of generic-source simplifies the above code to just specifying the source URI:
source_config = dict(kind='generic-periodic-source',
                     spec="uri: https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL",
                     period=2000)
Using the uri
syntax:
source_config = dict(kind='generic-periodic-source',
                     uri="https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL",
                     period=2000)
The catalog of Apache Camel supported components can be found here: https://camel.apache.org/components/latest/index.html
Rayvens supports generic sinks denoted by the generic-sink
type.
As with generic sources, it is sufficient to specify the sink URI inside the spec field as a string, or to use the uri syntax.
For example, to output to a Slack channel, the following generic sink can be put together, provided the Slack channel name and webhook URL are available in the application:
sink_config = dict(kind='generic-sink',
                   spec=f"to: slack:{slack_channel}?webhookUrl={slack_webhook}")
Using the uri syntax, the above becomes:
sink_config = dict(kind='generic-sink',
                   uri=f"slack:{slack_channel}?webhookUrl={slack_webhook}")