Create the command driven connectors RFC #255

Draft
wants to merge 7 commits into
base: main
119 changes: 119 additions & 0 deletions rfc/accepted/0021-command-driven-connectors.md
@@ -0,0 +1,119 @@
# Command driven connectors

- Feature Name: `command_driven_connectors`
- Start Date: 2021-10-07
- Tremor Issue: [tremor-rs/tremor-runtime#0000](https://github.com/tremor-rs/tremor-runtime/issues/0000)
- RFC PR: [tremor-rs/tremor-rfcs#0000](https://github.com/tremor-rs/tremor-rfcs/pull/0000)

## Summary
[summary]: #summary

This RFC adds support for connectors that don't do anything by themselves, but instead are driven by commands sent to them.

## Motivation
[motivation]: #motivation

For many connectors, especially on the read side, there is no obvious "default" way to read data. For example, object stores usually do not provide an API to stream changes (and even if they did, users might still want to read files in a different fashion). With command driven connectors, users can read exactly the data they need, based on the commands they send to the connector.

## Guide-level Explanation
[guide-level-explanation]: #guide-level-explanation

Command-driven connectors define at least two ports - one for data and one for commands.
As an example, let's look at a connector that reads files:

```tremor
define flow main
flow
  define connector file_connector from file
  with
    codec = "string",
    config = {"command_driven": true}
  end;

  define connector file_list from file
  with
    codec = "json",
    config = {
      "path": "in.json",
      "mode": "read",
    },
  end;

  create pipeline main
  pipeline
    select { "command": "read", "path": event.path } from in into out;
  end;

  create connector file_connector from file_connector;
  create connector file_list from file_list;

  connect /connector/file_list/out to /pipeline/main/in;

  # This is the magic - we send the commands here, note the "control" port
  connect /pipeline/main/out to /connector/file_connector/control;

  connect /connector/file_connector/data to /pipeline/main/out;
end;
```

## Reference-level Explanation
[reference-level-explanation]: #reference-level-explanation

Each command driven connector implements at least two ports - `data` and `control`.
`control` is an input port, through which the commands are sent.
Currently only reads are supported, so `data` is an output port.
Each message on the `data` port is a single event whose metadata contains the original command.
The commands are standardised as far as is practical, so that connectors can be swapped without adjusting the rest of the system.
Member

Do we support this by namespacing commands, or is there another plan? What happens if a connector gets an unsupported command? Should we provide convenience functions for them, such as file::open("some.file"), that create the right events for a command?

Collaborator Author

I think some sort of namespacing will be required, e.g. "query" will take a different kind of query (and possibly different arguments) depending on the database, and even "file_read" can differ (depending on the underlying technology, the paths might not be compatible, more arguments might be needed to locate the file, etc.). I like the idea of convenience functions; I think that would give us good UX while still allowing people to generate the commands manually if they need or wish to.
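
To make the points in this thread concrete, here is a minimal Rust sketch of a namespaced command, a convenience constructor in the spirit of `file::open("some.file")`, and a connector that rejects unsupported commands. Everything in it is an assumption for illustration only: the `Command`, `DataEvent`, `CommandError` and `InMemoryFileConnector` names, the `namespace` field, and the `file::read` helper are hypothetical and not part of Tremor. The connector serves reads from an in-memory map rather than a real file system.

```rust
use std::collections::HashMap;

/// A command sent to the `control` port (hypothetical shape).
/// `namespace` identifies the family of commands, e.g. "file".
#[derive(Debug, Clone, PartialEq)]
pub struct Command {
    pub namespace: String,
    pub name: String,
    pub args: HashMap<String, String>,
}

/// An event emitted on the `data` port. `origin` plays the role of the
/// metadata that carries the original command.
#[derive(Debug, Clone, PartialEq)]
pub struct DataEvent {
    pub payload: String,
    pub origin: Command,
}

#[derive(Debug, PartialEq)]
pub enum CommandError {
    Unsupported { namespace: String, name: String },
    MissingArgument(&'static str),
    NotFound(String),
}

/// Convenience constructors so users need not build commands by hand.
pub mod file {
    use super::Command;
    use std::collections::HashMap;

    pub fn read(path: &str) -> Command {
        let mut args = HashMap::new();
        args.insert("path".to_string(), path.to_string());
        Command { namespace: "file".to_string(), name: "read".to_string(), args }
    }
}

/// Stand-in connector backed by an in-memory map of path -> contents.
pub struct InMemoryFileConnector {
    pub files: HashMap<String, String>,
}

impl InMemoryFileConnector {
    /// Handles one command and returns the resulting data event,
    /// or an error if the command is not understood by this connector.
    pub fn handle(&self, cmd: &Command) -> Result<DataEvent, CommandError> {
        match (cmd.namespace.as_str(), cmd.name.as_str()) {
            ("file", "read") => {
                let path = cmd
                    .args
                    .get("path")
                    .ok_or(CommandError::MissingArgument("path"))?;
                let payload = self
                    .files
                    .get(path)
                    .cloned()
                    .ok_or_else(|| CommandError::NotFound(path.clone()))?;
                Ok(DataEvent { payload, origin: cmd.clone() })
            }
            _ => Err(CommandError::Unsupported {
                namespace: cmd.namespace.clone(),
                name: cmd.name.clone(),
            }),
        }
    }
}
```

Returning an explicit `Unsupported` error is only one possible answer to "what happens if a connector gets an unsupported command"; the RFC does not decide this, and dropping the command or emitting an error event would be equally plausible choices.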


## Drawbacks
[drawbacks]: #drawbacks

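This raises the complexity of Tremor: command driven connectors add an extra port and a set of commands that have to be implemented and documented.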

## Rationale and Alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

There are no known alternatives that provide the same benefits.
Currently, the S3 reader connector, for example, reads all files in the bucket once, which is of limited use.

## Prior Art
[prior-art]: #prior-art

Discuss prior art, both the good and the bad, in relation to this proposal.
A few examples of what this can include are:

- For language, library, tools, and clustering proposals: Does this feature exist in other programming languages, and what experience have their community had?
- For community proposals: Is this done by some other community and what were their experiences with it?
- For other teams: What lessons can we learn from what other communities have done here?
- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.

This section is intended to encourage you as an author to think about the lessons from other projects, provide readers of your RFC with a fuller picture.
If there is no prior art, that is fine- your ideas are interesting to us whether they are brand new or if it is an adaptation from other projects.

Note that while precedent set by other projects is some motivation, it does not, on its own, motivate an RFC.
Please also take into consideration that Tremor sometimes intentionally diverges from similar projects.

## Unresolved Questions
[unresolved-questions]: #unresolved-questions

- How do we enforce uniformity of commands across connectors?
Member

I really like the concept of traits, behaviors, personas, or whatever it would be called for that. (I'll go with trait in the rest of the text as it's the shortest, which is pointless since I probably wrote more in this note than I save by using trait ...)

The basic idea is that a connector defines a number of traits that specify the kind of commands it interacts with. Which traits exist is something we have to define, possibly not in the RFC as it'll change over time, but we could start with some examples.

Let's look at s3 / file / gcs as an example. I suspect it would work something like this:

  • s3 offers the fileio, objectstore and s3 traits
  • gcs offers the fileio, objectstore and gcs traits
  • file offers the fileio, fs traits

commands such as list files, open file, close file etc would be in fileio

commands like cache id could be in objectstore

and endpoint specific commands could be in gcs and s3 respectively

The reason for this is that with generic traits, testing, prototyping and migrating become very easy. As long as no implementation-specific features are used, prototyping could happen with a local-only connector and then be switched over to a production connector on deployment.

Member

Yes - reusable traits could be a huge boon for testing. For example, if we added seek, sync or other commands to fileio, they would be usable in tests for setup and may imply a set of associated assertions.
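
The trait idea from this thread can be sketched in Rust roughly as follows. This is an illustration under stated assumptions, not a proposed API: the `FileIo` and `ObjectStore` trait names mirror the `fileio` and `objectstore` examples above, while the method signatures, the `LocalFiles` and `FakeBucket` types, and the cache id format are all hypothetical. Both types keep their contents in memory so the sketch stays self-contained.

```rust
use std::collections::BTreeMap;

/// Commands shared by every connector that exposes file-like objects.
pub trait FileIo {
    fn list(&self) -> Vec<String>;
    fn read(&self, path: &str) -> Option<Vec<u8>>;
}

/// Commands that only make sense for object stores.
pub trait ObjectStore {
    fn cache_id(&self, path: &str) -> Option<String>;
}

/// Local stand-in used for prototyping; offers only `FileIo`.
#[derive(Default)]
pub struct LocalFiles {
    pub files: BTreeMap<String, Vec<u8>>,
}

impl FileIo for LocalFiles {
    fn list(&self) -> Vec<String> {
        self.files.keys().cloned().collect()
    }
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.get(path).cloned()
    }
}

/// Stand-in for an object store connector; offers `FileIo` and `ObjectStore`.
#[derive(Default)]
pub struct FakeBucket {
    pub objects: BTreeMap<String, Vec<u8>>,
}

impl FileIo for FakeBucket {
    fn list(&self) -> Vec<String> {
        self.objects.keys().cloned().collect()
    }
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }
}

impl ObjectStore for FakeBucket {
    // The "path:length" format is made up for this sketch.
    fn cache_id(&self, path: &str) -> Option<String> {
        self.objects
            .get(path)
            .map(|bytes| format!("{}:{}", path, bytes.len()))
    }
}

/// Logic written against `FileIo` only, so it works with either connector.
pub fn total_bytes(source: &dyn FileIo) -> usize {
    source
        .list()
        .iter()
        .filter_map(|path| source.read(path))
        .map(|bytes| bytes.len())
        .sum()
}
```

Because `total_bytes` depends only on `FileIo`, it can be exercised against `LocalFiles` while prototyping and against `FakeBucket` (or a real object store) later, which is the migration path described above.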

- How would writes work?
- Are multiple events per command allowed in the output?
Member

My gut feeling is that batching for commands is possible: we can unbatch them and go about our work, and I don't see anything that would prevent this. On the other hand, I don't see much of a use for it either.

Collaborator Author

yea, I think we can get away with not answering that right now, and decide when/if we actually need it

Member

We probably want commands from a given command source to be applied in arrival order - that would seem to be a strong enough guarantee initially and consistent with what we already have for user or runtime events. Later, if we need stronger or variant guarantees - perhaps these could be controlled via command traits. A future worth iterating in this RFC IMO ...
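
As a rough illustration of the two ideas in this thread (unbatching commands and applying them in arrival order), the following Rust sketch uses a FIFO queue per command source. The `Incoming` and `CommandQueue` types and the use of plain strings as commands are assumptions made for the example; the RFC does not specify any of them.

```rust
use std::collections::VecDeque;

/// What arrives on the `control` port: a single command or a batch.
#[derive(Debug, Clone, PartialEq)]
pub enum Incoming {
    Single(String),
    Batch(Vec<String>),
}

/// FIFO queue for one command source; preserves arrival order.
#[derive(Default)]
pub struct CommandQueue {
    pending: VecDeque<String>,
}

impl CommandQueue {
    /// Enqueues a command; batches are unbatched in their internal order.
    pub fn push(&mut self, incoming: Incoming) {
        match incoming {
            Incoming::Single(command) => self.pending.push_back(command),
            Incoming::Batch(commands) => self.pending.extend(commands),
        }
    }

    /// Returns the oldest pending command, if any.
    pub fn next_command(&mut self) -> Option<String> {
        self.pending.pop_front()
    }
}
```

This gives only per-source ordering; it says nothing about ordering across several command sources, which would need a separate decision.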


## Future Possibilities
[future-possibilities]: #future-possibilities

Think about what the natural extension and evolution of your proposal would be and how it would affect Tremor as a whole in a holistic way. Try to use this section as a tool to more fully consider all possible interactions with the project in your proposal. Also, consider how this all fits into the roadmap for the project and of the relevant sub-team.

This is also a good place to "dump ideas", if they are out of scope for the RFC you are writing but otherwise related.

If you have tried and cannot think of any future possibilities, you may state that you cannot think of anything.

Note that having something written down in the future-possibilities section is not a reason to accept the current or a future RFC; such notes should be in the section on motivation or rationale in this or subsequent RFCs.
The section merely provides additional information.


## Notes
- separate channels - one for commands, one for data
- traits (not necessarily rust traits) for the behaviours that a connector can implement
- e.g. KV store - "read key", "stream read key", filesystem - "create directory", "delete directory"
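
The "separate channels" note can be sketched with two standard-library channels, one carrying commands into a connector and one carrying data out of it. This is a minimal sketch under assumptions: the `spawn_connector` function is hypothetical, commands and data are plain strings, and the "read" is simulated by formatting a string. Tremor's actual runtime wiring is not shown here.

```rust
use std::sync::mpsc;
use std::thread;

/// Starts a stand-in connector on its own thread and returns the sending
/// half of its control channel, the receiving half of its data channel,
/// and a handle for joining the thread.
pub fn spawn_connector() -> (
    mpsc::Sender<String>,
    mpsc::Receiver<String>,
    thread::JoinHandle<()>,
) {
    let (control_tx, control_rx) = mpsc::channel::<String>();
    let (data_tx, data_rx) = mpsc::channel::<String>();

    let worker = thread::spawn(move || {
        // The loop ends once every sender of the control channel is dropped.
        for command in control_rx {
            // A real connector would perform the read here.
            let result = format!("result of {}", command);
            if data_tx.send(result).is_err() {
                // Nobody is listening on the data channel any more.
                break;
            }
        }
    });

    (control_tx, data_rx, worker)
}
```

Keeping the two channels separate means a slow consumer of data never blocks the delivery of commands into the queue, and the connector can shut down cleanly when the control side goes away.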