Add example errors connector and move simple example connector to a different namespace #437
A description for this change would be nice to understand the purpose.
7b2e581 to 543e893
1d635ab to 5ee17c0
Note for the reviewers: this change is not urgent, so we can discuss what's going on here for a bit!
LGTM! One nit and one small comment about validate_filtering
    end

    def self.validate_filtering(filtering = {})
      # TODO: real filtering validation will follow later
We've changed this logic so that it is only present in the base connector; connectors now only override self.advanced_snippet_validator (so we plug in a dedicated validation class).
Just as a heads-up for when it's merged :-)
    @@ -59,5 +59,7 @@ gem 'elasticsearch', '~> 8.5.0'
    # Dependencies for oauth
    gem 'signet', '~> 0.16.0'

    # Dependency for example connector
    gem 'faker', '~> 2.22.0'
Nice 👏 That'll be super helpful for the filtering example connector, too
    'Example Connector that produces transient errors'
    end

    # Field 'Foo' won't have a default value. Field 'Bar' will have the default value 'Value'.
nit: I think that comment can be removed, right?
What is the purpose of the new connector? Is it used for the resilience test? If so, I don't think it's appropriate to introduce the connector in dir lib/connectors.
Customers will consider connectors under lib/connectors the "Official" connectors provided by Elastic. The purpose of the example connector is to give customers a minimal connector to start a custom implementation from.
(Adding my opinion as I'll also add one for filtering and filtering validation)

@wangch079 AFAIK this connector is used to isolate the presentation of different features into dedicated example connectors. The existing example connector should be as simple as possible to provide a simple starting point (as you've also stated). But as the framework grows we introduce advanced features like the new error handling, filtering, filtering validation and so on. This would lead to the example connector growing and growing, which contradicts its original purpose of being a simple starting point.

Therefore, when we add new advanced features, we can add dedicated example connectors for developers who want to use these features (and who can use the dedicated example connectors as a reference). IMO it's fine if all example connectors are located under WDYT?
I assume the For
I think each connector should have its dedicated directory, and the name should be the
The work here indeed happens at the connector level; moreover, the framework is unable to handle this logic at all - the developer of the connector should actually specify which errors are "tolerable", meaning it's okay to continue extracting data after them. Consider our interface:

    def yield_documents
      fetch_documents do |document|
        yield serialize(document)
      end
    end

    def fetch_documents(&block)
      mongodb.collection(:data).find.each(&block)
    end

    def serialize(document)
      # do mapping logic to make it elasticsearch-friendly
    end

At this point the framework cannot understand which errors are tolerable - for example, what if

So back to tolerable errors - another place in code that can raise errors is

The only errors that we can try to tolerate in the framework are "ingestion" errors - we tried to ingest a document into Elasticsearch, but we failed due to some mapping mismatch, for example. This can be done at the framework level and does not need an example connector, but for the above cases I believe it's great to provide a solid example connector and explain how it works.

We can put this logic in our single example connector, but it's already becoming pretty big and getting out of hand IMO, thus this code is in a separate example connector.

Happy to discuss it in more detail @wangch079, and this PR is here to open this discussion :)
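To make the point about connector-level tolerance concrete, here is a minimal sketch, not the code from this PR. It assumes a hypothetical TolerableError class and a hypothetical SketchConnector whose data source is a plain array; the connector author, not the framework, decides that a serialization failure is safe to skip.

```ruby
# Hypothetical error class: the connector author decides what counts as tolerable.
class TolerableError < StandardError; end

# Hypothetical connector used only for illustration; it is not part of the framework.
class SketchConnector
  attr_reader :skipped

  def initialize(source)
    @source = source # any Enumerable standing in for the third-party data source
    @skipped = []
  end

  # Yields serialized documents. A document whose serialization raises
  # TolerableError is recorded and skipped; any other error still aborts.
  def yield_documents
    fetch_documents do |document|
      begin
        serialized = serialize(document)
      rescue TolerableError => e
        @skipped << { id: document[:id], error: e.message }
        next
      end
      yield serialized
    end
  end

  private

  def fetch_documents(&block)
    @source.each(&block)
  end

  def serialize(document)
    raise TolerableError, "missing title for document #{document[:id]}" if document[:title].nil?

    { id: document[:id].to_s, title: document[:title] }
  end
end

# Usage: the second document fails serialization and is skipped.
source = [{ id: 1, title: 'a' }, { id: 2, title: nil }, { id: 3, title: 'c' }]
connector = SketchConnector.new(source)
ingested = []
connector.yield_documents { |doc| ingested << doc }
```

The rescue wraps only the serialize call, so an error raised by the consumer of the yielded document is not silently swallowed as "tolerable".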
We do have it, but it's always better to have an expressive example IMO than just the documentation.
I re-checked how error handling is implemented in In In Because

    attachments.each_with_index do |att, index|
      yield_with_handling_tolerable_errors do
        data = { id: (index + 1).to_s, name: "example document #{index + 1}", _attachment: File.read(att) }
        # Uncomment one of these two lines to simulate longer running sync jobs
        #
        # sleep(rand(10..60).seconds)
        # sleep(rand(1..10).minutes)
        yield data
      end
    end

I still don't think it's a good idea to introduce a second example connector, besides this:
And also:
But it is indeed optional, user can skip it when yielding documents and we won't prevent them.
This connector, though, is more complex - it has configurable fields that help test the connector's behaviour live: the user can create this example connector, play with the field values (number of ingested documents, % of errors) and see how it works for themselves. I've used it for some small testing and found this super useful. My concern here is that it would be nice to have more example connectors like this - not only to demonstrate how to write a connector, but also to have an example connector that demonstrates the behaviour. They can indeed be combined in a single connector, but that would make the connector itself huge - tens of configurable fields, complex logic - and it would be harder to follow/use this connector as a simple example.
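For illustration only, the behaviour described above can be approximated with a small standalone sketch. The class ErrorProneSketch, the SimulatedError class and the run method are hypothetical and are not the connector added in this PR; only the two setting names, chance_to_raise (a percentage) and generated_document_count, come from the PR description.

```ruby
# Hypothetical error raised in place of a real serialization/mapping failure.
class SimulatedError < StandardError; end

# Hypothetical stand-in for the example connector discussed above.
class ErrorProneSketch
  # chance_to_raise: percentage (0..100) of documents that should raise
  # generated_document_count: number of documents to attempt
  # rng: injectable random source so runs can be made repeatable
  def initialize(chance_to_raise:, generated_document_count:, rng: Random.new)
    @chance_to_raise = chance_to_raise
    @generated_document_count = generated_document_count
    @rng = rng
  end

  # Attempts to generate every document and counts successes and failures.
  def run
    ingested = 0
    errors = 0
    @generated_document_count.times do |index|
      begin
        generate(index)
        ingested += 1
      rescue SimulatedError
        errors += 1
      end
    end
    { ingested: ingested, errors: errors }
  end

  private

  def generate(index)
    # rand(100) returns an integer in 0..99, so this raises for roughly chance_to_raise percent of calls.
    raise SimulatedError, "simulated failure for document #{index}" if @rng.rand(100) < @chance_to_raise

    { id: index.to_s, name: "example document #{index}" }
  end
end

# Usage: a seeded run with a 2% error chance.
result = ErrorProneSketch.new(chance_to_raise: 2, generated_document_count: 10_000, rng: Random.new(42)).run
never = ErrorProneSketch.new(chance_to_raise: 0, generated_document_count: 100).run
always = ErrorProneSketch.new(chance_to_raise: 100, generated_document_count: 100).run
```

Raising or lowering chance_to_raise in a sketch like this shifts the split between ingested and errors, which is the kind of live experiment the comment above describes.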
It's up to the user's implementation, and they can certainly not use Do you have the plan to make
I'm now fine with more than one example connector. Just make sure the service type is properly named, and we shouldn't nest again in dir
Part of https://github.com/elastic/enterprise-search-team/issues/2271
This PR adds a new example connector (service_type: example-with-errors) that ingests random data into an index and sometimes raises an error.

Connector has 2 settings:

So for chance_to_raise => 2; generated_document_count => 1_000_000 the connector will ingest 1,000,000 random documents, and about 2% of them will be errors.

You can change chance_to_raise to higher/lower values to see how the connector service handles errors coming from document serialization/mapping.

As a side effect of the change, I've moved the original example connector from Connectors::Example::Connector to Connectors::Example::Simple::Connector - I expect some logic from this connector to be split into a separate connector as well.

Checklists
Pre-Review Checklist