Add example errors connector and move simple example connector to a different namespace #437

artem-shelkovnikov · 2022-11-15T21:52:57Z

Part of https://github.com/elastic/enterprise-search-team/issues/2271

This PR adds a new example connector (service_type: example-with-errors) that ingests random data into an index and occasionally raises an error.

The connector has two settings:

chance_to_raise - the chance, in percent, that any individual document raises an error
generated_document_count - the number of documents that the connector generates and tries to ingest

So with chance_to_raise => 2 and generated_document_count => 1_000_000, the connector tries to ingest 1,000,000 random documents, and roughly 2% of them raise an error instead.

You can raise or lower chance_to_raise to see how the connector service handles errors coming from document serialization/mapping.
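
To make the two settings concrete, here is a minimal sketch of how they could interact. The class name `SketchErrorConnector`, its methods, and the injectable `rng` are hypothetical and are not the code in this PR; the sketch also skips the random data generation and only shows the error chance.

```ruby
# Hypothetical stand-in for the example-with-errors connector (not the PR's code).
class SketchErrorConnector
  class DocumentError < StandardError; end

  # chance_to_raise is a percentage (0..100); rng is injectable for repeatable runs.
  def initialize(chance_to_raise:, generated_document_count:, rng: Random.new)
    @chance_to_raise = chance_to_raise
    @generated_document_count = generated_document_count
    @rng = rng
  end

  # Builds document number `index`, or raises with the configured probability.
  def build_document(index)
    # rand(100) is uniform over 0..99, so `< 2` is true about 2% of the time.
    raise DocumentError, "document #{index} failed" if @rng.rand(100) < @chance_to_raise

    { id: index.to_s, name: "example document #{index}" }
  end

  # Tries every document and returns the counts as [ingested, failed].
  def run
    ingested = 0
    failed = 0
    1.upto(@generated_document_count) do |index|
      begin
        build_document(index)
        ingested += 1
      rescue DocumentError
        failed += 1
      end
    end
    [ingested, failed]
  end
end

ingested, failed = SketchErrorConnector.new(chance_to_raise: 2, generated_document_count: 10_000).run
puts "ingested=#{ingested} failed=#{failed}"
```

Because the failures are random, the failed count only approximates 2% of the total; setting chance_to_raise to 0 or 100 gives the two deterministic extremes.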

As a side effect of the change, I've moved the original example connector from Connectors::Example::Connector to Connectors::Example::Simple::Connector. I expect some logic from this connector to be split into a separate connector as well.

Checklists

Pre-Review Checklist

Covered the changes with automated tests
Tested the changes locally

mchernyavskaya · 2022-11-16T10:33:49Z

A description for this change would be nice to understand the purpose.

artem-shelkovnikov · 2022-11-18T12:37:56Z

Note for the reviewers: this change is not urgent, so we can discuss what's going on here for a bit!

timgrein

LGTM! One nit and one small comment about validate_filtering

timgrein · 2022-11-23T11:05:01Z

lib/connectors/example/simple/connector.rb

+        end
+
+        def self.validate_filtering(filtering = {})
+          # TODO: real filtering validation will follow later


We've changed this logic so that it is only present in the base connector; connectors only override self.advanced_snippet_validator (so we plug in a dedicated validation class).

Just as a heads up, when it's merged :-)
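
A rough sketch of the pattern described in this comment, where validate_filtering lives only in the base connector and each connector overrides just self.advanced_snippet_validator. The class names and the validator interface (`new(snippet).valid?`) are assumptions made for illustration; the real classes in connectors-ruby may look different.

```ruby
# Hypothetical validator classes; the real interface in connectors-ruby may differ.
class AcceptAllValidator
  def initialize(advanced_snippet)
    @advanced_snippet = advanced_snippet
  end

  def valid?
    true
  end
end

class RequireQueryValidator < AcceptAllValidator
  # Accepts only snippets that are hashes with a :query key.
  def valid?
    @advanced_snippet.is_a?(Hash) && @advanced_snippet.key?(:query)
  end
end

class SketchBaseConnector
  # Connectors override this to plug in their own validation class.
  def self.advanced_snippet_validator
    AcceptAllValidator
  end

  # Shared logic: lives only here and delegates to the plugged-in validator.
  def self.validate_filtering(filtering = {})
    snippet = filtering.fetch(:advanced_snippet, {})
    advanced_snippet_validator.new(snippet).valid? ? :valid : :invalid
  end
end

class SketchFilteringConnector < SketchBaseConnector
  def self.advanced_snippet_validator
    RequireQueryValidator
  end
end

puts SketchBaseConnector.validate_filtering({ advanced_snippet: {} })
puts SketchFilteringConnector.validate_filtering({ advanced_snippet: {} })
puts SketchFilteringConnector.validate_filtering({ advanced_snippet: { query: 'x' } })
```

With this shape, a connector that needs no custom validation does not have to define validate_filtering at all.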

timgrein · 2022-11-23T11:05:45Z

Gemfile

@@ -59,5 +59,7 @@ gem 'elasticsearch', '~> 8.5.0'
 # Dependencies for oauth
 gem 'signet', '~> 0.16.0'

+# Dependency for example connector
+gem 'faker', '~> 2.22.0'


Nice 👏 That'll be super helpful for the filtering example connector, too

timgrein · 2022-11-23T11:06:13Z

lib/connectors/example/with_errors/connector.rb

+          'Example Connector that produces transient errors'
+        end
+
+        # Field 'Foo' won't have a default value. Field 'Bar' will have the default value 'Value'.


nit: I think that comment can be removed, right?

wangch079

What is the purpose of the new connector? Is it used for the resilience test? If so, I don't think it's appropriate to introduce the connector in the lib/connectors dir.

Customers will consider connectors under lib/connectors the "Official" connectors provided by Elastic. The purpose of the example connector is to give customers a minimal connector to start custom implementation.

timgrein · 2022-11-28T12:00:13Z

(Adding my opinion as I'll also add one for filtering and filtering validation)

@wangch079 AFAIK this connector is used to isolate the presentation of different features into dedicated example connectors. The existing example connector should be as simple as possible to provide a simple starting point (as you've also stated). But as the framework grows, we introduce advanced features like the new error handling, filtering, filtering validation and so on. This would lead to the example connector growing and growing, which contradicts its original purpose of being a simple starting point. Therefore, when we add new advanced features, we can add dedicated example connectors for developers who want to use these features (and use the dedicated example connectors as a reference).

IMO it's fine, if all example connectors are located under lib/connectors/example.

WDYT?

wangch079 · 2022-11-28T15:49:05Z

> advanced features like the new error handling, filtering, filtering validation and so on.

I assume the new error handling here means the new resilience work by @artem-shelkovnikov , which I think should happen at framework level, not in connector implementation.

For filtering, filtering validation, we should have a placeholder method in Connectors::Base::Connector as a guideline.

> IMO it's fine, if all example connectors are located under lib/connectors/example.

I think each connector should have its dedicated directory, and the name should be the service_type.

artem-shelkovnikov · 2022-11-28T16:15:57Z

> I assume the new error handling here means the new resilience work by @artem-shelkovnikov , which I think should happen at framework level, not in connector implementation.

The work here does happen at the connector level. Moreover, the framework is unable to handle this logic at all: the developer of the connector has to specify which errors are "tolerable", that is, errors after which it's okay to continue extracting data.

Consider our interface:

def yield_documents
  fetch_documents.each do |document|
    yield serialize(document)
  end
end

def fetch_documents
  mongodb.collection(:data).find
end

def serialize(document)
  # do mapping logic to make the document Elasticsearch-friendly
end

At this point the framework cannot know which errors are tolerable. For example, if fetch_documents fails, should the sync continue? Intuitively, all it can do is retry, and if the retry fails then the sync should be interrupted. Can the framework handle the retries without the connector explicitly telling it what should be retried? I don't think so.
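
As an illustration of the connector telling the framework explicitly what may be retried, here is a minimal sketch. The helper `with_retries` and the `TransientFetchError` class are hypothetical names invented for this example, and a real implementation would probably also wait between attempts.

```ruby
# Hypothetical error class standing in for a transient data-source failure.
class TransientFetchError < StandardError; end

# Runs the block, retrying only the error classes the connector lists.
# Any other error, or the last failed attempt, propagates to the caller.
def with_retries(retry_on:, attempts: 3)
  tries = 0
  begin
    tries += 1
    yield
  rescue *retry_on
    retry if tries < attempts
    raise
  end
end

calls = 0
documents = with_retries(retry_on: [TransientFetchError]) do
  calls += 1
  # Simulate a fetch that fails twice before succeeding.
  raise TransientFetchError, 'connection reset' if calls < 3

  [{ id: '1' }, { id: '2' }]
end
puts "fetched #{documents.size} documents after #{calls} attempts"
```

An error class that is not listed in retry_on, such as an ArgumentError caused by a bug, is raised on the first attempt, which is the "sync should be interrupted" case.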

So back to tolerable errors: another place in the code that can raise errors is the serialize function. Maybe it has a bug inside, or maybe the data comes in unexpected formats. We'd like to fail the sync here only if too many errors happen, but how can the framework handle that without the connector explicitly telling it to tolerate errors at this point? I can't think of a way.
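
A minimal sketch of that idea, in which serialization errors are tolerated until a limit is exceeded. All names here (`SketchConnector`, `tolerate_errors`, `TooManyErrors`, `max_errors`) are hypothetical and this is not the framework's implementation.

```ruby
# Hypothetical names throughout; this is not the framework's implementation.
class TooManyErrors < StandardError; end

class SketchConnector
  def initialize(rows, max_errors:)
    @rows = rows
    @max_errors = max_errors
    @errors = 0
  end

  def yield_documents
    @rows.each do |row|
      # The connector decides that this step is allowed to fail.
      # Note that errors raised by the consumer's block are covered too.
      tolerate_errors { yield serialize(row) }
    end
  end

  private

  # Swallows errors from the block until more than max_errors have occurred.
  def tolerate_errors
    yield
  rescue StandardError => e
    @errors += 1
    raise TooManyErrors, "gave up after #{@errors} errors, last: #{e.message}" if @errors > @max_errors
  end

  # Mapping logic that fails on unexpected input.
  def serialize(row)
    { id: Integer(row[:id]).to_s, title: row.fetch(:title) }
  end
end

rows = [{ id: '1', title: 'a' }, { id: 'oops', title: 'b' }, { id: '3', title: 'c' }]
ingested = []
SketchConnector.new(rows, max_errors: 1).yield_documents { |doc| ingested << doc }
puts ingested.map { |doc| doc[:id] }.inspect # prints ["1", "3"]
```

The second row fails to serialize and is skipped; with max_errors: 0 the same input would raise TooManyErrors and stop the loop.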

The only errors that we can try to tolerate in the framework are "ingestion" errors: we tried to ingest a document into Elasticsearch, but failed due to, for example, a mapping mismatch. This can be done at the framework level and does not need an example connector, but for the cases above I believe it's great to provide a solid example connector and explain how it works.

We could put this logic in our single example connector, but it's already becoming pretty big and getting out of hand IMO, so this code lives in a separate example connector.

Happy to discuss it in more detail @wangch079, and this PR is here to open this discussion :)

artem-shelkovnikov · 2022-11-28T16:18:20Z

> For filtering, filtering validation, we should have a placeholder method in Connectors::Base::Connector as a guideline.

We do have it, but it's always better to have an expressive example IMO than just the documentation.

wangch079 · 2022-11-29T07:22:39Z

I re-checked how error handling is implemented in ent-search, and I agree that we won't be able to do it 100% at the framework level. In fact, it's done entirely at the connector level, since Connectors::Base::Connector is still a connector.

In ent-search, 1) it wraps yield_document_changes with with_auth_tokens_and_retry, and 2) in each connector implementation, it wraps the yield with yield_single_document_change, which is where the monitor operates.

In connectors-ruby, we don't have 1), which is fine; we can add it later if necessary. For 2), yield_with_handling_tolerable_errors serves the exact same purpose as yield_single_document_change, just under a different name.

Because yield_with_handling_tolerable_errors is used in each connector implementation, it makes sense to include it in the example connector, and explain how it works:

attachments.each_with_index do |att, index|
  yield_with_handling_tolerable_errors do
    data = { id: (index + 1).to_s, name: "example document #{index + 1}", _attachment: File.read(att) }

    # Uncomment one of these two lines to simulate longer running sync jobs
    #
    # sleep(rand(10..60).seconds)
    # sleep(rand(1..10).minutes)

    yield data
  end
end

I still don't think it's a good idea to introduce a second example connector, besides this:

> Customers will consider connectors under lib/connectors the "Official" connectors provided by Elastic. The purpose of the example connector is to give customers a minimal connector to start custom implementation.

And also:

the example connector should be a minimal but fully functional connector, adding error handling is just adding one line of code.
adding an example-with-errors connector implies tolerable error handling is optional.

artem-shelkovnikov · 2022-11-30T13:40:56Z

> adding an example-with-errors connector implies tolerable error handling is optional.

But it is indeed optional: users can skip it when yielding documents and we won't prevent them.

> the example connector should be a minimal but fully functional connector, adding error handling is just adding one line of code.

This connector, though, is more complex: it has configurable fields that help with testing connector behaviour live. A user can create this example connector, mess with the field values (number of ingested documents, % of errors) and see how it works for themselves.

I've used it for some small testing and found this super useful.

My concern here is that it would be nice to have more example connectors like this, not only to demonstrate how to write a connector, but also to demonstrate the behaviour. They can indeed be combined in a single connector, but that would make the connector itself huge (tens of configurable fields, complex logic), and it would be harder to follow and use as a simple example.

wangch079 · 2022-12-02T07:51:35Z

> But it is indeed optional, users can skip it when yielding documents and we won't prevent them.

It's up to the user's implementation, and they can certainly choose not to use yield_with_handling_tolerable_errors at all.

Do you plan to make total_errors, consecutive_errors, and errors_in_window configurable? If so, users can practically disable tolerable errors through configuration.
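
To illustrate what configurable thresholds could mean, here is a sketch of a monitor that takes the three limits as settings. The threshold names come from this discussion, but the class `SketchErrorMonitor`, the `window_size` parameter, and the exact semantics of each limit are assumptions of this example rather than the framework's behaviour.

```ruby
# Hypothetical monitor; the threshold semantics and window_size are assumptions.
class SketchErrorMonitor
  class ThresholdExceeded < StandardError; end

  def initialize(total_errors:, consecutive_errors:, errors_in_window:, window_size: 100)
    @max_total = total_errors
    @max_consecutive = consecutive_errors
    @max_in_window = errors_in_window
    @window_size = window_size
    @total = 0
    @consecutive = 0
    @window = [] # true for a failed document, false for a successful one
  end

  def note_success
    @consecutive = 0
    remember(false)
  end

  # Records a failure and raises once any configured limit is exceeded.
  def note_error
    @total += 1
    @consecutive += 1
    remember(true)
    raise ThresholdExceeded, 'too many errors in total' if @total > @max_total
    raise ThresholdExceeded, 'too many consecutive errors' if @consecutive > @max_consecutive
    raise ThresholdExceeded, 'too many errors in window' if @window.count(true) > @max_in_window
  end

  private

  # Keeps only the outcomes of the last window_size documents.
  def remember(failed)
    @window << failed
    @window.shift if @window.size > @window_size
  end
end

lenient = SketchErrorMonitor.new(total_errors: 10, consecutive_errors: 2, errors_in_window: 5)
lenient.note_error
lenient.note_success
lenient.note_error # still tolerated: the success reset the consecutive count

# Setting every limit to zero effectively disables error tolerance.
strict = SketchErrorMonitor.new(total_errors: 0, consecutive_errors: 0, errors_in_window: 0)
begin
  strict.note_error
rescue SketchErrorMonitor::ThresholdExceeded => e
  puts "sync interrupted: #{e.message}"
end
```

Under these assumptions, a user who wants the sync to stop on the first failing document only needs to set all three limits to zero.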

> user can create this example connector, mess with the field values (number of ingested documents, % of errors) and see how it works for themselves.

> They can indeed be combined in a single connector, but it'll make the connector itself huge

I'm now fine with more than one example connector. Just make sure the service type is properly named, and that we don't add another level of nesting in the connectors dir.

github-actions bot added auto-backport v8.6.0.4 labels Nov 15, 2022

artem-shelkovnikov force-pushed the artem/add-example-error-connector branch from 7b2e581 to 543e893 Compare November 18, 2022 11:32

artem-shelkovnikov changed the title ~~WIP - adding example errors connector~~ Add example errors connector and move simple example connector to a different namespace Nov 18, 2022

artem-shelkovnikov marked this pull request as ready for review November 18, 2022 12:37

artem-shelkovnikov added 3 commits November 18, 2022 13:37

WIP

78591c2

Fix bugs

cc3b494

Add error tolerance logic to the connector

5ee17c0

artem-shelkovnikov force-pushed the artem/add-example-error-connector branch from 1d635ab to 5ee17c0 Compare November 18, 2022 12:37

artem-shelkovnikov requested a review from a team November 18, 2022 12:38

Pin faker version

e6c3289

timgrein approved these changes Nov 23, 2022

wangch079 reviewed Nov 28, 2022

artem-shelkovnikov closed this Apr 8, 2024
