-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create connector record in agent component #2861
Conversation
) | ||
|
||
if not connector_name: | ||
connector_name = f"[Agentless] {service_type} connector" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried if I create two gdrive connectors, they'll have duplicate names. Is anything preventing that?
Also, I'm not sure we want "Agentless" to leak out into public-facing terminology. Or at least, not in Search pages (it might in Integrations pages). I'd suggest "Elastic-managed" or "Agent", but don't love either and am open to alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use Elastic-managed for now then :) this should be compliant with new terminology
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm appending random 4 chars at the end of autogenerate connector name ... this should be fine to stay unique (IDs present in agent policy are toooo long to put in UI)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not elasticsearch_config: | ||
return False | ||
|
||
if "host" not in elasticsearch_config or "api_key" not in elasticsearch_config: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be checking for cloud_id
instead of api_key
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe in addition?
We must have either host or cloud_id, and we MUST have api_key. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are checking elasticsearch config as seen by the connector service.
cloud_id
is a first for me, I don't see a single reference to it in the repo https://github.com/search?q=repo%3Aelastic%2Fconnectors%20cloud_id&type=code (also in the example config I don't see it)
Also, agent doesn't specify the cloud_id anywhere https://www.elastic.co/guide/en/fleet/current/elasticsearch-output.html
Can you explain what do you mean with cloud id? I don't think we must check for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloud ID is present for all cloud deployments and serverless projects.
And it should be the defacto way to connect to Elasticsearch, see: https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-ec
When connecting to Elastic Cloud with the Python Elasticsearch client you should always use the cloud_id parameter to connect.
I'm surprised we don't support an elasticsearch.cloud_id
in our YAML config, IDK how I never noticed that was missing. We should fix that, but I suppose it's a problem for another day. We can disregard it for this PR I suppose, and leave the logic as is, but understand now you're checking to make sure that both host and api key are provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for explaining. I had no idea we could use cloud id for connection. I will file an issue to support the cloud id in the config YML!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…om:elastic/connectors-python into create-connector-record-in-agent-component
await self.connector_index.connector_put( | ||
connector_id=connector_id, | ||
service_type=service_type, | ||
connector_name=connector_name, | ||
) | ||
logger.info(f"Created connector record for {connector_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you feel on also filling up RCFs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be nice especially on serverless as afaik it would take a bit of time until this gets autopopulated, will file an enhancement issue to track it 👍 we might want to test this thoroughly with new connector api _configuration endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not self.connector_index: | ||
self.connector_index = ConnectorIndex(agent_config.get("elasticsearch")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it a cached property instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, we don't have access to agent_config
when instantiating ConnectorRecordManager
, the ensure_connector_records_exist
is called from checking handler with up to date config, so cached_property
can't be used.
Co-authored-by: Artem Shelkovnikov <[email protected]>
Closes https://github.com/elastic/search-team/issues/8265
This change introduces the
ConnectorRecordManager
, which ensures that connectors associated with a given agent are created on startup using theensure_connector_records_exist
function.The check-in handler has been updated to inspect both input and output units passed to the component from the agent protocol. The assumption is that each connector component will receive no more than one output of type ES and one input of type
connectors-py
. I've added logging and additional logic to detect and handle edge cases.If more then one connector component is defined in the policy user will see a log:
Multiple connector inputs detected. The first connector input defined in the agent policy will be used.
Note
We could change this and allow adding multiple connector inputs to the same policy, hence allowing to run multiple connectors on a same agent(less) deployment. This could be more cost efficient, while leaving an option to control what policy is deployed where to manage resource.
Raising this now, but we can revisit later. For now we constrain single connector input <-> single policy <-> single deployment.
Additionally, I implemented logic in the config wrapper to identify changes in the connector configurations. I also removed the configuration that forced the connector service to start in native mode, as this is no longer necessary given that we now have access to the IDs of the connectors we manage.
Validation
E2e tests + unit tests
Pre-Review Checklist
config.yml.example
)v7.13.2
,v7.14.0
,v8.0.0
)