Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

populate time fields for connectors on return #2922

Merged
merged 8 commits into from
Oct 1, 2024

Conversation

brianf-aws
Copy link
Contributor

@brianf-aws brianf-aws commented Sep 10, 2024

Description

Currently any class that extends the AbstractConnector class has the fields createdTime and lastUpdatedTime set to null. The solution was creating setters on the connector interface and implementing on the abstract connector. Now on Indexing new connectors (Via the create API) the time fields will be instantiated. Additionally on update a connector, lastUpdateTime has been set. Two unit tests in UpdateConnectorTransportActionTests named UpdateConnectorTransportActionTests and testUpdateConnectorUpdatesHttpConnectorTimeFields

Related Issues

fixes #2890 Created Time and Last Updated Time are not implemented on Connector

Check List

  • New functionality includes testing.
  • Commits are signed per the DCO using --signoff.

Changes

GET CONNECTOR (NO UPDATE)

Connector createdTime lastUpdateTime
Old (was indexed prior to code change)
New (With code change)

GET CONNECTOR (AFTER UPDATE)

Connector createdTime lastUpdateTime
Old (was indexed prior to code change)
New (With code change)
  1. Observe that connector creations using the Get Connector API now have a created_time,last_updated_time
    a. creating a connector
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
  "name": "Chat Gpt 3.5",
  "description": "Brians Test Chatgpt",
  "version": "1",
  "protocol": "http",
  "parameters": {
    "endpoint": "api.openai.com",
    "model": "gpt-3.5-turbo"
  },
  "credential": {
    "openAI_key": "'"$BRIAN_OPENAI"'"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
    }
  ]
}' \
  localhost:9200/_plugins/_ml/connectors/_create
 b. Inspecting changes via a get request  `curl -XGET "http://localhost:9200/_plugins/_ml/connectors/lkf-3JEBz83ApvP3YOPy"`. Also note the followinf fields have been added `created_time,last_updated_time`
 {
    "name": "Chat Gpt 3.5",
    "version": "1",
    "description": "Brians Test Chatgpt",
    "protocol": "http",
    "parameters": {
        "endpoint": "api.openai.com",
        "model": "gpt-3.5-turbo"
    },
    "actions": [
        {
            "action_type": "PREDICT",
            "method": "POST",
            "url": "https://${parameters.endpoint}/v1/chat/completions",
            "headers": {
                "Authorization": "Bearer ${credential.openAI_key}"
            },
            "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
        }
    ],
    "created_time": 1725989544074,
    "last_updated_time": 1725989544074
}
  1. Note how when updating the connector using the Update Connector API and then calling get it changes the the last updated time and leaves the created time the same
    a. Update the connector curl -XPUT "http://localhost:9200/_plugins/_ml/connectors/lkf-3JEBz83ApvP3YOPy" -H "Content-Type: application/json' -d' { "description": "The new description of Brians chatGpt model" }"

    b. get the connector again and note the new time curl -XGET "http://localhost:9200/_plugins/_ml/connectors/lkf-3JEBz83ApvP3YOPy" which yields


{
    "name": "Chat Gpt 3.5",
    "version": "1",
    "description": "The new description of Brians chatGpt model",
    "protocol": "http",
    "parameters": {
        "endpoint": "api.openai.com",
        "model": "gpt-3.5-turbo"
    },
    "actions": [
        {
            "action_type": "PREDICT",
            "method": "POST",
            "url": "https://${parameters.endpoint}/v1/chat/completions",
            "headers": {
                "Authorization": "Bearer ${credential.openAI_key}"
            },
            "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
        }
    ],
    "created_time": 1725989544074,
    "last_updated_time": 1725989603171
}
  1. In summary the initial connector creation has the fields "created_time": 1725989544074, "last_updated_time": 1725989544074 and after the update its "created_time": 1725989544074, "last_updated_time": 1725989603171. You can see how it changed but the created_time stayed the same.

Testing

  • Manual testing via using docker-compose (Was done in 2.x to get around version incompatibility and cherry picked to main branch)
  1. Spin up a docker compose file without the code changes (i.e. using the ml-commons code thats packed with openSearch)
  2. Create a connector and see it has been created via the get connectors API, (Observe no time fields)
  3. shut down the container run ./gradlew assemble to pack the code change into a zip file
  4. add the following commands to the docker compose file command: bash -c "bin/opensearch-plugin remove opensearch-skills; bin/opensearch-plugin remove opensearch-ml; bin/opensearch-plugin install --batch file:///<Path to the zip>/opensearch-ml-X.X.X.X-SNAPSHOT.zip; ./opensearch-docker-entrypoint.sh opensearch"
  5. Get the connector that was done on step 2 observe no observed time
  6. Create a NEW connector and observe it has the time fields update it to see the updated time stamp works
  • Two new unit tests were onboarded to simulate these changes in UpdateConnectorTransportActionTests named
  1. testUpdateConnectorDoesNotUpdateHttpConnectorTimeFields checks that when creating a connector time fields arent populated we test this because we want to make sure that connectors dont get this field populated instead we want to make sure that the Transport layer sets this additionally when these fields are null we need to make sure the transport layer does not set the updated time on a update.
  2. testUpdateConnectorUpdatesHttpConnectorTimeFields Checks that time fields are instantiated and that after a update the updated time is bigger than the created time.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

fixes opensearch-project#2890 Currently any class that extends the AbstractConnector class has the fields createdTime and lastUpdatedTime set to null. The solution was instantiating the fields in the constructor of the AbstractConnector class, as well updating it within the HTTPConnector class whenever an update happens. Many tests were modified to catch the time fields being populated as such there will be many differences on the string in order to get around the timing issue when doing tests.

Signed-off-by: Brian Flores <[email protected]>
@dhrubo-os
Copy link
Collaborator

Let's add more details in the description like how did you test?

Let's create a connector and invoke the get connector api to get the connector details to show the create and update timestamp

After that update the connector and then show the change of updated timestamp.

@brianf-aws
Copy link
Contributor Author

brianf-aws commented Sep 10, 2024

Hey @dhrubo-os Thanks for the feedback I updated my post based on your feedback. I uploaded a video that you can click on to see a visual of the written steps (2 mins) Its on the heading Changes

I also did a test on a cluster and the change works as well. (Amazonians can see this video as proof)

fixes opensearch-project#2890 when applying a code change like this previous connectors would have a weird bug where upon calling GET on them would change the timestamp. In this commit, it remains the old connectors without time fields while new ones have time fields, newer connectors will have correct and updated Timestamp. Manual testing was done on a local cluster and two unit tests were done to inspect the time changes on creation and update

Signed-off-by: Brian Flores <[email protected]>
Originally this commit was cherry picked from the 2.x branch and as such code changes affected the new build that werent caught on the previous commit 8c006de. Reformatted tests that were failing as the behavior implemented in previous commits was to not display time fields if a connector does not have them in the first place. gradlew build was done to assure the tests passed

Signed-off-by: Brian Flores <[email protected]>
@@ -93,6 +95,11 @@ protected void doExecute(Task task, ActionRequest request, ActionListener<Update
if (Boolean.TRUE.equals(hasPermission)) {
connector.update(mlUpdateConnectorAction.getUpdateContent(), mlEngine::encrypt);
connector.validateConnectorURL(trustedConnectorEndpointsRegex);

if (connector instanceof HttpConnector && ((HttpConnector) connector).getLastUpdateTime() != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only HttpConnector? Can't we just check connector.getLastUpdateTime() != null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to bloat the connector interface with a getter as it already has two getters. Also AbstractConnector has the time fields not the connector interface

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if getLastUpdateTime is null? Don't we want to update those values also?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the concern if we just add connector.setLastUpdateTime(Instant.now()); without any checks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the updates should be built-in (but overridable) defaults on the AbstractConnector. May not need a setter for it as it should generally done automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dhrubo-os, @dbwiddis

This method will update connectors when it does this my theory is that deserializes the connector from an index and constructs the HTTPConnector object when this happens it is very hard to control the update fields (on instantiation).

If we decide to add the time fields in the abstract Connector constructor we have a problem between old connectors and new ones. This results in a unexpected bug when connectors that didn't have these fields get brought to life; every subsequent update on the connector makes the time fields a new number every time.

Now if we wanted old connectors to not have the time fields (what our users expect when they update a old connector) they wont see the time fields because this method checks if an update should work on time only if they have the field. When a connector thats been created with the code change, they will see time fields and will update accordingly well.

Please let me know if you are still confused I would be happy to explain more in depth :)

Previoulsy we didnt consider the old connectors to have time fields at all, But given offline discussion if we add time fields to old connectors users could get more information moving forward without breaking any backward features. The solution to this was setting the last updated time in the update connector api; now moving forward any connector gets attached a last updated time field. I updated the testUpdateConnectorDoesNotUpdateHTTPCOnnectorTimeFields method to check that lastUpdateTime has a timestamp but that createdTime has no time field.

Signed-off-by: Brian Flores <[email protected]>
@brianf-aws
Copy link
Contributor Author

Hi Everyone thanks for your patience on this bug Ive updated this code with a final change that was discussed offline. Here is the final changes in a tabulated form. I would appreciate your review once more 😅 @dbwiddis, @rbhavna, @dhrubo-os, @ylwu-amzn

GET CONNECTOR (NO UPDATE)

Connector createdTime lastUpdateTime
Old (was indexed prior to code change)
New (With code change)

GET CONNECTOR (AFTER UPDATE)

Connector createdTime lastUpdateTime
Old (was indexed prior to code change)
New (With code change)

Copy link
Member

@dbwiddis dbwiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this version. Simple, readable.

import java.util.Map;
import java.util.UUID;
import java.time.Instant;
import java.util.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generally frowned on to use star imports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, yeah will fix that

Copy link
Collaborator

@ylwu-amzn ylwu-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this issue!

@dhrubo-os dhrubo-os merged commit 88eaefd into opensearch-project:main Oct 1, 2024
8 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2922-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 88eaefdcfc2d1e37b8ec7bffd0e730840325a3f2
# Push it to GitHub
git push --set-upstream origin backport/backport-2922-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2922-to-2.x.

@dhrubo-os
Copy link
Collaborator

@brianf-aws Backport PR for 2.x is failing. Can you please check why it's failing. And also please raise a backport PR for 2.x branch.

brianf-aws added a commit to brianf-aws/ml-commons that referenced this pull request Oct 2, 2024
* populate time fields for connectors on return

fixes opensearch-project#2890 Currently any class that extends the AbstractConnector class has the fields createdTime and lastUpdatedTime set to null. The solution was instantiating the fields in the constructor of the AbstractConnector class, as well updating it within the HTTPConnector class whenever an update happens. Many tests were modified to catch the time fields being populated as such there will be many differences on the string in order to get around the timing issue when doing tests.

Signed-off-by: Brian Flores <[email protected]>

* fixes backward compatability issues with old connectors

fixes opensearch-project#2890 when applying a code change like this previous connectors would have a weird bug where upon calling GET on them would change the timestamp. In this commit, it remains the old connectors without time fields while new ones have time fields, newer connectors will have correct and updated Timestamp. Manual testing was done on a local cluster and two unit tests were done to inspect the time changes on creation and update

Signed-off-by: Brian Flores <[email protected]>

* fix failing MLRegisterModelInutTest.testToXContent tests

Originally this commit was cherry picked from the 2.x branch and as such code changes affected the new build that werent caught on the previous commit 8c006de. Reformatted tests that were failing as the behavior implemented in previous commits was to not display time fields if a connector does not have them in the first place. gradlew build was done to assure the tests passed

Signed-off-by: Brian Flores <[email protected]>

* Reverts back model tests that were modified incorrectly by connector change

When creating a code change to the connector it propagated the new change of the object that affected many UTs, but after changing the logic of indexing the new connector, change the old changes for the unit test involving models with connectors had to be reverted back. UTs specifically for the indexed connectors have been created in UpdateConnectorTransportActionTests were done to capture this

Signed-off-by: Brian Flores <[email protected]>

* Adds lastUpdateTime to Old Connectors

Previoulsy we didnt consider the old connectors to have time fields at all, But given offline discussion if we add time fields to old connectors users could get more information moving forward without breaking any backward features. The solution to this was setting the last updated time in the update connector api; now moving forward any connector gets attached a last updated time field. I updated the testUpdateConnectorDoesNotUpdateHTTPCOnnectorTimeFields method to check that lastUpdateTime has a timestamp but that createdTime has no time field.

Signed-off-by: Brian Flores <[email protected]>

* Fixes wildcard import in UpdateConnectorTransportActionTests

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
(cherry picked from commit 88eaefd)
dhrubo-os pushed a commit that referenced this pull request Oct 3, 2024
* populate time fields for connectors on return

fixes #2890 Currently any class that extends the AbstractConnector class has the fields createdTime and lastUpdatedTime set to null. The solution was instantiating the fields in the constructor of the AbstractConnector class, as well updating it within the HTTPConnector class whenever an update happens. Many tests were modified to catch the time fields being populated as such there will be many differences on the string in order to get around the timing issue when doing tests.

Signed-off-by: Brian Flores <[email protected]>

* fixes backward compatability issues with old connectors

fixes #2890 when applying a code change like this previous connectors would have a weird bug where upon calling GET on them would change the timestamp. In this commit, it remains the old connectors without time fields while new ones have time fields, newer connectors will have correct and updated Timestamp. Manual testing was done on a local cluster and two unit tests were done to inspect the time changes on creation and update

Signed-off-by: Brian Flores <[email protected]>

* fix failing MLRegisterModelInutTest.testToXContent tests

Originally this commit was cherry picked from the 2.x branch and as such code changes affected the new build that werent caught on the previous commit 8c006de. Reformatted tests that were failing as the behavior implemented in previous commits was to not display time fields if a connector does not have them in the first place. gradlew build was done to assure the tests passed

Signed-off-by: Brian Flores <[email protected]>

* Reverts back model tests that were modified incorrectly by connector change

When creating a code change to the connector it propagated the new change of the object that affected many UTs, but after changing the logic of indexing the new connector, change the old changes for the unit test involving models with connectors had to be reverted back. UTs specifically for the indexed connectors have been created in UpdateConnectorTransportActionTests were done to capture this

Signed-off-by: Brian Flores <[email protected]>

* Adds lastUpdateTime to Old Connectors

Previoulsy we didnt consider the old connectors to have time fields at all, But given offline discussion if we add time fields to old connectors users could get more information moving forward without breaking any backward features. The solution to this was setting the last updated time in the update connector api; now moving forward any connector gets attached a last updated time field. I updated the testUpdateConnectorDoesNotUpdateHTTPCOnnectorTimeFields method to check that lastUpdateTime has a timestamp but that createdTime has no time field.

Signed-off-by: Brian Flores <[email protected]>

* Fixes wildcard import in UpdateConnectorTransportActionTests

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
(cherry picked from commit 88eaefd)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Created Time and Last Updated Time are not implemented on Connector
5 participants