Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INTPYTHON-309 & INTPYTHON-417 Use new cluster and schedule on interval #50

Merged
merged 57 commits into from
Dec 4, 2024

Conversation

blink1073
Copy link
Member

Also resolves INTPYTHON-313 by adding support for docarray on local Atlas.

@blink1073 blink1073 requested a review from Jibola November 22, 2024 23:48
@blink1073
Copy link
Member Author

Before merging, we need to update the Evergreen project config to only run the local variants on pull requests.

Copy link
Collaborator

@Jibola Jibola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I have some clarifying questions, but everything else is solid. Great work! :)

Comment on lines 93 to 118
tasks:
- name: test-semantic-kernel-python
- name: test-semantic-kernel-python-local
commands:
- func: "fetch repo"
- func: "setup local atlas"
- func: "execute tests"

- name: test-semantic-kernel-python-remote
commands:
- func: "fetch repo"
- func: "setup remote atlas"
- func: "execute tests"

- name: test-semantic-kernel-csharp-local
commands:
- func: "fetch repo"
- func: "setup local atlas"
- func: "execute tests"

- name: test-semantic-kernel-csharp-remote
commands:
- func: "fetch repo"
- func: "setup remote atlas"
- func: "execute tests"

- name: test-langchain-python-local
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this ticket.

We should make a python generator for this as well. That or change this to just two tasks.
local run and remote run

There's no difference in the commands otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, replied to the wrong comment: https://jira.mongodb.org/browse/INTPYTHON-441

- func: "execute tests"

- name: test-semantic-kernel-csharp
- name: test-langchain-python-remote
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add tags or can we do regex matching in evergreen to make sure these aren't run on pull requests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 46 to 75
# Ensure the remote database is populated.
. .evergreen/utils.sh

PYTHON_BINARY=$(find_python3)

# Should be called from src
EVERGREEN_PATH=$(pwd)/.evergreen
TARGET_DIR=$(pwd)/$DIR
SCAFFOLD_SCRIPT=$EVERGREEN_PATH/scaffold_atlas.py

mkdir atlas

pushd atlas

$PYTHON_BINARY -m venv .
source ./bin/activate
popd

# Test server is up
$PYTHON_BINARY -m pip install pymongo
CONN_STRING=$MONGODB_URI \
$PYTHON_BINARY -c "from pymongo import MongoClient; import os; MongoClient(os.environ['MONGODB_URI']).db.command('ping')"

# Add database and index configurations
DATABASE=$DATABASE \
CONN_STRING=$MONGODB_URI \
REPO_NAME=$REPO_NAME \
DIR=$DIR \
TARGET_DIR=$TARGET_DIR \
$PYTHON_BINARY $SCAFFOLD_SCRIPT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could call provision-atlas as a next step and remove duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored

Comment on lines +4 to +7
"numDimensions": 1536,
"path": "embedding",
"similarity": "cosine",
"type": "vector"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be changed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches the value in the previous Atlas cluster:

image

@blink1073 blink1073 requested a review from Jibola December 2, 2024 12:21
Copy link
Collaborator

@Jibola Jibola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Comment on lines +46 to +48
if collection_name in collections:
logger.debug("Clearing existing collection", collection_name)
db[collection_name].delete_many({})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: We could also drop the collection entirely. This would remove all existing index definitions on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that but it seemed to cause race conditions.

Comment on lines +75 to +80
logger.debug(
"creating search index: %s on %s.%s...",
index_name,
database_name,
collection_name,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Comment on lines +16 to +20
semantic-kernel-python)
MONGODB_URI=$SEMANTIC_KERNEL_MONGODB_URI
;;
semantic-kernel-csharp)
MONGODB_URI=$SEMANTIC_KERNEL_MONGODB_URI
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR:
Could it be that sharing the URI could be leading to some failures in index definitions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, given they're using different DBs.

Comment on lines +120 to +148
scaffold_atlas() {
PYTHON_BINARY=$(find_python3)

# Should be called from src
EVERGREEN_PATH=$(pwd)/.evergreen
TARGET_DIR=$(pwd)/$DIR
SCAFFOLD_SCRIPT=$EVERGREEN_PATH/scaffold_atlas.py

mkdir -p atlas
pushd atlas

$PYTHON_BINARY -m venv .
source ./bin/activate
popd

# Test server is up
$PYTHON_BINARY -m pip install pymongo
CONN_STRING=$CONN_STRING \
$PYTHON_BINARY -c "from pymongo import MongoClient; import os; MongoClient(os.environ['CONN_STRING']).db.command('ping')"

# Add database and index configurations
DATABASE=$DATABASE \
CONN_STRING=$CONN_STRING \
REPO_NAME=$REPO_NAME \
DIR=$DIR \
DEBUG="${DEBUG:-1}" \
TARGET_DIR=$TARGET_DIR \
$PYTHON_BINARY $SCAFFOLD_SCRIPT
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@blink1073 blink1073 requested a review from Jibola December 3, 2024 14:21
@blink1073 blink1073 merged commit e94c697 into mongodb-labs:main Dec 4, 2024
9 checks passed
@blink1073 blink1073 deleted the INTPYTHON-309 branch December 4, 2024 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants