Integration tests #149

Status: Open. Wants to merge 46 commits into base branch main.

Commits (46):
25d1aaa
test with k3s
roxanne-o Nov 23, 2024
8766327
more stuff
roxanne-o Nov 23, 2024
a9ac588
fix a few things up
roxanne-o Nov 23, 2024
7bcf454
Automatically reformatting code with black and isort
Nov 23, 2024
9e4312f
update this
roxanne-o Nov 23, 2024
0fa2948
get rid of this
roxanne-o Nov 23, 2024
c8d869b
Merge branch 'test-with-k3s' of https://github.com/groundlight/edge-e…
roxanne-o Nov 23, 2024
7f0f8e0
some more stuff
roxanne-o Nov 23, 2024
20d4fda
fix yaml
roxanne-o Nov 23, 2024
9a66be7
add poetry install
roxanne-o Nov 23, 2024
a7711db
watch for rollout status
roxanne-o Nov 23, 2024
b2e7131
describe pods for debugging
roxanne-o Nov 23, 2024
40b13d6
try this
roxanne-o Nov 23, 2024
bf624b1
try this??
roxanne-o Nov 23, 2024
f3afd07
try this?
roxanne-o Nov 23, 2024
3da9411
hm okay try this
roxanne-o Nov 23, 2024
9364ee8
add this as well
roxanne-o Nov 23, 2024
46a72dc
add rollout stuff
roxanne-o Nov 23, 2024
35a7e22
Update test/setup_k3s_test_environment.sh
roxanne-o Nov 25, 2024
25b90d1
Merge branch 'main' of https://github.com/groundlight/edge-endpoint i…
roxanne-o Dec 3, 2024
c054d04
some stuff
roxanne-o Dec 6, 2024
7f73916
fix
roxanne-o Dec 6, 2024
671f0a2
Merge branch 'main' of https://github.com/groundlight/edge-endpoint i…
roxanne-o Dec 6, 2024
e51557b
stuff passes now
roxanne-o Dec 6, 2024
183ecd8
update pipeline
roxanne-o Dec 6, 2024
767d88f
fix pipeline
roxanne-o Dec 6, 2024
ee07cca
fix sequencing
roxanne-o Dec 6, 2024
21d66f3
nevermind
roxanne-o Dec 6, 2024
535fab1
setup basic
roxanne-o Dec 8, 2024
b3958aa
more stuff
roxanne-o Dec 9, 2024
501fb98
Automatically reformatting code with black and isort
Dec 9, 2024
75e26b9
more stuff
roxanne-o Dec 9, 2024
f621047
Merge branch 'integration-tests' of https://github.com/groundlight/ed…
roxanne-o Dec 9, 2024
7010918
Automatically reformatting code with black and isort
Dec 9, 2024
ca40e11
finishing up
roxanne-o Dec 10, 2024
e6d8ad8
Automatically reformatting code with black and isort
Dec 10, 2024
b63eedf
tweak some stuff
roxanne-o Dec 13, 2024
142d673
Merge branch 'integration-tests' of https://github.com/groundlight/ed…
roxanne-o Dec 13, 2024
6175cc4
Automatically reformatting code with black and isort
Dec 13, 2024
ffac9e5
cleaning up pt2
roxanne-o Dec 13, 2024
8d0280e
Merge branch 'integration-tests' of https://github.com/groundlight/ed…
roxanne-o Dec 13, 2024
471a60f
Automatically reformatting code with black and isort
Dec 13, 2024
4e403bd
Merge branch 'main' of https://github.com/groundlight/edge-endpoint i…
roxanne-o Jan 21, 2025
7554c74
updates from merge
roxanne-o Jan 21, 2025
b2de82d
Merge branch 'integration-tests' of https://github.com/groundlight/ed…
roxanne-o Jan 21, 2025
b6063f9
make some tests pass?
roxanne-o Jan 21, 2025
6 changes: 5 additions & 1 deletion .github/workflows/pipeline.yaml
@@ -254,6 +254,7 @@ jobs:
     needs:
       - test-general-edge-endpoint
       - test-sdk
+      - test-with-k3s
       - validate-setup-ee
     runs-on: ubuntu-22.04
     steps:
@@ -276,7 +277,10 @@ jobs:
 
   update-glhub:
     if: github.ref == 'refs/heads/main'
-    needs: validate-setup-ee
+    needs:
+      - validate-setup-ee
+      - test-sdk
+      - test-with-k3s
     runs-on: ubuntu-latest
     environment: live
2 changes: 1 addition & 1 deletion Makefile
@@ -20,7 +20,7 @@ test-all: test test-with-docker ## Run all tests in one make command
 	@echo "All tests completed."
 
 test-with-k3s:
-	. test/setup_k3s_test_environment.sh && poetry run pytest -m live
+	. test/integration/setup_and_run_tests.sh
 
 validate-setup-ee:
 	test/validate_setup_ee.sh
1 change: 0 additions & 1 deletion deploy/bin/setup-ee.sh
@@ -152,7 +152,6 @@ if [[ "${DEPLOY_LOCAL_VERSION}" == "1" ]]; then
 
     # Use envsubst to replace the PERSISTENT_VOLUME_NAME, PERSISTENT_VOLUME_NAME in the local_persistent_volume.yaml template
     envsubst < deploy/k3s/local_persistent_volume.yaml > deploy/k3s/local_persistentvolume.yaml
-    echo $PERSISTENT_VOLUME_NAME
     $K apply -f deploy/k3s/local_persistentvolume.yaml
     rm deploy/k3s/local_persistentvolume.yaml
1,832 changes: 897 additions & 935 deletions poetry.lock

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion test/api/test_image_queries_live.py
@@ -19,7 +19,9 @@
 # - name="edge_testing_det",
 # - query="Is there a dog in the image?",
 # - confidence_threshold=0.9
-DETECTOR_ID = "det_2SagpFUrs83cbMZsap5hZzRjZw4"
+
+# we use a dynamically created detector for integration tests
+DETECTOR_ID = os.getenv("DETECTOR_ID", "det_2SagpFUrs83cbMZsap5hZzRjZw4")
 
 
 @pytest.mark.live
Binary file added test/integration/cat.jpg
Binary file added test/integration/dog.jpg
144 changes: 144 additions & 0 deletions test/integration/integration_test.py
@@ -0,0 +1,144 @@
## The integration tests consist of a back and forth between Python (which we use to create and validate
## image queries) and bash (which we use to check that deployments are properly rolled out).
## This file contains all the modes that we use for integration testing.
## Modes:
## - Create the integration test detector
## - Submit the initial dog/cat image query to the edge, expect low confidence
## - Train the edge model by submitting image queries to the cloud.
## - Submit the final dog/cat image query to the edge, expect high confidence

import argparse
import random
import time

from groundlight import Groundlight, GroundlightClientError
from model import Detector

NUM_IQS_TO_IMPROVE_MODEL = 10
[Contributor comment] This name is slightly misleading because I think double this amount of IQs actually get submitted? Maybe it should be renamed to something like NUM_IQS_PER_CLASS_TO_IMPROVE_MODEL, though I don't think it's too important either way.

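(In support of the reviewer's point, a quick arithmetic sketch from the editor; the names come from this file, and the loop in improve_model below submits one cat and one dog image per iteration.)

# Each improve_model iteration submits one YES (cat) and one NO (dog) image query,
# so the cloud actually receives 2 * NUM_IQS_TO_IMPROVE_MODEL = 20 image queries.
TOTAL_IQS_SUBMITTED = 2 * NUM_IQS_TO_IMPROVE_MODEL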
ACCETABLE_TRAINED_CONFIDENCE = 0.8
[Contributor comment] Suggested change, here and wherever this value is used:
- ACCETABLE_TRAINED_CONFIDENCE = 0.8
+ ACCEPTABLE_TRAINED_CONFIDENCE = 0.8



def get_groundlight():
    try:
        return Groundlight(endpoint="http://localhost:30107")
    except GroundlightClientError:
        # We fall back to the cloud client because we create the detector before
        # setting up the edge endpoint, although maybe we want to be more careful
        # here about making sure that's the case we're in.
        return Groundlight()


gl = get_groundlight()


def main():
    parser = argparse.ArgumentParser(
        description="Submit a dog and cat image to k3s Groundlight edge-endpoint for integration tests"
    )
    parser.add_argument(
        "-m",
        "--mode",
        type=str,
        choices=["create_detector", "initial", "improve_model", "final"],
        help="Mode of operation: 'initial', 'many', or 'final'",
[Contributor comment] Should this have "create_detector", "initial", "improve_model", and "final" as the options? Or is this saying something different?
        required=True,
    )
    parser.add_argument("-d", "--detector_id", type=str, help="id of detector to use", required=False)
    args = parser.parse_args()

    detector = None
    if args.detector_id:
        detector = gl.get_detector(args.detector_id)

    if detector is None and args.mode != "create_detector":
        raise ValueError("You must provide detector id unless mode is create detector")

    if args.mode == "create_detector":
        detector_id = create_cat_detector()
        print(detector_id)  # print so that the shell script can save the value
    elif args.mode == "initial":
        submit_initial(detector)
    elif args.mode == "improve_model":
        improve_model(detector)
    elif args.mode == "final":
        submit_final(detector)


def create_cat_detector() -> str:
    """Create the initial cat detector that we use for the integration tests. We create
    a new one each time."""
    random_number = random.randint(0, 9999)
[Contributor comment] Is it worth increasing the range here just to make it even more unlikely that we get a collision? You could also generate a ksuid like we do here to ensure there's no issues.

    detector = gl.create_detector(name=f"cat_{random_number}", query="Is this a cat?")
    detector_id = detector.id
    return detector_id

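(A minimal sketch of the reviewer's suggestion, from the editor rather than the PR: uuid4 from the standard library stands in here for a ksuid, which would additionally be time-sortable.)

# Hypothetical alternative to the 0-9999 suffix: a uuid4 hex fragment makes
# detector name collisions effectively impossible.
import uuid

def make_unique_detector_name(prefix: str = "cat") -> str:
    return f"{prefix}_{uuid.uuid4().hex[:12]}"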

def submit_initial(detector: Detector) -> str:
    """Submit the initial dog and cat image to the edge. Since this method is called at the beginning
    of integration tests, we expect a low confidence from the default edge model."""
    start_time = time.time()
    # 0.5 threshold to ensure we get an edge answer
    iq_yes = _submit_cat(detector, confidence_threshold=0.5)
    iq_no = _submit_dog(detector, confidence_threshold=0.5)
    end_time = time.time()
    print(f"Time taken to get low confidence response from edge: {end_time - start_time} seconds")

    # A bit dependent on the current default model,
    # but that one always defaults to 0.5 confidence at first.
    assert iq_yes.result.confidence == 0.5
    assert iq_no.result.confidence == 0.5
[Contributor comment on lines +86 to +89] At some point we're planning to make the default edge binary pipeline be our normal default binary pipeline, which does make actual zeroshot predictions (which are still close to 0.5, but not exactly 0.5). Maybe this should check that the confidence is in a slightly wider range? I'm worried we won't remember to update this when we change the default edge pipeline.


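(A sketch of the wider-range check the reviewer suggests, from the editor rather than the PR; the 0.1 tolerance is an assumed, illustrative value, not a project constant.)

# Hypothetical tolerance: assert the untrained model is near-chance without
# pinning it to exactly 0.5, so a zeroshot default pipeline would still pass.
ZEROSHOT_TOLERANCE = 0.1  # illustrative value

def assert_untrained_confidence(confidence: float) -> None:
    assert abs(confidence - 0.5) <= ZEROSHOT_TOLERANCE, (
        f"Expected near-chance confidence from the default edge model, got {confidence}"
    )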

def improve_model(detector: Detector):
    """Improve the edge model by escalating to the cloud."""
    for _ in range(NUM_IQS_TO_IMPROVE_MODEL):
        # There's a subtle tradeoff here: we're submitting images from the edge, which
        # will get escalated to the cloud and thus train our model, but this process is slow.
        iq_yes = _submit_cat(detector, confidence_threshold=1, wait=0)
        gl.add_label(image_query=iq_yes, label="YES")
        iq_no = _submit_dog(detector, confidence_threshold=1, wait=0)
        gl.add_label(image_query=iq_no, label="NO")


def submit_final(detector: Detector):
    """This is called at the end of our integration tests to make sure the edge model
    is now confident."""
    # 0.5 threshold to ensure we get an edge answer
    start_time = time.time()
    iq_yes = _submit_cat(detector, confidence_threshold=0.5)
    iq_no = _submit_dog(detector, confidence_threshold=0.5)
    end_time = time.time()
    print(f"Time taken to get high confidence response from edge: {end_time - start_time} seconds")

    assert iq_yes.result.confidence > ACCETABLE_TRAINED_CONFIDENCE
    assert iq_yes.result.label.value == "YES"
    print(f"Final confidence for yes result: {iq_yes.result.confidence}")

    assert iq_no.result.confidence > ACCETABLE_TRAINED_CONFIDENCE
    assert iq_no.result.label.value == "NO"
    print(f"Final confidence for no result: {iq_no.result.confidence}")


def _submit_cat(detector: Detector, confidence_threshold: float, wait: int = None):
    return _submit_dog_or_cat(
        detector=detector, confidence_threshold=confidence_threshold, img_file="./test/integration/cat.jpg", wait=wait
    )


def _submit_dog(detector: Detector, confidence_threshold: float, wait: int = None):
    return _submit_dog_or_cat(
        detector=detector, confidence_threshold=confidence_threshold, img_file="./test/integration/dog.jpg", wait=wait
    )


def _submit_dog_or_cat(detector: Detector, confidence_threshold: float, img_file: str, wait: int = None):
    image_query = gl.submit_image_query(
        detector=detector, confidence_threshold=confidence_threshold, image=img_file, wait=wait
    )

    return image_query


if __name__ == "__main__":
    main()
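
(For readers following the control flow, a hedged end-to-end sketch of how these modes compose; an editor's illustration mirroring run_tests.sh below. In the actual PR the orchestration lives in bash, and the 3 * refresh_rate sleep is the shell script's heuristic.)

import time

def run_end_to_end(refresh_rate: int = 60) -> None:
    detector_id = create_cat_detector()
    detector = gl.get_detector(detector_id)
    submit_initial(detector)      # expect ~0.5 confidence from the default edge model
    improve_model(detector)       # escalate labeled image queries to train the cloud model
    time.sleep(3 * refresh_rate)  # give the edge time to pull the improved model
    submit_final(detector)        # expect confidence above the trained threshold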
48 changes: 48 additions & 0 deletions test/integration/run_tests.sh
@@ -0,0 +1,48 @@
# This script runs integration tests, assuming k3s and detector setup via setup_and_run_tests.sh.
# Run all tests with: > make test-with-k3s
# It combines Python (for image submission) and Bash (for k3s checks).
# The test includes:
# 1) Running pytest live tests for health, readiness, and image submission to the edge.
# 2) Submitting an image to the edge using a cat/dog detector,
# checking for low confidence, training the edge detector via cloud escalation, and
# verifying model improvement in a new edge pod.

# First do basic pytest integration-style tests.
# We skip the async test because we're set up for edge answers.
if ! poetry run pytest -m live -k "not test_post_image_query_via_sdk_want_async"; then
    echo "Error: pytest integration tests failed."
    exit 1
fi

echo "Submitting initial iqs, ensuring we get low confidence at first"
# check that we get low-confidence answers at first
poetry run python test/integration/integration_test.py -m initial -d $DETECTOR_ID

echo "Training detector in the cloud"
# now we improve the model by submitting many iqs and labels
poetry run python test/integration/integration_test.py -m improve_model -d $DETECTOR_ID

# Give the new model time to be pulled. We're a bit generous here.
echo "Now we sleep for $((3 * REFRESH_RATE)) seconds to get a newer model"
sleep $((3 * REFRESH_RATE))
echo "Ensuring a new pod for the deployment $DETECTOR_ID_WITH_DASHES has been created in the last $REFRESH_RATE seconds..."

# Ensure our most recent pod is brand new.
most_recent_pod=$(kubectl get pods -n $DEPLOYMENT_NAMESPACE -l app=inference-server -o jsonpath='{.items[-1].metadata.name}')
current_time=$(date +%s)
pod_creation_time=$(kubectl get pod $most_recent_pod -n $DEPLOYMENT_NAMESPACE -o jsonpath='{.metadata.creationTimestamp}')
pod_creation_time_seconds=$(date -d "$pod_creation_time" +%s)
time_difference=$((current_time - pod_creation_time_seconds))


# Check if the pod was created within 1.1 times the refresh rate
[Contributor comment] Should say 3 times the refresh rate here I think.

if [ $(echo "$time_difference <= $REFRESH_RATE * 3" | bc) -eq 1 ]; then
    echo "A new pod for the deployment $DETECTOR_ID_WITH_DASHES has been created within 3 times the refresh rate."
else
    echo "Error: No new pod for the deployment $DETECTOR_ID_WITH_DASHES has been created within 3 times the refresh rate."
    exit 1
fi

echo "Now we check if the edge model performs well..."
poetry run python test/integration/integration_test.py -m final -d $DETECTOR_ID
echo "All tests pass :D"
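
(The pod-age arithmetic above, restated as a hedged Python sketch from the editor; kubectl's creationTimestamp is RFC 3339 UTC, e.g. "2024-12-06T21:14:03Z".)

from datetime import datetime, timezone

def pod_is_fresh(creation_timestamp: str, refresh_rate: int = 60) -> bool:
    """Return True if the pod was created within 3 * refresh_rate seconds."""
    created = datetime.strptime(creation_timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    age_seconds = (datetime.now(timezone.utc) - created).total_seconds()
    return age_seconds <= 3 * refresh_rate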
81 changes: 81 additions & 0 deletions test/integration/setup_and_run_tests.sh
@@ -0,0 +1,81 @@
#!/bin/bash

# This script sets up the k3s testing environment. Once it has run, you can run the
# live tests, which will hit the API service that was set up.
# Altogether, you can run everything with:
# > make test-with-k3s

if [ -z "$GROUNDLIGHT_API_TOKEN" ]; then
    echo "Error: GROUNDLIGHT_API_TOKEN environment variable is not set."
    exit 1
fi

if ! command -v k3s &> /dev/null; then
    echo "Error: you must have k3s set up"
    exit 1
fi

# First create a detector to use for testing:
export DETECTOR_ID=$(poetry run python test/integration/integration_test.py --mode create_detector)
echo "created detector with id: $DETECTOR_ID"

# set some other environment variables
export PERSISTENT_VOLUME_NAME="test-with-k3s-pv"
export EDGE_ENDPOINT_PORT="30107"
export INFERENCE_FLAVOR="CPU"
export LIVE_TEST_ENDPOINT="http://localhost:$EDGE_ENDPOINT_PORT"
export REFRESH_RATE=60 # not actually different from the default, but we may want to tweak this

# update the config for this detector, such that we always take edge answers
# but first, save the template to a temporary file
cp configs/edge-config.yaml configs/edge-config.yaml.tmp
sed -i "s/detector_id: \"\"/detector_id: \"$DETECTOR_ID\"/" configs/edge-config.yaml
sed -i "s/refresh_rate: 60/refresh_rate: $REFRESH_RATE/" configs/edge-config.yaml

# Now delete the persistent volume first, in case it's in a bad state.
if kubectl get pv "$PERSISTENT_VOLUME_NAME" &> /dev/null; then
    echo "Persistent volume $PERSISTENT_VOLUME_NAME exists. Deleting it..."
    kubectl delete pv "$PERSISTENT_VOLUME_NAME" &
    echo "Persistent volume $PERSISTENT_VOLUME_NAME deleted."
else
    echo "Persistent volume $PERSISTENT_VOLUME_NAME does not exist. No action needed."
fi


export DEPLOYMENT_NAMESPACE="test-with-k3s"
if ! kubectl get namespace $DEPLOYMENT_NAMESPACE &> /dev/null; then
    kubectl create namespace $DEPLOYMENT_NAMESPACE
fi


# Build the Docker image and import it into k3s
echo "Building the Docker image..."
export IMAGE_TAG=$(./deploy/bin/git-tag-name.sh)
./deploy/bin/build-push-edge-endpoint-image.sh dev
./deploy/bin/setup-ee.sh
# restore config file
mv configs/edge-config.yaml.tmp configs/edge-config.yaml

echo "Waiting for edge-endpoint pods to rollout..."

if ! kubectl rollout status deployment/edge-endpoint -n $DEPLOYMENT_NAMESPACE --timeout=5m; then
    echo "Error: edge-endpoint pods failed to rollout within the timeout period."
    exit 1
fi

echo "Edge-endpoint pods have successfully rolled out."

echo "Waiting for the inference deployment to rollout (inferencemodel-$DETECTOR_ID)..."

export DETECTOR_ID_WITH_DASHES=$(echo ${DETECTOR_ID//_/-} | tr '[:upper:]' '[:lower:]')
if ! kubectl rollout status deployment/inferencemodel-$DETECTOR_ID_WITH_DASHES -n $DEPLOYMENT_NAMESPACE --timeout=5m; then
    echo "Error: inference deployment for detector $DETECTOR_ID_WITH_DASHES failed to rollout within the timeout period."
    exit 1
fi
echo "Inference deployment for detector $DETECTOR_ID has successfully rolled out."


./test/integration/run_tests.sh

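(The DETECTOR_ID_WITH_DASHES derivation above, restated as a hedged Python sketch from the editor; the transformation exists because Kubernetes resource names must be lowercase alphanumerics and dashes.)

def deployment_name_for(detector_id: str) -> str:
    """Mirror the shell's underscore-to-dash, lowercased deployment naming."""
    return f"inferencemodel-{detector_id.replace('_', '-').lower()}"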
2 changes: 2 additions & 0 deletions test/validate_setup_ee.sh
@@ -2,6 +2,8 @@
 # basic script to validate that setup_ee works as expected
 export DEPLOYMENT_NAMESPACE="validate-setup-ee"
 export INFERENCE_FLAVOR="CPU"
+export DEPLOY_LOCAL_VERSION="1"
+export EDGE_ENDPOINT_PORT="30107"
 
 kubectl create namespace $DEPLOYMENT_NAMESPACE
 ./deploy/bin/setup-ee.sh