Integration tests #149
base: main

test/integration/integration_test.py
@@ -0,0 +1,144 @@
## The integration tests consist of a back and forth between Python (which we use to create and validate
## image queries) and bash (which we use to check that deployments are properly rolled out).
## This file contains all the modes that we use for integration testing.
## Modes:
## - Create the integration test detector
## - Submit the initial dog/cat image query to the edge, expect low confidence
## - Train the edge model by submitting image queries to the cloud
## - Submit the final dog/cat image query to the edge, expect high confidence
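## Example invocations (illustrative; these mirror how run_tests.sh and setup_and_run_tests.sh drive the modes):
##   poetry run python test/integration/integration_test.py --mode create_detector
##   poetry run python test/integration/integration_test.py -m initial -d $DETECTOR_ID
##   poetry run python test/integration/integration_test.py -m improve_model -d $DETECTOR_ID
##   poetry run python test/integration/integration_test.py -m final -d $DETECTOR_ID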

import argparse
import random
import time
from typing import Optional

from groundlight import Groundlight, GroundlightClientError
from model import Detector

# One cat IQ and one dog IQ are submitted per iteration, so twice this many IQs are sent in total.
NUM_IQS_PER_CLASS_TO_IMPROVE_MODEL = 10
ACCEPTABLE_TRAINED_CONFIDENCE = 0.8


def get_groundlight():
    try:
        return Groundlight(endpoint="http://localhost:30107")
    except GroundlightClientError:
        # We use this fallback to create a detector, since we create one before setting up the
        # edge endpoint. Maybe we should be more careful about verifying that's the case we're in.
        return Groundlight()


gl = get_groundlight()


def main():
    parser = argparse.ArgumentParser(
        description="Submit a dog and cat image to k3s Groundlight edge-endpoint for integration tests"
    )
    parser.add_argument(
        "-m",
        "--mode",
        type=str,
        choices=["create_detector", "initial", "improve_model", "final"],
        help="Mode of operation: 'create_detector', 'initial', 'improve_model', or 'final'",
        required=True,
    )
    parser.add_argument("-d", "--detector_id", type=str, help="id of detector to use", required=False)
    args = parser.parse_args()

    detector = None
    if args.detector_id:
        detector = gl.get_detector(args.detector_id)

    if detector is None and args.mode != "create_detector":
        raise ValueError("You must provide a detector id unless the mode is 'create_detector'")

    if args.mode == "create_detector":
        detector_id = create_cat_detector()
        print(detector_id)  # print so that the shell script can save the value
    elif args.mode == "initial":
        submit_initial(detector)
    elif args.mode == "improve_model":
        improve_model(detector)
    elif args.mode == "final":
        submit_final(detector)


def create_cat_detector() -> str:
    """Create the initial cat detector that we use for the integration tests. We create
    a new one each time."""
    random_number = random.randint(0, 9999)

Review comment: Is it worth increasing the range here, just to make it even more unlikely that we get a collision? You could also generate a ksuid, like we do elsewhere in the codebase, to ensure there are no issues.
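
    # A hypothetical sketch per the review note above (uuid4 used as a stdlib stand-in for a
    # ksuid; the random suffix makes name collisions effectively impossible):
    #   import uuid
    #   random_suffix = uuid.uuid4().hex[:12]
    #   detector = gl.create_detector(name=f"cat_{random_suffix}", query="Is this a cat?")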

    detector = gl.create_detector(name=f"cat_{random_number}", query="Is this a cat?")
    detector_id = detector.id
    return detector_id


def submit_initial(detector: Detector):
    """Submit the initial dog and cat images to the edge. Since this method is called at the beginning
    of integration tests, we expect low confidence from the default edge model."""
    start_time = time.time()
    # 0.5 threshold to ensure we get an edge answer
    iq_yes = _submit_cat(detector, confidence_threshold=0.5)
    iq_no = _submit_dog(detector, confidence_threshold=0.5)
    end_time = time.time()
    print(f"Time taken to get low confidence response from edge: {end_time - start_time} seconds")

    # A bit dependent on the current default model,
    # but that one always defaults to 0.5 confidence at first.
    assert iq_yes.result.confidence == 0.5
    assert iq_no.result.confidence == 0.5

Review comment (on the two assertions above): At some point we're planning to make the default edge binary pipeline be our normal default binary pipeline, which does make actual zero-shot predictions (still close to 0.5, but not exactly 0.5). Maybe this should check that the confidence is in a slightly wider range? I'm worried we won't remember to update this when we change the default edge pipeline.
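
    # A hypothetical variant addressing the review note above (the 0.45-0.55 band is
    # illustrative, not a value from this PR):
    #   assert 0.45 <= iq_yes.result.confidence <= 0.55
    #   assert 0.45 <= iq_no.result.confidence <= 0.55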


def improve_model(detector: Detector):
    """Improve the edge model by escalating image queries to the cloud."""
    for _ in range(NUM_IQS_PER_CLASS_TO_IMPROVE_MODEL):
        # There's a subtle tradeoff here: submitting through the edge means the IQs get
        # escalated to the cloud and thus train our model, but this process is slow.
        iq_yes = _submit_cat(detector, confidence_threshold=1, wait=0)
        gl.add_label(image_query=iq_yes, label="YES")
        iq_no = _submit_dog(detector, confidence_threshold=1, wait=0)
        gl.add_label(image_query=iq_no, label="NO")


def submit_final(detector: Detector):
    """This is called at the end of our integration tests to make sure the edge model
    is now confident."""
    # 0.5 threshold to ensure we get an edge answer
    start_time = time.time()
    iq_yes = _submit_cat(detector, confidence_threshold=0.5)
    iq_no = _submit_dog(detector, confidence_threshold=0.5)
    end_time = time.time()
    print(f"Time taken to get high confidence response from edge: {end_time - start_time} seconds")

    assert iq_yes.result.confidence > ACCEPTABLE_TRAINED_CONFIDENCE
    assert iq_yes.result.label.value == "YES"
    print(f"Final confidence for yes result: {iq_yes.result.confidence}")

    assert iq_no.result.confidence > ACCEPTABLE_TRAINED_CONFIDENCE
    assert iq_no.result.label.value == "NO"
    print(f"Final confidence for no result: {iq_no.result.confidence}")


def _submit_cat(detector: Detector, confidence_threshold: float, wait: Optional[int] = None):
    return _submit_dog_or_cat(
        detector=detector, confidence_threshold=confidence_threshold, img_file="./test/integration/cat.jpg", wait=wait
    )


def _submit_dog(detector: Detector, confidence_threshold: float, wait: Optional[int] = None):
    return _submit_dog_or_cat(
        detector=detector, confidence_threshold=confidence_threshold, img_file="./test/integration/dog.jpg", wait=wait
    )


def _submit_dog_or_cat(detector: Detector, confidence_threshold: float, img_file: str, wait: Optional[int] = None):
    image_query = gl.submit_image_query(
        detector=detector, confidence_threshold=confidence_threshold, image=img_file, wait=wait
    )
    return image_query


if __name__ == "__main__":
    main()

test/integration/run_tests.sh
@@ -0,0 +1,48 @@
#!/bin/bash

# This script runs the integration tests, assuming k3s and the detector were set up via setup_and_run_tests.sh.
# Run all tests with: > make test-with-k3s
# It combines Python (for image submission) and Bash (for k3s checks).
# The test includes:
# 1) Running pytest live tests for health, readiness, and image submission to the edge.
# 2) Submitting an image to the edge using a cat/dog detector,
#    checking for low confidence, training the edge detector via cloud escalation, and
#    verifying model improvement in a new edge pod.

# First do basic pytest integration-style tests.
# We skip the async test because we're set up for edge answers.
if ! poetry run pytest -m live -k "not test_post_image_query_via_sdk_want_async"; then
    echo "Error: pytest integration tests failed."
    exit 1
fi

echo "Submitting initial IQs, ensuring we get low confidence at first"
# Submit initial IQs to check that we get low confidence answers at first.
poetry run python test/integration/integration_test.py -m initial -d $DETECTOR_ID

echo "Training detector in the cloud"
# Now we improve the model by submitting many IQs and labels.
poetry run python test/integration/integration_test.py -m improve_model -d $DETECTOR_ID

# Give the new model time to be pulled. We're a bit generous here.
echo "Now we sleep for $((3 * REFRESH_RATE)) seconds to get a newer model"
sleep $((3 * REFRESH_RATE))
echo "Ensuring a new pod for the deployment $DETECTOR_ID_WITH_DASHES has been created in the last $((3 * REFRESH_RATE)) seconds..."

# Ensure our most recent pod is brand new.
most_recent_pod=$(kubectl get pods -n $DEPLOYMENT_NAMESPACE -l app=inference-server -o jsonpath='{.items[-1].metadata.name}')
current_time=$(date +%s)
pod_creation_time=$(kubectl get pod $most_recent_pod -n $DEPLOYMENT_NAMESPACE -o jsonpath='{.metadata.creationTimestamp}')
pod_creation_time_seconds=$(date -d "$pod_creation_time" +%s)  # convert the ISO 8601 timestamp to epoch seconds
time_difference=$((current_time - pod_creation_time_seconds))

# Check if the pod was created within 3 times the refresh rate.
if [ $(echo "$time_difference <= $REFRESH_RATE * 3" | bc) -eq 1 ]; then
    echo "A new pod for the deployment $DETECTOR_ID_WITH_DASHES has been created within 3 times the refresh rate."
else
    echo "Error: No new pod for the deployment $DETECTOR_ID_WITH_DASHES has been created within 3 times the refresh rate."
    exit 1
fi
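
# Note: both values are integers, so plain shell arithmetic would work here too, without the
# dependency on bc: [ "$time_difference" -le $((REFRESH_RATE * 3)) ]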

echo "Now we check if the edge model performs well..."
poetry run python test/integration/integration_test.py -m final -d $DETECTOR_ID
echo "All tests pass :D"

setup_and_run_tests.sh
@@ -0,0 +1,81 @@
#!/bin/bash

# This script sets up the k3s testing environment. Once it has run, you can run the
# live tests, which will hit the API service that got set up.
# Altogether, you can run everything with:
# > make test-with-k3s

if [ -z "$GROUNDLIGHT_API_TOKEN" ]; then
    echo "Error: GROUNDLIGHT_API_TOKEN environment variable is not set."
    exit 1
fi

if ! command -v k3s &> /dev/null; then
    echo "Error: you must have k3s set up"
    exit 1
fi

# First create a detector to use for testing:
export DETECTOR_ID=$(poetry run python test/integration/integration_test.py --mode create_detector)
echo "created detector with id: $DETECTOR_ID"

# Set some other environment variables.
export PERSISTENT_VOLUME_NAME="test-with-k3s-pv"
export EDGE_ENDPOINT_PORT="30107"
export INFERENCE_FLAVOR="CPU"
export LIVE_TEST_ENDPOINT="http://localhost:$EDGE_ENDPOINT_PORT"
export REFRESH_RATE=60  # not actually different from the default, but we may want to tweak this

# Update the config for this detector, such that we always take edge answers.
# But first, save the template to a temporary file.
cp configs/edge-config.yaml configs/edge-config.yaml.tmp
sed -i "s/detector_id: \"\"/detector_id: \"$DETECTOR_ID\"/" configs/edge-config.yaml
sed -i "s/refresh_rate: 60/refresh_rate: $REFRESH_RATE/" configs/edge-config.yaml

# Now delete the persistent volume, in case it's in a bad state.
if kubectl get pv "$PERSISTENT_VOLUME_NAME" &> /dev/null; then
    echo "Persistent volume $PERSISTENT_VOLUME_NAME exists. Deleting it..."
    kubectl delete pv "$PERSISTENT_VOLUME_NAME" &  # backgrounded, so a hung delete doesn't block the script
    echo "Requested deletion of persistent volume $PERSISTENT_VOLUME_NAME."
else
    echo "Persistent volume $PERSISTENT_VOLUME_NAME does not exist. No action needed."
fi

export DEPLOYMENT_NAMESPACE="test-with-k3s"
if ! kubectl get namespace $DEPLOYMENT_NAMESPACE &> /dev/null; then
    kubectl create namespace $DEPLOYMENT_NAMESPACE
fi

# Build the Docker image and import it into k3s.
echo "Building the Docker image..."
export IMAGE_TAG=$(./deploy/bin/git-tag-name.sh)
./deploy/bin/build-push-edge-endpoint-image.sh dev
./deploy/bin/setup-ee.sh
# Restore the config file.
mv configs/edge-config.yaml.tmp configs/edge-config.yaml

echo "Waiting for edge-endpoint pods to roll out..."

if ! kubectl rollout status deployment/edge-endpoint -n $DEPLOYMENT_NAMESPACE --timeout=5m; then
    echo "Error: edge-endpoint pods failed to roll out within the timeout period."
    exit 1
fi

echo "Edge-endpoint pods have successfully rolled out."

echo "Waiting for the inference deployment to roll out (inferencemodel-$DETECTOR_ID)..."

# Kubernetes resource names must be lowercase and cannot contain underscores, hence this transformation.
export DETECTOR_ID_WITH_DASHES=$(echo ${DETECTOR_ID//_/-} | tr '[:upper:]' '[:lower:]')
if ! kubectl rollout status deployment/inferencemodel-$DETECTOR_ID_WITH_DASHES -n $DEPLOYMENT_NAMESPACE --timeout=5m; then
    echo "Error: inference deployment for detector $DETECTOR_ID_WITH_DASHES failed to roll out within the timeout period."
    exit 1
fi
echo "Inference deployment for detector $DETECTOR_ID has successfully rolled out."

./test/integration/run_tests.sh