Integration tests #149
base: main
Conversation
…ndpoint into test-with-k3s
Co-authored-by: Tom Faulhaber <[email protected]>
…nto test-with-k3s
…nto test-with-k3s
…ge-endpoint into integration-tests
…ge-endpoint into integration-tests
…ge-endpoint into integration-tests
LGTM with one spelling fix
test/integration/dog.jpg
That's a good dog! 14/10!
from model import Detector

NUM_IQS_TO_IMPROVE_MODEL = 10
ACCETABLE_TRAINED_CONFIDENCE = 0.8
- ACCETABLE_TRAINED_CONFIDENCE = 0.8
+ ACCEPTABLE_TRAINED_CONFIDENCE = 0.8
And wherever this value is used
Left various comments about small things, but overall this is great! Very excited to have this.
"--mode",
type=str,
choices=["create_detector", "initial", "improve_model", "final"],
help="Mode of operation: 'initial', 'many', or 'final'",
Should this have "create_detector", "initial", "improve_model", and "final" as the options? Or is this saying something different?
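One way to keep the help text from drifting out of sync with the choices is to derive both from the same list. A minimal sketch of that idea (the `MODES` variable name is hypothetical, not from the PR):

```python
import argparse

# Hypothetical sketch: build the help string from the same list as
# `choices`, so the two can never disagree.
MODES = ["create_detector", "initial", "improve_model", "final"]

parser = argparse.ArgumentParser()
parser.add_argument(
    "--mode",
    type=str,
    choices=MODES,
    help=f"Mode of operation: {', '.join(repr(m) for m in MODES)}",
)

args = parser.parse_args(["--mode", "initial"])
print(args.mode)  # -> initial
```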
def create_cat_detector() -> str:
    """Create the intial cat detector that we use for the integration tests. We create
    a new one each time."""
    random_number = random.randint(0, 9999)
Is it worth increasing the range here just to make it even more unlikely that we get a collision? You could also generate a ksuid like we do here to ensure there are no issues.
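The reviewer's suggestion could look something like this sketch: a ksuid-style suffix (timestamp plus random characters) instead of a 4-digit number, making collisions between test runs effectively impossible. Names here are hypothetical; a real ksuid library gives stronger guarantees.

```python
import random
import string
import time


def unique_suffix() -> str:
    """Return a ksuid-style suffix: a hex timestamp prefix plus 8 random
    characters. Sketch only; the real code might use an actual ksuid library."""
    timestamp = format(int(time.time()), "x")
    rand = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))
    return f"{timestamp}{rand}"


# Hypothetical detector-name pattern using the suffix.
name = f"cat-detector-{unique_suffix()}"
```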
# a bit dependent on the current default model,
# but that one always defaults to 0.5 confidence at first.
assert iq_yes.result.confidence == 0.5
assert iq_no.result.confidence == 0.5
At some point we're planning to make the default edge binary pipeline be our normal default binary pipeline, which does make actual zeroshot predictions (which are still close to 0.5, but not exactly 0.5). Maybe this should check that the confidence is in a slightly wider range? I'm worried we won't remember to update this when we change the default edge pipeline.
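The reviewer's range-check suggestion could be sketched like this: accept any near-0.5 "don't know" confidence rather than requiring exactly 0.5, so the assertion survives a switch to a zeroshot default pipeline. The helper name and bounds are illustrative, not from the PR.

```python
def assert_unconfident(confidence: float, low: float = 0.35, high: float = 0.65) -> None:
    """Sketch: assert the prediction is unconfident (near 0.5) without
    pinning it to exactly 0.5. Bounds are hypothetical."""
    assert low <= confidence <= high, (
        f"Expected an unconfident (near-0.5) prediction, got {confidence}"
    )


# Passes for the current exact-0.5 default and for near-0.5 zeroshot outputs.
assert_unconfident(0.5)
assert_unconfident(0.52)
```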
from groundlight import Groundlight, GroundlightClientError
from model import Detector

NUM_IQS_TO_IMPROVE_MODEL = 10
This name is slightly misleading because I think double this amount of IQs actually get submitted? Maybe it should be renamed to something like NUM_IQS_PER_CLASS_TO_IMPROVE_MODEL, though I don't think it's too important either way.
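The point of the rename can be sketched in two lines: with a per-class constant, the total number of submitted IQs is explicit at the call site rather than implicit. The names here are hypothetical.

```python
# Hypothetical sketch: one YES and one NO submission per iteration means
# the total is the per-class count times the number of classes.
NUM_IQS_PER_CLASS_TO_IMPROVE_MODEL = 10
CLASSES = ["YES", "NO"]

total_iqs = NUM_IQS_PER_CLASS_TO_IMPROVE_MODEL * len(CLASSES)
print(total_iqs)  # -> 20
```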
time_difference=$((current_time - pod_creation_time_seconds))

# Check if the pod was created within 1.1 times the refresh rate
Should say 3 times the refresh rate here I think.
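The check being discussed can be sketched in Python to make the "3 times the refresh rate" bound explicit (a sketch; the refresh rate value and function name are hypothetical, and the real check is in bash):

```python
import time

REFRESH_RATE_SECONDS = 60  # hypothetical value; the real one comes from config


def pod_recently_refreshed(pod_creation_time_seconds: int, now=None) -> bool:
    """Return True if the pod was (re)created within 3x the refresh rate,
    matching the reviewer's suggested bound."""
    if now is None:
        now = time.time()
    time_difference = now - pod_creation_time_seconds
    return time_difference <= 3 * REFRESH_RATE_SECONDS


print(pod_recently_refreshed(0, now=100))  # 100s <= 180s -> True
print(pod_recently_refreshed(0, now=200))  # 200s >  180s -> False
```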
These tests expand on the basic integration test we had for k3s. They are a bit unusual in that they weave Python and bash together. We test that we can set up the inference pods, submit to the edge and get a low-confidence answer at first, submit to the edge and escalate to the cloud to train the model, pull the updated model from the cloud back down to the edge into a new inference pod, and then make a confident edge prediction.