Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration tests #149

Open
wants to merge 42 commits into
base: main
Choose a base branch
from
Open

Integration tests #149

wants to merge 42 commits into from

Conversation

roxanne-o
Copy link
Contributor

@roxanne-o roxanne-o commented Dec 9, 2024

These tests expand on the basic integration we had for k3s. They are a bit unusual as they weave python and bash together. We test that we can setup the inference pods, submit to edge and get a low confidence answer at first, submit to edge and escalate to cloud to train the model, pull the updated model back down to the cloud into a new inference pod, and now make a confident edge prediction.

Copy link
Contributor

@tomfaulhaber tomfaulhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one spelling fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good dog! 14/10!

from model import Detector

NUM_IQS_TO_IMPROVE_MODEL = 10
ACCETABLE_TRAINED_CONFIDENCE = 0.8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ACCETABLE_TRAINED_CONFIDENCE = 0.8
ACCEPTABLE_TRAINED_CONFIDENCE = 0.8

And wherever this value is used

Copy link
Contributor

@CoreyEWood CoreyEWood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some various comments about small things, but overall this is great! Very excited to have this.

"--mode",
type=str,
choices=["create_detector", "initial", "improve_model", "final"],
help="Mode of operation: 'initial', 'many', or 'final'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have "create_detector", "initial", "improve_model", and "final" as the options? Or is this saying something different?

def create_cat_detector() -> str:
"""Create the intial cat detector that we use for the integration tests. We create
a new one each time."""
random_number = random.randint(0, 9999)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth increasing the range here just to make it even more unlikely that we get a collision? You could also generate a ksuid like we do here to ensure there's no issues.

Comment on lines +86 to +89
# a bit dependent on the current default model,
# but that one always defaults to 0.5 confidence at first.
assert iq_yes.result.confidence == 0.5
assert iq_no.result.confidence == 0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we're planning to make the default edge binary pipeline be our normal default binary pipeline, which does make actual zeroshot predictions (which are still close to 0.5, but not exactly 0.5). Maybe this should check that the confidence is in a slightly wider range? I'm worried we won't remember to update this when we change the default edge pipeline.

from groundlight import Groundlight, GroundlightClientError
from model import Detector

NUM_IQS_TO_IMPROVE_MODEL = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is slightly misleading because I think double this amount of IQs actually get submitted? Maybe it should be renamed to something likeNUM_IQS_PER_CLASS_TO_IMPROVE_MODEL, though I don't think it's too important either way.

time_difference=$((current_time - pod_creation_time_seconds))


# Check if the pod was created within 1.1 times the refresh rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should say 3 times the refresh rate here I think.

Base automatically changed from test-with-k3s to main December 17, 2024 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants