Audit confident edge predictions #156

Status: Open. Wants to merge 5 commits into base branch `main`.
4 changes: 4 additions & 0 deletions CONFIGURING-DETECTORS.md
@@ -26,6 +26,10 @@ The global config contains parameters that affect the overall behavior of the edge endpoint
 
 `refresh_rate` is a float that defines how often the edge endpoint will attempt to fetch updated ML models (in seconds). If you expect a detector to frequently have a better model available, you can reduce this to ensure that the improved models will quickly be fetched and deployed. For example, you may want to label many image queries on a new detector. A higher refresh rate will ensure that the latest model improvements from these labels are promptly deployed to the edge. In practice, you likely won't want this to be lower than ~30 seconds due to the time it takes to train and fetch new models. If not specified, the default is 60 seconds.
+
+#### `confident_audit_rate`
+
+`confident_audit_rate` is a float that defines the probability that any given confident prediction will be escalated to the cloud for auditing. This enables the accuracy of the edge model to be evaluated in the cloud even when it answers queries confidently. If a detector is configured to have cloud escalation disabled, this parameter will be ignored. If not specified, the default value is 0.001 (meaning there is a 0.1% chance that a confident prediction will be audited).
 
 ### `edge_inference_configs`
 
 Edge inference configs are 'templates' that define the behavior of a detector on the edge. Each detector you configure will be assigned one of these templates. There are some predefined configs that represent the main ways you might want to configure a detector. However, you can edit these and also create your own as you wish.
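As a rough back-of-the-envelope illustration (not part of the PR; the helper function and example rates below are our own), audit volume scales linearly with query throughput:

```python
# Illustrative sketch only. Assumes every query is answered confidently,
# which is the worst case for audit volume.
def expected_audits_per_hour(queries_per_second: float, confident_audit_rate: float = 0.001) -> float:
    return queries_per_second * 3600 * confident_audit_rate

print(expected_audits_per_hour(0.1))  # 0.36 audits/hour at one query every 10 seconds
print(expected_audits_per_hour(5))    # 18 audits/hour at 5 FPS, i.e. one audit every ~3.3 minutes
```

The 5 FPS figure is worth keeping in mind for the review discussion on audit volume further down.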
35 changes: 30 additions & 5 deletions app/api/routes/image_queries.py
@@ -1,4 +1,5 @@
 import logging
+import random
 from typing import Literal, Optional
 
 from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, Query, Request, status
@@ -145,7 +146,7 @@ async def post_image_query( # noqa: PLR0913, PLR0915, PLR0912
         ml_confidence = results["confidence"]
 
         is_confident_enough = ml_confidence >= confidence_threshold
-        if return_edge_prediction or is_confident_enough: # return the edge prediction
+        if return_edge_prediction or is_confident_enough: # Return the edge prediction
             if return_edge_prediction:
                 logger.debug(f"Returning edge prediction without cloud escalation. {detector_id=}")
             else:
@@ -164,9 +165,34 @@ async def post_image_query( # noqa: PLR0913, PLR0915, PLR0912
                 text=results["text"],
             )
 
-        # Escalate after returning edge prediction if escalation is enabled and we have low confidence
-        if not disable_cloud_escalation and not is_confident_enough:
-            # Only escalate if we haven't escalated on this detector too recently
+        # Skip cloud operations if escalation is disabled
+        if disable_cloud_escalation:
+            return image_query
+
+        if is_confident_enough: # Audit confident edge predictions at the specified rate
+            if random.random() < app_state.edge_config.global_config.confident_audit_rate:
+                logger.debug(
+                    f"Auditing confident edge prediction with confidence {ml_confidence} for detector {detector_id=}."
+                )
+                background_tasks.add_task(
+                    safe_call_sdk,
+                    gl.submit_image_query,
+                    detector=detector_id,
+                    image=image_bytes,
+                    wait=0,
+                    patience_time=patience_time,
+                    confidence_threshold=confidence_threshold,
+                    human_review="ALWAYS", # Require human review for audited queries so we can evaluate accuracy
+                    want_async=True,
+                    metadata={"is_edge_audit": True}, # Provide metadata to identify edge audits in the cloud
+                )
+
+            # Don't want to escalate to cloud again if we're already auditing the query
+            return image_query
+
+        # Escalate after returning edge prediction if escalation is enabled and we have low confidence.
+        if not is_confident_enough:
+            # Only escalate if we haven't escalated on this detector too recently.
             if app_state.edge_inference_manager.escalation_cooldown_complete(detector_id=detector_id):
                 logger.debug(
                     f"Escalating to cloud due to low confidence: {ml_confidence} < thresh={confidence_threshold}"
@@ -189,7 +215,6 @@ async def post_image_query( # noqa: PLR0913, PLR0915, PLR0912
                 )
 
         return image_query
-
     # -- Edge-inference is not available --
     else:
         # Create an edge-inference deployment record, which may be used to spin up an edge-inference server.

Review thread on the `metadata={"is_edge_audit": True}` line:

Reviewer: On the cloud, audits get a lower review priority than escalations. How, if at all, is that handled here? When combined with the audit rate and the sheer number of audited images, this is a potential area of concern.

Member: Yeah, we do want to treat edge audits in the same way as cloud audits, which will likely take some backend work.

Contributor Author: With the current code, the 'audit' escalations to the cloud will be treated as normal escalations, so they won't have the reduced priority that cloud audits do. I tried to find a way for these escalations to be counted as audits, but I'm pretty sure there's no way to do it without making backend changes (and potentially SDK changes as well). Given that, I thought it could be worth introducing edge audits via this simple approach first, but we could go with the more complicated approach instead if that seems worthwhile.
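As a note on the gating logic above: the audit decision is one Bernoulli trial per confident prediction. A minimal sketch (ours, not code from the PR) that extracts the gate into a pure function, which would also make the sampling behavior easy to unit-test:

```python
import random

def should_audit(confident_audit_rate: float) -> bool:
    """Return True with probability `confident_audit_rate`."""
    return random.random() < confident_audit_rate

# Sanity check: the long-run audited fraction approaches the configured rate.
random.seed(0)
n = 1_000_000
print(sum(should_audit(0.001) for _ in range(n)) / n)  # ~0.001
```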
5 changes: 5 additions & 0 deletions app/core/configs.py
@@ -11,6 +11,10 @@ class GlobalConfig(BaseModel):
         default=60.0,
         description="The interval (in seconds) at which the inference server checks for a new model binary update.",
     )
+    confident_audit_rate: float = Field(
+        default=0.001,
+        description="The probability that any given confident prediction will be sent to the cloud for auditing.",
+    )
 
 
 class EdgeInferenceConfig(BaseModel):
@@ -84,6 +88,7 @@ def validate_inference_configs(self):
     {
         'global_config': {
             'refresh_rate': 60.0,
+            'confident_audit_rate': 0.001,
         },
         'edge_inference_configs': {
             'default': EdgeInferenceConfig(
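For reference, a standalone sketch (mirroring the model above, not the actual module) showing how the pydantic default behaves:

```python
from pydantic import BaseModel, Field

class GlobalConfig(BaseModel):
    refresh_rate: float = Field(default=60.0)
    confident_audit_rate: float = Field(default=0.001)

print(GlobalConfig())  # refresh_rate=60.0 confident_audit_rate=0.001 when the config omits them
print(GlobalConfig(confident_audit_rate=0.01).confident_audit_rate)  # 0.01 when overridden
```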
3 changes: 2 additions & 1 deletion configs/edge-config.yaml
@@ -1,7 +1,8 @@
 # For configuring detectors on the edge endpoint. See CONFIGURING-DETECTORS.md for more information.
 
 global_config: # These settings affect the overall behavior of the edge endpoint.
-  refresh_rate: 60 # How often to attempt to fetch updated ML models (in seconds). If not set, defaults to 60.
+  refresh_rate: 60 # How often to attempt to fetch updated ML models (in seconds). Defaults to 60.
+  confident_audit_rate: 0.001 # Probability that a confident prediction will be sent to cloud for auditing. Defaults to 0.001.
 
 edge_inference_configs: # These configs define detector-specific behavior and can be applied to detectors below.
   default: # Return the edge model's prediction if sufficiently confident; otherwise, escalate to the cloud.

Review thread on the `confident_audit_rate` line:

Reviewer: So, what's the standard FPS rate on an edge detector (binary)? If it's 5, then that's a human-required escalation for audit every 3.5 minutes or so, I think. Seems like too much.

Contributor Author: Definitely a good point! It's hard to say what the standard FPS rate is, since there's such a wide range of use cases, which makes it hard to pick a rate that works for all of them. Something approximating video processing (or a setup with multiple cameras going to a single detector) could definitely be 5 FPS or more. For high-FPS use cases the rate could be configured to be lower (though I don't think we could expect a customer to do that without direct guidance from us). A concern with lowering it too much is that it becomes much less useful for lower-FPS cases.

Member: We could instead audit on a timer, so our audit frequency is independent of FPS, but that's no longer random sampling. What do you think, Paulina?

Reviewer: I think that's fine.

Contributor Author (@CoreyEWood, Jan 6, 2025): I think the timer would add some additional complexity with our multiple-worker setup, because each worker would have its own timer unless we made the timer external to the processes somehow. That's possible, just maybe more complicated than we'd want.
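To make the timer idea from the thread concrete, here is a hedged sketch (ours, not proposed code; `MIN_AUDIT_INTERVAL_S` is a hypothetical knob) that bounds audit frequency independently of FPS while keeping random sampling. Note the per-process state, which is exactly the multiple-worker caveat raised above: with N worker processes, the effective cap becomes N audits per interval.

```python
import random
import time

MIN_AUDIT_INTERVAL_S = 60.0  # hypothetical knob, not an existing config option
_last_audit_time = 0.0  # per-process state; each worker process would keep its own copy

def should_audit(confident_audit_rate: float) -> bool:
    """Random sampling, capped at one audit per MIN_AUDIT_INTERVAL_S in this process."""
    global _last_audit_time
    now = time.monotonic()
    if now - _last_audit_time < MIN_AUDIT_INTERVAL_S:
        return False
    if random.random() < confident_audit_rate:
        _last_audit_time = now
        return True
    return False
```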