Erikk/ban 375 inference time to potassium #33
Conversation
potassium/potassium.py
Outdated
@@ -219,14 +225,22 @@ def handle(path):

@flask_app.route('/__status__', methods=["GET"])
def status():
    idle_time = 0
    inference_time = int((time.time() - self._inference_start_time)*1000)
Shouldn't inference_time be 0 if there's no inference currently running? Right now, if I'm reading this correctly, it will pretty much always be > 0.
Worth noting that there's a minor race condition to watch out for when implementing this as well, since there will be a window where either the inference start time hasn't been updated yet or the lock hasn't been acquired yet.
Good question; it depends on whether we want to count the cold boot into the inference time. Here I opted to include it, reasoning that if someone (for some weird reason) has a very long cold boot or is stuck in an infinite loop, the timeout would still kick in and kill the replica. So from a user point of view: "I set this value in the UI to 15 min" => "no replica can run longer than 15 min, and thus I'm not billed for more than 15 min."
Does this sound reasonable?
Offline discussion: let's scope this to only the inference itself, not the cold boot.
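The agreed behavior (report elapsed time only while an inference is actually running, 0 otherwise) can be sketched as a small stand-alone tracker. This is a minimal sketch, not the PR's implementation; the class and method names are assumptions chosen to mirror the attribute names in the diff.

```python
import time


class StatusTracker:
    """Sketch: track the last inference start and report elapsed
    inference time only while an inference is in flight."""

    def __init__(self):
        # None means "no inference currently running"
        self._last_inference_start_time = None

    def start_inference(self):
        self._last_inference_start_time = time.time()

    def end_inference(self):
        self._last_inference_start_time = None

    def inference_time_ms(self):
        # 0 when idle, so cold boot and idle periods are not counted
        if self._last_inference_start_time is None:
            return 0
        return int((time.time() - self._last_inference_start_time) * 1000)
```

A `/__status__` handler would then just serialize `inference_time_ms()` into its JSON response.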
potassium/potassium.py
Outdated
@@ -162,6 +166,8 @@ def _handle_generic(self, endpoint, flask_request):
    res = make_response(tb_str)
    res.status_code = 500
    res.headers['X-Endpoint-Type'] = endpoint.type
    self._idle_start_time = time.time()
    self._inference_start_time = 0
We're essentially using _inference_start_time = 0 as a sentinel value here, which I don't think is ideal (a start time of 0 ms since the epoch doesn't really make semantic sense).
We may want to consider renaming this value to _last_inference_start_time so we don't feel the need to update it when releasing the lock.
Yes, very good point. The latest change:
- renames the var to _last_inference_start_time
- sets it to None if it's not initialised
- still sends a 0 in the JSON response if _last_inference_start_time is None
LGTM - there's a minor edge case where status can report 0 inference time while it's exiting an inference job, but for our use case I think it's good enough. The solution would probably be adding a secondary mutex, which is probably overkill.
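The "secondary mutex" idea mentioned above (guarding the timestamp so status never observes a half-updated state) could be sketched as follows. This is hypothetical and not part of the PR; names are assumptions.

```python
import threading
import time


class GuardedTracker:
    """Sketch: a dedicated lock around the timestamp so that the
    status endpoint never reads it mid-update."""

    def __init__(self):
        self._state_lock = threading.Lock()
        self._last_inference_start_time = None

    def start_inference(self):
        with self._state_lock:
            self._last_inference_start_time = time.time()

    def end_inference(self):
        with self._state_lock:
            self._last_inference_start_time = None

    def inference_time_ms(self):
        # The lock makes the read atomic with respect to start/end updates.
        with self._state_lock:
            if self._last_inference_start_time is None:
                return 0
            return int((time.time() - self._last_inference_start_time) * 1000)
```

As the comment notes, for a single timestamp read this is likely overkill; the unguarded version only misreports briefly at inference boundaries.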
🐑
inference_time = 0
gpu_available = not self._gpu_lock.locked()

if self._last_inference_start_time != None:
Nit: prefer `is not None` in Python.
Ah, missed this before merging. I'll add that in another PR and merge it so we get it as well 👍
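The reason behind the nit: `!= None` goes through `__eq__`/`__ne__`, which a class can override, while `is not None` is an identity check that cannot be fooled. A small illustration (the class here is purely hypothetical):

```python
class AlwaysEqual:
    """Pathological class that claims equality with everything,
    showing why comparing to None with != is unreliable."""

    def __eq__(self, other):
        return True


x = AlwaysEqual()

# `!=` consults __eq__/__ne__, so x wrongly compares equal to None:
looks_like_none = (x != None) is False

# `is not None` checks identity and gives the right answer:
definitely_not_none = x is not None
```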
* Add pyright + __status__ endpoint (#29)
  * wip
  * add pyright
  * clean up
  * simplify pyright
  * optimize build
  * fixes per code review
* add basic tests (#30)
* Feature/track gpu status (#31)
  * track gpu status
  * fix potential deadlock with failed background task
  * add sequence number
  * update _read_event_chan to match previous API
  * fixes per code review
* Erikk/ban 375 inference time to potassium (#33)
  * add idle time status
  * inference timeout
  * need to thread
  * improve tests
  * inference time not including cold boot time
  * small change
  * changed var name and set to None (instead of zero) if not set
* version bump

Co-authored-by: Aaron Peddle <[email protected]>
Co-authored-by: Aaron Peddle <[email protected]>
Co-authored-by: Erik Kaunismäki <[email protected]>
What is this?
Why?
Part of https://linear.app/banana-dev/issue/BAN-358/add-timeout-on-replicas-that-stay-unavailable-above-threshold
How did you test it works without regressions?
If this is a new feature what may a critical error look like?
Things to consider to not repeat mistakes we've learned from many times