-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create RFC: Handling XLA Exceptions in Python #435
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
RFC: Handling XLA Exceptions in Python | ||
|
||
Status Propose | ||
Author Vinay Kumar Burugu | ||
Collaborator Loki Ravi | ||
PR https://github.com/tensorflow/community/pull/435 | ||
|
||
Objective | ||
|
||
The aim of this RFC is to: | ||
|
||
1. Any exception/status code thrown by XLA to be uniquely named (independent of TensorFlow). | ||
2. XLA exceptions to be readily available to front-end frameworks so that they can be exposed to users. | ||
|
||
Motivation | ||
|
||
1. Though XLA was developed as a part of Tensorflow, it is also used by PyTorch, JAX, etc. There is a need for exceptions thrown by XLA to be named independent of Tensorflow. | ||
2. Today, XLA defines XlaRuntimeError in the xla client. However, this exception is not readily available to front-ends. Refer to this (https://github.com/google/jax/pull/10676) effort in JAX to expose this Exception to users. | ||
|
||
User Benefit | ||
|
||
1. *Simplified contract with front-ends*: Add a single line of code to front-ends to import XLA exceptions. For eg: from import xla.exceptions as xla_exceptions. | ||
2. *Better exception names in XLA*: Derive readable exceptions like XlaInvalidArgumentError from XlaError to replace the existing ValueError. Additionally, this would make it easier to catch/trap Exceptions from XLA and attribute errors to XLA. | ||
|
||
Design Proposal | ||
|
||
This proposal would replace existing std:: and py:: exceptions thrown by the CPP layers with new custom exceptions which will all be extended from XlaError base class. All status objects originating from XLA will be replaced by XLA specific status codes. Equivalent python exceptions will be defined under a common module xla.exceptions which will contain translations for status objects (previously TensorFlow exceptions (https://github.com/tensorflow/tensorflow/blob/87462bfac761435a46641ff2f10ad0b6e5414a4b/tensorflow/python/framework/errors_impl.py)) as well as direct throws from XLA (previously std library exceptions). | ||
|
||
This proposal introduces an XlaError base class and all exceptions are extended from this class. | ||
|
||
class XlaError : public std::Exception { | ||
public: | ||
char * what () { | ||
return "Custom XLA Exception"; | ||
} | ||
} | ||
|
||
We will create different exception classes based on the need by extending this base class. For example: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assume initially "need" is defined as existing TensorFlow exceptions that are currently used? What is the intended use for this exception hierarchy? You could also achieve the goal you have defined by just having an I acknowledge that this is currently not very principled in which exception is thrown but if we make all thrown exceptions XLA specific anyway, we could unify the use now. Hence I'd like to better understand the use case of sub-exceptions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, this change was propsed to covert std exceptions currently being thrown with unique XLA exceptions. If all exceptions throws XlaError, that would suffice. |
||
|
||
class XlaInvalidValueError : public XlaError { | ||
public: | ||
char * what () { | ||
return "Invalid value provided."; | ||
} | ||
} | ||
|
||
We will expose the exceptions under a common module xla.exceptions for use by front-ends. We will additionally modify TensorFlow and PyTorch front-ends to import xla exceptions as import xla.exceptions as xla_exceptions | ||
|
||
The effort can be categorized into: | ||
|
||
1. Replacing existing status objects | ||
|
||
XlaInternalError (Currently a TensorFlow status object) | ||
|
||
Current implementation[Link to code (https://github.com/tensorflow/tensorflow/blob/d1a41c236defc6dd05e6fb31dddea8e0a8b53b96/tensorflow/compiler/xla/runtime/execution_engine.cc#:~:text=return%20InternalError(%22exported%20function%20not%20found%3A%20%25s%22%2C%20function_name)%3B)]: | ||
|
||
return InternalError("exported function not found: %s", function_name); | ||
|
||
Proposed implementation: | ||
|
||
return XlaInternalError("exported function not found: %s", function_name); | ||
|
||
Python UX: | ||
|
||
import xla.exceptions as xla_exceptions | ||
try: | ||
#Some line of code | ||
except xla_exceptions.XlaInternalError: | ||
pass | ||
|
||
2. Replacing existing use of std:: and py:: exceptions in the CPP layer | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Using standard exceptions has the advantage that they can be treated uniformly. I understand that goal of making errors attributable to XLA but in the specific instance called out below, I am not convinced this is the right approach. So I suggest we only replace exceptions that are an exception of the XLA api in a narrow sense and keep standard exceptions when they model python behavior. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is your take on this @vinayburugu? It seems wrong to me to replace these exceptions from a python API perspective. |
||
|
||
XlaAttributeError (Currently a py:: lib exception) | ||
|
||
Current implementation[Link to code (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/python/xla.cc#:~:text=throw%20py%3A%3Aattribute_error(absl%3A%3AStrCat(%22Unknown%20attribute%20%22%2C%20name))%3B)]: | ||
|
||
throw py::attribute_error(absl::StrCat("Unknown attribute ", name)); | ||
|
||
Proposed implementation: | ||
|
||
throw XlaAttributeError(absl::StrCat("Unknown attribute ", name)) | ||
|
||
|
||
Python UX: | ||
|
||
import xla.exceptions as xla_exceptions | ||
try: | ||
#Some line of code | ||
except xla_exceptions.XlaAttributeError: | ||
pass | ||
|
||
3. Exposing them under a common module at the Python layer | ||
|
||
We will register these exceptions in Python using pybind. | ||
|
||
PYBIND11_MODULE(xla.exceptions, m) { | ||
auto py_xlaerror = py::register_exception<XlaError>(m, "XlaError"); | ||
py::register_exception<XlaInvalidArgumentError>(m, "XlaInvalidArgumentError", py_xlaerror.ptr()); | ||
py::register_exception<XlaInternalError>(m, "XlaInternalError", py_xlaerror.ptr()); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what you mean by
All status objects originating from XLA will be replaced by XLA specific status codes
? Do you want to expose the literals in a separate, XLA specific python module or modify them? Do you want a single exception that covers all status codes or sub-exceptions for each?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Status Object updates the error codes from errorcodes.proto which are common to both XLA and tensorflow. I wanted to use unique codes for XLA and create corresponding python Exceptions for those error codes so that we see unique XLA exceptions corresponding to those new error codes added. All this effort is being proposed just to identify whether the root cause is framework or XLA instantly based on the Python Error / Error code etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the error codes get wrapped into an 'XlaError' exception at the XLA API boundary, why would we need different error numbers? I agree we should expose the numbers from XLA, as well, but it is not clear to me why they could not be the same numbers?