Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create RFC: Handling XLA Exceptions in Python #435

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
101 changes: 101 additions & 0 deletions rfcs/RFC: Handling XLA Exceptions in Python
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
RFC: Handling XLA Exceptions in Python

Status Propose
Author Vinay Kumar Burugu
Collaborator Loki Ravi
PR https://github.com/tensorflow/community/pull/435

Objective

The aim of this RFC is to:

1. Any exception/status code thrown by XLA to be uniquely named (independent of TensorFlow).
2. XLA exceptions to be readily available to front-end frameworks so that they can be exposed to users.

Motivation

1. Though XLA was developed as a part of Tensorflow, it is also used by PyTorch, JAX, etc. There is a need for exceptions thrown by XLA to be named independent of Tensorflow.
2. Today, XLA defines XlaRuntimeError in the xla client. However, this exception is not readily available to front-ends. Refer to this (https://github.com/google/jax/pull/10676) effort in JAX to expose this Exception to users.

User Benefit

1. *Simplified contract with front-ends*: Add a single line of code to front-ends to import XLA exceptions. For eg: from import xla.exceptions as xla_exceptions.
2. *Better exception names in XLA*: Derive readable exceptions like XlaInvalidArgumentError from XlaError to replace the existing ValueError. Additionally, this would make it easier to catch/trap Exceptions from XLA and attribute errors to XLA.

Design Proposal

This proposal would replace existing std:: and py:: exceptions thrown by the CPP layers with new custom exceptions which will all be extended from XlaError base class. All status objects originating from XLA will be replaced by XLA specific status codes. Equivalent python exceptions will be defined under a common module xla.exceptions which will contain translations for status objects (previously TensorFlow exceptions (https://github.com/tensorflow/tensorflow/blob/87462bfac761435a46641ff2f10ad0b6e5414a4b/tensorflow/python/framework/errors_impl.py)) as well as direct throws from XLA (previously std library exceptions).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean by All status objects originating from XLA will be replaced by XLA specific status codes? Do you want to expose the literals in a separate, XLA specific python module or modify them? Do you want a single exception that covers all status codes or sub-exceptions for each?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Status Object updates the error codes from errorcodes.proto which are common to both XLA and tensorflow. I wanted to use unique codes for XLA and create corresponding python Exceptions for those error codes so that we see unique XLA exceptions corresponding to those new error codes added. All this effort is being proposed just to identify whether the root cause is framework or XLA instantly based on the Python Error / Error code etc.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the error codes get wrapped into an 'XlaError' exception at the XLA API boundary, why would we need different error numbers? I agree we should expose the numbers from XLA, as well, but it is not clear to me why they could not be the same numbers?


This proposal introduces an XlaError base class and all exceptions are extended from this class.

class XlaError : public std::Exception {
public:
char * what () {
return "Custom XLA Exception";
}
}

We will create different exception classes based on the need by extending this base class. For example:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume initially "need" is defined as existing TensorFlow exceptions that are currently used? What is the intended use for this exception hierarchy? You could also achieve the goal you have defined by just having an XlaError as the only exception that is thrown at the interface boundary to XLA. That exception could carry a status code, message and stack trace to aid debugging.

I acknowledge that this is currently not very principled in which exception is thrown but if we make all thrown exceptions XLA specific anyway, we could unify the use now. Hence I'd like to better understand the use case of sub-exceptions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this change was propsed to covert std exceptions currently being thrown with unique XLA exceptions. If all exceptions throws XlaError, that would suffice.


class XlaInvalidValueError : public XlaError {
public:
char * what () {
return "Invalid value provided.";
}
}

We will expose the exceptions under a common module xla.exceptions for use by front-ends. We will additionally modify TensorFlow and PyTorch front-ends to import xla exceptions as import xla.exceptions as xla_exceptions

The effort can be categorized into:

1. Replacing existing status objects

XlaInternalError (Currently a TensorFlow status object)

Current implementation[Link to code (https://github.com/tensorflow/tensorflow/blob/d1a41c236defc6dd05e6fb31dddea8e0a8b53b96/tensorflow/compiler/xla/runtime/execution_engine.cc#:~:text=return%20InternalError(%22exported%20function%20not%20found%3A%20%25s%22%2C%20function_name)%3B)]:

return InternalError("exported function not found: %s", function_name);

Proposed implementation:

return XlaInternalError("exported function not found: %s", function_name);

Python UX:

import xla.exceptions as xla_exceptions
try:
#Some line of code
except xla_exceptions.XlaInternalError:
pass

2. Replacing existing use of std:: and py:: exceptions in the CPP layer
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using standard exceptions has the advantage that they can be treated uniformly. I understand that goal of making errors attributable to XLA but in the specific instance called out below, I am not convinced this is the right approach.
The __getattr__ method is a standard python method defined on an object that, while returned from an XLA API, is potentially used in contexts independent of the XLA api. The contract for that method is to throw the standard exception so that the object behaves like one would expect of a python object.

So I suggest we only replace exceptions that are an exception of the XLA api in a narrow sense and keep standard exceptions when they model python behavior.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your take on this @vinayburugu? It seems wrong to me to replace these exceptions from a python API perspective.


XlaAttributeError (Currently a py:: lib exception)

Current implementation[Link to code (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/python/xla.cc#:~:text=throw%20py%3A%3Aattribute_error(absl%3A%3AStrCat(%22Unknown%20attribute%20%22%2C%20name))%3B)]:

throw py::attribute_error(absl::StrCat("Unknown attribute ", name));

Proposed implementation:

throw XlaAttributeError(absl::StrCat("Unknown attribute ", name))


Python UX:

import xla.exceptions as xla_exceptions
try:
#Some line of code
except xla_exceptions.XlaAttributeError:
pass

3. Exposing them under a common module at the Python layer

We will register these exceptions in Python using pybind.

PYBIND11_MODULE(xla.exceptions, m) {
auto py_xlaerror = py::register_exception<XlaError>(m, "XlaError");
py::register_exception<XlaInvalidArgumentError>(m, "XlaInvalidArgumentError", py_xlaerror.ptr());
py::register_exception<XlaInternalError>(m, "XlaInternalError", py_xlaerror.ptr());
}