-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
error_already_set::what() is now constructed lazily #1895
Conversation
Prior to this commit throwing error_already_set was expensive due to the eager construction of the error string (which required traversing the Python stack). See pybind#1853 for more context and an alternative take on the issue. Note that error_already_set no longer inherits from std::runtime_error because the latter has no default constructor.
This looks like a really nice idea! For posterity, can you discuss a little bit what your use case was? (Presumably code that eagerly throws lots of exceptions, etc.) What kind of speedup does this patch yield? I'll let you look into the TODO that is mentioned in your code, that seems potentially a dangerous aspect. Note also that your patch doesn't seem compatible with PyPy in its current iteration. |
Note that I also like the solution in #1853. Are you suggesting that only one of those two PRs is merged? |
This is not supported on PyPy-2.7 5.8.0.
Why not both? :) This one could provide a nice "do the same, but faster" solution that doesn't require much effort from the user (only switching to a combination of The remaining unsolved problem with #1853 is that functions have to either return a holder type holding I tried this PR and here's the continuation of benchmarks from #1853 -- first, throwing the concrete exception (original numbers from that PR):
Then, using
With this PR:
And with #1853, for comparison (original numbers from that PR):
As I see it, this PR introduces a measurable speed improvement, but it's rather insignificant compared to the impact of #1853 ... not sure what's to blame, I guess the overhead of C++'s |
Just to make the intention here clear, the use-case I'm optimizing for is handling CPython API exceptions in C++, e.g. try {
v = dict["foo"];
catch (const py::error_already_set&) {
v = 42;
} which was unnecessarily expensive prior to this PR due to the Unlike #1853 this PR does not aim optimize the translation of a C++ exception to a Python one. In fact, any speedup there is either noise or accidental. |
Isn't it a breaking change? |
I did a quick flamegraph of a variation of the above example at this PR, and the conclusion is that the time of namespace test {
void foo() {
py::dict d{};
py::int_ v;
while (true) {
try { v = py::cast<py::int_>(d["foo"]); }
catch (const py::error_already_set&) { v = py::int_(42); }
}
}
} This makes me wonder if there is a possible design change we could do to get rid of exceptions entirely (see #1703) as they lead to inefficient code (and nobody wants their C++ code to go slow, especially due to exceptions)? The simplest (and the most invasive) thing which comes to mind is to move the exception to the return type in the spirit of |
@sizmailov yes, that change is breaking. I am actually not sure if it's possible to lazy-init the error and keep |
@superbobry Lazyness of virtual const char* what() const noexcept {
try{
if (m_lazy_what.empty()) {
if (m_type || m_value || m_trace)
PyErr_NormalizeException(&m_type.ptr(), &m_value.ptr(), &m_trace.ptr());
m_lazy_what = detail::error_string(m_type.ptr(), m_value.ptr(), m_trace.ptr());
}
return m_lazy_what.c_str();
} catch(...){
return "Unknown internal error occurred during exception handling";
}
} |
This is faster than dynamically looking up __name__ via GetAttrString. Note though that the runtime of the code throwing an error_already_set will be dominated by stack unwinding so the improvement will not be noticeable. Before: 396 ns ± 0.913 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) After: 277 ns ± 0.549 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) Benchmark: const std::string foo() { PyErr_SetString(PyExc_KeyError, ""); const std::string &s = py::detail::error_string(); PyErr_Clear(); return s; } PYBIND11_MODULE(foo, m) { m.def("foo", &::foo); }
Thanks @sizmailov. I've updated the PR with your suggestions. |
The implementation of __name__ is slightly more complex than that. It handles the module name prefix, and heap-allocated types. We could port it to pybind11 later on but for now it seems like an overkill. This reverts commit f1435c7.
Nice! Good idea, @superbobry! One thing I've noticed: since you're now calling Answering your TODO:
I'd argue it is, if you write a line of documentation for the function? And just for the fun of stringing C++ keyword, you can still append Apart from that, is there no way we can nót initialize/store the string in the |
Thanks for the review @YannickJadoul! Looking at the implementation of I've also tried passing a reference to |
Note that detail::error_string() no longer calls PyException_SetTraceback as it is unncessary for pretty-printing the exception.
Hmmm, indeed; more expensive than I expected, probably/possibly. However, I still don't like the idea that Is there a way to also do this lazily (or maybe evaluate
Myeah, I don't know if this is possible. I just noticed that there are now two strings inside of EDIT:
But if we're profiling anyway, how can we actually trigger this function to do its work and measure the overhead? |
On my machine the cost of calling >>> %timeit bar.noop(42)
111 ns ± 0.215 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit bar.normalize(42)
205 ns ± 0.988 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) Where void noop(const py::int_& v) {
}
void normalize(const py::int_& v) {
PyObject *type = PyExc_ValueError;
PyObject *value = v.ptr();
PyObject *trace = nullptr;
PyErr_NormalizeException(&type, &value, &trace);
}
PYBIND11_MODULE(bar, m) {
m.def("noop", &noop);
m.def("normalize", &normalize);
} This seems ~insignificant compared to other overheads, so I've moved normalization to the constructor. Another option is to make normalization an impl detail of |
…1#1895 ``` Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests": =========================================================== test session starts ============================================================ platform linux -- Python 3.9.12, pytest-6.2.3, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python3 cachedir: .pytest_cache rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, configfile: pytest.ini collected 1 item test_shared_ptr_deleter_gil_acquire.py::test_merry_go_round Ant 0 1 Bug 0 2 Cat 0 3 Bug 1 3 Ant 1 3 Bug 2 3 Cat 1 3 Bug 3 3 Ant 2 3 Bug 4 3 Bug 5 3 Ant 3 3 Cat done. Bug 6 2 Bug 7 2 Ant done. Bug 8 1 Bug 9 1 Bug 10 1 Bug 11 1 Bug done. About to: delete raw_ptr; All done: 1.204 PASSED ============================================================ 1 passed in 1.21s ============================================================= ```
@jbms I think I figured it out, based on this small experiment: rwgk/rwgk_tbx@e1e2010 Things got significantly simpler, which is usually a good sign. In particular, the custom All clang sanitizers and valgrind here in the GHA CI are happy. I'll run the Google-global testing with the current state (at commit 53e29f4). |
I'm still not sure I understand the motivation for using
However, I expect the copy constructor to only be used in rare cases (calling
|
It's a bit involved, the primary motivation is to enable this (newly added in commit 80dfaea yesterday):
For background, current master does this:
Two lines of thought from here: 1. Aspect In a 2nd call of
My thinking: If 2. Aspect I.e. it releases the Python references, but that conflicts with the lazy-what feature this PR is about. An earlier version of this PR (at commit e9a2e6d) has this:
The To not lose the performance gain, I decided to hold on the the original references, landing here (current commit 53e29f4):
With that
For this to work we need the Further considerations:
@jbms wrote:
Yes, that was a consideration. Even if rare, I have to think millions of calls per second, 24/7. It will happen. Note that
That wasn't a consideration: |
I forgot to mention:
Yes, that's something we still need to solve (I believe the last hole that needs plugging). It's nice that we don't have to worry about it anymore for the copy ctor, but that was not the motivation for using |
include/pybind11/pytypes.h
Outdated
@@ -12,6 +12,9 @@ | |||
#include "detail/common.h" | |||
#include "buffer_info.h" | |||
|
|||
#include <cstdio> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is stdio still necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short answer: no, thanks for pointing it out!
Long answer: commit message of 8e145c0
``` iwyu version: include-what-you-use 0.17 based on Debian clang version 13.0.1-3+build2 ``` Command used: ``` iwyu -c -std=c++17 -DPYBIND11_TEST_BOOST -Iinclude/pybind11 -I/usr/include/python3.9 -I/usr/include/eigen3 include/pybind11/pytypes.cpp ``` pytypes.cpp is a temporary file: `#include "pytypes.h"` The raw output is very long and noisy. I decided to use `#include <cstddef>` instead of `#include <cstdio>` for `std::size_t` (iwyu sticks to the manual choice). I ignored all iwyu suggestions that are indirectly covered by `#include <Python.h>`. I manually verified that all added includes are actually needed.
Google-global testing passed! (For the state at commit 53e29f4, locally merged into the smart_holder branch; the internal test ID is OCL:448346887:BASE:452610608:1654202021385:c5fceac8). Thanks for the approval @Skylion007! I'll wait for the CI to finish before merging. In the meantime I asked in an internal chatroom about |
Regarding For If In general it seems to me more natural from an API perspective to either drop the references or retain them, but not add extra logic to prevent calling Rethrowing an exception with |
Thanks, I didn't know about that change pre/post C++11. I'm not sure I like it... :-) But bet there were good reasons.
I think of it as a trap that people will fall into at some small-ish rate, wondering how their error got lost, while still developing. I don't think such a bug will survive long in production code, but getting it earlier can save a lot of debugging time.
Yes, and it's only a difference between paying sooner (at the time of the
I'm ambiguous about this one: fundamentally I agree, producing something like
We could easily even keep track of But it would be an additional behavior change. I think the current PR is a meaningful and fairly large step in a good direction. We can revisit that behavior change separately. (safe gil seems more important)
Without the
Absolutely, but our tooling (internal & clang-tidy here) doesn't catch it, therefore it would be a whack-a-mole game trying to squash that bad practice. Even if it's bad practice, I wouldn't want to leave a hole: reference count corruption in large applications tends to be extremely difficult to reproduce and debug. Sorry, way too many buts here! But if we also fold in safe gil, I believe we'll be in the rock solid state that is most important, then we can refine from there. |
)" This reverts commit a05bc3d. # Conflicts: # include/pybind11/pytypes.h
The original motivation for this PR was a performance optimization. See the original PR description below.
After nearly 3 years, work on this PR has led to a few related but distinct behavior changes:
PyErr_NormalizeException()
up a few lines #3971 — Fixes formatting of messages for originally unnormalized Python errors.This PR:
test_flaky_exception_failure_point_init
in test_exceptions.py).test_flaky_exception_failure_point_str
in test_exceptions.py).error_already_set::restore()
multiple times is clearly reported (preempts accidentally clearing an error, or accidentally restoring the same error twice; seetest_error_already_set_double_restore
in test_exceptions.py).error_already_set
copy constructor is no longer racy.detail::error_string()
no longer restores the error message. This fixes (in practice inconsequential) bugs in detail/class.h.The performance gain for exceptions with long tracebacks can be very substantial. See #1895 (comment) for concrete numbers.
A review of the
error_already_set
development history is under #1895 (comment).Original PR description below
Prior to this PR throwing
error_already_set
was expensive due to the eager construction of the error string. The PR changeserror_already_set::what
to construct the error string on first call.Before:
1.7 µs ± 9.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
After:
1.14 µs ± 1.93 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Benchmark:
Note that the running time of the benchmark is dominated by stack unwinding, therefore the speedup is only ~30%.
Suggested changelog entry:
``error_already_set`` is now safer and more performant, especially for exceptions with long tracebacks.