Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] [Unity] Fail to print out call trace under relax::Normalizer #15880

Closed
jinhongyii opened this issue Oct 5, 2023 · 7 comments · Fixed by #15955
Closed

[Bug] [Unity] Fail to print out call trace under relax::Normalizer #15880

jinhongyii opened this issue Oct 5, 2023 · 7 comments · Fixed by #15955
Labels
branch: unity needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type: bug

Comments

@jinhongyii
Copy link
Contributor

jinhongyii commented Oct 5, 2023

Expected behavior

printing out call trace when encountering ICHECK failure

Actual behavior

python/tvm/relax/block_builder.py:645: in normalize
    return _ffi_api.BlockBuilderNormalize(self, expr)  # type: ignore
python/tvm/_ffi/_ctypes/packed_func.py:239: in __call__
    raise_last_ffi_error()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def raise_last_ffi_error():
        """Raise the previous error from FFI
    
        This should be used instead of `raise get_last_ffi_error()`, as it
        handle propagation of errors across an FFI boundary.  For example,
        if Python passes a callback to a C++ function, and the callback
        raises an exception, the re-thrown exception should contain the
        full stack trace, not just the stack frames that are above the
        outermost FFI call.
        """
    
        _LIB.TVMGetLastPythonError.restype = ctypes.c_void_p
        _LIB.TVMGetLastBacktrace.restype = ctypes.c_char_p
        py_err = _LIB.TVMGetLastPythonError()
        if py_err is None:
            c_err_msg = py_str(_LIB.TVMGetLastError())
            py_err_msg, err_type = c2pyerror(c_err_msg)
            if err_type is not None and err_type.startswith("tvm.error."):
                err_type = err_type[10:]
            py_err = ERROR_TYPE.get(err_type, TVMError)(py_err_msg)
    
        else:
            # TVMGetLastPythonError returns a PyObject*, with NULL when
            # there is no such value.  If we annotated the restype as
            # ctypes.py_object, we would need to return Py_None from the
            # C++ implementation.  This would require introducing a
            # dependency on libpython that we want to avoid when not in a
            # Python environment.  Therefore, casting the resulting void*
            # pointer to PyObject* using ctypes.
            py_err = ctypes.cast(ctypes.c_void_p(py_err), ctypes.py_object).value
    
        tb = py_err.__traceback__
    
        # The py_err.__traceback__ only goes from the location thrown
        # up to the next FFI handoff.  To have the stacktrace also
        # include the C++ side, we need to adjust the __traceback__
        # before re-throwing.
        backtrace = _LIB.TVMGetLastBacktrace()
        if backtrace:
            frames = re.split(r"\n\W+\d+:\W+", py_str(backtrace))
            frames = frames[1:]  # Skip "Stack trace: "
    
            for frame in frames:
                if " at " in frame:
                    print(frame)
                    func_name, frame = frame.split(" at ", 1)
>                   filename, lineno = frame.rsplit(":", 1)
E                   ValueError: not enough values to unpack (expected 2, got 1)

Environment

TVM Unity TOT

Steps to reproduce

Comment out this line:
https://github.com/apache/tvm/blob/unity/src/relax/op/tensor/linear_algebra.cc#L150
and run tests/python/relax/test_op_linear_algebra.py

if you print out frame here, you will see

tvm::relax::BlockBuilderImpl::ReportFatal(tvm::Diagnostic const&)
        at /home/hongyi/tvm/src/relax/ir/block_builder.cc:138
tvm::relax::GetTensorStructInfoFromTuple(tvm::relax::Call const&, tvm::relax::BlockBuilder const&, tvm::RelayExpr const&)
        at /home/hongyi/tvm/src/relax/op/op_common.cc:64
tvm::relax::InferStructInfoEinsum(tvm::relax::Call const&, tvm::relax::BlockBuilder const&)
        at /home/hongyi/tvm/src/relax/op/tensor/linear_algebra.cc:173
tvm::relax::Normalizer::InferStructInfo(tvm::relax::Call const&)
        at /home/hongyi/tvm/src/relax/ir/block_builder.cc:757
tvm::relax::Normalizer::VisitExpr_(tvm::relax::CallNode const*)
        at /home/hongyi/tvm/src/relax/ir/block_builder.cc:599
non-virtual thunk to tvm::relax::Normalizer::VisitExpr_(tvm::relax::CallNode const*)
        at /home/hongyi/tvm/src/relax/ir/block_builder.cc

cc @quic-sanirudh

@jinhongyii jinhongyii added type: bug needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it branch: unity labels Oct 5, 2023
@tqchen
Copy link
Member

tqchen commented Oct 5, 2023

cc @Lunderberg

@tqchen
Copy link
Member

tqchen commented Oct 5, 2023

seems is related to #15596

@Lunderberg
Copy link
Contributor

Thank you, and I'll take a look at it later this week. When catching in python, it resolved some out-of-order issues, but I hadn't checked if it also caused others.

@Lunderberg
Copy link
Contributor

Ah, looks like the string isn't collecting the full stack track, only the python object. This becomes a problem if the error is caught in a context where the string is printed.

@jinhongyii
Copy link
Contributor Author

Hi @Lunderberg just want to check if there's any progress on this issue

@Lunderberg
Copy link
Contributor

Unfortunately, I haven't been able to reproduce the error so far on my side using your example. This was attempted in builds with and without debug symbols, with and without optimization enabled. While commenting out the FInferStructInfo did trigger the exception to be raised from Normalizer::InferStructInfo(const Call&), it didn't trigger the error in the FFI error propagation.

What compiler are you using? I'm currently running on g++ 11.3.0, and maybe that causes a difference.

Lunderberg added a commit to Lunderberg/tvm that referenced this issue Oct 20, 2023
Prior to this commit, the FFI error propagation in Python assumed that
any stack frame with a known file name also had a known line number.
This commit updates the error propagation to instead check if a known
line number is present.

Closes apache#15880.
@Lunderberg
Copy link
Contributor

That said, the print(frame) output in your bug report is very useful, and I think I have a fix implemented. Can you test with #15955 applied, and see if the error still occurs?

junrushao pushed a commit that referenced this issue Oct 30, 2023
…15955)

Prior to this commit, the FFI error propagation in Python assumed that
any stack frame with a known file name also had a known line number.
This commit updates the error propagation to instead check if a known
line number is present.

Closes #15880.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch: unity needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants