[BUG]: free(): invalid pointer #4137

Closed
2 of 3 tasks
whybeyoung opened this issue Aug 11, 2022 · 8 comments
Labels
triage New bug, unverified

Comments

@whybeyoung

whybeyoung commented Aug 11, 2022

Required prerequisites

Problem description

As in issue #1472, we still hit this problem in 2.10.0:

free(): invalid pointer

C++ code:

#include "pybind11/embed.h"
#include <iostream>
#include <thread>
#include <chrono>
#include <sstream>
#include <vector>

namespace py = pybind11;
using namespace std::chrono_literals;

class Wrapper
{
public:
  Wrapper()
  {
    py::gil_scoped_acquire acquire;
    _obj = py::module::import("wrapper").attr("Wrapper")();
    _wrapperInit = _obj.attr("wrapperInit");
    _wrapperFini = _obj.attr("wrapperFini");

  }

  ~Wrapper()
  {
    _wrapperInit.release();
    _wrapperFini.release();
  }

  int wrapperInit()
  {
    py::gil_scoped_acquire acquire;
    return _wrapperInit(nullptr).cast<int>();
  }

  void wrapperFini(int x)
  {
    py::gil_scoped_acquire acquire;
    _wrapperFini(x);
  }

private:
  py::object _obj;
  py::object _wrapperInit;
  py::object _wrapperFini;
};
void thread_func(int iteration)
{
  Wrapper w;

  for (int i = 0; i < 1; i++)
  {
    w.wrapperInit();
    std::stringstream msg;
    msg << "iteration: " << iteration << " thread: " << std::this_thread::get_id() << std::endl;
    std::cout << msg.str();
    std::this_thread::sleep_for(100ms);
  }
}

int main() {
  py::scoped_interpreter guard{};
  py::gil_scoped_release release; // add this to release the GIL

  std::vector<std::thread> threads;

  for (int i = 0; i < 1; ++i)
    threads.push_back(std::thread(thread_func, 1));

  for (auto& t : threads)
    t.join();

  return 0;
}

The wrapper.py code is:

class Wrapper():
    serviceId = "mmocr"
    version = "backup.0"


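    '''
    Service initialization
    @param config:
        Configuration needed for plugin initialization, as a dict
        key: configuration name
        value: configuration value
    @return
        ret: error code; returns 0 when there is no error
    '''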

    def wrapperInit(cls, config: {}) -> int:
        import torch
        print(config)

        print("Initializing ..")
        return 0

    def wrapperFini(cls) -> int:
        return 0

We run this code in an Ubuntu 18.04 Docker container. The image is public.ecr.aws/iflytek-open/opensource/demo/mmocr:v3.1

Reproducible example code

No response

@whybeyoung whybeyoung added the triage New bug, unverified label Aug 11, 2022
@henryiii
Collaborator

I'm guessing this is #4105.

@henryiii
Collaborator

I verified this is not #4105, this code was broken in 2.9 as well.

@rwgk
Collaborator

rwgk commented Oct 24, 2022

I couldn't reproduce the free(): invalid pointer crash using the code here, but there is certainly a GIL issue that you can confirm by using PR #4146. The problem in the reproducer code is that the GIL is not being held when the destructor for Wrapper::_obj is running. You can "fix" it by adding _obj.release(); in the Wrapper destructor. "Fix" is in quotation marks because this simply leaks the Python reference; "masking" would be a more fitting word. To not leak:

--- main_using_embed_h.cpp.orig 2022-10-23 21:29:46.559375849 -0700
+++ main_using_embed_h.cpp      2022-10-23 21:56:25.089334464 -0700
@@ -21,7 +21,12 @@

   ~Wrapper()
   {
+    py::gil_scoped_acquire hold_gil;
+    _obj.dec_ref();
+    _obj.release();
+    _wrapperInit.dec_ref();
     _wrapperInit.release();
+    _wrapperFini.dec_ref();
     _wrapperFini.release();
   }

I'm closing this bug because it's pretty likely that the free(): invalid pointer has nothing to do with a bug in pybind11.

Until we merge PR #4146, I recommend you patch it locally and run all your tests.

@Davidnet

Davidnet commented Jan 31, 2024

I am encountering this under the same conditions. This is my setup, which can be replicated:

# dummy.py
import torch

def simple_return():
    
    return 1

The simple.cpp:

#include <iostream>
#include <future>
#include <pybind11/embed.h>

namespace py = pybind11;

std::future<int> callPythonFunctionAsync(py::object &pyFunction) {
    return std::async(std::launch::async, [&](){
        py::gil_scoped_acquire acquire;
        int result = pyFunction().cast<int>();
        return result;
    });
}

int main() {
    py::scoped_interpreter guard{}; // Start the interpreter and keep it alive

    // Import the Python module
    py::module pyModule = py::module::import("dummy");
    py::object pyFunction = pyModule.attr("simple_return");

    // Call the function asynchronously
    std::cout << "Calling Python function asynchronously..." << std::endl;
    py::gil_scoped_release release;
    auto futureResult = callPythonFunctionAsync(pyFunction);

    // Wait for the result and print it
    try {
        int result = futureResult.get();
        std::cout << "Result from Python: " << result << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Exception caught: " << e.what() << std::endl;
    }

    return 0;
}

with the following CMakeLists.txt:

cmake_minimum_required(VERSION 3.10)  # Updated minimum required version
project(py_cpp_func)

set(CMAKE_CXX_STANDARD 11)  # Setting C++ standard to C++11
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")

# Manually set Python include directories and libraries
set(PYTHON_INCLUDE_DIR /usr/local/include/python3.10)
set(PYTHON_LIBRARY /usr/local/lib/libpython3.10.so)
include_directories(${PYTHON_INCLUDE_DIR})

# Include pybind11
# Include pybind11 from the external directory
add_subdirectory(external/pybind11)
add_executable(py_dummy simple.cpp)
target_link_libraries(py_dummy PRIVATE ${PYTHON_LIBRARIES} pybind11::embed)
configure_file(dummy.py ${CMAKE_BINARY_DIR}/dummy.py COPYONLY)

and the following Dockerfile:

FROM ubuntu:18.04

RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository ppa:ubuntu-toolchain-r/test && \
    apt-get update && \
    apt-get install -y \
    gcc \
    g++ \
    cmake \
    libboost-all-dev \
    wget

RUN apt-get remove -y cmake && \
    wget https://cmake.org/files/v3.10/cmake-3.10.0-Linux-x86_64.sh && \
    chmod +x cmake-3.10.0-Linux-x86_64.sh && \
    ./cmake-3.10.0-Linux-x86_64.sh --skip-license --prefix=/usr/local

RUN apt-get install -y git

# Clone pybind11 into the external directory
RUN mkdir -p /external && \
    git clone --branch v2.11.1 https://github.com/pybind/pybind11.git /external/pybind11

# Install Python 3.10.13
ENV PYTHON_VERSION 3.10.13

# Install necessary packages
RUN apt-get update && \
    apt-get install -y software-properties-common wget git \
    build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev \
    libssl-dev libsqlite3-dev libreadline-dev libffi-dev curl libbz2-dev liblzma-dev
RUN apt-get install -y libgomp1 libgl1-mesa-glx

# Download Python 3.10 source
RUN cd /tmp && \
    wget https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tar.xz && \
    tar -xf Python-$PYTHON_VERSION.tar.xz

# Compile Python 3.10
RUN cd /tmp/Python-$PYTHON_VERSION && \
    ./configure --enable-optimizations --enable-shared && \
    make -j 8 && \
    make altinstall && \
    ldconfig

# Install pip for Python 3.10
RUN cd /tmp && \
    wget https://bootstrap.pypa.io/get-pip.py && \
    python3.10 get-pip.py && \
    rm get-pip.py

# Install OpenCV for C++
RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y libopencv-dev

WORKDIR /usr/src/three-stage-object-detection
# Install Triton Inference Server
COPY three-stage-object-detection /usr/src/three-stage-object-detection/
RUN python3.10 -m pip install -e .

WORKDIR /usr/src/app
COPY CMakeLists.txt /usr/src/app/
COPY dummy.py /usr/src/app/
COPY simple.cpp /usr/src/app/
RUN mkdir external && \
    ln -s /external/pybind11 external/pybind11
RUN mkdir build && \
    cd build && \
    cmake -DCMAKE_BUILD_TYPE=Debug .. && \
    make

WORKDIR /usr/src/app/build

# Clean up
RUN apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

@rwgk
Collaborator

rwgk commented Jan 31, 2024

> I am encountering this with the same condition; this is my set-up that can be replicated

  • Does this run successfully if you remove import torch?
  • Do you have a stack trace from the crash?
  • I don't think that's it, but I'd make this change:
-std::future<int> callPythonFunctionAsync(py::object &pyFunction)
+std::future<int> callPythonFunctionAsync(py::handle pyFunction)
  • I don't think any of the maintainers will have the time to reproduce the crash. If this is important to you, I recommend you send a PR that adds a .github/workflows/reproducer.yml job to run in GitHub Actions.

  • I really really doubt the root cause is in pybind11.

@Davidnet

Davidnet commented Jan 31, 2024

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff6c7f7f1 in __GI_abort () at abort.c:79
#2  0x00007ffff6cc8837 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff6df5a7b "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff6ccf8ba in malloc_printerr (str=str@entry=0x7ffff6df3c76 "free(): invalid pointer") at malloc.c:5342
#4  0x00007ffff6cd6dec in _int_free (have_lock=0, p=0x7fff280e49a8, av=0x7ffff702ac40 <main_arena>) at malloc.c:4167
#5  __GI___libc_free (mem=0x7fff280e49b8) at malloc.c:3134
#6  0x000055555542c508 in __gnu_cxx::new_allocator<std::_Fwd_list_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::destroy<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (this=0x5555556dae78, __p=0x7fff280e6158) at /usr/include/c++/7/ext/new_allocator.h:140
#7  0x000055555542876b in std::allocator_traits<std::allocator<std::_Fwd_list_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::destroy<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=..., __p=0x7fff280e6158) at /usr/include/c++/7/bits/alloc_traits.h:487
#8  0x000055555542319d in std::_Fwd_list_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_erase_after (this=0x5555556dae78, __pos=0x5555556dae78, __last=0x0) at /usr/include/c++/7/bits/forward_list.tcc:90
#9  0x000055555541e84a in std::_Fwd_list_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::~_Fwd_list_base (this=0x5555556dae78, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/forward_list.h:329
#10 0x000055555541a82c in std::forward_list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::~forward_list (this=0x5555556dae78, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/forward_list.h:559
#11 0x000055555540fb3b in pybind11::detail::internals::~internals (this=0x5555556dacd0, __in_chrg=<optimized out>) at /external/pybind11/include/pybind11/detail/internals.h:207
#12 0x0000555555419629 in pybind11::finalize_interpreter () at /external/pybind11/include/pybind11/embed.h:263
#13 0x00005555554196ea in pybind11::scoped_interpreter::~scoped_interpreter (this=0x7fffffffe533, __in_chrg=<optimized out>) at /external/pybind11/include/pybind11/embed.h:308
#14 0x0000555555407d2d in main () at /usr/src/app/simple.cpp:16

I got this backtrace. Also, the program runs without crashing if I update the Docker base image to Ubuntu 20.04.

Someone on Gitter helped me get the trace.

@Davidnet

> > I am encountering this with the same condition; this is my set-up that can be replicated
>
>   • Does this run successfully if you remove import torch?
>   • Do you have a stack trace from the crash?
>   • I don't think that's it, but I'd make this change:
> -std::future<int> callPythonFunctionAsync(py::object &pyFunction)
> +std::future<int> callPythonFunctionAsync(py::handle pyFunction)
>   • I don't think any of the maintainers will have the time to reproduce the crash. If this is important to you, I recommend you send a PR that adds a .github/workflows/reproducer.yml job to run in GitHub Actions.
>   • I really really doubt the root cause is in pybind11.

If I do not import torch, the code works, so it is definitely something with torch.

@rwgk
Collaborator

rwgk commented Jan 31, 2024

> If I do not import torch, the code works, so it is definitely something with torch.

I'd work on sending them a PR that reproduces the crash.
