Fix for memory leak in Python target #1873

Merged
merged 9 commits into master from memory_leak on Jul 11, 2023

Conversation

jackyk02
Collaborator

This pull request addresses a memory leak observed in the current implementation. As shown in the screenshot below, the total size of the Python objects constructed through C extensions grows with each iteration.

[Screenshot: memory profile showing the number and total size of Python objects growing with each iteration]

This problem could be attributed to several key issues, which are as follows:

  1. PyObject Reference Count: The reference counts for the majority of PyObjects are not being appropriately managed.

  2. Port Assignment: Rather than assigning values to the generic_port_instance_struct with the _LF_SET macro, the more suitable approach is to use lf_set_token and define a destructor for the token to ensure proper garbage collection (see the sketch after this list).

  3. Function Definition: The existing _lf_free_token_value function is incorrect and fails to invoke the destructor.
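
For item 2, such a destructor essentially just releases the reference that the token holds on its Python payload. Below is a minimal sketch, borrowing the python_count_decrement name that appears later in this thread and assuming a generic void(void*) callback signature; the actual reactor-c hook may differ.

#include <Python.h>

/* Hypothetical sketch, not the actual reactor-c code: a destructor for
 * tokens that carry Python payloads, called when the token's C-side
 * reference count drops to zero. */
static void python_count_decrement(void* value) {
    /* Release the single reference the token owns on the PyObject.
     * Py_XDECREF tolerates NULL, so an empty token is safe. */
    Py_XDECREF((PyObject*) value);
}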

Notes:

  • The function call Py_XINCREF(message_byte_array) has been removed: PyBytes_FromStringAndSize already returns a new reference that the caller owns, so the extra increment was never balanced and leaked the bytes object (see the sketch below).
  • The macro identifier _LF_GARBAGE_COLLECTED has been renamed to _PYTHON_TARGET_ENABLED, which gives the macro a more descriptive name.
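
The first note rests on a CPython ownership rule: PyBytes_FromStringAndSize hands back a new reference that the caller already owns, so an extra Py_XINCREF is never matched by a decrement and the bytes object can never be freed. A minimal standalone sketch of that pattern, assuming an embedded interpreter and compilation against the CPython headers:

#include <Python.h>

int main(void) {
    Py_Initialize();

    const char raw[] = "payload";
    /* PyBytes_FromStringAndSize returns a *new* reference that the caller
     * already owns (the refcount starts at 1). */
    PyObject* message_byte_array =
        PyBytes_FromStringAndSize(raw, (Py_ssize_t)(sizeof raw - 1));
    if (message_byte_array == NULL) {
        Py_FinalizeEx();
        return 1;
    }

    /* The removed call: a second increment here would never be matched by a
     * corresponding decrement, so the bytes object would leak. */
    /* Py_XINCREF(message_byte_array); */

    Py_DECREF(message_byte_array);  /* balances the constructor's reference */

    return Py_FinalizeEx() < 0 ? 1 : 0;
}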

Other related changes have also been made in the reactor-c repository (LINK).

@petervdonovan
Collaborator

I don't see any problems with this and it matches the description on this PR. Just making sure I understand -- the added Py_XDECREF on line 191 is correct because serialized_pyobject is connected to pointerExpression, which was already used previously by

PyBytes_AsStringAndSize(serialized_pyobject, &"
            + serializedVarName
            + ".buf, &"
            + serializedVarName
            + ".len);\n");

? This would be clearer if the reference count were decremented in generateNetworkSerializerCode, where serialized_pyobject is used. Would that work?

@jackyk02
Collaborator Author

I don't see any problems with this and it matches the description on this PR. Just making sure I understand -- the added Py_XDECREF on line 191 is correct because serialized_pyobject is connected to pointerExpression, which was already used previously by

PyBytes_AsStringAndSize(serialized_pyobject, &"
            + serializedVarName
            + ".buf, &"
            + serializedVarName
            + ".len);\n");

? This would be clearer if the reference count were decremented in generateNetworkSerializerCode, where serialized_pyobject is used. Would that work?

Yes, that is correct. Moving Py_XDECREF(serialized_pyobject) to generateNetworkSerializerCode doesn't work because serialized_pyobject is still used afterwards by the sendingFunction.
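
For readers following along: the constraint is that PyBytes_AsStringAndSize exposes a pointer into the bytes object's internal storage without copying, so the buffer is only valid while serialized_pyobject stays alive. A rough sketch of the shape of the generated C code; serialize_and_send and send_timed_message_stub are hypothetical names, not the actual generated identifiers.

#include <Python.h>

/* Hypothetical stand-in for the generated sending function; the real
 * federated runtime call has a different name and signature. */
static void send_timed_message_stub(const char* buf, Py_ssize_t len) {
    (void) buf;
    (void) len;  /* would hand the bytes to the network layer here */
}

static void serialize_and_send(PyObject* serialized_pyobject) {
    char* buf = NULL;
    Py_ssize_t len = 0;
    /* buf points *into* serialized_pyobject's storage; no copy is made. */
    if (PyBytes_AsStringAndSize(serialized_pyobject, &buf, &len) == 0) {
        send_timed_message_stub(buf, len);  /* buffer must still be alive here */
    }
    /* Only now is it safe to drop the reference; decrementing earlier, e.g.
     * in the serializer, could free the buffer before the send uses it. */
    Py_XDECREF(serialized_pyobject);
}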

@petervdonovan
Collaborator

I may do some rebases in this and in the companion PR to get this up to date with master. This will mess up your version history if you have a divergent version.

I think the only reason why the tests failed was that the submodule was not updated.

@petervdonovan
Collaborator

Here is a minimal reproduction for a test failure:

target Python {
  timeout: 1 sec,
  fast: true
}

reactor SendNumber {
  output out

  reaction(startup) -> out {=
    sm = 99

    out.set(sm/4.0)
  =}
}

reactor Print {
  input in_
  reaction (in_) {=
    print(in_.value)
  =}
}

main reactor {
  m = new SendNumber()
  p = new Print()
  m.out -> p.in_
}

Without the local variable sm, the test passes; without dividing it by 4 on line 12, the test passes. With both of those features in SendNumber there is consistently a segmentation fault.

@petervdonovan
Collaborator

Could this be because

  1. a number object is created in the SendNumber reaction
  2. when we set the port to it, we access the number object in the C code without increasing its reference count
  3. then we return to the reaction and reach the end of it, at which point the number object goes out of scope and its reference count is decremented
  4. the number object gets freed and when we try to access it later, in the downstream reaction, it has been overwritten

This is supported by the fact that if I add back the line Py_INCREF(val); to python_port.c, there is no segmentation fault. I think that Py_INCREF(val); is actually the right thing to do if I understand this right.

IIRC there are some generated reactions in the federated Python target whose bodies are actually C? Maybe we need to decrement reference counts at the ends of those generated reactions to emulate the behavior that I think CPython must have when local variables go out of scope.

@jackyk02
Collaborator Author

jackyk02 commented Jul 1, 2023

Could this be because

  1. a number object is created in the SendNumber reaction
  2. when we set the port to it, we access the number object in the C code without increasing its reference count
  3. then we return to the reaction and reach the end of it, at which point the number object goes out of scope and its reference count is decremented
  4. the number object gets freed and when we try to access it later, in the downstream reaction, it has been overwritten

This is supported by the fact that if I add back the line Py_INCREF(val); to python_port.c, there is no segmentation fault. I think that Py_INCREF(val); is actually the right thing to do if I understand this right.

IIRC there are some generated reactions in the federated Python target whose bodies are actually C? Maybe we need to decrement reference counts at the ends of those generated reactions to emulate the behavior that I think CPython must have when local variables go out of scope.

Hi Peter, your understanding and analysis are correct. Yes, using Py_INCREF(val); is indeed the right thing to do. I've created a new commit (LINK) where I added back Py_INCREF(val). Additionally, I've defined a different destructor for the output port considering the reference count for the output value has now become 3.

Also, I believe python_count_decrement(port->value) is not necessary anymore, given that the reference count has already been decreased in the destructor. Currently, it works because Py_XDECREF allows the input object to be NULL.

I've conducted tests on both singleport and multiport in federated execution with the latest commit, and all the tests have successfully passed. The memory leak issue is resolved for federated execution.
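
For readers following along, the ownership convention being converged on here can be sketched with a hypothetical, simplified port (not the actual generic_port_instance_struct layout): setting the port takes one reference, so the payload survives the end of the reaction body even after the Python local goes out of scope, and the destructor later releases exactly that one reference.

#include <Python.h>

/* Hypothetical simplified port holding a Python payload. */
typedef struct {
    PyObject* value;
} py_port_stub_t;

/* Setting the port takes its own reference (the Py_INCREF(val) discussed
 * above), so the payload outlives the reaction-local variable. */
static void port_set(py_port_stub_t* port, PyObject* val) {
    Py_INCREF(val);            /* the port now owns one reference */
    Py_XDECREF(port->value);   /* drop any previously held payload */
    port->value = val;
}

/* Destructor invoked when the payload is no longer needed: it releases
 * exactly the one reference the port took, no more. */
static void port_clear(py_port_stub_t* port) {
    Py_XDECREF(port->value);
    port->value = NULL;
}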

@petervdonovan
Collaborator

petervdonovan commented Jul 1, 2023

I've created a new commit (LINK) where I added back Py_INCREF(val). Additionally, I've defined a different destructor for the output port considering the reference count for the output value has now become 3.

I saw that commit but am still not sure that I understand it. Decreasing the reference count as much as necessary to ensure that the object is freed looks like it could be hiding an off-by-one error in the reference count introduced elsewhere. It is also not clear to me that it is compatible with the behavior of the Python runtime, which could be holding references that our runtime is not accounting for.

Here is an example of a program that segfaults on my machine (with the latest version of memory_leak_fix checked out in reactor-c, and in the memory_leak branch of lingua-franca):

target Python {
  timeout: 1 sec,
  fast: true
}

reactor SendNumber {
  state sm = {= [] =}
  output out

  timer t(0, 100 msec)

  reaction(t) -> out {=

    out.set(self.sm)
  =}
}

reactor Print {
  input in_
  reaction (in_) {=
    print(in_.value)
  =}
}

main reactor {
  m = new SendNumber()
  p = new Print()
  m.out -> p.in_
}

The output is the following:

[]
[]
[[...]]
Segmentation fault (core dumped)

Please let me know if you are having trouble reproducing this or a similar error.

The problem is that the Python runtime knows that there is a reference to the state variable, but our runtime doesn't know that. Therefore, when our runtime executes the destructor and decreases the reference count all the way to zero, regardless of what the count was before, it is releasing references that it does not own.
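
In CPython terms, a destructor may only release references that it actually owns. A hedged sketch of the two patterns being contrasted here, with hypothetical function names rather than the literal reactor-c code:

#include <Python.h>

/* Problematic pattern described above: drive the count to zero no matter
 * how many owners exist. The final Py_DECREF deallocates the object even
 * though another owner (e.g. the reactor's state variable self.sm) still
 * points to it, which later surfaces as the segmentation fault. */
static void destroy_all_references_BAD(PyObject* value) {
    if (value == NULL) return;
    Py_ssize_t count = Py_REFCNT(value);
    while (count-- > 0) {
        Py_DECREF(value);  /* releases references this code never acquired */
    }
}

/* Correct pattern: release only the single reference that the runtime
 * itself took when the value was assigned to the port. */
static void destroy_own_reference(PyObject* value) {
    Py_XDECREF(value);
}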

@petervdonovan
Collaborator

If you share any steps required to reproduce the leak, then I would be happy to attempt to help. I'll probably have some time to tinker with it in the next few days.

@jackyk02
Collaborator Author

jackyk02 commented Jul 4, 2023

I have successfully reproduced the error you mentioned. I agree that we should not decrease the reference count all the way to zero regardless of what the count was before. The segmentation fault occurring in the provided example is now fixed by this commit (lf-lang/reactor-c@825fe82)

In the commit above, python_count_decrement is still used as the destructor for the output port, and the value assigned to the port only increments its reference count by one.

For reproducing the leak, the following example could be used:

target Python {
    timeout: 10 sec,
    logging: DEBUG
};


preamble {=
    import time
    import tracemalloc
    tracemalloc.start()
=}

reactor ClientReactor {
    input in_parameter
    output out_parameter
    
    reaction(startup) {=
     =}
     
    reaction(in_parameter) -> out_parameter {=
        time.sleep(2)
        param_temp = in_parameter.value
        print(len(param_temp))
        print("RECEIVED-----------------------------------------------------------------------------------------------")
        out_parameter.set(param_temp)
    =}
}

reactor ServerReactor {
    input in_parameter
    output out_parameter
    state large_param
    
    reaction(startup) -> out_parameter {=
        temp = [1.00005]*10
        self.large_param = temp * 100000
        out_parameter.set(self.large_param)
     =}
     
    reaction(in_parameter) -> out_parameter {=
        snapshot = tracemalloc.take_snapshot()
        top_stats = snapshot.statistics("lineno")
        
        print("[ Top 10 ]")
        for stat in top_stats[:10]:
            print(stat)
            
        time.sleep(2)
        param_temp = in_parameter.value
        out_parameter.set(param_temp)
    =}
}

federated reactor {
    client = new ClientReactor()
    server = new ServerReactor()
    server.out_parameter -> client.in_parameter after 0
    client.out_parameter -> server.in_parameter
}

A temporary fix has been implemented in this commit (lf-lang/reactor-c@ec6e4f8). Please let me know if you have trouble reproducing the leak. Thank you for your offer to help :)

@petervdonovan
Collaborator

petervdonovan commented Jul 5, 2023

Thanks Jacky!

I'm assuming that we are interested in the top entry that is marked <unknown>:0:, right? Since that corresponds to allocations in our generated C code? And we see the size and count associated with that entry increasing?

An interesting feature of the MRE that you shared that is different from the screenshot at the top of this post is that it doesn't look like the large object is what is being leaked. In your screenshot, the number of objects goes from 20000226 to 24000256 (a large increase), whereas when I run your MRE locally with reactor-c/825fe82 checked out, the number of objects initially jumps to about a million (as expected, since we send an array of length about a million), and then it jumps to 2 million (as I would expect if we are leaking all the million-long arrays). But after that the number only creeps up slowly, from 2000016 to 2000027 to 2000034, and so on, with the total size stuck at around 61.6 MiB (i.e., not increasing substantially). So it seems like we are maybe not leaking the large objects in this MRE, but instead we are leaking a relatively small number of small objects.

Furthermore, even using reactor-c/ec6e4f8, I still see very similar behavior, with an initial jump to about 2 million objects and 61.6 MiB and then a slow leak of only a small number of objects in each report. I can't really discern the difference, even though the commit messages say 825fe82 leaks whereas ec6e4f8 doesn't leak.

@jackyk02
Collaborator Author

jackyk02 commented Jul 6, 2023

Thanks for testing the MRE. It is indeed a bit confusing as to why the total size appears to remain at around 61.6 MB. Based on my observations, the reference count of port->value is 2 during the first iteration. However, in the following iterations, the reference count of that becomes 1. I'm conducting further investigations to identify which Python C API increments the reference count of port->value.

You are right to point out that small objects seem to be continuously leaking. This could potentially be due to a few lingering temporary PyObjects in the current implementation. Ideally, these should be removed through the garbage collection.

As for the commit reactor-c/825fe82, I'm curious to know whether all the tests passed successfully with it. Furthermore, it appears that unfederated execution only supports the single-threaded runtime...

@petervdonovan
Collaborator

I can't really discern the difference, even though the commit messages say 825fe82 leaks whereas ec6e4f8 doesn't leak.

You are right to point out that small objects seem to be continuously leaking. This could potentially be due to a few lingering temporary PyObjects in the current implementation.

Just making sure this is clear -- I'm still not sure that the MRE actually is an MRE, in the sense that I do not see how it reproduces the leak that is blocking the federated learning project. If the reference count on the big list that is allocated in the reaction body were stuck at 1, then we would be accumulating lots of copies of that big list, which isn't happening. So I still am not sure what I would need to do in order to make sure I have actually reproduced the same bug that is blocking the federated learning project.

@petervdonovan
Collaborator

I'm curious to know if all the tests passed successfully with this commit.

OK, I'll update the submodule and merge in master so that the tests run.

Furthermore, it appears that unfederated execution doesn't support multiport. Do you happen to have an example that demonstrates it?

Hm? Unfederated execution does support multiports. Here are the passing test cases that use multiports without federated execution.

This is expected to result in no segfaults but a leak.
Due to failing enclave tests.
@jackyk02
Collaborator Author

jackyk02 commented Jul 6, 2023

Hm? Unfederated execution does support multiports. Here are the passing test cases that use multiports without federated execution.

To clarify, what I intended to say is that unfederated execution only supports the single-threaded runtime.

So I still am not sure what I would need to do in order to make sure I have actually reproduced the same bug that is blocking the federated learning project.

It appears that the recent commit reactor-c/825fe82 no longer has the memory leak issue for the large list. The bug I mentioned seems to be caused by the while loop in the previous commit, where it decreases the reference count all the way to zero regardless of what the count was before. Thank you for your help, and now we should prioritize ensuring that all tests pass successfully.

Do not decrement the reference count twice.
In my most recent meeting with @jackykwok2024 I believe we determined
that although there are multiple memory leaks, the most severe one is
fixed in the commit being referenced here, and that in any case the
other commit in reactor-c is not less leaky than the one referenced
here. Besides, I have found that in at least one test program a segfault
is fixed by using the commit referenced here (d28a9f5).
@petervdonovan marked this pull request as draft on July 10, 2023 20:04
This seems to work on DelayArrayWithAfter.
@lhstrh changed the title from "Memory Leak during Federated Execution" to "Memory leak in Python target fixed" on Jul 11, 2023
@lhstrh enabled auto-merge on July 11, 2023 00:26
@lhstrh added this pull request to the merge queue on Jul 11, 2023
Merged via the queue into master with commit 045c429 on Jul 11, 2023
42 checks passed
@lhstrh deleted the memory_leak branch on July 11, 2023 04:41
@petervdonovan changed the title from "Memory leak in Python target fixed" to "Fix for memory leak in Python target" on Aug 26, 2023