Tracing mixed Python / C++ code? #46
Hi @drewm1980, I'm glad you're considering HawkTracer :) I've never written code for tracing Python, and I haven't heard of anybody who has. I also had no plans to do this in the near future. However, since there's a potential use for it, I'm very happy to implement that functionality, or at least assist you with doing it. |
We're writing soft real-time software for robotic pick and place machines, with cycle times usually 100-200 ms. We'd like to see visually when operations take longer than we expect, when threads are contending, when the CPU is idle while waiting for the GPU, etc., in the name of reducing the worst-case latency of analyzing one frame of data. Python is our glue language; it calls into C++ code we wrote, and into TensorFlow.

Some other profilers we're looking at:

- Uber's pyflame and Ben Frederickson's py-spy (sampling, mixed Python/native) are very promising, but they're very new (as in, Ben added the native call-stack unwinding parts only last week) and I had issues getting them working in our environment.
- easy_profiler (intrusive, native only, with a GUI) has been around longer, but I couldn't get the client and the code to connect.
- We may try perf + cpuprofilify (sampling, native).
- I evaluated VTune; it has some Python integration, but it doesn't seem to have a fully functional timeline view; as a tool it seems more focused on reducing overall CPU time than on minimizing worst-case latency.
- There are also big libraries like LTTng and SystemTap for system-wide profiling, and big libraries for HPC work, but it's not clear how to even get started with them.
- Nvidia's and AMD's profilers are understandably focused on profiling their GPUs, and don't show much, if anything, for the CPU.

Anyway, thanks for the response! We did get HawkTracer working on a C++ function in our code; thanks for writing it! |
Hi @drewm1980, thanks for the explanation. So it looks like you're basically interested in low-overhead, intrusive, function-level time profiling for more than just the native stack - that's what HawkTracer was designed for, as our use case was very similar, except that instead of Python we had Lua/JavaScript. I'm very happy to work with you on getting Python bindings to work in HawkTracer. I see three possible solutions for that:

1) Explicit Start/Stop calls:

```python
def foo():
    hawktracer.Start("some_label")
    # your python code here
    hawktracer.Stop()
```

2) Tracepoints as comments:

```python
def foo():
    # TracepointStart some_label
    # your python code here
    # TracepointStop
```

Then you have a generator that scans the code and turns it into the code shown in 1). The advantage of this solution is that in production code you can very easily disable HawkTracer completely, and just have a special build target where you enable those tracepoints. The disadvantage of this solution is complexity, as you need to have 2 build targets, a script for scanning and fixing the code, etc.

3) Another option, which reduces the overhead (to almost zero) in production code, is to keep the code as it was in 1), and change the C++ implementation of the Start()/Stop() methods to empty functions if you don't want to run the profiler (it could be e.g. some kind of

Regarding 1) and 2) - if you want to measure a scope, we could have a class with a deleter, and measure the scope that way. For 1):

```python
def foo():
    with HawkTracerTracepoint("label") as tp:
        # your python code
```

For 2):

```python
def foo():
    # TraceScope label
    # your python code
```

and then have a script that turns the code into 1). We could also mix 1)/2) with 3). Please let me know what your thoughts are on that, and whether you have any other ideas :) |
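The "class with a deleter" idea above maps naturally onto a Python context manager. A minimal sketch (an assumption on my part, not the actual HawkTracer API: the `events` list stands in for the real native start/stop bindings):

```python
# Sketch: a context manager that brackets a traced region.
# `events` is a placeholder for the real HawkTracer start/stop calls.
events = []

class HawkTracerTracepoint:
    def __init__(self, label):
        self.label = label

    def __enter__(self):
        events.append(("start", self.label))  # stand-in for the native start call
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append(("stop", self.label))   # stop runs even if the body raises
        return False  # don't swallow exceptions

def foo():
    with HawkTracerTracepoint("some_label"):
        pass

foo()
print(events)  # [('start', 'some_label'), ('stop', 'some_label')]
```

A nice property of the `with` form is that the stop call fires on any exit path, including exceptions, which manual Start()/Stop() pairs don't guarantee.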
I'm not sure what the most ergonomic or efficient API for Python is... function decorators are likely the pythonic way to do it. The interpreter already has some mechanism for instrumenting at the function level, though, through cProfile. If there's a way of setting up HawkTracer hooks for specific functions at module import time, I bet it would be more efficient than having Python decorator code call the C++ functions through an FFI at function call time... but keep in mind I know very little about how the Python interpreter works internally. cProfile has a reputation for high overhead, though we haven't measured it on our project yet (that's one of our next steps). Maybe restricting it to only instrument functions you care about (and logging them through HawkTracer so the log is shared with native code) would be fast enough for most uses. We've already spent probably too much dev time on tooling; I'm not sure how much more time we can spend in the short term. We'll likely focus on improving coverage of the C++ parts of our code for now. |
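The interpreter hook mentioned above (the mechanism cProfile is built on) can be restricted to a whitelist of functions via `sys.setprofile`. A sketch, under the assumption that the tracer exposes start/stop bindings (the `events` list is a placeholder for them):

```python
# Sketch: use the interpreter's built-in profiling hook to forward
# call/return events for selected functions to a tracer.
import sys

WATCHED = {"foo"}   # only instrument functions we care about
events = []         # placeholder for the native start/stop calls

def _profile_hook(frame, event, arg):
    name = frame.f_code.co_name
    if name not in WATCHED:
        return
    if event == "call":
        events.append(("start", name))
    elif event == "return":
        events.append(("stop", name))

sys.setprofile(_profile_hook)

def foo():
    pass

foo()
sys.setprofile(None)  # uninstall the hook
print(events)  # [('start', 'foo'), ('stop', 'foo')]
```

Note the hook fires for every Python call, so even with the name filter it pays a per-call cost, which is exactly the overhead concern raised above.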
Then how about we implement a decorator, something like:

```python
import os
import HawkTracer

def HawkTracerTrace(func):
    if HawkTracerTrace.tracing_enabled is None:
        HawkTracerTrace.tracing_enabled = 'HAWKTRACER_TRACE' in os.environ
        print(HawkTracerTrace.tracing_enabled)
    def wrapper(*args, **kwargs):
        if HawkTracerTrace.tracing_enabled is None:
            HawkTracerTrace.tracing_enabled = 'HAWKTRACER_TRACE' in os.environ
        if HawkTracerTrace.tracing_enabled:
            HawkTracer.start(func.__name__)
            func(*args, **kwargs)
            HawkTracer.stop()
        else:
            func(*args, **kwargs)
    return wrapper

HawkTracerTrace.tracing_enabled = None
```

Then whenever you want to instrument your function, you can do so:

```python
@HawkTracerTrace
def func1():
    print("Func1()")

@HawkTracerTrace
def func2():
    print("Func2()")
```

? |
Hi, I think it's better to do the os.environ check outside of the wrapper, and if it's false, just return the function unmodified (I believe the decorator is evaluated when the interpreter hits the "@"). Otherwise you're adding control flow and environment variable checks even when instrumentation is supposed to be turned off. How about:

in hawktracer.py:

```python
#!/usr/bin/env python3
import os

#_tracing_enabled = False
_tracing_enabled = True
if 'HAWKTRACER_TRACE' in os.environ:
    _tracing_enabled = True

def enable():
    _tracing_enabled = True

def disable():
    _tracing_enabled = False

def _start(region_name):
    # Actual code that starts a hawktracer region
    pass

def _stop():
    # Actual code that stops a hawktracer region
    pass

def start(region_name):
    if _tracing_enabled:
        print("Started a trace region called", region_name, '...')
        _start(region_name)
    else:
        print("NOT Starting a trace region called", region_name, '...')

def stop():
    if _tracing_enabled:
        print("Stopped a trace region")
        _stop()

def trace(func, _tracing_enabled=_tracing_enabled):
    if _tracing_enabled:
        print("Instrumenting function", func.__name__)
        def wrapper(*args, **kwargs):
            _start(func.__name__)
            func(*args, **kwargs)
            _stop()
        return wrapper
    else:
        print("NOT instrumenting function", func.__name__)
        return func
```

in user code:

```python
#!/usr/bin/env python3
import hawktracer
from hawktracer import trace
hawktracer.enable()
from time import sleep

@trace
def foo():
    pass

from hawktracer import start, stop
start("bar")
stop()

hawktracer.disable()

@trace
def baz():
    pass
```

The above code has a bug I haven't worked out yet... enable() and disable() don't have an effect. Maybe you already tried something like that and that's why you did what you did. There's probably another intrusive Python profiler out there that has worked this issue out... |
Hi @drewm1980, thanks a lot for the suggestions. I'm aware the code I wrote is not ideal, but the main thing I wanted to point out is the end-user experience. So I assume we both agree that the API should be:

```python
@trace
def foo():
    # this function is going to be traced when called
    pass

hawktracer.enable()   # tracing enabled
foo()                 # this will generate the tracepoint
hawktracer.disable()  # tracing disabled
foo()                 # this call won't generate tracepoints, because hawktracer is disabled
```

I'm not sure about the function names/namespaces yet, but it'll be more or less like the above. If you agree with that design, I'll prototype it over the weekend. If the implementation is not efficient, we can iterate and improve it, but at least the functionality is going to be there and the API will be available for customers (including yourself :) ). Please let me know what you think about it, so we can start working on it soon. |
I'm likewise unsure about what people end up really needing in practice, i.e. whether they're more concerned about being able to enable/disable tracing:

Each project will probably have an opinion on how to do 0 depending on their build environment. I was trying to make 1. and 2. work. I think your last example is trying to make 3. work. Do you have internal Python users at Amazon to drive your development priorities? Or do you have aspirations for HawkTracer to grow into some sort of tool for AWS/cloud? The ordering above would probably be mine, but you're not working for us :) Being able to promise "no overhead in instrumented functions when disabled" is an important promise to be able to make, to put people's minds at ease when sprinkling instrumentation code all over their codebase. I think the way I wrote the decorator probably achieved that... in C++ you probably want a preprocessor flag that ensures HawkTracer isn't in the compiled code when disabled. It's the #2 FAQ on Telemetry's web page: |
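The "no overhead when disabled" property above can be checked directly: a disabled decorator should return the original function object, so calls pay no wrapper cost. A rough measurement sketch (the decorator names are made up for illustration; absolute timings vary by machine, so no numbers are asserted here):

```python
# Compare a pass-through decorator (what a disabled tracer should
# reduce to) against an always-on wrapper (the enabled path, minus
# the actual tracing calls).
import timeit

def passthrough(func):
    return func  # no wrapper in the call path at all

def traced(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@passthrough
def fast():
    pass

@traced
def slow():
    pass

t_fast = timeit.timeit(fast, number=50000)
t_slow = timeit.timeit(slow, number=50000)
print(t_fast, t_slow)  # the wrapped version pays an extra frame per call
```

Since `passthrough` returns the function untouched, `fast` is literally the undecorated function object, which is as close to zero overhead as Python allows.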
hi @drewm1980

```python
@trace
def foo():
    pass
```

The way it's going to be implemented is not that important for me right now - it's easy to change the implementation, but difficult to change the API. |
I of course have no objection! I have a point to make RE the implementation affecting the API, but I found it easier to write the code than to explain it clearly. I'll fork and create a pull request. |
Here's the pull request: No offense if you don't want to pull for whatever reason; cutting and pasting here was just getting clunky. I got 2. working, but it required changing how the API is used (see code). 3. is broken. |
Hi @drewm1980, your proposal looks good. I have a few improvements though that would allow us to do 3 as well. I changed the decorator implementation:

```python
def _trace(func):
    global _tracing_enabled
    print("Instrumenting function", func.__name__)
    def wrapper(*args, **kwargs):
        if _tracing_enabled:
            _start(func.__name__)
        func(*args, **kwargs)
        if _tracing_enabled:
            _stop()
    return wrapper
```

so it checks in the wrapper whether tracing is enabled - it's only an overhead (one

I also changed:

```python
def enable_decorator():
    global _tracing_enabled
    global trace
    if _tracing_enabled:
        trace = _trace
```

so it only enables the decorator when tracing is enabled. After that, the example from your pull request works as expected. I also changed the default value of

What do you think about those changes? |
Hi @drewm1980, I've done a prototype over the weekend on my personal GitHub account. Please have a look and let me know what you think about it. I'm very open to any suggestions; feel free to make a pull request against that branch as well. Once we think it's ready, I'll merge it into this repository's master branch. |
Hi @loganek Very cool! So it looks like for the decorator disabling code you're doing what my Python code was doing, but ported to C:

```c
static PyObject *
ht_python_core_trace(PyObject* Py_UNUSED(self), PyObject* args)
{
    PyObject *traced_function;
    if (!PyArg_ParseTuple(args, "O", &traced_function))
    {
        return NULL;
    }
    if (trace_method)
    {
        return PyCFunction_New(trace_method, traced_function);
    }
    else
    {
        Py_XINCREF(traced_function);
        return traced_function;
    }
}
```

Does having the state for whether the decorator is enabled in C mean that users can do

```c
static PyObject*
ht_python_core_trace_function(PyObject* function, PyObject *args)
{
    uintptr_t address = 0;
    if (!PyCallable_Check(function))
    {
        Py_RETURN_NONE;
    }
    if (tracing_enabled)
    {
        PyObject* function_name_attr = PyObject_GetAttrString(function, "__name__");
        if (function_name_attr)
        {
            address = ht_python_core_get_label_address(function_name_attr);
            ht_feature_callstack_start_int(ht_global_timeline_get(), address);
            Py_DECREF(function_name_attr);
        }
    }
    PyEval_CallObject(function, args);
    if (address)
    {
        ht_feature_callstack_stop(ht_global_timeline_get());
    }
    Py_RETURN_NONE;
}
```
} I'm studying how you're caching the function names... I'm wondering if there will be issues with functions that have the same name... in particular, in python multiple instances of the same string can end up sharing memory as an optimization, and this even holds for function names, in or out of a class: In [7]: x = 'foo'
In [8]: y = 'foo'
In [9]: id(x)
Out[9]: 140149505679912
In [10]: id(y)
Out[10]: 140149505679912
In [13]: def f():
...: pass
...:
In [14]: g = f
In [15]: def f():
...: pass
...:
In [16]: g.__name__
Out[16]: 'f'
In [17]: f.__name__
Out[17]: 'f'
In [19]: id(g.__name__)
Out[19]: 140149571706928
In [20]: id(f.__name__)
Out[20]: 140149571706928
In [26]: class Foo():
...: def __init__(self):
...: pass
...: def f(self):
...: pass
In [28]: id(Foo.f.__name__)
Out[28]: 140149571706928 That seems contrived, but in my codebase, there are multiple functions with the same name as members of various classes in various modules. If I understand correctly, your tracepoint_map is using function names as UUID's, and that's not going to work in a lot of python code bases; python's not like C where function names have to be unique or they collide, or like in C++ where they're transparently mangled so they don't collide when there is overloading. Could tracepoint_map be constructed in the decorator function, so that the overhead of setting it up is at decoration-time, rather than the first run through the code? Dunno if it's significant or not; just wanted to send some quick feedback after skimming the code. My team member who tried out hawktracer on C++ in our code will give your new python wrapper a shot. |
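One possible way around the name-collision concern (an assumption on my part, not something the binding does today) is to key labels on `__qualname__` plus `__module__` instead of `__name__`, which distinguishes same-named methods on different classes:

```python
# __name__ alone cannot distinguish methods from different classes,
# but __qualname__ (and __module__) can.
class A:
    def f(self):
        pass

class B:
    def f(self):
        pass

print(A.f.__name__ == B.f.__name__)        # True - both are just "f"
print(A.f.__qualname__, B.f.__qualname__)  # A.f B.f
label = f"{A.f.__module__}.{A.f.__qualname__}"  # fully qualified label
print(label)
```

`__qualname__` has been available since Python 3.3, so a decorator could build unambiguous labels at decoration time with no extra bookkeeping.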
Hi @drewm1980

Yes, it's exactly like that, with a few little improvements.

Yes, I've tested various scenarios, but feel free to double-check.

In general I don't see any problem with using the same map entry for 2 different functions - as long as the name is correct, it's fine. But as far as I understand, you suggest to not use

Yeah, I think that'd be a nice improvement, and one that could be easily implemented (I hope). As I said in one of my previous comments, I'm focusing on the interface and user experience first, so I didn't pay much attention to the actual implementation. There might still be quite a few potential improvements in my code :)

Any feedback is very much appreciated, especially feedback as constructive as yours. Thanks a lot, and looking forward to seeing more of your comments :) |
Cool! When I showed your example code to another developer, his initial impression was "I have to do all this stuff?". Of course you don't, but it would probably be good for adoption to have a minimal_example.py file that just traces one function, without showing off any of the other optional features.

You can disregard that comment for now; we'll let you know if it's an issue in practice. If the user annotates their own nested regions, that turns into something like a call stack in the output anyway, so maybe the name shadowing isn't an issue, as long as hawktracer still gets the nesting right when regions have the same name.

No need to thank me; you're doing all the hard work! |
Yeah, the minimalistic example would be:

```python
import HawkTracer.Core

@HawkTracer.Core.trace
def foo():
    print("This is foo.")

foo()
```

and run the program with

Sure, let me know if you have any real issues, so we can work to fix them. I think I'll merge my changes soon. I want to re-review them and fix potential problems. After that, I'll close the issue. Feel free to open another one if you find some bugs or have any future feature requests :) |
This is bikeshedding, but IMHO it's prettier as:

Doesn't it still need an init function call? Or is that already handled in the import? If so, nice!

Also bikeshedding, but Python has a convention for package and module name capitalization. From PEP 8, "Package and Module Names": Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged. It's not uncommon for users to just start typing stuff in an ipython shell to discover an API. Such users will try the lowercase module name first, curse, maybe check your docs, and then discover the capitalized version. |
Sure, agreed.

It's not upstream yet, but I have that change locally.

Good to know; I will rename it to |
Hey @drewm1980, I finished the code and opened PR #48. Please review; any suggestions are very much appreciated :) |
PR #48 has been merged, closing the issue. Feel free to open a new one in case there's some missing functionality. |
Has anyone written an integration for tracing python code? If so, do you have a reference?
Greetings from Belgium, by the way; I'm the American guy who asked you a couple questions at FOSDEM.
Cheers,
Andrew