[idea] Thread-local incr/decr #28
Comments
I think someone at the sprints was already working on this, and @larryhastings was working on buffered refcounts (again to get rid of atomics, but in another way). Just to raise the priority of this issue: I did some profiling (running […] The same profiling shows that atomic refcounting takes at least as much percentage from […]
@larryhastings - what's the status of the refcount rework? This seems to be the biggest slowdown so far...
I came here to post this idea, and it seems valid as a concept. Just to be clear: when incrementing the object's refcount on the local thread, first check whether it is zero; if so, do an atomic increment of the thread-refcount on the object first. Similarly, when decrementing the object's refcount on the local thread, check whether it has become zero. If so, do an atomic decrement of the thread-refcount, then check whether the thread-refcount has also become zero, and if it has, call the destructor in that same thread.
I just watched your PyCon talk about this project, and I had an idea to reduce cache misses due to atomic incr/decr.
What if you introduced a thread-local refcount and a thread-global thread-refcount?
The idea would be that the number of threads that access an object rarely changes. So if a thread wants to change the refcount it can do so locally and quickly, and only when its refcount drops to zero does it need to do an atomic decr on the thread-refcount and free if it's 0. If it's non-zero it's the job of the remaining threads to clean up once their local refcount drops to 0.
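To make the proposal concrete, here is a minimal Python sketch of the two-level counter. It is only a toy model, not how CPython or the Gilectomy branch implements refcounts: the class name `SplitRefcount`, the per-thread dict, and the `freed` flag are all made up for illustration, and a `threading.Lock` stands in for the atomic incr/decr on the thread-refcount.

```python
import threading


class SplitRefcount:
    """Toy model: one plain counter per thread plus a shared thread-refcount."""

    def __init__(self):
        self._local = {}               # thread id -> that thread's local count (hypothetical layout)
        self.thread_refcount = 0       # number of threads whose local count is non-zero
        self._lock = threading.Lock()  # stands in for an atomic incr/decr
        self.freed = False             # set where the real object would be deallocated

    def incref(self):
        tid = threading.get_ident()
        count = self._local.get(tid, 0)
        if count == 0:
            # First reference held by this thread: the only "atomic" step.
            with self._lock:
                self.thread_refcount += 1
        self._local[tid] = count + 1   # fast path: touches only this thread's slot

    def decref(self):
        tid = threading.get_ident()
        count = self._local[tid] - 1
        if count > 0:
            self._local[tid] = count   # fast path: no shared state touched
            return
        del self._local[tid]
        # This thread dropped its last reference: "atomic" decrement.
        with self._lock:
            self.thread_refcount -= 1
            last = self.thread_refcount == 0
        if last:
            self.freed = True          # the last thread to let go cleans up
```

In this model a thread that hands a reference to another thread has to keep its own reference alive until the receiver has called `incref()`, otherwise the thread-refcount could reach zero too early.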
As you mentioned, most objects are only ever used in one thread. So in addition to the above concept, which would still require 2 atomic operations at creation and destruction, the thread id of the object creator could be stored so that touching the thread-global counter can be deferred until another thread incr's the object, in which case it'd need to be set to 2.
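The owner-thread shortcut could be modelled roughly as follows. Again this is a hedged sketch under my own assumptions: `OwnedRefcount`, `owner_count`, `shared` and `atomic_ops` are hypothetical names, the lock stands in for atomics, and the race between the owner's last decref and another thread's first incref is deliberately glossed over.

```python
import threading


class OwnedRefcount:
    """Toy model: the creating thread uses a plain counter until the object is shared."""

    def __init__(self):
        self.owner = threading.get_ident()  # thread id of the creator
        self.owner_count = 1                # creator's plain, non-atomic count
        self.shared = False                 # True once another thread has incref'd
        self.thread_refcount = 0            # untouched while only the owner uses the object
        self._local = {}                    # local counts of non-owner threads
        self._lock = threading.Lock()       # stands in for atomic operations
        self.atomic_ops = 0                 # bookkeeping only, to show what is saved
        self.freed = False

    def _atomic_add(self, delta):
        with self._lock:
            self.atomic_ops += 1
            self.thread_refcount += delta
            return self.thread_refcount

    def incref(self):
        tid = threading.get_ident()
        if tid == self.owner:
            if self.owner_count == 0:       # owner re-acquires an object kept alive by others
                self._atomic_add(+1)
            self.owner_count += 1
            return
        count = self._local.get(tid, 0)
        if count == 0:
            with self._lock:
                self.atomic_ops += 1
                if not self.shared:
                    self.shared = True
                    self.thread_refcount = 2    # the owner plus this thread
                else:
                    self.thread_refcount += 1
        self._local[tid] = count + 1

    def decref(self):
        tid = threading.get_ident()
        if tid == self.owner:
            self.owner_count -= 1
            if self.owner_count > 0:
                return
            if not self.shared:             # unsynchronized read: racy in a real implementation
                self.freed = True           # never shared: no atomic operation at all
                return
        else:
            count = self._local[tid] - 1
            if count > 0:
                self._local[tid] = count
                return
            del self._local[tid]
        if self._atomic_add(-1) == 0:
            self.freed = True
```

With this layout an object that never leaves its creating thread is created and destroyed without touching `thread_refcount`; the shared counter only comes into play at the first incref from another thread.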
[edit] Actually, thread-local storage in C works nothing like `threading.local`, which is implemented using a dict. That makes it a lot slower I guess. There is probably a lot more I overlooked.