Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch plugin cache #748

Merged
merged 4 commits into from
Aug 11, 2023
Merged

Patch plugin cache #748

merged 4 commits into from
Aug 11, 2023

Conversation

WenzDaniel
Copy link
Collaborator

What is the problem / what does the code in this PR do
Adds fix and test for #747

@WenzDaniel
Copy link
Collaborator Author

Checked also with different straxen plugins that things are now working as expected.

Copy link
Contributor

@jmosbacher jmosbacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.
The only thing that bothers me a bit is modifying targets inside the loop iterating over it. I dont think this is defined behavior by the python spec so a future implementation of the list object may break it. Especially since python will be removing the GIL soon, CPython core implementations are expected to change in the near future.
Maybe go with the standard way of doing this

while targets:
    target = targets.pop(0)
    ...

@WenzDaniel
Copy link
Collaborator Author

WenzDaniel commented Aug 11, 2023

Sorry, based on your comment I am not sure whether you think the for-loop, or the appending-to-the-same-list is the problem. The former I agree, can be changed into a while loop. The latter cannot be changed as I do not know my full list of targets at the beginning of the loop (in the sense that I do not know the list of plugins my actual target depends on). So using pop does not work ( At least I cannot see how this would work). The only other solution I could think about is defining some "global" dictionary and do some recursive function call like it is done in _get_plugins if we are not loading things form cache.

@WenzDaniel
Copy link
Collaborator Author

WenzDaniel commented Aug 11, 2023

So something like this:

plugins = {}

def _get_depndent_plugins(targets):
    for target in targets:
        target_plugin = cached_plugins[target]

        for provides in target_plugin.provides:
            plugins[provides] = target_plugin
    
        [_get_depndent_plugins((target,)) for target in target_plugin.depends_on]

_get_depndent_plugins(targets)

But I liked the other solution better because it is easier to understand. However, if you are saying that this might lead to issues in the future we should move to this solution.

@jmosbacher
Copy link
Contributor

Sorry if I wasnt explicit enough. for loops in python use iterators, the next value is returned from the iterator object and not the original object. In many list-like implementations including the current implementation of the builtin list, the iterator object returned is the original object so modifying it inside the loop can work but its undefined behavior that can change with the implementation of list in CPython.

The while loop does not create an iterator, and therefore should always work.
to be explicit in the proposed implementation:

while targets:
    target = targets.pop(0)
    if target in plugins:
        continue

    target_plugin = cached_plugins[target]
    for provides in target_plugin.provides:
        plugins[provides] = target_plugin
    targets += list(target_plugin.depends_on)

@WenzDaniel
Copy link
Collaborator Author

I am learning new things now, nice. How does the while loop then work if not with iterators? (It is hard to find any information about this by simply searching for it. You always get things like "how to write a while loop in python...") So in a while loop the iteration is not done on the original object but the modified one?

@jmosbacher
Copy link
Contributor

jmosbacher commented Aug 11, 2023

the while loop will re-evaluates the expression for thuthyness at every iteration and executes the code body if the expression is true, calling __bool__ method if its an object of a different type than bool (__len__ if __bool__ not defined ).
lists thruthyness has been defined in the python spec to return true if the list has non-zero length for a while now so no need to use len(targets) unless you want to support very old python versions.

@WenzDaniel
Copy link
Collaborator Author

Ah sorry, no I was more curious why

while list:
    item = list.pop(0)
    list.append(...) #modify list

is okay but

for item in list:
    list.append(...) #modify list

is not. For me both are iterators. I understood from your comment above (maybe wrongly), that a for-loop is only defined to loop over the original "list" without it being modifications, and in the while loop "list" is reevaluated after each iteration.

@jmosbacher
Copy link
Contributor

jmosbacher commented Aug 11, 2023

The python spec. The for loop only evaluates the expression once. The while loop evaluates it at every execution

@WenzDaniel
Copy link
Collaborator Author

Cool thanks, was an interesting discussion. I will push the while loop change.

@jmosbacher
Copy link
Contributor

jmosbacher commented Aug 11, 2023

Maybe the confusion is on what an iterator is in python?
The iterator type as defined here has both __iter__ and the __next__ methods defined. These are what the iteration mechanism calls. the __iter__ method is called once at the beginning to set things up and then the __next__ method is called at every iteration to fetch the next object.

The for loop first evaluates the expression to get an iterator (calling __iter__ on it if its an object) and then begins the iteration. at every step of the iteration, the __next__ method is called and the loop ends if the StopIteration exception is raised during this call.

@WenzDaniel
Copy link
Collaborator Author

Yes this I knew, and due to this I would have though that modifying the list in a for loop would be save. Because the iterator only knows about the next object while iterating and only stops if StopIteration is raised. So I would have thought that adding things to my object I am iterating over would be fairly safe.

@jmosbacher
Copy link
Contributor

Ah yes, now i understand. The issue is that there is nothing in the spec that tells you to return the original object from container.__iter__ and not a copy. So e.g. when the GIL is removed, CPython may switch to returning a copy to avoid write collisions from different threads etc. so better to be safe than sorry :)

@WenzDaniel
Copy link
Collaborator Author

Thanks, was a nice discussion :D I can already go home now as I learned something today :D

@WenzDaniel
Copy link
Collaborator Author

Should be ready to be merged now.

@jmosbacher
Copy link
Contributor

oops, forgot to approve :)

@WenzDaniel WenzDaniel merged commit 32df173 into master Aug 11, 2023
8 checks passed
@WenzDaniel WenzDaniel deleted the patch_plugin_cache branch August 11, 2023 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants