Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long waiting times after computation finished for specific datasets #493

Closed
dfsp-spirit opened this issue May 4, 2023 · 4 comments
Closed
Assignees
Labels
Explore Examine novel functionality/proposed changes etc. Does not necessarily involve coding things. Performance Improve the number crunching

Comments

@dfsp-spirit
Copy link
Collaborator

dfsp-spirit commented May 4, 2023

Describe the bug
With one of our datasets, we noticed that when running freqanalysis, after computations are done, it takes a long time for the frontend function to return.

We need to investigate what is happening during the time period after the computation, and which specifics of the dataset lead to this behavior. Could be related to saving data, hdf5, ...

To Reproduce
Load dataset DataCombinedSessionsForCondition_Baseline_ALL_monopolar_Epoched_WindowLength500ms.spy from developer fileserver and run spy.freqanalysis with default settings. E.g., in a shell script:

#!/usr/bin/env python
# run this like: SPYLOGLEVEL=DEBUG python runspy.py
import syncopy as spy
import numpy as np
dt = spy.load("~/issue493/DataCombinedSessionsForCondition_Baseline_ALL_monopolar_Epoched_WindowLength500ms.spy")
res = spy.freqanalysis(dt)

Expected behavior
The function should return quickly after the computation is done.

@dfsp-spirit dfsp-spirit added Explore Examine novel functionality/proposed changes etc. Does not necessarily involve coding things. Performance Improve the number crunching labels May 4, 2023
@dfsp-spirit
Copy link
Collaborator Author

dfsp-spirit commented May 4, 2023

We investigated this both on the cluster and on our local machines, and it turns out it was due to the performance issue fixed yesterday in PR 455 . A %prunin ipy on the master branch (which still shows the reported behavior) showed that the time is spend in list comprehensions when accessing the time property (syncopy/datatype/continuous_data.py", line 184, in time).

However, on the dev branch, the long waiting times are gone.

Note: It is not clear when the bad behavior was introduced, as it seems unlikely that this has existed for a long time. First profilings with a very old syncopy version also confirm that this did not exist a long time ago (end of 2020). Regular performance testing may be required to avoid things like this in the future.

Edit: Inefficient selections were the cuplrit, which were introduced from the start of the project. However in release 2022.12 trivial in place selections were added to process_metadata, which lead to the spill over of those inefficiencies into standard CR runs. By drastically improving the time property and the selections with #489 and #455 the situation got resolved.

Edit: Inefficient selections were the cuplrit, which were introduced from the start of the project. However in release 2022.12 trivial in place selections were added to process_metadata, which lead to the spill over of those inefficiencies into standard CR runs. By drastically improving the time property and the selections with #489 and #455 the situation got resolved.

EDIT2: arithmetics also benefit greatly for the same reasons

@dfsp-spirit
Copy link
Collaborator Author

dfsp-spirit commented May 4, 2023

The behavior was introduced in version 2022.12 by the following code block in propagate_properties:

 # attach a dummy selection for easier propagation
    selection_cleanup = False
    if in_data.selection is None:
        in_data.selectdata(inplace=True)
        selection_cleanup = True

Creating the inplace selection was expensive with many trials, as it adds one slice per trial. This is fixed now.

@dfsp-spirit
Copy link
Collaborator Author

@kajal5888 So I am closing this as done. Please switch to the dev version to solve your issue.

@dfsp-spirit
Copy link
Collaborator Author

See #494 for the mentioned need for performance monitoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Explore Examine novel functionality/proposed changes etc. Does not necessarily involve coding things. Performance Improve the number crunching
Projects
None yet
Development

No branches or pull requests

2 participants