# Memory leak due to LRU cache in method of `EphysNWBData` #495
## Comments
Thanks for reporting this @gouwens. I ran the snippet below with and without the `@lru_cache` decorator:

```python
from ipfx.dataset.create import create_ephys_data_set
from ipfx.stimulus import StimulusOntology
from pympler import muppy, summary

# nwb_file is the path to an NWB2 file of interest
data_set = create_ephys_data_set(
    nwb_file=nwb_file, ontology=StimulusOntology.DEFAULT_STIMULUS_ONTOLOGY_FILE
)

for _ in range(5):
    for num in data_set._data.sweep_numbers:
        my_sweep = data_set.sweep(num)

all_objects = muppy.get_objects()
sum1 = summary.summarize(all_objects)
# Prints out a summary of the large objects
summary.print_(sum1)
```

Comparing the output of the two runs, I don't see much of a difference between them, but they both seem to have a memory leak. Do you think this is related to #494?

---
I think there could be a couple of things going on there. It could be that the other memory leak in #494 is still causing issues even with the `@lru_cache` decorator commented out. Here's code where I see a clear difference in memory usage with and without commenting out the decorator:

```python
# Setup
from ipfx.stimulus import StimulusOntology
import allensdk.core.json_utilities as ju
from ipfx.dataset.mies_nwb_data import MIESNWBData
from ipfx.dataset.labnotebook import LabNotebookReaderIgorNwb
from pympler import muppy, summary
ontology = StimulusOntology(ju.read(StimulusOntology.DEFAULT_STIMULUS_ONTOLOGY_FILE))
# example nwb2 files
nwb_file_list = [
    '/allen/programs/celltypes/production/mousecelltypes/prod176/Ephys_Roi_Result_628543361/nwb2_Scnn1a-Tg2-Cre;Ai14-346639.04.02.01.nwb',
    '/allen/programs/celltypes/production/mousecelltypes/prod2457/Ephys_Roi_Result_998064513/nwb2_Vip-IRES-Cre;Ai14-504181.07.02.01.nwb',
    '/allen/programs/celltypes/production/mousecelltypes/prod2480/Ephys_Roi_Result_1000110850/nwb2_Esr2-IRES2-Cre;Ai14-506384.03.02.01.nwb',
    '/allen/programs/celltypes/production/mousecelltypes/prod2481/Ephys_Roi_Result_1000125224/nwb2_Esr2-IRES2-Cre;Ai14-506384.03.02.02.nwb',
]
# function to load & return a data set object
def load_data_set(nwb_path, ontology, load_into_memory):
    labnotebook = LabNotebookReaderIgorNwb(nwb_path)
    data_set = MIESNWBData(
        nwb_file=nwb_path,
        notebook=labnotebook,
        ontology=ontology,
        load_into_memory=load_into_memory
    )
    return data_set
# Keep memory examination code isolated in its own function
def summarize_memory():
    all_objects = muppy.get_objects()
    sum1 = summary.summarize(all_objects)
    summary.print_(sum1)
for nwb_file in nwb_file_list:
    ds = load_data_set(nwb_file, ontology, load_into_memory=False)  # working around the #494 leak
    for num in ds.sweep_numbers:
        my_sweep_data = ds.get_sweep_data(num)
    summarize_memory()
```

With this code, I see a clear difference between the summaries printed when the `@lru_cache` decorator is in place and those printed when it is commented out.
---
Thanks for the code @gouwens. I changed the `lru_cache` import package in ipfx/ipfx/dataset/ephys_nwb_data.py (line 3 in 75a3ea7) so the cache no longer pins the data set objects.
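For reference, one common way to avoid pinning `self` in a class-level cache is to build the cache per instance in `__init__`, so it can be collected along with the object. This is only a sketch with a hypothetical stand-in class, not the actual ipfx change:

```python
from functools import lru_cache

class SeriesSource:
    """Hypothetical stand-in class; illustrates per-instance caching only."""

    def __init__(self):
        # Wrap the implementation in a cache owned by this instance.
        # The cache can then be garbage-collected together with the
        # instance instead of living forever on the class-level function.
        self.get_series = lru_cache(maxsize=None)(self._get_series_impl)

    def _get_series_impl(self, sweep_number):
        # Placeholder for the expensive NWB lookup
        return f"series {sweep_number}"

src = SeriesSource()
src.get_series(1)  # computed
src.get_series(1)  # served from this instance's cache
```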
Verification:

```python
from warnings import filterwarnings
from time import time
# Setup
from ipfx.stimulus import StimulusOntology
import allensdk.core.json_utilities as ju
from ipfx.dataset.mies_nwb_data import MIESNWBData
from ipfx.dataset.labnotebook import LabNotebookReaderIgorNwb
from pympler import muppy, summary
filterwarnings("ignore", category=UserWarning)
ontology = StimulusOntology(ju.read(StimulusOntology.DEFAULT_STIMULUS_ONTOLOGY_FILE))
# example nwb2 files
nwb_file_list = [
    '/allen/programs/celltypes/production/mousecelltypes/prod176/Ephys_Roi_Result_628543361/nwb2_Scnn1a-Tg2-Cre;Ai14-346639.04.02.01.nwb',
    '/allen/programs/celltypes/production/mousecelltypes/prod2457/Ephys_Roi_Result_998064513/nwb2_Vip-IRES-Cre;Ai14-504181.07.02.01.nwb',
    '/allen/programs/celltypes/production/mousecelltypes/prod2480/Ephys_Roi_Result_1000110850/nwb2_Esr2-IRES2-Cre;Ai14-506384.03.02.01.nwb',
    '/allen/programs/celltypes/production/mousecelltypes/prod2481/Ephys_Roi_Result_1000125224/nwb2_Esr2-IRES2-Cre;Ai14-506384.03.02.02.nwb',
]
# function to load & return a data set object
def load_data_set(nwb_path, ontology, load_into_memory):
    labnotebook = LabNotebookReaderIgorNwb(nwb_path)
    data_set = MIESNWBData(
        nwb_file=nwb_path,
        notebook=labnotebook,
        ontology=ontology,
        load_into_memory=load_into_memory
    )
    return data_set
# Keep memory examination code isolated in its own function
def summarize_memory(data_type: str = ""):
    mem_summary = summary.summarize(muppy.get_objects())
    output = [elem for elem in mem_summary if elem[0] == data_type]
    if output:
        summary.print_(output)
    else:
        summary.print_(mem_summary)
start_time = time()
for nwb_file in nwb_file_list:
    # working around the #494 leak
    ds = load_data_set(nwb_file, ontology, load_into_memory=False)
    for _ in range(2):  # repeat this twice to make sure that caching still works
        for num in ds.sweep_numbers:
            my_sweep_data = ds.get_sweep_data(num)
    summarize_memory("numpy.ndarray")  # numpy arrays are the biggest objects

print(f"\nTime elapsed: {time()-start_time} s")
```

And console output:
---

Good catch! I'll put in a PR to patch this bug soon.
## Describe the bug
The LRU cache on the `_get_series()` method of `EphysNWBData` causes a memory leak because `self` is passed to the cache (as part of every cache key), meaning the object can never be let go. This is an issue at least for the `MIESNWBData` subclass, because it has an instance variable `notebook` (usually a `LabNotebookReaderIgorNwb`) which holds a few large numpy arrays that eventually use a great deal of memory.

The place in the code where that happens is here: ipfx/ipfx/dataset/ephys_nwb_data.py, line 111 in 75a3ea7.
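As a minimal, standalone illustration of the mechanism (this is not ipfx code): a `@lru_cache`-decorated method stores `self` in every cache key, and the cache lives on the class-level function, so the instance can never be collected:

```python
import gc
import weakref
from functools import lru_cache

class Holder:
    def __init__(self):
        self.big = bytearray(10**8)  # stand-in for the large notebook arrays

    @lru_cache(maxsize=128)
    def get_series(self, sweep_number):
        return sweep_number  # placeholder for real work

h = Holder()
h.get_series(0)

ref = weakref.ref(h)
del h
gc.collect()
# The instance (and its ~100 MB payload) is still alive, pinned by the cache:
print(ref() is None)  # prints False
```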
See https://stackoverflow.com/questions/33672412/python-functools-lru-cache-with-class-methods-release-object for information about the issues with using `@lru_cache` inside classes. There are a couple of strategies for handling this issue discussed in that post; I haven't thought about what the best option is in this case, though.

At the moment, I think I am working around it by manually flushing the cache when I'm done with the data set object, e.g. as sketched below.
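A sketch of that workaround, assuming the cached method is reachable as `data_set._data._get_series` on a data set built with `create_ephys_data_set` (the exact attribute path is an assumption):

```python
# Manually clear the functools.lru_cache on the method when finished,
# so the class-level cache drops its strong reference to this instance.
data_set._data._get_series.cache_clear()
del data_set
```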
But I think that's probably too much to expect a typical user to know about and implement.