This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

Load data #16

Merged
merged 27 commits into developement from load-data on Jan 26, 2023

Conversation

lauraporta
Member

No description provided.

@lauraporta lauraporta linked an issue Dec 14, 2022 that may be closed by this pull request
@lauraporta lauraporta changed the base branch from developement to main December 19, 2022 16:34
@lauraporta lauraporta changed the base branch from main to developement December 19, 2022 16:35
@lauraporta lauraporta changed the base branch from developement to main January 10, 2023 14:29
@lauraporta lauraporta changed the base branch from main to developement January 10, 2023 14:30
@lauraporta lauraporta marked this pull request as ready for review January 10, 2023 17:29
@lauraporta
Member Author

Summary

In this PR, I add the logic required to open and unpack the MATLAB files generated by the MATLAB codebase that I am translating, called allen-df files. They are specific to the analysis modality, and in this case I am focusing only on the sf-tf one.

load_data.py

The load_data method, after receiving the path of the file to load (in specs), creates a DataRaw object through the load() method.
For now I skip the implementations that go beyond using allen-df files, which is why there are many NotImplementedErrors.
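A rough sketch of that flow (hedged: the function and config-key names below follow this PR, but the bodies are simplified and are not the actual implementation):

from typing import Tuple


def load_data(folder_name: str) -> Tuple["DataRaw", "Specifications"]:
    specs = get_specifications(folder_name)  # reads paths/settings from the config
    data_raw = load(specs)                   # builds the DataRaw object from the file
    return data_raw, specs


def load(specs: "Specifications") -> "DataRaw":
    if not specs.config["use-allen-dff"]:
        raise NotImplementedError("Only allen-dff files are handled for now")
    if specs.config["analysis-type"] != "sf_tf":
        raise NotImplementedError("Only the sf_tf analysis is handled for now")
    # ...locate the allen file and wrap it in DataRaw (see data_raw.py)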

data_raw.py

In DataRaw various objects are extracted, all with the method unpack_data(), which deals with HDF5 using the h5py library. You can find an explanation of how this method works in its docstring. It is tailor-made to the specific file structure that I am opening.
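The gist of such an unpacker, as a hedged sketch (not the actual method: the real unpack_data() is tailored to the allen-df layout; this only shows the recursive dataset/reference/group handling with h5py):

import h5py


def unpack(element, file: h5py.File):
    """Recursively turn an HDF5 element into plain Python/NumPy objects."""
    if isinstance(element, h5py.Dataset):
        if h5py.check_dtype(ref=element.dtype) is not None:
            # dataset of object references: dereference each one and recurse
            return [unpack(file[ref], file) for ref in element[:].flat]
        return element[:]  # plain numeric dataset -> ndarray
    if isinstance(element, h5py.Group):
        # group (possibly containing subgroups): unpack every member
        return {key: unpack(element[key], file) for key in element}
    return element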

Tests

The methods of DataRaw are decorated with @classmethod because this helps me test them independently of the instantiation of the DataRaw class. In the test suite, I create a mock HDF5 file on the local machine and check the fundamental behaviors of my unpack_data() method.
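For example (a hedged sketch with made-up dataset names and values, not the actual test suite):

import h5py
import numpy as np

from load_suite2p.objects.data_raw import DataRaw


def test_unpack_of_simple_dataset():
    # create a small mock HDF5 file on disk...
    with h5py.File("mytestfile.hdf5", "w") as file:
        file.create_dataset("dataset", data=np.array([1, 2, 3, 4, 5]))
    # ...then call unpack_data directly on the class: being a classmethod,
    # it does not require a DataRaw instance (and hence no real allen-df file)
    with h5py.File("mytestfile.hdf5", "r") as file:
        assert np.all(
            DataRaw.unpack_data(file["dataset"], file) == np.array([1, 2, 3, 4, 5])
        )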

@JoeZiminski
Member

Great job @lauraporta, this is very nice: very well structured, and the code is clean and intuitive to follow. I've left a few suggestions but nothing major. I learnt some cool new features and might do some investigating into the datashuttle structure, inspired by #8!

from typing import Tuple

import h5py
from decouple import config
Member
@JoeZiminski Jan 20, 2023

I think this needs to be added to pyproject.toml: from decouple import config

Member Author

OK thanks!

from decouple import config

from ..objects.data_raw import DataRaw
from ..objects.enums import AnalysisType, DataType
from ..objects.specifications import Specifications
from .read_config import read

CONFIG_PATH = config("CONFIG_PATH")
Member

This python-decouple module is very cool, I have not seen it before. As a completely naive user, I tried playing around with this just in the console and got 'CONFIG_PATH not found. Declare it as envvar or define a default value.', and I can't quite figure out how it works. Is it possible to add these two lines of code to a function (e.g. a new function read_configs.get_config_path()) and add a docstring on how decouple works?

Would it be possible / sensible to get config_path in main.py and pass it through load_data > get_specifications > read_configurations(), so that settings are displayed together in main.py for readability? But maybe this suggestion is not optimal for python-decouple.

Member Author

I am using decouple as a way to not store the local config paths in the repo. It is set up in a way that takes for granted the existence of a .env file containing the variable CONFIG_PATH. I forgot to specify this, my bad. I will add it to the README.md file.
I am not yet sure how I want to handle the configuration file, which is why this bit is still vague and unpolished.
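A hedged sketch of the wrapper suggested above (the function name and default value are illustrative; decouple resolves CONFIG_PATH from an environment variable or from a local, untracked .env file):

from decouple import config


def get_config_path() -> str:
    """Return the path of the configuration file.

    python-decouple looks for CONFIG_PATH first among environment variables
    and then in a local .env file (e.g. a line CONFIG_PATH=config/config.yml),
    so the local path never has to be committed to the repository.
    """
    return config("CONFIG_PATH", default="config/config.yml")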



def load_data(folder_name: str) -> Tuple[list, Specifications]:
def load_data(folder_name: str) -> Tuple[DataRaw, Specifications]:
Member

Is folder_name the top-level folder of the experimental data directory? This is probably obvious as I can't recall how they lay out their data, but a quick example would help, e.g. if the path is /windows/something/myproject the name is myproject? How does the loader know the full path to the project directory? This seems to be all handled very gracefully; maybe just some docs for the naive user in the main() docstring would be useful.

Member Author

The path is stored in the config file, which was hidden from you, sorry ☹️

raise NotImplementedError("TODO")


def load(specs: Specifications) -> DataRaw:
    if specs.config["use-allen-dff"]:
        if specs.config["analysis-type"] == "sf_tf":
Member

is it possible to expand this abbreviation?

Member Author

It is meant to be spatial frequency, temporal frequency, which is quite long...
I could just call it frequencies if it appears very confusing.

Member

maybe spat_freq_temp_freq? or spat_f_temp_f?

]
if len(allen_data_files) == 1:
    data_raw = DataRaw(
        h5py.File(allen_data_files[0].path, "r"), is_allen=True
Member

This may already be happening in DataRaw, but if not it might be worth opening the file in a context manager that will close it automatically in case of an exception during loading, e.g.

with h5py.File(allen_data_files[0].path, "r") as h5py_file:
    data_raw = DataRaw(h5py_file, is_allen=True)

Member Author

Yes it makes sense, thanks!


from load_suite2p.objects.data_raw import DataRaw

array0 = np.array([1, 2, 3, 4, 5])
Member

could use np.arange(1, 6) etc., which is more concise but less readable

Member Author

yes thanks!



def create_mock_hdf5_data():
    with h5py.File("mytestfile.hdf5", "w") as f:
Member

Just personal preference, but I like to unabbreviate f to file in this case, just to be super explicit.

array4 = np.array([9, 10, 11, 12, 13])


def create_mock_hdf5_data():
Member

a pytest fixture could be used in this case; it is a nice way to control the data that is passed to tests. The main benefits are:
a) you don't need to call the function in the code each time, you can pass it as an argument, e.g.

@pytest.fixture(scope="function")
def create_mock_hdf5_data():
    ...  # create the mock HDF5 file
    yield None
    teardown_code()  # e.g. delete the written file


def test_unpack_of_simple_dataset(create_mock_hdf5_data):
    ...

I think this will work (maybe not, as the create function doesn't explicitly return anything, but that shouldn't matter); including it in this way should trigger the dataset creation for each test function.

b) fixtures are nice because you can control the scope of the data. For example with scope="function", create_mock_hdf5_data() will be called again for each function it is used in. For scope="class", the function will be called once and then re-used (nice to save time if you are definitely not changing the data). However, typically I would always use function scope just so you are 100% sure you are starting fresh for each test.

c) All code after the yield keyword is still run, which makes tearing down after tests easier. In this case I would suggest adding teardown code that deletes the written file.

I always think the pytest docs are a bit confusing, but this is a nice guide: https://towardsdatascience.com/make-your-python-tests-efficient-with-pytest-fixtures-3d7a1892265f

Member Author

Thank you very much, I like fixtures a lot. I've just implemented better mocks!

array4 = np.array([9, 10, 11, 12, 13])


def create_mock_hdf5_data():
Member

pytest has an automatic fixture, tmp_path, that gives you access to a temporary directory; it might be easier to write to that rather than the cwd (which, if I understand correctly, is what this does): https://docs.pytest.org/en/7.1.x/how-to/tmp_path.html
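Combining the two suggestions, a hedged sketch of what the mock could look like as a function-scoped fixture writing into tmp_path (dataset names and values are illustrative, not the actual test code):

import h5py
import numpy as np
import pytest

from load_suite2p.objects.data_raw import DataRaw


@pytest.fixture(scope="function")
def mock_hdf5_file(tmp_path):
    # write the mock data into pytest's temporary directory instead of the cwd;
    # tmp_path is cleaned up by pytest, so no explicit teardown is needed
    path = tmp_path / "mytestfile.hdf5"
    with h5py.File(path, "w") as file:
        file.create_dataset("dataset", data=np.array([1, 2, 3, 4, 5]))
    with h5py.File(path, "r") as file:
        yield file  # file is closed automatically after the test finishes


def test_unpack_of_simple_dataset(mock_hdf5_file):
    data = DataRaw.unpack_data(mock_hdf5_file["dataset"], mock_hdf5_file)
    assert np.all(data == np.array([1, 2, 3, 4, 5]))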

Member Author

Very cool! I've implemented it

assert np.all(DataRaw.unpack_data(f["ref_dataset"], f)[0][0] == array3)


def test_unpack_of_dataset_with_references_to_group_with_subgroup():
Member

Overall I think these tests look very nice. It may also be worth testing that the package handles any potentially common bad inputs the user might give. This can be done by checking that an error is raised (and checking its content) with pytest.raises().
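For instance, a hedged sketch of such a check for a wrong file type being passed (the exception type is an assumption; current h5py versions raise OSError for a missing HDF5 file signature):

import h5py
import pytest


def test_opening_a_non_hdf5_file_raises(tmp_path):
    # a plain-text file has no HDF5 signature, so h5py should refuse to open it
    bad_file = tmp_path / "not_really_hdf5.mat"
    bad_file.write_text("this is not an HDF5 file")
    with pytest.raises(OSError):
        h5py.File(bad_file, "r")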

Member Author

I wonder what a bad input could be in this case, maybe if they give the wrong file type? or a corrupted hdf5 🤔

Member

hmm yes, I am not too sure. Maybe bad paths in the config file? Although this is already quite obvious from the error. Another possibility is that they pass a .mat file that is not structured in the expected way, but I'm not sure if that's possible in practice.

Another test possibility (although this may be overkill) is to go through one example .mat file and save all (or a subset) of the expected arrays into separate .mat files (or csv or something). Then you can load the file into load-suite2p and test all arrays in Python against MATLAB. This would be useful as a sanity check against a real dataset, and would highlight any small numerical changes that may occur during conversion (these are likely to be extremely small, if they exist at all).
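A hedged sketch of what that comparison could look like, assuming the reference arrays were exported from MATLAB to .csv and that a hypothetical unpacked_arrays fixture exposes the corresponding Python-side arrays (both the paths and the array name are made up for illustration):

import numpy as np


def test_unpacked_array_matches_matlab_export(unpacked_arrays):
    # array exported from MATLAB, e.g. with writematrix(day_roi, "day_roi.csv")
    expected = np.loadtxt("matlab_exports/day_roi.csv", delimiter=",")
    # allow only tiny numerical differences between MATLAB and Python
    np.testing.assert_allclose(unpacked_arrays["day_roi"], expected, rtol=1e-10)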

Member Author

Thank you Joe for the suggestions. I will open a separate issue to address these use cases, also because they seem related to a further expansion of the load function that I aim to develop later.

Member Author

#21 here we go

@JoeZiminski
Member

Hey Laura, I tested running the full pipeline now, and it seems to work well! A couple of suggestions just from the perspective of someone not familiar with the data:

  1. It might be worth setting up a .env file with CONFIG_PATH=config/config.yml already, and an empty config file (e.g. with contents such as the example config file that is in the README.md) already there. That way a new user doesn't have to worry about any of this and can just fill in their .yml file. This way you could change the README to be platform agnostic (e.g. "edit the config file in your preferred text editor, e.g. Notepad+ for Windows, vim for UNIX"), as at the moment the touch etc. does not apply for Windows users.
  2. You could extend the string on the input folder_name = Prompt.ask("Please provide the folder name") to include a quick example, e.g. AK_1111739_hL_RSPd_monitor_front, as in the snippet below.
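i.e. something along the lines of (hedged, simply reusing the prompt string and the example folder name from this comment):

folder_name = Prompt.ask(
    "Please provide the folder name (e.g. AK_1111739_hL_RSPd_monitor_front)"
)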

@lauraporta
Member Author

Thank you Joe for the last feedback.

config files

I see your point; this happens because I explicitly chose not to track the config files, but yes, a kind of template should be present to initialise the program. I don't want to track .env files either, because by definition they should not be tracked.
I will solve this issue in a separate PR (related to #10).

@JoeZiminski
Member

> Thank you Joe for the last feedback.
>
> config files
>
> I see your point; this happens because I explicitly chose not to track the config files, but yes, a kind of template should be present to initialise the program. I don't want to track .env files either, because by definition they should not be tracked. I will solve this issue in a separate PR (related to #10).

Oh yes this is a very good point, maybe a file called example.env can be un-excluded (with this syntax)

@lauraporta lauraporta merged commit 31d6e74 into developement Jan 26, 2023
@lauraporta lauraporta deleted the load-data branch February 7, 2023 01:20