MAINT: Deprecate txt data format in favor of csv (or tsv) #949

Open
3 tasks
asoplata opened this issue Nov 21, 2024 · 1 comment
Labels
docs Documentation and tutorials enhancement New feature or request file-formats IO refactor
Milestone

Comments

@asoplata
Collaborator

Currently, the GUI outputs (dipole) simulation data ONLY into csv files, but the Dipole class outputs data ONLY to either custom txt or hdf5 files.

Subtasks:

  • GUI and API output should be harmonized (the same filetype!)
  • We should consider deprecating txt as a format entirely. Regardless of our eventual support for hdf5, we should transition to only outputting either csv or tsv in place of txt. See reasons below.
  • We should add a page to the documentation specifically about our input and output file formats, since the formats currently described in the legacy tutorials differ from those in the code docstrings. (See below)

Motivation

  • I understand that the use of txt is a holdover from the original HNN-legacy version, but txt is a vague file format that is not descriptive about its contents.
  • Part of the motivation for this issue is also that we do not actually use the original HNN-legacy txt format. The original HNN-legacy txt files are described on this page https://jonescompneurolab.github.io/hnn-tutorials/gui/tour_gui under section Setting Parameters and Saving Model Data on Your Computer, reproduced here (emphasis mine):

Note that the directory path is /home/hnn/data/default, corresponding to the default Simulation Name parameter specified in the GUI. Also note the individual files present in the window:

  1. default.param - a backup copy of the param file used to run the simulation
  2. dpl.txt - normalized dipole in units of nAm; 1st column is time; 2nd column is layer 2 dipole; 3rd column is layer 5 dipole; 4th column is aggregate dipole from layers 2 and 5
  3. i.txt - currents from the cells
  4. param.txt - a machine-readable representation of all parameters used to run the simulation
  5. rawdpl.txt - un-normalized dipole; same columnar layout as dpl.txt
  6. rawspec.npz - spectrogram from the dipole saved in numpy format; you can use numpy to load this file (note that this file is only produced either when using rhythmic inputs or explicitly asking HNN to save the spectrogram; for the ERP shown above, no spectrogram is saved)
  7. spk.txt - a list of cell identifiers and spike times

In contrast, txt files output by HNN-Core’s Dipole.write() (here https://github.com/jonescompneurolab/hnn-core/blob/master/hnn_core/dipole.py#L673-L678 ) use a different columnar format (reproduced here):

Outputs

A tab separated txt file where rows correspond
to samples and columns correspond to

  1. time (ms),
  2. aggregate current dipole (scaled nAm),
  3. L2/3 current dipole (scaled nAm), and
  4. L5 current dipole (scaled nAm)
  • As for data output, even if we eventually switch the default data output to hdf5, I still think that we should retain the option to output plain-text data. hdf5 has its own learning curve, especially for newcomers, while csv is very common. In workshops, we may want to stick with plain-text output for this reason.
  • (We should of course still allow import of legacy HNN-Core and HNN-legacy txt files)
  • Our Dipole.write() function writes a file with a txt extension, but its docstring describes the contents as tab-separated. If we want to keep using tab-separated data (instead of comma-separated), then we should still switch to the tsv extension. The extension would then indicate how the file is laid out, and we could also add a header row naming the columns.
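
To make the column-order difference and the tsv-with-headers idea concrete, here is a minimal sketch using only the Python standard library. It is not the hnn-core implementation: the function names and the header names (`time_ms`, `agg_nAm`, `L2_nAm`, `L5_nAm`) are hypothetical, and it assumes the legacy `dpl.txt` rows are headerless and whitespace-separated, in the column order quoted from the tutorial above.

```python
import csv
import io

# Column orders as described in this issue. The header names themselves are
# hypothetical; neither format currently writes a header row.
LEGACY_COLUMNS = ["time_ms", "L2_nAm", "L5_nAm", "agg_nAm"]  # HNN-legacy dpl.txt
CORE_COLUMNS = ["time_ms", "agg_nAm", "L2_nAm", "L5_nAm"]    # Dipole.write() docstring


def legacy_txt_to_tsv(legacy_text):
    """Convert headerless legacy rows into tsv text with a header row.

    Assumes whitespace-separated values in LEGACY_COLUMNS order.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=CORE_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for line in legacy_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Label each value by its legacy position; DictWriter then emits
        # the values in CORE_COLUMNS order.
        writer.writerow(dict(zip(LEGACY_COLUMNS, line.split())))
    return out.getvalue()


def read_tsv(tsv_text):
    """Read headered tsv text into a list of {column name: float} dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [{name: float(value) for name, value in row.items()}
            for row in reader]


legacy = "0.0 1.0 2.0 3.0\n0.5 1.5 2.5 4.0\n"
tsv = legacy_txt_to_tsv(legacy)
rows = read_tsv(tsv)
```

Because the reader looks columns up by header name rather than by position, a file written this way stays readable even if the column order changes later, which is the main practical gain over the headerless txt files.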
@asoplata asoplata added file-formats enhancement New feature or request docs Documentation and tutorials refactor IO labels Nov 21, 2024
@asoplata asoplata added this to the 0.5 milestone Nov 21, 2024
@asoplata asoplata changed the title Deprecate txt data format in favor of csv (or tsv) MAINT: Deprecate txt data format in favor of csv (or tsv) Nov 22, 2024
@jasmainak
Collaborator

tsv is not a bad idea as the reader for the old format is not too different from tsv. Keep in mind that a text-based format is going to be larger in size than a binary format. The reason for hdf5 was that it is compatible with Matlab too while being a binary format.

@asoplata it might be a good idea to consolidate IO issues into one ... I have a feeling that the discussion has been fragmented which has led to a fragmented implementation. For a consistent IO implementation, one should plan in advance how the various compatibilities (between versions, between GUI and HNN-core) will be managed and document the format before implementing.
