{Edge,Node}Population should have a .to_pandas method. #140

matz-e · 2021-04-07T08:24:10Z

See title. For better usability, SONATA™ should be able to return a subset of a population as a Pandas dataframe for easier manipulation. Ideal usage from my side:

import libsonata as so
pop = so.EdgeStorage("foo.h").open_population("bar")
df = pop.to_pandas(so.Selection([(123, 666)]))
stuff = df[(df.source_node_id > 313) & (df.axonal_delay < 3)]

(paraphrasing a bit)
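
For illustration, a rough sketch of how such a frame could be assembled today from per-column reads. It assumes the population exposes `attribute_names`, `get_attribute(name, selection)`, `source_nodes(selection)` and `target_nodes(selection)`, and that a selection has `flatten()`. `edges_to_frame`, the two `Fake*` classes and the attribute values are made up here so the snippet runs without a SONATA file; this is not an existing libsonata method.

```python
import numpy as np
import pandas as pd


class FakeSelection:
    """Stand-in for a selection: half-open (start, stop) ID ranges."""

    def __init__(self, ranges):
        self.ranges = ranges

    def flatten(self):
        return np.concatenate([np.arange(start, stop) for start, stop in self.ranges])


class FakeEdgePopulation:
    """Stand-in for an edge population; all values are invented for the demo."""

    attribute_names = {"axonal_delay", "conductance"}

    def source_nodes(self, selection):
        return selection.flatten() % 7

    def target_nodes(self, selection):
        return selection.flatten() % 5

    def get_attribute(self, name, selection):
        ids = selection.flatten()
        return ids * 0.5 if name == "axonal_delay" else ids * 2.0


def edges_to_frame(pop, selection):
    """Hypothetical helper: one column per attribute, indexed by edge ID."""
    columns = {
        "source_node_id": pop.source_nodes(selection),
        "target_node_id": pop.target_nodes(selection),
    }
    # Sort the names so the column order is stable between runs.
    for name in sorted(pop.attribute_names):
        columns[name] = pop.get_attribute(name, selection)
    return pd.DataFrame(columns, index=selection.flatten())


df = edges_to_frame(FakeEdgePopulation(), FakeSelection([(10, 14)]))
stuff = df[(df.source_node_id > 3) & (df.axonal_delay < 6)]
```

With a real population, the same helper would be called with the opened population and a real selection in place of the stand-ins.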

mgeplf · 2021-04-07T08:35:34Z

This sort of functionality is part of SNAP. I'd prefer to avoid having pandas as a requirement of python libsonata, because it's a heavy dependency.

matz-e · 2021-04-07T11:12:19Z

Sure it is a heavy dependency, but we already depend on numpy, which itself is heavy:

Input spec
--------------------------------
 -   py-pandas

Concretized
--------------------------------
[+]  [email protected]%[email protected] arch=linux-rhel7-x86_64
[^]      ^[email protected]%[email protected] arch=linux-rhel7-x86_64
[^]          ^[email protected]%[email protected]+blas+lapack arch=linux-rhel7-x86_64
[^]              ^[email protected]%[email protected]~ilp64+shared threads=none arch=linux-rhel7-x86_64
[^]              ^[email protected]%[email protected] arch=linux-rhel7-x86_64
[^]                  ^[email protected]%[email protected] arch=linux-rhel7-x86_64
[^]                      ^[email protected]%[email protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-rhel7-x86_64
[^]      ^[email protected]%[email protected] arch=linux-rhel7-x86_64
[^]      ^[email protected]%[email protected] arch=linux-rhel7-x86_64
[^]          ^[email protected]%[email protected]~toml arch=linux-rhel7-x86_64
[^]          ^[email protected]%[email protected] arch=linux-rhel7-x86_64
[^]      ^[email protected]%[email protected] arch=linux-rhel7-x86_64

not that much more in the dependency tree that isn't numpy…

Seems like the counter-argument is to depend on something that is heavier, and pulls in a bunch of morphology dependencies. To me, it seems more like the API augmentations of SNAP should be migrated here…

mgeplf · 2021-04-07T11:18:50Z

numexpr/dateutil/pytz/etc. are quite a bit more than just numpy (spack concretization is deceptive: pip install numpy only installs numpy; pandas installs more).

libsonata is supposed to be very low-level, with very few dependencies; the productivity stuff goes in SNAP.

matz-e · 2021-04-08T07:43:16Z

I disagree: compared to numpy, these additional dependencies don't seem all that heavy. Having to work with SNAP instead seems a little like saying we should use Qt for comfortable XML reading in C++.

mgeplf · 2021-04-08T11:37:38Z

Put another way, numpy is a required dependency in that it's the compact way to return numeric data in python. It would be hard/impossible to not use numpy, which is why it fits with the minimalist purpose of the library. The improvement you're describing is an ergonomic/convenience one, which should be handled by higher level libraries (ie: SNAP).

The idea is that this is safe to use by anything (e.g. neurodamus-py), with the minimal set of requirements.
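
To make the trade-off concrete, here is a minimal sketch of a pattern that is sometimes used to keep a convenience dependency out of the hard requirements: import it lazily, only when the convenience is actually requested. This is not something libsonata does, and `columns_to_pandas` is a hypothetical name.

```python
def columns_to_pandas(columns):
    """Wrap a dict of equal-length columns in a DataFrame (hypothetical helper)."""
    try:
        # Imported here, not at module level, so the module itself
        # can be imported without pandas being installed.
        import pandas as pd
    except ImportError as exc:
        raise ImportError(
            "pandas is needed for DataFrame output; install it separately"
        ) from exc
    return pd.DataFrame(columns)
```

Callers that never ask for a DataFrame never trigger the import, so the minimal install stays numpy-only.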

What is your use case?

matz-e · 2021-04-09T06:34:09Z

My use case is to bulk load SONATA into Pandas to pass through to Spark. If I look into a file manually, I would also use this to compare between SONATA, Parquet, and binary data… so having some .to_df that returns something with columns ['source_node_id', 'target_node_id', 'delay', 'conductivity'…] would be very nice and still pretty basic.
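
As a sketch of the bulk-loading side, assuming edge IDs are contiguous from 0 and that selections are built from half-open `(start, stop)` ranges, a population could be read in fixed-size pieces rather than all at once. `chunked_ranges` is a hypothetical helper, not part of libsonata.

```python
def chunked_ranges(total, chunk_size):
    """Yield half-open (start, stop) ID ranges that cover 0..total in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        # The last range is clipped so it never runs past `total`.
        yield (start, min(start + chunk_size, total))


print(list(chunked_ranges(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```

Each range could then be turned into a selection and read column by column, so that only one chunk is held in memory before it is handed on.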

mgeplf · 2021-04-09T06:55:54Z

Since you have to implement it for your use case, we should be able to take a look at it, and then make a decision.

alkino · 2021-11-26T16:59:59Z

For example, the DataFrame in report_reader.hpp is ready to be loaded into pandas. Is that a solution for you? @matz-e

There is no dependency on pandas inside libsonata, but the output data is laid out with pandas in mind.
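
A minimal sketch of that conversion, assuming the returned frame exposes `ids`, `times` and a 2D `data` array with one row per time step. The `SimpleNamespace` below is only a stand-in with invented values so the snippet runs without a report file, and `frame_to_pandas` is a hypothetical name.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

# Stand-in for the frame a report reader returns; the field names
# (ids, times, data) are an assumption about its layout.
frame = SimpleNamespace(
    ids=np.array([3, 7]),
    times=np.array([0.0, 0.1, 0.2]),
    data=np.arange(6, dtype=float).reshape(3, 2),
)


def frame_to_pandas(frame):
    """One row per time step, one column per reported ID."""
    return pd.DataFrame(frame.data, index=frame.times, columns=frame.ids)


df = frame_to_pandas(frame)
```

The conversion happens on the caller's side, so pandas stays out of the library's requirements.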

matz-e · 2021-11-26T17:20:16Z

Can I add columns to it?
