Short intro: I saw that when calling …

Question: Is there also a way to natively get a plain 2D array (for later padding and dumping into the .h5 file)?
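I had forgotten about how="zip", and I didn't find it in any documentation: it's not in uproot.TTree.arrays or uproot.interpretation.library.Awkward, but I found the implementation here: uproot5/src/uproot/interpretation/library.py, lines 631 to 715 in 94c085b. That makes it an undocumented feature. According to this repo's git history, I implemented it 4 years ago. I would guess that it turns data with types like … into data with types like …, because that's what ak.zip does. This can only work if the variable-length lists (…) have compatible lengths.

I'll try it out on uproot-HZZ.root:

>>> import uproot, skhep_testdata
>>> import awkward as ak
>>> import numpy as np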
>>> tree = uproot.open(skhep_testdata.data_path("uproot-HZZ.root"))["events"]
>>> tree.show(filter_name=["Electron_*", "Muon_*"])
name                 | typename                 | interpretation
---------------------+--------------------------+-------------------------------
Muon_Px              | float[]                  | AsJagged(AsDtype('>f4'))
Muon_Py              | float[]                  | AsJagged(AsDtype('>f4'))
Muon_Pz              | float[]                  | AsJagged(AsDtype('>f4'))
Muon_E               | float[]                  | AsJagged(AsDtype('>f4'))
Muon_Charge          | int32_t[]                | AsJagged(AsDtype('>i4'))
Muon_Iso             | float[]                  | AsJagged(AsDtype('>f4'))
Electron_Px          | float[]                  | AsJagged(AsDtype('>f4'))
Electron_Py          | float[]                  | AsJagged(AsDtype('>f4'))
Electron_Pz          | float[]                  | AsJagged(AsDtype('>f4'))
Electron_E           | float[]                  | AsJagged(AsDtype('>f4'))
Electron_Charge      | int32_t[]                | AsJagged(AsDtype('>i4'))
Electron_Iso         | float[]                  | AsJagged(AsDtype('>f4'))

Without how="zip":

>>> tree.arrays(filter_name=["Electron_*", "Muon_*"]).show(type=True)
type: 2421 * {
    Muon_Px: var * float32,
    Muon_Py: var * float32,
    Muon_Pz: var * float32,
    Muon_E: var * float32,
    Muon_Charge: var * int32,
    Muon_Iso: var * float32,
    Electron_Px: var * float32,
    Electron_Py: var * float32,
    Electron_Pz: var * float32,
    Electron_E: var * float32,
    Electron_Charge: var * int32,
    Electron_Iso: var * float32
}
[{Muon_Px: [-52.9, 37.7], Muon_Py: [-11.7, 0.693], Muon_Pz: [...], ...},
{Muon_Px: [-0.816], Muon_Py: [-24.4], Muon_Pz: [20.2], Muon_E: [31.7], ...},
{Muon_Px: [49, 0.828], Muon_Py: [-21.7, 29.8], Muon_Pz: [...], ...},
{Muon_Px: [22.1, 76.7], Muon_Py: [-85.8, -14], Muon_Pz: [...], ...},
{Muon_Px: [45.2, 39.8], Muon_Py: [67.2, 25.4], Muon_Pz: [...], ...},
{Muon_Px: [9.23, -5.79], Muon_Py: [40.6, -30.3], Muon_Pz: [...], ...},
{Muon_Px: [12.5, 29.5], Muon_Py: [-42.5, -4.45], Muon_Pz: [...], ...},
{Muon_Px: [34.9], Muon_Py: [-16], Muon_Pz: [156], Muon_E: [160], ...},
{Muon_Px: [-53.2, 11.5], Muon_Py: [92, -4.42], Muon_Pz: [...], ...},
{Muon_Px: [-67, -18.1], Muon_Py: [53.2, -35.1], Muon_Pz: [...], ...},
...,
{Muon_Px: [14.9], Muon_Py: [32], Muon_Pz: [-156], Muon_E: [160], ...},
{Muon_Px: [-24.2], Muon_Py: [-35], Muon_Pz: [-19.2], Muon_E: [46.7], ...},
{Muon_Px: [-9.2], Muon_Py: [-42.2], Muon_Pz: [-64.3], Muon_E: [77.4], ...},
{Muon_Px: [34.5, -31.6], Muon_Py: [28.8, -10.4], Muon_Pz: [...], ...},
{Muon_Px: [-39.3], Muon_Py: [-14.6], Muon_Pz: [61.7], Muon_E: [74.6], ...},
{Muon_Px: [35.1], Muon_Py: [-14.2], Muon_Pz: [161], Muon_E: [165], ...},
{Muon_Px: [-29.8], Muon_Py: [-15.3], Muon_Pz: [-52.7], Muon_E: [62.4], ...},
{Muon_Px: [1.14], Muon_Py: [63.6], Muon_Pz: [162], Muon_E: [174], ...},
{Muon_Px: [23.9], Muon_Py: [-35.7], Muon_Pz: [54.7], Muon_E: [69.6], ...}]

And with how="zip":

>>> tree.arrays(filter_name=["Electron_*", "Muon_*"], how="zip").show(type=True)
type: 2421 * {
    Muon: var * {
        Px: float32,
        Py: float32,
        Pz: float32,
        E: float32,
        Charge: int32,
        Iso: float32
    },
    Electron: var * {
        Px: float32,
        Py: float32,
        Pz: float32,
        E: float32,
        Charge: int32,
        Iso: float32
    }
}
[{Muon: [{Px: -52.9, Py: ..., ...}, ...], Electron: []},
{Muon: [{Px: -0.816, Py: -24.4, ...}], Electron: []},
{Muon: [{Px: 49, Py: -21.7, ...}, ...], Electron: []},
{Muon: [{Px: 22.1, Py: -85.8, ...}, ...], Electron: []},
{Muon: [{Px: 45.2, Py: 67.2, ...}, ...], Electron: [...]},
{Muon: [{Px: 9.23, Py: 40.6, ...}, ...], Electron: []},
{Muon: [{Px: 12.5, Py: -42.5, ...}, ...], Electron: []},
{Muon: [{Px: 34.9, Py: -16, ...}], Electron: []},
{Muon: [{Px: -53.2, Py: 92, ...}, ...], Electron: []},
{Muon: [{Px: -67, Py: 53.2, ...}, ...], Electron: []},
...,
{Muon: [{Px: 14.9, Py: 32, ...}], Electron: []},
{Muon: [{Px: -24.2, Py: -35, ...}], Electron: []},
{Muon: [{Px: -9.2, Py: -42.2, ...}], Electron: []},
{Muon: [{Px: 34.5, Py: 28.8, ...}, ...], Electron: []},
{Muon: [{Px: -39.3, Py: -14.6, ...}], Electron: []},
{Muon: [{Px: 35.1, Py: -14.2, ...}], Electron: []},
{Muon: [{Px: -29.8, Py: -15.3, ...}], Electron: []},
{Muon: [{Px: 1.14, Py: 63.6, ...}], Electron: []},
{Muon: [{Px: 23.9, Py: -35.7, ...}], Electron: []}]

That's nice: it noticed that some branches have compatible list lengths and made a nested structure that grouped the two equivalence classes. (… I'm sure it's not using the names to determine the groupings, and if you have any branch that is accidentally filtered more than the ones it's supposed to be grouped with, it will identify that branch as a separate group. That is one way the results can be unexpected.)

But finally, these are not going to be good data structures for converting into HDF5. HDF5 can't represent the hierarchical nesting within an array that ak.zip deliberately creates. (HDF5's hierarchy is for groups of different arrays.) To get data into an HDF5 file, you don't want to zip arrays together; you want to ak.unzip them apart. With how=dict, you get one separate array per branch:

>>> arrays = tree.arrays(filter_name=["Electron_*", "Muon_*"], how=dict)
>>> type(arrays)
<class 'dict'>
>>> arrays.keys()
dict_keys(['Muon_Px', 'Muon_Py', 'Muon_Pz', 'Muon_E', 'Muon_Charge', 'Muon_Iso', 'Electron_Px', 'Electron_Py', 'Electron_Pz', 'Electron_E', 'Electron_Charge', 'Electron_Iso'])
>>> arrays["Muon_Px"]
<Array [[-52.9, 37.7], [-0.816], ..., [23.9]] type='2421 * var * float32'>
>>> arrays["Muon_Py"]
<Array [[-11.7, 0.693], [-24.4], ..., [-35.7]] type='2421 * var * float32'>
>>> arrays["Electron_Px"]
<Array [[], [], [], [], [...], ..., [], [], [], []] type='2421 * var * float32'>

Then you have to find some way to flatten them for HDF5. (I'm assuming that you won't be using HDF5's …) For example, flatten each branch and keep the number of items per entry:

>>> muon_px = ak.flatten(arrays["Muon_Px"])
>>> nmuon = ak.num(arrays["Muon_Px"])
>>> muon_px
<Array [-52.9, 37.7, -0.816, 49, ..., -29.8, 1.14, 23.9] type='3825 * float32'>
>>> nmuon
<Array [2, 1, 2, 2, 2, 2, 2, 1, ..., 1, 2, 1, 1, 1, 1, 1] type='2421 * int64'>

Save both arrays, because then you can get the ragged shape back with:

>>> ak.unflatten(muon_px, nmuon)
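<Array [[-52.9, 37.7], [-0.816], ..., [23.9]] type='2421 * var * float32'>

But if padding is better for your application, you could pad every list to the longest length (np.max(nmuon)), fill the missing values with NaN, and convert to a 2D NumPy array:

>>> ak.to_numpy(ak.fill_none(ak.pad_none(arrays["Muon_Px"], np.max(nmuon)), np.nan))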
array([[-52.89945602, 37.73778152, nan, nan],
[ -0.81645936, nan, nan, nan],
[ 48.98783112, 0.82756668, nan, nan],
...,
[-29.75678635, nan, nan, nan],
[ 1.14186978, nan, nan, nan],
[ 23.9132061 , nan, nan, nan]])

Or hard-code a padding length, perhaps even clipping the lists that are too long (see the clip argument of ak.pad_none).
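If it helps to see those two HDF5-friendly layouts without Awkward Array, here is a minimal NumPy-only sketch. The function names flatten_with_counts, unflatten, and pad_to_2d are hypothetical (they are not part of uproot or Awkward); they only mimic what ak.flatten/ak.num, ak.unflatten, and ak.fill_none(ak.pad_none(..., clip=True), np.nan) do for one level of nesting, and the input is a made-up toy list rather than data read from uproot-HZZ.root.

```python
import numpy as np


def flatten_with_counts(lists):
    """Split ragged per-entry lists into (content, counts).

    Plays the role of ak.flatten(array) and ak.num(array) from the session
    above, but for plain Python lists of numbers.
    """
    counts = np.array([len(sub) for sub in lists], dtype=np.int64)
    pieces = [np.asarray(sub, dtype=np.float32) for sub in lists]
    content = np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)
    return content, counts


def unflatten(content, counts):
    """Rebuild the ragged entries (as NumPy views) from content + counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if len(counts) == 0:
        return []
    # np.split wants the boundaries between entries: all but the last cumulative count.
    return np.split(content, np.cumsum(counts)[:-1])


def pad_to_2d(content, counts, length, fill=np.nan):
    """Return a (len(counts), length) array.

    Short entries are padded with `fill`; entries longer than `length` are
    clipped. float64 is used so NaN can also mark missing values of integer
    branches (e.g. a Charge branch).
    """
    counts = np.asarray(counts, dtype=np.int64)
    out = np.full((len(counts), length), fill, dtype=np.float64)
    starts = np.cumsum(counts) - counts  # where each entry begins in `content`
    columns = np.arange(length)
    # True where row i still has a real value in column j (after clipping).
    valid = columns[np.newaxis, :] < np.minimum(counts, length)[:, np.newaxis]
    # Position in `content` for every (row, column); only the valid ones are read.
    source = starts[:, np.newaxis] + columns[np.newaxis, :]
    out[valid] = content[source[valid]]
    return out


# Toy per-event values (made up for illustration).
events = [[1.5, -2.0], [0.5], [], [3.0, 4.0, 5.0]]
content, counts = flatten_with_counts(events)            # like ak.flatten / ak.num
ragged = unflatten(content, counts)                       # like ak.unflatten
padded = pad_to_2d(content, counts, int(counts.max()))    # shape (4, 3), NaN-padded
clipped = pad_to_2d(content, counts, 2)                   # shape (4, 2), last event clipped
```

Either layout is a set of plain rectangular NumPy arrays, so each one can be written as its own HDF5 dataset; with h5py, for instance, that would be create_dataset (h5py is an assumption here, since the thread doesn't say which HDF5 library is used). The content-plus-counts version is lossless, while the padded version trades disk space (and, with clipping, any items past the chosen length) for a plain 2D shape.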