[FEATURE] "group by" for cat axes / index based slicing #211
Comments
@henryiii is there currently no way to do this even manually by assignment due to https://github.com/scikit-hep/boost-histogram/blob/6f4813f1e4b326ca14074b739f2214383f46bec6/src/boost_histogram/_internal/hist.py#L819 ?
PS: Assuming this is for unordered axes only.
PS: The issue that got opened and fixed in Boost.Histogram was for slicing on categorical axes, which enabled |
Ok thanks. In case anyone stumbles here. This seems to be the workaround, thanks @henryiii
Seems like a nice feature for boost-histogram.
Related issue. This |
boost-histogram doesn't have named axes, so it wouldn't be as pretty, and would need another layer of wrapping in Hist anyway, just like fill, project, ... (not against it, but probably best to implement it here first)
How would it know what entries to add?
This is almost implementable on top of scikit-hep/boost-histogram#576, save for the caveats mentioned there.
Admittedly I didn't think about it too deeply, but it could just pad with zeros along the missing categorical entries? That should be equivalent to adding two histograms where the growth/cat axis has different entries.
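To make the zero-padding idea concrete, here is a minimal sketch in plain Python. It does not use the hist or boost-histogram API: plain dicts stand in for a histogram with one categorical axis and one regular axis, and `add_padded` is a hypothetical helper name chosen for illustration.

```python
# Illustrative only: dicts stand in for histograms with a categorical axis.
# Keys are category labels; values are the per-bin counts along one other axis.

def add_padded(
    a: dict[str, list[float]],
    b: dict[str, list[float]],
    nbins: int,
) -> dict[str, list[float]]:
    """Add two histograms, treating categories missing on one side as all zeros."""
    zeros = [0.0] * nbins
    out: dict[str, list[float]] = {}
    # Keep the categories of `a` first, then append the ones only present in `b`.
    for cat in list(a) + [c for c in b if c not in a]:
        left = a.get(cat, zeros)
        right = b.get(cat, zeros)
        out[cat] = [x + y for x, y in zip(left, right)]
    return out


h1 = {"sampleA": [1.0, 2.0, 0.0], "sampleB": [0.0, 1.0, 1.0]}
h2 = {"sampleB": [2.0, 0.0, 0.0], "sampleC": [0.0, 0.0, 5.0]}
merged = add_padded(h1, h2, nbins=3)
```

Here `sampleA` and `sampleC` are each present in only one input, so they are copied through unchanged, while `sampleB` is summed bin by bin.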
Since we already know the new axis elements (the dictionary keys), I think we could have a workaround without growth as follows:

```python
import hist


def group(h: hist.Hist, oldname: str, newname: str, grouping: dict[str, list[str]]):
    hnew = hist.Hist(
        hist.axis.StrCategory(grouping, name=newname),
        *(ax for ax in h.axes if ax.name != oldname),
        storage=h._storage_type,
    )
    for i, indices in enumerate(grouping.values()):
        hnew.view(flow=True)[i] = h[{oldname: indices}][{oldname: sum}].view(flow=True)
    return hnew
```

Note that the new axis is put at the beginning (for convenience in implementation). I couldn't find a public accessor for the storage type though. An example
returning
A small update to my previous comment: the workaround now needs |
Can't you use |
Oops, guess it exists now!
I don't think this is currently implemented, but it would be super useful, as it would allow merging samples that were processed separately.
I am imagining syntax like:
h[{'category': {'merged': ['sampleA', 'sampleB'], ...}}]
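
To spell out what such a grouping would be expected to compute, here is a minimal sketch in plain Python. It is not the hist API: a dict stands in for a histogram with a categorical axis, and `group_categories` is a hypothetical helper. It also assumes every old label listed in the grouping exists; an unknown label raises `KeyError`.

```python
# Illustrative only: a dict maps each category label to its per-bin counts.

def group_categories(
    counts: dict[str, list[float]],
    grouping: dict[str, list[str]],
) -> dict[str, list[float]]:
    """Sum the counts of the old categories listed under each new label."""
    nbins = len(next(iter(counts.values())))
    grouped: dict[str, list[float]] = {}
    for new_label, old_labels in grouping.items():
        total = [0.0] * nbins
        for old in old_labels:
            # Bin-by-bin sum over the old categories in this group.
            total = [t + c for t, c in zip(total, counts[old])]
        grouped[new_label] = total
    return grouped


counts = {
    "sampleA": [1.0, 2.0],
    "sampleB": [3.0, 4.0],
    "sampleC": [5.0, 6.0],
}
grouped = group_categories(
    counts, {"merged": ["sampleA", "sampleB"], "other": ["sampleC"]}
)
```

Categories not mentioned in the grouping are dropped in this sketch; whether a real implementation should keep them is an open design choice.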
Also I thought
would work, but it doesn't seem to be possible currently.