Summarization functions #194

Open
emrysshevek opened this issue Aug 8, 2019 · 0 comments

Part of the goal of this package is to encode a dataset as a one-dimensional vector of consistent size. To do that, we apply the profile_distribution function to any metafeature that returns a sequence of values (e.g. the means of the numeric features), which flattens it to a consistent shape.

Currently, profile_distribution computes a fixed set of summarization functions every time, no matter what. It would be nice to refactor this into a more flexible summarization function that allows only a subset of the summary measures to be computed, or possibly accepts custom summary functions.
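To make the idea concrete, here is a rough sketch of what a more flexible summarizer could look like. Everything in it is an assumption for discussion: the name `summarize`, its `summaries` and `custom` parameters, and the particular default measures are hypothetical and do not reflect the current profile_distribution signature or its actual set of statistics.

```python
import math
import statistics

# Hypothetical registry of named summary functions. The real set computed by
# profile_distribution may differ; these four are only placeholders.
DEFAULT_SUMMARIES = {
    "mean": statistics.fmean,
    "stdev": statistics.pstdev,
    "min": min,
    "max": max,
}


def summarize(values, summaries=None, custom=None):
    """Summarize a sequence of values with a chosen subset of measures.

    summaries: iterable of summary names to compute (None means all known).
    custom: optional dict mapping a name to a callable taking a list of floats.
    Returns a dict of {summary name: float}, in the requested order.
    """
    available = dict(DEFAULT_SUMMARIES)
    if custom:
        available.update(custom)

    names = list(available) if summaries is None else list(summaries)
    unknown = [name for name in names if name not in available]
    if unknown:
        raise KeyError(f"unknown summary function(s): {unknown}")

    data = [float(v) for v in values]
    if not data:
        # Keep the output shape consistent even when there is nothing to summarize.
        return {name: math.nan for name in names}
    return {name: float(available[name](data)) for name in names}


# Compute only a subset, plus one custom measure passed in by the caller.
print(summarize([1.0, 2.0, 3.0], summaries=["mean", "max"]))
print(summarize([1.0, 2.0, 3.0], summaries=["range"],
                custom={"range": lambda v: max(v) - min(v)}))
```

Returning NaN for an empty sequence keeps the number of outputs fixed for a given list of names, which matters if the results are concatenated into a fixed-length vector.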

This would possibly require rethinking the naming scheme for our metafeatures and the structure of the computation, so that an arbitrary number of summaries can be computed on a given metafeature. The naming could stay close to our current method of including the summary as a prefix to the metafeature (e.g. MeanMeansOfNumericFeatures, SumMeansOfNumericFeatures), or we could move closer to the D3M way of including the summary as an extension (e.g. MeansOfNumericFeatures.mean). The second way could also more naturally indicate several chained operations (e.g. NumericFeatures.entropy.mean).
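The two naming options can be compared side by side with a small sketch. The helpers `prefix_name` and `suffix_name` are hypothetical names invented for this illustration; they only reproduce the example strings given above and are not existing functions in the package or in D3M.

```python
def prefix_name(metafeature, summary):
    """Current style: capitalized summary as a prefix of the metafeature name."""
    return summary[0].upper() + summary[1:] + metafeature


def suffix_name(metafeature, *operations):
    """Extension style: each operation is appended, in order, after a dot."""
    return ".".join((metafeature, *operations))


print(prefix_name("MeansOfNumericFeatures", "mean"))
print(suffix_name("MeansOfNumericFeatures", "mean"))
print(suffix_name("NumericFeatures", "entropy", "mean"))
```

With the extension style, a chain of operations reads left to right in the order it is applied, whereas the prefix style would have to stack prefixes in reverse order.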
