Summarization functions #194

emrysshevek · 2019-08-08T17:13:04Z

Part of the goal of this package is to encode a dataset as a one-dimensional vector of consistent size. To do that, we apply the profile_distribution function to any metafeature that returns a sequence of values (e.g. means of numeric features), flattening it to a consistent shape.

Currently, profile_distribution computes a fixed set of summarization functions every time, no matter what. It would be nice to refactor this into a more flexible summarization function that allows only a subset of the summary measures to be computed, and possibly accepts custom summary functions.
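As a rough sketch of what that could look like: the function below takes an optional list of summary names and an optional mapping of custom functions. Everything here (the name summarize, the DEFAULT_SUMMARIES registry and its contents, the names and custom parameters) is hypothetical and only for illustration; it is not the current profile_distribution API, and the real set of default summaries may differ.

```python
import statistics
from typing import Callable, Dict, Iterable, Mapping, Optional, Sequence

SummaryFn = Callable[[Sequence[float]], float]

# Hypothetical default registry, for illustration only. The summaries that
# profile_distribution actually computes may be different.
DEFAULT_SUMMARIES: Dict[str, SummaryFn] = {
    "mean": statistics.fmean,
    "stdev": statistics.pstdev,  # population standard deviation
    "min": min,
    "max": max,
    "median": statistics.median,
}


def summarize(
    values: Iterable[float],
    names: Optional[Sequence[str]] = None,
    custom: Optional[Mapping[str, SummaryFn]] = None,
) -> Dict[str, float]:
    """Compute only the requested summaries of a sequence of values.

    names:  summaries to compute; None means every available summary.
    custom: extra summary functions, keyed by name. A custom entry
            overrides a default with the same name.
    """
    values = list(values)
    available: Dict[str, SummaryFn] = dict(DEFAULT_SUMMARIES)
    if custom:
        available.update(custom)
    if names is None:
        names = list(available)

    unknown = [name for name in names if name not in available]
    if unknown:
        raise KeyError(f"unknown summary function(s): {unknown}")

    if not values:
        # Assumption: an empty metafeature summarizes to NaN rather than raising.
        return {name: float("nan") for name in names}
    return {name: float(available[name](values)) for name in names}


# Only two of the defaults are computed here.
print(summarize([1.0, 2.0, 3.0, 6.0], names=["mean", "max"]))

# A custom summary passed in by the caller.
print(summarize([1.0, 2.0, 3.0], names=["range"],
                custom={"range": lambda v: max(v) - min(v)}))
```

Because the output is a dict keyed by summary name, the length of the flattened vector is fixed by the requested names rather than by the length of the input sequence, which keeps the consistent-shape property.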

This might also mean rethinking the naming scheme for our metafeatures and the structure of the computation, so that an arbitrary number of summaries can be computed on a given metafeature. We could stay close to our current method of including the summary as a prefix to the metafeature name (e.g. MeanMeansOfNumericFeatures, SumMeansOfNumericFeatures), or we could move closer to the D3M way of including the summary as an extension (e.g. MeansOfNumericFeatures.mean). The second way would also more naturally allow several chained operations to be clearly indicated (e.g. NumericFeatures.entropy.mean).
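To make the two naming options concrete, here is a small sketch. The helper names (prefix_name, suffix_name, parse_suffix_name) are made up for this example and do not exist in the package or in D3M; the only thing taken from the discussion above is the shape of the resulting names.

```python
from typing import List, Tuple


def prefix_name(base: str, summary: str) -> str:
    """Current-style name: the capitalized summary is prepended to the base."""
    return summary[:1].upper() + summary[1:] + base


def suffix_name(base: str, *operations: str) -> str:
    """Extension-style name: each operation is appended after a dot, in order."""
    return ".".join((base, *operations))


def parse_suffix_name(name: str) -> Tuple[str, List[str]]:
    """Split an extension-style name back into its base and operation chain."""
    base, *operations = name.split(".")
    return base, operations


print(prefix_name("MeansOfNumericFeatures", "mean"))    # MeanMeansOfNumericFeatures
print(suffix_name("MeansOfNumericFeatures", "mean"))    # MeansOfNumericFeatures.mean
print(suffix_name("NumericFeatures", "entropy", "mean"))  # NumericFeatures.entropy.mean
print(parse_suffix_name("NumericFeatures.entropy.mean"))  # ('NumericFeatures', ['entropy', 'mean'])
```

One practical difference: an extension-style name can be split back into its parts mechanically, while a prefix-style name like MeanMeansOfNumericFeatures cannot be parsed without already knowing the list of summary names.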


emrysshevek added discussion new feature new metafeatures labels Aug 8, 2019
