Part of the goal of this package is to encode a dataset as a one-dimensional vector of consistent size. To do that, we apply the `profile_distribution` function to any metafeature that returns a sequence of values (e.g. means of numeric features), flattening it to a consistent shape.
Currently, `profile_distribution` computes a fixed set of summarization functions every time. It would be nice to refactor this into a more flexible summarization function that allows only a subset of summary measures to be computed, or possibly accepts custom summary functions.
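To make the idea concrete, here is a minimal sketch of what a more flexible summarizer could look like. The name `summarize`, its parameters, and the `DEFAULT_SUMMARIES` registry are all hypothetical and are not the package's current API; the default set shown is illustrative and not necessarily what `profile_distribution` computes today.

```python
import math
import statistics
from typing import Callable, Dict, Iterable, Mapping, Optional, Sequence

Summary = Callable[[Sequence[float]], float]

# Hypothetical default summaries (illustrative only, not the package's real set).
DEFAULT_SUMMARIES: Dict[str, Summary] = {
    "mean": statistics.fmean,
    "min": min,
    "max": max,
    "sum": math.fsum,
}


def summarize(
    values: Iterable[float],
    summaries: Optional[Sequence[str]] = None,
    custom: Optional[Mapping[str, Summary]] = None,
) -> Dict[str, float]:
    """Summarize a sequence of values with a chosen subset of summary functions.

    summaries: names to compute; None means every available summary.
    custom: extra name -> function pairs supplied by the caller.
    """
    available = dict(DEFAULT_SUMMARIES)
    if custom:
        available.update(custom)

    names = list(available) if summaries is None else list(summaries)
    unknown = [name for name in names if name not in available]
    if unknown:
        raise ValueError(f"unknown summaries: {unknown}")

    data = list(values)
    if not data:
        # Keep the output shape consistent even when there is nothing to summarize.
        return {name: math.nan for name in names}
    return {name: float(available[name](data)) for name in names}
```

With this shape, `summarize(means, summaries=["mean", "max"])` would compute only those two measures, and something like `custom={"range": lambda v: max(v) - min(v)}` would add a caller-defined one. Returning NaN for an empty input is one possible way to keep the vector size consistent; other conventions would work too.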
This would possibly include rethinking the naming scheme for our metafeatures and the structure of the computation in order to allow an arbitrary number of summaries to be computed on a given metafeature. We could stay close to our current method of including the summary as a prefix to the metafeature (e.g. MeanMeansOfNumericFeatures, SumMeansOfNumericFeatures), or we could move closer to the D3M way of including the summary as an extension (e.g. MeansOfNumericFeatures.mean). The second way could also more naturally indicate several chained operations (e.g. NumericFeatures.entropy.mean).
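The two naming options can be compared side by side with a small sketch. Both helper names below (`prefix_name`, `extension_name`) are hypothetical and exist only to illustrate the schemes described above.

```python
def prefix_name(metafeature: str, summary: str) -> str:
    """Current style: the summary is prepended, with its first letter capitalized."""
    return summary[:1].upper() + summary[1:] + metafeature


def extension_name(metafeature: str, *operations: str) -> str:
    """D3M-like style: each operation is appended as a dotted extension, in order."""
    return ".".join((metafeature, *operations))


print(prefix_name("MeansOfNumericFeatures", "mean"))       # MeanMeansOfNumericFeatures
print(extension_name("MeansOfNumericFeatures", "mean"))    # MeansOfNumericFeatures.mean
print(extension_name("NumericFeatures", "entropy", "mean"))  # NumericFeatures.entropy.mean
```

The extension style keeps the base metafeature name intact and reads left to right in the order the operations are applied, which is what makes chaining easier to follow.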