Function: extract_relevant_features: throws AssertionError: X and y must contain the same number of samples. #945

lthiess8 · 2022-05-21T17:43:13Z

Hi, I get an assertion error when using the function extract_relevant_features().
When I print len(X) and len(y), I get the same values.

Python version: 3.8.5
tsfresh version: 0.19.0
Install method (conda, pip, source): pip

Thanks in advance!

36965
36965
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-59dfec12df74> in <module>
     24     print(len(df))
     25     print(len(target))
---> 26     extracted_relevant_features = extract_relevant_features(df, target, column_id='abgang', column_sort='time',  column_value = 'values', default_fc_parameters=EfficientFCParameters(), ml_task='classification')
     27     extracted_features = extract_features(df, column_id='abgang', column_sort='time',  column_value = 'values', default_fc_parameters=EfficientFCParameters(),n_jobs=8, disable_progressbar=True)
     28 

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tsfresh/convenience/relevant_extraction.py in extract_relevant_features(timeseries_container, y, X, default_fc_parameters, kind_to_fc_parameters, column_id, column_sort, column_kind, column_value, show_warnings, disable_progressbar, profile, profiling_filename, profiling_sorting, test_for_binary_target_binary_feature, test_for_binary_target_real_feature, test_for_real_target_binary_feature, test_for_real_target_real_feature, fdr_level, hypotheses_independent, n_jobs, distributor, chunksize, ml_task)
    198     )
    199 
--> 200     X_sel = select_features(
    201         X_ext,
    202         y,

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tsfresh/feature_selection/selection.py in select_features(X, y, test_for_binary_target_binary_feature, test_for_binary_target_real_feature, test_for_real_target_binary_feature, test_for_real_target_real_feature, fdr_level, hypotheses_independent, n_jobs, show_warnings, chunksize, ml_task, multiclass, n_significant)
    152     )
    153     assert len(y) > 1, "y must contain at least two samples."
--> 154     assert len(X) == len(y), "X and y must contain the same number of samples."
    155     assert (
    156         len(set(y)) > 1

AssertionError: X and y must contain the same number of samples.
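
A note on why equal len() values can still trigger this assertion: the check at line 154 of selection.py compares y with the extracted feature matrix, which has one row per id, not with the long-format input frame. The sketch below uses made-up toy data to show the difference. The column names mirror the call above, but the values and the `label` column are assumptions for illustration, not the reporter's data.

```python
import pandas as pd

# Toy long-format data: 2 ids with 3 time steps each (values are made up).
df = pd.DataFrame({
    "abgang": [1, 1, 1, 2, 2, 2],
    "time":   [0, 1, 2, 0, 1, 2],
    "values": [0.1, 0.2, 0.3, 1.0, 1.1, 1.2],
    "label":  [0, 0, 0, 1, 1, 1],  # hypothetical per-row label column
})

# A target with one entry per *row* has the same length as df ...
target_per_row = df.set_index("abgang")["label"]
print(len(df), len(target_per_row))

# ... but feature extraction produces one row per *id*,
# so this is the length the target is compared against.
n_ids = df["abgang"].nunique()
print(n_ids)

# Two quick diagnostics: do the id sets agree, and are ids repeated in y?
ids_match = set(target_per_row.index) == set(df["abgang"])
has_duplicate_ids = target_per_row.index.has_duplicates
print(ids_match, has_duplicate_ids)
```

If the target index contains repeated ids, or ids that do not appear in the id column, the lengths seen inside select_features can differ even though len(df) == len(target).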


CelieDs · 2022-06-20T11:44:42Z

Hello! I encountered the same issue, did you manage to find a solution?
Thanks in advance

lthiess8 · 2022-06-21T16:29:08Z

Hello @CelieDs,

For some reason, the indices of X and y did not match.
This notebook helped me find the solution:
https://github.com/blue-yonder/tsfresh/blob/main/notebooks/advanced/05%20Timeseries%20Forecasting%20(multiple%20ids).ipynb

When I changed the code to the following, it worked for me:

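from tsfresh import select_features

# Build the target as a Series indexed like the extracted features.
target = df_melted.set_index("time").sort_index().label

# Keep only the entries that exist in both the target and the feature matrix.
target = target[target.index.isin(extracted_features.index)]
extracted_features = extracted_features[extracted_features.index.isin(target.index)]

# Run feature selection on the aligned pair.
features_selected = select_features(extracted_features, target, ml_task='classification')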
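
The alignment step above can be sketched in isolation with plain pandas. This is a minimal illustration under stated assumptions: the small `extracted_features` frame is a stand-in for the real output of extract_features, and the ids, feature name, and labels are invented.

```python
import pandas as pd

# Stand-in for the output of extract_features: one row per id (made-up values).
extracted_features = pd.DataFrame(
    {"values__mean": [0.2, 1.1, 5.0]}, index=[1, 2, 3]
)

# Hypothetical target with one label per id.
# Id 3 has features but no label; id 4 has a label but no features.
target = pd.Series([0, 1, 1], index=[1, 2, 4], name="label")

# Keep only the ids present on both sides, in the same order.
common_ids = extracted_features.index.intersection(target.index)
X = extracted_features.loc[common_ids]
y = target.loc[common_ids]

# After alignment, both the lengths and the index labels agree.
assert len(X) == len(y)
assert X.index.equals(y.index)
print(list(common_ids))
```

Using the index intersection once, rather than two isin filters, also guarantees that X and y end up in the same row order.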

lthiess8 added the bug label May 21, 2022
