Fix order of columns after TSDataset.to_pandas() and after TSDataset.__init__() #25

Mr-Geekman · 2023-08-14T15:39:30Z

Issue by GooseIt
Wednesday Feb 08, 2023 at 15:55 GMT
Originally opened as tinkoff-ai#1106

🚀 Feature Request

For consistency, it may be useful to give the dataframe returned by TSDataset.to_dataset() and the dataframe df constructed during TSDataset.__init__() the same column order as the dataframe returned by TSDataset.to_flatten().
Currently, both the dataframe returned by TSDataset.to_dataset() and TSDataset.df place "target" among the other features in alphabetical order, whereas the dataframe returned by TSDataset.to_flatten() places "target" after "timestamp" and "segment" and before the other features, which follow in alphabetical order.
The order used by TSDataset.to_flatten() makes the "target" value easier to inspect (it is not hidden among many other features) and emphasises its special role.

Proposal

I propose the following order of columns:

timestamp,
segment,
target,
other columns in alphabetical order.
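
As a rough illustration of the ordering listed above, here is a minimal sketch in plain Python. The helper name order_flat_columns is hypothetical and is not part of etna; it only works on a list of column names and assumes "target" may be absent.

```python
def order_flat_columns(columns):
    """Return column names as: timestamp, segment, target (if present), then the rest alphabetically."""
    # Keep only the leading names that actually occur, in the proposed order.
    leading = [name for name in ("timestamp", "segment", "target") if name in columns]
    # Everything else follows in alphabetical order.
    rest = sorted(name for name in columns if name not in leading)
    return leading + rest


example = ["segment", "regressor_b", "target", "timestamp", "regressor_a"]
print(order_flat_columns(example))
# ['timestamp', 'segment', 'target', 'regressor_a', 'regressor_b']
```

The resulting list could then be used to select the columns of a long-format dataframe in that order.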

How it can be done for TSDataset.to_dataset():

Find the line df_copy = df_copy.pivot(index="timestamp", columns="segment") in etna.datasets.tsdataset.py
Before it, reorder the columns of df_copy so that "target" comes before the other features, if "target" is provided. It should look like feature_columns.remove("target") followed on the next line by df_copy = df_copy[["timestamp", "segment", "target"] + feature_columns]

How it can be done for TSDataset.__init__():

Find the line df = pd.concat((df, self.df_exog), axis=1).loc[df.index].sort_index(axis=1, level=(0, 1)) in etna.datasets.tsdataset.py
Change it so that "target" comes before the other columns, which remain sorted in alphabetical order.
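
One possible way to express this ordering for wide-format columns is a custom sort key over (segment, feature) pairs. This is only a sketch under assumptions: the names wide_column_sort_key and order_wide_columns are hypothetical, and the columns are modelled as plain tuples rather than an actual pandas MultiIndex.

```python
def wide_column_sort_key(column):
    """Sort key for a (segment, feature) pair: segments alphabetically, "target" first within a segment."""
    segment, feature = column
    # The middle element is 0 for "target" and 1 otherwise, so "target" sorts ahead of other features.
    return (segment, 0, "") if feature == "target" else (segment, 1, feature)


def order_wide_columns(columns):
    """Return the (segment, feature) pairs in the proposed order."""
    return sorted(columns, key=wide_column_sort_key)


columns = [
    ("segment_b", "target"),
    ("segment_a", "regressor"),
    ("segment_a", "target"),
    ("segment_b", "regressor"),
    ("segment_a", "holiday"),
]
print(order_wide_columns(columns))
# [('segment_a', 'target'), ('segment_a', 'holiday'), ('segment_a', 'regressor'),
#  ('segment_b', 'target'), ('segment_b', 'regressor')]
```

In a real change the ordered pairs would presumably replace the plain sort_index call, for instance by selecting columns with something like df.loc[:, ordered]; whether that fits the existing code is not verified here.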

Test cases

Fix doctest of TSDataset.to_dataset().
Make sure current tests pass.
Add tests on order of columns for both modified methods to etna.tests.test_datasets.test_dataset.py:

test_to_dataset_correct_column_order for TSDataset.to_dataset()
test_init_with_exog_correct_column_order for TSDataset.__init__() with df_exog != None
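
The new tests could share a small order check along the following lines. This is a hedged sketch: columns_follow_proposed_order is a hypothetical helper, it takes the wide-format columns as a list of (segment, feature) tuples, and it assumes the columns of each segment are already adjacent.

```python
from itertools import groupby


def columns_follow_proposed_order(columns):
    """Return True if, within each segment, "target" (when present) comes first and the rest are alphabetical."""
    # groupby only merges consecutive items, hence the assumption that segments are contiguous.
    for _, group in groupby(columns, key=lambda column: column[0]):
        features = [feature for _, feature in group]
        rest = [feature for feature in features if feature != "target"]
        expected = (["target"] if "target" in features else []) + sorted(rest)
        if features != expected:
            return False
    return True
```

A test such as test_to_dataset_correct_column_order could then assert this check on the list of columns of the resulting dataframe.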

Additional context

See issue tinkoff-ai#873 for a similar issue concerning TSDataset.to_flatten().

Mr-Geekman added the "enhancement" (New feature or request) label Aug 14, 2023

etna-team locked and limited conversation to collaborators May 30, 2024

d-a-bunin converted this issue into discussion #370 May 30, 2024

This issue was moved to a discussion.