I used the same 29 basins from the Changdian region in two datasets to train an LSTM model. One dataset is FD_source; the other contains only the 29 Changdian basins. Both datasets share the same time periods, basin IDs, and variables.
However, I found that the scaler statistics for the "streamflow" variable (in dapengscaler_stat.json) differed between the two datasets. To investigate further, I ran the following experiments:
1. Training the LSTM model on all 29 basins in each dataset produced different results.
2. Training on 28 basins, excluding changdian_95350, produced consistent scalers and similar results for both datasets.
On closer inspection, I found that the data are stored in NC files, with basins grouped in batches of 100. However, changdian_95350 is stored in a separate file, while the other basins are grouped together in one file. This likely changed the order in which basins were read, which in turn changed the statistics computed for normalization.
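The ordering problem can be illustrated with a small, self-contained sketch. It assumes a naive reader that appends each basin's series in the order files are opened and then drops the basin IDs, while downstream code assumes position i belongs to the i-th requested ID. The file contents, basin values, and the `read_in_file_order` helper are all hypothetical; this is not the hydrodatasource implementation.

```python
# Hypothetical file layout: each "file" maps basin ID -> streamflow series.
# Two basins share one file; changdian_95350 sits in a separate file.
file_a = {"changdian_95400": [10.0, 12.0], "changdian_95300": [4.0, 6.0]}
file_b = {"changdian_95350": [100.0, 120.0]}
requested_ids = ["changdian_95300", "changdian_95350", "changdian_95400"]


def read_in_file_order(files, basin_ids):
    """Hypothetical naive reader: appends series in the order they are
    encountered across files and returns them without their IDs."""
    wanted = set(basin_ids)
    out = []
    for f in files:
        for bid, series in f.items():
            if bid in wanted:
                out.append(series)
    return out


data = read_in_file_order([file_a, file_b], requested_ids)

# Downstream code assumes data[i] belongs to requested_ids[i];
# check which basins end up paired with someone else's series.
truth = {**file_a, **file_b}
mismatched = [
    bid for bid, series in zip(requested_ids, data) if series != truth[bid]
]
print(mismatched)
```

In this toy layout every basin is paired with another basin's series, so any per-basin statistic computed afterwards (for example, a scaler that looks up basin attributes by position) would be computed from the wrong data.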
This points to a data-handling issue that can silently introduce errors into the model's results. Below are some of the observed results from the experiments.
Yes, this is a bug in how the SelfMadeHydrodataset class in the hydrodatasource package reads data. Basins should be sorted by basin ID; otherwise some basins may be placed at the wrong indices. I will release a new version of hydrodatasource.
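A minimal sketch of the kind of fix described, assuming it amounts to keying every series by basin ID and emitting basins in sorted-ID order regardless of how they are split across files. The `read_sorted_by_id` helper and the two file groupings (loosely mimicking FD_source and the Changdian-only dataset) are hypothetical; the actual change in hydrodatasource may differ.

```python
def read_sorted_by_id(files, basin_ids):
    """Hypothetical reader: collects series keyed by basin ID, then returns
    (sorted IDs, series in that same order), independent of file layout."""
    wanted = set(basin_ids)
    by_id = {}
    for f in files:
        for bid, series in f.items():
            if bid in wanted:
                by_id[bid] = series
    # Fail loudly instead of silently shifting indices
    missing = wanted.difference(by_id)
    if missing:
        raise KeyError(f"basins not found in any file: {sorted(missing)}")
    ordered_ids = sorted(wanted)
    return ordered_ids, [by_id[bid] for bid in ordered_ids]


ids = ["changdian_95300", "changdian_95350", "changdian_95400"]
# Grouping 1: changdian_95350 in its own file
grouping_1 = [
    {"changdian_95400": [10.0, 12.0], "changdian_95300": [4.0, 6.0]},
    {"changdian_95350": [100.0, 120.0]},
]
# Grouping 2: all basins in one file, in a different order
grouping_2 = [
    {
        "changdian_95350": [100.0, 120.0],
        "changdian_95300": [4.0, 6.0],
        "changdian_95400": [10.0, 12.0],
    },
]
print(read_sorted_by_id(grouping_1, ids) == read_sorted_by_id(grouping_2, ids))  # True
```

Because both groupings yield the same IDs in the same order with the same series, any statistics computed afterwards no longer depend on which file a basin happened to be stored in.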