Maybe it's about some errors during dataset reading #12

xxcj-cyk · 2024-12-28T16:11:31Z

I used the same 29 basins from the Changdian region across two datasets to train the LSTM model. One dataset is named FD_source, and the other contains only the 29 Changdian basins. Both datasets share the same time periods, basin IDs, and variables.

However, I discovered an issue: the scaler statistics for the "streamflow" variable (in dapengscaler_stat.json) differed between the two datasets. To investigate further, I conducted the following experiments:

- Running the LSTM model with 29 basins from both datasets produced different results.
- Running the model with 28 basins, excluding changdian_95350, resulted in consistent scalers and similar outcomes across the datasets.

Upon deeper analysis, I found that the data is stored in NC files, with basins grouped in batches of 100. However, changdian_95350 is stored in a separate file, while the other basins are grouped together in the same file. This discrepancy likely changed the order of the basins when the data was read, so the values no longer lined up with the expected basin IDs. That, in turn, changed the statistics computed for normalization.
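To illustrate the suspected mechanism, here is a minimal sketch in plain Python. It is not the hydrodatasource code: the basin names, the values, and the area-based normalization are all hypothetical, chosen only to show how a positional read in a different file order can change a statistic that combines streamflow with a per-basin attribute.

```python
from statistics import mean

# Hypothetical per-basin mean streamflow and basin area (not real data).
streamflow = {"basin_a": 10.0, "basin_b": 40.0, "basin_c": 90.0}
area = {"basin_a": 1.0, "basin_b": 4.0, "basin_c": 9.0}

# The order in which the caller asks for basins.
requested_ids = ["basin_a", "basin_b", "basin_c"]


def read_positionally(file_order):
    """Read values in file order, then label them by position.

    This mimics a reader that assumes the files return basins in the
    same order as requested_ids (an assumption made for this sketch).
    """
    values = [streamflow[basin] for basin in file_order]
    return dict(zip(requested_ids, values))


def area_normalized_mean(flow_by_id):
    """Mean of streamflow divided by area, a stand-in for a scaler statistic."""
    return mean(flow_by_id[basin] / area[basin] for basin in requested_ids)


# Case 1: all basins sit in one file, in the requested order.
stat_single_file = area_normalized_mean(
    read_positionally(["basin_a", "basin_b", "basin_c"])
)

# Case 2: basin_a sits in a separate file that happens to be read last.
stat_split_files = area_normalized_mean(
    read_positionally(["basin_b", "basin_c", "basin_a"])
)

print(stat_single_file, stat_split_files)
```

The two statistics differ even though both cases contain exactly the same basins and values, which matches the symptom I saw with the 29-basin runs.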

This finding highlights a potential issue with data handling that could introduce errors in the model's results. Below are some of the observed results from the experiments.

OuyangWenyu · 2024-12-29T07:39:20Z

Yes, this is a bug in reading data with the SelfMadeHydrodataset class in the hydrodatasource package. Basins should be sorted by basin ID; otherwise some may be placed at the wrong indices. I will release a new version of hydrodatasource.
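A rough sketch of the idea, assuming the reader can recover the basin ID attached to each value. This is not the actual hydrodatasource change; `align_by_basin_id` and the basin names are hypothetical. Values are looked up by ID and returned in sorted ID order, so the result no longer depends on how the basins are split across files.

```python
def align_by_basin_id(file_ids, file_values, requested_ids):
    """Return (sorted_ids, values), matching values to basins by ID, not position."""
    by_id = dict(zip(file_ids, file_values))
    missing = [basin for basin in requested_ids if basin not in by_id]
    if missing:
        raise KeyError(f"basins not found in any file: {missing}")
    sorted_ids = sorted(requested_ids)
    return sorted_ids, [by_id[basin] for basin in sorted_ids]


requested = ["basin_001", "basin_002", "basin_003"]

# Same basins, two storage layouts: one file vs. basin_001 in a separate file read last.
ids_one_file = ["basin_001", "basin_002", "basin_003"]
vals_one_file = [10.0, 40.0, 90.0]
ids_split = ["basin_002", "basin_003", "basin_001"]
vals_split = [40.0, 90.0, 10.0]

result_one = align_by_basin_id(ids_one_file, vals_one_file, requested)
result_split = align_by_basin_id(ids_split, vals_split, requested)

print(result_one == result_split)
```

With this kind of alignment, both layouts yield the same ID-to-value mapping, so any statistics computed afterwards should agree.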

xxcj-cyk added the bug label on Dec 28, 2024
