Maybe it's about some errors during dataset reading #12

Open
xxcj-cyk opened this issue Dec 28, 2024 · 1 comment
Labels: bug (Something isn't working)

Comments

@xxcj-cyk
Contributor

xxcj-cyk commented Dec 28, 2024

I used the same 29 basins from the Changdian region, taken from two different datasets, to train the LSTM model. One dataset is named FD_source, and the other contains only the 29 Changdian basins. Both datasets share the same time periods, basin IDs, and variables.

However, I discovered an issue: the scaler statistics for the "streamflow" variable (in dapengscaler_stat.json) differed between the two datasets. To investigate further, I ran the following experiments:


  1. Running the LSTM model on the 29 basins from each dataset produced different results.
  2. Running the model on 28 basins, excluding changdian_95350, produced consistent scalers and similar results for both datasets.

On closer inspection, I found that the data is stored in NetCDF (.nc) files, with basins grouped in batches of 100. However, changdian_95350 is stored in a separate file, while the other 28 basins are grouped together in one file. This discrepancy likely changed the order of the basins when the data was read, which in turn affected normalization because the statistics were computed on differently ordered data.

This finding points to a data-handling issue that could introduce errors into the model's results. Below are some of the results observed in the experiments.

[Three images showing the experiment results]
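One plausible way basin order can change the scaler statistics is sketched below. It assumes (not confirmed in this thread) that the scaler combines streamflow with a per-basin attribute such as drainage area, and that attributes are listed in sorted basin-ID order while the time series are concatenated in file-visit order. The reader function, basin IDs, areas, and values are all hypothetical; this is not hydrodatasource code.

```python
# Hypothetical, simplified model of the suspected failure mode (not hydrodatasource code).
# Each "file" maps basin ID -> streamflow series; all IDs and numbers are made up.

# Assumed per-basin attribute (drainage area), listed in sorted basin-ID order.
AREAS = {"b1": 10.0, "b2": 20.0, "b3": 40.0}
areas_in_id_order = [AREAS[b] for b in sorted(AREAS)]

# The same three basins, grouped into files two different ways.
combined_files = [{"b1": [10.0, 10.0], "b2": [20.0, 20.0], "b3": [40.0, 40.0]}]
split_files = [{"b3": [40.0, 40.0]}, {"b1": [10.0, 10.0], "b2": [20.0, 20.0]}]


def read_in_file_order(files):
    """Concatenate basin IDs and series in the order the files are visited."""
    ids, series = [], []
    for file_contents in files:
        for basin_id, values in file_contents.items():
            ids.append(basin_id)
            series.append(values)
    return ids, series


def area_normalized_mean(series, areas):
    """Pooled mean of q / area, pairing each series with the area at the same index."""
    ratios = [q / area for values, area in zip(series, areas) for q in values]
    return sum(ratios) / len(ratios)


ids_combined, series_combined = read_in_file_order(combined_files)
ids_split, series_split = read_in_file_order(split_files)

print(ids_combined, area_normalized_mean(series_combined, areas_in_id_order))
print(ids_split, area_normalized_mean(series_split, areas_in_id_order))
```

With one file, the series line up with their areas and the mean is 1.0. When b3 sits in its own file that is visited first, every series is paired with the wrong area and the mean becomes about 1.67, even though the raw streamflow values are identical. A plain pooled mean of the raw values would not change, which is why a mismatch like this only shows up in statistics that couple data with per-basin indices.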

xxcj-cyk added the bug label on Dec 28, 2024
@OuyangWenyu
Owner

Yes, this is a bug in how the SelfMadeHydrodataset class in the hydrodatasource package reads data. The basins should be sorted by basin ID; otherwise, some may be placed at the wrong indices. I will release a new version of hydrodatasource.
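A minimal sketch of the fix suggested in the reply above: key each series by its basin ID and return basins in sorted ID order, so the way basins are split across files no longer affects their order. The function name, basin IDs, and values are hypothetical; this illustrates the idea and is not the actual hydrodatasource patch.

```python
# Hypothetical sketch: merge per-file {basin_id: series} mappings, then sort by basin ID
# so the resulting order does not depend on how basins are grouped into files.


def read_sorted_by_basin_id(files):
    """Merge {basin_id: series} mappings from all files and return them sorted by ID."""
    by_id = {}
    for file_contents in files:
        for basin_id, values in file_contents.items():
            if basin_id in by_id:
                raise ValueError(f"basin {basin_id!r} appears in more than one file")
            by_id[basin_id] = values
    ids = sorted(by_id)  # deterministic order, independent of the file layout
    return ids, [by_id[b] for b in ids]


# The same basins, grouped into files two different ways (illustrative IDs and values).
combined_files = [{"b1": [10.0], "b2": [20.0], "b3": [40.0]}]
split_files = [{"b3": [40.0]}, {"b1": [10.0], "b2": [20.0]}]

print(read_sorted_by_basin_id(combined_files) == read_sorted_by_basin_id(split_files))  # True
```

Any per-basin attributes would need to be sorted the same way, for example using the same string sort on basin IDs, so that index i refers to the same basin everywhere. If the data is loaded with xarray, the analogous step would be to sort or reindex the merged dataset on its basin coordinate (for example with `Dataset.sortby`).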
