you are preprocessing test dataset before implementing the model #13
Comments
Not to speak for the developer, but you are aware that this is how you train an AI model, right? You preprocess the dataset and then use the model when you run it on a live sample. |
In general, I would say that if you preprocess the whole dataset, then split it into train/test, train the model, and check the results on the test part, you are making a mistake, because preprocessing the whole train/test dataset assumes knowledge of the future. I thought that was the case here, but I am not sure anymore. I need to check the code again and I don't have time right now. |
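To make the concern above concrete, here is a minimal sketch of how whole-dataset preprocessing lets test values influence training features. It is not taken from the repository under discussion: the `standardize` helper, the toy `series`, and the split point are all hypothetical, and z-scoring stands in for whatever preprocessing is actually used.

```python
from statistics import mean, pstdev

def standardize(values, fit_on):
    """Z-score `values` using the mean/std estimated from `fit_on` only."""
    mu = mean(fit_on)
    sigma = pstdev(fit_on)
    return [(v - mu) / sigma for v in values]

# Hypothetical series in chronological order; the last three points
# play the role of the "future" test part.
series = [10.0, 11.0, 12.0, 13.0, 50.0, 55.0, 60.0]
split = 4
train, test = series[:split], series[split:]

# Leaky: the statistics come from the whole series, so the training
# features already depend on test values.
leaky_train = standardize(train, fit_on=series)

# Leak-free: the statistics come from the training part only and are
# reused unchanged on the test part.
clean_train = standardize(train, fit_on=train)
clean_test = standardize(test, fit_on=train)

print(leaky_train[0], clean_train[0])
```

The two printed values differ only because the first one was computed with statistics that include the test points.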
@mg64ve I am looking into this exact issue while implementing the WSAE-LSTM model, which uses the wavelet transform to denoise data (Bao et al., 2017). My implementation is a work in progress and currently far from complete, but my understanding so far is that you cannot apply the wavelet transform to the entire dataset in one pass. You can, however, arrange the data continuously in a clearly defined train-validate-test split, which appears to mostly sidestep this issue. Bao et al. (2017) define the train-validate-test split arrangement for continuous training in Fig. 7. |
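A rolling train-validate-test arrangement of this kind can be sketched as an index generator. This is only an illustration: the function name `rolling_splits` and all window sizes below are hypothetical and are not taken from Bao et al. (2017) or from the implementation mentioned above.

```python
def rolling_splits(n_samples, train_size, val_size, test_size, step):
    """Yield (train, validate, test) index ranges that roll forward in time.

    Each element is a `range` over positions in a chronologically ordered
    series; within a window, train precedes validate, which precedes test.
    """
    window = train_size + val_size + test_size
    start = 0
    while start + window <= n_samples:
        train_end = start + train_size
        val_end = train_end + val_size
        yield (
            range(start, train_end),
            range(train_end, val_end),
            range(val_end, val_end + test_size),
        )
        start += step

# Hypothetical sizes, chosen only to keep the output small.
splits = list(
    rolling_splits(n_samples=20, train_size=8, val_size=2, test_size=2, step=2)
)
for train_idx, val_idx, test_idx in splits:
    print(train_idx, val_idx, test_idx)
```

Any preprocessing can then be fit inside each window on the train range alone, so no window sees data that lies after its own test range.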
@timothyyu absolutely right! You should apply the wavelet transform and any other kind of preprocessing separately to the train and test datasets. See also "Recurrent Neural Networks for Financial Time-Series Modelling" by Gavin Tsang, Jingjing Deng, and Xianghua Xie; it has some interesting concepts. |
OK @timothyyu, does this example come from your code? |
@mg64ve yes, this is from my own code. I have an updated implementation of the above: scaling is fit on the train set and then applied to the validate and test sets per period/interval, and the wavelet transform is then applied to each train-validate-test split individually. |
@mg64ve here is an updated version of the above that clearly illustrates the
|
Hi @timothyyu thanks for your reply, let me check the code. |
Scaling is done with ddi_scaled[index_name][intervals from 1-24][1-train,2-validate,3-test]
|
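As a rough sketch of the nested layout described above, the following fits a min-max scaler on the train part of each interval and applies it to all three parts. The helper names, the `"example_index"` key, and the data are hypothetical; min-max scaling is an assumption about the scaler, and the wavelet step is left out.

```python
def fit_min_max(train):
    """Return (lo, hi) bounds estimated from the training part only."""
    return min(train), max(train)

def apply_min_max(values, lo, hi):
    """Scale with training-set bounds.

    Validate/test values may fall outside [0, 1]; a constant training
    part (hi == lo) would need separate handling.
    """
    span = hi - lo
    return [(v - lo) / span for v in values]

def scale_intervals(raw):
    """Scale raw[index_name][interval] = {1: train, 2: validate, 3: test}."""
    scaled = {}
    for index_name, intervals in raw.items():
        scaled[index_name] = {}
        for interval, parts in intervals.items():
            lo, hi = fit_min_max(parts[1])  # fit on the train part only
            scaled[index_name][interval] = {
                key: apply_min_max(values, lo, hi)
                for key, values in parts.items()
            }
    return scaled

# Hypothetical data for one index and one interval.
raw = {
    "example_index": {
        1: {1: [10.0, 12.0, 14.0], 2: [13.0, 15.0], 3: [16.0, 9.0]},
    }
}
ddi_scaled = scale_intervals(raw)
print(ddi_scaled["example_index"][1][3])
```

Because the bounds come from the train part alone, scaled validate and test values are not clipped to the unit interval.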
Hi @timothyyu, I had a look at autoencoder.py and model.py. You basically don't use embedding. |
Hi, thank you for sharing your work; it's interesting. I am looking at the code, but I always get some errors when generating results for stocks (it works well for FX rates). I would like to compare my results with yours for AAPL. Could you also present the predicted log return vs. the historical log return for AAPL for the most recent three years, or one year, if possible? Thank you very much! |
Why Line 54 in 18a58ef
|
@az13js I believe it is because the log return is used during preprocessing |
yeah, you should first split, then preprocess |
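A minimal sketch of "split first, then preprocess" for log returns, assuming the usual definition r_t = ln(p_t / p_{t-1}). The prices and the split point are made up for illustration and do not come from the repository.

```python
import math

def log_returns(prices):
    """Compute r_t = ln(p_t / p_{t-1}) from the current and previous price."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical closing prices in chronological order.
prices = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0]
split = 4

# Split first ...
train_prices, test_prices = prices[:split], prices[split:]

# ... then preprocess each part on its own.
train_returns = log_returns(train_prices)
test_returns = log_returns(test_prices)

print(len(train_returns), len(test_returns))
```

One consequence of splitting first is that the return across the split boundary (here from 102.0 to 104.0) is not computed in either part.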
This means that you assume you know the future!
This would never work.
Regards.