Create new mode - ts_augmentation #35

Open
DhavalRepo18 opened this issue Jul 25, 2023 · 11 comments
Labels
comp: GutenTAG (general) enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed 🏅 very low MoSCoW: Wont-have (Would-have)

Comments

@DhavalRepo18

DhavalRepo18 commented Jul 25, 2023

We use this repo to create time series. We would like to introduce a new mode on top of supervised and semi-supervised, called "ts-augmentation", which produces:

  • time series (original)
  • same time series with anomaly (augmented time series)
  • label

We can provide a small code snippet.

@SebastianSchmidl
Member

Thank you for using our library! We are happy to discuss extensions to GutenTAG.

For me, your description of the output sounds exactly like our semi-supervised output mode:

  • training timeseries without anomalies ➕ test timeseries with anomalies
  • if you don't specify noise (variance), both timeseries (train and test) are identical apart from the anomaly

Can you elaborate a bit further? What exactly is the input and output of the newly proposed mode, and why is it currently not supported by GutenTAG?

@SebastianSchmidl SebastianSchmidl added enhancement New feature or request question Further information is requested comp: GutenTAG (general) labels Jul 28, 2023
@DhavalRepo18
Author

Variance was an issue when we tested, so we want to avoid touching it. There is one fix for the variance problem: if we pass the same seed, it meets our need even with a variance setting.

@SebastianSchmidl
Member

That is precisely how GutenTAG should behave.

I still don't understand your use case, though.

@DhavalRepo18
Author

Our use case is very simple. We wanted to generate

  • a time series T
  • its modified version with a planted anomaly (only the region where the anomaly is planted changes; everything else remains the same)
  • the anomaly label.

We tried using the semi-supervised mode, but when we added variance, it changed the base time series as well as the anomalous time series. We would like to have a separate mode that works with variance, too.

https://github.com/HPI-Information-Systems/gutentag/blob/main/gutenTAG/generator/timeseries.py#L52

The code passes a new random seed, but we would like to use the same seed. This way, even with variance, the base time series remains the same. We have some code written that we can make available.
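A minimal sketch of the seeding effect, in plain NumPy rather than GutenTAG code. The function `base_timeseries` and its sine-plus-noise model are hypothetical stand-ins for a GutenTAG base oscillation, and `variance` is treated here as the noise variance. Reusing one seed reproduces the noise exactly, while a second seed (which is what the thread reports happening for the second series) yields different noise.

```python
import numpy as np


def base_timeseries(seed: int, length: int = 1000, variance: float = 0.05) -> np.ndarray:
    """Hypothetical base oscillation: sine wave plus seed-dependent Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    signal = np.sin(2 * np.pi * t / 100)
    # Noise with the given variance, i.e. standard deviation sqrt(variance)
    noise = rng.normal(0.0, np.sqrt(variance), size=length)
    return signal + noise


# Same seed: identical series, even with non-zero variance.
a = base_timeseries(seed=42)
b = base_timeseries(seed=42)
print(np.array_equal(a, b))  # True

# Different seed for the second series: the noise, and thus the series, differs.
c = base_timeseries(seed=43)
print(np.array_equal(a, c))  # False
```

With `variance=0.0` the seed no longer matters, which matches the point that train and test series are equal when no noise is specified.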

@SebastianSchmidl
Member

OK. I get your point.

Currently, you can achieve this by calling GutenTAG twice:

  1. Generate your base time series (TS) and store to disk.
  2. Load generated TS using custom-input-BO and generate your augmented TS with anomalies etc. (semi-supervised=False and supervised=False)

But if you want to generate many such TS, it's quite tedious.
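The two-step workaround can be sketched roughly as follows. This is not GutenTAG's API: `generate_base` stands in for the first GutenTAG run, the hypothetical `plant_constant_anomaly` stands in for the second run on the custom-input BO, and the CSV file plays the role of the stored base time series.

```python
import tempfile
from pathlib import Path

import numpy as np


def generate_base(seed: int, length: int = 500) -> np.ndarray:
    """Step 1 stand-in: a noisy base oscillation (hypothetical, not GutenTAG)."""
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    return np.sin(2 * np.pi * t / 50) + rng.normal(0.0, 0.1, size=length)


def plant_constant_anomaly(ts: np.ndarray, start: int, end: int, value: float):
    """Step 2 stand-in: copy the series, replace a window with a constant level."""
    augmented = ts.copy()
    augmented[start:end] = value
    labels = np.zeros(len(ts), dtype=np.int8)
    labels[start:end] = 1  # 1 marks anomalous points
    return augmented, labels


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "base.csv"
    # Step 1: generate the base time series once and store it to disk.
    np.savetxt(path, generate_base(seed=42), delimiter=",")
    # Step 2: load the stored series and derive the anomalous version from it.
    original = np.loadtxt(path, delimiter=",")
    augmented, labels = plant_constant_anomaly(original, start=200, end=230, value=2.0)
```

Every (original, augmented) pair needs its own round trip through disk, which is where the tedium comes from when many series are needed.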

How do you propose to solve this?
It seems to me that the different seeding is the only thing preventing you from using the semi-supervised mode. Is this true? If yes, then adding a third output mode that just uses the same seed is all we need to change.

@SebastianSchmidl SebastianSchmidl added 🏅 low MoSCoW: Could-have and removed question Further information is requested labels Jul 30, 2023
@sangy14

sangy14 commented Aug 1, 2023

@CodeLionX I am also working with @DhavalRepo18. Yes, as you mentioned, the different seeds were causing the issue. So we did the following to generate a time series and the same time series with anomalies:
[image not preserved]
As @DhavalRepo18 mentioned, we would like to have one more mode that allows us to do this.

@SebastianSchmidl
Member

Introducing another mode might work. However, the modes are not mutually exclusive and can be used together. This means that we would need to generate an additional time series with the same contents as the test time series (or just copy it?). This implies adding a new TrainingType and a new filename. In addition, we would need to change many parts of GutenTAG, we would break backwards compatibility, and we would lose compatibility with TimeEval (TimeEval would not support the new learning type, and I would rather not encourage training on this new data format either). There seem to be many drawbacks for limited usability, IMHO.

I would propose the following: We add another setting key exact-train-bo: bool = False and ensure that independent of whether semi-supervised or supervised is enabled, the BO of the training time series and testing time series are the same if enabled. Only the anomalies would differ in this case.
This setting now applies to all existing learning types (TrainingType) and we can show a warning (training on this data might lead to overfitting / bad generalizability) if it is enabled. We still need to touch some parts in GutenTAG, but only additions in the config and input behavior while maintaining backwards compatibility.
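One way the proposed exact-train-bo switch could behave, sketched with a hypothetical `generate_pair` function that is not part of GutenTAG. The test BO is generated once; when the flag is enabled, the training series reuses it verbatim, so only the anomaly differs. Otherwise the training BO is drawn with a different seed, standing in for the current behaviour. The warning text follows the comment above.

```python
import warnings

import numpy as np


def generate_pair(seed: int, anomaly_start: int, anomaly_end: int,
                  exact_train_bo: bool = False, length: int = 500):
    """Return (train, test, labels); hypothetical sketch of an exact-train-bo flag."""

    def base(rng: np.random.Generator) -> np.ndarray:
        # Stand-in base oscillation: sine wave plus Gaussian noise
        t = np.arange(length)
        return np.sin(2 * np.pi * t / 50) + rng.normal(0.0, 0.1, size=length)

    test_bo = base(np.random.default_rng(seed))
    if exact_train_bo:
        warnings.warn("training on this data might lead to overfitting / bad generalizability")
        train = test_bo.copy()  # identical BO, noise included
    else:
        # Independent noise from a different seed (stand-in for the current behaviour)
        train = base(np.random.default_rng(seed + 1))

    test = test_bo.copy()
    test[anomaly_start:anomaly_end] += 1.5  # only the test series gets the anomaly
    labels = np.zeros(length, dtype=np.int8)
    labels[anomaly_start:anomaly_end] = 1
    return train, test, labels


train, test, labels = generate_pair(seed=7, anomaly_start=100, anomaly_end=120,
                                    exact_train_bo=True)
```

Copying the test BO instead of regenerating it from the same seed avoids depending on the random-number stream being consumed identically for both series.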


Does this align with your requirements? Do you want to contribute such a feature?

@DhavalRepo18
Author

DhavalRepo18 commented Aug 4, 2023

@CodeLionX We agree with your suggestion and can test it once the feature is available. Meanwhile, we are using our internal hack; you rightly reached the same conclusion as we had (though it requires modifying the code).

@DhavalRepo18
Author

@CodeLionX Thank you for your help.

@DhavalRepo18
Author

@CodeLionX Please feel free to close this issue or update the code. We may use the initial solution of generating twice.

@SebastianSchmidl
Member

Dear @DhavalRepo18,

currently, I don't have the time to work on this feature. However, I'll leave this issue open because I don't see a reason to not implement this as proposed in #35 (comment).

If somebody wants to try implementing this, they are welcome to do so and I can offer my support.

Thanks.

@SebastianSchmidl SebastianSchmidl added good first issue Good for newcomers help wanted Extra attention is needed 🏅 very low MoSCoW: Wont-have (Would-have) and removed 🏅 low MoSCoW: Could-have labels Jan 23, 2024
Projects
None yet
Development

No branches or pull requests

3 participants