Sub-second data with various frequencies #72

Open
lhtfb opened this issue Jun 3, 2024 · 8 comments


@lhtfb

lhtfb commented Jun 3, 2024

Hi! Thank you a lot for your work!
I have a small question regarding lag frequencies.
https://ts.gluon.ai/v0.11.x/_modules/gluonts/time_feature/lag.html#get_lags_for_frequency

As I understand this article, the smallest frequency is 1 second. However, is it possible to go into the millisecond range by any chance?

Additionally, what should I do if my data is not regularly sampled, e.g. the gaps between observations are 0.3s, 0.5s, 1.2s, 0.5s, 0.8s, etc.? Resampling the data to a constant interval might significantly reduce quality.

@ashok-arjun
Contributor

ashok-arjun commented Jun 16, 2024

Hi @lhtfb ! Apologies for the delayed response.

  1. Yes, it is possible to construct lags for higher frequencies (millisecond/nanosecond), but since GluonTS does not support them, you would have to write a custom wrapper for those ranges. Note that to construct lags for second-frequency data, GluonTS uses the following code, as in the link you provided:
    elif offset_name == "S":
        lags = (
            _make_lags_for_second(offset.n)
            + _make_lags_for_minute(offset.n / 60)
            + _make_lags_for_hour(offset.n / (60 * 60))
        )

which basically constructs lags for the "second" seasonality (60 seconds, 120 seconds, etc.), for the "minute" seasonality (60 minutes, 120 minutes, etc., returned as lag indices in the "second-frequency" data), and for the hourly seasonality (24 hours, 48 hours, etc., again returned as lag indices in the "second-frequency" data). What you really need are the "second-frequency" lags, but GluonTS also gives you long-term lags based on other seasonalities that might exist in your data. You may or may not want to keep this approach when writing the wrapper for the millisecond frequency.

You might want to write something like this (I have not tested this):

    # Helper, defined next to the other _make_lags_for_* helpers
    def _make_lags_for_millisecond(multiple, num_cycles=3):
        # Lags around 1 s, 2 s, ..., counted in steps of `multiple` milliseconds.
        # You may change the second argument of _make_lags if you'd like.
        return [
            _make_lags(k * 1000 // multiple, 1) for k in range(1, num_cycles + 1)
        ]

    # New branch in the if/elif chain on offset_name
    elif offset_name == "L":
        # "L" represents millisecond
        lags = (
            _make_lags_for_millisecond(offset.n)
            + _make_lags_for_second(offset.n / 1000)  # optional, as noted above
            + _make_lags_for_minute(offset.n / (1000 * 60))  # optional, as noted above
        )
  2. Even if your data is irregularly sampled, you can use lag-llama as is and check the performance. The lags will be less meaningful, since they assume regular spacing. If that does not work, we do not currently support computing lags for irregular data :(
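
To make the millisecond lag suggestion above easier to try in isolation, here is a minimal standalone sketch in plain Python. It does not import GluonTS. The names `make_lags`, `lags_for_millisecond` and `flatten_lags` are hypothetical stand-ins, and the final filtering step (keep lags 1 to 7, then the seasonal lags up to an upper bound) is an assumption about how the grouped lags are usually flattened, not a copy of the library code.

```python
def make_lags(middle, delta):
    """Integer lags in the window [middle - delta, middle + delta]."""
    return list(range(middle - delta, middle + delta + 1))


def lags_for_millisecond(multiple, num_cycles=3, delta=1):
    """Lag windows around 1 s, 2 s, ... for data sampled every `multiple` ms."""
    return [
        make_lags(k * 1000 // multiple, delta) for k in range(1, num_cycles + 1)
    ]


def flatten_lags(groups, lag_ub=1200, num_default_lags=7):
    """Short lags 1..num_default_lags plus the unique seasonal lags <= lag_ub.

    Assumed post-processing; adjust to whatever your wrapper expects.
    """
    short = list(range(1, num_default_lags + 1))
    seasonal = sorted(
        {lag for group in groups for lag in group if num_default_lags < lag <= lag_ub}
    )
    return short + seasonal


# Data sampled every 10 ms: one second is 100 steps back.
print(flatten_lags(lags_for_millisecond(10)))
# [1, 2, 3, 4, 5, 6, 7, 99, 100, 101, 199, 200, 201, 299, 300, 301]
```

With a 10 ms sampling interval the seasonal windows sit at 100, 200 and 300 steps, i.e. 1 s, 2 s and 3 s in the past.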

@ashok-arjun
Contributor

Hi @lhtfb! Just following up on this issue: if it's resolved, I can close it.

@yitaochen

yitaochen commented Sep 10, 2024

Hi @ashok-arjun, I find that estimator.train() complains that the data spacing is not uniform. Any idea how to work around it?

/lib/python3.12/site-packages/gluonts/dataset/pandas.py:180, in PandasDataset._pair_to_dataentry(self, item_id, df)
    177     df.sort_index(inplace=True)
    179 if not self.unchecked:
--> 180     assert is_uniform(df.index), (
    181         "Dataframe index is not uniformly spaced. "
    182         "If your dataframe contains data from multiple series in the "
    183         'same column ("long" format), consider constructing the '
    184         "dataset with `PandasDataset.from_long_dataframe` instead."
    185     )
    187 entry = {
    188     "start": df.index[0],
    189 }
    191 target = df[self.target].values

AssertionError: Dataframe index is not uniformly spaced. If your dataframe contains data from multiple series in the same column ("long" format), consider constructing the dataset with `PandasDataset.from_long_dataframe` instead.

Update: I found that I can set unchecked=True when constructing the PandasDataset to avoid the error.
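
Disabling the check keeps the irregular timestamps but lets the model treat them as evenly spaced. An alternative, at the cost of some resolution, is to put the series on a regular grid first. The sketch below is plain Python and only illustrates the idea: `to_regular_grid` is a hypothetical helper, the 500 ms step is an arbitrary choice, and averaging within a bin is one of several possible aggregations. In practice a pandas resample would likely be the more usual tool.

```python
import math


def to_regular_grid(times_ms, values, step_ms):
    """Average irregular samples into fixed-width bins; empty bins become NaN.

    `times_ms` must be sorted integer timestamps in milliseconds.
    """
    if not times_ms or len(times_ms) != len(values):
        raise ValueError("times_ms and values must be non-empty and equally long")
    start = times_ms[0]
    n_bins = (times_ms[-1] - start) // step_ms + 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(times_ms, values):
        idx = (t - start) // step_ms  # bin index on the regular grid
        sums[idx] += v
        counts[idx] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


# Gaps of 0.3 s, 0.5 s, 1.2 s, 0.5 s, 0.8 s between observations.
times_ms = [0, 300, 800, 2000, 2500, 3300]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
grid = to_regular_grid(times_ms, values, step_ms=500)
print(grid)  # 7 bins; bins 2 and 3 are NaN because no sample fell into them
```

The NaN bins mark missing observations, which a transformation such as AddObservedValuesIndicator can then flag and impute instead of silently treating them as real data.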

@yitaochen

yitaochen commented Sep 10, 2024

Hi @ashok-arjun, one additional question: we have to change the time feature from "S" to "L", right?

    def create_transformation(self) -> Transformation:
        if self.time_feat:
            return Chain(
                [
                    AddTimeFeatures(
                        start_field=FieldName.START,
                        target_field=FieldName.TARGET,
                        output_field=FieldName.FEAT_TIME,
                        time_features=time_features_from_frequency_str("S"),
                        pred_length=self.prediction_length,
                    ),
                    AddObservedValuesIndicator(
                        target_field=FieldName.TARGET,
                        output_field=FieldName.OBSERVED_VALUES,
                        imputation_method=DummyValueImputation(0.0),
                    ),
                ]
            )
        else:
            return Chain(
                [
                    AddObservedValuesIndicator(
                        target_field=FieldName.TARGET,
                        output_field=FieldName.OBSERVED_VALUES,
                        imputation_method=DummyValueImputation(0.0),
                    ),
                ]
            )

@ashok-arjun
Contributor

That's for time features. It won't work if you change it when you use the pretrained model (it's trained to use S). If you are training your own model, you may set it to the appropriate freq.
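
When training from scratch at millisecond resolution, one option is to supply a sub-second time feature yourself. The sketch below is only an illustration in plain Python: `millisecond_of_second` is a hypothetical feature, and scaling to the range [-0.5, 0.5] is an assumption chosen to match how GluonTS-style time features are usually normalized. Whether the installed GluonTS version already provides features for a millisecond frequency string is not confirmed here.

```python
from datetime import datetime


def millisecond_of_second(ts: datetime) -> float:
    """Hypothetical feature: millisecond within the second, scaled to [-0.5, 0.5]."""
    return (ts.microsecond // 1000) / 999.0 - 0.5


def second_of_minute(ts: datetime) -> float:
    """Second within the minute, scaled to [-0.5, 0.5]."""
    return ts.second / 59.0 - 0.5


def time_features(ts: datetime) -> list:
    """One feature vector per timestamp, finest granularity first."""
    return [millisecond_of_second(ts), second_of_minute(ts)]


print(time_features(datetime(2024, 6, 3, 12, 0, 0, 0)))        # [-0.5, -0.5]
print(time_features(datetime(2024, 6, 3, 12, 0, 59, 999000)))  # [0.5, 0.5]
```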

@yitaochen

> That's for time features. It won't work if you change it when you use the pretrained model (it's trained to use S). If you are training your own model, you may set it to the appropriate freq.

Thank you for the response. I'm training my own model with milliseconds as the base freq. One more question: it would be much appreciated if you could tell me how the time features are used. Are they turned into embeddings and combined with the input additively, like positional encodings (and on top of the positional encodings)?

@ashok-arjun
Contributor

They're concatenated with the input (lags) before the input is passed to the transformer - see https://github.com/time-series-foundation-models/lag-llama/blob/main/lag_llama/model/module.py#L537
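
As a rough picture of what "concatenated with the input (lags)" means, the sketch below builds one input vector per time step by joining the lagged target values with that step's time features. This is a simplified illustration in plain Python, not the lag-llama implementation: `build_inputs` is a hypothetical helper, and the real model works on batched tensors and may include further inputs, so the linked module.py remains the reference.

```python
def build_inputs(series, lags, time_feats):
    """For each step with all lags available, return [lagged values..., time features...]."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        lagged = [series[t - lag] for lag in lags]  # values at t-1, t-2, ...
        rows.append(lagged + list(time_feats[t]))   # concatenate, not add
    return rows


series = [10.0, 11.0, 12.0, 13.0, 14.0]
lags = [1, 2]
time_feats = [[-0.5], [-0.25], [0.0], [0.25], [0.5]]  # one feature per step

for row in build_inputs(series, lags, time_feats):
    print(row)
# [11.0, 10.0, 0.0]
# [12.0, 11.0, 0.25]
# [13.0, 12.0, 0.5]
```

Each vector has width `len(lags)` plus the number of time features, so adding or removing a time feature changes the input dimension. That is consistent with the earlier point that the pretrained model cannot simply be switched to a different set of time features.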

@yitaochen

I see, thank you.
