-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experiment with roll_by family #137
Comments
While the |
The main issue I’m concerning is the mable object once modeling the slided tsibble, model can’t be lazily evaluated. |
I talked to Hadley about this. He thinks it's a potentially good approach, more like a lazily evaluated grouped data frame. He also mentioned the tidymodels bootstrapping approach https://davisvaughan.github.io/strapgod/articles/dplyr-support.html |
That sounds promising. |
While I think this functionality is useful, I want to wait to see more developments with slurrr (#143) before working on this further. The original motivating issue with the window "id" being part of the key structure has now been resolved in fabletools by supporting disjoint hierarchies. Further, if slurrr is the package location going further, this functionality should belong in that package. Maybe @DavisVaughan can consider it's implementation/use, and tsibble can later add a |
oh, i think slide_by should stay in tsibble, no conflicts with slurrr.
…On Thu, 15 Aug 2019 at 16:31, mitchelloharawild ***@***.***> wrote:
While I think this functionality is useful, I want to wait to see more
developments with slurrr (#143
<#143>) before working on this
further.
The original motivating issue with the window "id" being part of the key
structure has now been resolved in fabletools by supporting disjoint
hierarchies.
Further, if slurrr is the package location going further, this
functionality should belong in that package. Maybe @DavisVaughan
<https://github.com/DavisVaughan> can consider it's implementation/use,
and tsibble can later add a tbl_ts method that supports time aware
sliding inputs.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#137?email_source=notifications&email_token=AASXVGE6RBGVGM7EEBPUILTQETZ2NA5CNFSM4IHAT4LKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4K67YA#issuecomment-521531360>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AASXVGFXDY7VRWQAOYWLZKLQETZ2NANCNFSM4IHAT4LA>
.
|
Oooh I had an idea for this too yesterday. It does use the lazy grouping feature of dplyr, but would require a new data frame subclass. I'm not quite convinced that it would be worth it over a version of I think it would be pretty hard (not impossible) to get another df subclass that would be able to 1) add and remove virtual groups correctly 2) be able to subclass on top of tsibble or a data frame, and know how to drop back to a pure tsibble after some kind of summary operation where the virtual groups are materialized. I think its much "easier" to let This is just a partial example of how it might look. > pedestrian_small
# A tsibble: 240 x 6 [1h] <Australia/Melbourne>
# Key: Sensor [1]
Sensor Date_Time Date Time Count ym
<chr> <dttm> <date> <int> <int> <mth>
1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630 2015 Jan
2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826 2015 Jan
3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567 2015 Jan
4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264 2015 Jan
5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139 2015 Jan
6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77 2015 Jan
7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01 6 44 2015 Jan
8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01 7 56 2015 Jan
9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01 8 113 2015 Jan
10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01 9 166 2015 Jan
# … with 230 more rows
>
> unique(pedestrian_small$ym)
[1] "2015 Jan" "2015 Feb" "2015 Mar" "2015 Apr" "2015 May" "2015 Jun" "2015 Jul" "2015 Aug" "2015 Sep" "2015 Oct"
[11] "2015 Nov" "2015 Dec" "2016 Jan" "2016 Feb" "2016 Mar" "2016 Apr" "2016 May" "2016 Jun" "2016 Jul" "2016 Aug"
[21] "2016 Sep" "2016 Oct" "2016 Nov" "2016 Dec"
>
> pedestrian_small_slide_grouped <- slide_by(pedestrian_small, ym, before = 1, after = 1, step = 1)
> pedestrian_small_slide_grouped
# A tibble: 240 x 6
# Groups: .start, .stop [24]
Sensor Date_Time Date Time Count ym
<chr> <dttm> <date> <int> <int> <mth>
1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630 2015 Jan
2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826 2015 Jan
3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567 2015 Jan
4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264 2015 Jan
5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139 2015 Jan
6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77 2015 Jan
7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01 6 44 2015 Jan
8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01 7 56 2015 Jan
9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01 8 113 2015 Jan
10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01 9 166 2015 Jan
# … with 230 more rows
>
> pedestrian_small_slide_grouped %>%
+ summarise(
+ ym_start = first(ym),
+ ym_end = last(ym),
+ MonthlyCount = mean(Count)
+ )
# A tibble: 24 x 5
# Groups: .start [23]
.start .stop ym_start ym_end MonthlyCount
<date> <date> <mth> <mth> <dbl>
1 2015-01-01 2015-02-01 2015 Jan 2015 Feb 227.
2 2015-01-01 2015-03-01 2015 Jan 2015 Mar 180.
3 2015-02-01 2015-04-01 2015 Feb 2015 Apr 112.
4 2015-03-01 2015-05-01 2015 Mar 2015 May 147.
5 2015-04-01 2015-06-01 2015 Apr 2015 Jun 165.
6 2015-05-01 2015-07-01 2015 May 2015 Jul 160.
7 2015-06-01 2015-08-01 2015 Jun 2015 Aug 120.
8 2015-07-01 2015-09-01 2015 Jul 2015 Sep 130.
9 2015-08-01 2015-10-01 2015 Aug 2015 Oct 141
10 2015-09-01 2015-11-01 2015 Sep 2015 Nov 168.
# … with 14 more rows
>
> pedestrian_small_slide_along <- mutate(
+ pedestrian_small,
+ col = slide_along_impl(Count, ~mean(.x), .index = ym, .ptype = dbl(), .before = 1, .after = 1)
+ )
>
> pedestrian_small_slide_along
# A tsibble: 240 x 7 [1h] <Australia/Melbourne>
# Key: Sensor [1]
Sensor Date_Time Date Time Count ym col
<chr> <dttm> <date> <int> <int> <mth> <dbl>
1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630 2015 Jan 227.
2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826 2015 Jan 227.
3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567 2015 Jan 227.
4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264 2015 Jan 227.
5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139 2015 Jan 227.
6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77 2015 Jan 227.
7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01 6 44 2015 Jan 227.
8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01 7 56 2015 Jan 227.
9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01 8 113 2015 Jan 227.
10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01 9 166 2015 Jan 227.
# … with 230 more rows
>
> # group by ym and slice(1) to get it down to the right number of rows
> pedestrian_small_slide_along %>%
+ group_by(ym) %>%
+ slice(1)
# A tsibble: 24 x 7 [1h] <Australia/Melbourne>
# Key: Sensor [1]
# Groups: ym [24]
Sensor Date_Time Date Time Count ym col
<chr> <dttm> <date> <int> <int> <mth> <dbl>
1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630 2015 Jan 227.
2 Birrarung Marr 2015-02-01 00:00:00 2015-02-01 0 178 2015 Feb 180.
3 Birrarung Marr 2015-03-01 00:00:00 2015-03-01 0 69 2015 Mar 112.
4 Birrarung Marr 2015-04-01 00:00:00 2015-04-01 0 25 2015 Apr 147.
5 Birrarung Marr 2015-05-01 00:00:00 2015-05-01 0 13 2015 May 165.
6 Birrarung Marr 2015-06-01 00:00:00 2015-06-01 0 9 2015 Jun 160.
7 Birrarung Marr 2015-07-01 00:00:00 2015-07-01 0 21 2015 Jul 120.
8 Birrarung Marr 2015-08-01 00:00:00 2015-08-01 0 86 2015 Aug 130.
9 Birrarung Marr 2015-09-01 00:00:00 2015-09-01 0 12 2015 Sep 141
10 Birrarung Marr 2015-10-01 00:00:00 2015-10-01 0 22 2015 Oct 168.
# … with 14 more rows |
Another option is a dplyr verb (it breaks the yearmonth class too when binding groups together, but maybe that can be fixed) set.seed(123)
date <- as.Date(c("2019-01-01", "2019-01-02", "2019-02-01", "2019-02-02", "2019-02-03", "2019-03-01", "2019-04-01", "2019-04-02"))
idx <- yearmonth(date)
tib <- tibble(
g = c("A", "A", "A", "B", "B", "B", "B", "B"),
x = rnorm(length(g)),
date = date,
idx = idx
)
tib
#> # A tibble: 8 x 4
#> g x date idx
#> <chr> <dbl> <date> <mth>
#> 1 A -0.560 2019-01-01 2019 Jan
#> 2 A -0.230 2019-01-02 2019 Jan
#> 3 A 1.56 2019-02-01 2019 Feb
#> 4 B 0.0705 2019-02-02 2019 Feb
#> 5 B 0.129 2019-02-03 2019 Feb
#> 6 B 1.72 2019-03-01 2019 Mar
#> 7 B 0.461 2019-04-01 2019 Apr
#> 8 B -1.27 2019-04-02 2019 Apr
tib %>%
slidearize(y = mean(x), .index = idx, .before = 1)
#> # A tibble: 4 x 2
#> idx y
#> <mth> <dbl>
#> 1 2019 Jan -0.395
#> 2 2019 Feb 0.194
#> 3 2019 Mar 0.868
#> 4 2019 Apr 0.304
tib %>%
group_by(g) %>%
slidearize(y = mean(x), .index = idx, .before = 1)
#> # A tibble: 5 x 3
#> # Groups: g [2]
#> g idx y
#> <chr> <date> <dbl>
#> 1 A 2019-01-01 -0.395
#> 2 A 2019-02-01 0.256
#> 3 B 2019-02-01 -0.395
#> 4 B 2019-03-01 0.256
#> 5 B 2019-04-01 0.586
as_tibble(pedestrian) %>%
mutate(ym = yearmonth(Date)) %>%
slidearize(Count = mean(Count), .index = ym, .before = 1, .after = 1)
#> # A tibble: 24 x 2
#> ym Count
#> <mth> <dbl>
#> 1 2015 Jan 559.
#> 2 2015 Feb 626.
#> 3 2015 Mar 658.
#> 4 2015 Apr 678.
#> 5 2015 May 637.
#> 6 2015 Jun 651.
#> 7 2015 Jul 627.
#> 8 2015 Aug 635.
#> 9 2015 Sep 646.
#> 10 2015 Oct 661.
#> # … with 14 more rows
as_tibble(pedestrian) %>%
mutate(ym = yearmonth(Date)) %>%
group_by(Sensor) %>%
slidearize(Count = mean(Count), .index = ym, .before = 1, .after = 1)
#> # A tibble: 95 x 3
#> # Groups: Sensor [4]
#> Sensor ym Count
#> <chr> <date> <dbl>
#> 1 Birrarung Marr 2015-01-01 592.
#> 2 Birrarung Marr 2015-02-01 634.
#> 3 Birrarung Marr 2015-03-01 546.
#> 4 Birrarung Marr 2015-04-01 554.
#> 5 Birrarung Marr 2015-05-01 397.
#> 6 Birrarung Marr 2015-06-01 429.
#> 7 Birrarung Marr 2015-07-01 390.
#> 8 Birrarung Marr 2015-08-01 398.
#> 9 Birrarung Marr 2015-09-01 392.
#> 10 Birrarung Marr 2015-10-01 539.
#> # … with 85 more rows
pedestrian %>%
mutate(ym = yearmonth(Date_Time)) %>%
tidyr::nest(data = c(-Sensor, -ym)) %>%
group_by(Sensor) %>%
mutate(Monthly_MA = slide_dbl(data, ~ mean(.$Count, na.rm = TRUE), .size = 3, .align = "center", .bind = TRUE
))
#> # A tibble: 95 x 4
#> # Groups: Sensor [4]
#> Sensor ym data Monthly_MA
#> <chr> <mth> <list> <dbl>
#> 1 Birrarung Marr 2015 Jan <tsibble [744 × 4]> NA
#> 2 Birrarung Marr 2015 Feb <tsibble [672 × 4]> 634.
#> 3 Birrarung Marr 2015 Mar <tsibble [744 × 4]> 546.
#> 4 Birrarung Marr 2015 Apr <tsibble [720 × 4]> 554.
#> 5 Birrarung Marr 2015 May <tsibble [144 × 4]> 397.
#> 6 Birrarung Marr 2015 Jun <tsibble [720 × 4]> 429.
#> 7 Birrarung Marr 2015 Jul <tsibble [744 × 4]> 390.
#> 8 Birrarung Marr 2015 Aug <tsibble [744 × 4]> 398.
#> 9 Birrarung Marr 2015 Sep <tsibble [720 × 4]> 392.
#> 10 Birrarung Marr 2015 Oct <tsibble [119 × 4]> 539.
#> # … with 85 more rows make_virtual_group_info <- function(index, rows, before, after, step, sym) {
index <- vec_slice(index, rows)
index_split <- vec_split_id(index)
endpoints <- locate_endpoints(before, after, step, index_split$key) # the only special thing
new_rows <- purrr::map2(endpoints$starts, endpoints$stops, ~vec_c(!!!vec_slice(index_split$id, seq2(.x, .y))))
tibble(!!sym := index_split$key, !!expr(.rows) := new_rows)
}
slidearize <- function(.data, ...) {
UseMethod("slidearize")
}
slidearize.tbl_df <- function(.data, ..., .index, .before = 0L, .after = 0L, .step = 1L) {
.index_chr <- tidyselect::vars_select(names(.data), !!enquo(.index))
.index_sym <- sym(.index_chr)
.index <- dplyr::pull(.data, .index_chr)
group_info <- make_virtual_group_info(
.index,
vec_seq_along(.index),
before = .before,
after = .after,
step = .step,
sym = .index_sym
)
.data <- new_grouped_df(.data, group_info)
summarise(.data, ...)
}
slidearize.grouped_df <- function(.data, ..., .index, .before = 0L, .after = 0L, .step = 1L) {
.index_chr <- tidyselect::vars_select(names(.data), !!enquo(.index))
.index_sym <- sym(.index_chr)
.index <- dplyr::pull(.data, .index_chr)
group_info <- dplyr::group_data(.data)
group_rows <- group_info[[".rows"]]
group_info[[".rows"]] <- lapply(
group_rows,
make_virtual_group_info,
index = .index,
before = .before,
after = .after,
step = .step,
sym = .index_sym
)
group_info <- tidyr::unnest(group_info, cols = .rows)
attr(.data, "groups") <- group_info
summarise(.data, ...)
} |
If we were to have a family of While For instance, a common use of rolling windows is to compare the raw data with a rolling mean. library(tsibble)
library(dplyr)
library(ggplot2)
tsibbledata::gafa_stock %>%
group_by(Symbol) %>%
mutate(Close_MA = slide_dbl(Close, mean, .size = 7, .align = "c"))
#> # A tsibble: 5,032 x 9 [!]
#> # Key: Symbol [4]
#> # Groups: Symbol [4]
#> Symbol Date Open High Low Close Adj_Close Volume Close_MA
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200 NA
#> 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900 NA
#> 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700 NA
#> 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300 77.4
#> 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400 77.0
#> 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200 77.1
#> 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000 77.4
#> 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200 77.7
#> 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400 77.6
#> 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700 77.9
#> # … with 5,022 more rows The theoretical equivalent with library(tsibble)
library(dplyr)
library(ggplot2)
tsibbledata::gafa_stock %>%
group_by(Symbol) %>%
slide_by(.size = 7, .align = "c") %>%
mutate(Close_MA = mean(Close)) Any differences in dimension are padded according to |
slide_by()
,stretch_by()
,tile_by()
which integrates with dplyr'smutate()
,summarise()
, etc.The text was updated successfully, but these errors were encountered: