diff --git a/_posts/2018-06-25-sunspots-lstm/sunspots-lstm.Rmd b/_posts/2018-06-25-sunspots-lstm/sunspots-lstm.Rmd
index 4a16541e5..b719a0855 100644
--- a/_posts/2018-06-25-sunspots-lstm/sunspots-lstm.Rmd
+++ b/_posts/2018-06-25-sunspots-lstm/sunspots-lstm.Rmd
@@ -191,11 +191,11 @@ include_graphics("images/cowplot.png")
 
 When doing cross validation on sequential data, the time dependencies on preceding samples must be preserved. We can create a cross validation sampling plan by offsetting the window used to select sequential sub-samples. In essence, we're creatively dealing with the fact that there's no future test data available by creating multiple synthetic "futures" - a process often, esp. in finance, called "backtesting".
 
-As mentioned in the introduction, the [rsample](https://cran.r-project.org/package=rsample) package includes facitlities for backtesting on time series. The vignette, ["Time Series Analysis Example"](https://topepo.github.io/rsample/articles/Applications/Time_Series.html), describes a procedure that uses the `rolling_origin()` function to create samples designed for time series cross validation. We'll use this approach.
+As mentioned in the introduction, the [rsample](https://cran.r-project.org/package=rsample) package includes facilities for backtesting on time series. The vignette, ["Time Series Analysis Example"](https://tidymodels.github.io/rsample/articles/Applications/Time_Series.html), describes a procedure that uses the `rolling_origin()` function to create samples designed for time series cross validation. We'll use this approach.
 
 #### Developing a backtesting strategy
 
-The sampling plan we create uses 50 years (`initial` = 12 x 50 samples) for the training set and ten years (`assess` = 12 x 10) for the testing (validation) set. We select a `skip` span of about twenty years (`skip` = 12 x 20 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select `cumulative = FALSE` to allow the origin to shift which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The tibble return contains the `rolling_origin_resamples`.
+The sampling plan we create uses 100 years (`initial` = 12 x 100 samples) for the training set and 50 years (`assess` = 12 x 50) for the testing (validation) set. We select a `skip` span of about 22 years (`skip` = 12 x 22 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select `cumulative = FALSE` to allow the origin to shift, which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The returned tibble contains the `rolling_origin_resamples`.
 
 ```{r}
 periods_train <- 12 * 100
diff --git a/_posts/2018-06-25-sunspots-lstm/sunspots-lstm.html b/_posts/2018-06-25-sunspots-lstm/sunspots-lstm.html
index f9b4be051..b257eb138 100644
--- a/_posts/2018-06-25-sunspots-lstm/sunspots-lstm.html
+++ b/_posts/2018-06-25-sunspots-lstm/sunspots-lstm.html
@@ -66,6 +66,10 @@
     width: 100%;
   }
 
+  .pandoc-table>caption {
+    margin-bottom: 10px;
+  }
+
   .pandoc-table th:not([align]) {
     text-align: left;
   }
@@ -82,6 +86,10 @@
     padding-right: 16px;
   }
 
+  .l-screen .caption {
+    margin-left: 10px;
+  }
+
   .shaded {
     background: rgb(247, 247, 247);
     padding-top: 20px;
@@ -601,9 +609,9 @@
       var fn = $('#' + id);
       var fn_p = $('#' + id + '>p');
       fn_p.find('.footnote-back').remove();
-      var text = fn_p.text();
+      var text = fn_p.html();
       var dtfn = $('');
-      dtfn.text(text);
+      dtfn.html(text);
       $(this).replaceWith(dtfn);
     });
     // remove footnotes
@@ -629,7 +637,10 @@
       var clz = "";
       var language = pre.attr('class');
       if (language) {
-        if ($.inArray(language, ["r", "cpp", "c", "java"]) != -1)
+        // map unknown languages to "clike" (without this they just disappear)
+        if ($.inArray(language, ["bash", "clike", "css", "go", "html",
+                                 "javascript", "js", "julia", "lua", "markdown",
+                                 "markup", "mathml", "python", "svg", "xml"]) == -1)
           language = "clike";
         language = ' language="' + language + '"';
         var dt_code = $('');
@@ -844,9 +855,19 @@
   // prevent underline for linked images
   $('a > img').parent().css({'border-bottom' : 'none'});
 
-  // mark figures created by knitr chunks as 100% width
+  // mark non-body figures created by knitr chunks as 100% width
   $('.layout-chunk').each(function(i, val) {
-    $(this).find('img, .html-widget').css('width', '100%');
+    var figures = $(this).find('img, .html-widget');
+    if ($(this).attr('data-layout') !== "l-body") {
+      figures.css('width', '100%');
+    } else {
+      figures.css('max-width', '100%');
+      figures.filter("[width]").each(function(i, val) {
+        var fig = $(this);
+        fig.css('width', fig.attr('width') + 'px');
+      });
+
+    }
   });
 
   // auto-append index.html to post-preview links in file: protocol
@@ -858,7 +879,7 @@
 
   // get rid of index.html references in header
   if (window.location.protocol !== "file:") {
-    $('.radix-site-header a').each(function(i,val) {
+    $('.radix-site-header a[href]').each(function(i,val) {
       $(this).attr('href', $(this).attr('href').replace("index.html", "./"));
     });
   }
@@ -867,6 +888,8 @@
   $('tr.header').parent('thead').parent('table').addClass('pandoc-table');
   $('.kable-table').children('table').addClass('pandoc-table');
 
+  // add figcaption style to table captions
+  $('caption').parent('table').addClass("figcaption");
 
   // initialize posts list
   if (window.init_posts_list)
@@ -878,6 +901,7 @@
     $('#disqus_thread').toggleClass('hidden');
     if (!$('#disqus_thread').hasClass('hidden')) {
       var offset = $(this).offset();
+      $(window).resize();
       $('html, body').animate({
         scrollTop: offset.top - 35
       });
@@ -910,7 +934,7 @@
@@ -1053,9 +1077,9 @@

 Visualizing sunspot data with cow
 Backtesting: time series cross validation
 When doing cross validation on sequential data, the time dependencies on preceding samples must be preserved. We can create a cross validation sampling plan by offsetting the window used to select sequential sub-samples. In essence, we’re creatively dealing with the fact that there’s no future test data available by creating multiple synthetic “futures” - a process often, esp. in finance, called “backtesting”.
-As mentioned in the introduction, the rsample package includes facitlities for backtesting on time series. The vignette, “Time Series Analysis Example”, describes a procedure that uses the rolling_origin() function to create samples designed for time series cross validation. We’ll use this approach.
+As mentioned in the introduction, the rsample package includes facilities for backtesting on time series. The vignette, “Time Series Analysis Example”, describes a procedure that uses the rolling_origin() function to create samples designed for time series cross validation. We’ll use this approach.
 Developing a backtesting strategy
-The sampling plan we create uses 50 years (initial = 12 x 50 samples) for the training set and ten years (assess = 12 x 10) for the testing (validation) set. We select a skip span of about twenty years (skip = 12 x 20 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select cumulative = FALSE to allow the origin to shift which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The tibble return contains the rolling_origin_resamples.
+The sampling plan we create uses 100 years (initial = 12 x 100 samples) for the training set and 50 years (assess = 12 x 50) for the testing (validation) set. We select a skip span of about 22 years (skip = 12 x 22 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select cumulative = FALSE to allow the origin to shift, which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The returned tibble contains the rolling_origin_resamples.
 
 periods_train <- 12 * 100
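For orientation, here is a minimal sketch of the `rolling_origin()` call that the revised paragraph describes. It is not part of the diff: the data frame name `sun_spots` is an assumption about the post's surrounding code, which the diff truncates after `periods_train`.

```r
library(rsample)

# Sampling-plan parameters from the revised text (all counts are in months)
periods_train <- 12 * 100     # 100 years of training data
periods_test  <- 12 * 50      # 50 years of validation data
skip_span     <- 12 * 22 - 1  # shift the sampling origin by roughly 22 years

# `sun_spots` is a placeholder for the monthly sunspots tibble prepared earlier in the post
rolling_origin_resamples <- rolling_origin(
  sun_spots,
  initial    = periods_train,  # size of each training window
  assess     = periods_test,   # size of each validation window
  cumulative = FALSE,          # keep the training window a fixed size rather than growing it
  skip       = skip_span
)

rolling_origin_resamples
```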
diff --git a/docs/posts/2018-06-25-sunspots-lstm/index.html b/docs/posts/2018-06-25-sunspots-lstm/index.html
index 0b136f3fb..45cc43c12 100644
--- a/docs/posts/2018-06-25-sunspots-lstm/index.html
+++ b/docs/posts/2018-06-25-sunspots-lstm/index.html
@@ -1470,9 +1470,9 @@ 

 Visualizing sunspot data with cow
 Backtesting: time series cross validation
 When doing cross validation on sequential data, the time dependencies on preceding samples must be preserved. We can create a cross validation sampling plan by offsetting the window used to select sequential sub-samples. In essence, we’re creatively dealing with the fact that there’s no future test data available by creating multiple synthetic “futures” - a process often, esp. in finance, called “backtesting”.
-As mentioned in the introduction, the rsample package includes facitlities for backtesting on time series. The vignette, “Time Series Analysis Example”, describes a procedure that uses the rolling_origin() function to create samples designed for time series cross validation. We’ll use this approach.
+As mentioned in the introduction, the rsample package includes facilities for backtesting on time series. The vignette, “Time Series Analysis Example”, describes a procedure that uses the rolling_origin() function to create samples designed for time series cross validation. We’ll use this approach.
 Developing a backtesting strategy
-The sampling plan we create uses 50 years (initial = 12 x 50 samples) for the training set and ten years (assess = 12 x 10) for the testing (validation) set. We select a skip span of about twenty years (skip = 12 x 20 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select cumulative = FALSE to allow the origin to shift which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The tibble return contains the rolling_origin_resamples.
+The sampling plan we create uses 100 years (initial = 12 x 100 samples) for the training set and 50 years (assess = 12 x 50) for the testing (validation) set. We select a skip span of about 22 years (skip = 12 x 22 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select cumulative = FALSE to allow the origin to shift, which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The returned tibble contains the rolling_origin_resamples.
 
 periods_train <- 12 * 100
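As a quick sanity check on the numbers in the revised paragraph, the arithmetic below (a sketch, not taken from the post) shows why a skip of 12 x 22 - 1 spreads the 150-year train/validation windows into roughly six splits over the 265-year record, assuming `rolling_origin()` advances the origin by `skip + 1` rows between splits:

```r
n_total  <- 12 * 265            # roughly 3180 monthly observations in the series
n_window <- 12 * 100 + 12 * 50  # 1800 rows needed for one training + validation split
n_step   <- (12 * 22 - 1) + 1   # origin advances 264 rows (22 years) per split

n_splits <- floor((n_total - n_window) / n_step) + 1
n_splits                        # 6, matching the "6 sets" in the revised text
```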
diff --git a/docs/posts/posts.json b/docs/posts/posts.json
index c53da9063..f0de7b7ff 100644
--- a/docs/posts/posts.json
+++ b/docs/posts/posts.json
@@ -323,7 +323,7 @@
       "Time Series"
     ],
     "preview": "posts/2018-06-25-sunspots-lstm/images/backtested_test.png",
-    "last_modified": "2018-09-12T12:45:46-04:00",
+    "last_modified": "2019-01-07T09:09:45-05:00",
     "preview_width": 800,
     "preview_height": 416
   },
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 584a1d0d2..2c7731b82 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -78,7 +78,7 @@
   </url>
   <url>
     <loc>https://blogs.rstudio.com/tensorflow/posts/2018-06-25-sunspots-lstm/</loc>
-    <lastmod>2018-09-12T12:45:46-04:00</lastmod>
+    <lastmod>2019-01-07T09:09:45-05:00</lastmod>
   </url>
   <url>
     <loc>https://blogs.rstudio.com/tensorflow/posts/2018-06-06-simple-audio-classification-keras/</loc>