Merge pull request #27 from rstudio/sunspots-fix
fixes due to #3, #4
jjallaire authored Jan 7, 2019
2 parents 39a95f6 + 4f1d358 commit 9d5e013
Showing 5 changed files with 39 additions and 15 deletions.
4 changes: 2 additions & 2 deletions _posts/2018-06-25-sunspots-lstm/sunspots-lstm.Rmd
@@ -191,11 +191,11 @@ include_graphics("images/cowplot.png")

When doing cross validation on sequential data, the time dependencies on preceding samples must be preserved. We can create a cross validation sampling plan by offsetting the window used to select sequential sub-samples. In essence, we're creatively dealing with the fact that there's no future test data available by creating multiple synthetic "futures" - a process often, esp. in finance, called "backtesting".

As mentioned in the introduction, the [rsample](https://cran.r-project.org/package=rsample) package includes facilities for backtesting on time series. The vignette, ["Time Series Analysis Example"](https://topepo.github.io/rsample/articles/Applications/Time_Series.html), describes a procedure that uses the `rolling_origin()` function to create samples designed for time series cross validation. We'll use this approach.
As mentioned in the introduction, the [rsample](https://cran.r-project.org/package=rsample) package includes facilities for backtesting on time series. The vignette, ["Time Series Analysis Example"](https://tidymodels.github.io/rsample/articles/Applications/Time_Series.html), describes a procedure that uses the `rolling_origin()` function to create samples designed for time series cross validation. We'll use this approach.

#### Developing a backtesting strategy

The sampling plan we create uses 50 years (`initial` = 12 x 50 samples) for the training set and ten years (`assess` = 12 x 10) for the testing (validation) set. We select a `skip` span of about twenty years (`skip` = 12 x 20 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select `cumulative = FALSE` to allow the origin to shift, which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The returned tibble is stored as `rolling_origin_resamples`.
The sampling plan we create uses 100 years (`initial` = 12 x 100 samples) for the training set and 50 years (`assess` = 12 x 50) for the testing (validation) set. We select a `skip` span of about 22 years (`skip` = 12 x 22 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select `cumulative = FALSE` to allow the origin to shift, which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The returned tibble is stored as `rolling_origin_resamples`.

```{r}
periods_train <- 12 * 100
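The R chunk above is cut off in this diff view. As a point of reference, here is a minimal sketch of a sampling plan built with `rsample::rolling_origin()` using the parameters from the revised paragraph; the data object name `sun_spots` is an assumption for illustration, and this is not the verbatim code from the post.

```r
# Hedged sketch (not the committed code): build the backtesting resamples
# described above with rsample::rolling_origin().
library(rsample)

periods_train <- 12 * 100     # 100 years of monthly observations for training
periods_test  <- 12 * 50      # 50 years for assessment (validation)
skip_span     <- 12 * 22 - 1  # spread ~6 slices across the 265-year history

rolling_origin_resamples <- rolling_origin(
  sun_spots,                  # assumed name of the monthly sunspots tibble
  initial    = periods_train,
  assess     = periods_test,
  cumulative = FALSE,         # fixed-size training window as the origin shifts
  skip       = skip_span
)

rolling_origin_resamples
```

With `cumulative = FALSE`, each training window keeps the same 100-year length as the origin rolls forward, which is what keeps the comparison across the six slices fair.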
42 changes: 33 additions & 9 deletions _posts/2018-06-25-sunspots-lstm/sunspots-lstm.html
@@ -66,6 +66,10 @@
width: 100%;
}

.pandoc-table>caption {
margin-bottom: 10px;
}

.pandoc-table th:not([align]) {
text-align: left;
}
@@ -82,6 +86,10 @@
padding-right: 16px;
}

.l-screen .caption {
margin-left: 10px;
}

.shaded {
background: rgb(247, 247, 247);
padding-top: 20px;
@@ -601,9 +609,9 @@
var fn = $('#' + id);
var fn_p = $('#' + id + '>p');
fn_p.find('.footnote-back').remove();
var text = fn_p.text();
var text = fn_p.html();
var dtfn = $('<d-footnote></d-footnote>');
dtfn.text(text);
dtfn.html(text);
$(this).replaceWith(dtfn);
});
// remove footnotes
@@ -629,7 +637,10 @@
var clz = "";
var language = pre.attr('class');
if (language) {
if ($.inArray(language, ["r", "cpp", "c", "java"]) != -1)
// map unknown languages to "clike" (without this they just disappear)
if ($.inArray(language, ["bash", "clike", "css", "go", "html",
"javascript", "js", "julia", "lua", "markdown",
"markup", "mathml", "python", "svg", "xml"]) == -1)
language = "clike";
language = ' language="' + language + '"';
var dt_code = $('<d-code block' + language + clz + '></d-code>');
@@ -844,9 +855,19 @@
// prevent underline for linked images
$('a > img').parent().css({'border-bottom' : 'none'});

// mark figures created by knitr chunks as 100% width
// mark non-body figures created by knitr chunks as 100% width
$('.layout-chunk').each(function(i, val) {
$(this).find('img, .html-widget').css('width', '100%');
var figures = $(this).find('img, .html-widget');
if ($(this).attr('data-layout') !== "l-body") {
figures.css('width', '100%');
} else {
figures.css('max-width', '100%');
figures.filter("[width]").each(function(i, val) {
var fig = $(this);
fig.css('width', fig.attr('width') + 'px');
});

}
});

// auto-append index.html to post-preview links in file: protocol
@@ -858,7 +879,7 @@

// get rid of index.html references in header
if (window.location.protocol !== "file:") {
$('.radix-site-header a').each(function(i,val) {
$('.radix-site-header a[href]').each(function(i,val) {
$(this).attr('href', $(this).attr('href').replace("index.html", "./"));
});
}
@@ -867,6 +888,8 @@
$('tr.header').parent('thead').parent('table').addClass('pandoc-table');
$('.kable-table').children('table').addClass('pandoc-table');

// add figcaption style to table captions
$('caption').parent('table').addClass("figcaption");

// initialize posts list
if (window.init_posts_list)
@@ -878,6 +901,7 @@
$('#disqus_thread').toggleClass('hidden');
if (!$('#disqus_thread').hasClass('hidden')) {
var offset = $(this).offset();
$(window).resize();
$('html, body').animate({
scrollTop: offset.top - 35
});
@@ -910,7 +934,7 @@
<!--radix_placeholder_front_matter-->

<script id="distill-front-matter" type="text/json">
{"title":"Predicting Sunspot Frequency with Keras","description":"In this post we will examine making time series predictions using the sunspots dataset that ships with base R. Sunspots are dark spots on the sun, associated with lower temperature. Our post will focus on both how to apply deep learning to time series forecasting, and how to properly apply cross validation in this domain.","authors":[{"author":"Matt Dancho","authorURL":"https://github.com/mdancho84","affiliation":"Business Science","affiliationURL":"https://www.business-science.io/"},{"author":"Sigrid Keydana","authorURL":"https://github.com/skeydan","affiliation":"RStudio","affiliationURL":"https://www.rstudio.com"}],"publishedDate":"2018-06-25T00:00:00.000-04:00","citationText":"Dancho & Keydana, 2018"}
{"title":"Predicting Sunspot Frequency with Keras","description":"In this post we will examine making time series predictions using the sunspots dataset that ships with base R. Sunspots are dark spots on the sun, associated with lower temperature. Our post will focus on both how to apply deep learning to time series forecasting, and how to properly apply cross validation in this domain.","authors":[{"author":"Matt Dancho","authorURL":"https://github.com/mdancho84","affiliation":"Business Science","affiliationURL":"https://www.business-science.io/"},{"author":"Sigrid Keydana","authorURL":"https://github.com/skeydan","affiliation":"RStudio","affiliationURL":"https://www.rstudio.com"}],"publishedDate":"2018-06-25T00:00:00.000+02:00","citationText":"Dancho & Keydana, 2018"}
</script>

<!--/radix_placeholder_front_matter-->
@@ -1053,9 +1077,9 @@ <h4 id="visualizing-sunspot-data-with-cowplot">Visualizing sunspot data with cow
</div>
<h3 id="backtesting-time-series-cross-validation">Backtesting: time series cross validation</h3>
<p>When doing cross validation on sequential data, the time dependencies on preceding samples must be preserved. We can create a cross validation sampling plan by offsetting the window used to select sequential sub-samples. In essence, we’re creatively dealing with the fact that there’s no future test data available by creating multiple synthetic “futures” - a process often, esp. in finance, called “backtesting”.</p>
<p>As mentioned in the introduction, the <a href="https://cran.r-project.org/package=rsample">rsample</a> package includes facilities for backtesting on time series. The vignette, <a href="https://topepo.github.io/rsample/articles/Applications/Time_Series.html">“Time Series Analysis Example”</a>, describes a procedure that uses the <code>rolling_origin()</code> function to create samples designed for time series cross validation. We’ll use this approach.</p>
<p>As mentioned in the introduction, the <a href="https://cran.r-project.org/package=rsample">rsample</a> package includes facilities for backtesting on time series. The vignette, <a href="https://tidymodels.github.io/rsample/articles/Applications/Time_Series.html">“Time Series Analysis Example”</a>, describes a procedure that uses the <code>rolling_origin()</code> function to create samples designed for time series cross validation. We’ll use this approach.</p>
<h4 id="developing-a-backtesting-strategy">Developing a backtesting strategy</h4>
<p>The sampling plan we create uses 50 years (<code>initial</code> = 12 x 50 samples) for the training set and ten years (<code>assess</code> = 12 x 10) for the testing (validation) set. We select a <code>skip</code> span of about twenty years (<code>skip</code> = 12 x 20 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select <code>cumulative = FALSE</code> to allow the origin to shift, which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The returned tibble is stored as <code>rolling_origin_resamples</code>.</p>
<p>The sampling plan we create uses 100 years (<code>initial</code> = 12 x 100 samples) for the training set and 50 years (<code>assess</code> = 12 x 50) for the testing (validation) set. We select a <code>skip</code> span of about 22 years (<code>skip</code> = 12 x 22 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select <code>cumulative = FALSE</code> to allow the origin to shift, which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The returned tibble is stored as <code>rolling_origin_resamples</code>.</p>
<div class="layout-chunk" data-layout="l-body">
<pre class="r"><code>
periods_train &lt;- 12 * 100
4 changes: 2 additions & 2 deletions docs/posts/2018-06-25-sunspots-lstm/index.html
@@ -1470,9 +1470,9 @@ <h4 id="visualizing-sunspot-data-with-cowplot">Visualizing sunspot data with cow
</div>
<h3 id="backtesting-time-series-cross-validation">Backtesting: time series cross validation</h3>
<p>When doing cross validation on sequential data, the time dependencies on preceding samples must be preserved. We can create a cross validation sampling plan by offsetting the window used to select sequential sub-samples. In essence, we’re creatively dealing with the fact that there’s no future test data available by creating multiple synthetic “futures” - a process often, esp. in finance, called “backtesting”.</p>
<p>As mentioned in the introduction, the <a href="https://cran.r-project.org/package=rsample">rsample</a> package includes facilities for backtesting on time series. The vignette, <a href="https://topepo.github.io/rsample/articles/Applications/Time_Series.html">“Time Series Analysis Example”</a>, describes a procedure that uses the <code>rolling_origin()</code> function to create samples designed for time series cross validation. We’ll use this approach.</p>
<p>As mentioned in the introduction, the <a href="https://cran.r-project.org/package=rsample">rsample</a> package includes facilities for backtesting on time series. The vignette, <a href="https://tidymodels.github.io/rsample/articles/Applications/Time_Series.html">“Time Series Analysis Example”</a>, describes a procedure that uses the <code>rolling_origin()</code> function to create samples designed for time series cross validation. We’ll use this approach.</p>
<h4 id="developing-a-backtesting-strategy">Developing a backtesting strategy</h4>
<p>The sampling plan we create uses 50 years (<code>initial</code> = 12 x 50 samples) for the training set and ten years (<code>assess</code> = 12 x 10) for the testing (validation) set. We select a <code>skip</code> span of about twenty years (<code>skip</code> = 12 x 20 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select <code>cumulative = FALSE</code> to allow the origin to shift, which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The returned tibble is stored as <code>rolling_origin_resamples</code>.</p>
<p>The sampling plan we create uses 100 years (<code>initial</code> = 12 x 100 samples) for the training set and 50 years (<code>assess</code> = 12 x 50) for the testing (validation) set. We select a <code>skip</code> span of about 22 years (<code>skip</code> = 12 x 22 - 1) to approximately evenly distribute the samples into 6 sets that span the entire 265 years of sunspots history. Last, we select <code>cumulative = FALSE</code> to allow the origin to shift, which ensures that models on more recent data are not given an unfair advantage (more observations) over those operating on less recent data. The returned tibble is stored as <code>rolling_origin_resamples</code>.</p>
<div class="layout-chunk" data-layout="l-body">
<pre class="r"><code>
periods_train &lt;- 12 * 100
2 changes: 1 addition & 1 deletion docs/posts/posts.json
@@ -323,7 +323,7 @@
"Time Series"
],
"preview": "posts/2018-06-25-sunspots-lstm/images/backtested_test.png",
"last_modified": "2018-09-12T12:45:46-04:00",
"last_modified": "2019-01-07T09:09:45-05:00",
"preview_width": 800,
"preview_height": 416
},
2 changes: 1 addition & 1 deletion docs/sitemap.xml
@@ -78,7 +78,7 @@
</url>
<url>
<loc>https://blogs.rstudio.com/tensorflow/posts/2018-06-25-sunspots-lstm/</loc>
<lastmod>2018-09-12T12:45:46-04:00</lastmod>
<lastmod>2019-01-07T09:09:45-05:00</lastmod>
</url>
<url>
<loc>https://blogs.rstudio.com/tensorflow/posts/2018-06-06-simple-audio-classification-keras/</loc>
