forked from hadley/r-pkgs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
testing-design.Rmd
670 lines (501 loc) · 32.6 KB
/
testing-design.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
# Designing your test suite {#sec-testing-design}
```{r, echo = FALSE}
source("common.R")
```
::: callout-important
Your test files should not include these `library()` calls.
We also explicitly request testthat edition 3, but in a real package this will be declared in DESCRIPTION.
```{r}
library(testthat)
local_edition(3)
```
:::
## What to test
> Whenever you are tempted to type something into a print statement or a debugger expression, write it as a test instead.
> --- Martin Fowler
There is a fine balance to writing tests.
Each test that you write makes your code less likely to change inadvertently; but it also can make it harder to change your code on purpose.
It's hard to give good general advice about writing tests, but you might find these points helpful:
- Focus on testing the external interface to your functions - if you test the internal interface, then it's harder to change the implementation in the future because as well as modifying the code, you'll also need to update all the tests.
- Strive to test each behaviour in one and only one test.
Then if that behaviour later changes you only need to update a single test.
- Avoid testing simple code that you're confident will work.
Instead focus your time on code that you're not sure about, is fragile, or has complicated interdependencies.
That said, we often find we make the most mistakes when we falsely assume that the problem is simple and doesn't need any tests.
- Always write a test when you discover a bug.
You may find it helpful to adopt the test-first philosophy.
There you always start by writing the tests, and then write the code that makes them pass.
This reflects an important problem solving strategy: start by establishing your success criteria, how you know if you've solved the problem.
### Test coverage {#sec-testing-design-coverage}
Another concrete way to direct your test writing efforts is to examine your test coverage.
The covr package (<https://covr.r-lib.org>) can be used to determine which lines of your package's source code are (or are not!) executed when the test suite is run.
This is most often presented as a percentage.
Generally speaking, the higher the better.
In some technical sense, 100% test coverage is the goal, however, this is rarely achieved in practice and that's often OK.
Going from 90% or 99% coverage to 100% is not always the best use of your development time and energy.
In many cases, that last 10% or 1% often requires some awkward gymnastics to cover.
Sometimes this forces you to introduce mocking or some other new complexity.
Don't sacrifice the maintainability of your test suite in the name of covering some weird edge case that hasn't yet proven to be a problem.
Also remember that not every line of code or every function is equally likely to harbor bugs.
Focus your testing energy on code that is tricky, based on your expert opinion and any empirical evidence you've accumulated about bug hot spots.
We use covr regularly, in two different ways:
- Local, interactive use. We mostly use `devtools::test_coverage_active_file()` and `devtools::test_coverage()`, for exploring the coverage of an individual file or the whole package, respectively.
- Automatic, remote use via GitHub Actions (GHA). We cover continuous integration and GHA more thoroughly in @sec-sw-dev-practices, but we will at least mention here that `usethis::use_github_action("test-coverage")` configures a GHA workflow that constantly monitors your test coverage. Test coverage can be an especially helpful metric when evaluating a pull request (either your own or from an external contributor). A proposed change that is well-covered by tests is less risky to merge.
## High-level principles for testing {#sec-testing-design-principles}
In later sections, we offer concrete strategies for how to handle common testing dilemmas in R.
Here we lay out the high-level principles that underpin these recommendations:
- A test should ideally be self-sufficient and self-contained.
- The interactive workflow is important, because you will mostly interact with your tests when they are failing.
- It's more important that test code be obvious than, e.g., as DRY as possible.
- However, the interactive workflow shouldn't "leak" into and undermine the test suite.
Writing good tests for a code base often feels more challenging than writing the code in the first place.
This can come as a bit of a shock when you're new to package development and you might be concerned that you're doing it wrong.
Don't worry, you're not!
Testing presents many unique challenges and maneuvers, which tend to get much less air time in programming communities than strategies for writing the "main code", i.e. the stuff below `R/`.
As a result, it requires more deliberate effort to develop your skills and taste around testing.
Many of the packages maintained by our team violate some of the advice you'll find here.
There are (at least) two reasons for that:
- testthat has been evolving for more than twelve years and this chapter reflects the cumulative lessons learned from that experience. The tests in many packages have been in place for a long time and reflect typical practices from different eras and different maintainers.
- These aren't hard and fast rules, but are, rather, guidelines. There will always be specific situations where it makes sense to bend the rule.
This chapter can't address all possible testing situations, but hopefully these guidelines will help your future decision-making.
### Self-sufficient tests
> All tests should strive to be hermetic: a test should contain all of the information necessary to set up, execute, and tear down its environment.
> Tests should assume as little as possible about the outside environment ....
>
> From the book Software Engineering at Google, [Chapter 11](https://abseil.io/resources/swe-book/html/ch11.html)
Recall this advice found in @sec-code-r-landscape, which covers your package's "main code", i.e. everything below `R/`:
> The `.R` files below `R/` should consist almost entirely of function definitions.
> Any other top-level code is suspicious and should be carefully reviewed for possible conversion into a function.
We have analogous advice for your test files:
> The `test-*.R` files below `tests/testthat/` should consist almost entirely of calls to `test_that()`.
> Any other top-level code is suspicious and should be carefully considered for relocation into calls to `test_that()` or to other files that get special treatment inside an R package or from testthat.
Eliminating (or at least minimizing) top-level code outside of `test_that()` will have the beneficial effect of making your tests more hermetic.
This is basically the testing analogue of the general programming advice that it's wise to avoid unstructured sharing of state.
Logic at the top-level of a test file has an awkward scope: Objects or functions defined here have what you might call "test file scope", if the definitions appear before the first call to `test_that()`.
If top-level code is interleaved between `test_that()` calls, you can even create "partial test file scope".
While writing tests, it can feel convenient to rely on these file-scoped objects, especially early in the life of a test suite, e.g. when each test file fits on one screen.
But we find that implicitly relying on objects in a test's parent environment tends to make a test suite harder to understand and maintain over time.
Consider a test file with top-level code sprinkled around it, outside of `test_that()`:
```{r, eval = FALSE}
dat <- data.frame(x = c("a", "b", "c"), y = c(1, 2, 3))
skip_if(today_is_a_monday())
test_that("foofy() does this", {
expect_equal(foofy(dat), ...)
})
dat2 <- data.frame(x = c("x", "y", "z"), y = c(4, 5, 6))
skip_on_os("windows")
test_that("foofy2() does that", {
expect_snapshot(foofy2(dat, dat2)
})
```
We recommend relocating file-scoped logic to either a narrower scope or to a broader scope.
Here's what it would look like to use a narrow scope, i.e. to inline everything inside `test_that()` calls:
```{r, eval = FALSE}
test_that("foofy() does this", {
skip_if(today_is_a_monday())
dat <- data.frame(x = c("a", "b", "c"), y = c(1, 2, 3))
expect_equal(foofy(dat), ...)
})
test_that("foofy() does that", {
skip_if(today_is_a_monday())
skip_on_os("windows")
dat <- data.frame(x = c("a", "b", "c"), y = c(1, 2, 3))
dat2 <- data.frame(x = c("x", "y", "z"), y = c(4, 5, 6))
expect_snapshot(foofy(dat, dat2)
})
```
Below we will discuss techniques for moving file-scoped logic to a broader scope.
### Self-contained tests {#sec-testing-design-self-contained}
Each `test_that()` test has its own execution environment, which makes it somewhat self-contained.
For example, an R object you create inside a test does not exist after the test exits:
```{r}
exists("thingy")
test_that("thingy exists", {
thingy <- "thingy"
expect_true(exists(thingy))
})
exists("thingy")
```
The `thingy` object lives and dies entirely within the confines of `test_that()`.
However, testthat doesn't know how to cleanup after actions that affect other aspects of the R landscape:
- The filesystem: creating and deleting files, changing the working directory, etc.
- The search path: `library()`, `attach()`.
- Global options, like `options()` and `par()`, and environment variables.
Watch how calls like `library()`, `options()`, and `Sys.setenv()` have a persistent effect *after* a test, even when they are executed inside `test_that()`:
```{r}
grep("jsonlite", search(), value = TRUE)
getOption("opt_whatever")
Sys.getenv("envvar_whatever")
test_that("landscape changes leak outside the test", {
library(jsonlite)
options(opt_whatever = "whatever")
Sys.setenv(envvar_whatever = "whatever")
expect_match(search(), "jsonlite", all = FALSE)
expect_equal(getOption("opt_whatever"), "whatever")
expect_equal(Sys.getenv("envvar_whatever"), "whatever")
})
grep("jsonlite", search(), value = TRUE)
getOption("opt_whatever")
Sys.getenv("envvar_whatever")
```
These changes to the landscape even persist beyond the current test file, i.e. they carry over into all subsequent test files.
If it's easy to avoid making such changes in your test code, that is the best strategy!
But if it's unavoidable, then you have to make sure that you clean up after yourself.
This mindset is very similar to one we advocated for in @sec-code-r-landscape, when discussing how to design well-mannered functions.
```{r, include = FALSE}
detach("package:jsonlite")
options(opt_whatever = NULL)
Sys.unsetenv("envvar_whatever")
```
We like to use the withr package (<https://withr.r-lib.org>) to make temporary changes in global state, because it automatically captures the initial state and arranges the eventual restoration.
You've already seen an example of its usage, when we explored snapshot tests:
```{r eval = FALSE}
test_that("side-by-side diffs work", {
withr::local_options(width = 20) # <-- (°_°) look here!
expect_snapshot(
waldo::compare(c("X", letters), c(letters, "X"))
)
})
```
This test requires the display width to be set at 20 columns, which is considerably less than the default width.
`withr::local_options(width = 20)` sets the `width` option to 20 and, at the end of the test, restores the option to its original value.
withr is also pleasant to use during interactive development: deferred actions are still captured on the global environment and can be executed explicitly via `withr::deferred_run()` or implicitly by restarting R.
We recommend including withr in `Suggests`, if you're only going to use it in your tests, or in `Imports`, if you also use it below `R/`.
Call withr functions as we do above, e.g. like `withr::local_whatever()`, in either case.
See @sec-dependencies-imports-vs-depends and @sec-dependencies-in-suggests-in-tests for more.
::: callout-tip
The easiest way to add a package to DESCRIPTION is with, e.g., `usethis::use_package("withr", type = "Suggests")`.
For tidyverse packages, withr is considered a "free dependency", i.e. the tidyverse uses withr so extensively that we don't hesitate to use it whenever it would be useful.
:::
withr has a large set of pre-implemented `local_*()` / `with_*()` functions that should handle most of your testing needs, so check there before you write your own.
If nothing exists that meets your need, `withr::defer()` is the general way to schedule some action at the end of a test.[^testing-design-1]
[^testing-design-1]: Base R's `on.exit()` is another alternative, but it requires more from you.
You need to capture the original state and write the restoration code yourself.
Also remember to do `on.exit(..., add = TRUE)` if there's *any* chance a second `on.exit()` call could be added in the test.
You probably also want to default to `after = FALSE`.
Here's how we would fix the problems in the previous example using withr: *Behind the scenes, we reversed the landscape changes, so we can try this again.*
```{r}
grep("jsonlite", search(), value = TRUE)
getOption("opt_whatever")
Sys.getenv("envvar_whatever")
test_that("withr makes landscape changes local to a test", {
withr::local_package("jsonlite")
withr::local_options(opt_whatever = "whatever")
withr::local_envvar(envvar_whatever = "whatever")
expect_match(search(), "jsonlite", all = FALSE)
expect_equal(getOption("opt_whatever"), "whatever")
expect_equal(Sys.getenv("envvar_whatever"), "whatever")
})
grep("jsonlite", search(), value = TRUE)
getOption("opt_whatever")
Sys.getenv("envvar_whatever")
```
testthat leans heavily on withr to make test execution environments as reproducible and self-contained as possible.
In testthat 3e, `testthat::local_reproducible_output()` is implicitly part of each `test_that()` test.
```{r, eval = FALSE}
test_that("something specific happens", {
local_reproducible_output() # <-- this happens implicitly
# your test code, which might be sensitive to ambient conditions, such as
# display width or the number of supported colors
})
```
`local_reproducible_output()` temporarily sets various options and environment variables to values favorable for testing, e.g. it suppresses colored output, turns off fancy quotes, sets the console width, and sets `LC_COLLATE = "C"`.
Usually, you can just passively enjoy the benefits of `local_reproducible_output()`.
But you may want to call it explicitly when replicating test results interactively or if you want to override the default settings in a specific test.
### Plan for test failure
We regret to inform you that most of the quality time you spend with your tests will be when they are inexplicably failing.
> In its purest form, automating testing consists of three activities: writing tests, running tests, and **reacting to test failures**....
>
> Remember that tests are often revisited only when something breaks.
> When you are called to fix a broken test that you have never seen before, you will be thankful someone took the time to make it easy to understand.
> Code is read far more than it is written, so make sure you write the test you'd like to read!
>
> From the book Software Engineering at Google, [Chapter 11](https://abseil.io/resources/swe-book/html/ch11.html)
Most of us don't work on a code base the size of Google.
But even in a team of one, tests that you wrote six months ago might as well have been written by someone else.
Especially when they are failing.
When we do reverse dependency checks, often involving hundreds or thousands of CRAN packages, we have to inspect test failures to determine if changes in our packages are to blame.
As a result, we regularly engage with failing tests in other people's packages, which leaves us with lots of opinions about practices that create unnecessary testing pain.
Test troubleshooting nirvana looks like this: In a fresh R session, you can do `devtools::load_all()` and immediately run an individual test or walk through it line-by-line.
There is no need to hunt around for setup code that has to be run manually first, that is found elsewhere in the test file or perhaps in a different file altogether.
Test-related code that lives in an unconventional location causes extra self-inflicted pain when you least need it.
Consider this extreme and abstract example of a test that is difficult to troubleshoot due to implicit dependencies on free-range code:
```{r, eval = FALSE}
# dozens or hundreds of lines of top-level code, interspersed with other tests,
# which you must read and selectively execute
test_that("f() works", {
x <- function_from_some_dependency(object_with_unknown_origin)
expect_equal(f(x), 2.5)
})
```
This test is much easier to drop in on if dependencies are invoked in the normal way, i.e. via `::`, and test objects are created inline:
```{r, eval = FALSE}
# dozens or hundreds of lines of self-sufficient, self-contained tests,
# all of which you can safely ignore!
test_that("f() works", {
useful_thing <- ...
x <- somePkg::someFunction(useful_thing)
expect_equal(f(x), 2.5)
})
```
This test is self-sufficient.
The code inside `{ ... }` explicitly creates any necessary objects or conditions and makes explicit calls to any helper functions.
This test doesn't rely on objects or dependencies that happen to be ambiently available.
Self-sufficient, self-contained tests are a win-win: It is literally safer to design tests this way and it also makes tests much easier for humans to troubleshoot later.
### Repetition is OK
One obvious consequence of our suggestion to minimize code with "file scope" is that your tests will probably have some repetition.
And that's OK!
We're going to make the controversial recommendation that you tolerate a fair amount of duplication in test code, i.e. you can relax some of your DRY ("don't repeat yourself") tendencies.
> Keep the reader in your test function.
> Good production code is well-factored; good test code is obvious.
> ... think about what will make the problem obvious when a test fails.
>
> From the blog post [Why Good Developers Write Bad Unit Tests](https://mtlynch.io/good-developers-bad-tests/)
Here's a toy example to make things concrete.
```{r}
test_that("multiplication works", {
useful_thing <- 3
expect_equal(2 * useful_thing, 6)
})
test_that("subtraction works", {
useful_thing <- 3
expect_equal(5 - useful_thing, 2)
})
```
In real life, `useful_thing` is usually a more complicated object that somehow feels burdensome to instantiate.
Notice how `useful_thing <- 3` appears in more than one place.
Conventional wisdom says we should DRY this code out.
It's tempting to just move `useful_thing`'s definition outside of the tests:
```{r}
useful_thing <- 3
test_that("multiplication works", {
expect_equal(2 * useful_thing, 6)
})
test_that("subtraction works", {
expect_equal(5 - useful_thing, 2)
})
```
But we really do think the first form, with the repetition, is often the better choice.
At this point, many readers might be thinking "but the code I might have to repeat is much longer than 1 line!".
Below we describe the use of test fixtures.
This can often reduce complicated situations back to something that resembles this simple example.
### Remove tension between interactive and automated testing {#sec-testing-design-tension}
Your test code will be executed in two different settings:
- Interactive test development and maintenance, which includes tasks like:
- Initial test creation
- Modifying tests to adapt to change
- Debugging test failure
- Automated test runs, which is accomplished with functions such as:
- Single file: `devtools::test_active_file()`, `testthat::test_file()`
- Whole package: `devtools::test()`, `devtools::check()`
Automated testing of your whole package is what takes priority.
This is ultimately the whole point of your tests.
However, the interactive experience is clearly important for the humans doing this work.
Therefore it's important to find a pleasant workflow, but also to ensure that you don't rig anything for interactive convenience that actually compromises the health of the test suite.
These two modes of test-running should not be in conflict with each other.
If you perceive tension between these two modes, this can indicate that you're not taking full advantage of some of testthat's features and the way it's designed to work with `devtools::load_all()`.
When working on your tests, use `load_all()`, just like you do when working below `R/`.
By default, `load_all()` does all of these things:
- Simulates re-building, re-installing, and re-loading your package.
- Makes everything in your package's namespace available, including unexported functions and objects and anything you've imported from another package.
- Attaches testthat, i.e. does `library(testthat)`.
- Runs test helper files, i.e. executes `test/testthat/helper.R` (more on that below).
This eliminates the need for any `library()` calls below `tests/testthat/`, for the vast majority of R packages.
Any instance of `library(testthat)` is clearly no longer necessary.
Likewise, any instance of attaching one of your dependencies via `library(somePkg)` is unnecessary.
In your tests, if you need to call functions from somePkg, do it just as you do below `R/`.
If you have imported the function into your namespace, use `fun()`.
If you have not, use `somePkg::fun()`.
It's fair to say that `library(somePkg)` in the tests should be about as rare as taking a dependency via `Depends`, i.e. there is almost always a better alternative.
Unnecessary calls to `library(somePkg)` in test files have a real downside, because they actually change the R landscape.
`library()` alters the search path.
This means the circumstances under which you are testing may not necessarily reflect the circumstances under which your package will be used.
This makes it easier to create subtle test bugs, which you will have to unravel in the future.
One other function that should almost never appear below `tests/testhat/` is `source()`.
There are several special files with an official role in testthat workflows (see below), not to mention the entire R package machinery, that provide better ways to make functions, objects, and other logic available in your tests.
## Files relevant to testing {#sec-tests-files-overview}
Here we review which package files are especially relevant to testing and, more generally, best practices for interacting with the file system from your tests.
### Hiding in plain sight: files below `R/`
The most important functions you'll need to access from your tests are clearly those in your package!
Here we're talking about everything that's defined below `R/`.
The functions and other objects defined by your package are always available when testing, regardless of whether they are exported or not.
For interactive work, `devtools::load_all()` takes care of this.
During automated testing, this is taken care of internally by testthat.
This implies that test helpers can absolutely be defined below `R/` and used freely in your tests.
It might make sense to gather such helpers in a clearly marked file, such as one of these:
```
.
├── ...
└── R
├── ...
├── test-helpers.R
├── test-utils.R
├── testthat.R
├── utils-testing.R
└── ...
```
For example, the dbplyr package uses [`R/testthat.R`](https://github.com/tidyverse/dbplyr/blob/e8bfa760a465cd7d8fa45cc53d4435ee1fbd2361/R/testthat.R) to define a couple of helpers to facilitate comparisons and expectations involving `tbl` objects, which is used to represent database tables.
```{r}
#| eval: false
compare_tbl <- function(x, y, label = NULL, expected.label = NULL) {
testthat::expect_equal(
arrange(collect(x), dplyr::across(everything())),
arrange(collect(y), dplyr::across(everything())),
label = label,
expected.label = expected.label
)
}
expect_equal_tbls <- function(results, ref = NULL, ...) {
# code that gets things ready ...
for (i in seq_along(rest)) {
compare_tbl(
rest[[i]], ref,
label = names(rest)[[i]],
expected.label = ref_name
)
}
invisible(TRUE)
}
```
### `tests/testthat.R`
Recall the initial testthat setup described in @sec-tests-mechanics-workflow: The standard `tests/testthat.R` file looks like this:
```{r eval = FALSE}
library(testthat)
library(pkg)
test_check("pkg")
```
We repeat the advice to not edit `tests/testthat.R`.
It is run during `R CMD check` (and, therefore, `devtools::check()`), but is not used in most other test-running scenarios (such as `devtools::test()` or `devtools::test_active_file()` or during interactive development).
Do not attach your dependencies here with `library()`.
Call them in your tests in the same manner as you do below `R/` (@sec-dependencies-in-imports-in-tests, @sec-dependencies-in-suggests-in-tests).
### Testthat helper files
Another type of file that is always executed by `load_all()` and at the beginning of automated testing is a helper file, defined as any file below `tests/testthat/` that begins with `helper`.
Helper files are a mighty weapon in the battle to eliminate code floating around at the top-level of test files.
Helper files are a prime example of what we mean when we recommend moving such code into a broader scope.
Objects or functions defined in a helper file are available to all of your tests.
If you have just one such file, you should probably name it `helper.R`.
If you organize your helpers into multiple files, you could include a suffix with additional info.
Here are examples of how such files might look:
```
.
├── ...
└── tests
├── testthat
│ ├── helper.R
│ ├── helper-blah.R
│ ├── helper-foo.R
│ ├── test-foofy.R
│ └── (more test files)
└── testthat.R
```
Many developers use helper files to define custom test helper functions, which we describe in detail in @sec-testing-advanced.
Compared to defining helpers below `R/`, some people find that `tests/testthat/helper.R` makes it more clear that these utilities are specifically for testing the package.
This location also feels more natural if your helpers rely on testthat functions.
For example, [usethis](https://github.com/r-lib/usethis/blob/main/tests/testthat/helper.R) and [vroom](https://github.com/tidyverse/vroom/blob/main/tests/testthat/helper.R) both have fairly extensive `tests/testthat/helper.R` files that define many custom test helpers.
Here are two very simple usethis helpers that check that the currently active project (usually an ephemeral test project) has a specific file or folder:
```{r}
expect_proj_file <- function(...) expect_true(file_exists(proj_path(...)))
expect_proj_dir <- function(...) expect_true(dir_exists(proj_path(...)))
```
A helper file is also a good location for setup code that is needed for its side effects.
This is a case where `tests/testthat/helper.R` is clearly more appropriate than a file below `R/`.
For example, in an API-wrapping package, `helper.R` is a good place to (attempt to) authenticate with the testing credentials[^testing-design-2].
[^testing-design-2]: googledrive does this in <https://github.com/tidyverse/googledrive/blob/906680f84b2cec2e4553978c9711be8d42ba33f7/tests/testthat/helper.R#L1-L10>.
### Testthat setup files
Testthat has one more special file type: setup files, defined as any file below `test/testthat/` that begins with `setup`.
Here's an example of how that might look:
```
.
├── ...
└── tests
├── testthat
│ ├── helper.R
│ ├── setup.R
│ ├── test-foofy.R
│ └── (more test files)
└── testthat.R
```
A setup file is handled almost exactly like a helper file, but with two big differences:
- Setup files are not executed by `devtools::load_all()`.
- Setup files often contain the corresponding teardown code.
Setup files are good for global test setup that is tailored for test execution in non-interactive or remote environments.
For example, you might turn off behaviour that's aimed at an interactive user, such as messaging or writing to the clipboard.
If any of your setup should be reversed after test execution, you should also include the necessary teardown code in `setup.R`[^testing-design-3].
We recommend maintaining teardown code alongside the setup code, in `setup.R`, because this makes it easier to ensure they stay in sync.
The artificial environment `teardown_env()` exists as a magical handle to use in `withr::defer()` and `withr::local_*()` / `withr::with_*()`.
[^testing-design-3]: A legacy approach (which still works, but is no longer recommended) is to put teardown code in `tests/testthat/teardown.R`.
Here's a `setup.R` example from the reprex package, where we turn off clipboard and HTML preview functionality during testing:
```{r eval = FALSE}
op <- options(reprex.clipboard = FALSE, reprex.html_preview = FALSE)
withr::defer(options(op), teardown_env())
```
Since we are just modifying options here, we can be even more concise and use the pre-built function `withr::local_options()` and pass `teardown_env()` as the `.local_envir`:
```{r eval = FALSE}
withr::local_options(
list(reprex.clipboard = FALSE, reprex.html_preview = FALSE),
.local_envir = teardown_env()
)
```
### Files ignored by testthat
testthat only automatically executes files where these are both true:
- File is a direct child of `tests/testthat/`
- File name starts with one of the specific strings:
- `helper`
- `setup`
- `test`
It is fine to have other files or directories in `tests/testthat/`, but testthat won't automatically do anything with them (other than the `_snaps` directory, which holds snapshots).
### Storing test data
Many packages contain files that hold test data.
Where should these be stored?
The best location is somewhere below `tests/testthat/`, often in a subdirectory, to keep things neat.
Below is an example, where `useful_thing1.rds` and `useful_thing2.rds` hold objects used in the test files.
```
.
├── ...
└── tests
├── testthat
│ ├── fixtures
│ │ ├── make-useful-things.R
│ │ ├── useful_thing1.rds
│ │ └── useful_thing2.rds
│ ├── helper.R
│ ├── setup.R
│ └── (all the test files)
└── testthat.R
```
Then, in your tests, use `testthat::test_path()` to build a robust filepath to such files.
```{r eval = FALSE}
test_that("foofy() does this", {
useful_thing <- readRDS(test_path("fixtures", "useful_thing1.rds"))
# ...
})
```
`testthat::test_path()` is extremely handy, because it produces the correct path in the two important modes of test execution:
- Interactive test development and maintenance, where working directory is presumably set to the top-level of the package.
- Automated testing, where working directory is usually set to something below `tests/`.
### Where to write files during testing {#sec-tests-files-where-write}
If it's easy to avoid writing files from your tests, that is definitely the best plan.
But there are many times when you really must write files.
**You should only write files inside the session temp directory.** Do not write into your package's `tests/` directory.
Do not write into the current working directory.
Do not write into the user's home directory.
Even though you are writing into the session temp directory, you should still clean up after yourself, i.e. delete any files you've written.
Most package developers don't want to hear this, because it sounds like a hassle.
But it's not that burdensome once you get familiar with a few techniques and build some new habits.
A high level of file system discipline also eliminates various testing bugs and will absolutely make your CRAN life run more smoothly.
This test is from roxygen2 and demonstrates everything we recommend:
```{r eval = FALSE}
test_that("can read from file name with utf-8 path", {
path <- withr::local_tempfile(
pattern = "Universit\u00e0-",
lines = c("#' @include foo.R", NULL)
)
expect_equal(find_includes(path), "foo.R")
})
```
`withr::local_tempfile()` creates a file within the session temp directory whose lifetime is tied to the "local" environment -- in this case, the execution environment of an individual test.
It is a wrapper around `base::tempfile()` and passes, e.g., the `pattern` argument through, so you have some control over the file name.
You can optionally provide `lines` to populate the file with at creation time or you can write to the file in all the usual ways in subsequent steps.
Finally, with no special effort on your part, the temporary file will automatically be deleted at the end of the test.
Sometimes you need even more control over the file name.
In that case, you can use `withr::local_tempdir()` to create a self-deleting temporary directory and write intentionally-named files inside this directory.