Skip to content

Commit

Permalink
Doc updates, change parameter name.
Browse files Browse the repository at this point in the history
  • Loading branch information
bschneidr committed Oct 7, 2023
1 parent bd676e9 commit bf5e9b5
Show file tree
Hide file tree
Showing 3 changed files with 71 additions and 15 deletions.
47 changes: 38 additions & 9 deletions R/subsample_replicates.R
Original file line number Diff line number Diff line change
@@ -1,13 +1,39 @@
#' @title Retain only a random subset of the replicates in a design
#' @description Randomly subsamples the replicates of survey design object,
#' @description Randomly subsamples the replicates of a survey design object,
#' to keep only a subset. The scale factor used in estimation is increased
#' to account for the subsampling.
#' @param design A survey design object, created with either the \code{survey} or \code{srvyr} packages.
#' @param n The number of replicates to keep after subsampling
#' @param n_reps The number of replicates to keep after subsampling
#'
#' @return An updated survey design object, where only a random selection
#' of the replicates has been retained. The overall 'scale' factor for the design
#' (accessed with \code{design$scale}) is increased to account for the sampling of replicates.
#'
#' @section Statistical Details:
#'
#' Suppose the initial replicate design has \eqn{L} replicates, with
#' respective constants \eqn{c_k} for \eqn{k=1,\dots,L} used to estimate variance
#' with the formula
#' \deqn{v_{R} = \sum_{k=1}^L c_k\left(\hat{T}_y^{(k)}-\hat{T}_y\right)^2}
#'
#' With subsampling of replicates, \eqn{L_0} of the original \eqn{L} replicates
#' are randomly selected, and then variances are estimated using the formula:
#' \deqn{v_{R} = \frac{L}{L_0} \sum_{k=1}^{L_0} c_k\left(\hat{T}_y^{(k)}-\hat{T}_y\right)^2}
#'
#' This subsampling is suggested for certain replicate designs in Fay (1989).
#' Kim and Wu (2013) provide a detailed theoretical justification and
#' also propose alternative methods of subsampling replicates.
#'
#' @references
#'
#' Fay, Robert. 1989.
#' "Theory And Application Of Replicate Weighting For Variance Calculations."
#' In, 495–500. Alexandria, VA: American Statistical Association.
#' http://www.asasrms.org/Proceedings/papers/1989_033.pdf
#'
#' Kim, J.K. and Wu, C. 2013.
#' "Sparse and Efficient Replication Variance Estimation for Complex Surveys."
#' \strong{Survey Methodology}, Statistics Canada, 39(1), 91-120.
#' @export
#'
#' @examples
Expand Down Expand Up @@ -38,34 +64,37 @@
#' # Inspect replicates after subsampling
#'
#' rep_design |>
#' subsample_replicates(n = 4) |>
#' subsample_replicates(n_reps = 4) |>
#' getElement("repweights")
subsample_replicates <- function(design, n) {
subsample_replicates <- function(design, n_reps) {

if (!inherits(design, "svyrep.design")) {
stop("`design` must be a replicate design object.")
}
is_compressed <- inherits(design$repweights, "repweights_compressed")

if ((length(n) > 1) || (!is.numeric(n)) || (is.na(n))) {
if (is.null(n_reps) || (length(n_reps) > 1) || (!is.numeric(n_reps)) || (is.na(n_reps))) {
stop("`n` must be a single number")
}
if ((n_reps < 2)) {
stop("`n_reps` must be at least 2.")
}

# Count the number of replicates
if (!is_compressed) {
n_reps <- ncol(design$repweights)
orig_n_reps <- ncol(design$repweights)
}
if (is_compressed) {
n_reps <- ncol(design$repweights[['weights']])
orig_n_reps <- ncol(design$repweights[['weights']])
}

# Determine the new order of the replicates
selected_subsample <- sample(
x = seq_len(n_reps), size = n, replace = FALSE
x = seq_len(n_reps), size = n_reps, replace = FALSE
)

# Update the overall scale factor
subsample_rate <- n / n_reps
subsample_rate <- n_reps / orig_n_reps
scale_adjustment <- subsample_rate^(-1)

design$scale <- scale_adjustment * design$scale
Expand Down
35 changes: 31 additions & 4 deletions man/subsample_replicates.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions tests/testthat/test-subsample_replicates.R
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ survey_design <- svydesign(

test_that("Correct number of replicates and `scale` attribute after subsampling", {
jkn_design <- survey_design |> as.svrepdesign(type = "JKn", compress = TRUE)
subsamp_design <- jkn_design |> subsample_replicates(n = 5)
subsamp_design <- jkn_design |> subsample_replicates(n_reps = 5)

expect_equal(
object = ncol(subsamp_design$repweights[['weights']]),
Expand All @@ -40,7 +40,7 @@ test_that("Correct number of replicates and `scale` attribute after subsampling"
)

jkn_design <- survey_design |> as.svrepdesign(type = "JKn", compress = FALSE)
subsamp_design <- jkn_design |> subsample_replicates(n = 5)
subsamp_design <- jkn_design |> subsample_replicates(n_reps = 5)

expect_equal(
object = ncol(subsamp_design$repweights),
Expand Down

0 comments on commit bf5e9b5

Please sign in to comment.