From b5f4489d97ad2ab52c44c7ea5ec0c6643d6ff28b Mon Sep 17 00:00:00 2001
From: bschneidr
Date: Sun, 10 Mar 2024 00:46:01 +0000
Subject: [PATCH] Deploying to gh-pages from @ bschneidr/svrep@296c960b00185d21572a179a68852fd82d6f9546 🚀
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 articles/nonresponse-adjustments.html | 50 +++++++++----------
 articles/sample-based-calibration.html | 18 +++----
 pkgdown.yml | 2 +-
 reference/as_bootstrap_design.html | 40 +++++++--------
 reference/as_fays_gen_rep_design.html | 50 +++++++++----------
 reference/as_gen_boot_design.html | 47 +++++++++--------
 .../as_random_group_jackknife_design.html | 8 +--
 reference/get_design_quad_form.html | 46 ++++++++---------
 reference/libraries.html | 4 +-
 reference/lou_pums_microdata.html | 8 +--
 reference/make_ppswor_approx_matrix.html | 4 +-
 reference/make_quad_form_matrix.html | 2 +-
 reference/summarize_rep_weights.html | 2 +-
 search.json | 2 +-
 14 files changed, 141 insertions(+), 142 deletions(-)

diff --git a/articles/nonresponse-adjustments.html b/articles/nonresponse-adjustments.html index b8d9c8d..2baf2c7 100644 --- a/articles/nonresponse-adjustments.html +++ b/articles/nonresponse-adjustments.html @@ -260,26 +260,26 @@

Redistributing head(10) #> RESPONSE_STATUS Rep_Column N N_NONZERO SUM MEAN CV MIN #> 1 Nonrespondent 1 498 0 0 0.000 NaN 0 -#> 2 Respondent 1 502 323 596702 1188.649 0.9949470 0 +#> 2 Respondent 1 502 318 596702 1188.649 0.9910419 0 #> 3 Nonrespondent 2 498 0 0 0.000 NaN 0 -#> 4 Respondent 2 502 333 596702 1188.649 0.9430159 0 +#> 4 Respondent 2 502 314 596702 1188.649 1.0464442 0 #> 5 Nonrespondent 3 498 0 0 0.000 NaN 0 -#> 6 Respondent 3 502 317 596702 1188.649 0.9999765 0 +#> 6 Respondent 3 502 314 596702 1188.649 1.0122793 0 #> 7 Nonrespondent 4 498 0 0 0.000 NaN 0 -#> 8 Respondent 4 502 316 596702 1188.649 1.0087387 0 +#> 8 Respondent 4 502 321 596702 1188.649 1.0189112 0 #> 9 Nonrespondent 5 498 0 0 0.000 NaN 0 -#> 10 Respondent 5 502 320 596702 1188.649 0.9701576 0 +#> 10 Respondent 5 502 325 596702 1188.649 0.9681041 0 #> MAX #> 1 0.000 -#> 2 6780.705 +#> 2 5850.020 #> 3 0.000 -#> 4 5770.812 +#> 4 8001.751 #> 5 0.000 -#> 6 5907.941 +#> 6 5850.020 #> 7 0.000 -#> 8 7306.555 +#> 8 5967.020 #> 9 0.000 -#> 10 4891.000 +#> 10 5884.635

Conducting weighting class adjustments @@ -383,16 +383,16 @@

Propensity cell adjustment N_NONZERO, SUM) |> head(10) #> PROPENSITY_CELL Rep_Column N_NONZERO SUM -#> 1 1 1 120 117668.0 -#> 2 2 1 121 118265.3 -#> 3 3 1 126 121251.8 -#> 4 4 1 119 111097.7 -#> 5 5 1 130 128419.3 -#> 6 1 2 123 114681.5 -#> 7 2 2 125 123043.7 -#> 8 3 2 123 115876.1 -#> 9 4 2 133 120654.5 -#> 10 5 2 133 122446.4 +#> 1 1 1 127 116473.4 +#> 2 2 1 122 121251.8 +#> 3 3 1 133 122446.4 +#> 4 4 1 128 117668.0 +#> 5 5 1 128 118862.6 +#> 6 1 2 122 123043.7 +#> 7 2 2 121 109305.8 +#> 8 3 2 125 131405.8 +#> 9 4 2 124 120654.5 +#> 10 5 2 125 112292.3 # Inspect weights after adjustment nr_adjusted_survey |> @@ -404,15 +404,15 @@

Propensity cell adjustment head(10) #> PROPENSITY_CELL RESPONSE_STATUS Rep_Column N_NONZERO SUM #> 1 1 Nonrespondent 1 0 0.0 -#> 2 1 Respondent 1 55 117668.0 +#> 2 1 Respondent 1 57 116473.4 #> 3 2 Nonrespondent 1 0 0.0 -#> 4 2 Respondent 1 59 118265.3 +#> 4 2 Respondent 1 56 121251.8 #> 5 3 Nonrespondent 1 0 0.0 -#> 6 3 Respondent 1 64 121251.8 +#> 6 3 Respondent 1 68 122446.4 #> 7 4 Nonrespondent 1 0 0.0 -#> 8 4 Respondent 1 65 111097.7 +#> 8 4 Respondent 1 64 117668.0 #> 9 5 Nonrespondent 1 0 0.0 -#> 10 5 Respondent 1 80 128419.3

+#> 10 5 Respondent 1 73 118862.6
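(The summaries above are produced after the vignette's weight-redistribution step. A minimal sketch of that step, assuming — as the vignette does — a replicate design containing a RESPONSE_STATUS variable; the input object name here is illustrative:)

# Move weight from nonrespondents to respondents within each replicate;
# the `by` argument (not shown) would do this within weighting classes
nr_adjusted_survey <- redistribute_weights(
  design      = lou_svy_rep_design,   # hypothetical replicate design object
  reduce_if   = RESPONSE_STATUS == "Nonrespondent",
  increase_if = RESPONSE_STATUS == "Respondent"
)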
diff --git a/articles/sample-based-calibration.html b/articles/sample-based-calibration.html index 6423072..3c780c4 100644 --- a/articles/sample-based-calibration.html +++ b/articles/sample-based-calibration.html @@ -793,13 +793,13 @@

Raking to estimated control totals se1 -13416.2987 -13421.0938 +13404.9222 +13400.3618 se2 -13417.8092 -13412.3713 +13411.3225 +13400.4095 se3 @@ -1086,13 +1086,13 @@

Post-stratification se1 -0.0234616 -0.0234323 +0.0234653 +0.0234701 se2 -0.0234616 -0.0234323 +0.0234653 +0.0234701 se3 @@ -1162,7 +1162,7 @@
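(The standard errors above come from post-stratifying to control totals that were themselves estimated from another survey. A hedged sketch of the kind of call involved — the control-total objects below are placeholders, not values from this diff:)

# Post-stratify to estimated control totals, propagating the controls'
# own sampling variance into the replicate weights
poststratified_design <- calibrate_to_estimate(
  rep_design    = nr_adjusted_survey,
  estimate      = control_totals,        # vector of estimated totals
  vcov_estimate = control_totals_vcov,   # variance-covariance matrix of those totals
  cal_formula   = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT
)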

Reproducibility size = dimension_of_control_totals) print(columns_to_perturb) -#> [1] 339 307 843 526 478 908 577 874 563 557 929 34 816 39 349 776 +#> [1] 258 355 489 325 764 697 894 903 760 917 768 33 401 465 403 799 # Perform the calibration poststratified_design <- calibrate_to_estimate( diff --git a/pkgdown.yml b/pkgdown.yml index 439dce9..d133b3e 100644 --- a/pkgdown.yml +++ b/pkgdown.yml @@ -6,7 +6,7 @@ articles: nonresponse-adjustments: nonresponse-adjustments.html sample-based-calibration: sample-based-calibration.html two-phase-sampling: two-phase-sampling.html -last_built: 2024-03-09T23:57Z +last_built: 2024-03-10T00:45Z urls: reference: https://bschneidr.github.io/svrep/reference article: https://bschneidr.github.io/svrep/articles diff --git a/reference/as_bootstrap_design.html b/reference/as_bootstrap_design.html index 0b0a2d5..6aedd19 100644 --- a/reference/as_bootstrap_design.html +++ b/reference/as_bootstrap_design.html @@ -93,24 +93,24 @@

Arguments

  • "Rao-Wu-Yue-Beaumont" (the default):
- The bootstrap method of Beaumont and Émond (2022), which is a generalization of the Rao-Wu-Yue bootstrap, - and is applicable to a wide variety of designs, including single-stage and multistage stratified designs. - The design may have different sampling methods used at different stages. - Each stage of sampling may potentially be PPS (i.e., use unequal probabilities), with or without replacement, - and may potentially use Poisson sampling.

- For a stratum with a fixed sample size of \(n\) sampling units, resampling in each replicate resamples \((n-1)\) sampling units with replacement.

+ The bootstrap method of Beaumont and Émond (2022), which is a generalization of the Rao-Wu-Yue bootstrap, + and is applicable to a wide variety of designs, including single-stage and multistage stratified designs. + The design may have different sampling methods used at different stages. + Each stage of sampling may potentially be PPS (i.e., use unequal probabilities), with or without replacement, + and may potentially use Poisson sampling.

+ For a stratum with a fixed sample size of \(n\) sampling units, resampling in each replicate resamples \((n-1)\) sampling units with replacement.

  • "Rao-Wu":
    - The basic Rao-Wu \((n-1)\) bootstrap method, which is only applicable to single-stage designs or - multistage designs where the first-stage sampling fractions are small (and can thus be ignored). - Accommodates stratified designs. All sampling within a stratum must be simple random sampling with or without replacement, - although the first-stage sampling is effectively treated as sampling without replacement.

  • + The basic Rao-Wu \((n-1)\) bootstrap method, which is only applicable to single-stage designs or + multistage designs where the first-stage sampling fractions are small (and can thus be ignored). + Accommodates stratified designs. All sampling within a stratum must be simple random sampling with or without replacement, + although the first-stage sampling is effectively treated as sampling with replacement.

  • "Preston":
    - Preston's multistage rescaled bootstrap, which is applicable to single-stage designs or multistage designs - with arbitrary sampling fractions. Accommodates stratified designs. All sampling within a stratum must be - simple random sampling with or without replacement.

  • + Preston's multistage rescaled bootstrap, which is applicable to single-stage designs or multistage designs + with arbitrary sampling fractions. Accommodates stratified designs. All sampling within a stratum must be + simple random sampling with or without replacement.

  • "Canty-Davison":
    - The Canty-Davison bootstrap, which is only applicable to single-stage designs, with arbitrary sampling fractions. - Accommodates stratified designs. All sampling with a stratum must be simple random sampling with or without replacement.

  • + The Canty-Davison bootstrap, which is only applicable to single-stage designs, with arbitrary sampling fractions. + Accommodates stratified designs. All sampling within a stratum must be simple random sampling with or without replacement.
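(Each method above is selected through the type argument of as_bootstrap_design(); a brief sketch, assuming a survey design object named design already exists:)

library(survey)
library(svrep)
# Convert a survey design object to a bootstrap replicate design;
# `type` selects one of the methods described above
boot_design <- as_bootstrap_design(
  design,
  type       = "Rao-Wu-Yue-Beaumont",   # or "Rao-Wu", "Preston", "Canty-Davison"
  replicates = 500
)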

    @@ -134,11 +134,11 @@

    Arguments -
  • "SRSWR" - Simple random sampling, with replacement

  • -
  • "PPSWOR" - Unequal probabilities of selection, without replacement

  • -
  • "PPSWR" - Unequal probabilities of selection, with replacement

  • -
  • "Poisson" - Poisson sampling: each sampling unit is selected into the sample at most once, with potentially different probabilities of inclusion for each sampling unit.

  • +Each element should be one of the following:

    • "SRSWOR" - Simple random sampling, without replacement

    • +
    • "SRSWR" - Simple random sampling, with replacement

    • +
    • "PPSWOR" - Unequal probabilities of selection, without replacement

    • +
    • "PPSWR" - Unequal probabilities of selection, with replacement

    • +
    • "Poisson" - Poisson sampling: each sampling unit is selected into the sample at most once, with potentially different probabilities of inclusion for each sampling unit.
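(One of these codes is supplied per stage of sampling via the samp_method_by_stage argument; a short sketch for an assumed two-stage design object named multistage_design:)

# First stage: PPS without replacement; second stage: SRS without replacement
boot_design <- as_bootstrap_design(
  multistage_design,
  type                 = "Rao-Wu-Yue-Beaumont",
  samp_method_by_stage = c("PPSWOR", "SRSWOR"),
  replicates           = 1000
)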

    diff --git a/reference/as_fays_gen_rep_design.html b/reference/as_fays_gen_rep_design.html index 87f66f3..28ca585 100644 --- a/reference/as_fays_gen_rep_design.html +++ b/reference/as_fays_gen_rep_design.html @@ -102,30 +102,29 @@

Arguments

See variance-estimators for a detailed description of each variance estimator. -Options include:

    • "Yates-Grundy": The Yates-Grundy variance estimator based on - first-order and second-order inclusion probabilities.

    • -
    • "Horvitz-Thompson": The Horvitz-Thompson variance estimator based on - first-order and second-order inclusion probabilities.

    • -
    • "Poisson Horvitz-Thompson": The Horvitz-Thompson variance estimator - based on assuming Poisson sampling, with first-order inclusion probabilities - inferred from the sampling probabilities of the survey design object.

    • -
    • "Stratified Multistage SRS": The usual stratified multistage variance estimator - based on estimating the variance of cluster totals within strata at each stage.

    • -
    • "Ultimate Cluster": The usual variance estimator based on estimating - the variance of first-stage cluster totals within first-stage strata.

    • -
    • "Deville-1": A variance estimator for unequal-probability - sampling without replacement, described in Matei and Tillé (2005) - as "Deville 1".

    • -
    • "Deville-2": A variance estimator for unequal-probability - sampling without replacement, described in Matei and Tillé (2005) - as "Deville 2".

    • -
    • "Deville-Tille": A variance estimator useful - for balanced sampling designs, proposed by Deville and Tillé (2005).

    • -
    • "SD1": The non-circular successive-differences variance estimator described by Ash (2014), - sometimes used for variance estimation for systematic sampling.

    • -
    • "SD2": The circular successive-differences variance estimator described by Ash (2014). - This estimator is the basis of the "successive-differences replication" estimator commonly used - for variance estimation for systematic sampling.

    • +Options include:

      • "Yates-Grundy":
        The Yates-Grundy variance estimator based on + first-order and second-order inclusion probabilities.

      • +
      • "Horvitz-Thompson":
        The Horvitz-Thompson variance estimator based on + first-order and second-order inclusion probabilities.

      • +
      • "Poisson Horvitz-Thompson":
        The Horvitz-Thompson variance estimator + based on assuming Poisson sampling, with first-order inclusion probabilities + inferred from the sampling probabilities of the survey design object.

      • +
      • "Stratified Multistage SRS":
        The usual stratified multistage variance estimator + based on estimating the variance of cluster totals within strata at each stage.

      • +
      • "Ultimate Cluster":
        The usual variance estimator based on estimating + the variance of first-stage cluster totals within first-stage strata.

      • +
      • "Deville-1":
        A variance estimator for unequal-probability + sampling without replacement, described in Matei and Tillé (2005) + as "Deville 1".

      • +
      • "Deville-2":
        A variance estimator for unequal-probability + sampling without replacement, described in Matei and Tillé (2005) as "Deville 2".

      • +
      • "Deville-Tille":
        A variance estimator useful + for balanced sampling designs, proposed by Deville and Tillé (2005).

      • +
      • "SD1":
        The non-circular successive-differences variance estimator described by Ash (2014), + sometimes used for variance estimation for systematic sampling.

      • +
      • "SD2":
        The circular successive-differences variance estimator described by Ash (2014). + This estimator is the basis of the "successive-differences replication" estimator commonly used + for variance estimation for systematic sampling.
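(Any of the names above can be passed as the variance_estimator argument; a minimal sketch, assuming a systematic-sample design object named stsys_design, and with max_replicates assumed to cap the number of replicates:)

# Fay's generalized replication: replicate factors are constructed so that,
# in expectation, they reproduce the chosen textbook variance estimator
fays_design <- as_fays_gen_rep_design(
  design             = stsys_design,
  variance_estimator = "SD2",   # circular successive-differences estimator
  max_replicates     = 500
)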

      @@ -169,7 +168,8 @@

      ArgumentsArgumentsvariance-estimators for a detailed description of each variance estimator. -Options include:

      • "Yates-Grundy": The Yates-Grundy variance estimator based on - first-order and second-order inclusion probabilities.

      • -
      • "Horvitz-Thompson": The Horvitz-Thompson variance estimator based on - first-order and second-order inclusion probabilities.

      • -
      • "Poisson Horvitz-Thompson": The Horvitz-Thompson variance estimator - based on assuming Poisson sampling, with first-order inclusion probabilities - inferred from the sampling probabilities of the survey design object.

      • -
      • "Stratified Multistage SRS": The usual stratified multistage variance estimator - based on estimating the variance of cluster totals within strata at each stage.

      • -
      • "Ultimate Cluster": The usual variance estimator based on estimating - the variance of first-stage cluster totals within first-stage strata.

      • -
      • "Deville-1": A variance estimator for unequal-probability - sampling without replacement, described in Matei and Tillé (2005) - as "Deville 1".

      • -
      • "Deville-2": A variance estimator for unequal-probability - sampling without replacement, described in Matei and Tillé (2005) - as "Deville 2".

      • -
      • "Deville-Tille": A variance estimator useful - for balanced sampling designs, proposed by Deville and Tillé (2005).

      • -
      • "SD1": The non-circular successive-differences variance estimator described by Ash (2014), - sometimes used for variance estimation for systematic sampling.

      • -
      • "SD2": The circular successive-differences variance estimator described by Ash (2014). - This estimator is the basis of the "successive-differences replication" estimator commonly used - for variance estimation for systematic sampling.

      • +Options include:

        • "Yates-Grundy":
          The Yates-Grundy variance estimator based on + first-order and second-order inclusion probabilities.

        • +
        • "Horvitz-Thompson":
          The Horvitz-Thompson variance estimator based on + first-order and second-order inclusion probabilities.

        • +
        • "Poisson Horvitz-Thompson":
          The Horvitz-Thompson variance estimator + based on assuming Poisson sampling, with first-order inclusion probabilities + inferred from the sampling probabilities of the survey design object.

        • +
        • "Stratified Multistage SRS":
          The usual stratified multistage variance estimator + based on estimating the variance of cluster totals within strata at each stage.

        • +
        • "Ultimate Cluster":
          The usual variance estimator based on estimating + the variance of first-stage cluster totals within first-stage strata.

        • +
        • "Deville-1":
          A variance estimator for unequal-probability + sampling without replacement, described in Matei and Tillé (2005) + as "Deville 1".

        • +
        • "Deville-2":
          A variance estimator for unequal-probability + sampling without replacement, described in Matei and Tillé (2005) as "Deville 2".

        • +
        • "Deville-Tille":
          A variance estimator useful + for balanced sampling designs, proposed by Deville and Tillé (2005).

        • +
        • "SD1":
          The non-circular successive-differences variance estimator described by Ash (2014), + sometimes used for variance estimation for systematic sampling.

        • +
        • "SD2":
          The circular successive-differences variance estimator described by Ash (2014). + This estimator is the basis of the "successive-differences replication" estimator commonly used + for variance estimation for systematic sampling.
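(The generalized survey bootstrap accepts the same estimator names; a hedged sketch with the same assumed design object:)

# Generalized bootstrap: replicate adjustment factors are drawn so that their
# variance-covariance matrix matches the chosen estimator's quadratic form
gen_boot_design <- as_gen_boot_design(
  design             = stsys_design,
  variance_estimator = "SD2",
  replicates         = 500
)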

        diff --git a/reference/as_random_group_jackknife_design.html b/reference/as_random_group_jackknife_design.html index d7a6154..38c14ef 100644 --- a/reference/as_random_group_jackknife_design.html +++ b/reference/as_random_group_jackknife_design.html @@ -120,10 +120,10 @@

Arguments

Options for adj_method include:

        • "variance-stratum-psus" (the default)
          The replicate weight adjustment for a unit is based on the number of PSUs in its variance stratum.

        • -
        • "variance-units" +

        • "variance-units"
          The replicate weight adjustment for a unit is based on the number of variance units in its variance stratum.

        • @@ -133,11 +133,11 @@

Options for scale_method include:

          • "variance-stratum-psus"
            The scale factor for a variance unit is based on its number of PSUs compared to the number of PSUs in its variance stratum.

          • -
          • "variance-units" +

          • "variance-units"
            The scale factor for a variance unit is based on the number of variance units in its variance stratum.
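(A brief sketch of supplying these options — the design object and the replicates argument are assumptions, while adj_method and scale_method are the arguments documented above:)

# Form 10 random groups; use variance units for both the replicate
# weight adjustment and the scale factors
rg_jk_design <- as_random_group_jackknife_design(
  design,
  replicates   = 10,
  adj_method   = "variance-units",
  scale_method = "variance-units"
)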

          • diff --git a/reference/get_design_quad_form.html b/reference/get_design_quad_form.html index 01df947..c9cd573 100644 --- a/reference/get_design_quad_form.html +++ b/reference/get_design_quad_form.html @@ -87,29 +87,29 @@

            Arguments
            • "Yates-Grundy": The Yates-Grundy variance estimator based on - first-order and second-order inclusion probabilities.

            • -
            • "Horvitz-Thompson": The Horvitz-Thompson variance estimator based on - first-order and second-order inclusion probabilities.

            • -
            • "Poisson Horvitz-Thompson": The Horvitz-Thompson variance estimator - based on assuming Poisson sampling with specified first-order inclusion probabilities.

            • -
            • "Stratified Multistage SRS": The usual stratified multistage variance estimator - based on estimating the variance of cluster totals within strata at each stage.

            • -
            • "Ultimate Cluster": The usual variance estimator based on estimating - the variance of first-stage cluster totals within first-stage strata.

            • -
            • "Deville-1": A variance estimator for unequal-probability - sampling without replacement, described in Matei and Tillé (2005) - as "Deville 1".

            • -
            • "Deville-2": A variance estimator for unequal-probability - sampling without replacement, described in Matei and Tillé (2005) - as "Deville 2".

            • -
            • "Deville-Tille": A variance estimator useful - for balanced sampling designs, proposed by Deville and Tillé (2005).

            • -
            • "SD1": The non-circular successive-differences variance estimator described by Ash (2014), - sometimes used for variance estimation for systematic sampling.

            • -
            • "SD2": The circular successive-differences variance estimator described by Ash (2014). - This estimator is the basis of the "successive-differences replication" estimator commonly used - for variance estimation for systematic sampling.

            • +Options include:

              • "Yates-Grundy":
                The Yates-Grundy variance estimator based on + first-order and second-order inclusion probabilities.

              • +
              • "Horvitz-Thompson":
                The Horvitz-Thompson variance estimator based on + first-order and second-order inclusion probabilities.

              • +
              • "Poisson Horvitz-Thompson":
                The Horvitz-Thompson variance estimator + based on assuming Poisson sampling, with first-order inclusion probabilities + inferred from the sampling probabilities of the survey design object.

              • +
              • "Stratified Multistage SRS":
                The usual stratified multistage variance estimator + based on estimating the variance of cluster totals within strata at each stage.

              • +
              • "Ultimate Cluster":
                The usual variance estimator based on estimating + the variance of first-stage cluster totals within first-stage strata.

              • +
              • "Deville-1":
                A variance estimator for unequal-probability + sampling without replacement, described in Matei and Tillé (2005) + as "Deville 1".

              • +
              • "Deville-2":
                A variance estimator for unequal-probability + sampling without replacement, described in Matei and Tillé (2005) as "Deville 2".

              • +
              • "Deville-Tille":
                A variance estimator useful + for balanced sampling designs, proposed by Deville and Tillé (2005).

              • +
              • "SD1":
                The non-circular successive-differences variance estimator described by Ash (2014), + sometimes used for variance estimation for systematic sampling.

              • +
              • "SD2":
                The circular successive-differences variance estimator described by Ash (2014). + This estimator is the basis of the "successive-differences replication" estimator commonly used + for variance estimation for systematic sampling.
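(Unlike the functions above, get_design_quad_form() returns the matrix of the quadratic form itself; a minimal sketch with an assumed design object:)

# Extract the n-by-n quadratic form matrix of the chosen estimator;
# for a vector of weighted values y_wtd, the variance estimate of the total
# is then t(y_wtd) %*% Q %*% y_wtd
Q <- get_design_quad_form(
  design             = design,
  variance_estimator = "Ultimate Cluster"
)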

              diff --git a/reference/libraries.html b/reference/libraries.html index 7be3e93..1979355 100644 --- a/reference/libraries.html +++ b/reference/libraries.html @@ -101,9 +101,9 @@

Format

Data specifically from the Public Library System Data File. Particularly relevant variables include:

              Identifier variables and survey response status:

              • FSCSKEY: A unique identifier for libraries.

              • -
              • LIBNAME: The name of the library

              • +
              • LIBNAME: The name of the library.

              • RESPONSE_STATUS: Response status for the Public Library Survey: - indicates whether the library was a respondent, nonrespondent, or was closed.

              • + indicates whether the library was a respondent, nonrespondent, or was closed.

              Numeric summaries:

              • TOTCIR: Total circulation

              • VISITS: Total visitors

              • REGBOR: Total number of registered users
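(A short sketch of loading and inspecting the data — library_census is assumed here to be one of the dataset objects documented under this topic:)

# Load the Public Libraries Survey data bundled with svrep
data("library_census", package = "svrep")
head(library_census[, c("FSCSKEY", "LIBNAME", "RESPONSE_STATUS", "TOTCIR")])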

              • diff --git a/reference/lou_pums_microdata.html b/reference/lou_pums_microdata.html index 1a06c55..839bb8b 100644 --- a/reference/lou_pums_microdata.html +++ b/reference/lou_pums_microdata.html @@ -77,14 +77,14 @@

Format

                A data frame with 80 rows and 85 variables

                • UNIQUE_ID: Unique identifier for records

                • AGE: Age in years (copied from the AGEP variable in the ACS microdata)

                • RACE_ETHNICITY: Race and Hispanic/Latino ethnicity - derived from RAC1P and HISP variables - of ACS microdata and collapsed to a smaller number of categories.

                • + derived from RAC1P and HISP variables + of ACS microdata and collapsed to a smaller number of categories.

                • SEX: Male or Female

                • EDUC_ATTAINMENT: Highest level of education attained ('Less than high school' or 'High school or beyond') - derived from SCHL variable in ACS microdata and collapsed to a smaller number of categories.

                • + derived from SCHL variable in ACS microdata and collapsed to a smaller number of categories.

                • PWGTP: Weights for the full-sample

                • PWGTP1-PWGTP80: 80 columns of replicate weights - created using the Successive Differences Replication (SDR) method.

                • + created using the Successive Differences Replication (SDR) method.
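(A hedged sketch of declaring the replicate design for this dataset — the call mirrors standard survey::svrepdesign() usage for SDR weights; the exact arguments are an assumption rather than the package's own example:)

library(survey)
data("lou_pums_microdata", package = "svrep")
# Declare a replicate design from the 80 SDR replicate-weight columns
lou_rep_design <- svrepdesign(
  data       = lou_pums_microdata,
  weights    = ~ PWGTP,
  repweights = "PWGTP\\d{1,2}",        # matches PWGTP1 through PWGTP80
  type       = "successive-difference",
  mse        = TRUE
)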

                diff --git a/reference/make_ppswor_approx_matrix.html b/reference/make_ppswor_approx_matrix.html index 10f4c1a..f41d99b 100644 --- a/reference/make_ppswor_approx_matrix.html +++ b/reference/make_ppswor_approx_matrix.html @@ -128,9 +128,9 @@

                Details

                The constants \(c_i\) are defined for each approximation method as follows, -with the names taken directly from Matei and Tillé (2005).

                • "Deville-1": +with the names taken directly from Matei and Tillé (2005).

                  • "Deville-1": $$c_i=\left(1-\pi_i\right) \frac{n}{n-1}$$

                  • -
                  • "Deville-2": +

                  • "Deville-2": $$c_i = (1-\pi_i) \left[1 - \sum_{k=1}^{n} \left(\frac{1-\pi_k}{\sum_{k=1}^{n}(1-\pi_k)}\right)^2 \right]^{-1}$$
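(As a worked illustration of the two formulas above — a self-contained sketch with made-up inclusion probabilities, not package code:)

# Inclusion probabilities for a hypothetical sample of n = 4 units
pi_i <- c(0.1, 0.2, 0.3, 0.4)
n    <- length(pi_i)
# Deville-1: c_i = (1 - pi_i) * n / (n - 1)
c_deville_1 <- (1 - pi_i) * n / (n - 1)
# Deville-2: c_i = (1 - pi_i) / (1 - sum of squared shares of (1 - pi_k))
c_deville_2 <- (1 - pi_i) / (1 - sum(((1 - pi_i) / sum(1 - pi_i))^2))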

                  Both of the approximations "Deville-1" and "Deville-2" were shown in the simulation studies of Matei and Tillé (2005) to perform much better diff --git a/reference/make_quad_form_matrix.html b/reference/make_quad_form_matrix.html index 92607e0..5a51ddc 100644 --- a/reference/make_quad_form_matrix.html +++ b/reference/make_quad_form_matrix.html @@ -117,7 +117,7 @@

                  Argumentsstrata_ids, cluster_ids, and probs.

                • -
                • "Deville-2": A variance estimator for unequal-probability +

                • "Deville-2": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as "Deville 2". If this option is used, then it is necessary to also use the arguments strata_ids, cluster_ids, and probs.
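(A hedged sketch of such a call — the probability and ID vectors below are placeholders:)

# Quadratic form matrix for the Deville-2 estimator; as noted above,
# these options also require the sampling probabilities and design IDs
Q <- make_quad_form_matrix(
  variance_estimator = "Deville-2",
  probs              = samp_probs,     # first-order sampling probabilities
  strata_ids         = strata_ids,
  cluster_ids        = cluster_ids
)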

                • diff --git a/reference/summarize_rep_weights.html b/reference/summarize_rep_weights.html index a45c000..0983b16 100644 --- a/reference/summarize_rep_weights.html +++ b/reference/summarize_rep_weights.html @@ -105,7 +105,7 @@

                  Value

                  "specific" summary are the following:

                  • "Rep_Column": The name of a given column of replicate weights. - If columns are unnamed, the column number is used instead

                  • + If columns are unnamed, the column number is used instead

                  • "N": The number of entries

                  • "N_NONZERO": The number of nonzero entries

                  • "SUM": The sum of the weights
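(These columns come from calls of the following form — a minimal sketch assuming an existing replicate design object named rep_design:)

# One row per column of replicate weights, with the summary columns above;
# `by` optionally computes the summaries within groups
summarize_rep_weights(rep_design, type = "specific")
summarize_rep_weights(rep_design, type = "specific", by = "RESPONSE_STATUS")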

                  • diff --git a/search.json b/search.json index b98c937..f7e7ec1 100644 --- a/search.json +++ b/search.json @@ -1 +1 @@ -[{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"GNU General Public License","title":"GNU General Public License","text":"Version 3, 29 June 2007Copyright © 2007 Free Software Foundation, Inc.  Everyone permitted copy distribute verbatim copies license document, changing allowed.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"preamble","dir":"","previous_headings":"","what":"Preamble","title":"GNU General Public License","text":"GNU General Public License free, copyleft license software kinds works. licenses software practical works designed take away freedom share change works. contrast, GNU General Public License intended guarantee freedom share change versions program–make sure remains free software users. , Free Software Foundation, use GNU General Public License software; applies also work released way authors. can apply programs, . speak free software, referring freedom, price. General Public Licenses designed make sure freedom distribute copies free software (charge wish), receive source code can get want , can change software use pieces new free programs, know can things. protect rights, need prevent others denying rights asking surrender rights. Therefore, certain responsibilities distribute copies software, modify : responsibilities respect freedom others. example, distribute copies program, whether gratis fee, must pass recipients freedoms received. must make sure , , receive can get source code. must show terms know rights. Developers use GNU GPL protect rights two steps: (1) assert copyright software, (2) offer License giving legal permission copy, distribute /modify . developers’ authors’ protection, GPL clearly explains warranty free software. users’ authors’ sake, GPL requires modified versions marked changed, problems attributed erroneously authors previous versions. devices designed deny users access install run modified versions software inside , although manufacturer can . fundamentally incompatible aim protecting users’ freedom change software. systematic pattern abuse occurs area products individuals use, precisely unacceptable. Therefore, designed version GPL prohibit practice products. problems arise substantially domains, stand ready extend provision domains future versions GPL, needed protect freedom users. Finally, every program threatened constantly software patents. States allow patents restrict development use software general-purpose computers, , wish avoid special danger patents applied free program make effectively proprietary. prevent , GPL assures patents used render program non-free. precise terms conditions copying, distribution modification follow.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_0-definitions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"0. Definitions","title":"GNU General Public License","text":"“License” refers version 3 GNU General Public License. “Copyright” also means copyright-like laws apply kinds works, semiconductor masks. “Program” refers copyrightable work licensed License. licensee addressed “”. “Licensees” “recipients” may individuals organizations. “modify” work means copy adapt part work fashion requiring copyright permission, making exact copy. resulting work called “modified version” earlier work work “based ” earlier work. 
“covered work” means either unmodified Program work based Program. “propagate” work means anything , without permission, make directly secondarily liable infringement applicable copyright law, except executing computer modifying private copy. Propagation includes copying, distribution (without modification), making available public, countries activities well. “convey” work means kind propagation enables parties make receive copies. Mere interaction user computer network, transfer copy, conveying. interactive user interface displays “Appropriate Legal Notices” extent includes convenient prominently visible feature (1) displays appropriate copyright notice, (2) tells user warranty work (except extent warranties provided), licensees may convey work License, view copy License. interface presents list user commands options, menu, prominent item list meets criterion.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_1-source-code","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"1. Source Code","title":"GNU General Public License","text":"“source code” work means preferred form work making modifications . “Object code” means non-source form work. “Standard Interface” means interface either official standard defined recognized standards body, , case interfaces specified particular programming language, one widely used among developers working language. “System Libraries” executable work include anything, work whole, () included normal form packaging Major Component, part Major Component, (b) serves enable use work Major Component, implement Standard Interface implementation available public source code form. “Major Component”, context, means major essential component (kernel, window system, ) specific operating system () executable work runs, compiler used produce work, object code interpreter used run . “Corresponding Source” work object code form means source code needed generate, install, (executable work) run object code modify work, including scripts control activities. However, include work’s System Libraries, general-purpose tools generally available free programs used unmodified performing activities part work. example, Corresponding Source includes interface definition files associated source files work, source code shared libraries dynamically linked subprograms work specifically designed require, intimate data communication control flow subprograms parts work. Corresponding Source need include anything users can regenerate automatically parts Corresponding Source. Corresponding Source work source code form work.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_2-basic-permissions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"2. Basic Permissions","title":"GNU General Public License","text":"rights granted License granted term copyright Program, irrevocable provided stated conditions met. License explicitly affirms unlimited permission run unmodified Program. output running covered work covered License output, given content, constitutes covered work. License acknowledges rights fair use equivalent, provided copyright law. may make, run propagate covered works convey, without conditions long license otherwise remains force. may convey covered works others sole purpose make modifications exclusively , provide facilities running works, provided comply terms License conveying material control copyright. 
thus making running covered works must exclusively behalf, direction control, terms prohibit making copies copyrighted material outside relationship . Conveying circumstances permitted solely conditions stated . Sublicensing allowed; section 10 makes unnecessary.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_3-protecting-users-legal-rights-from-anti-circumvention-law","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"3. Protecting Users’ Legal Rights From Anti-Circumvention Law","title":"GNU General Public License","text":"covered work shall deemed part effective technological measure applicable law fulfilling obligations article 11 WIPO copyright treaty adopted 20 December 1996, similar laws prohibiting restricting circumvention measures. convey covered work, waive legal power forbid circumvention technological measures extent circumvention effected exercising rights License respect covered work, disclaim intention limit operation modification work means enforcing, work’s users, third parties’ legal rights forbid circumvention technological measures.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_4-conveying-verbatim-copies","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"4. Conveying Verbatim Copies","title":"GNU General Public License","text":"may convey verbatim copies Program’s source code receive , medium, provided conspicuously appropriately publish copy appropriate copyright notice; keep intact notices stating License non-permissive terms added accord section 7 apply code; keep intact notices absence warranty; give recipients copy License along Program. may charge price price copy convey, may offer support warranty protection fee.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_5-conveying-modified-source-versions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"5. Conveying Modified Source Versions","title":"GNU General Public License","text":"may convey work based Program, modifications produce Program, form source code terms section 4, provided also meet conditions: ) work must carry prominent notices stating modified , giving relevant date. b) work must carry prominent notices stating released License conditions added section 7. requirement modifies requirement section 4 “keep intact notices”. c) must license entire work, whole, License anyone comes possession copy. License therefore apply, along applicable section 7 additional terms, whole work, parts, regardless packaged. License gives permission license work way, invalidate permission separately received . d) work interactive user interfaces, must display Appropriate Legal Notices; however, Program interactive interfaces display Appropriate Legal Notices, work need make . compilation covered work separate independent works, nature extensions covered work, combined form larger program, volume storage distribution medium, called “aggregate” compilation resulting copyright used limit access legal rights compilation’s users beyond individual works permit. Inclusion covered work aggregate cause License apply parts aggregate.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_6-conveying-non-source-forms","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"6. 
Conveying Non-Source Forms","title":"GNU General Public License","text":"may convey covered work object code form terms sections 4 5, provided also convey machine-readable Corresponding Source terms License, one ways: ) Convey object code , embodied , physical product (including physical distribution medium), accompanied Corresponding Source fixed durable physical medium customarily used software interchange. b) Convey object code , embodied , physical product (including physical distribution medium), accompanied written offer, valid least three years valid long offer spare parts customer support product model, give anyone possesses object code either (1) copy Corresponding Source software product covered License, durable physical medium customarily used software interchange, price reasonable cost physically performing conveying source, (2) access copy Corresponding Source network server charge. c) Convey individual copies object code copy written offer provide Corresponding Source. alternative allowed occasionally noncommercially, received object code offer, accord subsection 6b. d) Convey object code offering access designated place (gratis charge), offer equivalent access Corresponding Source way place charge. need require recipients copy Corresponding Source along object code. place copy object code network server, Corresponding Source may different server (operated third party) supports equivalent copying facilities, provided maintain clear directions next object code saying find Corresponding Source. Regardless server hosts Corresponding Source, remain obligated ensure available long needed satisfy requirements. e) Convey object code using peer--peer transmission, provided inform peers object code Corresponding Source work offered general public charge subsection 6d. separable portion object code, whose source code excluded Corresponding Source System Library, need included conveying object code work. “User Product” either (1) “consumer product”, means tangible personal property normally used personal, family, household purposes, (2) anything designed sold incorporation dwelling. determining whether product consumer product, doubtful cases shall resolved favor coverage. particular product received particular user, “normally used” refers typical common use class product, regardless status particular user way particular user actually uses, expects expected use, product. product consumer product regardless whether product substantial commercial, industrial non-consumer uses, unless uses represent significant mode use product. “Installation Information” User Product means methods, procedures, authorization keys, information required install execute modified versions covered work User Product modified version Corresponding Source. information must suffice ensure continued functioning modified object code case prevented interfered solely modification made. convey object code work section , , specifically use , User Product, conveying occurs part transaction right possession use User Product transferred recipient perpetuity fixed term (regardless transaction characterized), Corresponding Source conveyed section must accompanied Installation Information. requirement apply neither third party retains ability install modified object code User Product (example, work installed ROM). requirement provide Installation Information include requirement continue provide support service, warranty, updates work modified installed recipient, User Product modified installed. 
Access network may denied modification materially adversely affects operation network violates rules protocols communication across network. Corresponding Source conveyed, Installation Information provided, accord section must format publicly documented (implementation available public source code form), must require special password key unpacking, reading copying.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_7-additional-terms","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"7. Additional Terms","title":"GNU General Public License","text":"“Additional permissions” terms supplement terms License making exceptions one conditions. Additional permissions applicable entire Program shall treated though included License, extent valid applicable law. additional permissions apply part Program, part may used separately permissions, entire Program remains governed License without regard additional permissions. convey copy covered work, may option remove additional permissions copy, part . (Additional permissions may written require removal certain cases modify work.) may place additional permissions material, added covered work, can give appropriate copyright permission. Notwithstanding provision License, material add covered work, may (authorized copyright holders material) supplement terms License terms: ) Disclaiming warranty limiting liability differently terms sections 15 16 License; b) Requiring preservation specified reasonable legal notices author attributions material Appropriate Legal Notices displayed works containing ; c) Prohibiting misrepresentation origin material, requiring modified versions material marked reasonable ways different original version; d) Limiting use publicity purposes names licensors authors material; e) Declining grant rights trademark law use trade names, trademarks, service marks; f) Requiring indemnification licensors authors material anyone conveys material (modified versions ) contractual assumptions liability recipient, liability contractual assumptions directly impose licensors authors. non-permissive additional terms considered “restrictions” within meaning section 10. Program received , part , contains notice stating governed License along term restriction, may remove term. license document contains restriction permits relicensing conveying License, may add covered work material governed terms license document, provided restriction survive relicensing conveying. add terms covered work accord section, must place, relevant source files, statement additional terms apply files, notice indicating find applicable terms. Additional terms, permissive non-permissive, may stated form separately written license, stated exceptions; requirements apply either way.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_8-termination","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"8. Termination","title":"GNU General Public License","text":"may propagate modify covered work except expressly provided License. attempt otherwise propagate modify void, automatically terminate rights License (including patent licenses granted third paragraph section 11). However, cease violation License, license particular copyright holder reinstated () provisionally, unless copyright holder explicitly finally terminates license, (b) permanently, copyright holder fails notify violation reasonable means prior 60 days cessation. 
Moreover, license particular copyright holder reinstated permanently copyright holder notifies violation reasonable means, first time received notice violation License (work) copyright holder, cure violation prior 30 days receipt notice. Termination rights section terminate licenses parties received copies rights License. rights terminated permanently reinstated, qualify receive new licenses material section 10.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_9-acceptance-not-required-for-having-copies","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"9. Acceptance Not Required for Having Copies","title":"GNU General Public License","text":"required accept License order receive run copy Program. Ancillary propagation covered work occurring solely consequence using peer--peer transmission receive copy likewise require acceptance. However, nothing License grants permission propagate modify covered work. actions infringe copyright accept License. Therefore, modifying propagating covered work, indicate acceptance License .","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_10-automatic-licensing-of-downstream-recipients","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"10. Automatic Licensing of Downstream Recipients","title":"GNU General Public License","text":"time convey covered work, recipient automatically receives license original licensors, run, modify propagate work, subject License. responsible enforcing compliance third parties License. “entity transaction” transaction transferring control organization, substantially assets one, subdividing organization, merging organizations. propagation covered work results entity transaction, party transaction receives copy work also receives whatever licenses work party’s predecessor interest give previous paragraph, plus right possession Corresponding Source work predecessor interest, predecessor can get reasonable efforts. may impose restrictions exercise rights granted affirmed License. example, may impose license fee, royalty, charge exercise rights granted License, may initiate litigation (including cross-claim counterclaim lawsuit) alleging patent claim infringed making, using, selling, offering sale, importing Program portion .","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_11-patents","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"11. Patents","title":"GNU General Public License","text":"“contributor” copyright holder authorizes use License Program work Program based. work thus licensed called contributor’s “contributor version”. contributor’s “essential patent claims” patent claims owned controlled contributor, whether already acquired hereafter acquired, infringed manner, permitted License, making, using, selling contributor version, include claims infringed consequence modification contributor version. purposes definition, “control” includes right grant patent sublicenses manner consistent requirements License. contributor grants non-exclusive, worldwide, royalty-free patent license contributor’s essential patent claims, make, use, sell, offer sale, import otherwise run, modify propagate contents contributor version. following three paragraphs, “patent license” express agreement commitment, however denominated, enforce patent (express permission practice patent covenant sue patent infringement). “grant” patent license party means make agreement commitment enforce patent party. 
convey covered work, knowingly relying patent license, Corresponding Source work available anyone copy, free charge terms License, publicly available network server readily accessible means, must either (1) cause Corresponding Source available, (2) arrange deprive benefit patent license particular work, (3) arrange, manner consistent requirements License, extend patent license downstream recipients. “Knowingly relying” means actual knowledge , patent license, conveying covered work country, recipient’s use covered work country, infringe one identifiable patents country reason believe valid. , pursuant connection single transaction arrangement, convey, propagate procuring conveyance , covered work, grant patent license parties receiving covered work authorizing use, propagate, modify convey specific copy covered work, patent license grant automatically extended recipients covered work works based . patent license “discriminatory” include within scope coverage, prohibits exercise , conditioned non-exercise one rights specifically granted License. may convey covered work party arrangement third party business distributing software, make payment third party based extent activity conveying work, third party grants, parties receive covered work , discriminatory patent license () connection copies covered work conveyed (copies made copies), (b) primarily connection specific products compilations contain covered work, unless entered arrangement, patent license granted, prior 28 March 2007. Nothing License shall construed excluding limiting implied license defenses infringement may otherwise available applicable patent law.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_12-no-surrender-of-others-freedom","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"12. No Surrender of Others’ Freedom","title":"GNU General Public License","text":"conditions imposed (whether court order, agreement otherwise) contradict conditions License, excuse conditions License. convey covered work satisfy simultaneously obligations License pertinent obligations, consequence may convey . example, agree terms obligate collect royalty conveying convey Program, way satisfy terms License refrain entirely conveying Program.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_13-use-with-the-gnu-affero-general-public-license","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"13. Use with the GNU Affero General Public License","title":"GNU General Public License","text":"Notwithstanding provision License, permission link combine covered work work licensed version 3 GNU Affero General Public License single combined work, convey resulting work. terms License continue apply part covered work, special requirements GNU Affero General Public License, section 13, concerning interaction network apply combination .","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_14-revised-versions-of-this-license","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"14. Revised Versions of this License","title":"GNU General Public License","text":"Free Software Foundation may publish revised /new versions GNU General Public License time time. new versions similar spirit present version, may differ detail address new problems concerns. version given distinguishing version number. 
Program specifies certain numbered version GNU General Public License “later version” applies , option following terms conditions either numbered version later version published Free Software Foundation. Program specify version number GNU General Public License, may choose version ever published Free Software Foundation. Program specifies proxy can decide future versions GNU General Public License can used, proxy’s public statement acceptance version permanently authorizes choose version Program. Later license versions may give additional different permissions. However, additional obligations imposed author copyright holder result choosing follow later version.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_15-disclaimer-of-warranty","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"15. Disclaimer of Warranty","title":"GNU General Public License","text":"WARRANTY PROGRAM, EXTENT PERMITTED APPLICABLE LAW. EXCEPT OTHERWISE STATED WRITING COPYRIGHT HOLDERS /PARTIES PROVIDE PROGRAM “” WITHOUT WARRANTY KIND, EITHER EXPRESSED IMPLIED, INCLUDING, LIMITED , IMPLIED WARRANTIES MERCHANTABILITY FITNESS PARTICULAR PURPOSE. ENTIRE RISK QUALITY PERFORMANCE PROGRAM . PROGRAM PROVE DEFECTIVE, ASSUME COST NECESSARY SERVICING, REPAIR CORRECTION.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_16-limitation-of-liability","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"16. Limitation of Liability","title":"GNU General Public License","text":"EVENT UNLESS REQUIRED APPLICABLE LAW AGREED WRITING COPYRIGHT HOLDER, PARTY MODIFIES /CONVEYS PROGRAM PERMITTED , LIABLE DAMAGES, INCLUDING GENERAL, SPECIAL, INCIDENTAL CONSEQUENTIAL DAMAGES ARISING USE INABILITY USE PROGRAM (INCLUDING LIMITED LOSS DATA DATA RENDERED INACCURATE LOSSES SUSTAINED THIRD PARTIES FAILURE PROGRAM OPERATE PROGRAMS), EVEN HOLDER PARTY ADVISED POSSIBILITY DAMAGES.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_17-interpretation-of-sections-15-and-16","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"17. Interpretation of Sections 15 and 16","title":"GNU General Public License","text":"disclaimer warranty limitation liability provided given local legal effect according terms, reviewing courts shall apply local law closely approximates absolute waiver civil liability connection Program, unless warranty assumption liability accompanies copy Program return fee. END TERMS CONDITIONS","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"how-to-apply-these-terms-to-your-new-programs","dir":"","previous_headings":"","what":"How to Apply These Terms to Your New Programs","title":"GNU General Public License","text":"develop new program, want greatest possible use public, best way achieve make free software everyone can redistribute change terms. , attach following notices program. safest attach start source file effectively state exclusion warranty; file least “copyright” line pointer full notice found. Also add information contact electronic paper mail. program terminal interaction, make output short notice like starts interactive mode: hypothetical commands show w show c show appropriate parts General Public License. course, program’s commands might different; GUI interface, use “box”. also get employer (work programmer) school, , sign “copyright disclaimer” program, necessary. information , apply follow GNU GPL, see . GNU General Public License permit incorporating program proprietary programs. 
program subroutine library, may consider useful permit linking proprietary applications library. want , use GNU Lesser General Public License instead License. first, please read .","code":" Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'. This is free software, and you are welcome to redistribute it under certain conditions; type 'show c' for details."},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"choosing-a-bootstrap-method","dir":"Articles","previous_headings":"","what":"Choosing a Bootstrap Method","title":"Bootstrap Methods for Surveys","text":"Essentially every bootstrap method commonly used surveys can used simple random sampling replacement can easily applied stratified sampling (simply repeat method separately stratum). However, things become complicated types sampling care needed use bootstrap method appropriate survey design. many common designs used practice, possible (easy!) use one bootstrap methods described section vignette titled “Basic Bootstrap Methods.” design isn’t appropriate one basic bootstrap methods, may possible use generalized survey bootstrap described later section vignette. generalized survey bootstrap method can used especially complex designs, systematic sampling two-phase sampling designs. interested reader encouraged read Mashreghi, Haziza, Léger (2016) overview bootstrap methods developed survey samples.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"basic-bootstrap-methods","dir":"Articles","previous_headings":"","what":"Basic Bootstrap Methods","title":"Bootstrap Methods for Surveys","text":"sample designs used practice, three basic survey design features must considered choosing bootstrap method: Whether multiple stages sampling Whether design uses without-replacement sampling large sampling fractions Whether design uses unequal-probability sampling (commonly referred “probability proportional size (PPS)” sampling statistics jargon) ‘svrep’ ‘survey’ packages implement four basic bootstrap methods, can handle one survey design features. four methods, Rao-Wu-Yue-Beaumont bootstrap method (Beaumont Émond 2022) one able directly handle three design features thus default method used function as_bootstrap_design().1 following table summarizes four basic bootstrap methods appropriateness common design features described earlier. Designs Covered Bootstrap Method Data Required Bootstrap Method","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"implementation","dir":"Articles","previous_headings":"Basic Bootstrap Methods","what":"Implementation","title":"Bootstrap Methods for Surveys","text":"implement basic bootstrap methods, can create survey design object svydesign() function survey package, convert object bootstrap replicate design using as_bootstrap_design(). 
This method can be used for multistage, stratified designs with one or more different kinds of sampling, provided the "Rao-Wu-Yue-Beaumont" method is used.

Example 1: Multistage Simple Random Sampling without Replacement (SRSWOR)
Example 2: Single-stage unequal probability sampling without replacement
Example 3: Multistage Sampling with Different Sampling Methods by Stage

For designs that use different sampling methods at different stages, we can use the argument samp_method_by_stage to ensure the correct method is used to form the bootstrap weights. In general, if a multistage design uses unequal probability sampling at any stage, then when creating the initial design object, the stage-specific sampling probabilities should be supplied to the fpc argument of the svydesign() function, and the user should specify pps = "brewer".

library(survey) # For complex survey analysis
library(svrep)
set.seed(2022)

# Load an example dataset from a multistage sample, with two stages of SRSWOR
data("mu284", package = 'survey')
multistage_srswor_design <- svydesign(data = mu284,
                                      ids = ~ id1 + id2,
                                      fpc = ~ n1 + n2)

bootstrap_rep_design <- as_bootstrap_design(multistage_srswor_design,
                                            type = "Rao-Wu-Yue-Beaumont",
                                            replicates = 500)

svytotal(x = ~ y1, design = multistage_srswor_design)
#>    total     SE
#> y1 15080 2274.3
svytotal(x = ~ y1, design = bootstrap_rep_design)
#>    total     SE
#> y1 15080 2311.1

# Load example dataset of U.S. counties and states with 2004 Presidential vote counts
data("election", package = 'survey')
pps_wor_design <- svydesign(data = election_pps,
                            pps = HR(),
                            fpc = ~ p, # Inclusion probabilities
                            ids = ~ 1)

bootstrap_rep_design <- as_bootstrap_design(pps_wor_design,
                                            type = "Rao-Wu-Yue-Beaumont",
                                            replicates = 100)

svytotal(x = ~ Bush + Kerry, design = pps_wor_design)
svytotal(x = ~ Bush + Kerry, design = bootstrap_rep_design)

# Declare a multistage design
# where first-stage probabilities are PPSWOR sampling
# and second-stage probabilities are based on SRSWOR
multistage_design <- svydesign(
  data = library_multistage_sample,
  ids = ~ PSU_ID + SSU_ID,
  probs = ~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB,
  pps = "brewer"
)

# Convert to a bootstrap replicate design
boot_design <- as_bootstrap_design(
  design = multistage_design,
  type = "Rao-Wu-Yue-Beaumont",
  samp_method_by_stage = c("PPSWOR", "SRSWOR"),
  replicates = 1000
)

# Compare variance estimates
svytotal(x = ~ TOTCIR, na.rm = TRUE, design = multistage_design)
#>            total        SE
#> TOTCIR 1634739229 250890030
svytotal(x = ~ TOTCIR, na.rm = TRUE, design = boot_design)
#>            total        SE
#> TOTCIR 1634739229 264207604

Generalized Survey Bootstrap

For sample designs with additional complex features beyond the three highlighted above, the generalized survey bootstrap method can be used. It is especially useful for systematic samples, two-phase samples, and complex designs for which one wishes to use a general-purpose estimator such as the Horvitz-Thompson or Yates-Grundy estimator.

Statistical Background

The generalized survey bootstrap is based on a remarkable observation of Fay (1984), summarized nicely by Dippo, Fay, and Morganstein (1984):

"…any variance estimator based on sums of squares and cross-products can be represented by a resampling plan."
-- Dippo, Fay, Morganstein (1984)

In other words, if a sample design and textbook variance estimator for totals can be represented as a quadratic form (i.e., as sums of squares and cross-products), then we can make a replication estimator for it. Fay developed a general methodology for producing replication estimators from a textbook estimator's quadratic form, encompassing the jackknife, bootstrap, and balanced repeated replication as special cases. Within this framework, the "generalized survey bootstrap" developed by Bertail and Combris (1997) is one specific strategy for making bootstrap replication estimators out of textbook variance estimators. See Beaumont and Patak (2012) for a clear overview of the generalized survey bootstrap.

The starting point for implementing the generalized survey bootstrap is to choose a textbook variance estimator, appropriate to the sampling design, that can be represented as a quadratic form. Luckily, many useful variance estimators can be represented as quadratic forms. Prominent examples include:

- For stratified, multistage cluster samples: the usual multistage variance estimator used in the 'survey' package, based on adding the variance contributions from each stage. This estimator can be used for any number of sampling stages.
- Highly general variance estimators that work for any 'measurable' survey design (i.e., designs where every pair of units in the population has a nonzero probability of appearing in the sample), which covers most designs used in practice, the primary exceptions being "one-PSU-per-stratum" designs and systematic sampling designs: the Horvitz-Thompson estimator and the Sen-Yates-Grundy estimator.
- For systematic samples: the SD1 and SD2 successive-differences estimators, which are the basis of the commonly used "successive-differences replication" (SDR) estimator (see Ash (2014) for an overview of SDR).
- For two-phase samples: the double-expansion variance estimator described in Section 9.3 of Särndal, Swensson, and Wretman (1992).

Once a textbook variance estimator has been selected and its quadratic form identified, the generalized survey bootstrap consists of randomly generating sets of replicate weights from a multivariate distribution whose expectation is the \(n\)-vector \(\mathbf{1}_n\) and whose variance-covariance matrix is the matrix of the quadratic form used in the textbook variance estimator. This ensures that, in expectation, the bootstrap variance estimator for a total equals the textbook variance estimator and thus inherits its properties of design-unbiasedness and design-consistency.

Details and Notation for the Generalized Survey Bootstrap Method

In this section, we describe the generalized survey bootstrap in greater detail, using the notation of Beaumont and Patak (2012).

Quadratic Forms

Let \(v(\hat{T}_y)\) be a textbook variance estimator for the estimated population total \(\hat{T}_y\) of a variable \(y\). If the base weight for case \(i\) of the sample is \(w_i\), let \(\breve{y}_i\) denote the weighted value \(w_i y_i\). Suppose we can represent the textbook variance estimator as a quadratic form: \(v(\hat{T}_y) = \breve{y}^{\prime}\Sigma\breve{y}\), for some \(n \times n\) matrix \(\Sigma\). The only constraint on \(\Sigma\) is that, for the given sample, it must be symmetric and positive semi-definite (in other words, it can never lead to a negative variance estimate, no matter the value of \(\breve{y}\)).
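As a concrete illustration of this constraint (a minimal sketch, not part of the original vignette; the matrix and values are hypothetical), the snippet below checks symmetry and positive semi-definiteness of a candidate quadratic-form matrix via its eigenvalues and evaluates the quadratic-form variance estimate:

# Minimal sketch (hypothetical values): verify that a candidate
# quadratic-form matrix Sigma is symmetric and positive semi-definite,
# and evaluate the quadratic-form variance estimate for weighted values
Sigma <- matrix(
  c( 1.0, -0.5, -0.5,
    -0.5,  1.0, -0.5,
    -0.5, -0.5,  1.0),
  nrow = 3, byrow = TRUE
)
isSymmetric(Sigma)                 # Should be TRUE
min(eigen(Sigma)$values) >= -1e-8  # PSD check (allowing for rounding error)

wtd_y <- c(10, 12, 9)              # Hypothetical weighted values w_i * y_i
drop(t(wtd_y) %*% Sigma %*% wtd_y) # v(T-hat) = y' Sigma y, never negative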
For example, the popular Horvitz-Thompson estimator based on first-order inclusion probabilities \(\pi_k\) and second-order inclusion probabilities \(\pi_{kl}\) can be represented by a positive semi-definite matrix with entries \((1-\pi_k)\) along the main diagonal and entries \((1 - \frac{\pi_k \pi_l}{\pi_{kl}})\) everywhere else. An illustration for a sample with \(n=3\) is shown below:

\[ \Sigma_{HT} = \begin{bmatrix} (1-\pi_1) & (1 - \frac{\pi_1 \pi_2}{\pi_{12}}) & (1 - \frac{\pi_1 \pi_3}{\pi_{13}}) \\ (1 - \frac{\pi_2 \pi_1}{\pi_{21}}) & (1 - \pi_2) & (1 - \frac{\pi_2 \pi_3}{\pi_{23}}) \\ (1 - \frac{\pi_3 \pi_1}{\pi_{31}}) & (1 - \frac{\pi_3 \pi_2}{\pi_{32}}) & (1 - \pi_3) \end{bmatrix} \]

As another example, the successive-difference variance estimator for a systematic sample can be represented by a positive semi-definite matrix whose diagonal entries are \(1\), whose superdiagonal and subdiagonal entries are \(-1/2\), and whose top-right and bottom-left entries are \(-1/2\) (Ash 2014). An illustration for a sample with \(n=5\) is shown below:

\[ \Sigma_{SD2} = \begin{bmatrix} 1 & -1/2 & 0 & 0 & -1/2 \\ -1/2 & 1 & -1/2 & 0 & 0 \\ 0 & -1/2 & 1 & -1/2 & 0 \\ 0 & 0 & -1/2 & 1 & -1/2 \\ -1/2 & 0 & 0 & -1/2 & 1 \end{bmatrix} \]

To obtain the quadratic form matrix of a variance estimator, we can use the function make_quad_form_matrix(), which takes as inputs the name of a variance estimator and the relevant survey design information. For example, the following code produces the quadratic form matrix of the "SD2" variance estimator we saw earlier. In the example that follows, we use this method to estimate the variance for a stratified systematic sample of U.S. public libraries. First, we create the quadratic form matrix representing the SD2 successive-difference estimator: this can be done by describing the survey design with the svydesign() function and then using get_design_quad_form() to obtain the quadratic form of the specified variance estimator. Next, we estimate the sampling variance of the estimated total of the TOTCIR variable using the quadratic form.

make_quad_form_matrix(
  variance_estimator = "SD2",
  cluster_ids = c(1,2,3,4,5) |> data.frame(),
  strata_ids  = c(1,1,1,1,1) |> data.frame(),
  sort_order  = c(1,2,3,4,5)
)
#> 5 x 5 sparse Matrix of class "dsCMatrix"
#>
#> [1,]  1.0 -0.5  .    .   -0.5
#> [2,] -0.5  1.0 -0.5  .    .
#> [3,]  .   -0.5  1.0 -0.5  .
#> [4,]  .    .   -0.5  1.0 -0.5
#> [5,] -0.5  .    .   -0.5  1.0

# Load an example dataset of a stratified systematic sample
data('library_stsys_sample', package = 'svrep')

# First, sort the rows in the order used in sampling
library_stsys_sample <- library_stsys_sample |>
  dplyr::arrange(SAMPLING_SORT_ORDER)

# Create a survey design object
survey_design <- svydesign(
  data = library_stsys_sample,
  ids = ~ 1,
  strata = ~ SAMPLING_STRATUM,
  fpc = ~ STRATUM_POP_SIZE
)

# Obtain the quadratic form for the target estimator
sd2_quad_form <- get_design_quad_form(
  design = survey_design,
  variance_estimator = "SD2"
)
#> For `variance_estimator='SD2', assumes rows of data are sorted in the same order used in sampling.
class(sd2_quad_form)
#> [1] "dsCMatrix"
#> attr(,"package")
#> [1] "Matrix"
dim(sd2_quad_form)
#> [1] 219 219

# Obtain weighted values
wtd_y <- as.matrix(library_stsys_sample[['LIBRARIA']] /
                     library_stsys_sample[['SAMPLING_PROB']])
wtd_y[is.na(wtd_y)] <- 0

# Obtain point estimate for a population total
point_estimate <- sum(wtd_y)

# Obtain the variance estimate using the quadratic form
variance_estimate <- t(wtd_y) %*% sd2_quad_form %*% wtd_y
std_error <- sqrt(variance_estimate[1,1])

# Summarize results
sprintf("Estimate: %s", round(point_estimate))
#> [1] "Estimate: 65642"
sprintf("Standard Error: %s", round(std_error))
#> [1] "Standard Error: 13972"

Forming Adjustment Factors

The goal is to form \(B\) sets of bootstrap weights, where the \(b\)-th set of bootstrap weights is a vector of length \(n\), denoted \(\mathbf{a}^{(b)}\), whose \(k\)-th value is denoted \(a_k^{(b)}\). This gives us \(B\) replicate estimates of the population total, \(\hat{T}_y^{*(b)}=\sum_{k \in s} a_k^{(b)} \breve{y}_k\), for \(b=1, \ldots, B\), from which we can easily calculate an estimate of the sampling variance:

\[ v_B\left(\hat{T}_y\right)=\frac{\sum_{b=1}^B\left(\hat{T}_y^{*(b)}-\hat{T}_y\right)^2}{B} \]

We can write this bootstrap variance estimator as a quadratic form:

\[ v_B\left(\hat{T}_y\right) = \mathbf{\breve{y}}^{\prime}\boldsymbol{\Sigma}_B \mathbf{\breve{y}}, \quad \text{where} \quad \boldsymbol{\Sigma}_B = \frac{\sum_{b=1}^B\left(\mathbf{a}^{(b)}-\mathbf{1}_n\right)\left(\mathbf{a}^{(b)}-\mathbf{1}_n\right)^{\prime}}{B} \]

Note that if every vector of adjustment factors \(\mathbf{a}^{(b)}\) has expectation \(\mathbf{1}_n\) and variance-covariance matrix \(\boldsymbol{\Sigma}\), then the bootstrap expectation is \(E_{*}\left( \boldsymbol{\Sigma}_B \right) = \boldsymbol{\Sigma}\). Since the bootstrap process takes the sample values \(\breve{y}\) as fixed, the bootstrap expectation of the variance estimator is \(E_{*} \left( \mathbf{\breve{y}}^{\prime}\boldsymbol{\Sigma}_B \mathbf{\breve{y}}\right)= \mathbf{\breve{y}}^{\prime}\boldsymbol{\Sigma} \mathbf{\breve{y}}\). Thus, we can produce a bootstrap variance estimator whose expectation is the textbook variance estimator simply by randomly generating \(\mathbf{a}^{(b)}\) from a distribution satisfying the following two conditions:

Condition 1: \(\quad \mathbf{E}_*(\mathbf{a})=\mathbf{1}_n\)
Condition 2: \(\quad \mathbf{E}_*\left(\mathbf{a}-\mathbf{1}_n\right)\left(\mathbf{a}-\mathbf{1}_n\right)^{\prime}=\mathbf{\Sigma}\)

The simplest and most general way to generate such adjustment factors is to simulate from a multivariate normal distribution, \(\mathbf{a} \sim MVN(\mathbf{1}_n, \boldsymbol{\Sigma})\), which is the method used in this package. However, this method can lead to negative adjustment factors, and hence negative bootstrap weights, which—while perfectly valid for variance estimation—may be undesirable from a practical point of view.
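To make the generation step concrete, here is a minimal sketch (not part of the original vignette; the 3 × 3 matrix is the same hypothetical example used earlier) that simulates adjustment factors with MASS::mvrnorm() and checks Conditions 1 and 2 empirically—it also shows that some factors come out negative:

library(MASS) # For mvrnorm()

set.seed(42)
n <- 3; B <- 100000
Sigma <- matrix(
  c( 1.0, -0.5, -0.5,
    -0.5,  1.0, -0.5,
    -0.5, -0.5,  1.0),
  nrow = n, byrow = TRUE
)

# Draw B vectors of adjustment factors a ~ MVN(1_n, Sigma)
a <- mvrnorm(n = B, mu = rep(1, n), Sigma = Sigma)

colMeans(a)      # Condition 1: close to (1, 1, 1)
round(cov(a), 2) # Condition 2: close to Sigma
mean(a < 0)      # A nontrivial share of factors is negative, as noted above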
Thus, in the following subsection, we describe one method of adjusting the replicate factors so that they are nonnegative while still satisfying \(E_{*} \left( \mathbf{\breve{y}}^{\prime}\boldsymbol{\Sigma}_B \mathbf{\breve{y}}\right) =\mathbf{\breve{y}}^{\prime}\boldsymbol{\Sigma} \mathbf{\breve{y}}\).

Adjusting Generalized Survey Bootstrap Replicates to Avoid Negative Weights

Let \(\mathbf{A} = \left[ \mathbf{a}^{(1)} \cdots \mathbf{a}^{(b)} \cdots \mathbf{a}^{(B)} \right]\) denote the \((n \times B)\) matrix of bootstrap adjustment factors. To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors \(\mathbf{A}^S\) by rescaling each adjustment factor \(a_k^{(b)}\) as follows:

\[ a_k^{S,(b)} = \frac{a_k^{(b)} + \tau - 1}{\tau}, \quad \text{where } \tau \geq 1 \text{ and } \tau \geq 1 - a_k^{(b)} \text{ for all } k \in \left\{ 1,\ldots,n \right\} \text{ and all } b \in \left\{1, \ldots, B\right\} \]

The value of \(\tau\) can be set based on the realized adjustment factor matrix \(\mathbf{A}\), or else chosen prior to generating \(\mathbf{A}\) so that \(\tau\) is likely large enough to prevent negative bootstrap weights. If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used when estimating the variance with the bootstrap replicates, which becomes \(\frac{\tau^2}{B}\) instead of \(\frac{1}{B}\):

\[ \begin{aligned} \textbf{Prior to rescaling: } v_B\left(\hat{T}_y\right) &= \frac{1}{B}\sum_{b=1}^B\left(\hat{T}_y^{*(b)}-\hat{T}_y\right)^2 \\ \textbf{After rescaling: } v_B\left(\hat{T}_y\right) &= \frac{\tau^2}{B}\sum_{b=1}^B\left(\hat{T}_y^{S*(b)}-\hat{T}_y\right)^2 \end{aligned} \]

When sharing a dataset that uses rescaled weights from a generalized survey bootstrap, the documentation for the dataset should instruct the user to use the replication scale factor \(\frac{\tau^2}{B}\) rather than \(\frac{1}{B}\) when estimating sampling variances.
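The rescaling above is straightforward to verify by hand. The sketch below (illustrative only; the factor matrix and weighted values are hypothetical) rescales a matrix of factors and confirms that the variance estimate is unchanged once the scale factor \(\tau^2/B\) is applied:

set.seed(1)
# Hypothetical n x B matrix of adjustment factors (some may be negative)
A <- matrix(rnorm(3 * 200, mean = 1, sd = 1), nrow = 3, ncol = 200)

# Choose tau large enough that all rescaled factors are nonnegative
tau <- max(1, 1 - min(A))
A_rescaled <- (A + tau - 1) / tau
min(A_rescaled) >= 0 # TRUE

# Replicate estimates of a total, before and after rescaling
wtd_y  <- c(10, 12, 9)                  # Hypothetical weighted values
T_hat  <- sum(wtd_y)
reps   <- drop(t(A) %*% wtd_y)          # T-hat*(b), b = 1, ..., B
reps_s <- drop(t(A_rescaled) %*% wtd_y) # Rescaled replicate estimates

B <- ncol(A)
v_plain   <- sum((reps   - T_hat)^2) / B           # Scale 1/B
v_rescale <- (tau^2 / B) * sum((reps_s - T_hat)^2) # Scale tau^2/B
all.equal(v_plain, v_rescale)                      # TRUE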
Implementation

There are two ways to implement the generalized survey bootstrap.

Option 1: Convert an existing design to a generalized bootstrap design

The simplest method is to convert an existing survey design object to a generalized bootstrap design. With this approach, we create a survey design object using the svydesign() function, which allows us to represent information about stratification and clustering (potentially for multiple stages) as well as information about finite population corrections. Next, we convert the survey design object into a replicate design using the function as_gen_boot_design(), whose argument variance_estimator allows us to specify the name of the variance estimator to use as the basis for creating the replicate weights. If a PPS design uses the Horvitz-Thompson or Yates-Grundy estimator, we can create a generalized bootstrap estimator with the same expectation. In the example below, we create a PPS design with the 'survey' package and then convert it to a generalized bootstrap design. We can also use generalized bootstrap designs for designs that use multistage, stratified simple random sampling without replacement. Unless specified otherwise, as_gen_boot_design() automatically selects the rescaling value \(\tau\) used to eliminate negative adjustment factors; the scale attribute of the resulting replicate survey design object is thus set equal to \(\tau^2/B\). The specific value of \(\tau\) can be retrieved from the replicate design object, as follows.

# Load example data from stratified systematic sample
data('library_stsys_sample', package = 'svrep')

# First, ensure data are sorted in same order as was used in sampling
library_stsys_sample <- library_stsys_sample[
  order(library_stsys_sample$SAMPLING_SORT_ORDER),
]

# Create a survey design object
design_obj <- svydesign(
  data = library_stsys_sample,
  strata = ~ SAMPLING_STRATUM,
  ids = ~ 1,
  fpc = ~ STRATUM_POP_SIZE
)

# Convert to generalized bootstrap replicate design
gen_boot_design_sd2 <- as_gen_boot_design(
  design = design_obj,
  variance_estimator = "SD2",
  replicates = 2000
)
#> For `variance_estimator='SD2', assumes rows of data are sorted in the same order used in sampling.

# Estimate sampling variances
svymean(x = ~ TOTSTAFF, na.rm = TRUE, design = gen_boot_design_sd2)
#>            mean    SE
#> TOTSTAFF 19.756 4.238

# Load example data of a PPS survey of counties and states
data('election', package = 'survey')

# Create survey design object
pps_design_ht <- svydesign(
  data = election_pps,
  id = ~1, fpc = ~p,
  pps = ppsmat(election_jointprob),
  variance = "HT"
)

# Convert to generalized bootstrap replicate design
gen_boot_design_ht <- pps_design_ht |>
  as_gen_boot_design(variance_estimator = "Horvitz-Thompson",
                     replicates = 5000, tau = "auto")

# Compare sampling variances from bootstrap vs. Horvitz-Thompson estimator
svytotal(x = ~ Bush + Kerry, design = pps_design_ht)
svytotal(x = ~ Bush + Kerry, design = gen_boot_design_ht)

library(dplyr) # For data manipulation

# Create a multistage survey design
multistage_design <- svydesign(
  data = library_multistage_sample |>
    mutate(Weight = 1/SAMPLING_PROB),
  ids = ~ PSU_ID + SSU_ID,
  fpc = ~ PSU_POP_SIZE + SSU_POP_SIZE,
  weights = ~ Weight
)

# Convert to a generalized bootstrap design
multistage_boot_design <- as_gen_boot_design(
  design = multistage_design,
  variance_estimator = "Stratified Multistage SRS"
)

# Compare variance estimates
svytotal(x = ~ TOTCIR, na.rm = TRUE, design = multistage_design)
#>            total        SE
#> TOTCIR 1634739229 251589313
svytotal(x = ~ TOTCIR, na.rm = TRUE, design = multistage_boot_design)
#>            total        SE
#> TOTCIR 1634739229 250754550

# View overall scale factor
overall_scale_factor <- multistage_boot_design$scale
print(overall_scale_factor)
#> [1] 0.0458882

# Check that the scale factor was calculated correctly
tau <- multistage_boot_design$tau
print(tau)
#> [1] 4.79

B <- ncol(multistage_boot_design$repweights)
print(B)
#> [1] 500

print( (tau^2) / B )
#> [1] 0.0458882

Option 2: Create the quadratic form matrix and then use it to create bootstrap weights

The generalized survey bootstrap can also be implemented as a two-step process:

Step 1: Use make_quad_form_matrix() to represent the variance estimator as a quadratic form's matrix.
Step 2: Use make_gen_boot_factors() to generate replicate factors based on the target quadratic form. The function argument tau can be used to avoid negative adjustment factors, using the previously described rescaling method; the actual value of tau used can be extracted from the function's output using the attr() function. For convenience, the values to use for the scale and rscales arguments of svrepdesign() are included as attributes of the adjustment factors created by make_gen_boot_factors().

Using the adjustment factors thus created, we can create a replicate survey design object with the function svrepdesign(), specifying type = "other" and setting the scale argument to the factor \(\tau^2/B\). This allows us to estimate sampling variances, even for quite complex sampling designs.

# Load an example dataset of a stratified systematic sample
data('library_stsys_sample', package = 'svrep')

# Represent the SD2 successive-difference estimator as a quadratic form,
# and obtain the matrix of that quadratic form
sd2_quad_form <- make_quad_form_matrix(
  variance_estimator = 'SD2',
  cluster_ids = library_stsys_sample |> select(FSCSKEY),
  strata_ids = library_stsys_sample |> select(SAMPLING_STRATUM),
  strata_pop_sizes = library_stsys_sample |> select(STRATUM_POP_SIZE),
  sort_order = library_stsys_sample |> pull("SAMPLING_SORT_ORDER")
)

rep_adj_factors <- make_gen_boot_factors(
  Sigma = sd2_quad_form,
  num_replicates = 500,
  tau = "auto"
)

tau <- attr(rep_adj_factors, 'tau')
B <- ncol(rep_adj_factors)

# Retrieve value of 'scale'
rep_adj_factors |> attr('scale')
#> [1] 0.041405
# Compare to manually-calculated value
(tau^2) / B
#> [1] 0.041405

# Retrieve value of 'rscales'
rep_adj_factors |> attr('rscales') |> head() # Only show the first few values
#> [1] 1 1 1 1 1 1

gen_boot_design <- svrepdesign(
  data = library_stsys_sample |>
    mutate(SAMPLING_WEIGHT = 1/SAMPLING_PROB),
  repweights = rep_adj_factors,
  weights = ~ SAMPLING_WEIGHT,
  combined.weights = FALSE,
  type = "other",
  scale = attr(rep_adj_factors, 'scale'),
  rscales = attr(rep_adj_factors, 'rscales')
)

gen_boot_design |>
  svymean(x = ~ TOTSTAFF, na.rm = TRUE, deff = TRUE)
#>            mean    SE   DEff
#> TOTSTAFF 19.756 4.149 0.9455

Choosing the Number of Bootstrap Replicates

The bootstrap suffers from unavoidable "simulation error" (also referred to as "Monte Carlo" error), caused by using a finite number of replicates to approximate the ideal bootstrap estimate we would obtain with an infinite number of replicates. In general, simulation error can be reduced by using a larger number of bootstrap replicates.

General Strategy

While there are many rule-of-thumb values for the number of replicates to use (some say 500, others say 1,000), it is advisable to instead use a principled strategy for choosing the number of replicates. One general strategy, proposed by Beaumont and Patak (2012), is as follows:

Step 1: Determine the largest acceptable level of simulation error for key survey estimates. For example, one might determine that, on average, a bootstrap standard error estimate should be no more than \(\pm 5\%\) different from the ideal bootstrap estimate.

Step 2: Estimate key statistics of interest using a large number of bootstrap replicates (such as 5,000) and save the estimate from each bootstrap replicate.
This can conveniently be done using a function from the 'survey' package, such as svymean(..., return.replicates = TRUE) or withReplicates(..., return.replicates = TRUE).

Step 3: Estimate the minimum number of bootstrap replicates needed to reduce the level of simulation error to the target level. This can be done using the 'svrep' function estimate_boot_reps_for_target_cv().

Measuring and Estimating Simulation Error

Simulation error can be measured by the "simulation coefficient of variation" (CV): the ratio of the standard error of the bootstrap estimator to the expectation of the bootstrap estimator, where the expectation and standard error are evaluated with respect to the bootstrapping process given the selected sample. For a statistic \(\hat{\theta}\), the simulation CV of the bootstrap variance estimator \(v_{B}(\hat{\theta})\) based on \(B\) replicate estimates \(\hat{\theta}^{\star}_1,\dots,\hat{\theta}^{\star}_B\) is defined as follows:

\[ CV_{\star}\left(v_{B}(\hat{\theta})\right) = \frac{CV_{\star}(E_2)}{\sqrt{B}}, \quad \text{where } E_{2,b} = \left(\hat{\theta}^{\star}_b - \hat{\theta}\right)^2 \]

The simulation CV of a statistic, denoted \(CV_{\star}(v_{B}(\hat{\theta}))\), can thus be estimated for a given number of replicates \(B\) by estimating \(CV_{\star}(E_2)\) from the observed squared deviations and dividing by \(\sqrt{B}\). As a result, one can estimate the number of bootstrap replicates needed to obtain a target simulation CV, which is a useful strategy for determining the number of bootstrap replicates to use for a survey. With the 'svrep' package, it is possible to estimate the number of bootstrap replicates required to obtain a target simulation CV for a statistic. To estimate the simulation CV for the current number of replicates used, one can use the function estimate_boot_sim_cv().

library(survey)
data('api', package = 'survey')

# Declare a bootstrap survey design object ----
boot_design <- svydesign(
  data = apistrat,
  weights = ~pw,
  id = ~1,
  strata = ~stype,
  fpc = ~fpc
) |>
  svrep::as_bootstrap_design(replicates = 5000)

# Produce estimates of interest, and save the estimate from each replicate ----
estimated_means_and_proportions <- svymean(
  x = ~ api00 + api99 + stype,
  design = boot_design,
  return.replicates = TRUE
)

# Estimate the number of replicates needed to obtain a target simulation CV ----
estimate_boot_reps_for_target_cv(
  svrepstat = estimated_means_and_proportions,
  target_cv = c(0.01, 0.05, 0.10)
)
#>   TARGET_CV MAX_REPS api00 api99 stypeE stypeH stypeM
#> 1      0.01    15068  6651  6650  15068  15068  15068
#> 2      0.05      603   267   266    603    603    603
#> 3      0.10      151    67    67    151    151    151

estimate_boot_sim_cv(estimated_means_and_proportions)
#>   STATISTIC SIMULATION_CV N_REPLICATES
#> 1     api00    0.01153261         5000
#> 2     api99    0.01153177         5000
#> 3    stypeE    0.01735956         5000
#> 4    stypeH    0.01735950         5000
#> 5    stypeM    0.01735951         5000

The Bootstrap vs. Other Replication Methods

"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."
-- John Tukey

"Ok, but what if the approximate answer is, like, really approximate and requires a whole lot of computing?"
-- Survey sampling statisticians

Survey bootstrap methods are directly applicable to a wider variety of sample designs than the jackknife or balanced repeated replication (BRR). Nonetheless, complex survey designs are often shoehorned into jackknife or BRR variance estimation by pretending that the actual survey design is something simpler.
The BRR method, for instance, is only applicable to samples with exactly two clusters sampled per stratum, yet statisticians frequently use it for designs with three or more sampled clusters by grouping the actual clusters into two pseudo-clusters. For designs with a large number of sampling units per stratum, an exact jackknife (JK1 or JKn) requires a large number of replicates and so is often replaced with a "delete-a-group jackknife" (DAGJK), in which clusters are randomly grouped into a smaller number of larger pseudo-clusters.

Why do statisticians go to the effort of shoehorning a variance estimation problem into the jackknife or BRR methods rather than just using the bootstrap? The simple answer is that bootstrap methods generally require many more replicates than other methods in order to obtain a stable variance estimate, and using a large number of replicates can be a problem if there is a large amount of computing to do, if the dataset is large, or if we're concerned about storage costs. Statistical agencies are particularly sensitive to these concerns when they publish microdata, since such agencies often serve a large number of end-users with varying computational resources.

So why use the bootstrap?

- The bootstrap tends to work well for a larger class of statistics than the jackknife. For example, when estimating the sampling variance of an estimated median or other quantiles, the jackknife tends to perform poorly, while bootstrap methods do at least an adequate job.
- Bootstrap methods enable different options for forming confidence intervals. With the standard replication methods (BRR, jackknife, etc.), confidence intervals are generally formed using the Wald interval (\(\hat{\theta} \pm \hat{se}(\hat{\theta}) \times z_{1-\frac{\alpha}{2}}\)). With certain bootstrap methods, it is possible to also form confidence intervals using other approaches, such as the bootstrap percentile method (see the sketch after this section).
- We can analyze the design we actually have rather than an approximation of it, which can reduce costs and better control errors. To use BRR for general survey designs, we must approximate the actual survey design with a "two PSUs per stratum" design. This works surprisingly well in many cases, but it requires careful work on the part of a specially trained statistician. For a jackknife with a large number of sampling units, we either end up with as many replicates as a bootstrap method or we randomly group the sampling units into a smaller number so that we can use the DAGJK method, in essence approximating the actual survey design with a simpler one. This, too, takes careful work on the part of a specially trained statistician. If we analyze the design we actually have, we don't need to pay a specially trained statistician to meticulously approximate the design so that it can be shoehorned into a jackknife or BRR variance estimation problem, which is perhaps not the best use of a limited budget. If variance estimation is based on a bootstrap method tailored to the actual survey design, the replication error for variance estimates of key statistics is unbiased and can be quantified and controlled as a function of the number of replicates. In contrast, if variance estimation is based on approximating the design so that it can be shoehorned into a jackknife or BRR variance estimation problem, the replication error of variance estimates is difficult to quantify and can consist of both noise and bias.
- For most statisticians, it's probably easier to learn. The bootstrap is a well-known replication method among general statisticians, to the point that it's often taught in first-year undergraduate statistics courses, so the basic idea is already familiar even to statisticians with only a passing familiarity with complex survey sampling. BRR, in contrast, takes specialized training to learn, as it entails pre-requisite concepts such as Hadamard matrices and partial balancing. Outside of survey statistics, the jackknife tends to be much less used (and taught) than the bootstrap, due to its limitations with non-smooth statistics and the complexity required to make it work efficiently with large sample sizes.
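As a small illustration of the percentile approach mentioned above (not from the original vignette; it reuses the 'survey' package's example data, and the interval computation shown is an assumption about typical usage, not the package's own method):

# Minimal sketch: a 95% percentile interval from bootstrap replicate estimates,
# compared with the usual Wald interval
library(survey)
data('api', package = 'survey')

boot_design <- svydesign(data = apistrat, weights = ~ pw, id = ~ 1,
                         strata = ~ stype, fpc = ~ fpc) |>
  svrep::as_bootstrap_design(replicates = 1000)

est <- svymean(~ api00, design = boot_design, return.replicates = TRUE)

coef(est) + c(-1.96, 1.96) * SE(est)             # Wald interval
quantile(est$replicates, probs = c(0.025, 0.975)) # Percentile interval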
Nonresponse Adjustments

Creating initial replicate weights

To begin, we'll create bootstrap replicate weights. In most cases, this can be done by simply describing the survey design using the svydesign() function and then using a function to create appropriate replicate weights. The function as.svrepdesign() from the 'survey' package can create several types of replicate weights, using the argument type (with options 'JK1', 'JKn', 'bootstrap', 'BRR', 'Fay', etc.). In addition, the function as_bootstrap_design() can be used to create bootstrap weights using additional methods not supported in the 'survey' package. For convenience, we'll convert the survey design object to an object of class tbl_svy, which allows us to use convenient tidyverse/dplyr syntax (group_by(), summarize(), etc.) as well as other helpful functions from the srvyr package.

# Describe the survey design
lou_vax_survey <- svydesign(ids = ~ 1, weights = ~ SAMPLING_WEIGHT,
                            data = lou_vax_survey)

print(lou_vax_survey)
#> Independent Sampling design (with replacement)
#> svydesign(ids = ~1, weights = ~SAMPLING_WEIGHT, data = lou_vax_survey)

# Create appropriate replicate weights
lou_vax_survey <- lou_vax_survey |>
  as_bootstrap_design(replicates = 100, mse = TRUE,
                      type = "Rao-Wu-Yue-Beaumont")

print(lou_vax_survey)
#> Call: as_bootstrap_design(lou_vax_survey, replicates = 100, mse = TRUE,
#>     type = "Rao-Wu-Yue-Beaumont")
#> Survey bootstrap with 100 replicates and MSE variances.

lou_vax_survey <- lou_vax_survey |> as_survey()

print(lou_vax_survey)
#> Call: Called via srvyr
#> Survey bootstrap with 100 replicates and MSE variances.
#> Data variables: RESPONSE_STATUS (chr), RACE_ETHNICITY (chr), SEX (chr),
#>   EDUC_ATTAINMENT (chr), VAX_STATUS (chr), SAMPLING_WEIGHT (dbl)

Redistributing weight from nonrespondents to respondents

The most common form of nonresponse adjustment is to simply 'redistribute' weight from nonrespondents to respondents. In other words, the weight for each nonrespondent is set to \(0\), and the weight for each respondent is increased by a factor greater than one, so that the sum of adjusted weights in the sample of respondents equals the sum of unadjusted weights from the full sample. For example, if the sum of weights among respondents is \(299,544.4\) and the sum of weights among nonrespondents is \(297,157.6\), then a basic nonresponse adjustment would set the weights among nonrespondents to \(0\) and multiply the weight for each respondent by an adjustment factor equal to \(1 + (297,157.6/299,544.4)\). This type of adjustment is described succinctly in mathematical notation in the 'Statistical background' section at the end of this vignette.

We'll illustrate this type of adjustment with the Louisville vaccination survey. First, we'll inspect the sum of the sampling weights for respondents, nonrespondents, and the overall sample. Next, we'll redistribute weight from nonrespondents to respondents using the redistribute_weights() function, which adjusts the full-sample weights as well as each set of replicate weights. To specify the subset of data whose weights should be reduced, we supply a logical expression to the argument reduce_if; to specify the subset whose weights should be increased, we supply a logical expression to the argument increase_if. After making the adjustment, we can check that the weight from nonrespondents has been redistributed to respondents.

# Weights before adjustment
lou_vax_survey |>
  group_by(RESPONSE_STATUS) |>
  cascade(
    `Sum of Weights` = sum(cur_svy_wts()),
    .fill = "TOTAL"
  )
#> # A tibble: 3 × 2
#>   RESPONSE_STATUS `Sum of Weights`
#> 1 Nonrespondent            297158.
#> 2 Respondent               299544.
#> 3 TOTAL                     596702

# Conduct a basic nonresponse adjustment
nr_adjusted_survey <- lou_vax_survey |>
  redistribute_weights(
    reduce_if = RESPONSE_STATUS == "Nonrespondent",
    increase_if = RESPONSE_STATUS == "Respondent"
  )

# Check the sum of full-sample weights by response status
nr_adjusted_survey |>
  group_by(RESPONSE_STATUS) |>
  cascade(
    `Sum of Weights` = sum(cur_svy_wts()),
    .fill = "TOTAL"
  )
#> # A tibble: 3 × 2
#>   RESPONSE_STATUS `Sum of Weights`
#> 1 Nonrespondent                  0
#> 2 Respondent                596702
#> 3 TOTAL                     596702

# Check sums of replicate weights by response status
nr_adjusted_survey |>
  summarize_rep_weights(
    type = "specific",
    by = "RESPONSE_STATUS"
  ) |>
  arrange(Rep_Column, RESPONSE_STATUS) |>
  head(10)
#>    RESPONSE_STATUS Rep_Column   N N_NONZERO    SUM     MEAN        CV MIN      MAX
#> 1    Nonrespondent          1 498         0      0    0.000       NaN   0    0.000
#> 2       Respondent          1 502       323 596702 1188.649 0.9949470   0 6780.705
#> 3    Nonrespondent          2 498         0      0    0.000       NaN   0    0.000
#> 4       Respondent          2 502       333 596702 1188.649 0.9430159   0 5770.812
#> 5    Nonrespondent          3 498         0      0    0.000       NaN   0    0.000
#> 6       Respondent          3 502       317 596702 1188.649 0.9999765   0 5907.941
#> 7    Nonrespondent          4 498         0      0    0.000       NaN   0    0.000
#> 8       Respondent          4 502       316 596702 1188.649 1.0087387   0 7306.555
#> 9    Nonrespondent          5 498         0      0    0.000       NaN   0    0.000
#> 10      Respondent          5 502       320 596702 1188.649 0.9701576   0 4891.000
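The adjustment-factor arithmetic can be checked by hand. This short sketch (illustrative only, reusing the weight sums printed above) confirms that multiplying respondent weights by 1 + (297,157.6 / 299,544.4) restores the full-sample total:

# Minimal sketch: verify the basic redistribution factor by hand
resp_sum    <- 299544.4 # Sum of weights among respondents
nonresp_sum <- 297157.6 # Sum of weights among nonrespondents

adj_factor <- 1 + (nonresp_sum / resp_sum)
round(resp_sum * adj_factor) # Equals the full-sample sum, 596702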
Conducting weighting class adjustments

Nonresponse bias is liable to occur if different subpopulations systematically differ in terms of their response rates to the survey and also differ in terms of what the survey is trying to measure (in this case, vaccination status). For example, we can see fairly large differences in response rates across the different race/ethnicity groups. Weighting adjustments may be able to help reduce the nonresponse bias caused by these differences in response rates. One standard form of adjustment, known as a weighting class adjustment, redistributes weights from nonrespondents to respondents separately within different categories of auxiliary variables (such as race/ethnicity). The survey textbook by Heeringa, West, and Berglund (2017) provides an excellent overview of weighting class adjustments. To implement a weighting class adjustment with the svrep package, we can simply use the by argument of redistribute_weights(). Multiple grouping variables may be supplied to the by argument; for example, one can specify by = c("STRATUM", "RACE_ETHNICITY") to redistribute weights separately by combinations of stratum and race/ethnicity category.

lou_vax_survey |>
  group_by(RACE_ETHNICITY) |>
  summarize(Response_Rate = mean(RESPONSE_STATUS == "Respondent"),
            Sample_Size = n(),
            n_Respondents = sum(RESPONSE_STATUS == "Respondent"))
#> # A tibble: 4 × 4
#>   RACE_ETHNICITY                          Response_Rate Sample_Size n_Respondents
#> 1 Black or African American alone, not …         0.452         188            85
#> 2 Hispanic or Latino                              0.378          45            17
#> 3 Other Race, not Hispanic or Latino             0.492          59            29
#> 4 White alone, not Hispanic or Latino            0.524         708           371

nr_adjusted_survey <- lou_vax_survey |>
  redistribute_weights(
    reduce_if = RESPONSE_STATUS == "Nonrespondent",
    increase_if = RESPONSE_STATUS == "Respondent",
    by = c("RACE_ETHNICITY")
  )

Propensity cell adjustment

A popular method of forming weighting classes based on estimated response propensities (known as propensity cell adjustment) can also be used: for example, after adding a variable PROPENSITY_CELL to the data, one can use redistribute_weights(..., by = "PROPENSITY_CELL").

# Fit a response propensity model
response_propensity_model <- lou_vax_survey |>
  mutate(IS_RESPONDENT = ifelse(RESPONSE_STATUS == "Respondent", 1, 0)) |>
  svyglm(formula = IS_RESPONDENT ~ RACE_ETHNICITY + EDUC_ATTAINMENT,
         family = quasibinomial(link = 'logit'))

# Predict response propensities for individual cases
lou_vax_survey <- lou_vax_survey |>
  mutate(
    RESPONSE_PROPENSITY = predict(response_propensity_model,
                                  newdata = cur_svy(),
                                  type = "response")
  )

# Divide sample into propensity classes
lou_vax_survey <- lou_vax_survey |>
  mutate(PROPENSITY_CELL = ntile(x = RESPONSE_PROPENSITY, n = 5))

lou_vax_survey |>
  group_by(PROPENSITY_CELL) |>
  summarize(n = n(),
            min = min(RESPONSE_PROPENSITY),
            mean = mean(RESPONSE_PROPENSITY),
            max = max(RESPONSE_PROPENSITY))
#> # A tibble: 5 × 5
#>   PROPENSITY_CELL     n   min  mean   max
#> 1               1   200 0.357 0.424 0.459
#> 2               2   200 0.459 0.484 0.488
#> 3               3   200 0.488 0.488 0.512
#> 4               4   200 0.512 0.551 0.564
#> 5               5   200 0.564 0.564 0.564

# Redistribute weights by propensity class
nr_adjusted_survey <- lou_vax_survey |>
  redistribute_weights(
    reduce_if = RESPONSE_STATUS == "Nonrespondent",
    increase_if = RESPONSE_STATUS == "Respondent",
    by = "PROPENSITY_CELL"
  )

# Inspect weights before adjustment
lou_vax_survey |>
  summarize_rep_weights(type = "specific", by = c("PROPENSITY_CELL")) |>
  arrange(Rep_Column, PROPENSITY_CELL) |>
  select(PROPENSITY_CELL, Rep_Column, N_NONZERO, SUM) |>
  head(10)
#>    PROPENSITY_CELL Rep_Column N_NONZERO      SUM
#> 1                1          1       120 117668.0
#> 2                2          1       121 118265.3
#> 3                3          1       126 121251.8
#> 4                4          1       119 111097.7
#> 5                5          1       130 128419.3
#> 6                1          2       123 114681.5
#> 7                2          2       125 123043.7
#> 8                3          2       123 115876.1
#> 9                4          2       133 120654.5
#> 10               5          2       133 122446.4

# Inspect weights after adjustment
nr_adjusted_survey |>
  summarize_rep_weights(type = "specific",
                        by = c("PROPENSITY_CELL", "RESPONSE_STATUS")) |>
  arrange(Rep_Column, PROPENSITY_CELL, RESPONSE_STATUS) |>
  select(PROPENSITY_CELL, RESPONSE_STATUS, Rep_Column, N_NONZERO, SUM) |>
  head(10)
#>    PROPENSITY_CELL RESPONSE_STATUS Rep_Column N_NONZERO      SUM
#> 1                1   Nonrespondent          1         0      0.0
#> 2                1      Respondent          1        55 117668.0
#> 3                2   Nonrespondent          1         0      0.0
#> 4                2      Respondent          1        59 118265.3
#> 5                3   Nonrespondent          1         0      0.0
#> 6                3      Respondent          1        64 121251.8
#> 7                4   Nonrespondent          1         0      0.0
#> 8                4      Respondent          1        65 111097.7
#> 9                5   Nonrespondent          1         0      0.0
#> 10               5      Respondent          1        80 128419.3

Saving the final weights to a data file

Once we're satisfied with the weights, we can create a data frame with the analysis variables and columns of replicate weights. This format is easy to export to data files that can be loaded into R or other software later.

data_frame_with_nr_adjusted_weights <- nr_adjusted_survey |>
  as_data_frame_with_weights(
    full_wgt_name = "NR_ADJ_WGT",
    rep_wgt_prefix = "NR_ADJ_REP_WGT_"
  )

# Preview first few column names
colnames(data_frame_with_nr_adjusted_weights) |> head(12)
#>  [1] "RESPONSE_STATUS"     "RACE_ETHNICITY"      "SEX"
#>  [4] "EDUC_ATTAINMENT"     "VAX_STATUS"          "SAMPLING_WEIGHT"
#>  [7] "RESPONSE_PROPENSITY" "PROPENSITY_CELL"     "NR_ADJ_WGT"
#> [10] "NR_ADJ_REP_WGT_1"    "NR_ADJ_REP_WGT_2"    "NR_ADJ_REP_WGT_3"

# Write the data to a CSV file
write.csv(
  x = data_frame_with_nr_adjusted_weights,
  file = "survey-data-with-nonresponse-adjusted-weights.csv"
)

Statistical background

The motivation for making these adjustments is that standard methods of statistical inference assume every person in the population has a known, nonzero probability of participating in the survey (i.e., a nonzero chance of being sampled and a nonzero chance of responding if sampled), denoted \(p_{i,overall}\). Basic results from survey sampling theory guarantee that if this assumption is true, we can produce unbiased estimates of population means and totals by weighting the data of each respondent by the weight \(1/{p_{i,overall}}\). Crucially, the overall probability of participation \(p_{i,overall}\) is the product of two components: the probability that person \(i\) is sampled (denoted \(\pi_i\)) and the probability that person \(i\) will respond to the survey if sampled (denoted \(p_i\) and referred to as the "response propensity"). While the sampling probability \(\pi_i\) is known, since we control the method of sampling, the response propensity \(p_i\) is unknown and can only be estimated.

\[ \begin{aligned} w^{*}_i &= 1/p_{i,overall} \text{ (the weights needed for unbiased estimation)} \\ p_{i,overall} &= \pi_i \times p_i \\ \pi_i &= \textbf{Sampling probability}, \text{ i.e. the probability that case } i \text{ is randomly sampled } (\textit{Known}) \\ p_i &= \textbf{Response propensity}, \text{ i.e. the probability that case } i \text{ responds, if sampled } (\textit{Unknown}) \end{aligned} \]

The component \(p_i\) must be estimated using the data (yielding an estimate \(\hat{p}_i\)), so that nonresponse-adjusted weights for respondents can be formed as \(w_{NR,i} = 1/(\pi_i \times \hat{p}_i)\) and used to obtain approximately unbiased estimates of population means and totals. Using the earlier notation, the nonresponse adjustment factor for respondents, \(f_{NR,i}\), is thus defined using \(1/\hat{p}_i\).
\[ \begin{aligned} w_i &= \textit{Original sampling weight for case } i = 1/\pi_i, \text{ where } \pi_i \text{ is the probability that case } i \text{ was sampled} \\ w_{NR, i} &= w_i \times f_{NR,i} = \textit{Weight for case } i \text{ after nonresponse adjustment} \\ f_{NR,i} &= \begin{cases} 0 & \text{if case } i \text{ is a nonrespondent} \\ 1 / \hat{p}_i & \text{if case } i \text{ is a respondent} \end{cases} \\ \hat{p}_i &= \textbf{Estimated response propensity} \end{aligned} \]

In essence, different methods of nonresponse weighting adjustment vary in terms of how they estimate \(\hat{p}_i\). The basic weight redistribution method in effect estimates \(p_i\) with a constant value across all \(i\), equal to the overall weighted response rate. In other words, basic weight redistribution is essentially a way of forming the adjustment factor \(f_{NR,i}\) based on the estimated response propensity \(\hat{p}_i = \frac{\sum_{i \in s_{resp}}w_i}{\sum_{i \in s}w_i}\). Weighting class adjustments and propensity cell adjustments are essentially more refined ways of forming \(f_{NR,i}\), estimating \(p_i\) with a more realistic model in which \(p_i\) is not constant across the entire sample but instead varies among weighting classes or propensity cells.

The reason for conducting these weighting adjustments to the full-sample weights and to each set of replicate weights is to account for the nonresponse adjustment process when estimating sampling variances and inferential statistics such as confidence intervals. Under random sampling, the adjustment factors used in the nonresponse adjustment would vary from one sample to the next, so applying the weighting adjustments separately to each replicate reflects this variability. As we've seen in this vignette, the redistribute_weights() function handles this for us: for a nonresponse adjustment, the weight in each replicate is redistributed in the same manner in which weight is redistributed for the full-sample weights.
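To make the notation concrete, here is a small sketch (not from the original vignette; all values are hypothetical) computing nonresponse-adjusted weights from estimated response propensities within two weighting classes:

# Minimal sketch: weighting-class nonresponse adjustment by hand.
# Hypothetical data: base weights w, weighting class, response status.
df <- data.frame(
  w     = c(100, 100, 150, 150, 200, 200),
  class = c("A", "A", "A", "B", "B", "B"),
  resp  = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Estimated propensity per class: weighted response rate within the class
df$p_hat <- ave(df$w * df$resp, df$class, FUN = sum) /
            ave(df$w,           df$class, FUN = sum)

# f_NR: 0 for nonrespondents, 1 / p_hat for respondents
df$f_NR <- ifelse(df$resp, 1 / df$p_hat, 0)
df$w_NR <- df$w * df$f_NR

# The adjusted respondent weights preserve each class's weight total
tapply(df$w_NR, df$class, sum) # Same as tapply(df$w, df$class, sum)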
Recommended Reading

See Chapter 2, Section 2.7.3 of "Applied Survey Data Analysis" for a statistical explanation of the weighting adjustments described in this vignette.

- Heeringa, S., West, B., Berglund, P. (2017). Applied Survey Data Analysis, 2nd edition. Boca Raton, FL: CRC Press.

Chapter 13 of "Practical Tools for Designing and Weighting Survey Samples" also provides an excellent overview of nonresponse adjustment methods.

- Valliant, R., Dever, J., Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples, 2nd edition. New York: Springer.

Calibrating to Estimated Control Totals

Sample-based Calibration: An Introduction

Calibration weighting adjustments such as post-stratification or raking are often helpful for reducing sampling variance or non-sampling errors such as nonresponse bias. Typically, the benchmark data used for these calibration adjustments are estimates published by agencies such as the United States Census Bureau. For example, pollsters in the United States frequently rake polling data so that estimates for variables such as age or educational attainment match benchmark estimates from the American Community Survey (ACS). While these benchmark data (also known as control totals) are often treated as "true" population values, they are usually estimates with their own sampling variance or margin of error. When we calibrate to estimated control totals rather than to "true" population values, we may need to account for the variance of the estimated control totals to ensure that calibrated estimates appropriately reflect the sampling error of both the primary survey of interest and the survey from which the control totals were estimated. This is especially important if the control totals have large margins of error.

A handful of statistical methods have been developed for the problem of conducting replication variance estimation after sample-based calibration; see Opsomer and Erciulescu (2021) for a clear overview of the literature on this topic. These methods all apply the calibration weighting adjustment to the full-sample weights as well as to each column of replicate weights. The key "trick" of these methods is to adjust each column of replicate weights to a slightly different set of control totals, where the variation in the control totals used across replicates is, in a sense, proportionate to the sampling variance of the control totals. The methods differ in the way they generate the different control totals for each column of replicate weights and in the type of data they require the analyst to use. The method of Fuller (1998) requires the analyst to have the variance-covariance matrix of the estimated control totals, while the method of Opsomer and Erciulescu (2021) requires the analyst to use the full dataset for the control survey along with its associated replicate weights.

Functions for Implementing Sample-Based Calibration

The 'svrep' package provides two functions to implement sample-based calibration. With the function calibrate_to_estimate(), adjustments to replicate weights are conducted using the method of Fuller (1998), which requires the variance-covariance matrix of the estimated control totals. With the function calibrate_to_sample(), adjustments to replicate weights are conducted using the method proposed by Opsomer and Erciulescu (2021), which requires a dataset of replicate weights to use for estimating the control totals and their sampling variance. For both functions, it is possible to use a variety of calibration options from the survey package's calibrate() function. For example, the user can specify a particular calibration function, such as calfun = survey::cal.linear to implement post-stratification or calfun = survey::cal.raking to implement raking. The bounds argument can be used to specify bounds for the calibration weights, and the arguments maxit and epsilon allow finer control over the Newton-Raphson algorithm used to implement calibration.

calibrate_to_estimate(
  rep_design = rep_design,
  estimate = vector_of_control_totals,
  vcov_estimate = variance_covariance_matrix_for_controls,
  cal_formula = ~ CALIBRATION_VARIABLE_1 + CALIBRATION_VARIABLE_2 + ...
)

calibrate_to_sample(
  primary_rep_design = primary_rep_design,
  control_rep_design = control_rep_design,
  cal_formula = ~ CALIBRATION_VARIABLE_1 + CALIBRATION_VARIABLE_2 + ...
)

An Example Using a Vaccination Survey

To illustrate the different methods of conducting sample-based calibration, we'll use an example survey measuring Covid-19 vaccination status and a handful of demographic variables, based on a simple random sample of 1,000 residents of Louisville, Kentucky. For the purpose of variance estimation, we'll create jackknife replicate weights. Because the survey's key outcome, vaccination status, is only measured for respondents, we'll do a quick nonresponse weighting adjustment to help us make reasonable estimates for this outcome. This gives us a replicate design for the primary survey, prepared for calibration. Now we need to obtain the benchmark data to use for the calibration. We'll use a Public-Use Microdata Sample (PUMS) dataset from the ACS as our source of benchmark data on race/ethnicity, sex, and educational attainment. Next, we'll prepare the PUMS data for use with replication variance estimation, using its provided replicate weights.
Before conducting the calibration, we need to make sure that the data from the control survey represent the same population as the primary survey. Since the Louisville vaccination survey represents only adults, we need to subset the control survey design to adults. In addition, we need to ensure that the calibration variables in the control survey design align with the corresponding variables in the primary survey design of interest. This may require some data manipulation.

# Load the data
library(svrep)
data("lou_vax_survey")

# Inspect the first few rows
head(lou_vax_survey) |> knitr::kable()

suppressPackageStartupMessages(
  library(survey)
)

lou_vax_survey_rep <- svydesign(
  data = lou_vax_survey,
  ids = ~ 1,
  weights = ~ SAMPLING_WEIGHT
) |>
  as.svrepdesign(type = "JK1", mse = TRUE)
#> Call: as.svrepdesign.default(svydesign(data = lou_vax_survey, ids = ~1,
#>     weights = ~SAMPLING_WEIGHT), type = "JK1", mse = TRUE)
#> Unstratified cluster jacknife (JK1) with 1000 replicates and MSE variances.

# Conduct nonresponse weighting adjustment
nr_adjusted_design <- lou_vax_survey_rep |>
  redistribute_weights(
    reduce_if = RESPONSE_STATUS == "Nonrespondent",
    increase_if = RESPONSE_STATUS == "Respondent"
  ) |>
  subset(RESPONSE_STATUS == "Respondent")

# Inspect the result of the adjustment
rbind(
  'Original' = summarize_rep_weights(lou_vax_survey_rep, type = 'overall'),
  'NR-adjusted' = summarize_rep_weights(nr_adjusted_design, type = 'overall')
)[,c("nrows", "rank", "avg_wgt_sum", "sd_wgt_sums")]
#>             nrows rank avg_wgt_sum  sd_wgt_sums
#> Original     1000 1000      596702 0.000000e+00
#> NR-adjusted   502  502      596702 8.219437e-11

data("lou_pums_microdata")

# Inspect some of the rows/columns of data ----
tail(lou_pums_microdata, n = 5) |>
  dplyr::select(AGE, SEX, RACE_ETHNICITY, EDUC_ATTAINMENT) |>
  knitr::kable()

# Convert to a survey design object ----
pums_rep_design <- svrepdesign(
  data = lou_pums_microdata,
  weights = ~ PWGTP,
  repweights = "PWGTP\\d{1,2}",
  type = "successive-difference",
  variables = ~ AGE + SEX + RACE_ETHNICITY + EDUC_ATTAINMENT,
  mse = TRUE
)

pums_rep_design
#> Call: svrepdesign.default(data = lou_pums_microdata, weights = ~PWGTP,
#>     repweights = "PWGTP\\d{1,2}", type = "successive-difference",
#>     variables = ~AGE + SEX + RACE_ETHNICITY + EDUC_ATTAINMENT,
#>     mse = TRUE)
#> with 80 replicates and MSE variances.
# Subset to only include adults
pums_rep_design <- pums_rep_design |> subset(AGE >= 18)

suppressPackageStartupMessages(
  library(dplyr)
)

# Check that variables match across data sources ----
pums_rep_design$variables |>
  dplyr::distinct(RACE_ETHNICITY)
#>                                             RACE_ETHNICITY
#> 1 Black or African American alone, not Hispanic or Latino
#> 2                     White alone, not Hispanic or Latino
#> 3                                       Hispanic or Latino
#> 4                      Other Race, not Hispanic or Latino

setdiff(lou_vax_survey_rep$variables$RACE_ETHNICITY,
        pums_rep_design$variables$RACE_ETHNICITY)
#> character(0)
setdiff(lou_vax_survey_rep$variables$SEX,
        pums_rep_design$variables$SEX)
#> character(0)
setdiff(lou_vax_survey_rep$variables$EDUC_ATTAINMENT,
        pums_rep_design$variables$EDUC_ATTAINMENT)
#> character(0)

# Estimates from the control survey (ACS)
svymean(
  design = pums_rep_design,
  x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT
)
#>                                                                           mean     SE
#> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 0.19950 0.0010
#> RACE_ETHNICITYHispanic or Latino                                       0.04525 0.0002
#> RACE_ETHNICITYOther Race, not Hispanic or Latino                       0.04631 0.0008
#> RACE_ETHNICITYWhite alone, not Hispanic or Latino                      0.70894 0.0007
#> SEXMale                                                                0.47543 0.0007
#> SEXFemale                                                              0.52457 0.0007
#> EDUC_ATTAINMENTHigh school or beyond                                   0.38736 0.0033
#> EDUC_ATTAINMENTLess than high school                                   0.61264 0.0033

# Estimates from the primary survey (Louisville vaccination survey)
svymean(
  design = nr_adjusted_design,
  x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT
)
#>                                                                            mean     SE
#> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 0.169323 0.0168
#> RACE_ETHNICITYHispanic or Latino                                       0.033865 0.0081
#> RACE_ETHNICITYOther Race, not Hispanic or Latino                       0.057769 0.0104
#> RACE_ETHNICITYWhite alone, not Hispanic or Latino                      0.739044 0.0196
#> SEXFemale                                                              0.535857 0.0223
#> SEXMale                                                                0.464143 0.0223
#> EDUC_ATTAINMENTHigh school or beyond                                   0.458167 0.0223
#> EDUC_ATTAINMENTLess than high school                                   0.541833 0.0223

Raking to estimated control totals

We'll start by raking to estimates from the ACS of race/ethnicity, sex, and educational attainment, first using the calibrate_to_estimate() method and then using the calibrate_to_sample() method. For the calibrate_to_estimate() method, we need to obtain a vector of point estimates for the control totals, along with the accompanying variance-covariance matrix of those estimates. Crucially, note that the entries of the vector of control totals have the same names as estimates that would be produced by using svytotal() with the primary survey design object whose weights we plan to adjust. To calibrate the design to these estimates, we supply the estimates and their variance-covariance matrix to calibrate_to_estimate(), and we supply to the cal_formula argument the same formula we would use for svytotal(). To use a raking adjustment, we specify calfun = survey::cal.raking.
Now we can compare the estimated totals for the calibration variables to the actual control totals. As we might intuitively expect, the estimated totals from the survey now match the control totals, and the standard errors of the estimated totals match the standard errors of the control totals. We can also see the effect of the raking adjustment on our primary estimate of interest, the overall Covid-19 vaccination rate: the raking adjustment reduced the estimated vaccination rate by about one percentage point and resulted in a similar standard error for the estimate.

Instead of raking with a vector of control totals and their variance-covariance matrix, we could instead have done the raking by simply supplying the two replicate design objects to the function calibrate_to_sample(). This uses the Opsomer-Erciulescu method of adjusting replicate weights, in contrast to calibrate_to_estimate(), which uses Fuller's method. We can see that the two methods yield identical point estimates from the full-sample weights, and their standard errors match nearly exactly for the calibration variables (race/ethnicity, sex, educational attainment). However, there are small but noticeable differences in the standard errors for other variables, such as VAX_STATUS, resulting from the fact that the two methods adjust the replicate weights differently. Opsomer and Erciulescu (2021) explain the differences between the two methods and discuss why the Opsomer-Erciulescu method used in calibrate_to_sample() may have better statistical properties than the Fuller method used in calibrate_to_estimate().

acs_control_totals <- svytotal(
  x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT,
  design = pums_rep_design
)

control_totals_for_raking <- list(
  'estimates' = coef(acs_control_totals),
  'variance-covariance' = vcov(acs_control_totals)
)

# Inspect point estimates
control_totals_for_raking$estimates
#> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 119041
#> RACE_ETHNICITYHispanic or Latino                                        27001
#> RACE_ETHNICITYOther Race, not Hispanic or Latino                        27633
#> RACE_ETHNICITYWhite alone, not Hispanic or Latino                      423027
#> SEXMale                                                                283688
#> SEXFemale                                                              313014
#> EDUC_ATTAINMENTHigh school or beyond                                   231136
#> EDUC_ATTAINMENTLess than high school                                   365566

# Inspect a few rows of the control totals' variance-covariance matrix
control_totals_for_raking$`variance-covariance`[5:8,5:8] |>
  `colnames<-`(NULL)
#>                                           [,1]       [,2]        [,3]       [,4]
#> SEXMale                              355572.45  -29522.95   129208.95   196840.6
#> SEXFemale                            -29522.95  379494.65    81455.95   268515.8
#> EDUC_ATTAINMENTHigh school or beyond 129208.95   81455.95  4019242.10 -3808577.2
#> EDUC_ATTAINMENTLess than high school 196840.55  268515.75 -3808577.20  4273933.5

svytotal(x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT,
         design = nr_adjusted_design)
#>                                                                         total      SE
#> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 101035 10003.0
#> RACE_ETHNICITYHispanic or Latino                                        20207  4824.4
#> RACE_ETHNICITYOther Race, not Hispanic or Latino                        34471  6222.7
#> RACE_ETHNICITYWhite alone, not Hispanic or Latino                      440989 11713.1
#> SEXFemale                                                              319747 13301.6
#> SEXMale                                                                276955 13301.6
#> EDUC_ATTAINMENTHigh school or beyond                                   273389 13289.2
#> EDUC_ATTAINMENTLess than high school                                   323313 13289.2

raked_design <- calibrate_to_estimate(
  rep_design = nr_adjusted_design,
  estimate = control_totals_for_raking$estimates,
  vcov_estimate = control_totals_for_raking$`variance-covariance`,
  cal_formula = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT,
  calfun = survey::cal.raking, # Required for raking
for raking epsilon = 1e-9 ) #> Selection of replicate columns whose control totals will be perturbed will be done at random. #> For tips on reproducible selection, see `help('calibrate_to_estimate')` # Estimated totals after calibration svytotal(x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT, design = raked_design) #> total #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 119041 #> RACE_ETHNICITYHispanic or Latino 27001 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 27633 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 423027 #> SEXFemale 313014 #> SEXMale 283688 #> EDUC_ATTAINMENTHigh school or beyond 231136 #> EDUC_ATTAINMENTLess than high school 365566 #> SE #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 633.63 #> RACE_ETHNICITYHispanic or Latino 107.98 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 472.41 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 594.14 #> SEXFemale 616.03 #> SEXMale 596.30 #> EDUC_ATTAINMENTHigh school or beyond 2004.80 #> EDUC_ATTAINMENTLess than high school 2067.35 # Matches the control totals! cbind( 'total' = control_totals_for_raking$estimates, 'SE' = control_totals_for_raking$`variance-covariance` |> diag() |> sqrt() ) #> total #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 119041 #> RACE_ETHNICITYHispanic or Latino 27001 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 27633 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 423027 #> SEXMale 283688 #> SEXFemale 313014 #> EDUC_ATTAINMENTHigh school or beyond 231136 #> EDUC_ATTAINMENTLess than high school 365566 #> SE #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 633.6287 #> RACE_ETHNICITYHispanic or Latino 107.9829 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 472.4107 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 594.1448 #> SEXMale 596.2990 #> SEXFemale 616.0314 #> EDUC_ATTAINMENTHigh school or beyond 2004.8048 #> EDUC_ATTAINMENTLess than high school 2067.3494 estimates_by_design <- svyby_repwts( rep_designs = list( \"NR-adjusted\" = nr_adjusted_design, \"Raked\" = raked_design ), FUN = svytotal, formula = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT ) t(estimates_by_design[,-1]) |> knitr::kable() raked_design_opsomer_erciulescu <- calibrate_to_sample( primary_rep_design = nr_adjusted_design, control_rep_design = pums_rep_design, cal_formula = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT, calfun = survey::cal.raking, epsilon = 1e-9 ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')` estimates_by_design <- svyby_repwts( rep_designs = list( \"calibrate_to_estimate()\" = raked_design, \"calibrate_to_sample()\" = raked_design_opsomer_erciulescu ), FUN = svytotal, formula = ~ VAX_STATUS + RACE_ETHNICITY + SEX + EDUC_ATTAINMENT ) t(estimates_by_design[,-1]) |> knitr::kable()"},{"path":"https://bschneidr.github.io/svrep/articles/sample-based-calibration.html","id":"post-stratification","dir":"Articles","previous_headings":"An Example Using a Vaccination Survey","what":"Post-stratification","title":"Calibrating to Estimated Control Totals","text":"primary difference post-stratification raking post-stratification essentially involves single calibration variable, population benchmarks provided value variable. Louisville vaccination survey, variable called POSTSTRATUM based combinations race/ethnicity, sex, educational attainment. 
To post-stratify the design, we can either supply the estimates and their variance-covariance matrix to calibrate_to_estimate(), or we can supply the two replicate design objects to calibrate_to_sample(). With either method, we need to supply the cal_formula argument with the same kind of formula we would use for svytotal(). To use a post-stratification adjustment (rather than raking), we specify calfun = survey::cal.linear. As with the raking example, we can see that the full-sample post-stratified estimates are exactly the same for the two methods. The standard errors for the post-stratification variables are essentially identical, while the standard errors for other variables differ slightly.","code":"# Create matching post-stratification variable in both datasets nr_adjusted_design <- nr_adjusted_design |> transform(POSTSTRATUM = interaction(RACE_ETHNICITY, SEX, EDUC_ATTAINMENT, sep = \"|\")) pums_rep_design <- pums_rep_design |> transform(POSTSTRATUM = interaction(RACE_ETHNICITY, SEX, EDUC_ATTAINMENT, sep = \"|\")) levels(pums_rep_design$variables$POSTSTRATUM) <- levels( nr_adjusted_design$variables$POSTSTRATUM ) # Estimate control totals acs_control_totals <- svytotal( x = ~ POSTSTRATUM, design = pums_rep_design ) poststratification_totals <- list( 'estimate' = coef(acs_control_totals), 'variance-covariance' = vcov(acs_control_totals) ) # Inspect the control totals poststratification_totals$estimate |> as.data.frame() |> `colnames<-`('estimate') |> knitr::kable() # Post-stratify the design using the estimates poststrat_design_fuller <- calibrate_to_estimate( rep_design = nr_adjusted_design, estimate = poststratification_totals$estimate, vcov_estimate = poststratification_totals$`variance-covariance`, cal_formula = ~ POSTSTRATUM, # Specify the post-stratification variable calfun = survey::cal.linear # This option is required for post-stratification ) #> Selection of replicate columns whose control totals will be perturbed will be done at random. #> For tips on reproducible selection, see `help('calibrate_to_estimate')` # Post-stratify the design using the two samples poststrat_design_opsomer_erciulescu <- calibrate_to_sample( primary_rep_design = nr_adjusted_design, control_rep_design = pums_rep_design, cal_formula = ~ POSTSTRATUM, # Specify the post-stratification variable calfun = survey::cal.linear # This option is required for post-stratification ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')` estimates_by_design <- svyby_repwts( rep_designs = list( \"calibrate_to_estimate()\" = poststrat_design_fuller, \"calibrate_to_sample()\" = poststrat_design_opsomer_erciulescu ), FUN = svymean, formula = ~ VAX_STATUS + RACE_ETHNICITY + SEX + EDUC_ATTAINMENT ) t(estimates_by_design[,-1]) |> knitr::kable()"},{"path":"https://bschneidr.github.io/svrep/articles/sample-based-calibration.html","id":"reproducibility","dir":"Articles","previous_headings":"","what":"Reproducibility","title":"Calibrating to Estimated Control Totals","text":"The calibration methods of calibrate_to_estimate() and calibrate_to_sample() involve at least one element of randomization: determining which columns of replicate weights are assigned to a given perturbation of the control totals. In calibrate_to_estimate(), which implements the method of Fuller (1998), when the control totals are a vector of dimension \(p\), then \(p\) columns of replicate weights are calibrated to \(p\) different vectors of perturbed control totals, formed using the \(p\) scaled eigenvectors from a spectral decomposition of the control totals’ variance-covariance matrix (sorted in order from the largest to smallest eigenvalues). To control which columns of replicate weights will be calibrated to each set of perturbed control totals, we can use the function argument col_selection.
The calibrated survey design object contains an element named perturbed_control_cols, which indicates the columns that were calibrated to the perturbed control totals; it can be useful to save this and use it as an input to col_selection to ensure reproducibility. For calibrate_to_sample(), matching is done between the columns of replicate weights in the primary survey and the columns of replicate weights in the control survey. This matching is done at random unless the user specifies otherwise using the argument control_col_matches. In the Louisville Vaccination Survey, the primary survey has 1,000 replicates while the control survey has 80 columns. We can specify how the 80 columns should be matched to the 1,000 replicates by supplying 1,000 values consisting of NA or integers between 1 and 80. The calibrated survey design object contains an element named control_column_matches, which indicates the control survey replicate to which each primary survey replicate column was matched.","code":"# Randomly select which columns will be assigned to each set of perturbed control totals dimension_of_control_totals <- length(poststratification_totals$estimate) columns_to_perturb <- sample(x = 1:ncol(nr_adjusted_design$repweights), size = dimension_of_control_totals) print(columns_to_perturb) #> [1] 339 307 843 526 478 908 577 874 563 557 929 34 816 39 349 776 # Perform the calibration poststratified_design <- calibrate_to_estimate( rep_design = nr_adjusted_design, estimate = poststratification_totals$estimate, vcov_estimate = poststratification_totals$`variance-covariance`, cal_formula = ~ POSTSTRATUM, calfun = survey::cal.linear, col_selection = columns_to_perturb # Specified for reproducibility ) poststratified_design$perturbed_control_cols #> NULL # Randomly match the primary replicates to control replicates set.seed(1999) column_matching <- rep(NA, times = ncol(nr_adjusted_design$repweights)) column_matching[sample(x = 1:1000, size = 80)] <- 1:80 str(column_matching) #> int [1:1000] NA NA NA 34 NA NA NA 68 NA NA ... # Perform the calibration poststratified_design <- calibrate_to_sample( primary_rep_design = nr_adjusted_design, control_rep_design = pums_rep_design, cal_formula = ~ POSTSTRATUM, calfun = survey::cal.linear, control_col_matches = column_matching ) str(poststratified_design$control_column_matches) #> int [1:1000] NA NA NA 34 NA NA NA 68 NA NA ..."},{"path":[]},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"two-phase-sampling-vs--multistage-sampling","dir":"Articles","previous_headings":"","what":"Two-phase Sampling vs. Multistage Sampling","title":"Replication Methods for Two-phase Sampling","text":"Two-phase sampling (also known as “double sampling”) is a common feature of surveys. In a two-phase sample, a large first-phase sample is selected, and then a smaller second-phase sample is selected from the first-phase sample. Multistage cluster sampling is a special case of two-phase sampling, in which the second-phase sample of secondary sampling units (SSUs) is selected from a first-phase sample of primary sampling units (PSUs). In this specific case of multistage sampling, the second-phase sampling of SSUs must sample at least one SSU within each PSU and must sample independently across PSUs (in other words, each PSU is treated as a stratum for second-phase sampling). Two-phase sampling in general has no such restrictions: the second-phase sample design can be arbitrary, and some primary sampling units might not appear in the second-phase sample.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"applications-of-two-phase-sampling","dir":"Articles","previous_headings":"","what":"Applications of Two-Phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"The flexibility of two-phase sampling can be quite valuable, and for this reason two-phase samples are commonly used in practice. We highlight two common applications of two-phase sampling here. First, any given survey conducted using an online panel is necessarily a two-phase sample, where the panel recruitment represents the first phase of the sampling process and requesting panelists to participate in a specific survey represents the second phase of sampling. Often, the recruitment sampling is quite complex (e.g., three-stage stratified cluster sampling), while the sampling of panelists for a given survey is conducted using simple random sampling or stratified simple random sampling from the list of panelists. Second, statistical agencies often reduce the cost of a small survey by drawing its sample from the respondents to a larger survey that’s already been conducted. For example, the U.S. Census Bureau conducts the National Survey of College Graduates (NSCG) by sampling households that responded to the American Community Survey (ACS). Similarly, the National Study of Caregiving (NSOC) is conducted by sampling respondents to the National Health and Aging Trends Study (NHATS). The information from the first-phase sample is useful for both the design and the analysis of the second-phase sample. From a design standpoint, information collected in the first-phase sample can be used to stratify units or to assign unequal sampling probabilities for the second-phase sampling, which can result in more precise estimates relative to using simple random sampling. From an analysis standpoint, information collected in the first-phase sample can also be used to improve estimators, by using raking, post-stratification, or generalized regression (GREG) to calibrate the small second-phase sample to the large first-phase sample.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"replicate-variance-estimation-with-the-svrep-package","dir":"Articles","previous_headings":"","what":"Replicate Variance Estimation with the ‘svrep’ Package","title":"Replication Methods for Two-phase Sampling","text":"In this vignette, we’ll show how to use the generalized bootstrap to estimate the sampling variances of estimates based on two-phase sample designs. While other types of replication such as the jackknife or balanced repeated replication (BRR) can theoretically be used, the ‘svrep’ package implements two-phase replication methods only for the generalized bootstrap and Fay’s generalized replication method. In theory, other replication methods can be used for two-phase samples, but their applicability is much more limited.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"overview-of-the-generalized-bootstrap","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Overview of the Generalized Bootstrap","title":"Replication Methods for Two-phase Sampling","text":"The basic idea of the generalized bootstrap is to “mimic” a target variance estimator for population totals, where the target variance estimator appropriate to a particular sampling design can be written as a quadratic form. For example, the generalized bootstrap can mimic the Horvitz-Thompson estimator or the usual variance estimator used for simple random sampling. To be precise, by “mimic”, we mean that the generalized bootstrap variance estimate for a population total on average exactly matches the variance estimate produced by the target variance estimator. In order to mimic a target variance estimator, we specify the target variance estimator for a population total \\(\\hat{Y}=\\sum_{i=1}^{n}(y_i/\\pi_i)\\) as a quadratic form. That is, we specify the variance estimator \\(v(\\hat{Y})\\) as \\(v(\\hat{Y})=\\sum_{i=1}^{n}\\sum_{j=1}^{n} \\sigma_{ij}(w_iy_i)(w_jy_j)\\), for some set of values \\(\\sigma_{ij}, \\ i,j \\in \\{1,\\dots,n\\}\\). In matrix notation, we write \\(v(\\hat{Y})=\\breve{y}^{\\prime}\\Sigma\\breve{y}\\), where \\(\\Sigma\\) is a symmetric, positive semi-definite matrix of dimension \\(n \\times n\\), with element \\(ij\\) equal to \\(\\sigma_{ij}\\), and \\(\\breve{y}\\) is the vector whose \\(i\\)-th element is \\(w_iy_i\\), as illustrated in the sketch below.
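As a concrete illustration, here is a minimal base-R sketch (the object names are ours, purely for illustration) showing that the usual SRSWOR variance estimator for a total is exactly such a quadratic form in the weighted values:

set.seed(1)
N <- 1000; n <- 50
y <- rnorm(n)                     # sample values
wts <- rep(N/n, n)                # sampling weights
y_breve <- wts * y                # vector of weighted values, w_i * y_i
f <- n/N
# Quadratic form matrix for SRS without replacement
Sigma <- (1 - f) * (n/(n - 1)) * (diag(n) - matrix(1/n, n, n))
quad_form_est <- as.numeric(t(y_breve) %*% Sigma %*% y_breve)
textbook_est <- N^2 * (1 - f) * var(y)/n
all.equal(quad_form_est, textbook_est)  # TRUE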
When using the generalized bootstrap, the difficult part of the variance estimation process is simply identifying the quadratic form. Once the quadratic form has been written down, it is easy to create replicate weights using the generalized bootstrap. Fortunately, the ‘svrep’ package can automatically identify the appropriate quadratic form to use for the variance estimators of many single-phase and two-phase sample designs. The user simply needs to supply the necessary data, describe the survey design, and select the target variance estimator to use for each phase of sampling. For a broad overview of the generalized survey bootstrap and its use in the ‘svrep’ package, the reader is encouraged to read the ‘svrep’ package vignette titled “Bootstrap Methods for Surveys”. For a thorough overview of the theory of the generalized survey bootstrap, Beaumont and Patak (2012) provide a clear introduction as well as several useful suggestions for implementation in practice. The present vignette simply describes how the application of the generalized bootstrap to two-phase samples can be implemented in the ‘svrep’ package.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"creating-example-data","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Creating Example Data","title":"Replication Methods for Two-phase Sampling","text":"For the example below, we create a two-phase survey design: the first phase is a stratified multistage sample, where the first-stage sample of PSUs is selected using unequal probability sampling without replacement (PPSWOR) and the second-stage sample is selected using simple random sampling without replacement (SRSWOR). The second-phase sample is a simple random sample selected without replacement from the first-phase sample. This type of design is fairly typical for a survey conducted using an online panel, where the panel recruitment uses a complex design while the sampling of panelists for a given survey uses simple random sampling of panelists. The particular dataset we’ll use comes from the Public Libraries Survey (PLS), an annual survey of public libraries in the U.S.; the data are from FY2020.","code":"data('library_multistage_sample', package = 'svrep') # Load first-phase sample twophase_sample <- library_multistage_sample # Select second-phase sample set.seed(2020) twophase_sample[['SECOND_PHASE_SELECTION']] <- sampling::srswor( n = 100, N = nrow(twophase_sample) ) |> as.logical()"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"describing-the-two-phase-survey-design","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Describing the Two-phase Survey Design","title":"Replication Methods for Two-phase Sampling","text":"Next, we use the ‘survey’ package’s function twophase() to describe the sample design at each phase, in terms of stratification, clustering, probabilities, and population sizes. Note the use of list() for the arguments: the first element of each list describes the first phase of sampling, while the second element of each list describes the second phase of sampling.","code":"# Declare survey design twophase_design <- twophase( method = \"full\", data = twophase_sample, # Identify the subset of first-phase elements # which were selected into the second-phase sample subset = ~ SECOND_PHASE_SELECTION, # Describe clusters, probabilities, and population sizes # at each phase of sampling id = list(~ PSU_ID + SSU_ID, ~ 1), probs = list(~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB, NULL), fpc = list(~ PSU_POP_SIZE + SSU_POP_SIZE, NULL) )"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"creating-generalized-bootstrap-replicates","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Creating Generalized Bootstrap Replicates","title":"Replication Methods for Two-phase Sampling","text":"With the two-phase design described, we can use the as_gen_boot_design() function to create generalized bootstrap replicate weights. This requires us to specify the desired number of replicates and the target variance estimator for each phase of sampling. Note that different target variance estimators may be used for each phase, since each phase might have a different design. The result is a replicate survey design object that can be used for estimation with the usual functions from the ‘survey’ and ‘srvyr’ packages. When using as_gen_boot_design() with two-phase designs, it’s useful to know that we will often see a warning message about needing to approximate the first-phase variance estimator’s quadratic form. As we can see from the output below, the function emitted such a warning message. The generalized bootstrap works by mimicking a variance estimator, and this requires that the variance estimator can be represented as a positive semidefinite quadratic form. For two-phase designs, however, it is often the case that the usual variance estimator cannot be represented exactly as a positive semidefinite quadratic form. In these cases, Beaumont and Patak (2012) suggest approximating the actual quadratic form matrix by a similar positive semidefinite matrix. This approximation in general should never lead to underestimation of the variance, and Beaumont and Patak (2012) argue that it should only produce a small overestimate of the variance in practice. Section 5 of this vignette provides more details on this approximation.","code":"# Obtain generalized bootstrap replicates # based on # - The phase 1 estimator is the usual variance estimator # for stratified multistage simple random sampling # - The phase 2 estimator is the usual variance estimator # for single-stage simple random sampling twophase_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( \"Phase 1\" = \"Stratified Multistage SRS\", \"Phase 2\" = \"Ultimate Cluster\" ), replicates = 1000 ) twophase_boot_design |> svymean(x = ~ LIBRARIA, na.rm = TRUE) #> mean SE #> LIBRARIA 7.6044 1.8419 twophase_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( \"Phase 1\" = \"Stratified Multistage SRS\", \"Phase 2\" = \"Ultimate Cluster\" ) ) #> Warning in as_gen_boot_design.twophase2(design = twophase_design, #> variance_estimator = list(`Phase 1` = \"Stratified Multistage SRS\", : The sample #> quadratic form matrix for this design and variance estimator is not positive #> semidefinite. It will be approximated by the nearest positive semidefinite #> matrix.
"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"create-replicates-using-fays-generalized-replication-method","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Create Replicates Using Fay’s Generalized Replication Method","title":"Replication Methods for Two-phase Sampling","text":"Instead of the generalized bootstrap, we can instead use Fay’s generalized replication method. The R code looks almost exactly the same as for the generalized bootstrap. The key difference from a programming standpoint is the use of the argument max_replicates to specify the maximum number of replicates that can be created. If the function determines that fewer than max_replicates are needed to obtain a fully-efficient variance estimator, then the actual number of replicates created will be less than max_replicates.","code":"twophase_genrep_design <- as_fays_gen_rep_design( design = twophase_design, variance_estimator = list( \"Phase 1\" = \"Stratified Multistage SRS\", \"Phase 2\" = \"Ultimate Cluster\" ), max_replicates = 500 ) #> Warning in as_fays_gen_rep_design.twophase2(design = twophase_design, #> variance_estimator = list(`Phase 1` = \"Stratified Multistage SRS\", : The sample #> quadratic form matrix for this design and variance estimator is not positive #> semidefinite. It will be approximated by the nearest positive semidefinite #> matrix."},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"calibrating-second-phase-weights-to-first-phase-estimates","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Calibrating Second-phase Weights to First-phase Estimates","title":"Replication Methods for Two-phase Sampling","text":"In two-phase sampling, it can be helpful to calibrate the weights of the small second-phase sample using estimates produced from the larger, more reliable first-phase sample. The main reason for this is to produce more precise estimates for the variables measured in the second-phase sample; the calibration is effective when the calibration variables are associated with the second-phase variables of interest. The calibration also has the nice property that it forces the second-phase estimates for the calibration variables to match the first-phase estimates, thus improving the consistency of the two sets of estimates. Calibrating the weights of the second-phase sample is straightforward and can be done using the usual software and methods. However, care is needed to ensure that the resulting variance estimates appropriately reflect the fact that we are calibrating to estimates rather than to known population values. This is fairly easy when replication methods are used for variance estimation, but it requires the use of the appropriate functions from the ‘svrep’ package. Section 4.3.1 of this memo discusses the theory of replicate variance estimation with two-phase calibration, based on the more detailed treatments of the topic in Fuller (1998) and Lohr (2022). The general process of using the ‘svrep’ package to calibrate a second-phase sample to first-phase estimates ensures that the replicate weights are adjusted appropriately for the purpose of variance estimation. There are two useful functions in the ‘svrep’ package for this purpose, which we present as “Option 1” and “Option 2” in the following overview.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"preliminaries","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package > Calibrating Second-phase Weights to First-phase Estimates","what":"Preliminaries","title":"Replication Methods for Two-phase Sampling","text":"Ensure the calibration variables have no missing values. First, we need to ensure that the variables we want to use for the calibration have no missing values in either the first-phase or the second-phase sample. Some imputation might be necessary for this.
(If we haven’t already) Create replicate weights for the second-phase sample. For the calibration, we need to create replicate weights for the second-phase sample which appropriately reflect the sampling variance of the entire two-phase design. We already did this earlier in this document, but we’ll repeat the code here.","code":"# Impute missing values (if necessary) twophase_sample <- twophase_sample |> mutate( TOTCIR = ifelse( is.na(TOTCIR), stats::weighted.mean(TOTCIR, na.rm = TRUE, w = 1/SAMPLING_PROB), TOTCIR ), TOTSTAFF = ifelse( is.na(TOTSTAFF), stats::weighted.mean(TOTSTAFF, na.rm = TRUE, w = 1/SAMPLING_PROB), TOTSTAFF ) ) # Describe the two-phase survey design twophase_design <- twophase( method = \"full\", data = twophase_sample, # Identify the subset of first-phase elements # which were selected into the second-phase sample subset = ~ SECOND_PHASE_SELECTION, # Describe clusters, probabilities, and population sizes # at each phase of sampling id = list(~ PSU_ID + SSU_ID, ~ 1), probs = list(~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB, NULL), fpc = list(~ PSU_POP_SIZE + SSU_POP_SIZE, NULL) ) # Create replicate weights for the second-phase sample # (meant to reflect variance of the entire two-phase design) twophase_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( \"Phase 1\" = \"Stratified Multistage SRS\", \"Phase 2\" = \"Ultimate Cluster\" ), replicates = 1000, mse = TRUE )"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"option-1-calibrate-to-a-set-of-estimates-and-their-variance-covariance-matrix","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package > Calibrating Second-phase Weights to First-phase Estimates","what":"Option 1: Calibrate to a set of estimates and their variance-covariance matrix","title":"Replication Methods for Two-phase Sampling","text":"With this approach, we use data from the first-phase sample to produce estimated totals, which we then use for calibrating the second-phase sample. To ensure that the calibration of the second-phase sample appropriately reflects the variance of the first-phase estimated totals, we also need to estimate the variance of the first-phase totals. There are many ways to estimate the first-phase variance, but for convenience we’ll use the generalized bootstrap. Once we’ve estimated the first-phase totals, we can use the function calibrate_to_estimate() to calibrate the two-phase survey design object to the first-phase totals. This function is discussed in more detail in the vignette titled “Sample-based Calibration”, and the underlying method is described in Fuller (1998). Let’s examine the results of the calibration. First, we’ll check that the calibrated second-phase estimates match the first-phase estimates. Next, we’ll inspect an estimate for a variable that wasn’t used in the calibration.","code":"# Extract a survey design object representing the first phase sample first_phase_design <- twophase_design$phase1$full # Create replicate weights for the first-phase sample first_phase_gen_boot <- as_gen_boot_design( design = first_phase_design, variance_estimator = \"Stratified Multistage SRS\", replicates = 1000 ) # Estimate first-phase totals and their sampling-covariance first_phase_estimates <- svytotal( x = ~ TOTCIR + TOTSTAFF, design = first_phase_gen_boot ) first_phase_totals <- coef(first_phase_estimates) first_phase_vcov <- vcov(first_phase_estimates) print(first_phase_totals) #> TOTCIR TOTSTAFF #> 1648795905.4 152846.6 print(first_phase_vcov) #> TOTCIR TOTSTAFF #> TOTCIR 6.606150e+16 5.853993e+12 #> TOTSTAFF 5.853993e+12 5.747174e+08 #> attr(,\"means\") #> [1] 1648121469.6 152702.4 calibrated_twophase_design <- calibrate_to_estimate( rep_design = twophase_boot_design, # Specify the variables in the data to use for calibration cal_formula = ~ TOTCIR + TOTSTAFF, # Supply the first-phase estimates and their variance estimate = first_phase_totals, vcov_estimate = first_phase_vcov, ) #> Selection of replicate columns whose control totals will be perturbed will be done at random. #> For tips on reproducible selection, see `help('calibrate_to_estimate')` # Display second-phase estimates for calibration variables svytotal( x = ~ TOTCIR + TOTSTAFF, design = calibrated_twophase_design ) #> total SE #> TOTCIR 1648795905 257024311 #> TOTSTAFF 152847 23973 # Display the original first-phase estimates (which are identical!) print(first_phase_estimates) #> total SE #> TOTCIR 1648795905 257024311 #> TOTSTAFF 152847 23973 # Inspect calibrated second-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = calibrated_twophase_design ) #> total SE #> LIBRARIA 57355 12308 # Compare to uncalibrated second-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = twophase_boot_design ) #> total SE #> LIBRARIA 54368 12039 # Compare to first-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = first_phase_gen_boot ) #> total SE #> LIBRARIA 55696 9171.3"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"option-2-calibrate-to-independently-generated-first-phase-replicates","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package > Calibrating Second-phase Weights to First-phase Estimates","what":"Option 2: Calibrate to independently-generated first-phase replicates","title":"Replication Methods for Two-phase Sampling","text":"If the data from the first-phase sample are available and replicate weights can be created for the first-phase sample, then there is an arguably better method available to handle the calibration. We can simply produce replicate estimates of the first-phase totals using each first-phase replicate, and then we can calibrate each second-phase replicate to one of the first-phase replicate totals. Below, we first create replicate weights for the first-phase design using the generalized bootstrap (though another replication method could be used). Once we’ve created the first-phase replicates, we can use the function calibrate_to_sample() to calibrate the two-phase survey design object to the replicate estimates created using the first-phase replicate design. This function is discussed in more detail in the vignette titled “Sample-based Calibration”. See Section 4.3.1 of this vignette for the underlying theory, which is based on Fuller (1998) and Opsomer and Erciulescu (2021).1 Let’s examine the results of the calibration. First, we’ll check that the calibrated second-phase estimates match the first-phase estimates.
As expected, the variance estimate for the calibrated second-phase estimate matches the variance estimate for the first-phase estimate, allowing for a small tolerance due to numeric differences. Next, we’ll inspect an estimate for a variable that wasn’t used in the calibration.","code":"# Extract a survey design object representing the first phase sample first_phase_design <- twophase_design$phase1$full # Create replicate weights for the first-phase sample first_phase_gen_boot <- as_gen_boot_design( design = first_phase_design, variance_estimator = \"Stratified Multistage SRS\", replicates = 1000 ) calibrated_twophase_design <- calibrate_to_sample( primary_rep_design = twophase_boot_design, # Supply the first-phase replicate design control_rep_design = first_phase_gen_boot, # Specify the variables in the data to use for calibration cal_formula = ~ TOTCIR + TOTSTAFF ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')` # Display second-phase estimates for calibration variables calibrated_ests <- svytotal( x = ~ TOTCIR + TOTSTAFF, design = calibrated_twophase_design ) print(calibrated_ests) #> total SE #> TOTCIR 1648795905 242527993 #> TOTSTAFF 152847 22856 # Display the original first-phase estimates (which are identical!) first_phase_ests <- svytotal( x = ~ TOTCIR + TOTSTAFF, design = first_phase_gen_boot ) print(first_phase_ests) #> total SE #> TOTCIR 1648795905 242515035 #> TOTSTAFF 152847 22854 ratio_of_variances <- vcov(calibrated_ests)/vcov(first_phase_ests) ratio_of_variances #> TOTCIR TOTSTAFF #> TOTCIR 1.0001069 0.9998445 #> TOTSTAFF 0.9998445 1.0002008 #> attr(,\"means\") #> TOTCIR TOTSTAFF #> 1648795905.4 152846.6 # Inspect calibrated second-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = calibrated_twophase_design ) #> total SE #> LIBRARIA 57355 11958 # Compare to uncalibrated second-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = twophase_boot_design ) #> total SE #> LIBRARIA 54368 12039 # Compare to first-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = first_phase_gen_boot ) #> total SE #> LIBRARIA 55696 8876.4"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"ratio-estimation","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Ratio Estimation","title":"Replication Methods for Two-phase Sampling","text":"A special case of calibration commonly used for two-phase samples is ratio estimation. Whether we use the function calibrate_to_sample() or calibrate_to_estimate(), the syntax is similar. Note that for ratio estimation, the calibration formula includes -1 to ensure that ratio estimation is used instead of regression estimation. This is similar to how, when fitting a regression model in R, we would use lm(y ~ -1 + x) to fit a linear model without an intercept. Specifying the parameter variance = 1 indicates that the working model used for the calibration is homoskedastic, so that the same adjustment factor is used for every case’s weights. This can be seen when we compare the adjusted weights to the unadjusted weights. Note that the adjustment factor for the weights is simply the ratio of the first-phase estimated total to the second-phase estimated total.","code":"ratio_calib_design <- calibrate_to_sample( primary_rep_design = twophase_boot_design, # Supply the first-phase replicate design control_rep_design = first_phase_gen_boot, # Specify the GREG formula. # For ratio estimation, we add `-1` to the formula # (i.e., we remove the intercept from the working model) # and specify only a single variable cal_formula = ~ -1 + TOTSTAFF, variance = 1 ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')` ratio_adjusted_weights <- weights(ratio_calib_design, type = \"sampling\") unadjusted_weights <- weights(twophase_boot_design, type = \"sampling\") adjustment_factors <- ratio_adjusted_weights/unadjusted_weights head(adjustment_factors) #> 1 3 5 7 10 13 #> 1.090189 1.090189 1.090189 1.090189 1.090189 1.090189 phase1_total <- svytotal( x = ~ TOTSTAFF, first_phase_design ) |> coef() phase2_total <- svytotal( x = ~ TOTSTAFF, twophase_boot_design ) |> coef() phase1_total/phase2_total #> TOTSTAFF #> 1.090189"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"design-based-estimators-for-two-phase-sampling","dir":"Articles","previous_headings":"","what":"Design-based Estimators for Two-phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"In the sections below, we first describe the double expansion estimator (DEE), which produces unbiased estimates for two-phase samples by using information about the sampling design at both phases. Next, we describe calibration estimators, which adjust the weights of the double-expansion estimator so that sampling variances can be reduced by using information from the first-phase sample. We’ll examine the theoretical sampling variance of each estimator as well as approaches to estimating the variance using replication methods. The interested reader is encouraged to consult chapter 9.3 of Särndal, Swensson, and Wretman (1992) or chapter 12 of Lohr (2022) for a more detailed discussion of two-phase sampling.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"notation","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling","what":"Notation","title":"Replication Methods for Two-phase Sampling","text":"We use the following notation to denote each sample and its size.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"notation-for-samples-and-sample-size","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > Notation","what":"Notation for Samples and Sample Size","title":"Replication Methods for Two-phase Sampling","text":"\\[ \\begin{aligned} s_a &: \\text{the set of units in the first-phase sample} \\\\ s_b &: \\text{the set of units in the second-phase sample} \\\\ & \\space \\space \\space \\text{Note that }s_b \\text{ is a subset of } s_a \\\\ n_a &: \\text{the number of units in }s_a \\\\ n_b &: \\text{the number of units in }s_b \\\\ \\end{aligned} \\]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"notation-for-probabilities-and-weights","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > Notation","what":"Notation for Probabilities and Weights","title":"Replication Methods for Two-phase Sampling","text":"We use the following notation to denote the inclusion probability for unit \\(i\\), at each phase: \\[ \\begin{aligned} \\pi^{(a)}_{i} &: \\text{the probability that unit }i\\text{ was included in } s_a \\\\ \\pi^{(b|s_a)}_{i} &: \\text{the conditional probability that unit }i\\text{ was included in } s_b, \\\\ & \\text{ given the realized first-phase sample }s_a \\\\ \\pi_i &: \\text{the } \\textbf{unconditional} \\text{ probability that unit }i\\text{ was included in }s_b \\\\ \\end{aligned} \\] In practice, the probability \\(\\pi_i\\) is prohibitively difficult to calculate, as it requires us to work out \\(\\pi^{(b|s_a)}_{i}\\) for every possible first-phase sample \\(s_a\\), not just the particular \\(s_a\\) that was actually selected. So instead, we define a useful quantity \\(\\pi^{*}_i\\), which depends only on the particular first-phase sample \\(s_a\\) that was actually selected.
\\[ \\pi_i^{*} := \\pi^{(b|s_a)}_{i} \\times \\pi^{(a)}_{i} \\] For variance estimation, it’s also necessary to consider the joint inclusion probabilities (sometimes referred to as “second order probabilities”), which are simply the probabilities that a pair of units \\(i\\) and \\(j\\) are both included in the sample. \\[ \\begin{aligned} \\pi^{(a)}_{ij} &: \\text{the probability that units }i\\text{ and } j \\text{ were both included in } s_a \\\\ \\pi^{(b|s_a)}_{ij} &: \\text{the conditional probability that units }i\\text{ and } j \\text{ were both included in } s_b, \\\\ & \\text{ given the realized first-phase sample }s_a \\\\ \\end{aligned} \\] We also define a quantity \\(\\pi^{*}_{ij}\\), similar to \\(\\pi^{*}_i\\). \\[ \\pi_{ij}^{*} := \\pi^{(b|s_a)}_{ij} \\times \\pi^{(a)}_{ij} \\] The probabilities \\(\\pi_{i}^{*}\\) are the values used to define the sampling weights for the survey. \\[ \\begin{aligned} w^{(a)}_i &:= 1/\\pi^{(a)}_i \\\\ w^{(b|s_a)}_i &:= 1/\\pi^{(b|s_a)}_{i} \\\\ w^{*}_i &:= 1/\\pi^{*}_i = w^{(b|s_a)}_i \\times w^{(a)}_i \\end{aligned} \\]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"the-double-expansion-estimator","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling","what":"The Double Expansion Estimator","title":"Replication Methods for Two-phase Sampling","text":"Suppose we wish to estimate the population total \\(Y\\), using the observed values \\(y_i\\) from the second-phase sample, \\(s_b\\). Särndal, Swensson, and Wretman (1992) show that we can produce an unbiased estimate of \\(Y\\) using only the second-phase sample \\(s_b\\), as follows: \\[ \\begin{aligned} \\hat{Y}^{(b)} &= \\sum_{i=1}^{n_{(b)}} w^{*}_i \\times y_i \\\\ &= \\sum_{i=1}^{n_{(b)}} w^{(b|s_a)}_i \\times w^{(a)}_i \\times y_i \\end{aligned} \\] This estimator is dubbed the “double expansion estimator”, using sampling jargon in which “expansion” refers to weighting a sample value \\(y_i\\), thereby “expanding” \\(y_i\\) from the sample to the population. The name “double expansion” is used because the weight \\(w^{*}_i\\) can be thought of as first using the weight \\(w^{(b|s_a)}_i\\) to “expand” the quantity \\(y_i\\), and then using the weight \\(w^{(a)}_i\\) to expand the quantity \\(w^{(b|s_a)}_i \\times y_i\\). A small numeric illustration is sketched below.","code":""},
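As a quick numeric illustration, here is a minimal sketch of the double expansion estimator with SRSWOR at both phases (the toy population and all object names are ours, purely for illustration):

set.seed(42)
N <- 1000                        # population size
y_pop <- rgamma(N, shape = 2)    # hypothetical study variable
s_a <- sample(N, 200)            # phase 1: SRSWOR of n_a = 200 units
s_b <- sample(s_a, 50)           # phase 2: SRSWOR of n_b = 50 units from s_a
w_a    <- N/200                  # first-phase weight, 1/pi^(a)
w_b_sa <- 200/50                 # conditional second-phase weight, 1/pi^(b|s_a)
w_star <- w_a * w_b_sa           # double expansion weight
Y_hat_b <- sum(w_star * y_pop[s_b])
c(estimate = Y_hat_b, true_total = sum(y_pop))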
{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"variance-of-the-double-expansion-estimator","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > The Double Expansion Estimator","what":"Variance of the Double Expansion Estimator","title":"Replication Methods for Two-phase Sampling","text":"The sampling variance of the double expansion estimator is the sum of two different components. \\[ \\begin{aligned} V\\left(\\hat{Y}^{(b)}\\right) &= V\\left(\\hat{Y}^{(a)}\\right)+E\\left(V\\left[\\hat{Y}^{(b)} \\mid s_a \\right]\\right) \\\\ \\\\ \\text{where: }& \\hat{Y}^{(a)} = \\sum_{i=1}^{n_{(a)}} w^{(a)}_i \\times y_i \\\\ \\text{and }& V\\left[\\hat{Y}^{(b)} \\mid s_a \\right] \\text{ is the variance of } \\hat{Y}^{(b)} \\\\ &\\text{ across the samples } s_b \\\\ &\\text{ that could be drawn from the given } s_a \\end{aligned} \\] The first component is the variance of the estimate \\(\\hat{Y}^{(a)}\\) we would obtain if we used the entire first-phase sample \\(s_a\\) for estimation, rather than using only the subset \\(s_b\\). The second component is the additional variance caused by using the subset \\(s_b\\) instead of \\(s_a\\). It is equal to the expected value (across different samples \\(s_a\\)) of the conditional variance of \\(\\hat{Y}^{(b)}\\) across samples \\(s_b\\) (where the conditioning is on the given first-phase sample \\(s_a\\)).","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"estimating-the-variance-of-the-double-expansion-estimator","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > The Double Expansion Estimator > Variance of the Double Expansion Estimator","what":"Estimating the Variance of the Double Expansion Estimator","title":"Replication Methods for Two-phase Sampling","text":"Both of the variance components can be estimated using the values \\(y_i\\) observed in \\(s_b\\). For the second component, we simply estimate \\(V\\left[\\hat{Y}^{(b)} \\mid s_a \\right]\\), which is an unbiased estimate of the expectation \\(E\\left(V\\left[\\hat{Y}^{(b)} \\mid s_a \\right]\\right)\\). Thus, a variance estimate for the double expansion estimator takes the following form: \\[ \\hat{V}\\left(\\hat{Y}^{(b)}\\right) = \\hat{V}\\left[\\hat{Y}^{(a)} \\right] + \\hat{V}\\left[\\hat{Y}^{(b)} \\mid s_a \\right] \\]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"estimating-the-second-phase-variance-component","dir":"Articles","previous_headings":"","what":"Replication Methods for Two-phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"For estimating \\(\\hat{V}\\left[\\hat{Y}^{(b)} \\mid s_a \\right]\\), we simply choose a variance estimator appropriate for the second-phase design, taking the first-phase sample as given. We assume that this variance estimator can be written as a quadratic form. \\[ \\begin{aligned} \\hat{V}\\left[\\hat{Y}^{(b)} \\mid s_a \\right] &= \\sum_{i=1}^{n_b} \\sum_{j=1}^{n_b} \\sigma^{(b)}_{ij} (w^{*}_i y_i) (w^{*}_j y_j) \\\\ \\end{aligned} \\] For the Horvitz-Thompson estimator, for instance, we use \\(\\sigma^{(b)}_{ij}=\\left(1 - \\frac{\\pi^{b|s_a}_i\\pi^{b|s_a}_j}{\\pi^{b|s_a}_{ij}}\\right)\\). The quadratic form can also be written in matrix notation: \\[ \\begin{aligned} \\hat{V}\\left[\\hat{Y}^{(b)} \\mid s_a \\right] &= {(W^{*} y)}^{\\prime} \\Sigma_b {(W^{*} y)} \\\\ \\text{where }& \\Sigma_b \\text{ is an } n_b \\times n_b \\text{ symmetric matrix} \\\\ & \\text{ with entry } ij \\text{ equal to } \\sigma^{(b)}_{ij} \\\\ \\text{and } & W^{*} \\text{ is an } n_b \\times n_b \\text{ diagonal matrix} \\\\ & \\text{ with entry } ii \\text{ equal to } w^{*}_i \\\\ & y \\text{ is the } n_b \\times 1 \\text{ vector of values of} \\\\ & \\text{the variable of interest} \\end{aligned} \\]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"estimating-the-first-phase-variance-component","dir":"Articles","previous_headings":"","what":"Replication Methods for Two-phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"Estimating the first variance component, \\(V\\left(\\hat{Y}^{(a)}\\right)\\), is slightly trickier. First, we need to choose a variance estimator that would be appropriate for the first-phase design, if we could use the values \\(y_i\\) observed for the entire sample \\(s_a\\). We’ll denote this variance estimator as \\(\\tilde{V}\\left[\\hat{Y}^{(a)}\\right]\\). \\[ \\begin{aligned} \\tilde{V}\\left[\\hat{Y}^{(a)} \\right] &= \\sum_{i=1}^{n_a} \\sum_{j=1}^{n_a} \\sigma^{(a)}_{ij} (w^{(a)}_i y_i) (w^{(a)}_j y_j) \\\\ \\end{aligned} \\] In matrix notation, we can write: \\[ \\begin{aligned} \\tilde{V}\\left[\\hat{Y}^{(a)} \\right] &= {(W^{(a)} y)}^{\\prime} (\\Sigma_{a} ) {(W^{(a)} y)} \\\\ \\text{where }& \\Sigma_{a} \\text{ is an } n_a \\times n_a \\text{ symmetric matrix} \\\\ & \\text{ with entry } ij \\text{ equal to } \\sigma_{ij} \\\\ \\text{and } & W^{(a)} \\text{ is an } n_a \\times n_a \\text{ diagonal matrix} \\\\ & \\text{ with entry } ii \\text{ equal to } w^{(a)}_i \\end{aligned} \\] However, since we’re only working with the subsample \\(s_b\\) instead of \\(s_a\\), we need to estimate \\(\\tilde{V}\\left[\\hat{Y}^{(a)} \\right]\\) using only the data in \\(s_b\\). We can use the second-phase joint inclusion probabilities \\(\\pi^{(b \\mid s_a)}_{ij}\\) to produce an unbiased estimate of \\(\\tilde{V}\\left[\\hat{Y}^{(a)} \\right]\\) using only the data from \\(s_b\\). \\[ \\begin{aligned} \\hat{V}\\left[\\hat{Y}^{(a)} \\right] &= \\sum_{i=1}^{n_b} \\sum_{j=1}^{n_b} \\frac{1}{\\pi^{(b \\mid s_a)}_{ij}} \\sigma^{(a)}_{ij} (w^{(a)}_i y_i) (w^{(a)}_j y_j) \\\\ \\end{aligned} \\] We can also write this in matrix notation: \\[ \\begin{aligned} \\hat{V}\\left[\\hat{Y}^{(a)} \\right] &= {(W^{(a)} y)}^{\\prime} (\\Sigma_{a^{\\prime}} \\circ D_b ) {(W^{(a)} y)} \\\\ \\text{where }& \\Sigma_{a^{\\prime}} \\text{ is an } n_b \\times n_b \\text{ symmetric matrix} \\\\ & \\text{ with entry } ij \\text{ equal to } \\sigma_{ij} \\\\ \\text{and } & W^{(a)} \\text{ is an } n_b \\times n_b \\text{ diagonal matrix} \\\\ & \\text{ with entry } ii \\text{ equal to } w^{(a)}_i \\\\ \\text{and }& D_b \\text{ is an } n_b \\times n_b \\text{ symmetric matrix} \\\\ & \\text{ with entry } ij \\text{ equal to } \\frac{1}{\\pi^{(b \\mid s_a)}_{ij}}\\\\ \\end{aligned} \\] As a sidenote, the matrix \\(D_b\\) is the likely source of the warning messages you’ll see about a two-phase variance estimator not being positive semidefinite.2 A small sketch of constructing \\(D_b\\) appears below.","code":""},
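To make the matrix \\(D_b\\) concrete, here is a small base-R sketch that constructs it for an SRSWOR second phase drawing \\(n_b = 5\\) units from \\(n_a = 50\\) (the same joint probabilities used in the code example later in this section):

n_a <- 50; n_b <- 5
# SRSWOR joint inclusion probabilities: (n_b/n_a) * (n_b - 1)/(n_a - 1) off-diagonal
pi_joint <- matrix((n_b/n_a) * ((n_b - 1)/(n_a - 1)), nrow = n_b, ncol = n_b)
diag(pi_joint) <- n_b/n_a        # the diagonal holds the first-order probabilities
D_b <- 1/pi_joint                # entry ij equals 1/pi^(b|s_a)_ij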
{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"combining-the-two-estimated-variance-components","dir":"Articles","previous_headings":"","what":"Replication Methods for Two-phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"Putting the two estimated variance components together, we thus obtain the following unbiased variance estimator for the double expansion estimator. \\[ \\begin{aligned} \\hat{V}\\left(\\hat{Y}^{(b)}\\right) &= \\hat{V}\\left(\\hat{Y}^{(a)}\\right)+\\hat{V}\\left[\\hat{Y}^{(b)} \\mid s_a \\right] \\\\ &= \\sum_{i=1}^{n_b} \\sum_{j=1}^{n_b} \\frac{1}{\\pi^{(b \\mid s_a)}_{ij}} \\sigma^{(a)}_{ij} (w^{(a)}_i y_i) (w^{(a)}_j y_j) \\\\ &+ \\sum_{i=1}^{n_b} \\sum_{j=1}^{n_b} \\sigma^{(b)}_{ij} (w^{*}_i y_i) (w^{*}_j y_j) \\\\ \\end{aligned} \\] In matrix notation, we can write this as follows: \\[ \\begin{aligned} \\hat{V}\\left(\\hat{Y}^{(b)}\\right) &= \\hat{V}\\left(\\hat{Y}^{(a)}\\right)+\\hat{V}\\left[\\hat{Y}^{(b)} \\mid s_a \\right] \\\\ &= {(W^{(a)} y)}^{\\prime} (\\Sigma_{a^{\\prime}} \\circ D_b ) {(W^{(a)} y)} \\\\ &+ {(W^{*} y)}^{\\prime} \\Sigma_b {(W^{*} y)} \\\\ \\end{aligned} \\] Because the quadratic forms are additive and \\(W^{*}=W^{(a)}W^{(b \\mid s_a)}\\), we can compactly write the estimator as follows: \\[ \\begin{aligned} \\hat{V}\\left(\\hat{Y}^{(b)}\\right) &= (W^{*}y)^{\\prime} \\Sigma_{ab} (W^{*}y) \\\\ \\text{where } & \\\\ \\Sigma_{ab} &= {W^{(b)}}^{-1} (\\Sigma_{a^{\\prime}} \\circ D_b ) {W^{(b)}}^{-1} + \\Sigma_b \\\\ \\text{and } & W^{(b)} \\text{ is an } n_b \\times n_b \\text{ diagonal matrix} \\\\ & \\text{ with entry } ii \\text{ equal to } w^{(b \\mid s_a)}_i \\end{aligned} \\] In the ‘svrep’ package, \\(\\Sigma_{ab}\\) can be constructed from the inputs \\(\\Sigma_{a^{\\prime}}\\), \\(\\Sigma_b\\), and \\((1/D_b)\\), using the function make_twophase_quad_form(). This matrix notation is useful for understanding replication methods of variance estimation for two-phase samples. An unbiased replication variance estimator for two-phase samples should generate sets of adjustment factors for the replicate weights whose expectation is \\(\\mathbf{1}_{n_b}\\) and whose variance-covariance matrix is \\(\\boldsymbol{\\Sigma}_{ab}\\). The generalized bootstrap does this by generating draws from a multivariate normal distribution with those parameters, as sketched below. For specific combinations of simple first-phase and second-phase designs, jackknife and BRR methods have been developed to accomplish this goal (see Lohr (2022) for examples).
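Here is a minimal sketch of that multivariate normal draw, using a small stand-in matrix for \\(\\Sigma_{ab}\\) (all names here are illustrative, not from ‘svrep’):

set.seed(123)
n_b <- 5
Sigma_ab <- diag(n_b) * 0.5        # stand-in quadratic form matrix
R <- 10000                         # number of bootstrap replicates
U <- chol(Sigma_ab)                # Sigma_ab = t(U) %*% U
# Adjustment factors with expectation 1 and covariance Sigma_ab
adj_factors <- 1 + matrix(rnorm(R * n_b), R, n_b) %*% U
round(colMeans(adj_factors), 2)    # each approximately 1
round(cov(adj_factors), 2)         # approximately Sigma_ab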
The generalized bootstrap, however, is much easier to use for the complex designs actually encountered in applied settings, and it also enjoys some other advantages.3","code":"set.seed(2022) y <- rnorm(n = 100) # Select first phase sample, SRS without replacement phase_1_sample_indicators <- sampling::srswor(n = 50, N = 100) |> as.logical() phase_1_sample <- y[phase_1_sample_indicators] # Make variance estimator for first-phase variance component Sigma_a <- make_quad_form_matrix( variance_estimator = \"Ultimate Cluster\", cluster_ids = as.matrix(1:50), strata_ids = rep(1, times = 50) |> as.matrix(), strata_pop_sizes = rep(100, times = 50) |> as.matrix() ) # Select second stage sample, SRS without replacement phase_2_sample_indicators <- sampling::srswor(n = 5, N = 50) |> as.logical() phase_2_sample <- phase_1_sample[phase_2_sample_indicators] # Estimate two-phase variance Sigma_a_prime <- Sigma_a[phase_2_sample_indicators, phase_2_sample_indicators] phase_2_joint_probs <- outer(rep(5/50, times = 5), rep(4/49, times = 5)) diag(phase_2_joint_probs) <- rep(5/50, times = 5) Sigma_b <- make_quad_form_matrix( variance_estimator = \"Ultimate Cluster\", cluster_ids = as.matrix(1:5), strata_ids = rep(1, times = 5) |> as.matrix(), strata_pop_sizes = rep(50, times = 5) |> as.matrix() ) sigma_ab <- make_twophase_quad_form( sigma_1 = Sigma_a_prime, sigma_2 = Sigma_b, phase_2_joint_probs = phase_2_joint_probs ) wts <- rep( (50/100)^(-1) * (5/50)^(-1), times = 5 ) W_star <- diag(wts) W_star_y <- W_star %*% phase_2_sample t(W_star_y) %*% sigma_ab %*% (W_star_y) #> 1 x 1 Matrix of class \"dgeMatrix\" #> [,1] #> [1,] 2182.221 # Since both phases are SRS without replacement, # variance estimate for a total should be similar to the following 5 * var(W_star_y) #> [,1] #> [1,] 2297.075"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"calibration-estimators","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling","what":"Calibration Estimators","title":"Replication Methods for Two-phase Sampling","text":"This section describes the calibration estimators (raking, post-stratification, and ratio estimators) commonly used for two-phase designs. For a more detailed treatment of these estimators, see Chapter 11 of Lohr (2022) or Chapter 6 of Särndal, Swensson, and Wretman (1992). In two-phase sampling, it can be helpful to calibrate the weights of the small second-phase sample \\(s_b\\) so that estimates for variables \\(x_1, \\dots, x_p\\) measured in both phases match the estimates produced using the larger, more reliable sample \\(s_a\\). For a variable \\(y\\) measured only in the second-phase sample, this can lead to more precise estimates when the calibration variables \\(x_1, \\dots, x_p\\) are associated with \\(y\\). When generalized regression (GREG) is used, the two-phase GREG estimator can be written as follows: \\[ \\hat{Y}^{(b)}_{\\text{GREG}} = \\hat{Y}^{(b)} + \\left(\\hat{\\mathbf{X}}^{(a)} - \\hat{\\mathbf{X}}^{(b)}\\right)\\hat{\\mathbf{B}}^{(b)} \\] where \\(\\hat{\\mathbf{X}}^{(a)}\\) is the \\(p\\)-length vector of estimated population totals for the variables \\(x_1, \\dots, x_p\\) estimated using the first-phase data, \\(\\hat{\\mathbf{X}}^{(b)}\\) is the vector of estimated population totals using the second-phase data, and \\(\\hat{\\mathbf{B}}^{(b)}\\) is estimated using the following: \\[ \\hat{\\mathbf{B}}^{(b)} = \\left(\\sum_{i=1}^{n_{(b)}} w^{*}_i \\frac{1}{\\sigma_i^2} \\mathbf{x}_i \\mathbf{x}_i^T\\right)^{-1} \\sum_{i=1}^{n_{(b)}} w^{*}_i \\frac{1}{\\sigma_i^2} \\mathbf{x}_i y_i \\] where the constants \\(\\sigma_i\\) are chosen based on the specific type of calibration desired.4 The GREG estimator can also be expressed as a weighted estimator based on modified weights \\(\\tilde{w}^{*}_i := g_i w^{*}_i\\), with the modification factor \\(g_i\\) suitably chosen for the specific method of calibration used (post-stratification, raking, etc.) \\[ \\begin{aligned} \\hat{Y}^{(b)}_{\\text{GREG}} &= \\sum_{i=1}^{n_{(b)}} \\tilde{w}^{*}_i y_i = \\sum_{i=1}^{n_{(b)}} (g_i w^{*}_i) y_i \\end{aligned} \\] The modification factors \\(g_i\\) (commonly referred to as “g-weights”) can be expressed as: \\[ g_i = 1+ \\left(\\hat{\\mathbf{X}}^{(a)} - \\hat{\\mathbf{X}}^{(b)}\\right)^{\\prime} \\left(\\sum_{i=1}^{n_{(b)}} w^{*}_i \\frac{1}{\\sigma_i^2} \\mathbf{x}_i \\mathbf{x}_i^T\\right)^{-1} \\frac{1}{\\sigma_i^2} \\mathbf{x}_i \\] The calibrated second-phase weights \\(\\tilde{w}^{*}_i = g_i w^{*}_i\\) of the GREG estimator ensure that the second-phase estimates for the variables \\(x_1, \\dots, x_p\\) match the first-phase estimates, as in the sketch below. \\[ \\sum_{i=1}^{n_{(b)}} \\tilde{w}^{*}_i x_i = \\sum_{i=1}^{n_{(a)}} w^{(a)}_i x_i \\]","code":""},
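Here is a minimal sketch of computing the g-weights directly from this formula with toy data (all names are illustrative; in practice the calibration functions in ‘survey’ and ‘svrep’ handle this computation):

set.seed(7)
n_b <- 50
x <- cbind(1, runif(n_b))            # calibration variables (with intercept)
w_star <- rep(20, n_b)               # double expansion weights
sigma2 <- rep(1, n_b)                # homoskedastic working model
X_hat_b <- colSums(w_star * x)       # second-phase estimated totals
X_hat_a <- X_hat_b * c(1.02, 0.98)   # stand-in first-phase estimated totals
T_mat <- t(x) %*% ((w_star/sigma2) * x)  # sum of w* x x' / sigma^2
g <- 1 + as.vector((x/sigma2) %*% solve(T_mat, X_hat_a - X_hat_b))
w_tilde <- g * w_star
# The calibrated weights reproduce the first-phase totals:
round(colSums(w_tilde * x) - X_hat_a, 8)  # c(0, 0)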
{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"variance-of-the-calibration-estimator","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > Calibration Estimators","what":"Variance of the Calibration Estimator","title":"Replication Methods for Two-phase Sampling","text":"If we assume that the second-phase calibration estimator \\(\\hat{Y}_{\\mathrm{GREG}}^{(b)}\\) is unbiased for the first-phase estimate \\(\\hat{Y}^{(a)}\\) (or that this is at least approximately the case), then we can decompose the calibration estimator’s variance into a first-phase component and a second-phase component as follows: \\[ \\begin{aligned} V\\left(\\hat{Y}_{\\mathrm{GREG}}^{(b)}\\right) &= V\\left[E\\left(\\hat{Y}_{\\mathrm{GREG}}^{(b)} \\mid \\mathbf{Z}\\right)\\right]+E\\left[V\\left(\\hat{Y}_{\\mathrm{GREG}}^{(b)} \\mid \\mathbf{Z}\\right)\\right] \\\\ &= V\\left[\\hat{Y}^{(a)}\\right]+E\\left[V\\left(\\hat{Y}_{\\mathrm{GREG}}^{(b)} \\mid \\mathbf{Z}\\right)\\right] \\end{aligned} \\] where the first term is the first-phase variance component and the second term is the second-phase variance component. Using the second-phase sample, the variance of the calibration estimator can thus be estimated unbiasedly with the following estimator: \\[ V\\left(\\hat{Y}_{\\mathrm{GREG}}^{(b)}\\right) =\\hat{V}\\left[\\hat{Y}^{(a)}\\right] + \\hat{V}\\left[\\hat{E}^{(b)} \\mid s_a\\right] \\] where \\(\\hat{E}^{(b)} = \\sum_{i=1}^{n_{(b)}} w^{*}_i e_i\\) and \\(e_i= y_i - \\mathbf{x}^{\\prime}_i\\hat{\\mathbf{B}}^{(b)}\\) is the “residual” from the GREG model. This is the same variance estimator we saw earlier for the uncalibrated estimator, \\(\\hat{Y}^{(b)}\\), except that the second-phase component for the GREG estimator uses \\(\\hat{E}^{(b)}\\) in place of \\(\\hat{Y}^{(b)}\\): \\[ \\hat{V}\\left(\\hat{Y}^{(b)}\\right) = \\hat{V}\\left[\\hat{Y}^{(a)} \\right] + \\hat{V}\\left[\\hat{Y}^{(b)} \\mid s_a \\right] \\] This decomposition is useful for understanding how the theoretical variance of the calibration estimator can be estimated in general.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"replication-variance-estimation","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > Calibration Estimators","what":"Replication Variance Estimation","title":"Replication Methods for Two-phase Sampling","text":"For variance estimation using replication methods, another (approximate) decomposition proves useful. Fuller (1998) decomposes the two-phase calibration estimator’s variance as follows. \\[ V\\left(\\hat{Y}_{\\mathrm{GREG}}^{(b)}\\right) \\approx E \\left[ V \\left( \\tilde{E}^{(b)} \\mid s_a \\right) \\right] + \\mathbf{B}^{\\prime} \\mathbf{V}\\left(\\hat{\\mathbf{X}}^{(a)}\\right)\\mathbf{B} \\] where \\(\\mathbf{B}\\) is the finite-population version of \\(\\hat{\\mathbf{B}}^{(b)}\\) that we would calculate if we had data for the entire population rather than just the second-phase sample \\(s_b\\), and \\(\\tilde{E}^{(b)}=\\sum_{i=1}^{n_{(b)}} w^{*}_i\\left(y_i - \\mathbf{x}_i^{\\prime}\\mathbf{B}\\right)\\) is the weighted sum of second-phase residuals based on using \\(\\mathbf{B}\\). This decomposition of the variance suggests the following estimator: \\[ \\hat{V}\\left(\\hat{Y}_{\\mathrm{GREG}}^{(b)}\\right) := \\hat{V} \\left( \\hat{E}^{(b)} \\mid s_a \\right) + (\\hat{\\mathbf{B}}^{(b)})^{\\prime} \\hat{\\mathbf{V}}\\left(\\hat{\\mathbf{X}}^{(a)}\\right)(\\hat{\\mathbf{B}}^{(b)}) \\] The first component is estimated using the second-phase data and a conditional variance estimator for the second-phase design (taking the selected first-phase sample as given). The second component depends on the first-phase estimates \\(\\hat{\\mathbf{X}}^{(a)}\\) as well as the first-phase variance estimate \\(\\hat{V}(\\hat{\\mathbf{X}}^{(a)})\\) and the values \\(\\hat{\\mathbf{B}}^{(b)}\\) used in the calibration. Fuller (1998) proposed a replication-based version of this estimator. To describe that estimator, first suppose that we have developed two-phase replicate weights appropriate for the double-expansion estimator. \\[ \\begin{aligned} \\hat{V}\\left(\\hat{Y}^{(b)}\\right) &= K_{(b)}\\sum_{r=1}^{R_{(b)}} \\left( \\hat{Y}^{(b)}_{(r)} - \\hat{Y}^{(b)} \\right)^2 \\\\ \\text{where }& \\hat{Y}^{(b)}_{(r)}= \\sum_{i=1}^{n_{(b)}}w_{r,i} y_i \\\\ & \\text{is the }r\\text{-th} \\text{ replicate estimate from} \\\\ & \\text{the second-phase sample } \\\\ \\text{and }& K_{(b)}\\text{ is a constant specific to the} \\\\ &\\text{replication method} \\end{aligned} \\] Now suppose that a \\(k\\)-length vector of estimated first-phase totals, \\(\\hat{\\mathbf{X}}^{(a)}\\), will be used for calibrating the second-phase weights. And suppose that the estimated totals also have an estimated variance-covariance matrix, denoted \\(\\hat{\\mathbf{V}}\\left(\\hat{\\mathbf{X}}^{(a)}\\right)\\), which is a \\(k \\times k\\) matrix. We can decompose this variance-covariance matrix as follows: \\[ \\hat{\\mathbf{V}}\\left(\\hat{\\mathbf{X}}^{(a)}\\right) = K_{(b)} \\sum_{i=1}^{R_{(b)}} \\boldsymbol{\\delta}_i^{\\prime} \\boldsymbol{\\delta}_i \\] where each \\(\\boldsymbol{\\delta}_i\\) is a vector of dimension \\(k\\), and \\(K_{(b)}\\) is the constant mentioned earlier. There are multiple ways to form this decomposition. Two particularly useful methods are to either use the eigendecomposition, as suggested by Fuller (1998), or to instead use the replicate estimates from the first-phase survey, as suggested by Opsomer and Erciulescu (2021). A small sketch of the eigendecomposition approach appears below.
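Here is a minimal base-R sketch of the eigendecomposition approach, with a toy variance-covariance matrix for two control totals (the constant \\(K_{(b)}\\) is just an assumed value here):

V <- matrix(c(4, 1, 1, 2), nrow = 2)   # toy vcov of two estimated control totals
K <- 1/10                              # assumed replication constant
eig <- eigen(V, symmetric = TRUE)
# One perturbation vector per eigenvalue: delta_r = sqrt(lambda_r/K) * eigenvector_r
delta <- sweep(eig$vectors, 2, sqrt(eig$values/K), `*`)
K * tcrossprod(delta)                  # recovers V exactly
# Each column of `delta` perturbs the control totals for one replicate column.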
Fuller demonstrates that we can obtain a reasonable variance estimator for the two-phase calibration estimator by using these \\(R_{(b)}\\) vectors \\(\\boldsymbol{\\delta}_{r}\\) to form \\(R_{(b)}\\) different sets of control totals, to use as calibration targets for the \\(R_{(b)}\\) second-phase replicates. In other words, we simply calibrate the \\(r\\)-th set of replicate weights to the \\(r\\)-th set of control totals \\(\\hat{\\mathbf{X}}^{(a)} + \\boldsymbol{\\delta}_{r}\\). Crucially, the order of the vectors \\(\\boldsymbol{\\delta}_{r}\\) should be totally random, so that the vectors \\(\\boldsymbol{\\delta}_{r}\\) are independent of the sets of replicate weights \\(\\mathbf{w}_{r}\\). Fuller (1998) shows that calibrating the second-phase replicates to random calibration targets in this way results in a variance estimator that is consistent for the variance of the two-phase calibration estimator. This is the estimator underlying the R code shown earlier in this vignette that uses the functions calibrate_to_estimate() and calibrate_to_sample(). The essential difference between the two functions is how they form the vectors \\(\\boldsymbol{\\delta}_r\\). The function calibrate_to_estimate() forms the vectors \\(\\boldsymbol{\\delta}_{r}\\) using the eigendecomposition of a specified variance-covariance matrix. In contrast, the function calibrate_to_sample() forms the vectors \\(\\boldsymbol{\\delta}_{r}\\) using the replicate estimates from the first-phase sample.","code":"# Print first phase estimates and their variance-covariance print(first_phase_totals) #> TOTCIR TOTSTAFF #> 1648795905.4 152846.6 print(first_phase_vcov) #> TOTCIR TOTSTAFF #> TOTCIR 6.606150e+16 5.853993e+12 #> TOTSTAFF 5.853993e+12 5.747174e+08 #> attr(,\"means\") #> [1] 1648121469.6 152702.4 # Calibrate the two-phase replicate design # to the totals estimated from the first-phase sample calibrated_twophase_design <- calibrate_to_estimate( rep_design = twophase_boot_design, # Specify the variables in the data to use for calibration cal_formula = ~ TOTCIR + TOTSTAFF, # Supply the first-phase estimates and their variance estimate = first_phase_totals, vcov_estimate = first_phase_vcov, ) #> Selection of replicate columns whose control totals will be perturbed will be done at random. #> For tips on reproducible selection, see `help('calibrate_to_estimate')` calibrated_twophase_design <- calibrate_to_sample( primary_rep_design = twophase_boot_design, # Supply the first-phase replicate design control_rep_design = first_phase_gen_boot, # Specify the variables in the data to use for calibration cal_formula = ~ TOTCIR + TOTSTAFF ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')`"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"ensuring-the-variance-estimator-is-positive-semidefinite","dir":"Articles","previous_headings":"","what":"Ensuring the Variance Estimator is Positive Semidefinite","title":"Replication Methods for Two-phase Sampling","text":"If you’ve made it this far in the vignette, you’re probably now well aware that variance estimators for two-phase designs often do not have the positive semidefinite quadratic form we’d like them to have. Instead, they’re usually close to, but not quite, a positive semidefinite quadratic form, owing to the difficulty of estimating the first-phase variance component.5 One solution for handling a quadratic form matrix \\(\\Sigma_{ab}\\) that is not positive semidefinite is to approximate it by \\(\\tilde{\\Sigma}_{ab} = \\Gamma \\Lambda^{*} \\Gamma^{\\prime}\\), where \\(\\Gamma\\) is the matrix of eigenvectors of \\(\\Sigma_{ab}\\), \\(\\Lambda\\) is the diagonal matrix of eigenvalues of \\(\\Sigma_{ab}\\), and \\(\\Lambda^{*}\\) is an updated version of \\(\\Lambda\\) whose negative eigenvalues have been replaced with \\(0\\).
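A minimal base-R sketch of this approximation (the function get_nearest_psd_matrix(), shown later in this section, implements the same idea):

nearest_psd <- function(Sigma) {
  eig <- eigen(Sigma, symmetric = TRUE)
  Lambda_star <- pmax(eig$values, 0)   # replace negative eigenvalues with 0
  eig$vectors %*% diag(Lambda_star) %*% t(eig$vectors)
}
M <- matrix(c(2, -3, -3, 2), nrow = 2) # symmetric but not positive semidefinite
eigen(M)$values                        # 5 -1
eigen(nearest_psd(M))$values           # 5  0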
This solution was suggested by Beaumont and Patak (2012) as a general-purpose solution for implementing the generalized bootstrap when the target variance estimator it’s mimicking isn’t positive semidefinite. Beaumont and Patak (2012) argue that using \\(\\tilde{\\Sigma}_{ab}\\) instead of \\(\\Sigma_{ab}\\) should result in at most a small overestimation.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"usage-with-the-generalized-bootstrap","dir":"Articles","previous_headings":"Ensuring the Variance Estimator is Positive Semidefinite","what":"Usage with the Generalized Bootstrap","title":"Replication Methods for Two-phase Sampling","text":"When the function as_gen_boot_design() is used to create generalized bootstrap replicate weights, it will warn you if the target variance estimator is not positive semidefinite and will let you know that it will therefore approximate the target variance estimator using the method described above.","code":"gen_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( 'Phase 1' = \"Ultimate Cluster\", 'Phase 2' = \"Ultimate Cluster\" ) ) #> Warning in as_gen_boot_design.twophase2(design = twophase_design, #> variance_estimator = list(`Phase 1` = \"Ultimate Cluster\", : The sample #> quadratic form matrix for this design and variance estimator is not positive #> semidefinite. It will be approximated by the nearest positive semidefinite #> matrix."},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"helper-functions-for-ensuring-an-estimator-is-positive-semidefinite","dir":"Articles","previous_headings":"Ensuring the Variance Estimator is Positive Semidefinite","what":"Helper Functions for Ensuring an Estimator is Positive Semidefinite","title":"Replication Methods for Two-phase Sampling","text":"The ‘svrep’ package has two functions that can be helpful when dealing with matrices we hope are positive semidefinite but might not be. The function is_psd_matrix() simply checks whether a matrix is positive semidefinite. It works by estimating the matrix’s eigenvalues and determining whether any are negative. If a matrix isn’t positive semidefinite (but is at least symmetric), the function get_nearest_psd_matrix() will implement the approximation method described earlier. Approximating the quadratic form by one that is positive semidefinite leads to a similar (but slightly larger) estimated standard error. For the example two-phase design based on the library survey from earlier, we can see that the approximation results in a standard error estimate only slightly larger than the standard error estimate based on the quadratic form that wasn’t quite positive semidefinite.","code":"twophase_quad_form_matrix <- get_design_quad_form( design = twophase_design, variance_estimator = list( 'Phase 1' = \"Ultimate Cluster\", 'Phase 2' = \"Ultimate Cluster\" ) ) twophase_quad_form_matrix |> is_psd_matrix() #> [1] FALSE approx_quad_form <- get_nearest_psd_matrix(twophase_quad_form_matrix) # Extract weights and a single variable from the second-phase sample ## NOTE: To get second-phase data, ## we use `my_design$phase1$sample$variables`. 
## To get first-phase data, ## we use `my_design$phase1$full$variables`. wts <- weights(twophase_design, type = \"sampling\") y <- twophase_design$phase1$sample$variables$TOTSTAFF wtd_y <- as.matrix(wts * y) # Estimate standard errors std_error <- as.numeric( t(wtd_y) %*% twophase_quad_form_matrix %*% wtd_y ) |> sqrt() approx_std_error <- as.numeric( t(wtd_y) %*% approx_quad_form %*% wtd_y ) |> sqrt() print(approx_std_error) #> [1] 20498.68 print(std_error) #> [1] 19765.59 approx_std_error / std_error #> [1] 1.037089"},{"path":[]},{"path":"https://bschneidr.github.io/svrep/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Ben Schneider. Author, maintainer.","code":""},{"path":"https://bschneidr.github.io/svrep/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Schneider, B. (2023). \"svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights\". R package version 0.6.0.","code":"@Misc{, author = {Benjamin Schneider}, year = {2023}, title = {svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights}, note = {R package version 0.6.0}, url = {https://CRAN.R-project.org/package=svrep}, }"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"svrep","dir":"","previous_headings":"","what":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"svrep provides methods for creating, updating, and analyzing replicate weights for surveys. Functions from svrep can be used to implement adjustments to replicate designs (e.g. nonresponse weighting class adjustments) and to analyze the effect of the replicate weights on estimates of interest. It also facilitates the creation of bootstrap and generalized bootstrap replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"You can install the released version of svrep from CRAN with: You can install the development version from GitHub with:","code":"install.packages(\"svrep\") # install.packages(\"devtools\") devtools::install_github(\"bschneidr/svrep\")"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"When using the ‘svrep’ package, please make sure to cite it in any resulting publications. This is appreciated by the package maintainer and helps to incentivize ongoing development, maintenance, and support. Schneider B. (2023). “svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights”. R package version 0.6.0. When using the ‘svrep’ package, please also cite the ‘survey’ package and R itself, since they are essential to the use of ‘svrep’. Call citation('svrep'), citation('survey'), and citation('base') for more information and to generate BibTeX entries for citing these packages as well as R.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/index.html","id":"creating-replicate-weights","dir":"","previous_headings":"Example usage","what":"Creating replicate weights","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"Suppose we have data from a survey selected using a complex sampling method such as cluster sampling. To represent the complex survey design, we can create a survey design object using the survey package. To help us estimate sampling variances, we can create bootstrap replicate weights. 
The function as_bootstrap_design() creates bootstrap replicate weights appropriate to common complex sampling designs, using bootstrapping methods from the ‘survey’ package as well as additional methods such as the Rao-Wu-Yue-Beaumont method (a generalization of the Rao-Wu bootstrap). For especially complex survey designs (e.g., systematic samples), the generalized survey bootstrap can be used. For relatively simple designs, we can also use the random-groups jackknife.","code":"library(survey) library(svrep) data(api, package = \"survey\") set.seed(2021) # Create a survey design object for a sample # selected using a single-stage cluster sample without replacement dclus1 <- svydesign(data = apiclus1, id = ~dnum, weights = ~pw, fpc = ~fpc) # Create replicate-weights survey design orig_rep_design <- as_bootstrap_design(dclus1, replicates = 500, type = \"Rao-Wu-Yue-Beaumont\") print(orig_rep_design) #> Call: as_bootstrap_design(dclus1, replicates = 500, type = \"Rao-Wu-Yue-Beaumont\") #> Survey bootstrap with 500 replicates. # Load example data for a stratified systematic sample data('library_stsys_sample', package = 'svrep') # First, ensure data are sorted in same order as was used in sampling library_stsys_sample <- library_stsys_sample[ order(library_stsys_sample$SAMPLING_SORT_ORDER), ] # Create a survey design object design_obj <- svydesign( data = library_stsys_sample, strata = ~ SAMPLING_STRATUM, ids = ~ 1, fpc = ~ STRATUM_POP_SIZE ) # Convert to generalized bootstrap replicate design gen_boot_design_sd2 <- as_gen_boot_design( design = design_obj, variance_estimator = \"SD2\", replicates = 500 ) #> For `variance_estimator='SD2'`, assumes rows of data are sorted in the same order used in sampling. # Create random-group jackknife replicates # for a single-stage survey with many first-stage sampling units rand_grp_jk_design <- apisrs |> svydesign(data = _, ids = ~ 1, weights = ~ pw) |> as_random_group_jackknife_design( replicates = 20 )"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"adjusting-for-non-response-or-unknown-eligibility","dir":"","previous_headings":"Example usage","what":"Adjusting for non-response or unknown eligibility","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"For social surveys, unit nonresponse is extremely common. It is also somewhat common for respondent cases to be classified as “ineligible” for the survey based on their response. In general, sampled cases are typically classified as “respondents”, “nonrespondents”, “ineligible cases”, or “unknown eligibility” cases. It is common practice to adjust weights for non-response and for sampled cases whose eligibility for the survey is unknown. The most common form of adjustment is “weight redistribution”: for example, the weights of non-respondents are reduced to zero, and the weights of respondents are correspondingly increased so that the total weight in the sample is unchanged. In order to account for these adjustments when estimating variances for survey statistics, the adjustments are repeated separately for each set of replicate weights. This process can be easily implemented using the redistribute_weights() function. By supplying column names to the by argument of redistribute_weights(), the adjustments are conducted separately for different groups. 
This can be used to conduct nonresponse weighting class adjustments.","code":"# Create variable giving response status orig_rep_design$variables[['response_status']] <- sample( x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), prob = c(0.6, 0.2, 0.1, 0.1), size = nrow(orig_rep_design), replace = TRUE ) table(orig_rep_design$variables$response_status) #> #> Ineligible Nonrespondent Respondent Unknown eligibility #> 16 32 119 16 # Adjust weights for unknown eligibility ue_adjusted_design <- redistribute_weights( design = orig_rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\") ) nr_adjusted_design <- redistribute_weights( design = ue_adjusted_design, reduce_if = response_status == \"Nonrespondent\", increase_if = response_status == \"Respondent\", by = c(\"stype\") )"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"comparing-estimates-from-different-sets-of-weights","dir":"","previous_headings":"Example usage","what":"Comparing estimates from different sets of weights","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"In order to assess whether the weighting adjustments have an impact on the estimates we care about, we want to compare the estimates from the different sets of weights. The function svyby_repwts() makes it easy to compare estimates from different sets of weights. We can even test for differences in estimates from two sets of weights and calculate confidence intervals for their difference.","code":"# Estimate overall means (and their standard errors) from each design overall_estimates <- svyby_repwts( rep_designs = list('original' = orig_rep_design, 'nonresponse-adjusted' = nr_adjusted_design), formula = ~ api00, FUN = svymean ) print(overall_estimates, row.names = FALSE) #> Design_Name api00 se #> nonresponse-adjusted 641.2030 25.54368 #> original 644.1694 23.06284 # Estimate domain means (and their standard errors) from each design domain_estimates <- svyby_repwts( rep_designs = list('original' = orig_rep_design, 'nonresponse-adjusted' = nr_adjusted_design), formula = ~ api00, by = ~ stype, FUN = svymean ) print(domain_estimates, row.names = FALSE) #> Design_Name stype api00 se #> nonresponse-adjusted E 649.9188 25.56366 #> original E 648.8681 22.31347 #> nonresponse-adjusted H 603.5390 45.26079 #> original H 618.5714 37.39448 #> nonresponse-adjusted M 616.3260 36.27983 #> original M 631.4400 31.03957 estimates <- svyby_repwts( rep_designs = list('original' = orig_rep_design, 'nonresponse-adjusted' = nr_adjusted_design), formula = ~ api00, FUN = svymean ) vcov(estimates) #> nonresponse-adjusted original #> nonresponse-adjusted 652.4793 585.5253 #> original 585.5253 531.8947 diff_between_ests <- svycontrast(stat = estimates, contrasts = list( \"Original vs. Adjusted\" = c(-1,1) )) print(diff_between_ests) #> contrast SE #> Original vs. Adjusted 2.9664 3.6501 confint(diff_between_ests) #> 2.5 % 97.5 % #> Original vs. Adjusted -4.187705 10.12056"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"diagnosing-potential-issues-with-weights","dir":"","previous_headings":"Example usage","what":"Diagnosing potential issues with weights","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"After adjusting replicate weights, several diagnostics can be used to ensure that the adjustments were carried out correctly and did more good than harm. The function summarize_rep_weights() helps by allowing you to quickly summarize the replicate weights. For example, when carrying out nonresponse adjustments, we might want to verify that the weights for nonrespondents have been set to zero in each replicate. 
We can use summarize_rep_weights() to compare summary statistics for each replicate, and we can use its by argument to group the summaries by one or more variables. At the end of the adjustment process, we can inspect the number of rows and columns and examine the variability of the weights across replicates.","code":"summarize_rep_weights( rep_design = nr_adjusted_design, type = 'specific', by = \"response_status\" ) |> subset(Rep_Column %in% 1:2) #> response_status Rep_Column N N_NONZERO SUM MEAN CV #> 1 Ineligible 1 16 16 608.1360 38.00850 1.2415437 #> 2 Ineligible 2 16 16 739.2634 46.20397 0.7578107 #> 501 Nonrespondent 1 32 0 0.0000 0.00000 NaN #> 502 Nonrespondent 2 32 0 0.0000 0.00000 NaN #> 1001 Respondent 1 119 119 6236.0577 52.40385 1.0431318 #> 1002 Respondent 2 119 119 6426.4544 54.00382 0.8345243 #> 1501 Unknown eligibility 1 16 0 0.0000 0.00000 NaN #> 1502 Unknown eligibility 2 16 0 0.0000 0.00000 NaN #> MIN MAX #> 1 0.5632079 120.38814 #> 2 0.5422029 77.44622 #> 501 0.0000000 0.00000 #> 502 0.0000000 0.00000 #> 1001 0.6072282 151.10496 #> 1002 0.5971008 102.40567 #> 1501 0.0000000 0.00000 #> 1502 0.0000000 0.00000 nr_adjusted_design |> subset(response_status == \"Respondent\") |> summarize_rep_weights( type = 'overall' ) #> nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 119 500 29 30 5625.555 1257.982 0.5305136 367.826"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"sample-based-calibration","dir":"","previous_headings":"Example usage","what":"Sample-based calibration","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"When we rake or poststratify to estimated control totals rather than to “true” population values, we may need to account for the variance of the estimated control totals to ensure that the calibrated estimates appropriately reflect the sampling error of both the primary survey of interest and the survey from which the control totals were estimated. The ‘svrep’ package provides two functions that accomplish this. The function calibrate_to_estimate() requires the user to supply a vector of control totals and its variance-covariance matrix, while the function calibrate_to_sample() requires the user to supply a dataset with replicate weights to use for estimating the control totals and their sampling variance. As an example, suppose we have a survey measuring the vaccination status of adults in Louisville, Kentucky. For variance estimation, we use 100 bootstrap replicates. To reduce nonresponse bias and coverage error in the survey, we can rake the survey to population totals for demographic groups estimated from the Census Bureau’s American Community Survey (ACS). To estimate the population totals for raking purposes, we can use microdata with replicate weights. We can see that the distribution of race/ethnicity among respondents differs from the distribution of race/ethnicity in the ACS benchmarks. There are two options for calibrating the sample to the control totals from the benchmark survey. With the first approach, we supply the point estimates and their variance-covariance matrix to the function calibrate_to_estimate(). With the second approach, we supply the control survey’s replicate design to calibrate_to_sample(). 
After calibration, we can see that the estimated vaccination rate has decreased and that the estimated standard error of the estimated vaccination rate has increased.","code":"data(\"lou_vax_survey\") # Load example data lou_vax_survey <- svydesign(ids = ~ 1, weights = ~ SAMPLING_WEIGHT, data = lou_vax_survey) |> as_bootstrap_design(replicates = 100, mse = TRUE) # Adjust for nonresponse lou_vax_survey <- lou_vax_survey |> redistribute_weights( reduce_if = RESPONSE_STATUS == \"Nonrespondent\", increase_if = RESPONSE_STATUS == \"Respondent\" ) |> subset(RESPONSE_STATUS == \"Respondent\") # Load microdata to use for estimating control totals data(\"lou_pums_microdata\") acs_benchmark_survey <- survey::svrepdesign( data = lou_pums_microdata, variables = ~ UNIQUE_ID + AGE + SEX + RACE_ETHNICITY + EDUC_ATTAINMENT, weights = ~ PWGTP, repweights = \"PWGTP\\\\d{1,2}\", type = \"successive-difference\", mse = TRUE ) # Compare demographic estimates from the two data sources estimate_comparisons <- data.frame( 'Vax_Survey' = svymean(x = ~ RACE_ETHNICITY, design = lou_vax_survey) |> coef(), 'ACS_Benchmark' = svymean(x = ~ RACE_ETHNICITY, design = acs_benchmark_survey) |> coef() ) rownames(estimate_comparisons) <- gsub(x = rownames(estimate_comparisons), \"RACE_ETHNICITY\", \"\") print(estimate_comparisons) #> Vax_Survey #> Black or African American alone, not Hispanic or Latino 0.16932271 #> Hispanic or Latino 0.03386454 #> Other Race, not Hispanic or Latino 0.05776892 #> White alone, not Hispanic or Latino 0.73904382 #> ACS_Benchmark #> Black or African American alone, not Hispanic or Latino 0.19949824 #> Hispanic or Latino 0.04525039 #> Other Race, not Hispanic or Latino 0.04630955 #> White alone, not Hispanic or Latino 0.70894182 # Estimate control totals and their variance-covariance matrix control_totals <- svymean(x = ~ RACE_ETHNICITY + EDUC_ATTAINMENT, design = acs_benchmark_survey) point_estimates <- coef(control_totals) vcov_estimates <- vcov(control_totals) # Calibrate the vaccination survey to the estimated control totals vax_survey_raked_to_estimates <- calibrate_to_estimate( rep_design = lou_vax_survey, estimate = point_estimates, vcov_estimate = vcov_estimates, cal_formula = ~ RACE_ETHNICITY + EDUC_ATTAINMENT, calfun = survey::cal.raking ) vax_survey_raked_to_acs_sample <- calibrate_to_sample( primary_rep_design = lou_vax_survey, control_rep_design = acs_benchmark_survey, cal_formula = ~ RACE_ETHNICITY + EDUC_ATTAINMENT, calfun = survey::cal.raking ) # Compare the two sets of estimates svyby_repwts( rep_design = list( 'NR-adjusted' = lou_vax_survey, 'Raked to estimate' = vax_survey_raked_to_estimates, 'Raked to sample' = vax_survey_raked_to_acs_sample ), formula = ~ VAX_STATUS, FUN = svymean, keep.names = FALSE ) #> Design_Name VAX_STATUSUnvaccinated VAX_STATUSVaccinated se1 #> 1 NR-adjusted 0.4621514 0.5378486 0.01863299 #> 2 Raked to estimate 0.4732623 0.5267377 0.01895171 #> 3 Raked to sample 0.4732623 0.5267377 0.01893093 #> se2 #> 1 0.01863299 #> 2 0.01895171 #> 3 0.01893093"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"saving-results-to-a-data-file","dir":"","previous_headings":"Example usage","what":"Saving results to a data file","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"Once we’re satisfied with the weights, we can create a data frame with the analysis variables and columns of final full-sample weights and replicate weights. 
This format makes it easy to export the data to data files that can be loaded into R or other software later.","code":"data_frame_with_final_weights <- vax_survey_raked_to_estimates |> as_data_frame_with_weights( full_wgt_name = \"RAKED_WGT\", rep_wgt_prefix = \"RAKED_REP_WGT_\" ) # Preview first 10 column names colnames(data_frame_with_final_weights) |> head(10) #> [1] \"RESPONSE_STATUS\" \"RACE_ETHNICITY\" \"SEX\" \"EDUC_ATTAINMENT\" #> [5] \"VAX_STATUS\" \"SAMPLING_WEIGHT\" \"RAKED_WGT\" \"RAKED_REP_WGT_1\" #> [9] \"RAKED_REP_WGT_2\" \"RAKED_REP_WGT_3\" # Write the data to a CSV file write.csv( x = data_frame_with_final_weights, file = \"survey-data_with-updated-weights.csv\" )"},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":null,"dir":"Reference","previous_headings":"","what":"Add inactive replicates to a survey design object — add_inactive_replicates","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"Adds inactive replicates to a survey design object. An inactive replicate is a replicate that does not contribute to variance estimates but adds to the matrix of replicate weights so that the matrix has the desired number of columns. The new replicates' values are simply equal to the full-sample weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"","code":"add_inactive_replicates(design, n_total, n_to_add, location = \"last\")"},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"design A survey design object, created with either the survey or srvyr packages. n_total The total number of replicates that the result should contain. If the design already contains n_total replicates (or more), then no update will be made. n_to_add The number of additional replicates to add. Can only use either the n_total argument or the n_to_add argument, not both. location Either \"first\", \"last\" (the default), or \"random\". Specifies where the columns of new replicates should be located in the matrix of replicate weights. Use \"first\" to place the new replicates first (i.e., in the leftmost part of the matrix), or \"last\" to place the new replicates last (i.e., in the rightmost part of the matrix). Use \"random\" to intersperse the new replicates in random column locations in the matrix; the original replicates will still be in their original order.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"An updated survey design object, where the number of columns of replicate weights has potentially increased. The increase only happens if the user specifies the n_to_add argument instead of n_total, or if the user specifies n_total and n_total is greater than the number of columns of replicate weights that the design already has.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"Inactive replicates are also sometimes referred to as \"dead replicates\", for example in Ash (2014). The purpose of adding inactive replicates is to increase the number of columns of replicate weights without impacting variance estimates. 
This can be useful, for example, when combining data from a survey across multiple years, where different years used a different number of replicates but a consistent number of replicates is desired in the combined data file. Suppose the initial replicate design has \\(L\\) replicates, with respective constants \\(c_k\\) for \\(k=1,\\dots,L\\) used to estimate variance with the formula $$v_{R} = \\sum_{k=1}^L c_k\\left(\\hat{T}_y^{(k)}-\\hat{T}_y\\right)^2$$ where \\(\\hat{T}_y\\) is the estimate produced using the full-sample weights and \\(\\hat{T}_y^{(k)}\\) is the estimate from replicate \\(k\\). Inactive replicates are simply replicates that are exactly equal to the full sample: that is, replicate \\(k\\) is called \"inactive\" if its vector of replicate weights exactly equals the full-sample weights. In this case, when using the formula above to estimate variances, these replicates contribute nothing to the variance estimate. If the analyst instead uses the variant of the formula above where the full-sample estimate \\(\\hat{T}_y\\) is replaced by the average replicate estimate (i.e., \\(L^{-1}\\sum_{k=1}^{L}\\hat{T}_y^{(k)}\\)), then variance estimates will differ before vs. after adding inactive replicates. For this reason, we strongly recommend explicitly specifying mse=TRUE when creating a replicate design object with R functions such as svrepdesign(), as_bootstrap_design(), etc. If working with an already existing replicate design, you can update the mse option to TRUE simply by using the code my_design$mse <- TRUE.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"Ash, S. (2014). \"Using successive difference replication for estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"","code":"library(survey) #> Loading required package: grid #> Loading required package: Matrix #> Loading required package: survival #> #> Attaching package: ‘survey’ #> The following object is masked from ‘package:graphics’: #> #> dotchart set.seed(2023) # Create an example survey design object sample_data <- data.frame( PSU = c(1,2,3) ) survey_design <- svydesign( data = sample_data, ids = ~ PSU, weights = ~ 1 ) rep_design <- survey_design |> as.svrepdesign(type = \"JK1\", mse = TRUE) # Inspect replicates before subsampling rep_design |> weights(type = \"analysis\") #> [,1] [,2] [,3] #> [1,] 0.0 1.5 1.5 #> [2,] 1.5 0.0 1.5 #> [3,] 1.5 1.5 0.0 # Inspect replicates after adding inactive replicates rep_design |> add_inactive_replicates(n_total = 5, location = \"first\") |> weights(type = \"analysis\") #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1 1 0.0 1.5 1.5 #> [2,] 1 1 1.5 0.0 1.5 #> [3,] 1 1 1.5 1.5 0.0 rep_design |> add_inactive_replicates(n_to_add = 2, location = \"last\") |> weights(type = \"analysis\") #> [,1] [,2] [,3] [,4] [,5] #> [1,] 0.0 1.5 1.5 1 1 #> [2,] 1.5 0.0 1.5 1 1 #> [3,] 1.5 1.5 0.0 1 1 rep_design |> add_inactive_replicates(n_to_add = 5, location = \"random\") |> weights(type = \"analysis\") #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] #> [1,] 1 1 1 0.0 1 1.5 1 1.5 #> [2,] 1 1 1 1.5 1 0.0 1 1.5 #> [3,] 1 1 1 1.5 1 1.5 1 0.0"},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","title":"Convert a survey design object to a bootstrap replicate design — 
as_bootstrap_design","text":"Converts survey design object replicate design object replicate weights formed using bootstrap method. Supports stratified, cluster samples one stages sampling. stage sampling, either simple random sampling (without replacement) unequal probability sampling (without replacement) may used.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"","code":"as_bootstrap_design( design, type = \"Rao-Wu-Yue-Beaumont\", replicates = 500, compress = TRUE, mse = getOption(\"survey.replicates.mse\"), samp_method_by_stage = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"design survey design object created using 'survey' ('srvyr') package, class 'survey.design' 'svyimputationList'. type type bootstrap use, chosen based applicability sampling method used survey. available types following: \"Rao-Wu-Yue-Beaumont\" (default): bootstrap method Beaumont Émond (2022), generalization Rao-Wu-Yue bootstrap, applicable wide variety designs, including single-stage multistage stratified designs. design may different sampling methods used different stages. stage sampling may potentially PPS (.e., use unequal probabilities), without replacement, may potentially use Poisson sampling. stratum fixed sample size \\(n\\) sampling units, resampling replicate resamples \\((n-1)\\) sampling units replacement. \"Rao-Wu\": basic Rao-Wu \\((n-1)\\) bootstrap method, applicable single-stage designs multistage designs first-stage sampling fractions small (can thus ignored). Accommodates stratified designs. sampling within stratum must simple random sampling without replacement, although first-stage sampling effectively treated sampling without replacement. \"Preston\": Preston's multistage rescaled bootstrap, applicable single-stage designs multistage designs arbitrary sampling fractions. Accommodates stratified designs. sampling within stratum must simple random sampling without replacement. \"Canty-Davison\": Canty-Davison bootstrap, applicable single-stage designs, arbitrary sampling fractions. Accommodates stratified designs. sampling stratum must simple random sampling without replacement. replicates Number bootstrap replicates (large possible, given computer memory/storage limitations). commonly-recommended default 500. compress Use compressed representation replicate weights matrix. reduces computer memory required represent replicate weights impact estimates. mse TRUE, compute variances sums squares around point estimate full-sample weights, FALSE, compute variances sums squares around mean estimate replicate weights. samp_method_by_stage (Optional). default, function automatically determine sampling method used stage. However, argument can used ensure correct sampling method identified stage. Accepts vector length equal number stages sampling. 
Each element should be one of the following: \"SRSWOR\" - Simple random sampling, without replacement \"SRSWR\" - Simple random sampling, with replacement \"PPSWOR\" - Unequal probabilities of selection, without replacement \"PPSWR\" - Unequal probabilities of selection, with replacement \"Poisson\" - Poisson sampling: each sampling unit is selected into the sample at most once, with potentially different probabilities of inclusion for each sampling unit.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"A replicate design object, with class svyrep.design, which can be used with the usual functions, such as svymean() or svyglm(). Use weights(..., type = 'analysis') to extract the matrix of replicate weights. Use as_data_frame_with_weights() to convert the design object to a data frame with columns for the full-sample and replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"Beaumont, J.-F.; Émond, N. (2022). \"A Bootstrap Variance Estimation Method for Multistage Sampling and Two-Phase Sampling When Poisson Sampling Is Used at the Second Phase.\" Stats, 5: 339–357. https://doi.org/10.3390/stats5020019 Canty, A.J.; Davison, A.C. (1999). \"Resampling-based variance estimation for labour force surveys.\" The Statistician, 48: 379-391. Preston, J. (2009). \"Rescaled bootstrap for stratified multistage sampling.\" Survey Methodology, 35(2): 227-234. Rao, J.N.K.; Wu, C.F.J.; Yue, K. (1992). \"Some recent work on resampling methods for complex surveys.\" Survey Methodology, 18: 209–217.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"","code":"library(survey) # Example 1: A multistage sample with two stages of SRSWOR ## Load an example dataset from a multistage sample, with two stages of SRSWOR data(\"mu284\", package = 'survey') multistage_srswor_design <- svydesign(data = mu284, ids = ~ id1 + id2, fpc = ~ n1 + n2) ## Convert the survey design object to a bootstrap design set.seed(2022) bootstrap_rep_design <- as_bootstrap_design(multistage_srswor_design, replicates = 500) ## Compare std. 
error estimates from bootstrap versus linearization data.frame( 'Statistic' = c('total', 'mean', 'median'), 'SE (bootstrap)' = c(SE(svytotal(x = ~ y1, design = bootstrap_rep_design)), SE(svymean(x = ~ y1, design = bootstrap_rep_design)), SE(svyquantile(x = ~ y1, quantile = 0.5, design = bootstrap_rep_design))), 'SE (linearization)' = c(SE(svytotal(x = ~ y1, design = multistage_srswor_design)), SE(svymean(x = ~ y1, design = multistage_srswor_design)), SE(svyquantile(x = ~ y1, quantile = 0.5, design = multistage_srswor_design))), check.names = FALSE ) #> Statistic SE (bootstrap) SE (linearization) #> 1 total 2311.130145 2274.254701 #> 2 mean 2.449955 2.273653 #> 3 median 2.331234 2.521210 # Example 2: A multistage-sample, # first stage selected with unequal probabilities without replacement # second stage selected with simple random sampling without replacement data(\"library_multistage_sample\", package = \"svrep\") multistage_pps <- svydesign(data = library_multistage_sample, ids = ~ PSU_ID + SSU_ID, fpc = ~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB, pps = \"brewer\") bootstrap_rep_design <- as_bootstrap_design( multistage_pps, replicates = 500, samp_method_by_stage = c(\"PPSWOR\", \"SRSWOR\") ) ## Compare std. error estimates from bootstrap versus linearization data.frame( 'Statistic' = c('total', 'mean'), 'SE (bootstrap)' = c( SE(svytotal(x = ~ TOTCIR, na.rm = TRUE, design = bootstrap_rep_design)), SE(svymean(x = ~ TOTCIR, na.rm = TRUE, design = bootstrap_rep_design))), 'SE (linearization)' = c( SE(svytotal(x = ~ TOTCIR, na.rm = TRUE, design = multistage_pps)), SE(svymean(x = ~ TOTCIR, na.rm = TRUE, design = multistage_pps))), check.names = FALSE ) #> Statistic SE (bootstrap) SE (linearization) #> 1 total 266151536.55 255100437.38 #> 2 mean 45762.71 42544.16"},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"Convert a survey design object to a data frame with weights stored as columns","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"","code":"as_data_frame_with_weights( design, full_wgt_name = \"FULL_SAMPLE_WGT\", rep_wgt_prefix = \"REP_WGT_\", vars_to_keep = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"design A survey design object, created with either the survey or srvyr packages. full_wgt_name The column name to use for the full-sample weights. rep_wgt_prefix For replicate design objects, the prefix to use for the column names of the replicate weights. The column names are created by appending the replicate number after the prefix. vars_to_keep By default, all variables in the data will be kept. 
To select a subset of the non-weight variables, you can supply a character vector of the variable names to keep.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"A data frame, with new columns containing the weights from the survey design object","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"","code":"data(\"lou_vax_survey\", package = 'svrep') library(survey) # Create a survey design object survey_design <- svydesign(data = lou_vax_survey, weights = ~ SAMPLING_WEIGHT, ids = ~ 1) rep_survey_design <- as.svrepdesign(survey_design, type = \"boot\", replicates = 10) # Adjust the weights for nonresponse nr_adjusted_design <- redistribute_weights( design = rep_survey_design, reduce_if = RESPONSE_STATUS == \"Nonrespondent\", increase_if = RESPONSE_STATUS == \"Respondent\", by = c(\"RACE_ETHNICITY\", \"EDUC_ATTAINMENT\") ) # Save the survey design object as a data frame nr_adjusted_data <- as_data_frame_with_weights( nr_adjusted_design, full_wgt_name = \"NR_ADJUSTED_WGT\", rep_wgt_prefix = \"NR_ADJUSTED_REP_WGT_\" ) head(nr_adjusted_data) #> RESPONSE_STATUS RACE_ETHNICITY #> 1 Nonrespondent White alone, not Hispanic or Latino #> 2 Nonrespondent Black or African American alone, not Hispanic or Latino #> 3 Respondent White alone, not Hispanic or Latino #> 4 Nonrespondent White alone, not Hispanic or Latino #> 5 Nonrespondent White alone, not Hispanic or Latino #> 6 Respondent White alone, not Hispanic or Latino #> SEX EDUC_ATTAINMENT VAX_STATUS SAMPLING_WEIGHT NR_ADJUSTED_WGT #> 1 Female Less than high school 596.702 0.000 #> 2 Female High school or beyond 596.702 0.000 #> 3 Female Less than high school Vaccinated 596.702 1223.239 #> 4 Female Less than high school 596.702 0.000 #> 5 Female High school or beyond 596.702 0.000 #> 6 Female High school or beyond Vaccinated 596.702 1059.068 #> NR_ADJUSTED_REP_WGT_1 NR_ADJUSTED_REP_WGT_2 NR_ADJUSTED_REP_WGT_3 #> 1 0 0.000 0 #> 2 0 0.000 0 #> 3 0 2572.449 0 #> 4 0 0.000 0 #> 5 0 0.000 0 #> 6 0 0.000 0 #> NR_ADJUSTED_REP_WGT_4 NR_ADJUSTED_REP_WGT_5 NR_ADJUSTED_REP_WGT_6 #> 1 0.000 0.000 0.000 #> 2 0.000 0.000 0.000 #> 3 1260.888 0.000 0.000 #> 4 0.000 0.000 0.000 #> 5 0.000 0.000 0.000 #> 6 2058.492 3243.364 1056.924 #> NR_ADJUSTED_REP_WGT_7 NR_ADJUSTED_REP_WGT_8 NR_ADJUSTED_REP_WGT_9 #> 1 0 0.000 0 #> 2 0 0.000 0 #> 3 0 1219.633 0 #> 4 0 0.000 0 #> 5 0 0.000 0 #> 6 0 1024.285 0 #> NR_ADJUSTED_REP_WGT_10 #> 1 0.000 #> 2 0.000 #> 3 1202.584 #> 4 0.000 #> 5 0.000 #> 6 2074.098 # Check the column names of the result colnames(nr_adjusted_data) #> [1] \"RESPONSE_STATUS\" \"RACE_ETHNICITY\" \"SEX\" #> [4] \"EDUC_ATTAINMENT\" \"VAX_STATUS\" \"SAMPLING_WEIGHT\" #> [7] \"NR_ADJUSTED_WGT\" \"NR_ADJUSTED_REP_WGT_1\" \"NR_ADJUSTED_REP_WGT_2\" #> [10] \"NR_ADJUSTED_REP_WGT_3\" \"NR_ADJUSTED_REP_WGT_4\" \"NR_ADJUSTED_REP_WGT_5\" #> [13] \"NR_ADJUSTED_REP_WGT_6\" \"NR_ADJUSTED_REP_WGT_7\" \"NR_ADJUSTED_REP_WGT_8\" #> [16] \"NR_ADJUSTED_REP_WGT_9\" \"NR_ADJUSTED_REP_WGT_10\""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a 
survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"Converts a survey design object to a replicate design object with replicate weights formed using the generalized replication method of Fay (1989). The generalized replication method forms replicate weights from a textbook variance estimator, provided that the variance estimator can be represented as a quadratic form whose matrix is positive semidefinite (this covers a large class of variance estimators).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"","code":"as_fays_gen_rep_design( design, variance_estimator = NULL, aux_var_names = NULL, max_replicates = 500, balanced = TRUE, psd_option = \"warn\", mse = TRUE, compress = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"design A survey design object created using the 'survey' (or 'srvyr') package, with class 'survey.design' or 'svyimputationList'. variance_estimator The name of the variance estimator whose quadratic form matrix should be created. See variance-estimators for a detailed description of each variance estimator. Options include: \"Yates-Grundy\": The Yates-Grundy variance estimator based on first-order and second-order inclusion probabilities. \"Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on first-order and second-order inclusion probabilities. \"Poisson Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on assuming Poisson sampling, with first-order inclusion probabilities inferred from the sampling probabilities in the survey design object. \"Stratified Multistage SRS\": The usual stratified multistage variance estimator based on estimating the variance of cluster totals within strata at each stage. \"Ultimate Cluster\": The usual variance estimator based on estimating the variance of first-stage cluster totals within first-stage strata. \"Deville-1\": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as \"Deville 1\". \"Deville-2\": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as \"Deville 2\". \"Deville-Tille\": A variance estimator useful for balanced sampling designs, proposed by Deville and Tillé (2005). \"SD1\": The non-circular successive-differences variance estimator described by Ash (2014), sometimes used for variance estimation for systematic sampling. \"SD2\": The circular successive-differences variance estimator described by Ash (2014). This estimator is the basis of the \"successive-differences replication\" estimator commonly used for variance estimation for systematic sampling. aux_var_names (Only used if variance_estimator = \"Deville-Tille\"). A vector of the names of the auxiliary variables used in sampling. max_replicates The maximum number of replicates to allow (should be as large as possible, given computer memory/storage limitations). A commonly-recommended default is 500. If the number of replicates needed for a balanced, fully-efficient estimator is less than max_replicates, then only the number of replicates needed will be created. If more replicates are needed than max_replicates, then the full number of replicates needed will be created, but only a random subsample will be retained. 
balanced If balanced=TRUE, the replicates will all contribute equally to variance estimates, but the number of replicates needed may slightly increase. psd_option Either \"warn\" (the default) or \"error\". This option specifies what should happen if the target variance estimator's quadratic form matrix is not positive semidefinite. This can occasionally happen, particularly for two-phase designs. If psd_option=\"error\", then an error message will be displayed. If psd_option=\"warn\", then a warning message will be displayed, and the quadratic form matrix will be approximated by a similar positive semidefinite matrix. This approximation was suggested by Beaumont and Patak (2012), who note that it is conservative in the sense of producing overestimates of variance. Beaumont and Patak (2012) argue that this overestimation is expected to be small in magnitude. See get_nearest_psd_matrix for details of the approximation. mse If TRUE (the default), compute variances from sums of squares around the point estimate from the full-sample weights; if FALSE, compute variances from sums of squares around the mean estimate from the replicate weights. For Fay's generalized replication method, it is strongly recommended to use mse = TRUE. compress This reduces the computer memory required to represent the replicate weights and has no impact on estimates.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"A replicate design object, with class svyrep.design, which can be used with the usual functions, such as svymean() or svyglm(). Use weights(..., type = 'analysis') to extract the matrix of replicate weights. Use as_data_frame_with_weights() to convert the design object to a data frame with columns for the full-sample and replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"See Fay (1989) for a full description of this replication method, and see the documentation for make_fays_gen_rep_factors for implementation details. See variance-estimators for a description of each variance estimator available for use with this function. Use rescale_reps to eliminate negative adjustment factors.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"two-phase-designs","dir":"Reference","previous_headings":"","what":"Two-Phase Designs","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"For a two-phase design, variance_estimator should be a list of the names of variance estimators, with two elements, such as list('Ultimate Cluster', 'Poisson Horvitz-Thompson'). For two-phase designs, only the following estimators may be used for the second phase: \"Ultimate Cluster\" \"Stratified Multistage SRS\" \"Poisson Horvitz-Thompson\" For statistical details on the handling of two-phase designs, see the documentation for make_twophase_quad_form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"The generalized replication method was first proposed by Fay (1984). Fay (1989) refined the generalized replication method to produce \"balanced\" replicates, in the sense that each replicate contributes equally to variance estimates. The advantage of balanced replicates is that one can still obtain a reasonable variance estimate by using only a random subset of the replicates. - Ash, S. (2014). 
\"Using successive difference replication estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59. - Deville, J.‐C., Tillé, Y. (2005). \"Variance approximation balanced sampling.\" Journal Statistical Planning Inference, 128, 569–591. - Dippo, Cathryn, Robert Fay, David Morganstein. 1984. “Computing Variances Complex Samples Replicate Weights.” , 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf. - Fay, Robert. 1984. “Properties Estimates Variance Based Replication Methods.” , 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_095.pdf. - Fay, Robert. 1989. “Theory Application Replicate Weighting Variance Calculations.” , 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1989_033.pdf - Matei, Alina, Yves Tillé. (2005). “Evaluation Variance Approximations Estimators Maximum Entropy Sampling Unequal Probability Fixed Sample Size.” Journal Official Statistics, 21(4):543–70.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"","code":"if (FALSE) { library(survey) ## Load an example systematic sample ---- data('library_stsys_sample', package = 'svrep') ## First, ensure data are sorted in same order as was used in sampling library_stsys_sample <- library_stsys_sample[ order(library_stsys_sample$SAMPLING_SORT_ORDER), ] ## Create a survey design object design_obj <- svydesign( data = library_stsys_sample, strata = ~ SAMPLING_STRATUM, ids = ~ 1, fpc = ~ STRATUM_POP_SIZE ) ## Convert to generalized replicate design gen_rep_design_sd2 <- as_fays_gen_rep_design( design = design_obj, variance_estimator = \"SD2\", max_replicates = 250, mse = TRUE ) svytotal(x = ~ TOTSTAFF, na.rm = TRUE, design = gen_rep_design_sd2) }"},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"Converts survey design object replicate design object replicate weights formed using generalized bootstrap method. 
The generalized survey bootstrap is a method for forming bootstrap replicate weights from a textbook variance estimator, provided that the variance estimator can be represented as a quadratic form whose matrix is positive semidefinite (this covers a large class of variance estimators).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"","code":"as_gen_boot_design( design, variance_estimator = NULL, aux_var_names = NULL, replicates = 500, tau = \"auto\", exact_vcov = FALSE, psd_option = \"warn\", mse = getOption(\"survey.replicates.mse\"), compress = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"design A survey design object created using the 'survey' (or 'srvyr') package, with class 'survey.design' or 'svyimputationList'. variance_estimator The name of the variance estimator whose quadratic form matrix should be created. See variance-estimators for a detailed description of each variance estimator. Options include: \"Yates-Grundy\": The Yates-Grundy variance estimator based on first-order and second-order inclusion probabilities. \"Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on first-order and second-order inclusion probabilities. \"Poisson Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on assuming Poisson sampling, with first-order inclusion probabilities inferred from the sampling probabilities in the survey design object. \"Stratified Multistage SRS\": The usual stratified multistage variance estimator based on estimating the variance of cluster totals within strata at each stage. \"Ultimate Cluster\": The usual variance estimator based on estimating the variance of first-stage cluster totals within first-stage strata. \"Deville-1\": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as \"Deville 1\". \"Deville-2\": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as \"Deville 2\". \"Deville-Tille\": A variance estimator useful for balanced sampling designs, proposed by Deville and Tillé (2005). \"SD1\": The non-circular successive-differences variance estimator described by Ash (2014), sometimes used for variance estimation for systematic sampling. \"SD2\": The circular successive-differences variance estimator described by Ash (2014). This estimator is the basis of the \"successive-differences replication\" estimator commonly used for variance estimation for systematic sampling. aux_var_names (Only used if variance_estimator = \"Deville-Tille\"). A vector of the names of the auxiliary variables used in sampling. replicates The number of bootstrap replicates to use (as many as possible, given computer memory/storage limitations). A commonly-recommended default is 500. tau Either \"auto\", or a single number. This is the rescaling constant used to avoid negative weights through the transformation \\(\\frac{w + \\tau - 1}{\\tau}\\), where \\(w\\) is the original weight and \\(\\tau\\) is the rescaling constant tau. If tau=\"auto\", the rescaling factor is determined automatically as follows: if all of the adjustment factors are nonnegative, then tau is set equal to 1; otherwise, tau is set to the smallest value needed to rescale the adjustment factors such that they are all at least 0.01. exact_vcov If exact_vcov=TRUE, the replicate factors will be generated such that variance estimates for totals exactly match the results from the target variance estimator. This requires that num_replicates exceeds the rank of Sigma. 
The replicate factors are generated by applying PCA-whitening to a collection of draws from a multivariate Normal distribution, then applying a coloring transformation to the whitened collection of draws. psd_option Either \"warn\" (the default) or \"error\". This option specifies what should happen if the target variance estimator's quadratic form matrix is not positive semidefinite. This can occasionally happen, particularly for two-phase designs. If psd_option=\"error\", then an error message will be displayed. If psd_option=\"warn\", then a warning message will be displayed, and the quadratic form matrix will be approximated by a similar positive semidefinite matrix. This approximation was suggested by Beaumont and Patak (2012), who note that it is conservative in the sense of producing overestimates of variance. Beaumont and Patak (2012) argue that this overestimation is expected to be small in magnitude. See get_nearest_psd_matrix for details of the approximation. mse If TRUE, compute variances from sums of squares around the point estimate from the full-sample weights; if FALSE, compute variances from sums of squares around the mean estimate from the replicate weights. compress This reduces the computer memory required to represent the replicate weights and has no impact on estimates.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"A replicate design object, with class svyrep.design, which can be used with the usual functions, such as svymean() or svyglm(). Use weights(..., type = 'analysis') to extract the matrix of replicate weights. Use as_data_frame_with_weights() to convert the design object to a data frame with columns for the full-sample and replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"Let \\(v( \\hat{T_y})\\) be a textbook variance estimator for an estimated population total \\(\\hat{T}_y\\) of some variable \\(y\\). The base weight for case \\(i\\) in our sample is \\(w_i\\), and we let \\(\\breve{y}_i\\) denote the weighted value \\(w_iy_i\\). Suppose we can represent our textbook variance estimator as a quadratic form: \\(v(\\hat{T}_y) = \\breve{y}\\Sigma\\breve{y}^T\\), for some \\(n \\times n\\) matrix \\(\\Sigma\\). The only constraint on \\(\\Sigma\\) is that, for our sample, it must be symmetric and positive semidefinite. The bootstrapping process creates \\(B\\) sets of replicate weights, where the \\(b\\)-th set of replicate weights is a vector of length \\(n\\) denoted \\(\\mathbf{a}^{(b)}\\), whose \\(k\\)-th value is denoted \\(a_k^{(b)}\\). This yields \\(B\\) replicate estimates of the population total, \\(\\hat{T}_y^{*(b)}=\\sum_{k \\in s} a_k^{(b)} \\breve{y}_k\\), for \\(b=1, \\ldots, B\\), which can be used to estimate sampling variance. $$ v_B\\left(\\hat{T}_y\\right)=\\frac{\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2}{B} $$ This bootstrap variance estimator can be written as a quadratic form: $$ v_B\\left(\\hat{T}_y\\right) =\\mathbf{\\breve{y}}^{\\prime}\\Sigma_B \\mathbf{\\breve{y}} $$ where $$ \\boldsymbol{\\Sigma}_B = \\frac{\\sum_{b=1}^B\\left(\\mathbf{a}^{(b)}-\\mathbf{1}_n\\right)\\left(\\mathbf{a}^{(b)}-\\mathbf{1}_n\\right)^{\\prime}}{B} $$ Note that if the vector of adjustment factors \\(\\mathbf{a}^{(b)}\\) has expectation \\(\\mathbf{1}_n\\) and variance-covariance matrix \\(\\boldsymbol{\\Sigma}\\), then the bootstrap expectation is \\(E_{*}\\left( \\boldsymbol{\\Sigma}_B \\right) = \\boldsymbol{\\Sigma}\\). 
Since the bootstrap process takes the sample values \\(\\breve{y}\\) as fixed, the bootstrap expectation of the variance estimator is \\(E_{*} \\left( \\mathbf{\\breve{y}}^{\\prime}\\Sigma_B \\mathbf{\\breve{y}}\\right)= \\mathbf{\\breve{y}}^{\\prime}\\Sigma \\mathbf{\\breve{y}}\\). Thus, we can produce a bootstrap variance estimator with the same expectation as the textbook variance estimator simply by randomly generating \\(\\mathbf{a}^{(b)}\\) from a distribution satisfying the following two conditions: Condition 1: \\(\\quad \\mathbf{E}_*(\\mathbf{a})=\\mathbf{1}_n\\) Condition 2: \\(\\quad \\mathbf{E}_*\\left(\\mathbf{a}-\\mathbf{1}_n\\right)\\left(\\mathbf{a}-\\mathbf{1}_n\\right)^{\\prime}=\\mathbf{\\Sigma}\\) While there are multiple ways to generate adjustment factors satisfying these conditions, the simplest general method is to simulate from a multivariate normal distribution: \\(\\mathbf{a} \\sim MVN(\\mathbf{1}_n, \\boldsymbol{\\Sigma})\\). This is the method used by this function.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"details-on-rescaling-to-avoid-negative-adjustment-factors","dir":"Reference","previous_headings":"","what":"Details on Rescaling to Avoid Negative Adjustment Factors","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"Let \\(\\mathbf{A} = \\left[ \\mathbf{a}^{(1)} \\cdots \\mathbf{a}^{(b)} \\cdots \\mathbf{a}^{(B)} \\right]\\) denote the \\((n \\times B)\\) matrix of bootstrap adjustment factors. To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors \\(\\mathbf{A}^S\\) by rescaling each adjustment factor \\(a_k^{(b)}\\) as follows: $$ a_k^{S,(b)} = \\frac{a_k^{(b)} + \\tau - 1}{\\tau} $$ where \\(\\tau \\geq 1 - a_k^{(b)} \\geq 1\\) for all \\(k\\) in \\(\\left\\{ 1,\\ldots,n \\right\\}\\) and all \\(b\\) in \\(\\left\\{1, \\ldots, B\\right\\}\\). The value of \\(\\tau\\) can be set based on the realized adjustment factor matrix \\(\\mathbf{A}\\), or by choosing \\(\\tau\\) prior to generating the adjustment factor matrix \\(\\mathbf{A}\\) so that \\(\\tau\\) is likely to be large enough to prevent negative bootstrap weights. If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used in estimating the variance with the bootstrap replicates, which becomes \\(\\frac{\\tau^2}{B}\\) instead of \\(\\frac{1}{B}\\). $$ \\textbf{Prior to rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{1}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2 $$ $$ \\textbf{After rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{\\tau^2}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{S*(b)}-\\hat{T}_y\\right)^2 $$ When sharing a dataset that uses rescaled weights from a generalized survey bootstrap, the documentation for the dataset should instruct the user to use the replication scale factor \\(\\frac{\\tau^2}{B}\\) rather than \\(\\frac{1}{B}\\) when estimating sampling variances.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"two-phase-designs","dir":"Reference","previous_headings":"","what":"Two-Phase Designs","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"For a two-phase design, variance_estimator should be a list of the names of variance estimators, with two elements, such as list('Ultimate Cluster', 'Poisson Horvitz-Thompson'). 
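The following minimal sketch illustrates the adjustment-factor simulation and rescaling described in the two sections above; Sigma here is a small made-up positive semidefinite matrix, and the code is illustrative only (as_gen_boot_design() handles all of this internally).
# Illustrative sketch: draw B adjustment-factor vectors a ~ MVN(1_n, Sigma)
library(MASS)
set.seed(1)
Sigma <- diag(0.5, nrow = 4)         # made-up 4 x 4 quadratic form matrix
B <- 500
A <- t(MASS::mvrnorm(n = B, mu = rep(1, 4), Sigma = Sigma))  # n x B factor matrix
# Rescale to eliminate negative adjustment factors, tracking the constant tau
tau <- max(1, 1 - min(A))
A_rescaled <- (A + tau - 1) / tau
min(A_rescaled)  # all rescaled factors are nonnegative
# Variance estimates computed from the rescaled factors must use the
# replication scale factor tau^2 / B instead of 1 / B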
{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"two-phase-designs","dir":"Reference","previous_headings":"","what":"Two-Phase Designs","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"For a two-phase design, variance_estimator should be a list of variance estimators' names, with two elements, such as list('Ultimate Cluster', 'Poisson Horvitz-Thompson'). For two-phase designs, the following estimators may be used for the second phase: \"Ultimate Cluster\", \"Stratified Multistage SRS\", and \"Poisson Horvitz-Thompson\". For statistical details on the handling of two-phase designs, see the documentation for make_twophase_quad_form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"The generalized survey bootstrap was first proposed by Bertail and Combris (1997). See Beaumont and Patak (2012) for a clear overview of the generalized survey bootstrap. The generalized survey bootstrap represents one strategy for forming replication variance estimators in the general framework proposed by Fay (1984) and Dippo, Fay, and Morganstein (1984). - Ash, S. (2014). \"Using successive difference replication for estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59. - Bellhouse, D.R. (1985). \"Computing Methods for Variance Estimation in Complex Surveys.\" Journal of Official Statistics, Vol. 1, No. 3. - Beaumont, Jean-François, and Zdenek Patak. 2012. “The Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling: Generalized Bootstrap for Sample Surveys.” International Statistical Review 80 (1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x. - Bertail, and Combris. 1997. “Bootstrap Généralisé d'un Sondage.” Annales d'Économie Et de Statistique, No. 46: 49. https://doi.org/10.2307/20076068. - Deville, J.‐C., and Tillé, Y. (2005). \"Variance approximation under balanced sampling.\" Journal of Statistical Planning and Inference, 128, 569–591. - Dippo, Cathryn, Robert Fay, and David Morganstein. 1984. “Computing Variances from Complex Samples with Replicate Weights.” In, 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf. - Fay, Robert. 1984. “Some Properties of Estimates of Variance Based on Replication Methods.” In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_095.pdf. - Matei, Alina, and Yves Tillé. (2005). “Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size.” Journal of Official Statistics, 21(4):543–70.","code":""},
{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"","code":"if (FALSE) { library(survey) # Example 1: Bootstrap based on the Yates-Grundy estimator ---- set.seed(2014) data('election', package = 'survey') ## Create survey design object pps_design_yg <- svydesign( data = election_pps, id = ~1, fpc = ~p, pps = ppsmat(election_jointprob), variance = \"YG\" ) ## Convert to generalized bootstrap replicate design gen_boot_design_yg <- pps_design_yg |> as_gen_boot_design(variance_estimator = \"Yates-Grundy\", replicates = 1000, tau = \"auto\") svytotal(x = ~ Bush + Kerry, design = pps_design_yg) svytotal(x = ~ Bush + Kerry, design = gen_boot_design_yg) # Example 2: Bootstrap based on the successive-difference estimator ---- data('library_stsys_sample', package = 'svrep') ## First, ensure data are sorted in same order as was used in sampling library_stsys_sample <- library_stsys_sample[ order(library_stsys_sample$SAMPLING_SORT_ORDER), ] ## Create a survey design object design_obj <- svydesign( data = library_stsys_sample, strata = ~ SAMPLING_STRATUM, ids = ~ 1, fpc = ~ STRATUM_POP_SIZE ) ## Convert to generalized bootstrap replicate design gen_boot_design_sd2 <- as_gen_boot_design( design = design_obj, variance_estimator = \"SD2\", replicates = 2000 ) ## Estimate sampling variances svytotal(x = ~ TOTSTAFF, na.rm = TRUE, design = gen_boot_design_sd2) svytotal(x = ~ TOTSTAFF, na.rm = TRUE, design = design_obj) # Example 3: Two-phase sample ---- # -- First stage is stratified systematic sampling, # -- second stage is response/nonresponse modeled as Poisson sampling nonresponse_model <- glm( data = library_stsys_sample, family = quasibinomial('logit'), formula = I(RESPONSE_STATUS == \"Survey Respondent\") ~ 1, weights = 1/library_stsys_sample$SAMPLING_PROB ) library_stsys_sample[['RESPONSE_PROPENSITY']] <- predict( nonresponse_model, newdata = library_stsys_sample, type = \"response\" ) twophase_design <- twophase( data = library_stsys_sample, # Identify cases included in second phase sample subset = ~ I(RESPONSE_STATUS == \"Survey Respondent\"), strata = list(~ SAMPLING_STRATUM, NULL), id = list(~ 1, ~ 1), probs = list(NULL, ~ RESPONSE_PROPENSITY), fpc = list(~ STRATUM_POP_SIZE, NULL) ) twophase_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( \"SD2\", \"Poisson Horvitz-Thompson\" ) ) svytotal(x = ~ LIBRARIA, design = twophase_boot_design) }"},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"Forms a specified number of jackknife replicates based on grouping primary sampling units (PSUs) into random, (approximately) equal-sized groups.","code":""},
{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"","code":"as_random_group_jackknife_design( design, replicates = 50, var_strat = NULL, var_strat_frac = NULL, sort_var = NULL, adj_method = \"variance-stratum-psus\", scale_method = \"variance-stratum-psus\", group_var_name = \".random_group\", compress = TRUE, mse = getOption(\"survey.replicates.mse\") )"},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"design A survey design object created using the 'survey' (or 'srvyr') package, with class 'survey.design' or 'svyimputationList'. replicates The number of replicates to create for each variance stratum. The total number of replicates created is the number of variance strata times replicates. Every design stratum must have at least as many primary sampling units (PSUs) as replicates. var_strat Specifies the name of a variable in the data that defines variance strata to use for the grouped jackknife. If var_strat = NULL, then there is effectively only one variance stratum. var_strat_frac Specifies the sampling fraction to use for finite population corrections in each value of var_strat. Can use either a single number or a variable in the data corresponding to var_strat. sort_var (Optional) Specifies the name of a variable in the data which should be used to sort the data before assigning random groups. If a variable is specified for var_strat, the sorting will happen within values of that variable. adj_method Specifies how to calculate the replicate weight adjustment factor. Available options for adj_method include: \"variance-stratum-psus\" (the default) The replicate weight adjustment for a unit is based on the number of PSUs in its variance stratum. \"variance-units\" The replicate weight adjustment for a unit is based on the number of variance units in its variance stratum. See the section \"Adjustment and Scale Methods\" for details. scale_method Specifies how to calculate the scale factor for each replicate. Available options for scale_method include: \"variance-stratum-psus\" The scale factor for a variance unit is based on its number of PSUs compared to the number of PSUs in its variance stratum. \"variance-units\" The scale factor for a variance unit is based on the number of variance units in its variance stratum. See the section \"Adjustment and Scale Methods\" for details. group_var_name (Optional) The name of a new variable created to save identifiers for which random group each PSU was grouped into for the purpose of forming replicates. Specify group_var_name = NULL to avoid creating the variable in the data. compress Use a compressed representation of the replicate weights matrix. This reduces the computer memory required to represent the replicate weights and has no impact on estimates. mse If TRUE, compute variances from sums of squares around the point estimate from the full-sample weights; if FALSE, compute variances from sums of squares around the mean estimate from the replicate weights.","code":""},
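As a quick illustration of how these arguments combine (a hypothetical sketch using the 'survey' package's api data; the argument values are purely illustrative, not recommendations):

# Hypothetical GJ2-style jackknife: 10 replicates, with the "GJ2"
# combination of adjustment and scale methods discussed below
library(survey)
data('api', package = 'survey')
api_strat_design <- svydesign(data = apistrat, id = ~ 1,
                              strata = ~ stype, weights = ~ pw)
jk_gj2 <- as_random_group_jackknife_design(
  api_strat_design,
  replicates     = 10,
  adj_method     = "variance-stratum-psus",
  scale_method   = "variance-units",
  group_var_name = ".random_group"   # save the group assignments
)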
{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"A replicate design object, with class svyrep.design, which can be used with the usual functions, such as svymean() or svyglm(). Use weights(..., type = 'analysis') to extract the matrix of replicate weights. Use as_data_frame_with_weights() to convert the design object to a data frame with columns for the full-sample and replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"formation-of-random-groups","dir":"Reference","previous_headings":"","what":"Formation of Random Groups","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"Within each value of VAR_STRAT, the data are sorted by first-stage sampling strata, and then the PSUs in each stratum are randomly arranged. Groups are then formed by serially placing PSUs into each group. The first PSU in the VAR_STRAT is placed into the first group, the second PSU into the second group, and so on. Once a PSU has been assigned to the last group, the process begins again by assigning the next PSU to the first group, the PSU after that to the second group, and so on. The random group that an observation is assigned to can be saved as a variable in the data by using the function argument group_var_name.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"adjustment-and-scale-methods","dir":"Reference","previous_headings":"","what":"Adjustment and Scale Methods","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"The jackknife replication variance estimator based on \\(R\\) replicates takes the following form: $$ v(\\hat{\\theta}) = \\sum_{r=1}^{R} (1 - f_r) \\times c_r \\times \\left(\\hat{\\theta}_r - \\hat{\\theta}\\right)^2 $$ where \\(r\\) indexes one of the \\(R\\) sets of replicate weights, \\(c_r\\) is a corresponding scale factor for the \\(r\\)-th replicate, and \\(1 - f_r\\) is an optional finite population correction factor that can potentially differ across variance strata. To form the replicate weights, the PSUs are divided into \\(\\tilde{H}\\) variance strata, and the \\(\\tilde{h}\\)-th variance stratum contains \\(G_{\\tilde{h}}\\) random groups. The number of replicates \\(R\\) equals the total number of random groups across all variance strata: \\(R = \\sum_{\\tilde{h}}^{\\tilde{H}} G_{\\tilde{h}}\\). In other words, each replicate corresponds to one of the random groups from one of the variance strata. The weights for replicate \\(r\\), corresponding to random group \\(g\\) within variance stratum \\(\\tilde{h}\\), are defined as follows. If case \\(i\\) is not in variance stratum \\(\\tilde{h}\\), then \\(w_{i}^{(r)} = w_i\\). If case \\(i\\) is in variance stratum \\(\\tilde{h}\\) but not in random group \\(g\\), then \\(w_{i}^{(r)} = a_{\\tilde{h}g} w_i\\). Otherwise, if case \\(i\\) is in random group \\(g\\) of variance stratum \\(\\tilde{h}\\), then \\(w_{i}^{(r)} = 0\\). The R function argument adj_method determines how the adjustment factor \\(a_{\\tilde{h} g}\\) is calculated. When adj_method = \"variance-units\", then \\(a_{\\tilde{h} g}\\) is calculated based on \\(G_{\\tilde{h}}\\), the number of random groups in variance stratum \\(\\tilde{h}\\). When adj_method = \"variance-stratum-psus\", then \\(a_{\\tilde{h} g}\\) is calculated based on \\(n_{\\tilde{h}g}\\), the number of PSUs in random group \\(g\\) in variance stratum \\(\\tilde{h}\\), as well as \\(n_{\\tilde{h}}\\), the total number of PSUs in variance stratum \\(\\tilde{h}\\). When adj_method = \"variance-units\", then: $$a_{\\tilde{h}g} = \\frac{G_{\\tilde{h}}}{G_{\\tilde{h}} - 1}$$ When adj_method = \"variance-stratum-psus\", then: $$a_{\\tilde{h}g} = \\frac{n_{\\tilde{h}}}{n_{\\tilde{h}} - n_{\\tilde{h}g}}$$ The scale factor \\(c_r\\) for replicate \\(r\\), corresponding to random group \\(g\\) within variance stratum \\(\\tilde{h}\\), is calculated according to the function argument scale_method. When scale_method = \"variance-units\", then: $$c_r = \\frac{G_{\\tilde{h}} - 1}{G_{\\tilde{h}}}$$ When scale_method = \"variance-stratum-psus\", then: $$c_r = \\frac{n_{\\tilde{h}} - n_{\\tilde{h}g}}{n_{\\tilde{h}}}$$ The sampling fraction \\(f_r\\) used for the finite population correction \\(1 - f_r\\) is by default assumed to equal 0. However, the user can supply a sampling fraction for each variance stratum using the argument var_strat_frac. When variance units in a variance stratum have differing numbers of PSUs, the combination of adj_method = \"variance-stratum-psus\" and scale_method = \"variance-units\" is recommended by Valliant, Brick, and Dever (2008), corresponding to their method \"GJ2\". The random-groups jackknife method often referred to as \"DAGJK\" corresponds to the options var_strat = NULL, adj_method = \"variance-units\", and scale_method = \"variance-units\". The DAGJK method will yield upwardly-biased variance estimates for totals if the total number of PSUs is not a multiple of the total number of replicates (Valliant, Brick, Dever 2008).","code":""},
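To make the adjustment and scale formulas above concrete, here is a toy computation (illustrative only) for one variance stratum with \(n_{\tilde{h}} = 10\) PSUs split into \(G_{\tilde{h}} = 3\) random groups of sizes 4, 3, and 3:

# Toy check of the adjustment and scale factors defined above
n_h  <- 10              # PSUs in the variance stratum
n_hg <- c(4, 3, 3)      # PSUs in each random group
G_h  <- length(n_hg)    # number of random groups
a_units <- rep(G_h / (G_h - 1), G_h)   # adj_method = "variance-units"
a_psus  <- n_h / (n_h - n_hg)          # adj_method = "variance-stratum-psus"
c_units <- rep((G_h - 1) / G_h, G_h)   # scale_method = "variance-units"
c_psus  <- (n_h - n_hg) / n_h          # scale_method = "variance-stratum-psus"
rbind(a_units, a_psus, c_units, c_psus)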
scale_method = \"variance-units\", : $$c_r = \\frac{G_{\\tilde{h}} - 1}{G_{\\tilde{h}}}$$ scale_method = \"variance-stratum-psus\", : $$c_r = \\frac{n_{\\tilde{h}} - n_{\\tilde{h}g}}{n_{\\tilde{h}}}$$ sampling fraction \\(f_r\\) used finite population correction \\(1 - f_r\\) default assumed equal 0. However, user can supply sampling fraction variance stratum using argument var_strat_frac. variance units variance stratum differing numbers PSUs, combination adj_method = \"variance-stratum-psus\" scale_method = \"variance-units\" recommended Valliant, Brick, Dever (2008), corresponding method \"GJ2\". random-groups jackknife method often referred \"DAGJK\" corresponds options var_strat = NULL, adj_method = \"variance-units\", scale_method = \"variance-units\". DAGJK method yield upwardly-biased variance estimates totals total number PSUs multiple total number replicates (Valliant, Brick, Dever 2008).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"See Section 15.5 Valliant, Dever, Kreuter (2018) introduction grouped jackknife guidelines creating random groups. - Valliant, R., Dever, J., Kreuter, F. (2018). \"Practical Tools Designing Weighting Survey Samples, 2nd edition.\" New York: Springer. See Valliant, Brick, Dever (2008) statistical details related adj_method scale_method arguments. - Valliant, Richard, Michael Brick, Jill Dever. 2008. \"Weight Adjustments Grouped Jackknife Variance Estimator.\" Journal Official Statistics. 24: 469–88. See Chapter 4 Wolter (2007) additional details jackknife, including method based random groups. - Wolter, Kirk. 2007. \"Introduction Variance Estimation.\" New York, NY: Springer New York. https://doi.org/10.1007/978-0-387-35099-8.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"","code":"library(survey) # Load example data data('api', package = 'survey') api_strat_design <- svydesign( data = apistrat, id = ~ 1, strata = ~stype, weights = ~pw ) # Create a random-groups jackknife design jk_design <- as_random_group_jackknife_design( api_strat_design, replicates = 15 ) print(jk_design) #> Call: as_random_group_jackknife_design(api_strat_design, replicates = 15) #> with 15 replicates."},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":null,"dir":"Reference","previous_headings":"","what":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"Calibrate weights primary survey match estimated totals control survey, using adjustments replicate weights account variance estimated control totals. adjustments replicate weights conducted using method proposed Fuller (1998). 
{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":null,"dir":"Reference","previous_headings":"","what":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"Calibrate the weights of a primary survey to match estimated totals from a control survey, using adjustments to the replicate weights to account for the variance of the estimated control totals. The adjustments to the replicate weights are conducted using the method proposed by Fuller (1998). This method can be used to implement general calibration as well as post-stratification or raking specifically (see the details for the calfun parameter).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"","code":"calibrate_to_estimate( rep_design, estimate, vcov_estimate, cal_formula, calfun = survey::cal.linear, bounds = list(lower = -Inf, upper = Inf), verbose = FALSE, maxit = 50, epsilon = 1e-07, variance = NULL, col_selection = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"rep_design A replicate design object for the primary survey, created with either the survey or srvyr packages. estimate A vector of estimated control totals. The names of the entries must match the names from calling svytotal(x = cal_formula, design = rep_design). vcov_estimate A variance-covariance matrix for the estimated control totals. The column names and row names must match the names of estimate. cal_formula A formula listing the variables to use for calibration. All of these variables must be included in rep_design. calfun A calibration function from the survey package, such as cal.linear, cal.raking, or cal.logit. Use cal.linear for ordinary post-stratification, and cal.raking for raking. See calibrate for additional details. bounds Parameter passed to grake for calibration. See calibrate for details. verbose Parameter passed to grake for calibration. See calibrate for details. maxit Parameter passed to grake for calibration. See calibrate for details. epsilon Parameter passed to grake for calibration. After calibration, the absolute difference between each calibration target and the calibrated estimate should be no larger than epsilon times (1 plus the absolute value of the target). See calibrate for details. variance Parameter passed to grake for calibration. See calibrate for details. col_selection Optional parameter to determine which replicate columns will have their control totals perturbed. If supplied, col_selection must be an integer vector with length equal to the length of estimate.","code":""},
{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"A replicate design object, with the full-sample weights calibrated to the totals from estimate, and the replicate weights adjusted to account for the variance of the control totals. The element col_selection indicates, for each replicate column of the calibrated primary survey, which column of replicate weights it was matched to from the control survey.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"With the Fuller method, k randomly-selected replicate columns from the primary survey are calibrated to control totals formed by perturbing the k-dimensional vector of estimated control totals, using a spectral decomposition of the variance-covariance matrix of the estimated control totals. The other replicate columns are simply calibrated to the unperturbed control totals. Because the set of replicate columns whose control totals are perturbed should be random, there are multiple ways to ensure that this matching is reproducible. The user can either call set.seed before using the function, or supply a vector of randomly-selected column indices to the argument col_selection.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"syntax-for-common-types-of-calibration","dir":"Reference","previous_headings":"","what":"Syntax for Common Types of Calibration","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"For ratio estimation with an auxiliary variable X, use the following options: - cal_formula = ~ -1 + X - variance = 1, - cal.fun = survey::cal.linear For post-stratification, use the following option: - cal.fun = survey::cal.linear For raking, use the following option: - cal.fun = survey::cal.raking","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"Fuller, W.A. (1998). \"Replication variance estimation for two-phase samples.\" Statistica Sinica, 8: 1153-1164. Opsomer, J.D. and A. Erciulescu (2021). \"Replication variance estimation after sample-based calibration.\" Survey Methodology, 47: 265-277.","code":""},
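As a conceptual sketch of the perturbation step described in the Details above (not the package's exact internals; the totals, variance-covariance matrix, and perturbation scale are made up), one way to perturb a vector of control totals using the spectral decomposition of its variance-covariance matrix is:

# Conceptual sketch: perturb estimated control totals using the
# spectral decomposition of their variance-covariance matrix
estimate <- c(total_A = 4421, total_B = 3811472)  # hypothetical control totals
vcov_est <- matrix(c(250,  1000,
                     1000, 9e6), nrow = 2)        # hypothetical vcov matrix
eig <- eigen(vcov_est, symmetric = TRUE)
# One perturbation direction per eigenvector, scaled by sqrt(eigenvalue)
perturbations <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)))
perturbed_totals <- estimate + perturbations[, 1] # perturb along first direction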
\"Replication variance estimation sample-based calibration.\" Survey Methodology, 47: 265-277.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"","code":"if (FALSE) { # Load example data for primary survey ---- suppressPackageStartupMessages(library(survey)) data(api) primary_survey <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) |> as.svrepdesign(type = \"JK1\") # Load example data for control survey ---- control_survey <- svydesign(id = ~ 1, fpc = ~fpc, data = apisrs) |> as.svrepdesign(type = \"JK1\") # Estimate control totals ---- estimated_controls <- svytotal(x = ~ stype + enroll, design = control_survey) control_point_estimates <- coef(estimated_controls) control_vcov_estimate <- vcov(estimated_controls) # Calibrate totals for one categorical variable and one numeric ---- calibrated_rep_design <- calibrate_to_estimate( rep_design = primary_survey, estimate = control_point_estimates, vcov_estimate = control_vcov_estimate, cal_formula = ~ stype + enroll ) # Inspect estimates before and after calibration ---- ##_ For the calibration variables, estimates and standard errors ##_ from calibrated design will match those of the control survey svytotal(x = ~ stype + enroll, design = primary_survey) svytotal(x = ~ stype + enroll, design = control_survey) svytotal(x = ~ stype + enroll, design = calibrated_rep_design) ##_ Estimates from other variables will be changed as well svymean(x = ~ api00 + api99, design = primary_survey) svymean(x = ~ api00 + api99, design = control_survey) svymean(x = ~ api00 + api99, design = calibrated_rep_design) # Inspect weights before and after calibration ---- summarize_rep_weights(primary_survey, type = 'overall') summarize_rep_weights(calibrated_rep_design, type = 'overall') # For reproducibility, specify which columns are randomly selected for Fuller method ---- column_selection <- calibrated_rep_design$col_selection print(column_selection) calibrated_rep_design <- calibrate_to_estimate( rep_design = primary_survey, estimate = control_point_estimates, vcov_estimate = control_vcov_estimate, cal_formula = ~ stype + enroll, col_selection = column_selection ) }"},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":null,"dir":"Reference","previous_headings":"","what":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"Calibrate weights primary survey match estimated totals control survey, using adjustments replicate weights account variance estimated control totals. adjustments replicate weights conducted using method proposed Opsomer Erciulescu (2021). 
{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":null,"dir":"Reference","previous_headings":"","what":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"Calibrate the weights of a primary survey to match estimated totals from a control survey, using adjustments to the replicate weights to account for the variance of the estimated control totals. The adjustments to replicate weights are conducted using the method proposed by Opsomer and Erciulescu (2021). This method can be used to implement general calibration as well as post-stratification or raking specifically (see the details for the calfun parameter).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"","code":"calibrate_to_sample( primary_rep_design, control_rep_design, cal_formula, calfun = survey::cal.linear, bounds = list(lower = -Inf, upper = Inf), verbose = FALSE, maxit = 50, epsilon = 1e-07, variance = NULL, control_col_matches = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"primary_rep_design A replicate design object for the primary survey, created with either the survey or srvyr packages. control_rep_design A replicate design object for the control survey. cal_formula A formula listing the variables to use for calibration. All of these variables must be included in primary_rep_design and control_rep_design. calfun A calibration function from the survey package, such as cal.linear, cal.raking, or cal.logit. Use cal.linear for ordinary post-stratification, and cal.raking for raking. See calibrate for additional details. bounds Parameter passed to grake for calibration. See calibrate for details. verbose Parameter passed to grake for calibration. See calibrate for details. maxit Parameter passed to grake for calibration. See calibrate for details. epsilon Parameter passed to grake for calibration. After calibration, the absolute difference between each calibration target and the calibrated estimate should be no larger than epsilon times (1 plus the absolute value of the target). See calibrate for details. variance Parameter passed to grake for calibration. See calibrate for details. control_col_matches Optional parameter to specify which control survey replicate is matched to each primary survey replicate. If the \\(i\\)-th entry of control_col_matches equals \\(k\\), then replicate \\(i\\) in primary_rep_design is matched to replicate \\(k\\) in control_rep_design. Entries of NA denote a primary survey replicate not matched to any control survey replicate. If this parameter is not used, the matching is done at random.","code":""},
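As a small illustration of the control_col_matches encoding (hypothetical values for a primary survey with 4 replicates and a control survey with 3 replicates):

# Hypothetical matching vector:
control_col_matches <- c(3, 1, NA, 2)
# Primary replicate 1 is matched to control replicate 3,
# primary replicate 2 to control replicate 1,
# primary replicate 3 to no control replicate (NA),
# and primary replicate 4 to control replicate 2.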
{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"A replicate design object, with the full-sample weights calibrated to the totals from control_rep_design, and the replicate weights adjusted to account for the variance of the control totals. If primary_rep_design had fewer columns of replicate weights than control_rep_design, then the number of replicate columns and the length of rscales will be increased by a multiple k, and the scale will be updated by dividing by k. The element control_column_matches indicates, for each replicate column of the calibrated primary survey, which column of replicate weights it was matched to from the control survey. Columns which were not matched to a control survey replicate column are indicated by NA. The element degf will be set to match that of the primary survey, to ensure that the degrees of freedom are not erroneously inflated by any potential increase in the number of columns of replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"With the Opsomer-Erciulescu method, each column of replicate weights from the control survey is randomly matched to a column of replicate weights from the primary survey, and then the column from the primary survey is calibrated to control totals formed by perturbing the control sample's full-sample estimates, using the estimates from the matched column of replicate weights from the control survey. If there are fewer columns of replicate weights in the control survey than in the primary survey, then not all primary replicate columns will be matched to a replicate column from the control survey. If there are more columns of replicate weights in the control survey than in the primary survey, then the columns of replicate weights in the primary survey will be duplicated k times, where k is the smallest positive integer such that the resulting number of columns of replicate weights for the primary survey is greater than or equal to the number of columns of replicate weights in the control survey. Because the replicate columns of the control survey are matched at random to primary survey replicate columns, there are multiple ways to ensure that the matching is reproducible. The user can either call set.seed before using the function, or supply a mapping to the argument control_col_matches.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"syntax-for-common-types-of-calibration","dir":"Reference","previous_headings":"","what":"Syntax for Common Types of Calibration","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"For ratio estimation with an auxiliary variable X, use the following options: - cal_formula = ~ -1 + X - variance = 1, - cal.fun = survey::cal.linear For post-stratification, use the following option: - cal.fun = survey::cal.linear For raking, use the following option: - cal.fun = survey::cal.raking","code":""},
\"Replication variance estimation sample-based calibration.\" Survey Methodology, 47: 265-277.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"","code":"if (FALSE) { # Load example data for primary survey ---- suppressPackageStartupMessages(library(survey)) data(api) primary_survey <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) |> as.svrepdesign(type = \"JK1\") # Load example data for control survey ---- control_survey <- svydesign(id = ~ 1, fpc = ~fpc, data = apisrs) |> as.svrepdesign(type = \"JK1\") # Calibrate totals for one categorical variable and one numeric ---- calibrated_rep_design <- calibrate_to_sample( primary_rep_design = primary_survey, control_rep_design = control_survey, cal_formula = ~ stype + enroll, ) # Inspect estimates before and after calibration ---- ##_ For the calibration variables, estimates and standard errors ##_ from calibrated design will match those of the control survey svytotal(x = ~ stype + enroll, design = primary_survey) svytotal(x = ~ stype + enroll, design = control_survey) svytotal(x = ~ stype + enroll, design = calibrated_rep_design) ##_ Estimates from other variables will be changed as well svymean(x = ~ api00 + api99, design = primary_survey) svymean(x = ~ api00 + api99, design = control_survey) svymean(x = ~ api00 + api99, design = calibrated_rep_design) # Inspect weights before and after calibration ---- summarize_rep_weights(primary_survey, type = 'overall') summarize_rep_weights(calibrated_rep_design, type = 'overall') # For reproducibility, specify how to match replicates between surveys ---- column_matching <- calibrated_rep_design$control_col_matches print(column_matching) calibrated_rep_design <- calibrate_to_sample( primary_rep_design = primary_survey, control_rep_design = control_survey, cal_formula = ~ stype + enroll, control_col_matches = column_matching ) }"},{"path":"https://bschneidr.github.io/svrep/reference/compress_design.html","id":null,"dir":"Reference","previous_headings":"","what":"Produce a compressed representation of a survey design object — compress_design","title":"Produce a compressed representation of a survey design object — compress_design","text":"Produce compressed representation survey design object","code":""},{"path":"https://bschneidr.github.io/svrep/reference/compress_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Produce a compressed representation of a survey design object — compress_design","text":"","code":"compress_design(design, vars_to_keep = NULL)"},{"path":"https://bschneidr.github.io/svrep/reference/compress_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Produce a compressed representation of a survey design object — compress_design","text":"design survey design object vars_to_keep (Optional) character vector variables design keep compressed design. default, none variables retained.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/compress_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Produce a compressed representation of a survey design object — compress_design","text":"list two elements. 
{"path":"https://bschneidr.github.io/svrep/reference/compress_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Produce a compressed representation of a survey design object — compress_design","text":"A list with two elements. The design_subset element contains a design object with the minimal rows needed to represent the survey design. The index element links each row of the original design to a row of design_subset, so that the design can be \"uncompressed.\"","code":""},{"path":"https://bschneidr.github.io/svrep/reference/distribute_matrix_across_clusters.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","title":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","text":"Turns a cluster-level matrix into an element-level matrix by suitably duplicating rows or columns of the matrix.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/distribute_matrix_across_clusters.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","text":"","code":"distribute_matrix_across_clusters( cluster_level_matrix, cluster_ids, rows = TRUE, cols = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/distribute_matrix_across_clusters.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","text":"cluster_level_matrix A square matrix, whose number of rows/columns matches the number of clusters. cluster_ids A vector of cluster identifiers. If rows=TRUE, the number of unique elements of cluster_ids must match the number of rows of cluster_level_matrix. If cols=TRUE, the number of unique elements of cluster_ids must match the number of columns of cluster_level_matrix. rows Whether to duplicate the rows of the cluster_level_matrix for elements in the same cluster. cols Whether to duplicate the columns of the cluster_level_matrix for elements in the same cluster.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/distribute_matrix_across_clusters.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","text":"The input cluster_level_matrix with its rows/columns duplicated, such that the number of rows (if rows=TRUE) or columns (if cols=TRUE) equals the length of cluster_ids.","code":""},
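A small sketch of what this duplication does, with toy inputs (the expected shape of the result follows from the Value description above):

# Toy example: a 2x2 cluster-level matrix expanded to the element level
cluster_level_matrix <- matrix(c(1, 2,
                                 3, 4), nrow = 2, byrow = TRUE)
cluster_ids <- c("a", "a", "b")   # two elements in cluster "a", one in "b"
distribute_matrix_across_clusters(
  cluster_level_matrix = cluster_level_matrix,
  cluster_ids = cluster_ids
)
# Expected result: a 3x3 matrix in which the row and column
# corresponding to cluster "a" each appear twice.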
{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"This function estimates the number of bootstrap replicates needed to reduce the simulation error of a bootstrap variance estimator to a target level, where \"simulation error\" is defined as error caused by using only a finite number of bootstrap replicates, and the amount of simulation error is measured as a simulation coefficient of variation (\"simulation CV\").","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"","code":"estimate_boot_reps_for_target_cv(svrepstat, target_cv = 0.05)"},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"svrepstat An estimate obtained from a bootstrap replicate survey design object, with a function such as svymean(..., return.replicates = TRUE) or withReplicates(..., return.replicates = TRUE). target_cv A numeric value (or vector of numeric values) between 0 and 1. This is the target simulation CV for the bootstrap variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"A data frame with one row for each value of target_cv. The column TARGET_CV gives the target coefficient of variation. The column MAX_REPS gives the maximum number of replicates needed for all of the statistics included in svrepstat. The remaining columns give the number of replicates needed for each statistic.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"suggested-usage","dir":"Reference","previous_headings":"","what":"Suggested Usage","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"- Step 1: Determine the largest acceptable level of simulation error for key survey estimates, where the level of simulation error is measured in terms of the simulation CV. We refer to this as the \"target CV.\" A conventional value for the target CV is 5%. - Step 2: Estimate key statistics of interest using a large number of bootstrap replicates (such as 5,000) and save the estimates from each bootstrap replicate. This can be conveniently done using a function from the survey package such as svymean(..., return.replicates = TRUE) or withReplicates(..., return.replicates = TRUE). - Step 3: Use the function estimate_boot_reps_for_target_cv() to estimate the minimum number of bootstrap replicates needed to attain the target CV.","code":""},
{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"Unlike other replication methods such as the jackknife or balanced repeated replication, a bootstrap variance estimator's precision can always be improved by using a larger number of replicates, as the use of only a finite number of bootstrap replicates introduces simulation error into the variance estimation process. Simulation error can be measured as a \"simulation coefficient of variation\" (CV), which is the ratio of the standard error of a bootstrap estimator to the expectation of that bootstrap estimator, where the expectation and standard error are evaluated with respect to the bootstrapping process given the selected sample. For a statistic \\(\\hat{\\theta}\\), the simulation CV of the bootstrap variance estimator \\(v_{B}(\\hat{\\theta})\\) based on \\(B\\) replicate estimates \\(\\hat{\\theta}^{\\star}_1,\\dots,\\hat{\\theta}^{\\star}_B\\) is defined as follows: $$ CV_{\\star}(v_{B}(\\hat{\\theta})) = \\frac{\\sqrt{var_{\\star}(v_B(\\hat{\\theta}))}}{E_{\\star}(v_B(\\hat{\\theta}))} = \\frac{CV_{\\star}(E_2)}{\\sqrt{B}} $$ where $$ E_2 = (\\hat{\\theta}^{\\star} - \\hat{\\theta})^2 $$ $$ CV_{\\star}(E_2) = \\frac{\\sqrt{var_{\\star}(E_2)}}{E_{\\star}(E_2)} $$ and \\(var_{\\star}\\) and \\(E_{\\star}\\) are evaluated with respect to the bootstrapping process, given the selected sample. The simulation CV, denoted \\(CV_{\\star}(v_{B}(\\hat{\\theta}))\\), is estimated for a given number of replicates \\(B\\) by estimating \\(CV_{\\star}(E_2)\\) using observed values and dividing this by \\(\\sqrt{B}\\). If the bootstrap errors are assumed to be normally distributed, then \\(CV_{\\star}(E_2)=\\sqrt{2}\\), and so \\(CV_{\\star}(v_{B}(\\hat{\\theta}))\\) would not need to be estimated. Using observed replicate estimates to estimate the simulation CV, instead of assuming normality, allows the simulation CV to be used for a wide array of bootstrap methods.","code":""},
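The relationship \(CV_{\star}(v_B) = CV_{\star}(E_2)/\sqrt{B}\) directly implies the number of replicates needed for a target simulation CV, which can be sketched in a couple of lines (the value of \(CV_{\star}(E_2)\) below is the one implied by the normality assumption, purely for illustration):

# Solving CV(E2)/sqrt(B) <= target_cv for B gives B >= (CV(E2)/target_cv)^2
cv_e2 <- sqrt(2)      # implied by assuming normally distributed bootstrap errors
target_cv <- 0.05     # conventional 5% target
B_needed <- ceiling((cv_e2 / target_cv)^2)
B_needed              # 800 replicates under the normality assumption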
{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"See Section 3.3 and Section 8 of Beaumont and Patak (2012) for details and an example where the simulation CV is used to determine the number of bootstrap replicates needed for various alternative bootstrap methods in an empirical illustration. Beaumont, J.-F. and Z. Patak. (2012), \"The Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling.\" International Statistical Review, 80: 127-148. doi:10.1111/j.1751-5823.2011.00166.x","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"","code":"if (FALSE) { set.seed(2022) # Create an example bootstrap survey design object ---- library(survey) data('api', package = 'survey') boot_design <- svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc) |> svrep::as_bootstrap_design(replicates = 5000) # Calculate estimates of interest and retain estimates from each replicate ---- estimated_means_and_proportions <- svymean(x = ~ api00 + api99 + stype, design = boot_design, return.replicates = TRUE) custom_statistic <- withReplicates(design = boot_design, return.replicates = TRUE, theta = function(wts, data) { numerator <- sum(data$api00 * wts) denominator <- sum(data$api99 * wts) statistic <- numerator/denominator return(statistic) }) # Determine minimum number of bootstrap replicates needed to obtain given simulation CVs ---- estimate_boot_reps_for_target_cv( svrepstat = estimated_means_and_proportions, target_cv = c(0.01, 0.05, 0.10) ) estimate_boot_reps_for_target_cv( svrepstat = custom_statistic, target_cv = c(0.01, 0.05, 0.10) ) }"},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"Estimates the bootstrap simulation error, expressed as a \"simulation coefficient of variation\" (CV).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"","code":"estimate_boot_sim_cv(svrepstat)"},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"svrepstat An estimate obtained from a bootstrap replicate survey design object, with a function such as svymean(..., return.replicates = TRUE) or withReplicates(..., return.replicates = TRUE).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"A data frame with one row for each statistic. The column STATISTIC gives the name of the statistic. The column SIMULATION_CV gives the estimated simulation CV of the statistic. The column N_REPLICATES gives the number of bootstrap replicates.","code":""},
{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"Unlike other replication methods such as the jackknife or balanced repeated replication, a bootstrap variance estimator's precision can always be improved by using a larger number of replicates, as the use of only a finite number of bootstrap replicates introduces simulation error into the variance estimation process. Simulation error can be measured as a \"simulation coefficient of variation\" (CV), which is the ratio of the standard error of a bootstrap estimator to the expectation of that bootstrap estimator, where the expectation and standard error are evaluated with respect to the bootstrapping process given the selected sample. For a statistic \\(\\hat{\\theta}\\), the simulation CV of the bootstrap variance estimator \\(v_{B}(\\hat{\\theta})\\) based on \\(B\\) replicate estimates \\(\\hat{\\theta}^{\\star}_1,\\dots,\\hat{\\theta}^{\\star}_B\\) is defined as follows: $$ CV_{\\star}(v_{B}(\\hat{\\theta})) = \\frac{\\sqrt{var_{\\star}(v_B(\\hat{\\theta}))}}{E_{\\star}(v_B(\\hat{\\theta}))} = \\frac{CV_{\\star}(E_2)}{\\sqrt{B}} $$ where $$ E_2 = (\\hat{\\theta}^{\\star} - \\hat{\\theta})^2 $$ $$ CV_{\\star}(E_2) = \\frac{\\sqrt{var_{\\star}(E_2)}}{E_{\\star}(E_2)} $$ and \\(var_{\\star}\\) and \\(E_{\\star}\\) are evaluated with respect to the bootstrapping process, given the selected sample. The simulation CV, denoted \\(CV_{\\star}(v_{B}(\\hat{\\theta}))\\), is estimated for a given number of replicates \\(B\\) by estimating \\(CV_{\\star}(E_2)\\) using observed values and dividing this by \\(\\sqrt{B}\\). If the bootstrap errors are assumed to be normally distributed, then \\(CV_{\\star}(E_2)=\\sqrt{2}\\), and so \\(CV_{\\star}(v_{B}(\\hat{\\theta}))\\) would not need to be estimated. Using observed replicate estimates to estimate the simulation CV, instead of assuming normality, allows the simulation CV to be used for a wide array of bootstrap methods.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"See Section 3.3 and Section 8 of Beaumont and Patak (2012) for details and an example where the simulation CV is used to determine the number of bootstrap replicates needed for various alternative bootstrap methods in an empirical illustration. Beaumont, J.-F. and Z. Patak. (2012), \"The Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling.\" International Statistical Review, 80: 127-148. doi:10.1111/j.1751-5823.2011.00166.x","code":""},
{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"","code":"if (FALSE) { set.seed(2022) # Create an example bootstrap survey design object ---- library(survey) data('api', package = 'survey') boot_design <- svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc) |> svrep::as_bootstrap_design(replicates = 5000) # Calculate estimates of interest and retain estimates from each replicate ---- estimated_means_and_proportions <- svymean(x = ~ api00 + api99 + stype, design = boot_design, return.replicates = TRUE) custom_statistic <- withReplicates(design = boot_design, return.replicates = TRUE, theta = function(wts, data) { numerator <- sum(data$api00 * wts) denominator <- sum(data$api99 * wts) statistic <- numerator/denominator return(statistic) }) # Estimate simulation CV of bootstrap estimates ---- estimate_boot_sim_cv( svrepstat = estimated_means_and_proportions ) estimate_boot_sim_cv( svrepstat = custom_statistic ) }"},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":null,"dir":"Reference","previous_headings":"","what":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"Determines the quadratic form matrix of a specified variance estimator, by parsing the information stored in a survey design object created using the 'survey' package.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"","code":"get_design_quad_form( design, variance_estimator, ensure_psd = FALSE, aux_var_names = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"design A survey design object created using the 'survey' (or 'srvyr') package, with class 'survey.design' or 'svyimputationList'. Also accepts two-phase design objects with class 'twophase2'; see the section titled \"Two-Phase Designs\" for more information about the handling of two-phase designs. variance_estimator The name of the variance estimator whose quadratic form matrix should be created. See the section \"Variance Estimators\" below. Options include: \"Yates-Grundy\": The Yates-Grundy variance estimator, based on first-order and second-order inclusion probabilities. \"Horvitz-Thompson\": The Horvitz-Thompson variance estimator, based on first-order and second-order inclusion probabilities. \"Poisson Horvitz-Thompson\": The Horvitz-Thompson variance estimator, based on assuming Poisson sampling with the specified first-order inclusion probabilities. \"Stratified Multistage SRS\": The usual stratified multistage variance estimator, based on estimating the variance of cluster totals within strata at each stage. \"Ultimate Cluster\": The usual variance estimator based on estimating the variance of first-stage cluster totals within first-stage strata. \"Deville-1\": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as \"Deville 1\".
\"Deville-2\": variance estimator unequal-probability sampling without replacement, described Matei Tillé (2005) \"Deville 2\". \"Deville-Tille\": variance estimator useful balanced sampling designs, proposed Deville Tillé (2005). \"SD1\": non-circular successive-differences variance estimator described Ash (2014), sometimes used variance estimation systematic sampling. \"SD2\": circular successive-differences variance estimator described Ash (2014). estimator basis \"successive-differences replication\" estimator commonly used variance estimation systematic sampling. ensure_psd TRUE (default), ensures result positive semidefinite matrix. necessary quadratic form used input replication methods generalized bootstrap. mathematical details, please see documentation function get_nearest_psd_matrix(). approximation method discussed Beaumont Patak (2012) context forming replicate weights two-phase samples. authors argue approximation lead small overestimation variance. aux_var_names required variance_estimator = \"Deville-Tille\". character vector variable names auxiliary variables used Breidt Chauvet (2011) variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"matrix representing quadratic form specified variance estimator, based extracting information clustering, stratification, selection probabilities survey design object.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"variance-estimators","dir":"Reference","previous_headings":"","what":"Variance Estimators","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"See variance-estimators description variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"two-phase-designs","dir":"Reference","previous_headings":"","what":"Two-Phase Designs","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"two-phase design, variance_estimator list variance estimators' names, two elements, list('Ultimate Cluster', 'Poisson Horvitz-Thompson'). two-phase designs, following estimators may used second phase: \"Ultimate Cluster\" \"Stratified Multistage SRS\" \"Poisson Horvitz-Thompson\" statistical details handling two-phase designs, see documentation make_twophase_quad_form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"- Ash, S. (2014). \"Using successive difference replication estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59. - Beaumont, Jean-François, Zdenek Patak. (2012). \"Generalized Bootstrap Sample Surveys Special Attention Poisson Sampling: Generalized Bootstrap Sample Surveys.\" International Statistical Review 80 (1): 127–48. - Bellhouse, D.R. (1985). \"Computing Methods Variance Estimation Complex Surveys.\" Journal Official Statistics, Vol.1, .3. - Deville, J.‐C., Tillé, Y. (2005). 
\"Variance approximation balanced sampling.\" Journal Statistical Planning Inference, 128, 569–591. - Särndal, C.-E., Swensson, B., & Wretman, J. (1992). \"Model Assisted Survey Sampling.\" Springer New York.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"","code":"if (FALSE) { # Example 1: Quadratic form for successive-difference variance estimator ---- data('library_stsys_sample', package = 'svrep') ## First, ensure data are sorted in same order as was used in sampling library_stsys_sample <- library_stsys_sample[ order(library_stsys_sample$SAMPLING_SORT_ORDER), ] ## Create a survey design object design_obj <- svydesign( data = library_stsys_sample, strata = ~ SAMPLING_STRATUM, ids = ~ 1, fpc = ~ STRATUM_POP_SIZE ) ## Obtain quadratic form quad_form_matrix <- get_design_quad_form( design = design_obj, variance_estimator = \"SD2\" ) ## Estimate variance of estimated population total y <- design_obj$variables$LIBRARIA wts <- weights(design_obj, type = 'sampling') y_wtd <- as.matrix(y) * wts y_wtd[is.na(y_wtd)] <- 0 pop_total <- sum(y_wtd) var_est <- t(y_wtd) %*% quad_form_matrix %*% y_wtd std_error <- sqrt(var_est) print(pop_total); print(std_error) # Compare to estimate from assuming SRS svytotal(x = ~ LIBRARIA, na.rm = TRUE, design = design_obj) # Example 2: Two-phase design (second phase is nonresponse) ---- ## Estimate response propensities, separately by stratum library_stsys_sample[['RESPONSE_PROB']] <- svyglm( design = design_obj, formula = I(RESPONSE_STATUS == \"Survey Respondent\") ~ SAMPLING_STRATUM, family = quasibinomial('logistic') ) |> predict(type = 'response') ## Create a survey design object, ## where nonresponse is treated as a second phase of sampling twophase_design <- twophase( data = library_stsys_sample, strata = list(~ SAMPLING_STRATUM, NULL), id = list(~ 1, ~ 1), fpc = list(~ STRATUM_POP_SIZE, NULL), probs = list(NULL, ~ RESPONSE_PROB), subset = ~ I(RESPONSE_STATUS == \"Survey Respondent\") ) ## Obtain quadratic form for the two-phase variance estimator, ## where first phase variance contribution estimated ## using the successive differences estimator ## and second phase variance contribution estimated ## using the Horvitz-Thompson estimator ## (with joint probabilities based on assumption of Poisson sampling) get_design_quad_form( design = twophase_design, variance_estimator = list( \"SD2\", \"Poisson Horvitz-Thompson\" ) ) }"},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"Approximates symmetric, real matrix nearest positive semidefinite matrix Frobenius norm, using method Higham (1988). real, symmetric matrix, equivalent \"zeroing \" negative eigenvalues. See \"Details\" section information.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. 
— get_nearest_psd_matrix","text":"","code":"get_nearest_psd_matrix(X)"},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"X symmetric, real matrix missing values.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"nearest positive semidefinite matrix dimension X.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"Let \\(\\) denote symmetric, real matrix positive semidefinite. can form spectral decomposition \\(=\\Gamma \\Lambda \\Gamma^{\\prime}\\), \\(\\Lambda\\) diagonal matrix whose entries eigenvalues \\(\\). method Higham (1988) approximate \\(\\) \\(\\tilde{} = \\Gamma \\Lambda_{+} \\Gamma^{\\prime}\\), \\(ii\\)-th entry \\(\\Lambda_{+}\\) \\(\\max(\\Lambda_{ii}, 0)\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"- Higham, N. J. (1988). \"Computing nearest symmetric positive semidefinite matrix.\" Linear Algebra Applications, 103, 103–118.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. 
— get_nearest_psd_matrix","text":"","code":"X <- matrix( c(2, 5, 5, 5, 2, 5, 5, 5, 2), nrow = 3, byrow = TRUE ) get_nearest_psd_matrix(X) #> [,1] [,2] [,3] #> [1,] 4 4 4 #> [2,] 4 4 4 #> [3,] 4 4 4"},{"path":"https://bschneidr.github.io/svrep/reference/getvars.html","id":null,"dir":"Reference","previous_headings":"","what":"Get variables from a database — getvars","title":"Get variables from a database — getvars","text":"database helper function copied 'survey' package","code":""},{"path":"https://bschneidr.github.io/svrep/reference/getvars.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get variables from a database — getvars","text":"","code":"getvars( formula, dbconnection, tables, db.only = TRUE, updates = NULL, subset = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/getvars.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get variables from a database — getvars","text":"formula Either formula character vector giving names variables dbconnection database connection tables Name(s) table(s) pull db.Unclear parameter inherited 'survey' package updates Updates potentially make subset Optional indices data subset returning result","code":""},{"path":"https://bschneidr.github.io/svrep/reference/getvars.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get variables from a database — getvars","text":"data frame","code":""},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","text":"Compute matrix joint inclusion probabilities quadratic form Horvitz-Thompson variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","text":"","code":"ht_matrix_to_joint_probs(ht_quad_form)"},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","text":"ht_quad_form matrix quadratic form representing Horvitz-Thompson variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","text":"matrix joint inclusion probabilities","code":""},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. 
— ht_matrix_to_joint_probs","text":"quadratic form matrix Horvitz-Thompson variance estimator \\(ij\\)-th entry equal \\((1-\\frac{\\pi_i \\pi_j}{\\pi_{ij}})\\). matrix joint probabilties \\(ij\\)-th entry equal \\(\\pi_{ij}\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Check whether a matrix is positive semidefinite — is_psd_matrix","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"Check whether matrix positive semidefinite, based checking symmetric negative eigenvalues.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"","code":"is_psd_matrix(X, tolerance = sqrt(.Machine$double.eps))"},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"X matrix missing infinite values. tolerance Tolerance controlling whether tiny computed eigenvalue actually considered negative. Computed negative eigenvalues considered negative less less -abs(tolerance * max(eigen(X)$values)). small nonzero tolerance recommended since eigenvalues nearly always computed floating-point error.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"logical value. TRUE matrix deemed positive semidefinite. Negative otherwise (including X symmetric).","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"","code":"X <- matrix( c(2, 5, 5, 5, 2, 5, 5, 5, 2), nrow = 3, byrow = TRUE ) is_psd_matrix(X) #> [1] FALSE eigen(X)$values #> [1] 12 -3 -3"},{"path":"https://bschneidr.github.io/svrep/reference/libraries.html","id":null,"dir":"Reference","previous_headings":"","what":"Public Libraries Survey (PLS): A Census of U.S. Public Libraries in FY2020 — libraries","title":"Public Libraries Survey (PLS): A Census of U.S. Public Libraries in FY2020 — libraries","text":"Data taken complete census public libraries United States FY2020 (April 2020 March 2021). Public Libraries Survey (PLS) annual census public libraries U.S., including public libraries identified state library administrative agencies 50 states, District Columbia, outlying territories American Samoa, Guam, Northern Mariana Islands, U.S. Virgin Islands (Puerto Rico participate FY2020). primary dataset, library_census, represents full microdata census. datasets library_multistage_sample library_stsys_sample samples drawn library_census using different sampling methods.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/libraries.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Public Libraries Survey (PLS): A Census of U.S. 
Public Libraries in FY2020 — libraries","text":"","code":"data(library_census) data(library_multistage_sample) data(library_stsys_sample)"},{"path":"https://bschneidr.github.io/svrep/reference/libraries.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Public Libraries Survey (PLS): A Census of U.S. Public Libraries in FY2020 — libraries","text":"Library Census (library_census): dataset includes 9,245 records (one per library) 23 variables. column variable label, accessible using function var_label() 'labelled' package simply calling attr(x, 'label') given column. data include subset variables included public-use data published PLS, specifically Public Library System Data File. Particularly relevant variables include: Identifier variables survey response status: FSCSKEY: unique identifier libraries. LIBNAME: name library RESPONSE_STATUS: Response status Public Library Survey: indicates whether library respondent, nonrespondent, closed. Numeric summaries: TOTCIR: Total circulation VISITS: Total visitors REGBOR: Total number registered users TOTSTAFF: Total staff (measured full-time equivalent staff) LIBRARIA: Total librarians (measured full-time equivalent staff) TOTOPEXP: Total operating expenses TOTINCM: Total income BRANLIB: Number library branches CENTLIB: Number central library locations Location: LONGITUD: Geocoded longitude (WGS84 CRS) LATITUD: Geocoded latitude (WGS84 CRS) STABR: Two-letter state abbreviation CBSA: Five-digit identifer core-based statistical area (CBSA) MICROF: Flag metropolitan micropolitan statistical area Library Multistage Sample (library_multistage_sample): data represent two-stage sample (PSUs SSUs), first stage sample selected using unequal probability sampling without replacement (PPSWOR) second stage sample selected using simple random sampling without replacement (SRSWOR). Includes variables library_census, additional design variables. PSU_ID: unique identifier primary sampling units SSU_ID: unique identifer secondary sampling units SAMPLING_PROB: Overall inclusion probability PSU_SAMPLING_PROB: Inclusion probability PSU SSU_SAMPLING_PROB: Inclusion probability SSU PSU_POP_SIZE: number PSUs population SSU_POP_SIZE: number population SSUs within PSU Library Stratified Systematic Sample (library_stsys_sample): data represent stratified systematic sample. Includes variables library_census, additional design variables. SAMPLING_STRATUM: Unique identifier sampling strata STRATUM_POP_SIZE: population size stratum SAMPLING_SORT_ORDER: sort order used selecting random systematic sample SAMPLING_PROB: Overall inclusion probability","code":""},{"path":"https://bschneidr.github.io/svrep/reference/libraries.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Public Libraries Survey (PLS): A Census of U.S. Public Libraries in FY2020 — libraries","text":"Pelczar, M., Soffronoff, J., Nielsen, E., Li, J., & Mabile, S. (2022). Data File Documentation: Public Libraries United States Fiscal Year 2020. Institute Museum Library Services: Washington, D.C.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_pums_microdata.html","id":null,"dir":"Reference","previous_headings":"","what":"ACS PUMS Data for Louisville — lou_pums_microdata","title":"ACS PUMS Data for Louisville — lou_pums_microdata","text":"Person-level microdata American Community Survey (ACS) 2015-2019 public-use microdata sample (PUMS) data Louisville, KY. 
microdata sample represents adults (persons aged 18 ) Louisville, KY. data include replicate weights use variance estimation.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_pums_microdata.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ACS PUMS Data for Louisville — lou_pums_microdata","text":"","code":"data(lou_pums_microdata)"},{"path":"https://bschneidr.github.io/svrep/reference/lou_pums_microdata.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"ACS PUMS Data for Louisville — lou_pums_microdata","text":"data frame 80 rows 85 variables UNIQUE_ID: Unique identifier records AGE: Age years (copied AGEP variable ACS microdata) RACE_ETHNICITY: Race Hispanic/Latino ethnicity derived RAC1P HISP variables ACS microdata collapsed smaller number categories. SEX: Male Female EDUC_ATTAINMENT: Highest level education attained ('Less high school' 'High school beyond') derived SCHL variable ACS microdata collapsed smaller number categories. PWGTP: Weights full-sample PWGTP1-PWGTP80: 80 columns replicate weights created using Successive Differences Replication (SDR) method.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_pums_microdata.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"ACS PUMS Data for Louisville — lou_pums_microdata","text":"","code":"if (FALSE) { data(lou_pums_microdata) # Prepare the data for analysis with the survey package library(survey) lou_pums_rep_design <- survey::svrepdesign( data = lou_pums_microdata, variables = ~ UNIQUE_ID + AGE + SEX + RACE_ETHNICITY + EDUC_ATTAINMENT, weights = ~ PWGTP, repweights = \"PWGTP\\\\d{1,2}\", type = \"successive-difference\", mse = TRUE ) # Estimate population proportions svymean(~ SEX, design = lou_pums_rep_design) }"},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey.html","id":null,"dir":"Reference","previous_headings":"","what":"Louisville Vaccination Survey — lou_vax_survey","title":"Louisville Vaccination Survey — lou_vax_survey","text":"survey measuring Covid-19 vaccination status handful demographic variables, based simple random sample 1,000 residents Louisville, Kentucky approximately 50% response rate. data created using simulation.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Louisville Vaccination Survey — lou_vax_survey","text":"","code":"data(lou_vax_survey)"},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Louisville Vaccination Survey — lou_vax_survey","text":"data frame 1,000 rows 6 variables RESPONSE_STATUS Response status survey ('Respondent' 'Nonrespondent') RACE_ETHNICITY Race Hispanic/Latino ethnicity derived RAC1P HISP variables ACS microdata collapsed smaller number categories. SEX Male Female EDUC_ATTAINMENT Highest level education attained ('Less high school' 'High school beyond') derived SCHL variable ACS microdata collapsed smaller number categories. 
VAX_STATUS Covid-19 vaccination status ('Vaccinated' 'Unvaccinated') SAMPLING_WEIGHT Sampling weight: equal cases since data come simple random sample","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey_control_totals.html","id":null,"dir":"Reference","previous_headings":"","what":"Control totals for the Louisville Vaccination Survey — lou_vax_survey_control_totals","title":"Control totals for the Louisville Vaccination Survey — lou_vax_survey_control_totals","text":"Control totals use raking post-stratification Louisville Vaccination Survey data. Control totals population size estimates ACS 2015-2019 5-year Public Use Microdata Sample (PUMS) specific demographic categories among adults Jefferson County, KY. data created using simulation.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey_control_totals.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Control totals for the Louisville Vaccination Survey — lou_vax_survey_control_totals","text":"","code":"data(lou_vax_survey_control_totals)"},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey_control_totals.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Control totals for the Louisville Vaccination Survey — lou_vax_survey_control_totals","text":"nested list object two lists, poststratification raking, contains two elements: estimates variance-covariance. poststratification Control totals combination RACE_ETHNICITY, SEX, EDUC_ATTAINMENT. estimates: numeric vector estimated population totals. variance-covariance: variance-covariance matrix estimated population totals. raking Separate control totals RACE_ETHNICITY, SEX, EDUC_ATTAINMENT. estimates: numeric vector estimated population totals. 
variance-covariance: variance-covariance matrix estimated population totals.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"Creates quadratic form matrix variance estimator balanced samples, proposed Deville Tillé (2005).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"","code":"make_deville_tille_matrix(probs, aux_vars)"},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"probs vector first-order inclusion probabilities aux_vars matrix auxiliary variables, number rows matching number elements probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"symmetric matrix whose dimension matches length probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"See Section 6.8 Tillé (2020) detail estimator, including explanation quadratic form. See Deville Tillé (2005) results simulation study comparing alternative estimators balanced sampling. estimator can written follows: $$ v(\\hat{Y})=\\sum_{k \\in S} \\frac{c_k}{\\pi_k^2}\\left(y_k-\\hat{y}_k^*\\right)^2, $$ $$ \\hat{y}_k^*=\\mathbf{z}_k^{\\top}\\left(\\sum_{\\ell \\in S} c_{\\ell} \\frac{\\mathbf{z}_{\\ell} \\mathbf{z}_{\\ell}^{\\prime}}{\\pi_{\\ell}^2}\\right)^{-1} \\sum_{\\ell \\in S} c_{\\ell} \\frac{\\mathbf{z}_{\\ell} y_{\\ell}}{\\pi_{\\ell}^2} $$ \\(\\mathbf{z}_k\\) denotes vector auxiliary variables observation \\(k\\) included sample \\(S\\), inclusion probability \\(\\pi_k\\). value \\(c_k\\) set \\(\\frac{n}{n-q}(1-\\pi_k)\\), \\(n\\) number observations \\(q\\) number auxiliary variables. See Li, Chen, Krenzke (2014) example estimator's use basis generalized replication estimator. See Breidt Chauvet (2011) discussion alternative simulation-based estimators specific application variance estimation balanced samples selected using cube method.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"- Breidt, F.J. Chauvet, G. (2011).
\"Improved variance estimation balanced samples drawn via cube method.\" Journal Statistical Planning Inference, 141, 411-425. - Deville, J.‐C., Tillé, Y. (2005). \"Variance approximation balanced sampling.\" Journal Statistical Planning Inference, 128, 569–591. - Li, J., Chen, S., Krenzke, T. (2014). \"Replication Variance Estimation Balanced Sampling: Application PIAAC Study.\" Proceedings Survey Research Methods Section, 2014: 985–994. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf. - Tillé, Y. (2020). \"Sampling estimation finite populations.\" (. Hekimi, Trans.). Wiley.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":null,"dir":"Reference","previous_headings":"","what":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"Generate matrix replication factors using Fay's generalized replication method. method yields fully efficient variance estimator sufficient number replicates used.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"","code":"make_fays_gen_rep_factors( Sigma, max_replicates = Matrix::rankMatrix(Sigma) + 4, balanced = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"Sigma quadratic form matrix corresponding target variance estimator. Must positive semidefinite. max_replicates maximum number replicates allow. function attempt create minimum number replicates needed produce fully-efficient variance estimator. replicates needed max_replicates, full number replicates needed created, random subsample retained. balanced balanced=TRUE, replicates contribute equally variance estimates, number replicates needed may slightly increase.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"matrix replicate factors, number rows matching number rows Sigma number columns less equal max_replicates. calculate variance estimates using factors, use overall scale factor given calling attr(x, \"scale\") result.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"See Fay (1989) full explanation Fay's generalized replication method. documentation provides brief overview. Let \\(\\boldsymbol{\\Sigma}\\) quadratic form matrix target variance estimator, assumed positive semidefinite. 
Suppose rank \\(\\boldsymbol{\\Sigma}\\) \\(k\\), \\(\\boldsymbol{\\Sigma}\\) can represented spectral decomposition \\(k\\) eigenvectors eigenvalues, \\(r\\)-th eigenvector eigenvalue denoted \\(\\mathbf{v}_{(r)}\\) \\(\\lambda_r\\), respectively. $$ \\boldsymbol{\\Sigma} = \\sum_{r=1}^k \\lambda_r \\mathbf{v}_{(r)} \\mathbf{v^{\\prime}}_{(r)} $$ balanced = FALSE, let \\(\\mathbf{H}\\) denote identity matrix \\(k' = k\\) rows/columns. balanced = TRUE, let \\(\\mathbf{H}\\) Hadamard matrix (entries equal \\(1\\) \\(-1\\)), order \\(k^{\\prime} \\geq k\\). Let \\(\\mathbf{H}_{mr}\\) denote entry row \\(m\\) column \\(r\\) \\(\\mathbf{H}\\). \\(k^{\\prime}\\) replicates formed follows. Let \\(r\\) denote given replicate, \\(r = 1, ..., k^{\\prime}\\), let \\(c\\) denote positive constant (yet specified). \\(r\\)-th replicate adjustment factor \\(\\mathbf{f}_{r}\\) formed : $$ \\mathbf{f}_{r} = 1 + c \\sum_{m=1}^k H_{m r} \\lambda_{(m)}^{\\frac{1}{2}} \\mathbf{v}_{(m)} $$ balanced = FALSE, \\(c = 1\\). balanced = TRUE, \\(c = \\frac{1}{\\sqrt{k^{\\prime}}}\\). replicates negative, can use rescale_reps, recalculates replicate factors smaller value \\(c\\). \\(k^{\\prime}\\) replicates used, variance estimates calculated : $$ v_{rep}\\left(\\hat{T}_y\\right) = \\sum_{r=1}^{k^{\\prime}}\\left(\\hat{T}_y^{*(r)}-\\hat{T}_y\\right)^2 $$ population totals, replication variance estimator exactly match target variance estimator number replicates \\(k^{\\prime}\\) matches rank \\(\\Sigma\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"the-number-of-replicates","dir":"Reference","previous_headings":"","what":"The Number of Replicates","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"balanced=TRUE, number replicates created may need increase slightly. due fact Hadamard matrix order \\(k^{\\prime} \\geq k\\) used balance replicates, may necessary use order \\(k^{\\prime} > k\\). number replicates \\(k^{\\prime}\\) large practical purposes, one can simply retain random subset \\(R\\) \\(k^{\\prime}\\) replicates. case, variances calculated follows: $$ v_{rep}\\left(\\hat{T}_y\\right) = \\frac{k^{\\prime}}{R} \\sum_{r=1}^{R}\\left(\\hat{T}_y^{*(r)}-\\hat{T}_y\\right)^2 $$ happens max_replicates less matrix rank Sigma: random subset created replicates retained. Subsampling replicates recommended using balanced=TRUE, since case every replicate contributes equally variance estimates. balanced=FALSE, randomly subsampling replicates valid may produce large variation variance estimates since replicates case may vary greatly contribution variance estimates.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"balanced=TRUE, Hadamard matrix used described . Hadamard matrix deterministically created using function hadamard() 'survey' package. However, order rows/columns randomly permuted forming replicates. general, column-ordering replicate weights random. 
ensure exact reproducibility, recommended call set.seed() using function.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"Fay, Robert. 1989. \"Theory Application Replicate Weighting Variance Calculations.\" , 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1989_033.pdf","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"","code":"if (FALSE) { library(survey) # Load an example dataset that uses unequal probability sampling ---- data('election', package = 'survey') # Create matrix to represent the Horvitz-Thompson estimator as a quadratic form ---- n <- nrow(election_pps) pi <- election_jointprob horvitz_thompson_matrix <- matrix(nrow = n, ncol = n) for (i in seq_len(n)) { for (j in seq_len(n)) { horvitz_thompson_matrix[i,j] <- 1 - (pi[i,i] * pi[j,j])/pi[i,j] } } ## Equivalently: horvitz_thompson_matrix <- make_quad_form_matrix( variance_estimator = \"Horvitz-Thompson\", joint_probs = election_jointprob ) # Make generalized replication adjustment factors ---- adjustment_factors <- make_fays_gen_rep_factors( Sigma = horvitz_thompson_matrix, max_replicates = 50 ) attr(adjustment_factors, 'scale') # Compute the Horvitz-Thompson estimate and the replication estimate ht_estimate <- svydesign(data = election_pps, ids = ~ 1, prob = diag(election_jointprob), pps = ppsmat(election_jointprob)) |> svytotal(x = ~ Kerry) rep_estimate <- svrepdesign( data = election_pps, weights = ~ wt, repweights = adjustment_factors, combined.weights = FALSE, scale = attr(adjustment_factors, 'scale'), rscales = rep(1, times = ncol(adjustment_factors)), type = \"other\", mse = TRUE ) |> svytotal(x = ~ Kerry) SE(rep_estimate) SE(ht_estimate) SE(rep_estimate) / SE(ht_estimate) }"},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":null,"dir":"Reference","previous_headings":"","what":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"Creates replicate factors generalized survey bootstrap method. generalized survey bootstrap method forming bootstrap replicate weights textbook variance estimator, provided variance estimator can represented quadratic form whose matrix positive semidefinite (covers large class variance estimators).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"","code":"make_gen_boot_factors(Sigma, num_replicates, tau = \"auto\", exact_vcov = FALSE)"},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"Sigma matrix quadratic form used represent variance estimator. 
Must positive semidefinite. num_replicates number bootstrap replicates create. tau Either \"auto\", single number. rescaling constant used avoid negative weights transformation \\(\\frac{w + \\tau - 1}{\\tau}\\), \\(w\\) original weight \\(\\tau\\) rescaling constant tau. tau=\"auto\", rescaling factor determined automatically follows: adjustment factors nonnegative, tau set equal 1; otherwise, tau set smallest value needed rescale adjustment factors least 0.01. exact_vcov exact_vcov=TRUE, replicate factors generated variance-covariance matrix exactly matches target variance estimator's quadratic form (within numeric precision). desirable causes variance estimates totals closely match values target variance estimator. requires num_replicates exceeds rank Sigma. replicate factors generated applying PCA-whitening collection draws multivariate Normal distribution, applying coloring transformation whitened collection draws.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"matrix number rows Sigma, number columns equal num_replicates. object attribute named tau can retrieved calling attr(x, 'tau') object. value tau rescaling factor used avoid negative weights. addition, object attributes named scale rscales can passed directly svrepdesign. Note value scale \\(\\tau^2/B\\), value rscales vector length \\(B\\), every entry equal \\(1\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"Let \\(v( \\hat{T_y})\\) textbook variance estimator estimated population total \\(\\hat{T}_y\\) variable \\(y\\). base weight case \\(i\\) sample \\(w_i\\), let \\(\\breve{y}_i\\) denote weighted value \\(w_iy_i\\). Suppose can represent textbook variance estimator quadratic form: \\(v(\\hat{T}_y) = \\breve{y}\\Sigma\\breve{y}^T\\), \\(n \\times n\\) matrix \\(\\Sigma\\). constraint \\(\\Sigma\\) , sample, must symmetric positive semidefinite. bootstrapping process creates \\(B\\) sets replicate weights, \\(b\\)-th set replicate weights vector length \\(n\\) denoted \\(\\mathbf{a}^{(b)}\\), whose \\(k\\)-th value denoted \\(a_k^{(b)}\\). yields \\(B\\) replicate estimates population total, \\(\\hat{T}_y^{*(b)}=\\sum_{k \\in s} a_k^{(b)} \\breve{y}_k\\), \\(b=1, \\ldots B\\), can used estimate sampling variance. $$ v_B\\left(\\hat{T}_y\\right)=\\frac{\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2}{B} $$ bootstrap variance estimator can written quadratic form: $$ v_B\\left(\\hat{T}_y\\right) =\\mathbf{\\breve{y}}^{\\prime}\\Sigma_B \\mathbf{\\breve{y}} $$ $$ \\boldsymbol{\\Sigma}_B = \\frac{\\sum_{b=1}^B\\left(\\mathbf{a}^{(b)}-\\mathbf{1}_n\\right)\\left(\\mathbf{a}^{(b)}-\\mathbf{1}_n\\right)^{\\prime}}{B} $$ Note vector adjustment factors \\(\\mathbf{a}^{(b)}\\) expectation \\(\\mathbf{1}_n\\) variance-covariance matrix \\(\\boldsymbol{\\Sigma}\\), bootstrap expectation \\(E_{*}\\left( \\boldsymbol{\\Sigma}_B \\right) = \\boldsymbol{\\Sigma}\\).
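A minimal base-R sketch of drawing adjustment factors with exactly this expectation and variance-covariance structure; the toy Sigma below is assumed purely for illustration:

```r
set.seed(2024)
# Toy positive semidefinite quadratic form matrix (assumed for illustration)
Sigma <- diag(4) - matrix(1/4, nrow = 4, ncol = 4)
n <- nrow(Sigma); B <- 500
# Matrix square root of Sigma via its spectral decomposition
eig <- eigen(Sigma, symmetric = TRUE)
Sigma_half <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0))) %*% t(eig$vectors)
# Each column holds one replicate's adjustment factors, drawn MVN(1_n, Sigma)
a <- 1 + Sigma_half %*% matrix(rnorm(n * B), nrow = n)
rowMeans(a)            # close to 1 (Condition on the expectation)
tcrossprod(a - 1) / B  # close to Sigma (Condition on the covariance)
```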
Since bootstrap process takes sample values \\(\\breve{y}\\) fixed, bootstrap expectation variance estimator \\(E_{*} \\left( \\mathbf{\\breve{y}}^{\\prime}\\Sigma_B \\mathbf{\\breve{y}}\\right)= \\mathbf{\\breve{y}}^{\\prime}\\Sigma \\mathbf{\\breve{y}}\\). Thus, can produce bootstrap variance estimator expectation textbook variance estimator simply randomly generating \\(\\mathbf{a}^{(b)}\\) distribution following two conditions: Condition 1: \\(\\quad \\mathbf{E}_*(\\mathbf{a})=\\mathbf{1}_n\\) Condition 2: \\(\\quad \\mathbf{E}_*\\left(\\mathbf{a}-\\mathbf{1}_n\\right)\\left(\\mathbf{a}-\\mathbf{1}_n\\right)^{\\prime}=\\mathbf{\\Sigma}\\) multiple ways generate adjustment factors satisfying conditions, simplest general method simulate multivariate normal distribution: \\(\\mathbf{a} \\sim MVN(\\mathbf{1}_n, \\boldsymbol{\\Sigma})\\). method used function.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"details-on-rescaling-to-avoid-negative-adjustment-factors","dir":"Reference","previous_headings":"","what":"Details on Rescaling to Avoid Negative Adjustment Factors","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"Let \\(\\mathbf{A} = \\left[ \\mathbf{a}^{(1)} \\cdots \\mathbf{a}^{(b)} \\cdots \\mathbf{a}^{(B)} \\right]\\) denote \\((n \\times B)\\) matrix bootstrap adjustment factors. eliminate negative adjustment factors, Beaumont Patak (2012) propose forming rescaled matrix nonnegative replicate factors \\(\\mathbf{A}^S\\) rescaling adjustment factor \\(a_k^{(b)}\\) follows: $$ a_k^{S,(b)} = \\frac{a_k^{(b)} + \\tau - 1}{\\tau} $$ \\(\\tau \\geq 1 - a_k^{(b)} \\geq 1\\) for all \\(k\\) in \\(\\left\\{ 1,\\ldots,n \\right\\}\\) and \\(b\\) in \\(\\left\\{1, \\ldots, B\\right\\}\\). value \\(\\tau\\) can set based realized adjustment factor matrix \\(\\mathbf{A}\\) choosing \\(\\tau\\) prior generating adjustment factor matrix \\(\\mathbf{A}\\) \\(\\tau\\) likely large enough prevent negative bootstrap weights. adjustment factors rescaled manner, important adjust scale factor used estimating variance bootstrap replicates, becomes \\(\\frac{\\tau^2}{B}\\) instead \\(\\frac{1}{B}\\). $$ \\textbf{Prior rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{1}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2 $$ $$ \\textbf{After rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{\\tau^2}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{S*(b)}-\\hat{T}_y\\right)^2 $$ sharing dataset uses rescaled weights generalized survey bootstrap, documentation dataset instruct user use replication scale factor \\(\\frac{\\tau^2}{B}\\) rather \\(\\frac{1}{B}\\) estimating sampling variances.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"generalized survey bootstrap first proposed Bertail Combris (1997). See Beaumont Patak (2012) clear overview generalized survey bootstrap. generalized survey bootstrap represents one strategy forming replication variance estimators general framework proposed Fay (1984) Dippo, Fay, Morganstein (1984). - Beaumont, Jean-François, Zdenek Patak. 2012. “Generalized Bootstrap Sample Surveys Special Attention Poisson Sampling: Generalized Bootstrap Sample Surveys.” International Statistical Review 80 (1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x. - Bertail, Combris. 1997.
“Bootstrap Généralisé d’un Sondage.” Annales d’Économie Et de Statistique, . 46: 49. https://doi.org/10.2307/20076068. - Dippo, Cathryn, Robert Fay, David Morganstein. 1984. “Computing Variances Complex Samples Replicate Weights.” , 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf. - Fay, Robert. 1984. “Properties Estimates Variance Based Replication Methods.” , 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_095.pdf.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"","code":"if (FALSE) { library(survey) # Load an example dataset that uses unequal probability sampling ---- data('election', package = 'survey') # Create matrix to represent the Horvitz-Thompson estimator as a quadratic form ---- n <- nrow(election_pps) pi <- election_jointprob horvitz_thompson_matrix <- matrix(nrow = n, ncol = n) for (i in seq_len(n)) { for (j in seq_len(n)) { horvitz_thompson_matrix[i,j] <- 1 - (pi[i,i] * pi[j,j])/pi[i,j] } } ## Equivalently: horvitz_thompson_matrix <- make_quad_form_matrix( variance_estimator = \"Horvitz-Thompson\", joint_probs = election_jointprob ) # Make generalized bootstrap adjustment factors ---- bootstrap_adjustment_factors <- make_gen_boot_factors( Sigma = horvitz_thompson_matrix, num_replicates = 80, tau = 'auto' ) # Determine replication scale factor for variance estimation ---- tau <- attr(bootstrap_adjustment_factors, 'tau') B <- ncol(bootstrap_adjustment_factors) replication_scaling_constant <- tau^2 / B # Create a replicate design object ---- election_pps_bootstrap_design <- svrepdesign( data = election_pps, weights = 1 / diag(election_jointprob), repweights = bootstrap_adjustment_factors, combined.weights = FALSE, type = \"other\", scale = attr(bootstrap_adjustment_factors, 'scale'), rscales = attr(bootstrap_adjustment_factors, 'rscales') ) # Compare estimates to Horvitz-Thompson estimator ---- election_pps_ht_design <- svydesign( id = ~1, fpc = ~p, data = election_pps, pps = ppsmat(election_jointprob), variance = \"HT\" ) svytotal(x = ~ Bush + Kerry, design = election_pps_bootstrap_design) svytotal(x = ~ Bush + Kerry, design = election_pps_ht_design) }"},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"Several variance estimators designs use unequal probability sampling without replacement (.e., PPSWOR), variance estimation tends accurate using approximation estimator uses first-order inclusion probabilities (.e., basic sampling weights) ignores joint inclusion probabilities. 
function returns matrix quadratic form used represent variance estimators.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"","code":"make_ppswor_approx_matrix(probs, method = \"Deville-1\")"},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"probs vector first-order inclusion probabilities method string specifying approximation method use. See \"Details\" section below. Options include: \"Deville-1\" \"Deville-2\"","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"symmetric matrix whose dimension matches length probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"variance estimators shown effective designs use fixed sample size high-entropy sampling method. includes PPSWOR sampling methods, unequal-probability systematic sampling important exception. variance estimators generally take following form: $$ \\hat{v}(\\hat{Y}) = \\sum_{i=1}^{n} c_i (\\breve{y}_i - \\frac{1}{\\sum_{k=1}^{n}c_k}\\sum_{k=1}^{n}c_k \\breve{y}_k)^2 $$ \\(\\breve{y}_i = y_i/\\pi_i\\) weighted value variable interest, \\(c_i\\) constants depend approximation method used. matrix quadratic form, denoted \\(\\Sigma\\), \\(ij\\)-th entry defined follows: $$ \\sigma_{ii} = c_i (1 - \\frac{c_i}{\\sum_{k=1}^{n}c_k}) \\textit{ if } i = j \\\\ \\sigma_{ij}=\\frac{-c_i c_j}{\\sum_{k=1}^{n}c_k} \\textit{ if } i \\neq j \\\\ $$ If \\(\\pi_{i} = 1\\) for every unit, then \\(\\sigma_{ij}=0\\) for all \\(i,j\\). If only one sampling unit, \\(\\sigma_{11}=0\\); that is, unit treated sampled certainty. constants \\(c_i\\) defined approximation method follows, names taken directly Matei Tillé (2005). \"Deville-1\": $$c_i=\\left(1-\\pi_i\\right) \\frac{n}{n-1}$$ \"Deville-2\": $$c_i = (1-\\pi_i) \\left[1 - \\sum_{k=1}^{n} \\left(\\frac{1-\\pi_k}{\\sum_{k=1}^{n}(1-\\pi_k)}\\right)^2 \\right]^{-1}$$ approximations \"Deville-1\" \"Deville-2\" shown simulation studies Matei Tillé (2005) perform much better terms MSE compared strictly-unbiased Horvitz-Thompson Yates-Grundy variance estimators.
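To make the entries concrete, a small sketch of building the \"Deville-1\" matrix directly from the formulas just given; the probs values are assumed for illustration, and the result should agree with the package function up to attributes:

```r
probs <- c(0.1, 0.2, 0.3, 0.4)  # assumed first-order inclusion probabilities
n   <- length(probs)
c_i <- (1 - probs) * n / (n - 1)           # Deville-1 constants
Sigma <- -tcrossprod(c_i) / sum(c_i)       # off-diagonal entries sigma_ij
diag(Sigma) <- c_i * (1 - c_i / sum(c_i))  # diagonal entries sigma_ii
# Compare with the package function:
svrep::make_ppswor_approx_matrix(probs = probs, method = "Deville-1")
```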
case simple random sampling without replacement (SRSWOR), estimators identical usual Horvitz-Thompson variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"Matei, Alina, Yves Tillé. 2005. “Evaluation Variance Approximations Estimators Maximum Entropy Sampling Unequal Probability Fixed Sample Size.” Journal Official Statistics 21(4):543–70.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"Common variance estimators estimated population totals can represented quadratic form. Given choice variance estimator information sample design, function constructs matrix quadratic form. notation, let \\(v(\\hat{Y}) = \\mathbf{\\breve{y}}^{\\prime}\\mathbf{\\Sigma}\\mathbf{\\breve{y}}\\), \\(\\breve{y}\\) vector weighted values, \\(y_i/\\pi_i, \\space i=1,\\dots,n\\). function constructs \\(n \\times n\\) matrix quadratic form, \\(\\mathbf{\\Sigma}\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"","code":"make_quad_form_matrix( variance_estimator = \"Yates-Grundy\", probs = NULL, joint_probs = NULL, cluster_ids = NULL, strata_ids = NULL, strata_pop_sizes = NULL, sort_order = NULL, aux_vars = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"variance_estimator name variance estimator whose quadratic form matrix created. See section \"Variance Estimators\" below. Options include: \"Yates-Grundy\": Yates-Grundy variance estimator based first-order second-order inclusion probabilities. used, argument joint_probs must also used. \"Horvitz-Thompson\": Horvitz-Thompson variance estimator based first-order second-order inclusion probabilities. used, argument joint_probs must also used. \"Stratified Multistage SRS\": usual stratified multistage variance estimator based estimating variance cluster totals within strata stage. option used, necessary also use arguments strata_ids, cluster_ids, strata_pop_sizes. \"Ultimate Cluster\": usual variance estimator based estimating variance first-stage cluster totals within first-stage strata. option used, necessary also use arguments strata_ids, cluster_ids, strata_pop_sizes. Optionally, use finite population correction factors, one can also use argument strata_pop_sizes. \"Deville-1\": variance estimator unequal-probability sampling without replacement, described Matei Tillé (2005) \"Deville 1\". option used, necessary also use arguments strata_ids, cluster_ids, probs. \"Deville-2\": variance estimator unequal-probability sampling without replacement, described Matei Tillé (2005) \"Deville 2\". option used, necessary also use arguments strata_ids, cluster_ids, probs.
\"SD1\": non-circular successive-differences variance estimator described Ash (2014), sometimes used variance estimation systematic sampling. \"SD2\": circular successive-differences variance estimator described Ash (2014). estimator basis \"successive-differences replication\" estimator commonly used variance estimation systematic sampling. \"Deville-Tille\": estimator Deville Tillé (2005), developed balanced sampling using cube method. probs Required variance_estimator equals \"Deville-1\", \"Deville-2\", \"Breidt-Chauvet\". matrix data frame sampling probabilities. multiple stages sampling, probs can multiple columns, one column level sampling accounted variance estimator. joint_probs used variance_estimator = \"Horvitz-Thompson\" variance_estimator = \"Yates-Grundy\". matrix joint inclusion probabilities. Element [,] matrix first-order inclusion probability unit , element [,j] joint inclusion probability units j. cluster_ids Required unless variance_estimator equals \"Horvitz-Thompson\" \"Yates-Grundy\". matrix data frame cluster IDs. multiple stages sampling, cluster_ids can multiple columns, one column level sampling accounted variance estimator. strata_ids Required variance_estimator equals \"Stratified Multistage SRS\" \"Ultimate Cluster\". matrix data frame strata IDs. multiple stages sampling, strata_ids can multiple columns, one column level sampling accounted variance estimator. strata_pop_sizes Required variance_estimator equals \"Stratified Multistage SRS\", can optionally used variance_estimator equals \"Ultimate Cluster\", \"SD1\", \"SD2\". multiple stages sampling, strata_pop_sizes can multiple columns, one column level sampling accounted variance estimator. sort_order Required variance_estimator equals \"SD1\" \"SD2\". vector orders rows data order used sampling. aux_vars Required variance_estimator equals \"Deville-Tille\". 
matrix auxiliary variables.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"matrix quadratic form representing variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"variance-estimators","dir":"Reference","previous_headings":"","what":"Variance Estimators","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"See variance-estimators description variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"arguments-required-for-each-variance-estimator","dir":"Reference","previous_headings":"","what":"Arguments required for each variance estimator","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"arguments required optional variance estimator.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"","code":"if (FALSE) { # Example 1: The Horvitz-Thompson Estimator library(survey) data(\"election\", package = \"survey\") ht_quad_form_matrix <- make_quad_form_matrix(variance_estimator = \"Horvitz-Thompson\", joint_probs = election_jointprob) ##_ Produce variance estimate wtd_y <- as.matrix(election_pps$wt * election_pps$Bush) t(wtd_y) %*% ht_quad_form_matrix %*% wtd_y ##_ Compare against result from 'survey' package svytotal(x = ~ Bush, design = svydesign(data=election_pps, variance = \"HT\", pps = ppsmat(election_jointprob), ids = ~ 1, fpc = ~ p)) |> vcov() # Example 2: Stratified multistage Sample ---- data(\"mu284\", package = 'survey') multistage_srswor_design <- svydesign(data = mu284, ids = ~ id1 + id2, fpc = ~ n1 + n2) multistage_srs_quad_form <- make_quad_form_matrix( variance_estimator = \"Stratified Multistage SRS\", cluster_ids = mu284[,c('id1', 'id2')], strata_ids = matrix(1, nrow = nrow(mu284), ncol = 2), strata_pop_sizes = mu284[,c('n1', 'n2')] ) wtd_y <- as.matrix(weights(multistage_srswor_design) * mu284$y1) t(wtd_y) %*% multistage_srs_quad_form %*% wtd_y ##_ Compare against result from 'survey' package svytotal(x = ~ y1, design = multistage_srswor_design) |> vcov() # Example 3: Successive-differences estimator ---- data('library_stsys_sample', package = 'svrep') sd1_quad_form <- make_quad_form_matrix( variance_estimator = 'SD1', cluster_ids = library_stsys_sample[,'FSCSKEY',drop=FALSE], strata_ids = library_stsys_sample[,'SAMPLING_STRATUM',drop=FALSE], strata_pop_sizes = library_stsys_sample[,'STRATUM_POP_SIZE',drop=FALSE], sort_order = library_stsys_sample[['SAMPLING_SORT_ORDER']] ) wtd_y <- as.matrix(library_stsys_sample[['TOTCIR']] / library_stsys_sample$SAMPLING_PROB) wtd_y[is.na(wtd_y)] <- 0 t(wtd_y) %*% sd1_quad_form %*% wtd_y # Example 4: Deville estimators ---- data('library_multistage_sample', package = 'svrep') deville_quad_form <- make_quad_form_matrix( variance_estimator = 'Deville-1', cluster_ids = library_multistage_sample[,c(\"PSU_ID\", \"SSU_ID\")], strata_ids = cbind(rep(1, times = nrow(library_multistage_sample)), library_multistage_sample$PSU_ID), probs = library_multistage_sample[,c(\"PSU_SAMPLING_PROB\", \"SSU_SAMPLING_PROB\")] ) 
}"},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":null,"dir":"Reference","previous_headings":"","what":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"Creates bootstrap replicate weights multistage stratified sample design using method Beaumont Émond (2022), generalization Rao-Wu-Yue bootstrap. design may different sampling methods used different stages. stage sampling may potentially use unequal probabilities (without replacement) may potentially use Poisson sampling.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"","code":"make_rwyb_bootstrap_weights( num_replicates = 100, samp_unit_ids, strata_ids, samp_unit_sel_probs, samp_method_by_stage = rep(\"PPSWOR\", times = ncol(samp_unit_ids)), allow_final_stage_singletons = TRUE, output = \"weights\" )"},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"num_replicates Positive integer giving number bootstrap replicates create samp_unit_ids Matrix data frame sampling unit IDs stage sampling strata_ids Matrix data frame strata IDs sampling unit stage sampling samp_unit_sel_probs Matrix data frame selection probabilities sampling unit stage sampling. samp_method_by_stage vector length equal number stages sampling, corresponding number columns samp_unit_ids. describes method sampling used stage. element one following: \"SRSWOR\" - Simple random sampling, without replacement \"SRSWR\" - Simple random sampling, replacement \"PPSWOR\" - Unequal probabilities selection, without replacement \"PPSWR\" - Unequal probabilities selection, replacement \"Poisson\" - Poisson sampling: sampling unit selected sample , potentially different probabilities inclusion sampling unit. allow_final_stage_singletons Logical value indicating whether allow non-certainty singleton strata final sampling stage (rather throw error message). TRUE, sampling unit non-certainty singleton stratum final-stage adjustment factor calculated selected certainty final stage (.e., adjustment factor 1), final bootstrap weight calculated combining adjustment factor final-stage selection probability. output Either \"weights\" (default) \"factors\". Specifying output = \"factors\" returns matrix replicate adjustment factors can later multiplied full-sample weights produce matrix replicate weights. 
Specifying output = \"weights\" returns matrix replicate weights, full-sample weights inferred using samp_unit_sel_probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"matrix number rows samp_unit_ids number columns equal value argument num_replicates. Specifying output = \"factors\" returns matrix replicate adjustment factors can later multiplied full-sample weights produce matrix replicate weights. Specifying output = \"weights\" returns matrix replicate weights, full-sample weights inferred using samp_unit_sel_probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"Beaumont Émond (2022) describe general algorithm forming bootstrap replicate weights multistage stratified samples, based method Rao-Wu-Yue, extensions sampling without replacement use unequal probabilities selection (.e., sampling probability proportional size) well Poisson sampling. methods guaranteed produce nonnegative replicate weights provide design-unbiased design-consistent variance estimates totals, designs sampling uses one following methods: \"SRSWOR\" - Simple random sampling, without replacement \"SRSWR\" - Simple random sampling, replacement \"PPSWR\" - Unequal probabilities selection, replacement \"Poisson\" - Poisson sampling: sampling unit selected sample , potentially different probabilities inclusion sampling unit. designs least one stage's strata sampling without replacement unequal probabilities selection (\"PPSWOR\"), bootstrap method Beaumont Émond (2022) guaranteed produce nonnegative weights, design-unbiased, since method approximates joint selection probabilities needed unbiased estimation. Unless stages use simple random sampling without replacement, resulting bootstrap replicate weights guaranteed strictly positive, may useful calibration analyses domains small sample sizes. stages use simple random sampling without replacement, possible replicate weights zero. survey nonresponse, may useful represent response/nonresponse additional stage sampling, sampling conducted Poisson sampling unit's \"selection probability\" stage response propensity (typically estimated).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"Beaumont, J.-F.; Émond, N. (2022). \"Bootstrap Variance Estimation Method Multistage Sampling Two-Phase Sampling Poisson Sampling Used Second Phase.\" Stats, 5: 339–357. https://doi.org/10.3390/stats5020019 Rao, J.N.K.; Wu, C.F.J.; Yue, K. (1992). \"recent work resampling methods complex surveys.\" Surv. 
Methodol., 18: 209–217.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"","code":"if (FALSE) { library(survey) # Example 1: A multistage sample with two stages of SRSWOR ## Load an example dataset from a multistage sample, with two stages of SRSWOR data(\"mu284\", package = 'survey') multistage_srswor_design <- svydesign(data = mu284, ids = ~ id1 + id2, fpc = ~ n1 + n2) ## Create bootstrap replicate weights set.seed(2022) bootstrap_replicate_weights <- make_rwyb_bootstrap_weights( num_replicates = 5000, samp_unit_ids = multistage_srswor_design$cluster, strata_ids = multistage_srswor_design$strata, samp_unit_sel_probs = multistage_srswor_design$fpc$sampsize / multistage_srswor_design$fpc$popsize, samp_method_by_stage = c(\"SRSWOR\", \"SRSWOR\") ) ## Create a replicate design object with the survey package bootstrap_rep_design <- svrepdesign( data = multistage_srswor_design$variables, repweights = bootstrap_replicate_weights, weights = weights(multistage_srswor_design, type = \"sampling\"), type = \"bootstrap\" ) ## Compare std. error estimates from bootstrap versus linearization data.frame( 'Statistic' = c('total', 'mean', 'median'), 'SE (bootstrap)' = c(SE(svytotal(x = ~ y1, design = bootstrap_rep_design)), SE(svymean(x = ~ y1, design = bootstrap_rep_design)), SE(svyquantile(x = ~ y1, quantile = 0.5, design = bootstrap_rep_design))), 'SE (linearization)' = c(SE(svytotal(x = ~ y1, design = multistage_srswor_design)), SE(svymean(x = ~ y1, design = multistage_srswor_design)), SE(svyquantile(x = ~ y1, quantile = 0.5, design = multistage_srswor_design))), check.names = FALSE ) # Example 2: A single-stage sample selected with unequal probabilities, without replacement ## Load an example dataset of U.S. counties states with 2004 Presidential vote counts data(\"election\", package = 'survey') pps_wor_design <- svydesign(data = election_pps, pps = \"overton\", fpc = ~ p, # Inclusion probabilities ids = ~ 1) ## Create bootstrap replicate weights set.seed(2022) bootstrap_replicate_weights <- make_rwyb_bootstrap_weights( num_replicates = 5000, samp_unit_ids = pps_wor_design$cluster, strata_ids = pps_wor_design$strata, samp_unit_sel_probs = pps_wor_design$prob, samp_method_by_stage = c(\"PPSWOR\") ) ## Create a replicate design object with the survey package bootstrap_rep_design <- svrepdesign( data = pps_wor_design$variables, repweights = bootstrap_replicate_weights, weights = weights(pps_wor_design, type = \"sampling\"), type = \"bootstrap\" ) ## Compare std. 
error estimates from bootstrap versus linearization data.frame( 'Statistic' = c('total', 'mean'), 'SE (bootstrap)' = c(SE(svytotal(x = ~ Bush, design = bootstrap_rep_design)), SE(svymean(x = ~ I(Bush/votes), design = bootstrap_rep_design))), 'SE (Overton\\'s PPS approximation)' = c(SE(svytotal(x = ~ Bush, design = pps_wor_design)), SE(svymean(x = ~ I(Bush/votes), design = pps_wor_design))), check.names = FALSE ) }"},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"successive-difference variance estimator can represented quadratic form. function determines matrix quadratic form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"","code":"make_sd_matrix(n, f = 0, type = \"SD1\")"},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"n Number rows columns matrix f single number 0 1, representing sampling fraction. Default value 0. type Either \"SD1\" \"SD2\". See \"Details\" section definitions.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"matrix dimension n","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"Ash (2014) describes estimator follows: $$ \\hat{v}_{SD1}(\\hat{Y}) = (1-f) \\frac{n}{2(n-1)} \\sum_{k=2}^n\\left(\\breve{y}_k-\\breve{y}_{k-1}\\right)^2 $$ $$ \\hat{v}_{SD2}(\\hat{Y}) = \\frac{1}{2}(1-f)\\left[\\sum_{k=2}^n\\left(\\breve{y}_k-\\breve{y}_{k-1}\\right)^2+\\left(\\breve{y}_n-\\breve{y}_1\\right)^2\\right] $$ \\(\\breve{y}_k\\) weighted value \\(y_k/\\pi_k\\) unit \\(k\\) selection probability \\(\\pi_k\\), \\(f\\) sampling fraction \\(\\frac{n}{N}\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"Ash, S. (2014). 
\"Using successive difference replication estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"usual variance estimator simple random sampling without replacement can represented quadratic form. function determines matrix quadratic form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"","code":"make_srswor_matrix(n, f = 0)"},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"n Sample size f single number 0 1, representing sampling fraction. Default value 0.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"symmetric matrix dimension n","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"basic variance estimator total simple random sampling without replacement follows: $$ \\hat{v}(\\hat{Y}) = (1 - f)\\frac{n}{n - 1} \\sum_{=1}^{n} (y_i - \\bar{y})^2 $$ \\(f\\) sampling fraction \\(\\frac{n}{N}\\). \\(f=0\\), matrix quadratic form non-diagonal elements equal \\(-(n-1)^{-1}\\), diagonal elements equal \\(1\\). \\(f > 0\\), element multiplied \\((1-f)\\). 
\\(n=1\\), function returns \\(1 \\times 1\\) matrix whose sole element equals \\(0\\) (essentially treating sole sampled unit selection made probability \\(1\\)).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":null,"dir":"Reference","previous_headings":"","what":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"function combines quadratic forms phase two phase design, combined variance entire two-phase sampling design can estimated.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"","code":"make_twophase_quad_form( sigma_1, sigma_2, phase_2_joint_probs, ensure_psd = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"sigma_1 quadratic form first phase variance estimator, subsetted include cases selected phase two sample. sigma_2 quadratic form second phase variance estimator, conditional selection first phase sample. phase_2_joint_probs matrix conditional joint inclusion probabilities second phase, given selected first phase sample. ensure_psd TRUE (default), ensures result positive semidefinite matrix. necessary quadratic form used input replication methods generalized bootstrap. details, see help section entitled \"Ensuring Result Positive Semidefinite\".","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"quadratic form matrix can used estimate sampling variance two-phase sample design.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"two-phase variance estimator quadratic form matrix \\(\\boldsymbol{\\Sigma}_{ab}\\) given : $$ \\boldsymbol{\\Sigma}_{ab} = {W}^{-1}_b(\\boldsymbol{\\Sigma}_{^\\prime} \\circ D_b ){W}^{-1}_b + \\boldsymbol{\\Sigma}_b $$ first term estimates variance contribution first phase sampling, second term estimates variance contribution second phase sampling. full quadratic form variance estimator : $$ v(\\hat{t_y}) = \\breve{\\breve{y^{'}}} \\boldsymbol{\\Sigma}_{ab} \\breve{\\breve{y}} $$ weighted variable \\(\\breve{\\breve{y}}_k = \\frac{y_k}{\\pi_{ak}\\pi_{bk}}\\), formed using first phase inclusion probability, denoted \\(\\pi_{ak}\\), conditional second phase inclusion probability (given selected first phase sample), denoted \\(\\pi_{bk}\\). notation estimator follows: \\(n_a\\) denotes first phase sample size. \\(n_b\\) denotes second phase sample size. \\(\\boldsymbol{\\Sigma}_a\\) denotes matrix dimension \\(n_a \\times n_a\\) representing quadratic form variance estimator used full first-phase design. 
\\(\\boldsymbol{\\Sigma}_{a^\\prime}\\) denotes matrix dimension \\(n_b \\times n_b\\) formed subsetting rows columns \\(\\boldsymbol{\\Sigma}_a\\) include cases selected second-phase sample. \\(\\boldsymbol{\\Sigma}_{b}\\) denotes matrix dimension \\(n_b \\times n_b\\) representing Horvitz-Thompson estimator variance second-phase sample, conditional selected first-phase sample. \\(\\boldsymbol{D}_b\\) denotes \\(n_b \\times n_b\\) matrix weights formed inverses second-phase joint inclusion probabilities, element \\(kl\\) equal \\(\\pi_{bkl}^{-1}\\), \\(\\pi_{bkl}\\) conditional probability units \\(k\\) \\(l\\) included second-phase sample, given selected first-phase sample. Note matrix often not positive semidefinite, two-phase variance estimator quadratic form not necessarily positive semidefinite. \\(\\boldsymbol{W}_b\\) denotes diagonal \\(n_b \\times n_b\\) matrix whose \\(k\\)-th diagonal entry second-phase weight \\(\\pi_{bk}^{-1}\\), \\(\\pi_{bk}\\) conditional probability unit \\(k\\) included second-phase sample, given selected first-phase sample.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"ensuring-the-result-is-positive-semidefinite","dir":"Reference","previous_headings":"","what":"Ensuring the Result is Positive Semidefinite","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"Note matrix \\((\\boldsymbol{\\Sigma}_{a^\\prime} \\circ D_b )\\) may not be positive semidefinite, since matrix \\(D_b\\) not guaranteed positive semidefinite. \\((\\boldsymbol{\\Sigma}_{a^\\prime} \\circ D_b )\\) found not positive semidefinite, approximated nearest positive semidefinite matrix Frobenius norm, using method Higham (1988). approximation discussed Beaumont Patak (2012) context forming replicate weights two-phase samples. authors argue approximation lead small overestimation variance. Since \\((\\boldsymbol{\\Sigma}_{a^\\prime} \\circ D_b )\\) real, symmetric matrix, equivalent \"zeroing out\" negative eigenvalues. precise, denote \\(A=(\\boldsymbol{\\Sigma}_{a^\\prime} \\circ D_b )\\). can form spectral decomposition \\(A=\\Gamma \\Lambda \\Gamma^{\\prime}\\), \\(\\Lambda\\) diagonal matrix whose entries eigenvalues \\(A\\). method Higham (1988) approximate \\(A\\) \\(\\tilde{A} = \\Gamma \\Lambda_{+} \\Gamma^{\\prime}\\), \\(ii\\)-th entry \\(\\Lambda_{+}\\) \\(\\max(\\Lambda_{ii}, 0)\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"See Section 7.5 Tillé (2020) Section 9.3 Särndal, Swensson, Wretman (1992) overview variance estimation two-phase sampling. case Horvitz-Thompson variance estimator used phases, method used function equivalent equation (9.3.8) Särndal, Swensson, Wretman (1992) equation (7.7) Tillé (2020). However, function can used combination first-phase second-phase variance estimators, provided joint inclusion probabilities second-phase design available nonzero. Beaumont, Jean-François, Zdenek Patak. (2012). \"A Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling.\" International Statistical Review 80 (1): 127–48. Higham, N. J. (1988). \"Computing a nearest symmetric positive semidefinite matrix.\" Linear Algebra and its Applications, 103, 103–118. Särndal, C.-E., Swensson, B., & Wretman, J. (1992). 
\"Model Assisted Survey Sampling.\" Springer New York. Tillé, Y. (2020). \"Sampling estimation finite populations.\" (. Hekimi, Trans.). Wiley.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"","code":"if (FALSE) { ## ---------------------- Example 1 ------------------------## ## First phase is a stratified multistage sample ## ## Second phase is a simple random sample ## ##----------------------------------------------------------## data('library_multistage_sample', package = 'svrep') # Load first-phase sample twophase_sample <- library_multistage_sample # Select second-phase sample set.seed(2022) twophase_sample[['SECOND_PHASE_SELECTION']] <- sampling::srswor( n = 100, N = nrow(twophase_sample) ) |> as.logical() # Declare survey design twophase_design <- twophase( method = \"full\", data = twophase_sample, # Identify the subset of first-phase elements # which were selected into the second-phase sample subset = ~ SECOND_PHASE_SELECTION, # Describe clusters, probabilities, and population sizes # at each phase of sampling id = list(~ PSU_ID + SSU_ID, ~ 1), probs = list(~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB, NULL), fpc = list(~ PSU_POP_SIZE + SSU_POP_SIZE, NULL) ) # Get quadratic form matrix for the first phase design first_phase_sigma <- get_design_quad_form( design = twophase_design$phase1$full, variance_estimator = \"Stratified Multistage SRS\" ) # Subset to only include cases sampled in second phase first_phase_sigma <- first_phase_sigma[twophase_design$subset, twophase_design$subset] # Get quadratic form matrix for the second-phase design second_phase_sigma <- get_design_quad_form( design = twophase_design$phase2, variance_estimator = \"Ultimate Cluster\" ) # Get second-phase joint probabilities n <- twophase_design$phase2$fpc$sampsize[1,1] N <- twophase_design$phase2$fpc$popsize[1,1] second_phase_joint_probs <- Matrix::Matrix((n/N)*((n-1)/(N-1)), nrow = n, ncol = n) diag(second_phase_joint_probs) <- rep(n/N, times = n) # Get quadratic form for entire two-phase variance estimator twophase_quad_form <- make_twophase_quad_form( sigma_1 = first_phase_sigma, sigma_2 = second_phase_sigma, phase_2_joint_probs = second_phase_joint_probs ) # Use for variance estimation rep_factors <- make_gen_boot_factors( Sigma = twophase_quad_form, num_replicates = 500 ) library(survey) combined_weights <- 1/twophase_design$prob twophase_rep_design <- svrepdesign( data = twophase_sample |> subset(SECOND_PHASE_SELECTION), type = 'other', repweights = rep_factors, weights = combined_weights, combined.weights = FALSE, scale = attr(rep_factors, 'scale'), rscales = attr(rep_factors, 'rscales') ) svymean(x = ~ LIBRARIA, design = twophase_rep_design) ## ---------------------- Example 2 ------------------------## ## First phase is a stratified systematic sample ## ## Second phase is nonresponse, modeled as Poisson sampling ## ##----------------------------------------------------------## data('library_stsys_sample', package = 'svrep') # Determine quadratic form for full first-phase sample variance estimator full_phase1_quad_form <- make_quad_form_matrix( variance_estimator = \"SD2\", cluster_ids = library_stsys_sample[,'FSCSKEY',drop=FALSE], strata_ids = library_stsys_sample[,'SAMPLING_STRATUM',drop=FALSE], strata_pop_sizes = 
library_stsys_sample[,'STRATUM_POP_SIZE',drop=FALSE], sort_order = library_stsys_sample$SAMPLING_SORT_ORDER ) # Identify cases included in phase two sample # (in this example, respondents) phase2_inclusion <- ( library_stsys_sample$RESPONSE_STATUS == \"Survey Respondent\" ) phase2_sample <- library_stsys_sample[phase2_inclusion,] # Estimate response propensities response_propensities <- glm( data = library_stsys_sample, family = quasibinomial('logit'), formula = phase2_inclusion ~ 1, weights = 1/library_stsys_sample$SAMPLING_PROB ) |> predict(type = \"response\", newdata = phase2_sample) # Estimate conditional joint inclusion probabilities for second phase phase2_joint_probs <- outer(response_propensities, response_propensities) diag(phase2_joint_probs) <- response_propensities # Determine quadratic form for variance estimator of second phase # (Horvitz-Thompson estimator for nonresponse modeled as Poisson sampling) phase2_quad_form <- make_quad_form_matrix( variance_estimator = \"Horvitz-Thompson\", joint_probs = phase2_joint_probs ) # Create combined quadratic form for entire design twophase_quad_form <- make_twophase_quad_form( sigma_1 = full_phase1_quad_form[phase2_inclusion, phase2_inclusion], sigma_2 = phase2_quad_form, phase_2_joint_probs = phase2_joint_probs ) combined_weights <- 1/(phase2_sample$SAMPLING_PROB * response_propensities) # Use for variance estimation rep_factors <- make_gen_boot_factors( Sigma = twophase_quad_form, num_replicates = 500 ) library(survey) twophase_rep_design <- svrepdesign( data = phase2_sample, type = 'other', repweights = rep_factors, weights = combined_weights, combined.weights = FALSE, scale = attr(rep_factors, 'scale'), rscales = attr(rep_factors, 'rscales') ) svymean(x = ~ LIBRARIA, design = twophase_rep_design) }"},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":null,"dir":"Reference","previous_headings":"","what":"Redistribute weight from one group to another — redistribute_weights","title":"Redistribute weight from one group to another — redistribute_weights","text":"Redistributes weight one group another: example, non-respondents respondents. Redistribution conducted full-sample weights well set replicate weights. can done separately combination set grouping variables, example implement nonresponse weighting class adjustment.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Redistribute weight from one group to another — redistribute_weights","text":"","code":"redistribute_weights(design, reduce_if, increase_if, by)"},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Redistribute weight from one group to another — redistribute_weights","text":"design survey design object, created either survey srvyr packages. reduce_if expression indicating cases weights set zero. Must evaluate logical vector values TRUE FALSE. increase_if expression indicating cases weights increased. Must evaluate logical vector values TRUE FALSE. (Optional) character vector names variables used group redistribution weights. 
example, data include variables named \"stratum\" \"wt_class\", one specify = c(\"stratum\", \"wt_class\").","code":""},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Redistribute weight from one group to another — redistribute_weights","text":"survey design object, updated full-sample weights updated replicate weights. resulting survey design object always value combined.weights set TRUE.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Redistribute weight from one group to another — redistribute_weights","text":"See Chapter 2 Heeringa, West, Berglund (2017) Chapter 13 Valliant, Dever, Kreuter (2018) overview nonresponse adjustment methods based redistributing weights. - Heeringa, S., West, B., Berglund, P. (2017). \"Applied Survey Data Analysis, 2nd edition.\" Boca Raton, FL: CRC Press. - Valliant, R., Dever, J., Kreuter, F. (2018). \"Practical Tools for Designing and Weighting Survey Samples, 2nd edition.\" New York: Springer.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Redistribute weight from one group to another — redistribute_weights","text":"","code":"# Load example data suppressPackageStartupMessages(library(survey)) data(api) dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) dclus1$variables$response_status <- sample(x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), size = nrow(dclus1), replace = TRUE) rep_design <- as.svrepdesign(dclus1) # Adjust weights for cases with unknown eligibility ue_adjusted_design <- redistribute_weights( design = rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\"), by = c(\"stype\") ) # Adjust weights for nonresponse nr_adjusted_design <- redistribute_weights( design = ue_adjusted_design, reduce_if = response_status %in% c(\"Nonrespondent\"), increase_if = response_status == \"Respondent\", by = c(\"stype\") )"},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":null,"dir":"Reference","previous_headings":"","what":"Rescale replicate factors — rescale_reps","title":"Rescale replicate factors — rescale_reps","text":"Rescale replicate factors. main application rescaling ensure replicate weights strictly positive. Note rescaling does not impact variance estimates totals (linear statistics), variance estimates nonlinear statistics affected rescaling.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Rescale replicate factors — rescale_reps","text":"","code":"rescale_reps(x, tau = NULL, min_wgt = 0.01, digits = 2)"},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Rescale replicate factors — rescale_reps","text":"x Either replicate survey design object, numeric matrix replicate weights. tau Either single positive number, NULL. rescaling constant \\(\\tau\\) used transformation \\(\\frac{w + \\tau - 1}{\\tau}\\), \\(w\\) original weight. 
tau=NULL left unspecified, argument min_wgt used instead, case, \\(\\tau\\) automatically set smallest value needed rescale replicate weights least min_wgt. min_wgt used tau=NULL tau left unspecified. Specifies minimum acceptable value rescaled weights, used automatically determine value \\(\\tau\\) used transformation \\(\\frac{w + \\tau - 1}{\\tau}\\), \\(w\\) original weight. Must least zero must less one. digits used argument min_wgt used. Specifies number decimal places use choosing tau. Using smaller number digits useful simply producing easier-to-read documentation.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Rescale replicate factors — rescale_reps","text":"input numeric matrix, returns rescaled matrix. input replicate survey design object, returns updated replicate survey design object. replicate survey design object, results depend whether object matrix replicate factors rather matrix replicate weights (product replicate factors sampling weights). design object combined.weights=FALSE, replication factors adjusted. design object combined.weights=TRUE, replicate weights adjusted. strongly recommended use rescaling method replication factors rather weights. replicate survey design object, scale element design object updated appropriately, element tau also added. input matrix instead survey design object, result matrix attribute named tau can retrieved using attr(x, 'tau').","code":""},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Rescale replicate factors — rescale_reps","text":"Let \\(\\mathbf{A} = \\left[ \\mathbf{A}^{(1)} \\cdots \\mathbf{A}^{(b)} \\cdots \\mathbf{A}^{(B)} \\right]\\) denote \\((n \\times B)\\) matrix replicate adjustment factors. eliminate negative adjustment factors, Beaumont Patak (2012) propose forming rescaled matrix nonnegative replicate factors \\(\\mathbf{A}^S\\) rescaling adjustment factor \\(a_k^{(b)}\\) follows: $$ a_k^{S,(b)} = \\frac{a_k^{(b)} + \\tau - 1}{\\tau} $$ \\(\\tau \\geq 1 - a_k^{(b)} \\geq 1\\) \\(k\\) \\(\\left\\{ 1,\\ldots,n \\right\\}\\) \\(b\\) \\(\\left\\{1, \\ldots, B\\right\\}\\). value \\(\\tau\\) can set based realized adjustment factor matrix \\(\\mathbf{A}\\) choosing \\(\\tau\\) prior generating adjustment factor matrix \\(\\mathbf{A}\\) \\(\\tau\\) likely large enough prevent negative adjustment factors. adjustment factors rescaled manner, important adjust scale factor used estimating variance bootstrap replicates. example, bootstrap replicates, adjustment factor becomes \\(\\frac{\\tau^2}{B}\\) instead \\(\\frac{1}{B}\\). $$ \\textbf{Prior to rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{1}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2 $$ $$ \\textbf{After rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{\\tau^2}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{S*(b)}-\\hat{T}_y\\right)^2 $$","code":""},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Rescale replicate factors — rescale_reps","text":"method suggested Fay (1989) specific application creating replicate factors using generalized replication method. Beaumont Patak (2012) provided extended discussion rescaling method context rescaling generalized bootstrap replication factors avoid negative replicate weights. 
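The offsetting effect of the tau^2 factor can be verified with a toy calculation (numbers invented for illustration): for a total, each rescaled replicate estimate shrinks toward the full-sample estimate by a factor of 1/tau, so the rescaled variance estimate is unchanged.

tau <- 2; B <- 4
full_est <- 100
rep_ests <- c(103, 97, 110, 90)   # replicate estimates of a total
v_before <- (1 / B) * sum((rep_ests - full_est)^2)
# Rescaling the factors shrinks each replicate estimate toward the full-sample estimate:
rep_ests_rescaled <- full_est + (rep_ests - full_est) / tau
v_after <- (tau^2 / B) * sum((rep_ests_rescaled - full_est)^2)
all.equal(v_before, v_after)   # TRUE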
notation used documentation taken Beaumont Patak (2012). - Beaumont, Jean-François, Zdenek Patak. 2012. \"Generalized Bootstrap Sample Surveys Special Attention Poisson Sampling: Generalized Bootstrap Sample Surveys.\" International Statistical Review 80 (1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x. - Fay, Robert. 1989. \"Theory Application Replicate Weighting Variance Calculations.\" , 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1989_033.pdf","code":""},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Rescale replicate factors — rescale_reps","text":"","code":"# Example 1: Rescaling a matrix of replicate weights to avoid negative weights rep_wgts <- matrix( c(1.69742746694909, -0.230761178913411, 1.53333377634192, 0.0495043413294782, 1.81820367441039, 1.13229198793703, 1.62482013925955, 1.0866133494029, 0.28856654131668, 0.581930729719006, 0.91827012312825, 1.49979905894482, 1.26281337410693, 1.99327362761477, -0.25608700039304), nrow = 3, ncol = 5 ) rescaled_wgts <- rescale_reps(rep_wgts, min_wgt = 0.01) print(rep_wgts) #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1.6974275 0.04950434 1.6248201 0.5819307 1.262813 #> [2,] -0.2307612 1.81820367 1.0866133 0.9182701 1.993274 #> [3,] 1.5333338 1.13229199 0.2885665 1.4997991 -0.256087 print(rescaled_wgts) #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1.54915549 0.2515782 1.4919844 0.6708116 1.20693966 #> [2,] 0.03089671 1.6442549 1.0681995 0.9356458 1.78210522 #> [3,] 1.41994786 1.1041669 0.4398162 1.3935426 0.01095512 #> attr(,\"tau\") #> [1] 1.27 # Example 2: Rescaling replicate weights with a specified value of 'tau' rescaled_wgts <- rescale_reps(rep_wgts, tau = 2) print(rescaled_wgts) #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1.3487137 0.5247522 1.3124101 0.7909654 1.1314067 #> [2,] 0.3846194 1.4091018 1.0433067 0.9591351 1.4966368 #> [3,] 1.2666669 1.0661460 0.6442833 1.2498995 0.3719565 #> attr(,\"tau\") #> [1] 2 # Example 3: Rescaling replicate weights of a survey design object set.seed(2023) library(survey) data('mu284', package = 'survey') ## First create a bootstrap design object svy_design_object <- svydesign( data = mu284, ids = ~ id1 + id2, fpc = ~ n1 + n2 ) boot_design <- as_gen_boot_design( design = svy_design_object, variance_estimator = \"Stratified Multistage SRS\", replicates = 5, tau = 1 ) ## Rescale the weights rescaled_boot_design <- boot_design |> rescale_reps(min_wgt = 0.01) boot_wgts <- weights(boot_design, \"analysis\") rescaled_boot_wgts <- weights(rescaled_boot_design, 'analysis') print(boot_wgts) #> REP_1 REP_2 REP_3 REP_4 REP_5 #> [1,] 34.071074 -3.352195 7.031013 35.4547244 18.681422 #> [2,] -3.271131 12.579037 57.474328 9.3992013 25.014379 #> [3,] 12.204302 16.611771 14.029208 6.9869038 -8.727739 #> [4,] 40.124053 62.587721 29.834150 31.6263955 10.057763 #> [5,] 6.857688 48.936835 5.029175 42.1974205 67.126670 #> [6,] 38.866284 -7.883877 6.363613 35.3323662 14.104502 #> [7,] -2.705981 5.310800 51.191780 -18.8838183 34.232137 #> [8,] 23.948409 19.740921 21.950039 0.8683187 -2.397135 #> [9,] 38.102201 56.396306 39.516036 39.6713936 31.130900 #> [10,] 7.987330 41.986885 8.545987 47.8769539 66.314653 #> [11,] 35.747939 -13.746937 9.901870 41.9315736 8.610797 #> [12,] 1.384506 2.579634 50.469377 -26.8411849 19.800463 #> [13,] 22.153736 11.250766 19.117806 0.9281634 -1.226728 #> [14,] 48.183146 68.452257 28.322524 31.3003310 12.972211 #> [15,] 7.066647 
63.713091 11.462660 41.8092991 64.604278 #> attr(,\"tau\") #> [1] 1 #> attr(,\"scale\") #> [1] 0.2 #> attr(,\"rscales\") #> [1] 1 1 1 1 1 print(rescaled_boot_wgts) #> REP_1 REP_2 REP_3 REP_4 REP_5 #> [1,] 25.24027 6.805158 11.92004 25.9218675 17.659157 #> [2,] 11.91898 19.726948 41.84285 18.1605261 25.852732 #> [3,] 14.46846 16.639624 15.36743 11.8983106 4.157107 #> [4,] 34.98722 46.053065 29.91830 30.8011800 20.176238 #> [5,] 15.21725 35.945896 14.31651 32.6259871 44.906406 #> [6,] 27.60244 4.572803 11.59127 25.8615925 15.404516 #> [7,] 12.19738 16.146535 38.74800 4.2280041 30.393500 #> [8,] 20.25373 18.181078 19.26931 8.8842293 7.275631 #> [9,] 33.99123 43.003106 34.68770 34.7642333 30.557093 #> [10,] 15.77373 32.522275 16.04893 35.4237868 44.506397 #> [11,] 26.06631 1.684596 13.33425 29.1124336 12.698258 #> [12,] 14.21240 14.801133 38.39214 0.3081191 23.284300 #> [13,] 19.36966 13.998735 17.87412 8.9137094 7.852187 #> [14,] 38.95721 48.941999 29.17366 30.6405572 21.611926 #> [15,] 15.32019 43.224840 17.48571 32.4347943 43.663848 #> attr(,\"tau\") #> [1] 2.03 #> attr(,\"scale\") #> [1] 0.82418 #> attr(,\"rscales\") #> [1] 1 1 1 1 1"},{"path":"https://bschneidr.github.io/svrep/reference/shift_weight.html","id":null,"dir":"Reference","previous_headings":"","what":"(Internal function) Shift weight from one set of cases to another — shift_weight","title":"(Internal function) Shift weight from one set of cases to another — shift_weight","text":"likely want use redistribute_weights instead. function shift_weight internal package used \"--hood.\"","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shift_weight.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"(Internal function) Shift weight from one set of cases to another — shift_weight","text":"","code":"shift_weight(wt_set, is_upweight_case, is_downweight_case)"},{"path":"https://bschneidr.github.io/svrep/reference/shift_weight.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"(Internal function) Shift weight from one set of cases to another — shift_weight","text":"wt_set numeric vector weights is_upweight_case logical vector indicating cases whose weight increased is_downweight_case logical vector indicating cases whose weight decreased","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shift_weight.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"(Internal function) Shift weight from one set of cases to another — shift_weight","text":"numeric vector adjusted weights, length wt_set.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":null,"dir":"Reference","previous_headings":"","what":"Shuffle the order of replicates in a survey design object — shuffle_replicates","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"Shuffle order replicates survey design object. 
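(Returning briefly to shift_weight(), documented just above: the redistribution it performs can be sketched with toy numbers. This illustrates the total-preserving idea only and is not necessarily the package's exact implementation.)

wt_set  <- c(10, 20, 30, 40)
is_down <- c(TRUE, FALSE, FALSE, TRUE)   # e.g., nonrespondents: weight set to zero
is_up   <- !is_down                      # e.g., respondents: weight increased
shifted <- ifelse(is_down, 0,
                  wt_set * (1 + sum(wt_set[is_down]) / sum(wt_set[is_up])))
sum(shifted) == sum(wt_set)              # TRUE: total weight is preserved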
words, order columns replicate weights randomly permuted.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"","code":"shuffle_replicates(design)"},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"design survey design object, created either survey srvyr packages.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"updated survey design object, order replicates shuffled (.e., order randomly permuted).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"","code":"library(survey) set.seed(2023) # Create an example survey design object sample_data <- data.frame( STRATUM = c(1,1,1,1,2,2,2,2), PSU = c(1,2,3,4,5,6,7,8) ) survey_design <- svydesign( data = sample_data, strata = ~ STRATUM, ids = ~ PSU, weights = ~ 1 ) rep_design <- survey_design |> as_fays_gen_rep_design(variance_estimator = \"Ultimate Cluster\") # Inspect replicates before shuffling rep_design |> getElement(\"repweights\") #> REP_1 REP_2 REP_3 REP_4 REP_5 REP_6 REP_7 #> [1,] 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534 #> [2,] 1.0722540 0.6864786 1.3135214 0.9277460 0.6920437 1.5492236 0.4507764 #> [3,] 0.4135167 1.1689015 0.8310985 1.5864833 1.3507810 1.0668008 0.9331992 #> [4,] 1.1606758 1.4981733 0.5018267 0.8393242 0.6036219 0.7375290 1.2624710 #> [5,] 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 #> [6,] 0.9342068 1.3506702 1.3506702 0.9342068 0.8300909 0.4136276 0.4136276 #> [7,] 1.2618712 0.6028047 0.6028047 1.2618712 0.5024265 1.1614930 1.1614930 #> [8,] 0.4503686 0.6929717 0.6929717 0.4503686 1.3139292 1.0713260 1.0713260 #> REP_8 #> [1,] 0.6464466 #> [2,] 1.3079563 #> [3,] 0.6492190 #> [4,] 1.3963781 #> [5,] 1.3535534 #> [6,] 0.8300909 #> [7,] 0.5024265 #> [8,] 1.3139292 #> attr(,\"scale\") #> [1] 1 #> attr(,\"rscales\") #> [1] 1 1 1 1 1 1 1 1 # Inspect replicates after shuffling rep_design |> shuffle_replicates() |> getElement(\"repweights\") #> REP_5 REP_1 REP_7 REP_8 REP_6 REP_3 REP_2 #> [1,] 1.3535534 1.3535534 1.3535534 0.6464466 0.6464466 1.3535534 0.6464466 #> [2,] 0.6920437 1.0722540 0.4507764 1.3079563 1.5492236 1.3135214 0.6864786 #> [3,] 1.3507810 0.4135167 0.9331992 0.6492190 1.0668008 0.8310985 1.1689015 #> [4,] 0.6036219 1.1606758 1.2624710 1.3963781 0.7375290 0.5018267 1.4981733 #> [5,] 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 #> [6,] 0.8300909 0.9342068 0.4136276 0.8300909 0.4136276 1.3506702 1.3506702 #> [7,] 0.5024265 1.2618712 1.1614930 0.5024265 1.1614930 0.6028047 0.6028047 #> [8,] 1.3139292 0.4503686 1.0713260 1.3139292 1.0713260 0.6929717 0.6929717 #> REP_4 #> [1,] 0.6464466 #> [2,] 0.9277460 #> [3,] 1.5864833 #> [4,] 0.8393242 #> [5,] 1.3535534 #> [6,] 0.9342068 #> [7,] 1.2618712 #> [8,] 0.4503686 #> 
attr(,\"scale\") #> [1] 1 #> attr(,\"rscales\") #> [1] 1 1 1 1 1 1 1 1"},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":null,"dir":"Reference","previous_headings":"","what":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"Stack replicate designs: combine rows data, rows replicate weights, respective full-sample weights. can useful comparing estimates set adjustments made weights. Another delicate application combining sets replicate weights multiple years data survey, although must done carefully based guidance data provider.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"","code":"stack_replicate_designs(..., .id = \"Design_Name\")"},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"... Replicate-weights survey design objects combine. can supplied one two ways. Option 1 - series design objects, example 'adjusted' = adjusted_design, 'orig' = orig_design. Option 2 - list object containing design objects, example list('nr' = nr_adjusted_design, 'ue' = ue_adjusted_design). objects must specifications type, rho, mse, scales, rscales. .id single character value, becomes name new column identifiers created output data link row design taken. labels used identifiers taken named arguments.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"replicate-weights survey design object, class svyrep.design svyrep.stacked. 
resulting survey design object always value combined.weights set TRUE.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"","code":"# Load example data, creating a replicate design object suppressPackageStartupMessages(library(survey)) data(api) dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) dclus1$variables$response_status <- sample(x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), size = nrow(dclus1), replace = TRUE) orig_rep_design <- as.svrepdesign(dclus1) # Adjust weights for cases with unknown eligibility ue_adjusted_design <- redistribute_weights( design = orig_rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\"), by = c(\"stype\") ) # Adjust weights for nonresponse nr_adjusted_design <- redistribute_weights( design = ue_adjusted_design, reduce_if = response_status %in% c(\"Nonrespondent\"), increase_if = response_status == \"Respondent\", by = c(\"stype\") ) # Stack the three designs, using any of the following syntax options stacked_design <- stack_replicate_designs(orig_rep_design, ue_adjusted_design, nr_adjusted_design, .id = \"which_design\") stacked_design <- stack_replicate_designs('original' = orig_rep_design, 'unknown eligibility adjusted' = ue_adjusted_design, 'nonresponse adjusted' = nr_adjusted_design, .id = \"which_design\") list_of_designs <- list('original' = orig_rep_design, 'unknown eligibility adjusted' = ue_adjusted_design, 'nonresponse adjusted' = nr_adjusted_design) stacked_design <- stack_replicate_designs(list_of_designs, .id = \"which_design\")"},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":null,"dir":"Reference","previous_headings":"","what":"Retain only a random subset of the replicates in a design — subsample_replicates","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"Randomly subsamples replicates survey design object, keep subset. scale factor used estimation increased account subsampling.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"","code":"subsample_replicates(design, n_reps)"},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"design survey design object, created either survey srvyr packages. n_reps number replicates keep subsampling","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"updated survey design object, random selection replicates retained. 
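As quantified in the Statistical Details below, the variance contributions of the retained replicates are scaled up by L/L0; a toy sketch with invented numbers:

set.seed(1)
L <- 8; L0 <- 4
full_est <- 100
rep_ests <- c(102, 98, 105, 95, 101, 99, 103, 97)   # L replicate estimates of a total
c_k <- rep(1 / L, L)                                # original replicate constants
v_full <- sum(c_k * (rep_ests - full_est)^2)
keep  <- sample(L, L0)                              # randomly retain L0 replicates
v_sub <- (L / L0) * sum(c_k[keep] * (rep_ests[keep] - full_est)^2)
# v_sub estimates the same variance as v_full, using only the kept replicates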
overall 'scale' factor design (accessed design$scale) increased account sampling replicates.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"Suppose initial replicate design \\(L\\) replicates, respective constants \\(c_k\\) \\(k=1,\\dots,L\\) used estimate variance formula $$v_{R} = \\sum_{k=1}^L c_k\\left(\\hat{T}_y^{(k)}-\\hat{T}_y\\right)^2$$ subsampling replicates, \\(L_0\\) original \\(L\\) replicates randomly selected, variances estimated using formula: $$v_{R} = \\frac{L}{L_0} \\sum_{k=1}^{L_0} c_k\\left(\\hat{T}_y^{(k)}-\\hat{T}_y\\right)^2$$ subsampling suggested certain replicate designs Fay (1989). Kim Wu (2013) provide detailed theoretical justification also propose alternative methods subsampling replicates.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"Fay, Robert. 1989. \"Theory Application Replicate Weighting Variance Calculations.\" , 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1989_033.pdf Kim, J.K. Wu, C. 2013. \"Sparse Efficient Replication Variance Estimation Complex Surveys.\" Survey Methodology, Statistics Canada, 39(1), 91-120.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"","code":"library(survey) set.seed(2023) # Create an example survey design object sample_data <- data.frame( STRATUM = c(1,1,1,1,2,2,2,2), PSU = c(1,2,3,4,5,6,7,8) ) survey_design <- svydesign( data = sample_data, strata = ~ STRATUM, ids = ~ PSU, weights = ~ 1 ) rep_design <- survey_design |> as_fays_gen_rep_design(variance_estimator = \"Ultimate Cluster\") # Inspect replicates before subsampling rep_design |> getElement(\"repweights\") #> REP_1 REP_2 REP_3 REP_4 REP_5 REP_6 REP_7 #> [1,] 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534 #> [2,] 1.0722540 0.6864786 1.3135214 0.9277460 0.6920437 1.5492236 0.4507764 #> [3,] 0.4135167 1.1689015 0.8310985 1.5864833 1.3507810 1.0668008 0.9331992 #> [4,] 1.1606758 1.4981733 0.5018267 0.8393242 0.6036219 0.7375290 1.2624710 #> [5,] 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 #> [6,] 0.9342068 1.3506702 1.3506702 0.9342068 0.8300909 0.4136276 0.4136276 #> [7,] 1.2618712 0.6028047 0.6028047 1.2618712 0.5024265 1.1614930 1.1614930 #> [8,] 0.4503686 0.6929717 0.6929717 0.4503686 1.3139292 1.0713260 1.0713260 #> REP_8 #> [1,] 0.6464466 #> [2,] 1.3079563 #> [3,] 0.6492190 #> [4,] 1.3963781 #> [5,] 1.3535534 #> [6,] 0.8300909 #> [7,] 0.5024265 #> [8,] 1.3139292 #> attr(,\"scale\") #> [1] 1 #> attr(,\"rscales\") #> [1] 1 1 1 1 1 1 1 1 # Inspect replicates after subsampling rep_design |> subsample_replicates(n_reps = 4) |> getElement(\"repweights\") #> REP_5 REP_1 REP_7 REP_8 #> [1,] 1.3535534 1.3535534 1.3535534 0.6464466 #> [2,] 0.6920437 1.0722540 0.4507764 1.3079563 #> [3,] 1.3507810 0.4135167 0.9331992 0.6492190 #> [4,] 0.6036219 1.1606758 1.2624710 1.3963781 #> [5,] 1.3535534 
1.3535534 1.3535534 1.3535534 #> [6,] 0.8300909 0.9342068 0.4136276 0.8300909 #> [7,] 0.5024265 1.2618712 1.1614930 0.5024265 #> [8,] 1.3139292 0.4503686 1.0713260 1.3139292 #> attr(,\"scale\") #> [1] 4 #> attr(,\"rscales\") #> [1] 1 1 1 1"},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":null,"dir":"Reference","previous_headings":"","what":"Summarize the replicate weights — summarize_rep_weights","title":"Summarize the replicate weights — summarize_rep_weights","text":"Summarize replicate weights design","code":""},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Summarize the replicate weights — summarize_rep_weights","text":"","code":"summarize_rep_weights(rep_design, type = \"both\", by)"},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Summarize the replicate weights — summarize_rep_weights","text":"rep_design replicate design object, created either survey srvyr packages. type Default \"\". Use type = \"overall\", overall summary replicate weights. Use type = \"specific\" summary column replicate weights, column replicate weights summarized given row summary. Use type = \"\" list containing summaries, list containing names \"overall\" \"\". (Optional) character vector names variables used group summaries.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Summarize the replicate weights — summarize_rep_weights","text":"type = \"\" (default), result list data frames names \"overall\" \"specific\". type = \"overall\", result data frame providing overall summary replicate weights. contents \"overall\" summary following: \"nrows\": Number rows weights \"ncols\": Number columns replicate weights \"degf_svy_pkg\": degrees freedom according survey package R \"rank\": matrix rank determined QR decomposition \"avg_wgt_sum\": average column sum \"sd_wgt_sums\": standard deviation column sums \"min_rep_wgt\": minimum value replicate weight \"max_rep_wgt\": maximum value replicate weight type = \"specific\", result data frame providing summary column replicate weights, column replicate weights described given row data frame. contents \"specific\" summary following: \"Rep_Column\": name given column replicate weights. 
columns unnamed, column number used instead \"N\": number entries \"N_NONZERO\": number nonzero entries \"SUM\": sum weights \"MEAN\": average weights \"CV\": coefficient variation weights (standard deviation divided mean) \"MIN\": minimum weight \"MAX\": maximum weight","code":""},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Summarize the replicate weights — summarize_rep_weights","text":"","code":"# Load example data suppressPackageStartupMessages(library(survey)) data(api) dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) dclus1$variables$response_status <- sample(x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), size = nrow(dclus1), replace = TRUE) rep_design <- as.svrepdesign(dclus1) # Adjust weights for cases with unknown eligibility ue_adjusted_design <- redistribute_weights( design = rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\"), by = c(\"stype\") ) # Summarize replicate weights summarize_rep_weights(rep_design, type = \"both\") #> $overall #> nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 183 15 14 15 6194 403.1741 0 36.26464 #> #> $specific #> Rep_Column N N_NONZERO SUM MEAN CV MIN MAX #> 1 1 183 172 6237.518 34.08480 0.25358407 0 36.26464 #> 2 2 183 179 6491.370 35.47197 0.14989713 0 36.26464 #> 3 3 183 181 6563.900 35.86830 0.10540606 0 36.26464 #> 4 4 183 170 6164.989 33.68846 0.27729183 0 36.26464 #> 5 5 183 181 6563.900 35.86830 0.10540606 0 36.26464 #> 6 6 183 179 6491.370 35.47197 0.14989713 0 36.26464 #> 7 7 183 179 6491.370 35.47197 0.14989713 0 36.26464 #> 8 8 183 167 6056.195 33.09396 0.31037848 0 36.26464 #> 9 9 183 174 6310.047 34.48113 0.22805336 0 36.26464 #> 10 10 183 149 5403.431 29.52695 0.47900073 0 36.26464 #> 11 11 183 162 5874.872 32.10312 0.36102892 0 36.26464 #> 12 12 183 146 5294.637 28.93244 0.50479412 0 36.26464 #> 13 13 183 170 6164.989 33.68846 0.27729183 0 36.26464 #> 14 14 183 182 6600.164 36.06647 0.07432829 0 36.26464 #> 15 15 183 171 6201.253 33.88663 0.26563324 0 36.26464 #> # Summarize replicate weights by grouping variables summarize_rep_weights(ue_adjusted_design, type = 'overall', by = c(\"response_status\")) #> response_status nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums #> 1 Ineligible 39 15 13 14 1896.620 164.4527 #> 2 Nonrespondent 47 15 14 15 2296.912 133.9403 #> 3 Respondent 41 15 13 14 2000.468 130.3750 #> 4 Unknown eligibility 56 15 -1 0 0.000 0.0000 #> min_rep_wgt max_rep_wgt #> 1 0 56.98729 #> 2 0 56.98729 #> 3 0 56.98729 #> 4 0 0.00000 summarize_rep_weights(ue_adjusted_design, type = 'overall', by = c(\"stype\", \"response_status\")) #> stype response_status nrows ncols degf_svy_pkg rank avg_wgt_sum #> 1 E Ineligible 29 15 7 8 1413.77685 #> 2 H Ineligible 6 15 2 3 283.80822 #> 3 M Ineligible 4 15 3 4 199.03463 #> 4 E Nonrespondent 36 15 12 13 1753.97013 #> 5 H Nonrespondent 2 15 1 2 95.02487 #> 6 M Nonrespondent 9 15 5 6 447.91713 #> 7 E Respondent 35 15 10 11 1706.22048 #> 8 H Respondent 2 15 1 2 95.02487 #> 9 M Respondent 4 15 2 3 199.22315 #> 10 E Unknown eligibility 44 15 -1 0 0.00000 #> 11 H Unknown eligibility 4 15 -1 0 0.00000 #> 12 M Unknown eligibility 8 15 -1 0 0.00000 #> sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 149.65790 0 53.40068 #> 2 40.20252 0 56.98729 #> 3 23.57306 0 56.98729 #> 4 121.79339 0 53.40068 #> 5 
18.18279 0 56.98729 #> 6 46.50443 0 56.98729 #> 7 135.21908 0 53.40068 #> 8 18.18279 0 56.98729 #> 9 32.77474 0 56.98729 #> 10 0.00000 0 0.00000 #> 11 0.00000 0 0.00000 #> 12 0.00000 0 0.00000 # Compare replicate weights rep_wt_summaries <- lapply(list('original' = rep_design, 'adjusted' = ue_adjusted_design), summarize_rep_weights, type = \"overall\") print(rep_wt_summaries) #> $original #> nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 183 15 14 15 6194 403.1741 0 36.26464 #> #> $adjusted #> nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 183 15 14 15 6194 403.1741 0 56.98729 #>"},{"path":"https://bschneidr.github.io/svrep/reference/svrep-package.html","id":null,"dir":"Reference","previous_headings":"","what":"svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights — svrep-package","title":"svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights — svrep-package","text":"Provides tools for creating and working with survey replicate weights, extending the functionality of the 'survey' package by Lumley (2004) doi:10.18637/jss.v009.i08 . Implements bootstrap methods for complex surveys, including the generalized survey bootstrap described by Beaumont and Patak (2012) doi:10.1111/j.1751-5823.2011.00166.x . Methods are provided for applying nonresponse adjustments to both full-sample and replicate weights, as described by Rust and Rao (1996) doi:10.1177/096228029600500305 . Implements methods for sample-based calibration described by Opsomer and Erciulescu (2021) https://www150.statcan.gc.ca/n1/pub/12-001-x/2021002/article/00006-eng.htm. Diagnostic functions are included to compare weights and weighted estimates from different sets of replicate weights.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/svrep-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights — svrep-package","text":"Maintainer: Ben Schneider benjamin.julius.schneider@gmail.com (ORCID)","code":""},{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":null,"dir":"Reference","previous_headings":"","what":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"A modified version of the svyby() function from the 'survey' package. Whereas svyby() calculates statistics separately for each subset formed by a specified grouping variable, svyby_repwts() calculates statistics separately for each replicate design, in addition to any other user-specified grouping variables.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"","code":"svyby_repwts( rep_designs, formula, by, FUN, ..., deff = FALSE, keep.var = TRUE, keep.names = TRUE, verbose = FALSE, vartype = c(\"se\", \"ci\", \"ci\", \"cv\", \"cvpct\", \"var\"), drop.empty.groups = TRUE, return.replicates = FALSE, na.rm.by = FALSE, na.rm.all = FALSE, multicore = getOption(\"survey.multicore\") )"},{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"rep_designs: The replicate-weights survey designs to be compared. Supplied either as: a named list of replicate-weights survey design objects, for example list('nr' = nr_adjusted_design, 'ue' = ue_adjusted_design), or a 'stacked' replicate-weights survey design object created with stack_replicate_designs(). The designs must all have the same number of columns of replicate weights, of the same type (bootstrap, JKn, etc.). formula: A formula specifying the variables to pass to FUN. by: A formula specifying the factors that define the subsets. FUN: A function taking a formula and a survey design object as its first two arguments. Usually a function from the survey package, such as svytotal or svymean. ...: Other arguments to FUN. deff: A value of TRUE or FALSE, indicating whether design effects should be estimated where possible. keep.var: A value of TRUE or FALSE. If FUN returns a svystat object, indicates whether to extract standard errors from it. keep.names: Define row names based on the subsets. verbose: If TRUE, print a label for each subset as it is processed. vartype: Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance. drop.empty.groups: If FALSE, report NA for empty groups; if TRUE, drop them from the output. return.replicates: If TRUE, return the replicates as the \"replicates\" attribute of the result. This can be useful if you want to produce custom summaries of the estimates from each replicate. na.rm.by: If true, omit groups defined by NA values of the by variables. na.rm.all: If true, check that all of the groups have non-missing observations for all variables defined in formula, and treat groups without them as empty. multicore: Use the multicore package to distribute subsets across multiple processors?","code":""},{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"An object of class \"svyby\": a data frame showing the grouping factors and the results of FUN for each combination of the grouping factors.
The first grouping factor always consists of indicators for which replicate design was used for an estimate.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"","code":"if (FALSE) { suppressPackageStartupMessages(library(survey)) data(api) dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) dclus1$variables$response_status <- sample(x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), size = nrow(dclus1), replace = TRUE) orig_rep_design <- as.svrepdesign(dclus1) # Adjust weights for cases with unknown eligibility ue_adjusted_design <- redistribute_weights( design = orig_rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\"), by = c(\"stype\") ) # Adjust weights for nonresponse nr_adjusted_design <- redistribute_weights( design = ue_adjusted_design, reduce_if = response_status %in% c(\"Nonrespondent\"), increase_if = response_status == \"Respondent\", by = c(\"stype\") ) # Compare estimates from the three sets of replicate weights list_of_designs <- list('original' = orig_rep_design, 'unknown eligibility adjusted' = ue_adjusted_design, 'nonresponse adjusted' = nr_adjusted_design) ##_ First compare overall means for two variables means_by_design <- svyby_repwts(formula = ~ api00 + api99, FUN = svymean, rep_design = list_of_designs) print(means_by_design) ##_ Next compare domain means for two variables domain_means_by_design <- svyby_repwts(formula = ~ api00 + api99, by = ~ stype, FUN = svymean, rep_design = list_of_designs) print(domain_means_by_design) # Calculate confidence interval for difference between estimates ests_by_design <- svyby_repwts(rep_designs = list('NR-adjusted' = nr_adjusted_design, 'Original' = orig_rep_design), FUN = svymean, formula = ~ api00 + api99) differences_in_estimates <- svycontrast(stat = ests_by_design, contrasts = list( 'Mean of api00: NR-adjusted vs. Original' = c(1,-1,0,0), 'Mean of api99: NR-adjusted vs. Original' = c(0,0,1,-1) )) print(differences_in_estimates) confint(differences_in_estimates, level = 0.95) }"},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":null,"dir":"Reference","previous_headings":"","what":"Variance Estimators — variance-estimators","title":"Variance Estimators — variance-estimators","text":"This help page describes variance estimators commonly used for survey samples. These variance estimators can be used as the basis of generalized replication methods, as implemented by the functions as_fays_gen_rep_design(), as_gen_boot_design(), make_fays_gen_rep_factors(), and make_gen_boot_factors().","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"shared-notation","dir":"Reference","previous_headings":"","what":"Shared Notation","title":"Variance Estimators — variance-estimators","text":"Let \(s\) denote the selected sample of size \(n\), with elements \(i=1,\dots,n\). Element \(i\) in the sample had probability \(\pi_i\) of being included in the sample. The pair of elements \(ij\) was sampled with probability \(\pi_{ij}\). The population total for a variable is denoted \(Y = \sum_{i \in U}y_i\), and the Horvitz-Thompson estimator for \(Y\) is denoted \(\hat{Y} = \sum_{i \in s} y_i/\pi_i\). For convenience, we denote \(\breve{y}_i = y_i/\pi_i\).
The true sampling variance of \(\hat{Y}\) is denoted \(V(\hat{Y})\), while an estimator of this sampling variance is denoted \(v(\hat{Y})\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"horvitz-thompson","dir":"Reference","previous_headings":"","what":"Horvitz-Thompson","title":"Variance Estimators — variance-estimators","text":"The Horvitz-Thompson variance estimator: $$ v(\hat{Y}) = \sum_{i \in s}\sum_{j \in s} (1 - \frac{\pi_i \pi_j}{\pi_{ij}}) \frac{y_i}{\pi_i} \frac{y_j}{\pi_j} $$","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"yates-grundy","dir":"Reference","previous_headings":"","what":"Yates-Grundy","title":"Variance Estimators — variance-estimators","text":"The Yates-Grundy variance estimator: $$ v(\hat{Y}) = -\frac{1}{2}\sum_{i \in s}\sum_{j \in s} (1 - \frac{\pi_i \pi_j}{\pi_{ij}}) (\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j})^2 $$","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"poisson-horvitz-thompson","dir":"Reference","previous_headings":"","what":"Poisson Horvitz-Thompson","title":"Variance Estimators — variance-estimators","text":"The Poisson Horvitz-Thompson variance estimator is simply the Horvitz-Thompson variance estimator, but with \(\pi_{ij}=\pi_i \times \pi_j\), which is the case for Poisson sampling.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"stratified-multistage-srs","dir":"Reference","previous_headings":"","what":"Stratified Multistage SRS","title":"Variance Estimators — variance-estimators","text":"The Stratified Multistage SRS variance estimator is the recursive variance estimator proposed by Bellhouse (1985) and used in the 'survey' package's function svyrecvar. In the case of simple random sampling without replacement (with one or more stages), this estimator exactly matches the Horvitz-Thompson estimator. The estimator can be used for any number of sampling stages. For illustration, we describe its use for two sampling stages. $$ v(\hat{Y}) = \hat{V}_1 + \hat{V}_2 $$ $$ \hat{V}_1 = \sum_{h=1}^{H} (1 - \frac{n_h}{N_h})\frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} (y_{hi.} - \bar{y}_{h..})^2 $$ $$ \hat{V}_2 = \sum_{h=1}^{H} \frac{n_h}{N_h} \sum_{i=1}^{n_h}v_{hi}(y_{hi.}) $$ where \(n_h\) is the number of sampled clusters in stratum \(h\), \(N_h\) is the number of population clusters in stratum \(h\), \(y_{hi.}\) is the weighted cluster total of cluster \(i\) in stratum \(h\), \(\bar{y}_{h..}\) is the mean weighted cluster total of stratum \(h\) (\(\bar{y}_{h..} = \frac{1}{n_h}\sum_{i=1}^{n_h}y_{hi.}\)), and \(v_{hi}(y_{hi.})\) is the estimated sampling variance of \(y_{hi.}\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"ultimate-cluster","dir":"Reference","previous_headings":"","what":"Ultimate Cluster","title":"Variance Estimators — variance-estimators","text":"The Ultimate Cluster variance estimator is simply the stratified multistage SRS variance estimator, but ignoring variances from later stages of sampling. $$ v(\hat{Y}) = \hat{V}_1 $$ This is the variance estimator used in the 'survey' package when the user specifies option(survey.ultimate.cluster = TRUE) or uses svyrecvar(..., one.stage = TRUE).
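For illustration, here is a minimal sketch (not drawn from the package documentation) comparing the linearization-based Ultimate Cluster estimate computed by the 'survey' package with a generalized bootstrap that targets the same estimator via as_gen_boot_design():

```r
# A minimal sketch, assuming the as_gen_boot_design() interface documented
# elsewhere on this site; 'apiclus1' ships with the 'survey' package.
library(survey)
library(svrep)
set.seed(2023)

data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Linearization-based Ultimate Cluster estimate
old_option <- options(survey.ultimate.cluster = TRUE)
svytotal(~enroll, design = dclus1)
options(old_option)

# Generalized bootstrap replicates targeting the Ultimate Cluster estimator
uc_boot_design <- as_gen_boot_design(
  design = dclus1,
  variance_estimator = "Ultimate Cluster",
  replicates = 500
)
svytotal(~enroll, design = uc_boot_design)
```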
When the first-stage sampling fractions are small, analysts often omit the finite population corrections \((1-\frac{n_h}{N_h})\) when using the ultimate cluster estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"sd-and-sd-successive-difference-estimators-","dir":"Reference","previous_headings":"","what":"SD1 and SD2 (Successive Difference Estimators)","title":"Variance Estimators — variance-estimators","text":"The SD1 and SD2 variance estimators are \"successive difference\" estimators sometimes used for systematic sampling designs. Ash (2014) describes each estimator as follows: $$ \hat{v}_{SD1}(\hat{Y}) = \left(1-\frac{n}{N}\right) \frac{n}{2(n-1)} \sum_{k=2}^n\left(\breve{y}_k-\breve{y}_{k-1}\right)^2 $$ $$ \hat{v}_{SD2}(\hat{Y}) = \left(1-\frac{n}{N}\right) \frac{1}{2}\left[\sum_{k=2}^n\left(\breve{y}_k-\breve{y}_{k-1}\right)^2+\left(\breve{y}_n-\breve{y}_1\right)^2\right] $$ where \(\breve{y}_k = y_k/\pi_k\) is the weighted value of unit \(k\) with selection probability \(\pi_k\). The SD1 estimator is recommended by Wolter (1984). The SD2 estimator is the basis of the successive difference replication estimator commonly used for systematic sampling designs, and is more conservative. See Ash (2014) for details. For multistage samples, SD1 and SD2 are applied to the clusters at each stage, separately by stratum. For later stages of sampling, the variance estimate from a stratum is multiplied by the product of sampling fractions from earlier stages of sampling. For example, at a third stage of sampling, the variance estimate from a third-stage stratum is multiplied by \(\frac{n_1}{N_1}\frac{n_2}{N_2}\), which is the product of sampling fractions from the first-stage stratum and second-stage stratum.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"deville-and-deville-","dir":"Reference","previous_headings":"","what":"Deville 1 and Deville 2","title":"Variance Estimators — variance-estimators","text":"The \"Deville-1\" and \"Deville-2\" variance estimators are clearly described in Matei and Tillé (2005), and are intended for designs that use fixed-size, unequal-probability random sampling without replacement. These variance estimators have been shown to be effective for designs that use a fixed sample size with a high-entropy sampling method. This includes most PPSWOR sampling methods, although unequal-probability systematic sampling is an important exception. These variance estimators take the following form: $$ \hat{v}(\hat{Y}) = \sum_{i=1}^{n} c_i (\breve{y}_i - \frac{1}{\sum_{k=1}^{n}c_k}\sum_{k=1}^{n}c_k \breve{y}_k)^2 $$ where \(\breve{y}_i = y_i/\pi_i\) is the weighted value of the variable of interest, and \(c_i\) depends on the method used: \"Deville-1\": $$c_i=\left(1-\pi_i\right) \frac{n}{n-1}$$ \"Deville-2\": $$c_i = (1-\pi_i) \left[1 - \sum_{k=1}^{n} \left(\frac{1-\pi_k}{\sum_{l=1}^{n}(1-\pi_l)}\right)^2 \right]^{-1}$$ In the case of simple random sampling without replacement (SRSWOR), both of these estimators are identical to the usual stratified multistage SRS estimator (which is itself a special case of the Horvitz-Thompson estimator). For multistage samples, \"Deville-1\" and \"Deville-2\" are applied to the clusters at each stage, separately by stratum. For later stages of sampling, the variance estimate from a stratum is multiplied by the product of sampling probabilities from earlier stages of sampling.
For example, at a third stage of sampling, the variance estimate from a third-stage stratum is multiplied by \(\pi_1 \times \pi_{(2 | 1)}\), where \(\pi_1\) is the sampling probability of the first-stage unit and \(\pi_{(2|1)}\) is the sampling probability of the second-stage unit within the first-stage unit.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"deville-till-","dir":"Reference","previous_headings":"","what":"Deville-Tillé","title":"Variance Estimators — variance-estimators","text":"See Section 6.8 of Tillé (2020) for more detail on this estimator, including an explanation of its quadratic form. See Deville and Tillé (2005) for the results of a simulation study comparing this and other alternative estimators for balanced sampling. The estimator can be written as follows: $$ v(\hat{Y})=\sum_{k \in S} \frac{c_k}{\pi_k^2}\left(y_k-\hat{y}_k^*\right)^2, $$ where $$ \hat{y}_k^*=\mathbf{z}_k^{\top}\left(\sum_{\ell \in S} c_{\ell} \frac{\mathbf{z}_{\ell} \mathbf{z}_{\ell}^{\prime}}{\pi_{\ell}^2}\right)^{-1} \sum_{\ell \in S} c_{\ell} \frac{\mathbf{z}_{\ell} y_{\ell}}{\pi_{\ell}^2} $$ and \(\mathbf{z}_k\) denotes the vector of auxiliary variables for observation \(k\) included in sample \(S\), with inclusion probability \(\pi_k\). The value \(c_k\) is set to \(\frac{n}{n-q}(1-\pi_k)\), where \(n\) is the number of observations and \(q\) is the number of auxiliary variables.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Variance Estimators — variance-estimators","text":"Ash, S. (2014). \"Using successive difference replication for estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59. Bellhouse, D.R. (1985). \"Computing Methods for Variance Estimation in Complex Surveys.\" Journal of Official Statistics, Vol. 1, No. 3. Deville, J.-C., and Tillé, Y. (2005). \"Variance approximation under balanced sampling.\" Journal of Statistical Planning and Inference, 128, 569–591. Tillé, Y. (2020). \"Sampling and estimation from finite populations.\" (I. Hekimi, Trans.). Wiley. Matei, A., and Tillé, Y. (2005). \"Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size.\" Journal of Official Statistics, 21(4), 543–570.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/wls_hat_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create the \"hat matrix\" for weighted least squares regression","title":"Create the \"hat matrix\" for weighted least squares regression — wls_hat_matrix","text":"Create the \"hat matrix\" for a weighted least squares regression","code":""},{"path":"https://bschneidr.github.io/svrep/reference/wls_hat_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create the \"hat matrix\" for weighted least squares regression — wls_hat_matrix","text":"","code":"wls_hat_matrix(X, w)"},{"path":"https://bschneidr.github.io/svrep/reference/wls_hat_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create the \"hat matrix\" for weighted least squares regression — wls_hat_matrix","text":"X: Matrix of predictor variables, with n rows. w: Vector of weights (nonnegative), of length n.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/wls_hat_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create the \"hat matrix\" for weighted least squares regression — wls_hat_matrix","text":"An \(n \times n\) matrix: the \"hat matrix\" for the WLS regression.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-063","dir":"Changelog","previous_headings":"","what":"svrep 0.6.3","title":"svrep 0.6.3","text":"CRAN release: 2023-09-09. Bumped version number for CRAN submission. No
significant user-facing changes: just updates unit tests rendering examples/vignettes due temporary CRAN check issues development version R.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-062","dir":"Changelog","previous_headings":"","what":"svrep 0.6.2","title":"svrep 0.6.2","text":"Bug fixes: Bumped version number CRAN submission. significant user-facing changes: just updates unit tests rendering examples/vignettes due temporary CRAN check issues development version R. Changes specifically CRAN check: Removed 12 unmarked UTF-8 strings causing CRAN check note. Removed LaTeX ‘cases’ formatting documentation as_random_group_jackknife_design(), since old release MacOS throwing LaTeX error trying build manual. formatting might restored later ‘oldrel’ CRAN increases 4.3.X.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-061","dir":"Changelog","previous_headings":"","what":"svrep 0.6.1","title":"svrep 0.6.1","text":"CRAN release: 2023-08-30 Added support Fay’s generalized replication method, specifically version proposed Fay (1989): key functions as_fays_gen_rep_design() make_fays_gen_rep_factors(), nearly identical generalized bootstrap functions as_gen_boot_design() make_gen_boot_factors(). Added new variance estimator, \"Deville-Tille\", useful balanced sampling (including cube method). Currently works single-stage designs. functions as_gen_boot_design() as_fays_gen_rep_design() new argument aux_var_names meant used \"Deville-Tille\" variance estimator. Similarly, make_gen_boot_factors() make_fays_gen_rep_factors() argument named aux_vars.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-060","dir":"Changelog","previous_headings":"","what":"svrep 0.6.0","title":"svrep 0.6.0","text":"CRAN release: 2023-07-06 Added function as_random_group_jackknife_design() create random-group jackknife replicates. creation generalized bootstrap replicates designs many observations degrees freedom (e.g., stratified cluster samples) now much faster efficient. based using ‘Matrix’ package–particularly efficient representation sparse matrices arise stratified designs–well using compressed representation designs use cluster sampling. Now using ‘Matrix’ package improve speed memory usage large quadratic forms. primarily helpful making generalized bootstrap computationally feasible larger datasets. Better documentation bootstrap methods covered as_bootstrap_design(). following functions now work database-backed survey design objects (.e., objects class DBIsvydesign): as_data_frame_with_weights() as_gen_boot_design() as_bootstrap_design() redistribute_weights() calibrate_to_sample() calibrate_to_estimate() function as_data_frame_with_weights() gained argument vars_to_keep allows user indicate want keep specific list variables data. can useful, example, want keep weights unique identifiers. Minor updates bug fixes: function as_bootstrap_design() now throws informative error message supply invalid value type argument. Bug Fix: “Deville-1” “Deville-2” estimators threw errors strata one units selected certainty (.e., sampling probabilities 1). now fixed. 
Bug Fix: function as_gen_boot_design() sometimes fail detect input design PPS design, caused give user unnecessary error message.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-051","dir":"Changelog","previous_headings":"","what":"svrep 0.5.1","title":"svrep 0.5.1","text":"CRAN release: 2023-05-17 Added argument exact_vcov = TRUE as_gen_boot_design() make_gen_boot_factors(). argument forces generalized bootstrap variance-covariance estimates totals exactly match target variance estimator. words, eliminates bootstrap simulation error variance estimates totals. similar , simple survey designs, jackknife BRR give variance estimates totals exactly match Horvitz-Thompson estimates. Using exact_vcov requires number replicates strictly greater rank target variance estimator. Added new variance estimators (“Deville 1” “Deville 2”) available use generalized bootstrap, particularly useful single-stage PPSWOR designs multistage designs one stages PPSWOR sampling. See updated documentation as_gen_boot_design() make_quad_form_matrix(). ‘srvyr’ package loaded, functions ‘svrep’ return survey design objects always return tbl_svy input tbl_svy. makes easier use functions summarize() mutate(). Fixed bug as_bootstrap_design() wouldn’t create 50 replicates Rao-Wu, Preston, Canty-Davison types.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-050","dir":"Changelog","previous_headings":"","what":"svrep 0.5.0","title":"svrep 0.5.0","text":"CRAN release: 2023-02-07 release adds extensive new functionality two-phase designs. new vignette “Replication Methods Two-phase Sampling” describes new functionality well underlying statistical methods. function as_gen_boot_design() can now create generalized bootstrap weights two-phase survey design objects created ‘survey’ package’s twophase() function. user must specify list two variance estimators use phase, e.g. list('Stratified Multistage SRS', 'Ultimate Cluster'). function make_twophase_quad_form() can used create quadratic form two-phase variance estimator, combining quadratic forms phase. helper function get_nearest_psd_matrix() can used approximate quadratic form matrix nearest positive semidefinite matrix. can particularly useful two-phase designs, since double expansion estimator commonly used practice frequently variance estimator positive semidefinite. function as_gen_boot_design() new argument named psd_option, controls happen target variance estimator quadratic form matrix positive semi-definite. can occasionally happen, particularly two-phase designs. default, function warn user quadratic form positive semi-definite automatically approximate matrix nearest positive semi-definite matrix. Added new function get_design_quad_form(), determines quadratic form matrix specified variance estimator, parsing information stored survey design object created using ‘survey’ package. Added new function rescale_reps() implements rescaling replicate adjustment factors avoid negative replicate weights. functionality already existed as_gen_boot_design() make_gen_boot_factors(), now implemented help new function. Added helper function is_psd_matrix() checking whether matrix positive semi-definite, added helper function get_nearest_psd_matrix() approximating square matrix nearest positive semi-definite matrix. 
Minor improvements vignettes, particularly formatting.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-041","dir":"Changelog","previous_headings":"","what":"svrep 0.4.1","title":"svrep 0.4.1","text":"CRAN release: 2022-12-18 Fix bug #15, bootstrap conversion multistage survey design objects as_bootstrap_design() throw error user manually specified weights svydesign(). Creation Rao-Wu-Yue-Beaumont bootstrap replicate weights now faster takes less computer memory. Typo fix vignettes.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-040","dir":"Changelog","previous_headings":"","what":"svrep 0.4.0","title":"svrep 0.4.0","text":"CRAN release: 2022-12-11 release adds several functions creating bootstrap generalized bootstrap replicate weights. new vignette “Bootstrap methods surveys” provides guidance choosing bootstrap method selecting number bootstrap replicates use, along statistical details references. Added function as_bootstrap_design() convert survey design object replicate design replicate weights created using bootstrap method. essentially specialized version .svrepdesign() supports additional bootstrap methods detailed documentation bootstrap methods can used different types sampling designs. Added function as_gen_boot_design() convert survey design object replicate design replicate weights created using generalized survey bootstrap. user must supply name target variance estimator (e.g., “Horvitz-Thompson” “Ultimate Cluster”) used create generalized bootstrap factors. See new vignette details. Added functions help choose number bootstrap replicates. function estimate_boot_sim_cv() can used estimate simulation error bootstrap estimate caused using finite number bootstrap replicates. new function estimate_boot_reps_for_target_cv() estimates number bootstrap replicates needed reduce simulation error target level. Added function make_rwyb_bootstrap_weights(), creates bootstrap replicate weights wide range survey designs using method Rao-Wu-Yue-Beaumont (.e., Beaumont’s generalization Rao-Wu-Yue bootstrap method). function can used directly, users can specify as_bootstrap_design(type = \"Rao-Wu-Yue-Beaumont\"). Added function make_gen_boot_factors() create replicate adjustment factors using generalized survey bootstrap. key input make_gen_boot_factors() matrix quadratic form used represent variance estimator. new function make_quad_form_matrix() can used represent chosen variance estimator quadratic form, given information sample design. can used stratified multistage SRS designs (without replacement), systematic samples, PPS samples, without replacement. Minor Updates Bug Fixes: using as_data_frame_with_weights(), ensure full-sample weight named \"FULL_SAMPLE_WGT\" user specify something different. calibrate_to_estimate(), ensure output names list columns perturbed control columns col_selection instead perturbed_control_cols, name matches corresponding function argument, col_selection. Improvements documentation (formatting tweaks typo fixes)","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-030","dir":"Changelog","previous_headings":"","what":"svrep 0.3.0","title":"svrep 0.3.0","text":"CRAN release: 2022-07-05 Added helper function as_data_frame_with_weights() convert survey design object data frame columns weights (full-sample weights , applicable, replicate weights). useful saving data weights data file. 
Added argument summarize_rep_weights() allows specification one grouping variables use summaries (e.g. = c('stratum', 'response_status') can used summarize response status within stratum). Added small vignette “Nonresponse Adjustments” illustrate conduct nonresponse adjustments using redistribute_weights(). Minor Updates Bug Fixes: Internal code update avoid annoying harmless warning message rho calibrate_to_estimate(). Bug fix stack_replicate_designs() designs created .svrepdesign(..., type = 'mrbbootstrap') .svrepdesign(..., type = 'subbootstrap') threw error.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-020","dir":"Changelog","previous_headings":"","what":"svrep 0.2.0","title":"svrep 0.2.0","text":"CRAN release: 2022-05-12 Added functions calibrate_to_estimate() calibrate_to_sample() calibrating estimated control totals methods account sampling variance control totals. overview functions, please see new vignette “Calibrating Estimated Control Totals”. function calibrate_to_estimate() requires user supply vector control totals variance-covariance matrix. function applies Fuller’s proposed adjustments replicate weights, control totals varied across replicates perturbing control totals using spectral decomposition control totals’ variance-covariance matrix. function calibrate_to_sample() requires user supply replicate design primary survey interest well replicate design control survey used estimate control totals calibration. function applies Opsomer & Erciulescu’s method varying control totals across replicates primary survey matching primary survey replicate replicate control survey. Added example dataset, lou_vax_survey, simulated survey measuring Covid-19 vaccination status handful demographic variables, based simple random sample 1,000 residents Louisville, Kentucky approximately 50% response rate. accompanying dataset lou_pums_microdata provides person-level microdata American Community Survey (ACS) 2015-2019 public-use microdata sample (PUMS) data Louisville, KY. dataset lou_pums_microdata includes replicate weights use variance estimation can used generate control totals lou_vax_survey.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-010","dir":"Changelog","previous_headings":"","what":"svrep 0.1.0","title":"svrep 0.1.0","text":"CRAN release: 2022-03-30 Initial release package.","code":""}] +[{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"GNU General Public License","title":"GNU General Public License","text":"Version 3, 29 June 2007Copyright © 2007 Free Software Foundation, Inc.  Everyone permitted copy distribute verbatim copies license document, changing allowed.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"preamble","dir":"","previous_headings":"","what":"Preamble","title":"GNU General Public License","text":"GNU General Public License free, copyleft license software kinds works. licenses software practical works designed take away freedom share change works. contrast, GNU General Public License intended guarantee freedom share change versions program–make sure remains free software users. , Free Software Foundation, use GNU General Public License software; applies also work released way authors. can apply programs, . speak free software, referring freedom, price. 
General Public Licenses designed make sure freedom distribute copies free software (charge wish), receive source code can get want , can change software use pieces new free programs, know can things. protect rights, need prevent others denying rights asking surrender rights. Therefore, certain responsibilities distribute copies software, modify : responsibilities respect freedom others. example, distribute copies program, whether gratis fee, must pass recipients freedoms received. must make sure , , receive can get source code. must show terms know rights. Developers use GNU GPL protect rights two steps: (1) assert copyright software, (2) offer License giving legal permission copy, distribute /modify . developers’ authors’ protection, GPL clearly explains warranty free software. users’ authors’ sake, GPL requires modified versions marked changed, problems attributed erroneously authors previous versions. devices designed deny users access install run modified versions software inside , although manufacturer can . fundamentally incompatible aim protecting users’ freedom change software. systematic pattern abuse occurs area products individuals use, precisely unacceptable. Therefore, designed version GPL prohibit practice products. problems arise substantially domains, stand ready extend provision domains future versions GPL, needed protect freedom users. Finally, every program threatened constantly software patents. States allow patents restrict development use software general-purpose computers, , wish avoid special danger patents applied free program make effectively proprietary. prevent , GPL assures patents used render program non-free. precise terms conditions copying, distribution modification follow.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_0-definitions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"0. Definitions","title":"GNU General Public License","text":"“License” refers version 3 GNU General Public License. “Copyright” also means copyright-like laws apply kinds works, semiconductor masks. “Program” refers copyrightable work licensed License. licensee addressed “”. “Licensees” “recipients” may individuals organizations. “modify” work means copy adapt part work fashion requiring copyright permission, making exact copy. resulting work called “modified version” earlier work work “based ” earlier work. “covered work” means either unmodified Program work based Program. “propagate” work means anything , without permission, make directly secondarily liable infringement applicable copyright law, except executing computer modifying private copy. Propagation includes copying, distribution (without modification), making available public, countries activities well. “convey” work means kind propagation enables parties make receive copies. Mere interaction user computer network, transfer copy, conveying. interactive user interface displays “Appropriate Legal Notices” extent includes convenient prominently visible feature (1) displays appropriate copyright notice, (2) tells user warranty work (except extent warranties provided), licensees may convey work License, view copy License. interface presents list user commands options, menu, prominent item list meets criterion.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_1-source-code","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"1. 
Source Code","title":"GNU General Public License","text":"“source code” work means preferred form work making modifications . “Object code” means non-source form work. “Standard Interface” means interface either official standard defined recognized standards body, , case interfaces specified particular programming language, one widely used among developers working language. “System Libraries” executable work include anything, work whole, () included normal form packaging Major Component, part Major Component, (b) serves enable use work Major Component, implement Standard Interface implementation available public source code form. “Major Component”, context, means major essential component (kernel, window system, ) specific operating system () executable work runs, compiler used produce work, object code interpreter used run . “Corresponding Source” work object code form means source code needed generate, install, (executable work) run object code modify work, including scripts control activities. However, include work’s System Libraries, general-purpose tools generally available free programs used unmodified performing activities part work. example, Corresponding Source includes interface definition files associated source files work, source code shared libraries dynamically linked subprograms work specifically designed require, intimate data communication control flow subprograms parts work. Corresponding Source need include anything users can regenerate automatically parts Corresponding Source. Corresponding Source work source code form work.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_2-basic-permissions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"2. Basic Permissions","title":"GNU General Public License","text":"rights granted License granted term copyright Program, irrevocable provided stated conditions met. License explicitly affirms unlimited permission run unmodified Program. output running covered work covered License output, given content, constitutes covered work. License acknowledges rights fair use equivalent, provided copyright law. may make, run propagate covered works convey, without conditions long license otherwise remains force. may convey covered works others sole purpose make modifications exclusively , provide facilities running works, provided comply terms License conveying material control copyright. thus making running covered works must exclusively behalf, direction control, terms prohibit making copies copyrighted material outside relationship . Conveying circumstances permitted solely conditions stated . Sublicensing allowed; section 10 makes unnecessary.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_3-protecting-users-legal-rights-from-anti-circumvention-law","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"3. Protecting Users’ Legal Rights From Anti-Circumvention Law","title":"GNU General Public License","text":"covered work shall deemed part effective technological measure applicable law fulfilling obligations article 11 WIPO copyright treaty adopted 20 December 1996, similar laws prohibiting restricting circumvention measures. 
convey covered work, waive legal power forbid circumvention technological measures extent circumvention effected exercising rights License respect covered work, disclaim intention limit operation modification work means enforcing, work’s users, third parties’ legal rights forbid circumvention technological measures.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_4-conveying-verbatim-copies","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"4. Conveying Verbatim Copies","title":"GNU General Public License","text":"may convey verbatim copies Program’s source code receive , medium, provided conspicuously appropriately publish copy appropriate copyright notice; keep intact notices stating License non-permissive terms added accord section 7 apply code; keep intact notices absence warranty; give recipients copy License along Program. may charge price price copy convey, may offer support warranty protection fee.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_5-conveying-modified-source-versions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"5. Conveying Modified Source Versions","title":"GNU General Public License","text":"may convey work based Program, modifications produce Program, form source code terms section 4, provided also meet conditions: ) work must carry prominent notices stating modified , giving relevant date. b) work must carry prominent notices stating released License conditions added section 7. requirement modifies requirement section 4 “keep intact notices”. c) must license entire work, whole, License anyone comes possession copy. License therefore apply, along applicable section 7 additional terms, whole work, parts, regardless packaged. License gives permission license work way, invalidate permission separately received . d) work interactive user interfaces, must display Appropriate Legal Notices; however, Program interactive interfaces display Appropriate Legal Notices, work need make . compilation covered work separate independent works, nature extensions covered work, combined form larger program, volume storage distribution medium, called “aggregate” compilation resulting copyright used limit access legal rights compilation’s users beyond individual works permit. Inclusion covered work aggregate cause License apply parts aggregate.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_6-conveying-non-source-forms","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"6. Conveying Non-Source Forms","title":"GNU General Public License","text":"may convey covered work object code form terms sections 4 5, provided also convey machine-readable Corresponding Source terms License, one ways: ) Convey object code , embodied , physical product (including physical distribution medium), accompanied Corresponding Source fixed durable physical medium customarily used software interchange. b) Convey object code , embodied , physical product (including physical distribution medium), accompanied written offer, valid least three years valid long offer spare parts customer support product model, give anyone possesses object code either (1) copy Corresponding Source software product covered License, durable physical medium customarily used software interchange, price reasonable cost physically performing conveying source, (2) access copy Corresponding Source network server charge. c) Convey individual copies object code copy written offer provide Corresponding Source. 
alternative allowed occasionally noncommercially, received object code offer, accord subsection 6b. d) Convey object code offering access designated place (gratis charge), offer equivalent access Corresponding Source way place charge. need require recipients copy Corresponding Source along object code. place copy object code network server, Corresponding Source may different server (operated third party) supports equivalent copying facilities, provided maintain clear directions next object code saying find Corresponding Source. Regardless server hosts Corresponding Source, remain obligated ensure available long needed satisfy requirements. e) Convey object code using peer--peer transmission, provided inform peers object code Corresponding Source work offered general public charge subsection 6d. separable portion object code, whose source code excluded Corresponding Source System Library, need included conveying object code work. “User Product” either (1) “consumer product”, means tangible personal property normally used personal, family, household purposes, (2) anything designed sold incorporation dwelling. determining whether product consumer product, doubtful cases shall resolved favor coverage. particular product received particular user, “normally used” refers typical common use class product, regardless status particular user way particular user actually uses, expects expected use, product. product consumer product regardless whether product substantial commercial, industrial non-consumer uses, unless uses represent significant mode use product. “Installation Information” User Product means methods, procedures, authorization keys, information required install execute modified versions covered work User Product modified version Corresponding Source. information must suffice ensure continued functioning modified object code case prevented interfered solely modification made. convey object code work section , , specifically use , User Product, conveying occurs part transaction right possession use User Product transferred recipient perpetuity fixed term (regardless transaction characterized), Corresponding Source conveyed section must accompanied Installation Information. requirement apply neither third party retains ability install modified object code User Product (example, work installed ROM). requirement provide Installation Information include requirement continue provide support service, warranty, updates work modified installed recipient, User Product modified installed. Access network may denied modification materially adversely affects operation network violates rules protocols communication across network. Corresponding Source conveyed, Installation Information provided, accord section must format publicly documented (implementation available public source code form), must require special password key unpacking, reading copying.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_7-additional-terms","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"7. Additional Terms","title":"GNU General Public License","text":"“Additional permissions” terms supplement terms License making exceptions one conditions. Additional permissions applicable entire Program shall treated though included License, extent valid applicable law. additional permissions apply part Program, part may used separately permissions, entire Program remains governed License without regard additional permissions. 
convey copy covered work, may option remove additional permissions copy, part . (Additional permissions may written require removal certain cases modify work.) may place additional permissions material, added covered work, can give appropriate copyright permission. Notwithstanding provision License, material add covered work, may (authorized copyright holders material) supplement terms License terms: ) Disclaiming warranty limiting liability differently terms sections 15 16 License; b) Requiring preservation specified reasonable legal notices author attributions material Appropriate Legal Notices displayed works containing ; c) Prohibiting misrepresentation origin material, requiring modified versions material marked reasonable ways different original version; d) Limiting use publicity purposes names licensors authors material; e) Declining grant rights trademark law use trade names, trademarks, service marks; f) Requiring indemnification licensors authors material anyone conveys material (modified versions ) contractual assumptions liability recipient, liability contractual assumptions directly impose licensors authors. non-permissive additional terms considered “restrictions” within meaning section 10. Program received , part , contains notice stating governed License along term restriction, may remove term. license document contains restriction permits relicensing conveying License, may add covered work material governed terms license document, provided restriction survive relicensing conveying. add terms covered work accord section, must place, relevant source files, statement additional terms apply files, notice indicating find applicable terms. Additional terms, permissive non-permissive, may stated form separately written license, stated exceptions; requirements apply either way.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_8-termination","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"8. Termination","title":"GNU General Public License","text":"may propagate modify covered work except expressly provided License. attempt otherwise propagate modify void, automatically terminate rights License (including patent licenses granted third paragraph section 11). However, cease violation License, license particular copyright holder reinstated () provisionally, unless copyright holder explicitly finally terminates license, (b) permanently, copyright holder fails notify violation reasonable means prior 60 days cessation. Moreover, license particular copyright holder reinstated permanently copyright holder notifies violation reasonable means, first time received notice violation License (work) copyright holder, cure violation prior 30 days receipt notice. Termination rights section terminate licenses parties received copies rights License. rights terminated permanently reinstated, qualify receive new licenses material section 10.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_9-acceptance-not-required-for-having-copies","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"9. Acceptance Not Required for Having Copies","title":"GNU General Public License","text":"required accept License order receive run copy Program. Ancillary propagation covered work occurring solely consequence using peer--peer transmission receive copy likewise require acceptance. However, nothing License grants permission propagate modify covered work. actions infringe copyright accept License. 
Therefore, modifying propagating covered work, indicate acceptance License .","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_10-automatic-licensing-of-downstream-recipients","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"10. Automatic Licensing of Downstream Recipients","title":"GNU General Public License","text":"time convey covered work, recipient automatically receives license original licensors, run, modify propagate work, subject License. responsible enforcing compliance third parties License. “entity transaction” transaction transferring control organization, substantially assets one, subdividing organization, merging organizations. propagation covered work results entity transaction, party transaction receives copy work also receives whatever licenses work party’s predecessor interest give previous paragraph, plus right possession Corresponding Source work predecessor interest, predecessor can get reasonable efforts. may impose restrictions exercise rights granted affirmed License. example, may impose license fee, royalty, charge exercise rights granted License, may initiate litigation (including cross-claim counterclaim lawsuit) alleging patent claim infringed making, using, selling, offering sale, importing Program portion .","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_11-patents","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"11. Patents","title":"GNU General Public License","text":"“contributor” copyright holder authorizes use License Program work Program based. work thus licensed called contributor’s “contributor version”. contributor’s “essential patent claims” patent claims owned controlled contributor, whether already acquired hereafter acquired, infringed manner, permitted License, making, using, selling contributor version, include claims infringed consequence modification contributor version. purposes definition, “control” includes right grant patent sublicenses manner consistent requirements License. contributor grants non-exclusive, worldwide, royalty-free patent license contributor’s essential patent claims, make, use, sell, offer sale, import otherwise run, modify propagate contents contributor version. following three paragraphs, “patent license” express agreement commitment, however denominated, enforce patent (express permission practice patent covenant sue patent infringement). “grant” patent license party means make agreement commitment enforce patent party. convey covered work, knowingly relying patent license, Corresponding Source work available anyone copy, free charge terms License, publicly available network server readily accessible means, must either (1) cause Corresponding Source available, (2) arrange deprive benefit patent license particular work, (3) arrange, manner consistent requirements License, extend patent license downstream recipients. “Knowingly relying” means actual knowledge , patent license, conveying covered work country, recipient’s use covered work country, infringe one identifiable patents country reason believe valid. , pursuant connection single transaction arrangement, convey, propagate procuring conveyance , covered work, grant patent license parties receiving covered work authorizing use, propagate, modify convey specific copy covered work, patent license grant automatically extended recipients covered work works based . 
patent license “discriminatory” include within scope coverage, prohibits exercise , conditioned non-exercise one rights specifically granted License. may convey covered work party arrangement third party business distributing software, make payment third party based extent activity conveying work, third party grants, parties receive covered work , discriminatory patent license () connection copies covered work conveyed (copies made copies), (b) primarily connection specific products compilations contain covered work, unless entered arrangement, patent license granted, prior 28 March 2007. Nothing License shall construed excluding limiting implied license defenses infringement may otherwise available applicable patent law.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_12-no-surrender-of-others-freedom","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"12. No Surrender of Others’ Freedom","title":"GNU General Public License","text":"conditions imposed (whether court order, agreement otherwise) contradict conditions License, excuse conditions License. convey covered work satisfy simultaneously obligations License pertinent obligations, consequence may convey . example, agree terms obligate collect royalty conveying convey Program, way satisfy terms License refrain entirely conveying Program.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_13-use-with-the-gnu-affero-general-public-license","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"13. Use with the GNU Affero General Public License","title":"GNU General Public License","text":"Notwithstanding provision License, permission link combine covered work work licensed version 3 GNU Affero General Public License single combined work, convey resulting work. terms License continue apply part covered work, special requirements GNU Affero General Public License, section 13, concerning interaction network apply combination .","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_14-revised-versions-of-this-license","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"14. Revised Versions of this License","title":"GNU General Public License","text":"Free Software Foundation may publish revised /new versions GNU General Public License time time. new versions similar spirit present version, may differ detail address new problems concerns. version given distinguishing version number. Program specifies certain numbered version GNU General Public License “later version” applies , option following terms conditions either numbered version later version published Free Software Foundation. Program specify version number GNU General Public License, may choose version ever published Free Software Foundation. Program specifies proxy can decide future versions GNU General Public License can used, proxy’s public statement acceptance version permanently authorizes choose version Program. Later license versions may give additional different permissions. However, additional obligations imposed author copyright holder result choosing follow later version.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_15-disclaimer-of-warranty","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"15. Disclaimer of Warranty","title":"GNU General Public License","text":"WARRANTY PROGRAM, EXTENT PERMITTED APPLICABLE LAW. 
EXCEPT OTHERWISE STATED WRITING COPYRIGHT HOLDERS /PARTIES PROVIDE PROGRAM “” WITHOUT WARRANTY KIND, EITHER EXPRESSED IMPLIED, INCLUDING, LIMITED , IMPLIED WARRANTIES MERCHANTABILITY FITNESS PARTICULAR PURPOSE. ENTIRE RISK QUALITY PERFORMANCE PROGRAM . PROGRAM PROVE DEFECTIVE, ASSUME COST NECESSARY SERVICING, REPAIR CORRECTION.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_16-limitation-of-liability","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"16. Limitation of Liability","title":"GNU General Public License","text":"EVENT UNLESS REQUIRED APPLICABLE LAW AGREED WRITING COPYRIGHT HOLDER, PARTY MODIFIES /CONVEYS PROGRAM PERMITTED , LIABLE DAMAGES, INCLUDING GENERAL, SPECIAL, INCIDENTAL CONSEQUENTIAL DAMAGES ARISING USE INABILITY USE PROGRAM (INCLUDING LIMITED LOSS DATA DATA RENDERED INACCURATE LOSSES SUSTAINED THIRD PARTIES FAILURE PROGRAM OPERATE PROGRAMS), EVEN HOLDER PARTY ADVISED POSSIBILITY DAMAGES.","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"id_17-interpretation-of-sections-15-and-16","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"17. Interpretation of Sections 15 and 16","title":"GNU General Public License","text":"disclaimer warranty limitation liability provided given local legal effect according terms, reviewing courts shall apply local law closely approximates absolute waiver civil liability connection Program, unless warranty assumption liability accompanies copy Program return fee. END TERMS CONDITIONS","code":""},{"path":"https://bschneidr.github.io/svrep/LICENSE.html","id":"how-to-apply-these-terms-to-your-new-programs","dir":"","previous_headings":"","what":"How to Apply These Terms to Your New Programs","title":"GNU General Public License","text":"develop new program, want greatest possible use public, best way achieve make free software everyone can redistribute change terms. , attach following notices program. safest attach start source file effectively state exclusion warranty; file least “copyright” line pointer full notice found. Also add information contact electronic paper mail. program terminal interaction, make output short notice like starts interactive mode: hypothetical commands show w show c show appropriate parts General Public License. course, program’s commands might different; GUI interface, use “box”. also get employer (work programmer) school, , sign “copyright disclaimer” program, necessary. information , apply follow GNU GPL, see . GNU General Public License permit incorporating program proprietary programs. program subroutine library, may consider useful permit linking proprietary applications library. want , use GNU Lesser General Public License instead License. first, please read .","code":" Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'. 
This is free software, and you are welcome to redistribute it under certain conditions; type 'show c' for details."},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"choosing-a-bootstrap-method","dir":"Articles","previous_headings":"","what":"Choosing a Bootstrap Method","title":"Bootstrap Methods for Surveys","text":"Essentially every bootstrap method commonly used for surveys can be used for simple random sampling with replacement and can easily be applied to stratified sampling (simply repeat the method separately for each stratum). However, things become more complicated for other types of sampling, and care is needed to use a bootstrap method appropriate to the survey design. For many common designs used in practice, it is possible (and easy!) to use one of the bootstrap methods described in the section of this vignette titled \"Basic Bootstrap Methods.\" If your design isn't appropriate for one of these basic bootstrap methods, it may be possible to use the generalized survey bootstrap described in a later section of this vignette. The generalized survey bootstrap method can be used for especially complex designs, such as systematic sampling or two-phase sampling designs. The interested reader is encouraged to read Mashreghi, Haziza, and Léger (2016) for an overview of bootstrap methods developed for survey samples.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"basic-bootstrap-methods","dir":"Articles","previous_headings":"","what":"Basic Bootstrap Methods","title":"Bootstrap Methods for Surveys","text":"For most sample designs used in practice, three basic survey design features must be considered when choosing a bootstrap method: whether there are multiple stages of sampling; whether the design uses without-replacement sampling with large sampling fractions; and whether the design uses unequal-probability sampling (commonly referred to as \"probability proportional to size (PPS)\" sampling in statistics jargon). The 'svrep' and 'survey' packages implement four basic bootstrap methods, each of which can handle one or more of these survey design features. Of these four methods, the Rao-Wu-Yue-Beaumont bootstrap method (Beaumont and Émond 2022) is the only one able to directly handle all three design features and is thus the default method used by the function as_bootstrap_design(). The following tables summarize the four basic bootstrap methods and their appropriateness for the common design features described earlier. [Table: Designs Covered by Each Bootstrap Method] [Table: Data Required by Each Bootstrap Method]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"implementation","dir":"Articles","previous_headings":"Basic Bootstrap Methods","what":"Implementation","title":"Bootstrap Methods for Surveys","text":"To implement the basic bootstrap methods, we can create a survey design object with the svydesign() function from the survey package and then convert that object to a bootstrap replicate design using as_bootstrap_design(). This method can be used for multistage, stratified designs with one or more different kinds of sampling, provided the \"Rao-Wu-Yue-Beaumont\" method is used. Example 1: Multistage Simple Random Sampling without Replacement (SRSWOR). Example 2: Single-stage unequal probability sampling without replacement. Example 3: Multistage Sampling with Different Sampling Methods at Each Stage. For designs that use different sampling methods at different stages, we can use the argument samp_method_by_stage to ensure the correct method is used to form bootstrap weights.
In general, if a multistage design uses unequal-probability sampling at one or more stages, then when creating the initial design object the stage-specific sampling probabilities should be supplied to the fpc argument of the svydesign() function, and the user should specify pps = \"brewer\".","code":"library(survey) # For complex survey analysis library(svrep) set.seed(2022) # Load an example dataset from a multistage sample, with two stages of SRSWOR data(\"mu284\", package = 'survey') multistage_srswor_design <- svydesign(data = mu284, ids = ~ id1 + id2, fpc = ~ n1 + n2) bootstrap_rep_design <- as_bootstrap_design(multistage_srswor_design, type = \"Rao-Wu-Yue-Beaumont\", replicates = 500) svytotal(x = ~ y1, design = multistage_srswor_design) #> total SE #> y1 15080 2274.3 svytotal(x = ~ y1, design = bootstrap_rep_design) #> total SE #> y1 15080 2311.1 # Load example dataset of U.S. counties and states with 2004 Presidential vote counts data(\"election\", package = 'survey') pps_wor_design <- svydesign(data = election_pps, pps = HR(), fpc = ~ p, # Inclusion probabilities ids = ~ 1) bootstrap_rep_design <- as_bootstrap_design(pps_wor_design, type = \"Rao-Wu-Yue-Beaumont\", replicates = 100) svytotal(x = ~ Bush + Kerry, design = pps_wor_design) svytotal(x = ~ Bush + Kerry, design = bootstrap_rep_design) # Declare a multistage design # where first-stage probabilities are PPSWOR sampling # and second-stage probabilities are based on SRSWOR multistage_design <- svydesign( data = library_multistage_sample, ids = ~ PSU_ID + SSU_ID, probs = ~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB, pps = \"brewer\" ) # Convert to a bootstrap replicate design boot_design <- as_bootstrap_design( design = multistage_design, type = \"Rao-Wu-Yue-Beaumont\", samp_method_by_stage = c(\"PPSWOR\", \"SRSWOR\"), replicates = 1000 ) # Compare variance estimates svytotal(x = ~ TOTCIR, na.rm = TRUE, design = multistage_design) #> total SE #> TOTCIR 1634739229 250890030 svytotal(x = ~ TOTCIR, na.rm = TRUE, design = boot_design) #> total SE #> TOTCIR 1634739229 264207604"},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"generalized-survey-bootstrap","dir":"Articles","previous_headings":"","what":"Generalized Survey Bootstrap","title":"Bootstrap Methods for Surveys","text":"For sample designs with additional complex features beyond the three highlighted above, the generalized survey bootstrap method can be used. It is especially useful for systematic samples, two-phase samples, and complex designs where one wishes to use a general-purpose estimator such as the Horvitz-Thompson or Yates-Grundy estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"statistical-background","dir":"Articles","previous_headings":"Generalized Survey Bootstrap","what":"Statistical Background","title":"Bootstrap Methods for Surveys","text":"The generalized survey bootstrap is based on a remarkable observation of Fay (1984), summarized nicely by Dippo, Fay, and Morganstein (1984): …a variance estimator based on sums of squares and cross-products may be represented as a resampling plan. -- Dippo, Fay, Morganstein (1984) In other words, if for a sample design there is a textbook variance estimator for totals that can be represented as a quadratic form (i.e., as sums of squares and cross-products), then we can make a replication estimator out of it. Fay developed a general methodology for producing replication estimators from a textbook estimator’s quadratic form, encompassing the jackknife, bootstrap, and balanced repeated replication as special cases. Within this framework, the “generalized survey bootstrap” developed by Bertail and Combris (1997) is one specific strategy for making bootstrap replication estimators out of textbook variance estimators. 
See Beaumont and Patak (2012) for a clear overview of the generalized survey bootstrap. The starting point for implementing the generalized survey bootstrap method is to choose a textbook variance estimator appropriate to the sampling design that can be represented as a quadratic form. Luckily, many useful variance estimators can be represented as quadratic forms. We highlight prominent examples here. For stratified, multistage cluster samples: the usual multistage variance estimator used in the ‘survey’ package, based on adding the variance contributions from each stage; this estimator can be used for any number of sampling stages. Highly-general variance estimators that work for any ‘measurable’ survey design (i.e., designs where every pair of units in the population has a nonzero probability of appearing in the sample), which covers most designs used in practice, the primary exceptions being “one-PSU-per-stratum” designs and systematic sampling designs: the Horvitz-Thompson estimator and the Sen-Yates-Grundy estimator. For systematic samples: the SD1 and SD2 successive-differences estimators, which are the basis of the commonly-used “successive-differences replication” (SDR) estimator (see Ash (2014) for an overview of SDR). For two-phase samples: the double-expansion variance estimator described in Section 9.3 of Särndal, Swensson, and Wretman (1992). Once the textbook variance estimator has been selected and its quadratic form identified, the generalized survey bootstrap method consists of randomly generating a set of replicate weights from a multivariate distribution whose expectation is the \(n\)-vector \(\mathbf{1}_n\) and whose variance-covariance matrix is the matrix of the quadratic form used in the textbook variance estimator. This ensures that, in expectation, the bootstrap variance estimator for a total equals the textbook variance estimator and thus inherits its properties of design-unbiasedness and design-consistency.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"details-and-notation-for-the-generalized-survey-bootstrap-method","dir":"Articles","previous_headings":"Generalized Survey Bootstrap","what":"Details and Notation for the Generalized Survey Bootstrap Method","title":"Bootstrap Methods for Surveys","text":"In this section, we describe the generalized survey bootstrap in greater detail, using the notation of Beaumont and Patak (2012).","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"quadratic-forms","dir":"Articles","previous_headings":"Generalized Survey Bootstrap > Details and Notation for the Generalized Survey Bootstrap Method","what":"Quadratic Forms","title":"Bootstrap Methods for Surveys","text":"Let \(v( \hat{T_y})\) be a textbook variance estimator for the estimated population total \(\hat{T}_y\) of a variable \(y\). Denote the base weight for case \(i\) of the sample by \(w_i\), and let \(\breve{y}_i\) denote the weighted value \(w_iy_i\). Suppose we can represent the textbook variance estimator as a quadratic form: \(v(\hat{T}_y) = \breve{y}\Sigma\breve{y}^T\), for some \(n \times n\) matrix \(\Sigma\). The only constraint on \(\Sigma\) is that, for the given sample, it must be symmetric and positive semi-definite (in other words, it can never lead to a negative variance estimate, no matter the value of \(\breve{y}\)). For example, the popular Horvitz-Thompson estimator based on first-order inclusion probabilities \(\pi_k\) and second-order inclusion probabilities \(\pi_{kl}\) can be represented by a positive semi-definite matrix with entries \((1-\pi_k)\) along the main diagonal and entries \((1 - \frac{\pi_k \pi_l}{\pi_{kl}})\) everywhere else. An illustration for a sample with \(n=3\) is shown below: \[ \Sigma_{HT} = \begin{bmatrix} (1-\pi_1) & (1 - \frac{\pi_1 \pi_2}{\pi_{12}}) & (1 - \frac{\pi_1 \pi_3}{\pi_{13}}) \\ (1 - \frac{\pi_2 \pi_1}{\pi_{21}}) & (1 - \pi_2) & (1 - \frac{\pi_2 \pi_3}{\pi_{23}}) \\ (1 - \frac{\pi_3 \pi_1}{\pi_{31}}) & (1 - \frac{\pi_3 \pi_2}{\pi_{32}}) & (1 - \pi_3) \end{bmatrix} \]
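To make the entries of this matrix concrete, here is a minimal sketch in base R (not part of the vignette) that builds \(\Sigma_{HT}\) for a hypothetical sample of \(n=3\) units; the inclusion probabilities pi_k and pi_kl below are made-up values used purely for illustration.

```r
# Hypothetical first-order inclusion probabilities for n = 3 sampled units
pi_k <- c(0.30, 0.45, 0.60)

# Hypothetical second-order inclusion probabilities (with pi_kk = pi_k)
pi_kl <- matrix(c(0.30, 0.12, 0.17,
                  0.12, 0.45, 0.26,
                  0.17, 0.26, 0.60),
                nrow = 3, byrow = TRUE)

# Off-diagonal entries: 1 - (pi_k * pi_l) / pi_kl
Sigma_HT <- 1 - outer(pi_k, pi_k) / pi_kl

# Diagonal entries: 1 - pi_k (already implied above, since pi_kk = pi_k)
diag(Sigma_HT) <- 1 - pi_k

Sigma_HT
```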
As another example, the successive-difference variance estimator for a systematic sample can be represented by a positive semi-definite matrix whose diagonal entries are \(1\), whose superdiagonal and subdiagonal entries are \(-1/2\), and whose top-right and bottom-left entries are \(-1/2\) (Ash 2014). An illustration for a sample with \(n=5\) is shown below: \[ \Sigma_{SD2} = \begin{bmatrix} 1 & -1/2 & 0 & 0 & -1/2\\ -1/2 & 1 & -1/2 & 0 & 0 \\ 0 & -1/2 & 1 & -1/2 & 0 \\ 0 & 0 & -1/2 & 1& -1/2 \\ -1/2 & 0 & 0 & -1/2 & 1 \end{bmatrix} \] To obtain the quadratic form matrix of a variance estimator, we can use the function make_quad_form_matrix(), which takes as its inputs the name of the variance estimator and the relevant survey design information. For example, the following code produces the quadratic form matrix for the “SD2” variance estimator we saw earlier. In the example below, we use this method to estimate the variance for a stratified systematic sample of U.S. public libraries. First, we create the quadratic form matrix to represent the SD2 successive-difference estimator. This can be done by using the svydesign() function to describe the survey design and then using get_design_quad_form() to obtain the quadratic form for the specified variance estimator. Next, we estimate the sampling variance of the estimated total of the LIBRARIA variable using that quadratic form.","code":"make_quad_form_matrix( variance_estimator = \"SD2\", cluster_ids = c(1,2,3,4,5) |> data.frame(), strata_ids = c(1,1,1,1,1) |> data.frame(), sort_order = c(1,2,3,4,5) ) #> 5 x 5 sparse Matrix of class \"dsCMatrix\" #> #> [1,] 1.0 -0.5 . . -0.5 #> [2,] -0.5 1.0 -0.5 . . #> [3,] . -0.5 1.0 -0.5 . #> [4,] . . -0.5 1.0 -0.5 #> [5,] -0.5 . . -0.5 1.0 # Load an example dataset of a stratified systematic sample data('library_stsys_sample', package = 'svrep') # First, sort the rows in the order used in sampling library_stsys_sample <- library_stsys_sample |> dplyr::arrange(SAMPLING_SORT_ORDER) # Create a survey design object survey_design <- svydesign( data = library_stsys_sample, ids = ~ 1, strata = ~ SAMPLING_STRATUM, fpc = ~ STRATUM_POP_SIZE ) # Obtain the quadratic form for the target estimator sd2_quad_form <- get_design_quad_form( design = survey_design, variance_estimator = \"SD2\" ) #> For `variance_estimator='SD2', assumes rows of data are sorted in the same order used in sampling. 
class(sd2_quad_form) #> [1] \"dsCMatrix\" #> attr(,\"package\") #> [1] \"Matrix\" dim(sd2_quad_form) #> [1] 219 219 # Obtain weighted values wtd_y <- as.matrix(library_stsys_sample[['LIBRARIA']] / library_stsys_sample[['SAMPLING_PROB']]) wtd_y[is.na(wtd_y)] <- 0 # Obtain point estimate for a population total point_estimate <- sum(wtd_y) # Obtain the variance estimate using the quadratic form variance_estimate <- t(wtd_y) %*% sd2_quad_form %*% wtd_y std_error <- sqrt(variance_estimate[1,1]) # Summarize results sprintf(\"Estimate: %s\", round(point_estimate)) #> [1] \"Estimate: 65642\" sprintf(\"Standard Error: %s\", round(std_error)) #> [1] \"Standard Error: 13972\""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"forming-adjustment-factors","dir":"Articles","previous_headings":"Generalized Survey Bootstrap > Details and Notation for the Generalized Survey Bootstrap Method","what":"Forming Adjustment Factors","title":"Bootstrap Methods for Surveys","text":"The goal is to form \(B\) sets of bootstrap weights, where the \(b\)-th set of bootstrap weights is a vector of length \(n\), denoted \(\mathbf{a}^{(b)}\), whose \(k\)-th value is denoted \(a_k^{(b)}\). This gives us \(B\) replicate estimates of the population total, \(\hat{T}_y^{*(b)}=\sum_{k \in s} a_k^{(b)} \breve{y}_k\), for \(b=1, \ldots, B\), from which we can easily calculate an estimate of the sampling variance. \[ v_B\left(\hat{T}_y\right)=\frac{\sum_{b=1}^B\left(\hat{T}_y^{*(b)}-\hat{T}_y\right)^2}{B} \] We can write this bootstrap variance estimator as a quadratic form: \[ \begin{aligned} v_B\left(\hat{T}_y\right) &=\mathbf{\breve{y}}^{\prime}\Sigma_B \mathbf{\breve{y}} \\ \textit{where}& \\ \boldsymbol{\Sigma}_B &= \frac{\sum_{b=1}^B\left(\mathbf{a}^{(b)}-\mathbf{1}_n\right)\left(\mathbf{a}^{(b)}-\mathbf{1}_n\right)^{\prime}}{B} \end{aligned} \] Note that if the vector of adjustment factors \(\mathbf{a}^{(b)}\) has expectation \(\mathbf{1}_n\) and variance-covariance matrix \(\boldsymbol{\Sigma}\), then the bootstrap expectation is \(E_{*}\left( \boldsymbol{\Sigma}_B \right) = \boldsymbol{\Sigma}\). Since the bootstrap process takes the sample values of \(\breve{y}\) as fixed, the bootstrap expectation of the variance estimator is \(E_{*} \left( \mathbf{\breve{y}}^{\prime}\Sigma_B \mathbf{\breve{y}}\right)= \mathbf{\breve{y}}^{\prime}\Sigma \mathbf{\breve{y}}\). Thus, we can produce a bootstrap variance estimator with the same expectation as the textbook variance estimator simply by randomly generating \(\mathbf{a}^{(b)}\) from a distribution satisfying the following two conditions: Condition 1: \(\quad \mathbf{E}_*(\mathbf{a})=\mathbf{1}_n\) Condition 2: \(\quad \mathbf{E}_*\left(\mathbf{a}-\mathbf{1}_n\right)\left(\mathbf{a}-\mathbf{1}_n\right)^{\prime}=\mathbf{\Sigma}\) The simplest and most general way to generate the adjustment factors is to simulate from a multivariate normal distribution \(\mathbf{a} \sim MVN(\mathbf{1}_n, \boldsymbol{\Sigma})\), which is the method used by this package. However, this method can lead to negative adjustment factors and hence negative bootstrap weights, which--while perfectly valid for variance estimation--may be undesirable from a practical point of view. 
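As a quick numerical illustration of Conditions 1 and 2, here is a minimal sketch (not vignette code; it assumes the MASS package is available for mvrnorm()) that draws adjustment factors from \(MVN(\mathbf{1}_n, \Sigma)\) using the \(n=5\) SD2 matrix shown earlier, then checks that the bootstrap variance of a total approximately matches the quadratic form.

```r
library(MASS)  # for mvrnorm()
set.seed(1)

# The n = 5 SD2 quadratic form matrix shown above
Sigma <- rbind(
  c( 1.0, -0.5,  0.0,  0.0, -0.5),
  c(-0.5,  1.0, -0.5,  0.0,  0.0),
  c( 0.0, -0.5,  1.0, -0.5,  0.0),
  c( 0.0,  0.0, -0.5,  1.0, -0.5),
  c(-0.5,  0.0,  0.0, -0.5,  1.0)
)
n <- nrow(Sigma)
B <- 50000

# Draw B vectors of adjustment factors: a ~ MVN(1_n, Sigma), stored as n x B
A <- t(mvrnorm(n = B, mu = rep(1, n), Sigma = Sigma))

# Condition 1: E(a) = 1_n
rowMeans(A)                       # each entry should be near 1

# Condition 2: E[(a - 1)(a - 1)'] = Sigma
Sigma_B <- tcrossprod(A - 1) / B
max(abs(Sigma_B - Sigma))         # should be near 0

# Bootstrap variance of a total approximately matches the quadratic form
y_wtd  <- c(10, 20, 15, 30, 25)   # made-up weighted values w_i * y_i
T_hat  <- sum(y_wtd)              # full-sample estimate of the total
T_reps <- colSums(A * y_wtd)      # replicate estimates
c(bootstrap = sum((T_reps - T_hat)^2) / B,
  quad_form = as.numeric(t(y_wtd) %*% Sigma %*% y_wtd))
```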
Thus, in the following subsection, we describe one method of adjusting the replicate factors so that they are nonnegative while still satisfying \(E_{*} \left( \mathbf{\breve{y}}^{\prime}\Sigma_B \mathbf{\breve{y}}\right) =\mathbf{\breve{y}}^{\prime}\Sigma \mathbf{\breve{y}}\).","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"adjusting-generalized-survey-bootstrap-replicates-to-avoid-negative-weights","dir":"Articles","previous_headings":"Generalized Survey Bootstrap > Details and Notation for the Generalized Survey Bootstrap Method","what":"Adjusting Generalized Survey Bootstrap Replicates to Avoid Negative Weights","title":"Bootstrap Methods for Surveys","text":"Let \(\mathbf{A} = \left[ \mathbf{a}^{(1)} \cdots \mathbf{a}^{(b)} \cdots \mathbf{a}^{(B)} \right]\) denote the \((n \times B)\) matrix of bootstrap adjustment factors. To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors \(\mathbf{A}^S\) by rescaling each adjustment factor \(a_k^{(b)}\) as follows: \[ \begin{aligned} a_k^{S,(b)} &= \frac{a_k^{(b)} + \tau - 1}{\tau} \\ \textit{where } \tau &\geq 1 - a_k^{(b)} \textit{ and } \tau \geq 1 \\ &\textit{for all } k \textit{ in } \left\{ 1,\ldots,n \right\} \\ &\textit{and all } b \textit{ in } \left\{1, \ldots, B\right\} \\ \end{aligned} \] The value of \(\tau\) can be set based on the realized adjustment factor matrix \(\mathbf{A}\), or \(\tau\) can be chosen prior to generating the adjustment factor matrix \(\mathbf{A}\) so that \(\tau\) is likely to be large enough to prevent negative bootstrap weights. If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used in estimating the variance with the bootstrap replicates, which becomes \(\frac{\tau^2}{B}\) instead of \(\frac{1}{B}\). \[ \begin{aligned} \textbf{Prior to rescaling: } v_B\left(\hat{T}_y\right) &= \frac{1}{B}\sum_{b=1}^B\left(\hat{T}_y^{*(b)}-\hat{T}_y\right)^2 \\ \textbf{After rescaling: } v_B\left(\hat{T}_y\right) &= \frac{\tau^2}{B}\sum_{b=1}^B\left(\hat{T}_y^{S*(b)}-\hat{T}_y\right)^2 \\ \end{aligned} \] When sharing a dataset that uses rescaled weights from a generalized survey bootstrap, the documentation for the dataset should instruct the user to use the replication scale factor \(\frac{\tau^2}{B}\) rather than \(\frac{1}{B}\) when estimating sampling variances.","code":""},
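Continuing the illustrative sketch from the previous section (again, not vignette code; it reuses A, y_wtd, T_hat, T_reps, and B from that sketch), the rescaling and the corresponding change of scale factor can be verified numerically. The two variance estimates below agree up to floating-point error, since rescaling shrinks every replicate deviation by exactly a factor of \(\tau\).

```r
# Choose tau just large enough that every rescaled factor is nonnegative
tau <- max(1, 1 - min(A))

# Rescale the adjustment factors: a_S = (a + tau - 1) / tau
A_rescaled <- (A + tau - 1) / tau
min(A_rescaled)                   # >= 0 by construction

# After rescaling, the scale factor is tau^2 / B instead of 1 / B
T_reps_rescaled <- colSums(A_rescaled * y_wtd)
c(original = sum((T_reps - T_hat)^2) / B,
  rescaled = (tau^2 / B) * sum((T_reps_rescaled - T_hat)^2))
```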
{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"implementation-1","dir":"Articles","previous_headings":"Generalized Survey Bootstrap","what":"Implementation","title":"Bootstrap Methods for Surveys","text":"There are two ways to implement the generalized survey bootstrap.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"option-1-convert-an-existing-design-to-a-generalized-bootstrap-design","dir":"Articles","previous_headings":"Generalized Survey Bootstrap > Implementation","what":"Option 1: Convert an existing design to a generalized bootstrap design","title":"Bootstrap Methods for Surveys","text":"The simplest method is to convert an existing survey design object into a generalized bootstrap design. With this approach, we first create a survey design object using the svydesign() function. This allows us to represent information about stratification and clustering (potentially with multiple stages), as well as information about finite population corrections. Next, we convert the survey design object into a replicate design using the function as_gen_boot_design(). That function has an argument variance_estimator that allows us to specify the name of the variance estimator to use as the basis for creating the replicate weights. For a PPS design that uses the Horvitz-Thompson or Yates-Grundy estimator, we can create a generalized bootstrap estimator with the same expectation. In the example below, we create a PPS design with the ‘survey’ package and then convert it to a generalized bootstrap design. We can also use the generalized bootstrap for designs that use multistage, stratified simple random sampling without replacement. Unless specified otherwise, as_gen_boot_design() automatically selects a rescaling value \(\tau\) to use for eliminating negative adjustment factors. The scale attribute of the resulting replicate survey design object is thus set equal to \(\tau^2/B\). The specific value of \(\tau\) can be retrieved from the replicate design object, as follows.","code":"# Load example data from stratified systematic sample data('library_stsys_sample', package = 'svrep') # First, ensure data are sorted in same order as was used in sampling library_stsys_sample <- library_stsys_sample[ order(library_stsys_sample$SAMPLING_SORT_ORDER), ] # Create a survey design object design_obj <- svydesign( data = library_stsys_sample, strata = ~ SAMPLING_STRATUM, ids = ~ 1, fpc = ~ STRATUM_POP_SIZE ) # Convert to generalized bootstrap replicate design gen_boot_design_sd2 <- as_gen_boot_design( design = design_obj, variance_estimator = \"SD2\", replicates = 2000 ) #> For `variance_estimator='SD2', assumes rows of data are sorted in the same order used in sampling. # Estimate sampling variances svymean(x = ~ TOTSTAFF, na.rm = TRUE, design = gen_boot_design_sd2) #> mean SE #> TOTSTAFF 19.756 4.238 # Load example data of a PPS survey of counties and states data('election', package = 'survey') # Create survey design object pps_design_ht <- svydesign( data = election_pps, id = ~1, fpc = ~p, pps = ppsmat(election_jointprob), variance = \"HT\" ) # Convert to generalized bootstrap replicate design gen_boot_design_ht <- pps_design_ht |> as_gen_boot_design(variance_estimator = \"Horvitz-Thompson\", replicates = 5000, tau = \"auto\") # Compare sampling variances from bootstrap vs. Horvitz-Thompson estimator svytotal(x = ~ Bush + Kerry, design = pps_design_ht) svytotal(x = ~ Bush + Kerry, design = gen_boot_design_ht) library(dplyr) # For data manipulation # Create a multistage survey design multistage_design <- svydesign( data = library_multistage_sample |> mutate(Weight = 1/SAMPLING_PROB), ids = ~ PSU_ID + SSU_ID, fpc = ~ PSU_POP_SIZE + SSU_POP_SIZE, weights = ~ Weight ) # Convert to a generalized bootstrap design multistage_boot_design <- as_gen_boot_design( design = multistage_design, variance_estimator = \"Stratified Multistage SRS\" ) # Compare variance estimates svytotal(x = ~ TOTCIR, na.rm = TRUE, design = multistage_design) #> total SE #> TOTCIR 1634739229 251589313 svytotal(x = ~ TOTCIR, na.rm = TRUE, design = multistage_boot_design) #> total SE #> TOTCIR 1634739229 250754550 # View overall scale factor overall_scale_factor <- multistage_boot_design$scale print(overall_scale_factor) #> [1] 0.0458882 # Check that the scale factor was calculated correctly tau <- multistage_boot_design$tau print(tau) #> [1] 4.79 B <- ncol(multistage_boot_design$repweights) print(B) #> [1] 500 print( (tau^2) / B ) #> [1] 0.0458882"},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"option-2-create-the-quadratic-form-matrix-and-then-use-it-to-create-bootstrap-weights","dir":"Articles","previous_headings":"Generalized Survey Bootstrap > Implementation","what":"Option 2: Create the quadratic form matrix and then use it to create bootstrap weights","title":"Bootstrap Methods for Surveys","text":"The generalized survey bootstrap can also be implemented as a two-step process: Step 1: Use make_quad_form_matrix() to represent the variance estimator as a quadratic form’s matrix. 
Step 2: Use make_gen_boot_factors() to generate replicate factors based on the target quadratic form. The function argument tau can be used to avoid negative adjustment factors, using the previously-described method. The actual value of tau that was used can be extracted from the function’s output using the attr() function. For convenience, the values to use for the scale and rscales arguments of svrepdesign() are included as attributes of the adjustment factors created by make_gen_boot_factors(). Using the adjustment factors thus created, we can create a replicate survey design object with the function svrepdesign(), using the argument type = \"other\" and specifying for the scale argument the factor \(\tau^2/B\). This allows us to estimate sampling variances, even for quite complex sampling designs.","code":"# Load an example dataset of a stratified systematic sample data('library_stsys_sample', package = 'svrep') # Represent the SD2 successive-difference estimator as a quadratic form, # and obtain the matrix of that quadratic form sd2_quad_form <- make_quad_form_matrix( variance_estimator = 'SD2', cluster_ids = library_stsys_sample |> select(FSCSKEY), strata_ids = library_stsys_sample |> select(SAMPLING_STRATUM), strata_pop_sizes = library_stsys_sample |> select(STRATUM_POP_SIZE), sort_order = library_stsys_sample |> pull(\"SAMPLING_SORT_ORDER\") ) rep_adj_factors <- make_gen_boot_factors( Sigma = sd2_quad_form, num_replicates = 500, tau = \"auto\" ) tau <- attr(rep_adj_factors, 'tau') B <- ncol(rep_adj_factors) # Retrieve value of 'scale' rep_adj_factors |> attr('scale') #> [1] 0.041405 # Compare to manually-calculated value (tau^2) / B #> [1] 0.041405 # Retrieve value of 'rscales' rep_adj_factors |> attr('rscales') |> head() # Only show first few values #> [1] 1 1 1 1 1 1 gen_boot_design <- svrepdesign( data = library_stsys_sample |> mutate(SAMPLING_WEIGHT = 1/SAMPLING_PROB), repweights = rep_adj_factors, weights = ~ SAMPLING_WEIGHT, combined.weights = FALSE, type = \"other\", scale = attr(rep_adj_factors, 'scale'), rscales = attr(rep_adj_factors, 'rscales') ) gen_boot_design |> svymean(x = ~ TOTSTAFF, na.rm = TRUE, deff = TRUE) #> mean SE DEff #> TOTSTAFF 19.756 4.149 0.9455"},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"choosing-the-number-of-bootstrap-replicates","dir":"Articles","previous_headings":"","what":"Choosing the Number of Bootstrap Replicates","title":"Bootstrap Methods for Surveys","text":"The bootstrap suffers from an unavoidable “simulation error” (also referred to as “Monte Carlo” error) caused by using a finite number of replicates to simulate the ideal bootstrap estimate we would obtain if we used an infinite number of replicates. In general, simulation error can be reduced by using a larger number of bootstrap replicates.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"general-strategy","dir":"Articles","previous_headings":"Choosing the Number of Bootstrap Replicates","what":"General Strategy","title":"Bootstrap Methods for Surveys","text":"While there are many rule-of-thumb values for the number of replicates to be used (some say 500, others say 1,000), it is advisable to instead use a principled strategy for choosing the number of replicates. One general strategy, proposed by Beaumont and Patak (2012), is as follows. Step 1: Determine the largest acceptable level of simulation error for key survey estimates. For example, one might determine that, on average, a bootstrap standard error estimate should be no more than \(\pm 5\%\) different from the ideal bootstrap estimate. Step 2: Estimate key statistics of interest using a large number of bootstrap replicates (such as 5,000) and save the estimates from each bootstrap replicate. 
This can be conveniently done using functions from the ‘survey’ package such as svymean(..., return.replicates = TRUE) or withReplicates(..., return.replicates = TRUE). Step 3: Estimate the minimum number of bootstrap replicates needed to reduce the level of simulation error to the target level. This can be done using the ‘svrep’ function estimate_boot_reps_for_target_cv().","code":""},{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"measuring-and-estimating-simulation-error","dir":"Articles","previous_headings":"Choosing the Number of Bootstrap Replicates","what":"Measuring and Estimating Simulation Error","title":"Bootstrap Methods for Surveys","text":"Simulation error can be measured by the “simulation coefficient of variation” (CV): the ratio of the standard error of the bootstrap estimator to the expectation of the bootstrap estimator, where the expectation and standard error are evaluated with respect to the bootstrapping process given the selected sample. For a statistic \(\hat{\theta}\), the simulation CV of the bootstrap variance estimator \(v_{B}(\hat{\theta})\) based on \(B\) replicate estimates \(\hat{\theta}^{\star}_1,\dots,\hat{\theta}^{\star}_B\) is defined as follows: \[ CV_{\star}(v_{B}(\hat{\theta})) = \frac{\sqrt{\mathrm{Var}_{\star}(v_{B}(\hat{\theta}))}}{E_{\star}(v_{B}(\hat{\theta}))} \] The simulation CV of a statistic, denoted \(CV_{\star}(v_{B}(\hat{\theta}))\), can be estimated for a given number of replicates \(B\) by estimating \(CV_{\star}(E_2)\), where \(E_2\) denotes the squared deviation \((\hat{\theta}^{\star}-\hat{\theta})^2\), using the observed values and then dividing by \(\sqrt{B}\). As a result, one can thereby estimate the number of bootstrap replicates needed to obtain a target simulation CV, which is a useful strategy for determining the number of bootstrap replicates to use for a survey. With the ‘svrep’ package, it is possible to estimate the number of bootstrap replicates required to obtain a target simulation CV for a statistic. To estimate the simulation CV for the current number of replicates used, it is possible to use the function estimate_boot_sim_cv().","code":"library(survey) data('api', package = 'survey') # Declare a bootstrap survey design object ---- boot_design <- svydesign( data = apistrat, weights = ~pw, id = ~1, strata = ~stype, fpc = ~fpc ) |> svrep::as_bootstrap_design(replicates = 5000) # Produce estimates of interest, and save the estimate from each replicate ---- estimated_means_and_proportions <- svymean(x = ~ api00 + api99 + stype, design = boot_design, return.replicates = TRUE) # Estimate the number of replicates needed to obtain a target simulation CV ---- estimate_boot_reps_for_target_cv( svrepstat = estimated_means_and_proportions, target_cv = c(0.01, 0.05, 0.10) ) #> TARGET_CV MAX_REPS api00 api99 stypeE stypeH stypeM #> 1 0.01 15068 6651 6650 15068 15068 15068 #> 2 0.05 603 267 266 603 603 603 #> 3 0.10 151 67 67 151 151 151 estimate_boot_sim_cv(estimated_means_and_proportions) #> STATISTIC SIMULATION_CV N_REPLICATES #> 1 api00 0.01153261 5000 #> 2 api99 0.01153177 5000 #> 3 stypeE 0.01735956 5000 #> 4 stypeH 0.01735950 5000 #> 5 stypeM 0.01735951 5000"},
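As a rough hand computation of the quantity above, here is a sketch (not vignette code) using the replicate estimates saved by svymean(..., return.replicates = TRUE) in the example; it assumes the replicates matrix carries the statistic names as column names, and note that estimate_boot_sim_cv() may center the deviations slightly differently.

```r
# Hand-compute the simulation CV for one statistic, using the replicate
# estimates saved above by svymean(..., return.replicates = TRUE)
theta_hat  <- coef(estimated_means_and_proportions)["api00"]
theta_star <- estimated_means_and_proportions$replicates[, "api00"]
B <- length(theta_star)

E2 <- (theta_star - theta_hat)^2  # squared deviations; v_B is their mean
(sd(E2) / mean(E2)) / sqrt(B)     # estimated simulation CV of v_B

# Compare with: estimate_boot_sim_cv(estimated_means_and_proportions)
```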
{"path":"https://bschneidr.github.io/svrep/articles/bootstrap-replicates.html","id":"the-bootstrap-vs--other-replication-methods","dir":"Articles","previous_headings":"","what":"The Bootstrap vs. Other Replication Methods","title":"Bootstrap Methods for Surveys","text":"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. -- John Tukey Ok, but what if the approximate answer is, like, really approximate and requires a whole lot of computing? -- Survey sampling statisticians Survey bootstrap methods are directly applicable to a wider variety of sample designs than the jackknife or balanced repeated replication (BRR). Nonetheless, complex survey designs are often shoehorned into jackknife or BRR variance estimation by pretending that the actual survey design is something simpler. The BRR method, for instance, is only applicable to samples where two clusters are sampled from each stratum, but statisticians frequently use it for designs with three or more sampled clusters by grouping the actual clusters into two pseudo-clusters. For designs with a large number of sampling units in each stratum, the exact jackknife (JK1 or JKn) requires a large number of replicates and so is often replaced with the “delete-a-group jackknife” (DAGJK), where clusters are randomly grouped into larger pseudo-clusters. Why do statisticians go to the effort of shoehorning their variance estimation problem into the jackknife or BRR methods rather than just use the bootstrap? The simple answer is that bootstrap methods generally require many more replicates than other methods in order to obtain a stable variance estimate. And using a large number of replicates can be a problem if it requires a large amount of computing or if the dataset is so large that we’re concerned about storage costs. Statistical agencies are particularly sensitive to these concerns when they publish microdata, since these agencies often serve a large number of end-users with varying computational resources. So why use the bootstrap? The bootstrap tends to work well for a larger class of statistics than the jackknife. For example, when estimating the sampling variance of an estimated median or other quantiles, the jackknife tends to perform poorly, while bootstrap methods do at least an adequate job. Bootstrap methods enable different options for forming confidence intervals. With the standard replication methods (BRR, jackknife, etc.), confidence intervals are generally formed using a Wald interval (\(\hat{\theta} \pm \hat{se}(\hat{\theta}) \times z_{1-\frac{\alpha}{2}}\)).2 With certain bootstrap methods, it is possible to also form confidence intervals using other approaches, such as the bootstrap percentile method. You can analyze the design you actually have rather than an approximation of that design, which can reduce costs and better control errors. To use BRR for general survey designs, you must approximate the actual survey design with a “two PSUs per stratum” design. This works surprisingly well in many cases, but it requires careful work on the part of a specially-trained statistician. For a jackknife with a large number of sampling units, you either end up with as many replicates as a bootstrap method or you randomly group sampling units into a smaller number of pseudo-clusters so that you can use the DAGJK method, which essentially approximates the actual survey design with a simpler one. Again, this takes careful work on the part of a specially-trained statistician. If you analyze the design you actually have, you don’t have to pay a specially-trained statistician to meticulously approximate your design so that it can be shoehorned into a jackknife or BRR variance estimation problem, and a limited budget is perhaps better spent elsewhere. If variance estimation is based on a bootstrap method tailored to the actual survey design, the replication error of the variance estimates for key statistics is unbiased and can be quantified and controlled as a function of the number of replicates. In contrast, if variance estimation is based on approximating the design so that it can be shoehorned into a jackknife or BRR variance estimation problem, the replication error of the variance estimates is difficult to quantify and can consist of both noise and bias. For most statisticians, it’s probably easier to learn. The bootstrap is a well-known replication method among general statisticians, to the point that it’s often taught in first-year undergraduate statistics courses. Its basic idea is already familiar even to statisticians with only a passing familiarity with complex survey sampling. BRR, in contrast, takes specialized training to learn and entails pre-requisite concepts such as Hadamard matrices, partial balancing, and so on. Outside of survey statistics, the jackknife tends to be much less used (and taught) compared to the bootstrap, due to its limitations with non-smooth statistics and the complexity required to make it work efficiently with large sample sizes.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/articles/nonresponse-adjustments.html","id":"creating-initial-replicate-weights","dir":"Articles","previous_headings":"","what":"Creating initial replicate weights","title":"Nonresponse Adjustments","text":"To begin with, we’ll create bootstrap replicate weights. 
In most cases, we can do this simply by describing the survey design using the svydesign() function and then using a function to create appropriate replicate weights. The function as.svrepdesign() from the ‘survey’ package can be used to create several types of replicate weights, using the argument type (with options 'JK1', 'JKn', 'bootstrap', 'BRR', 'Fay', etc.). In addition, the function as_bootstrap_design() can be used to create bootstrap weights using additional methods not supported in the ‘survey’ package. For convenience, we’ll convert the survey design object to an object of class tbl_svy, which allows us to use convenient tidyverse/dplyr syntax (group_by(), summarize(), etc.) as well as helpful functions from the srvyr package.","code":"# Describe the survey design lou_vax_survey <- svydesign(ids = ~ 1, weights = ~ SAMPLING_WEIGHT, data = lou_vax_survey) print(lou_vax_survey) #> Independent Sampling design (with replacement) #> svydesign(ids = ~1, weights = ~SAMPLING_WEIGHT, data = lou_vax_survey) # Create appropriate replicate weights lou_vax_survey <- lou_vax_survey |> as_bootstrap_design(replicates = 100, mse = TRUE, type = \"Rao-Wu-Yue-Beaumont\") print(lou_vax_survey) #> Call: as_bootstrap_design(lou_vax_survey, replicates = 100, mse = TRUE, #> type = \"Rao-Wu-Yue-Beaumont\") #> Survey bootstrap with 100 replicates and MSE variances. lou_vax_survey <- lou_vax_survey |> as_survey() print(lou_vax_survey) #> Call: Called via srvyr #> Survey bootstrap with 100 replicates and MSE variances. #> Data variables: RESPONSE_STATUS (chr), RACE_ETHNICITY (chr), SEX (chr), #> EDUC_ATTAINMENT (chr), VAX_STATUS (chr), SAMPLING_WEIGHT (dbl)"},{"path":"https://bschneidr.github.io/svrep/articles/nonresponse-adjustments.html","id":"redistributing-weight-from-nonrespondents-to-respondents","dir":"Articles","previous_headings":"","what":"Redistributing weight from nonrespondents to respondents","title":"Nonresponse Adjustments","text":"A common form of nonresponse adjustment is to simply ‘redistribute’ weight from nonrespondents to respondents. In other words, the weight for each nonrespondent is set to \(0\), while the weight for each respondent is increased by a factor greater than one so that the sum of adjusted weights in the sample of respondents equals the sum of unadjusted weights in the full sample. For example, if the sum of weights among respondents is \(299,544.4\) and the sum of weights among nonrespondents is \(297,157.6\), then a basic nonresponse adjustment would set the weights among nonrespondents to \(0\) and multiply the weight for each respondent by an adjustment factor equal to \(1 + (297,157.6/299,544.4)\). In mathematical notation, this type of adjustment sets \(w_i^{NR} = 0\) if case \(i\) is a nonrespondent, and \(w_i^{NR} = w_i \times \left(1 + \frac{\sum_{j \in s_{nr}} w_j}{\sum_{j \in s_{resp}} w_j}\right)\) if case \(i\) is a respondent. We’ll illustrate this type of adjustment with the Louisville vaccination survey. First, we’ll inspect the sum of the sampling weights for respondents, nonrespondents, and the overall sample. Next, we’ll redistribute weight from nonrespondents to respondents using the redistribute_weights() function, which adjusts the full-sample weights as well as each set of replicate weights. To specify which subset of the data should have its weights reduced, we supply a logical expression to the argument reduce_if. To specify which subset should have its weights increased, we supply a logical expression to the argument increase_if. After making the adjustment, we can check that the weight from nonrespondents has been redistributed to respondents.","code":"# Weights before adjustment lou_vax_survey |> group_by(RESPONSE_STATUS) |> cascade( `Sum of Weights` = sum(cur_svy_wts()), .fill = \"TOTAL\" ) #> # A tibble: 3 × 2 #> RESPONSE_STATUS `Sum of Weights` #> #> 1 Nonrespondent 297158. #> 2 Respondent 299544. 
#> 3 TOTAL 596702 # Conduct a basic nonresponse adjustment nr_adjusted_survey <- lou_vax_survey |> redistribute_weights( reduce_if = RESPONSE_STATUS == \"Nonrespondent\", increase_if = RESPONSE_STATUS == \"Respondent\" ) # Check the sum of full-sample weights by response status nr_adjusted_survey |> group_by(RESPONSE_STATUS) |> cascade( `Sum of Weights` = sum(cur_svy_wts()), .fill = \"TOTAL\" ) #> # A tibble: 3 × 2 #> RESPONSE_STATUS `Sum of Weights` #> #> 1 Nonrespondent 0 #> 2 Respondent 596702 #> 3 TOTAL 596702 # Check sums of replicate weights by response status nr_adjusted_survey |> summarize_rep_weights( type = \"specific\", by = \"RESPONSE_STATUS\" ) |> arrange(Rep_Column, RESPONSE_STATUS) |> head(10) #> RESPONSE_STATUS Rep_Column N N_NONZERO SUM MEAN CV MIN #> 1 Nonrespondent 1 498 0 0 0.000 NaN 0 #> 2 Respondent 1 502 318 596702 1188.649 0.9910419 0 #> 3 Nonrespondent 2 498 0 0 0.000 NaN 0 #> 4 Respondent 2 502 314 596702 1188.649 1.0464442 0 #> 5 Nonrespondent 3 498 0 0 0.000 NaN 0 #> 6 Respondent 3 502 314 596702 1188.649 1.0122793 0 #> 7 Nonrespondent 4 498 0 0 0.000 NaN 0 #> 8 Respondent 4 502 321 596702 1188.649 1.0189112 0 #> 9 Nonrespondent 5 498 0 0 0.000 NaN 0 #> 10 Respondent 5 502 325 596702 1188.649 0.9681041 0 #> MAX #> 1 0.000 #> 2 5850.020 #> 3 0.000 #> 4 8001.751 #> 5 0.000 #> 6 5850.020 #> 7 0.000 #> 8 5967.020 #> 9 0.000 #> 10 5884.635"},{"path":"https://bschneidr.github.io/svrep/articles/nonresponse-adjustments.html","id":"conducting-weighting-class-adjustments","dir":"Articles","previous_headings":"","what":"Conducting weighting class adjustments","title":"Nonresponse Adjustments","text":"Nonresponse bias is liable to occur if different subpopulations systematically differ in terms of their response rates to the survey and also differ in terms of what the survey is trying to measure (in this case, vaccination status). For example, we can see fairly large differences in response rates across the different race/ethnicity groups. Weighting adjustments may be able to help reduce the nonresponse bias caused by these differences in response rates. One standard form of adjustment, known as a weighting class adjustment, is to redistribute weights from nonrespondents to respondents separately within different categories of auxiliary variables (such as race/ethnicity). The survey textbook by Heeringa, West, and Berglund (2017) provides an excellent overview of weighting class adjustments. To implement a weighting class adjustment with the svrep package, we can simply use the by argument of redistribute_weights(). Multiple grouping variables may be supplied to this argument. 
For example, one can specify by = c(\"STRATUM\", \"RACE_ETHNICITY\") to redistribute weights separately for each combination of stratum and race/ethnicity category.","code":"lou_vax_survey |> group_by(RACE_ETHNICITY) |> summarize(Response_Rate = mean(RESPONSE_STATUS == \"Respondent\"), Sample_Size = n(), n_Respondents = sum(RESPONSE_STATUS == \"Respondent\")) #> # A tibble: 4 × 4 #> RACE_ETHNICITY Response_Rate Sample_Size n_Respondents #> #> 1 Black or African American alone, not … 0.452 188 85 #> 2 Hispanic or Latino 0.378 45 17 #> 3 Other Race, not Hispanic or Latino 0.492 59 29 #> 4 White alone, not Hispanic or Latino 0.524 708 371 nr_adjusted_survey <- lou_vax_survey |> redistribute_weights( reduce_if = RESPONSE_STATUS == \"Nonrespondent\", increase_if = RESPONSE_STATUS == \"Respondent\", by = c(\"RACE_ETHNICITY\") )"},{"path":"https://bschneidr.github.io/svrep/articles/nonresponse-adjustments.html","id":"propensity-cell-adjustment","dir":"Articles","previous_headings":"Conducting weighting class adjustments","what":"Propensity cell adjustment","title":"Nonresponse Adjustments","text":"The popular method of forming weighting classes based on estimated response propensities (known as propensity cell adjustment) can also be used, for example by adding a variable PROPENSITY_CELL to the data and using redistribute_weights(..., by = \"PROPENSITY_CELL\").","code":"# Fit a response propensity model response_propensity_model <- lou_vax_survey |> mutate(IS_RESPONDENT = ifelse(RESPONSE_STATUS == \"Respondent\", 1, 0)) |> svyglm(formula = IS_RESPONDENT ~ RACE_ETHNICITY + EDUC_ATTAINMENT, family = quasibinomial(link = 'logit')) # Predict response propensities for individual cases lou_vax_survey <- lou_vax_survey |> mutate( RESPONSE_PROPENSITY = predict(response_propensity_model, newdata = cur_svy(), type = \"response\") ) # Divide sample into propensity classes lou_vax_survey <- lou_vax_survey |> mutate(PROPENSITY_CELL = ntile(x = RESPONSE_PROPENSITY, n = 5)) lou_vax_survey |> group_by(PROPENSITY_CELL) |> summarize(n = n(), min = min(RESPONSE_PROPENSITY), mean = mean(RESPONSE_PROPENSITY), max = max(RESPONSE_PROPENSITY)) #> # A tibble: 5 × 5 #> PROPENSITY_CELL n min mean max #> #> 1 1 200 0.357 0.424 0.459 #> 2 2 200 0.459 0.484 0.488 #> 3 3 200 0.488 0.488 0.512 #> 4 4 200 0.512 0.551 0.564 #> 5 5 200 0.564 0.564 0.564 # Redistribute weights by propensity class nr_adjusted_survey <- lou_vax_survey |> redistribute_weights( reduce_if = RESPONSE_STATUS == \"Nonrespondent\", increase_if = RESPONSE_STATUS == \"Respondent\", by = \"PROPENSITY_CELL\" ) # Inspect weights before adjustment lou_vax_survey |> summarize_rep_weights(type = \"specific\", by = c(\"PROPENSITY_CELL\")) |> arrange(Rep_Column, PROPENSITY_CELL) |> select(PROPENSITY_CELL, Rep_Column, N_NONZERO, SUM) |> head(10) #> PROPENSITY_CELL Rep_Column N_NONZERO SUM #> 1 1 1 127 116473.4 #> 2 2 1 122 121251.8 #> 3 3 1 133 122446.4 #> 4 4 1 128 117668.0 #> 5 5 1 128 118862.6 #> 6 1 2 122 123043.7 #> 7 2 2 121 109305.8 #> 8 3 2 125 131405.8 #> 9 4 2 124 120654.5 #> 10 5 2 125 112292.3 # Inspect weights after adjustment nr_adjusted_survey |> summarize_rep_weights(type = \"specific\", by = c(\"PROPENSITY_CELL\", \"RESPONSE_STATUS\")) |> arrange(Rep_Column, PROPENSITY_CELL, RESPONSE_STATUS) |> select(PROPENSITY_CELL, RESPONSE_STATUS, Rep_Column, N_NONZERO, SUM) |> head(10) #> PROPENSITY_CELL RESPONSE_STATUS Rep_Column N_NONZERO SUM #> 1 1 Nonrespondent 1 0 0.0 #> 2 1 Respondent 1 57 116473.4 #> 3 2 Nonrespondent 1 0 0.0 #> 4 2 Respondent 1 56 121251.8 #> 5 3 Nonrespondent 1 0 0.0 #> 6 3 Respondent 1 68 122446.4 
#> 7 4 Nonrespondent 1 0 0.0 #> 8 4 Respondent 1 64 117668.0 #> 9 5 Nonrespondent 1 0 0.0 #> 10 5 Respondent 1 73 118862.6"},{"path":"https://bschneidr.github.io/svrep/articles/nonresponse-adjustments.html","id":"saving-the-final-weights-to-a-data-file","dir":"Articles","previous_headings":"","what":"Saving the final weights to a data file","title":"Nonresponse Adjustments","text":"Once we’re satisfied with the weights, we can create a data frame with the analysis variables and columns of replicate weights. This format makes it easy to export the data to files that can be loaded into R or other software later.","code":"data_frame_with_nr_adjusted_weights <- nr_adjusted_survey |> as_data_frame_with_weights( full_wgt_name = \"NR_ADJ_WGT\", rep_wgt_prefix = \"NR_ADJ_REP_WGT_\" ) # Preview first few column names colnames(data_frame_with_nr_adjusted_weights) |> head(12) #> [1] \"RESPONSE_STATUS\" \"RACE_ETHNICITY\" \"SEX\" #> [4] \"EDUC_ATTAINMENT\" \"VAX_STATUS\" \"SAMPLING_WEIGHT\" #> [7] \"RESPONSE_PROPENSITY\" \"PROPENSITY_CELL\" \"NR_ADJ_WGT\" #> [10] \"NR_ADJ_REP_WGT_1\" \"NR_ADJ_REP_WGT_2\" \"NR_ADJ_REP_WGT_3\" # Write the data to a CSV file write.csv( x = data_frame_with_nr_adjusted_weights, file = \"survey-data-with-nonresponse-adjusted-weights.csv\" )"},{"path":"https://bschneidr.github.io/svrep/articles/nonresponse-adjustments.html","id":"statistical-background","dir":"Articles","previous_headings":"","what":"Statistical background","title":"Nonresponse Adjustments","text":"The motivation for making these adjustments is that standard methods of statistical inference assume that every person in the population has a known, nonzero probability of participating in the survey (i.e. a nonzero chance of being sampled and a nonzero chance of responding if sampled), denoted \(p_{i,overall}\). Basic results from survey sampling theory guarantee that if this assumption is true, then we can produce unbiased estimates of population means and totals by weighting the data of each respondent by the weight \(1/{p_{i,overall}}\). Crucially, the overall probability of participation \(p_{i,overall}\) is the product of two components: the probability that the person is sampled (denoted \(\pi_i\)), and the probability that the person would respond to the survey if sampled (denoted \(p_i\) and referred to as the “response propensity”). While the sampling probability \(\pi_i\) is known, since we can control the method of sampling, the response propensity \(p_i\) is unknown and can only be estimated. \[ \begin{aligned} w^{*}_i &= 1/p_{i,overall} \text{ (the weights needed for unbiased estimation)} \\ p_{i,overall} &= \pi_i \times p_i \\ \pi_i &= \textbf{Sampling probability} \\ &\textit{i.e. the probability that case } i \textit{ is randomly sampled } \text{ (}\textit{Known}\text{)} \\ p_i &= \textbf{Response propensity} \\ &\textit{i.e. the probability that case } i \textit{ responds, if sampled } \text{ (}\textit{Unknown}\text{)} \\ \end{aligned} \] Because the component \(p_i\) must be estimated from the data (yielding the estimate \(\hat{p}_i\)), nonresponse-adjusted weights for respondents can be formed as \(w_{NR,i} = 1/(\pi_i \times \hat{p}_i)\) and used to obtain approximately unbiased estimates of population means and totals. To use the earlier notation, the nonresponse adjustment factor for respondents, \(f_{NR,i}\), is defined using \(1/\hat{p}_i\). 
\[ \begin{aligned} w_i &= \textit{Original sampling weight for case } i \\ &= 1/\pi_i, \textit{ where } \pi_i \textit{ is the probability that case } i \textit{ was sampled}\\ w_{NR, i} &= w_i \times f_{NR,i} = \textit{Weight for case } i \textit{ after nonresponse adjustment} \\ \\ f_{NR,i} &= \begin{cases} 0 & \text{if case } i \text{ is a nonrespondent} \\ 1 / \hat{p}_i & \text{if case } i \text{ is a respondent} \\ \end{cases} \\ \hat{p}_i &= \textbf{Estimated response propensity} \end{aligned} \] In essence, different methods of nonresponse weighting adjustment vary in terms of how they estimate \(\hat{p}_i\). The basic weight redistribution method in effect estimates \(p_i\) as a constant across all \(i\), equal to the overall weighted response rate, and uses this estimate to form the weights. In other words, basic weight redistribution is essentially a way of forming the adjustment factor \(f_{NR,i}\) based on the estimated response propensity \(\hat{p}_i = \frac{\sum_{i \in s_{resp}}w_i}{\sum_{i \in s}w_i}\). Weighting class adjustments and propensity cell adjustments are essentially more refined ways of forming \(f_{NR,i}\), estimating \(p_i\) with a more realistic model under which \(p_i\) is not constant across the entire sample but instead varies among weighting classes or propensity cells. The reason for conducting these weighting adjustments on the full-sample weights as well as the replicate weights is to account for the nonresponse adjustment process when estimating sampling variances and inferential statistics such as confidence intervals. Under random sampling, the adjustment factors used in nonresponse adjustment would vary from one sample to the next, and applying the weighting adjustments separately to each replicate reflects this variability. As we’ve seen in this vignette, the redistribute_weights() function handles this for us: after a nonresponse adjustment, the weight of each replicate is redistributed in the same manner in which weight was redistributed for the full-sample weights.","code":""},
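To make the equivalence concrete, here is a minimal sketch with made-up weights (not vignette code) showing that redistributing weight to respondents is the same as multiplying each respondent's weight by \(f_{NR,i} = 1/\hat{p}_i\):

```r
# Hypothetical base weights and response indicators for five sampled cases
w    <- c(100, 150, 200, 250, 300)
resp <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Estimated response propensity: the overall weighted response rate
p_hat <- sum(w[resp]) / sum(w)

# Redistribution: set nonrespondents to 0, scale respondents up
w_redistributed <- ifelse(resp, w * sum(w) / sum(w[resp]), 0)

# Inverse-propensity weighting: w_i * f_NR,i with f_NR,i = 1 / p_hat
w_inverse_prop <- ifelse(resp, w / p_hat, 0)

all.equal(w_redistributed, w_inverse_prop)  # TRUE
all.equal(sum(w_redistributed), sum(w))     # total weight is preserved
```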
{"path":"https://bschneidr.github.io/svrep/articles/nonresponse-adjustments.html","id":"recommended-reading","dir":"Articles","previous_headings":"","what":"Recommended Reading","title":"Nonresponse Adjustments","text":"See Chapter 2, Section 2.7.3 of “Applied Survey Data Analysis” for a statistical explanation of the weighting adjustments described in this vignette. Heeringa, S., West, B., Berglund, P. (2017). Applied Survey Data Analysis, 2nd edition. Boca Raton, FL: CRC Press. Chapter 13 of “Practical Tools for Designing and Weighting Survey Samples” also provides an excellent overview of nonresponse adjustment methods. Valliant, R., Dever, J., Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples, 2nd edition. New York: Springer.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/sample-based-calibration.html","id":"sample-based-calibration-an-introduction","dir":"Articles","previous_headings":"","what":"Sample-based Calibration: An Introduction","title":"Calibrating to Estimated Control Totals","text":"Calibration weighting adjustments such as post-stratification or raking are often helpful for reducing sampling variance or non-sampling errors such as nonresponse bias. Typically, the benchmark data used for these calibration adjustments are estimates published by agencies such as the United States Census Bureau. For example, pollsters in the United States frequently rake polling data so that estimates for variables such as age or educational attainment match benchmark estimates from the American Community Survey (ACS). While the benchmark data (also known as control totals) used for raking or calibration are often treated as the “true” population values, they are usually themselves estimates, with their own sampling variance or margin of error. When we calibrate to estimated control totals rather than to “true” population values, we may need to account for the variance of the estimated control totals to ensure that calibrated estimates appropriately reflect the sampling error of both the primary survey of interest and the survey from which the control totals were estimated. This is especially important if the control totals have large margins of error. A handful of statistical methods have been developed for the problem of conducting replication variance estimation after sample-based calibration; see Opsomer and Erciulescu (2021) for a clear overview of the literature on this topic. These methods all apply the calibration weighting adjustment to the full-sample weights as well as to each column of replicate weights. The key “trick” of these methods is to adjust each column of replicate weights to a slightly different set of control totals, where the variation in the control totals used across replicates is in some sense proportionate to the sampling variance of the control totals. The statistical methods differ in how they generate the different control totals for each column of replicate weights and in the type of data they require the analyst to use. The method of Fuller (1998) requires the analyst to have the variance-covariance matrix of the estimated control totals, while the method of Opsomer and Erciulescu (2021) requires the analyst to use the full dataset for the control survey along with its associated replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/sample-based-calibration.html","id":"functions-for-implementing-sample-based-calibration","dir":"Articles","previous_headings":"","what":"Functions for Implementing Sample-Based Calibration","title":"Calibrating to Estimated Control Totals","text":"The ‘svrep’ package provides two functions to implement sample-based calibration. With the function calibrate_to_estimate(), adjustments to the replicate weights are conducted using the method of Fuller (1998), which requires the variance-covariance matrix of the estimated control totals. With the function calibrate_to_sample(), adjustments to the replicate weights are conducted using the method proposed by Opsomer and Erciulescu (2021), which requires a dataset of replicate weights to use for estimating the control totals and their sampling variance. With either function, it is possible to use a variety of calibration options from the survey package’s calibrate() function. For example, the user can specify the calibration function to use, such as calfun = survey::cal.linear to implement post-stratification or calfun = survey::cal.raking to implement raking. The bounds argument can be used to specify bounds for the calibration weights, and the arguments maxit and epsilon allow finer control over the Newton-Raphson algorithm used to implement calibration.","code":"calibrate_to_estimate( rep_design = rep_design, estimate = vector_of_control_totals, vcov_estimate = variance_covariance_matrix_for_controls, cal_formula = ~ CALIBRATION_VARIABLE_1 + CALIBRATION_VARIABLE_2 + ... ) calibrate_to_sample( primary_rep_design = primary_rep_design, control_rep_design = control_rep_design, cal_formula = ~ CALIBRATION_VARIABLE_1 + CALIBRATION_VARIABLE_2 + ... )"},{"path":"https://bschneidr.github.io/svrep/articles/sample-based-calibration.html","id":"an-example-using-a-vaccination-survey","dir":"Articles","previous_headings":"","what":"An Example Using a Vaccination Survey","title":"Calibrating to Estimated Control Totals","text":"To illustrate the different methods of conducting sample-based calibration, we’ll use an example survey measuring Covid-19 vaccination status and a handful of demographic variables, based on a simple random sample of 1,000 residents of Louisville, Kentucky. For the purpose of variance estimation, we’ll create jackknife replicate weights. Because the survey’s key outcome, vaccination status, was only measured for respondents, we’ll do a quick nonresponse weighting adjustment to help us make reasonable estimates for this outcome. The work so far has given us a replicate design for the primary survey, prepared for calibration. Now we need to obtain benchmark data we can use for the calibration. We’ll use a Public-Use Microdata Sample (PUMS) dataset from the ACS as our source for benchmark data on race/ethnicity, sex, and educational attainment. Next, we’ll prepare the PUMS data to be used for replication variance estimation, using its provided replicate weights. 
Before conducting the calibration, we must make sure that the data from the control survey represent the same population as the primary survey. Since the Louisville vaccination survey only represents adults, we need to subset the control survey design to adults. In addition, we need to ensure that the calibration variables in the control survey design align with the corresponding variables in the primary survey design of interest. This may require some data manipulation.","code":"# Load the data library(svrep) data(\"lou_vax_survey\") # Inspect the first few rows head(lou_vax_survey) |> knitr::kable() suppressPackageStartupMessages( library(survey) ) lou_vax_survey_rep <- svydesign( data = lou_vax_survey, ids = ~ 1, weights = ~ SAMPLING_WEIGHT ) |> as.svrepdesign(type = \"JK1\", mse = TRUE) #> Call: as.svrepdesign.default(svydesign(data = lou_vax_survey, ids = ~1, #> weights = ~SAMPLING_WEIGHT), type = \"JK1\", mse = TRUE) #> Unstratified cluster jacknife (JK1) with 1000 replicates and MSE variances. # Conduct nonresponse weighting adjustment nr_adjusted_design <- lou_vax_survey_rep |> redistribute_weights( reduce_if = RESPONSE_STATUS == \"Nonrespondent\", increase_if = RESPONSE_STATUS == \"Respondent\" ) |> subset(RESPONSE_STATUS == \"Respondent\") # Inspect the result of the adjustment rbind( 'Original' = summarize_rep_weights(lou_vax_survey_rep, type = 'overall'), 'NR-adjusted' = summarize_rep_weights(nr_adjusted_design, type = 'overall') )[,c(\"nrows\", \"rank\", \"avg_wgt_sum\", \"sd_wgt_sums\")] #> nrows rank avg_wgt_sum sd_wgt_sums #> Original 1000 1000 596702 0.000000e+00 #> NR-adjusted 502 502 596702 8.219437e-11 data(\"lou_pums_microdata\") # Inspect some of the rows/columns of data ---- tail(lou_pums_microdata, n = 5) |> dplyr::select(AGE, SEX, RACE_ETHNICITY, EDUC_ATTAINMENT) |> knitr::kable() # Convert to a survey design object ---- pums_rep_design <- svrepdesign( data = lou_pums_microdata, weights = ~ PWGTP, repweights = \"PWGTP\\\\d{1,2}\", type = \"successive-difference\", variables = ~ AGE + SEX + RACE_ETHNICITY + EDUC_ATTAINMENT, mse = TRUE ) pums_rep_design #> Call: svrepdesign.default(data = lou_pums_microdata, weights = ~PWGTP, #> repweights = \"PWGTP\\\\d{1,2}\", type = \"successive-difference\", #> variables = ~AGE + SEX + RACE_ETHNICITY + EDUC_ATTAINMENT, #> mse = TRUE) #> with 80 replicates and MSE variances. 
# Subset to only include adults pums_rep_design <- pums_rep_design |> subset(AGE >= 18) suppressPackageStartupMessages( library(dplyr) ) # Check that variables match across data sources ---- pums_rep_design$variables |> dplyr::distinct(RACE_ETHNICITY) #> RACE_ETHNICITY #> 1 Black or African American alone, not Hispanic or Latino #> 2 White alone, not Hispanic or Latino #> 3 Hispanic or Latino #> 4 Other Race, not Hispanic or Latino setdiff(lou_vax_survey_rep$variables$RACE_ETHNICITY, pums_rep_design$variables$RACE_ETHNICITY) #> character(0) setdiff(lou_vax_survey_rep$variables$SEX, pums_rep_design$variables$SEX) #> character(0) setdiff(lou_vax_survey_rep$variables$EDUC_ATTAINMENT, pums_rep_design$variables$EDUC_ATTAINMENT) #> character(0) # Estimates from the control survey (ACS) svymean( design = pums_rep_design, x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT ) #> mean #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 0.19950 #> RACE_ETHNICITYHispanic or Latino 0.04525 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 0.04631 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 0.70894 #> SEXMale 0.47543 #> SEXFemale 0.52457 #> EDUC_ATTAINMENTHigh school or beyond 0.38736 #> EDUC_ATTAINMENTLess than high school 0.61264 #> SE #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 0.0010 #> RACE_ETHNICITYHispanic or Latino 0.0002 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 0.0008 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 0.0007 #> SEXMale 0.0007 #> SEXFemale 0.0007 #> EDUC_ATTAINMENTHigh school or beyond 0.0033 #> EDUC_ATTAINMENTLess than high school 0.0033 # Estimates from the primary survey (Louisville vaccination survey) svymean( design = nr_adjusted_design, x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT ) #> mean #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 0.169323 #> RACE_ETHNICITYHispanic or Latino 0.033865 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 0.057769 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 0.739044 #> SEXFemale 0.535857 #> SEXMale 0.464143 #> EDUC_ATTAINMENTHigh school or beyond 0.458167 #> EDUC_ATTAINMENTLess than high school 0.541833 #> SE #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 0.0168 #> RACE_ETHNICITYHispanic or Latino 0.0081 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 0.0104 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 0.0196 #> SEXFemale 0.0223 #> SEXMale 0.0223 #> EDUC_ATTAINMENTHigh school or beyond 0.0223 #> EDUC_ATTAINMENTLess than high school 0.0223"},{"path":"https://bschneidr.github.io/svrep/articles/sample-based-calibration.html","id":"raking-to-estimated-control-totals","dir":"Articles","previous_headings":"An Example Using a Vaccination Survey","what":"Raking to estimated control totals","title":"Calibrating to Estimated Control Totals","text":"We’ll start by raking to estimates from the ACS for race/ethnicity, sex, and educational attainment, first using the calibrate_to_estimate() method and then using the calibrate_to_sample() method. For the calibrate_to_estimate() method, we need to obtain a vector of point estimates for the control totals, along with the accompanying variance-covariance matrix of those estimates. Crucially, note that the vector of control totals has the same names as the estimates produced by using svytotal() with the primary survey design object whose weights we plan to adjust. To calibrate the design to these estimates, we supply the estimates and their variance-covariance matrix to calibrate_to_estimate(), and we supply to the cal_formula argument the same formula we would use for svytotal(). To use a raking adjustment, we specify calfun = survey::cal.raking. 
Now we can compare the estimated totals for the calibration variables to the actual control totals. As we might intuitively expect, the estimated totals from the survey now match the control totals, and the standard errors of the estimated totals match the standard errors of the control totals. We can now see the effect of the raking adjustment on our primary estimate of interest: the overall Covid-19 vaccination rate. The raking adjustment reduced the estimate of the vaccination rate by about one percentage point and resulted in a similar standard error for the estimate. Instead of raking using a vector of control totals and their variance-covariance matrix, we could instead have done the raking by simply supplying the two replicate design objects to the function calibrate_to_sample(). That function uses the Opsomer-Erciulescu method of adjusting replicate weights, in contrast to calibrate_to_estimate(), which uses Fuller’s method of adjusting replicate weights. We can see that the two methods yield identical point estimates from the full-sample weights, and their standard errors match nearly exactly for the calibration variables (race/ethnicity, sex, and educational attainment). However, there are small but noticeable differences in the standard errors for other variables, such as VAX_STATUS, resulting from the fact that the two methods adjust the replicate weights differently. Opsomer and Erciulescu (2021) explain the differences between the two methods and discuss why the Opsomer-Erciulescu method used by calibrate_to_sample() may have better statistical properties than the Fuller method used by calibrate_to_estimate().","code":"acs_control_totals <- svytotal( x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT, design = pums_rep_design ) control_totals_for_raking <- list( 'estimates' = coef(acs_control_totals), 'variance-covariance' = vcov(acs_control_totals) ) # Inspect point estimates control_totals_for_raking$estimates #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino #> 119041 #> RACE_ETHNICITYHispanic or Latino #> 27001 #> RACE_ETHNICITYOther Race, not Hispanic or Latino #> 27633 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino #> 423027 #> SEXMale #> 283688 #> SEXFemale #> 313014 #> EDUC_ATTAINMENTHigh school or beyond #> 231136 #> EDUC_ATTAINMENTLess than high school #> 365566 # Inspect a few rows of the control totals' variance-covariance matrix control_totals_for_raking$`variance-covariance`[5:8,5:8] |> `colnames<-`(NULL) #> [,1] [,2] [,3] [,4] #> SEXMale 355572.45 -29522.95 129208.95 196840.6 #> SEXFemale -29522.95 379494.65 81455.95 268515.8 #> EDUC_ATTAINMENTHigh school or beyond 129208.95 81455.95 4019242.10 -3808577.2 #> EDUC_ATTAINMENTLess than high school 196840.55 268515.75 -3808577.20 4273933.5 svytotal(x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT, design = nr_adjusted_design) #> total #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 101035 #> RACE_ETHNICITYHispanic or Latino 20207 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 34471 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 440989 #> SEXFemale 319747 #> SEXMale 276955 #> EDUC_ATTAINMENTHigh school or beyond 273389 #> EDUC_ATTAINMENTLess than high school 323313 #> SE #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 10003.0 #> RACE_ETHNICITYHispanic or Latino 4824.4 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 6222.7 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 11713.1 #> SEXFemale 13301.6 #> SEXMale 13301.6 #> EDUC_ATTAINMENTHigh school or beyond 13289.2 #> EDUC_ATTAINMENTLess than high school 13289.2 raked_design <- calibrate_to_estimate( rep_design = nr_adjusted_design, estimate = control_totals_for_raking$estimates, vcov_estimate = control_totals_for_raking$`variance-covariance`, cal_formula = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT, calfun = survey::cal.raking, # Required 
for raking epsilon = 1e-9 ) #> Selection of replicate columns whose control totals will be perturbed will be done at random. #> For tips on reproducible selection, see `help('calibrate_to_estimate')` # Estimated totals after calibration svytotal(x = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT, design = raked_design) #> total #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 119041 #> RACE_ETHNICITYHispanic or Latino 27001 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 27633 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 423027 #> SEXFemale 313014 #> SEXMale 283688 #> EDUC_ATTAINMENTHigh school or beyond 231136 #> EDUC_ATTAINMENTLess than high school 365566 #> SE #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 633.63 #> RACE_ETHNICITYHispanic or Latino 107.98 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 472.41 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 594.14 #> SEXFemale 616.03 #> SEXMale 596.30 #> EDUC_ATTAINMENTHigh school or beyond 2004.80 #> EDUC_ATTAINMENTLess than high school 2067.35 # Matches the control totals! cbind( 'total' = control_totals_for_raking$estimates, 'SE' = control_totals_for_raking$`variance-covariance` |> diag() |> sqrt() ) #> total #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 119041 #> RACE_ETHNICITYHispanic or Latino 27001 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 27633 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 423027 #> SEXMale 283688 #> SEXFemale 313014 #> EDUC_ATTAINMENTHigh school or beyond 231136 #> EDUC_ATTAINMENTLess than high school 365566 #> SE #> RACE_ETHNICITYBlack or African American alone, not Hispanic or Latino 633.6287 #> RACE_ETHNICITYHispanic or Latino 107.9829 #> RACE_ETHNICITYOther Race, not Hispanic or Latino 472.4107 #> RACE_ETHNICITYWhite alone, not Hispanic or Latino 594.1448 #> SEXMale 596.2990 #> SEXFemale 616.0314 #> EDUC_ATTAINMENTHigh school or beyond 2004.8048 #> EDUC_ATTAINMENTLess than high school 2067.3494 estimates_by_design <- svyby_repwts( rep_designs = list( \"NR-adjusted\" = nr_adjusted_design, \"Raked\" = raked_design ), FUN = svytotal, formula = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT ) t(estimates_by_design[,-1]) |> knitr::kable() raked_design_opsomer_erciulescu <- calibrate_to_sample( primary_rep_design = nr_adjusted_design, control_rep_design = pums_rep_design, cal_formula = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT, calfun = survey::cal.raking, epsilon = 1e-9 ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')` estimates_by_design <- svyby_repwts( rep_designs = list( \"calibrate_to_estimate()\" = raked_design, \"calibrate_to_sample()\" = raked_design_opsomer_erciulescu ), FUN = svytotal, formula = ~ VAX_STATUS + RACE_ETHNICITY + SEX + EDUC_ATTAINMENT ) t(estimates_by_design[,-1]) |> knitr::kable()"},{"path":"https://bschneidr.github.io/svrep/articles/sample-based-calibration.html","id":"post-stratification","dir":"Articles","previous_headings":"An Example Using a Vaccination Survey","what":"Post-stratification","title":"Calibrating to Estimated Control Totals","text":"primary difference post-stratification raking post-stratification essentially involves single calibration variable, population benchmarks provided value variable. Louisville vaccination survey, variable called POSTSTRATUM based combinations race/ethnicity, sex, educational attainment. 
To post-stratify the design, we can either supply the estimates and their variance-covariance matrix to calibrate_to_estimate(), or we can supply the two replicate design objects to calibrate_to_sample(). For either method, we need to supply to the cal_formula argument the same formula we would use for svytotal(). To use a post-stratification adjustment (rather than raking), we specify calfun = survey::cal.linear. As with the raking example, we can see that the full-sample post-stratified estimates are exactly the same for the two methods. The standard errors for the post-stratification variables are essentially identical, while the standard errors for other variables differ slightly.","code":"# Create matching post-stratification variable in both datasets nr_adjusted_design <- nr_adjusted_design |> transform(POSTSTRATUM = interaction(RACE_ETHNICITY, SEX, EDUC_ATTAINMENT, sep = \"|\")) pums_rep_design <- pums_rep_design |> transform(POSTSTRATUM = interaction(RACE_ETHNICITY, SEX, EDUC_ATTAINMENT, sep = \"|\")) levels(pums_rep_design$variables$POSTSTRATUM) <- levels( nr_adjusted_design$variables$POSTSTRATUM ) # Estimate control totals acs_control_totals <- svytotal( x = ~ POSTSTRATUM, design = pums_rep_design ) poststratification_totals <- list( 'estimate' = coef(acs_control_totals), 'variance-covariance' = vcov(acs_control_totals) ) # Inspect the control totals poststratification_totals$estimate |> as.data.frame() |> `colnames<-`('estimate') |> knitr::kable() # Post-stratify the design using the estimates poststrat_design_fuller <- calibrate_to_estimate( rep_design = nr_adjusted_design, estimate = poststratification_totals$estimate, vcov_estimate = poststratification_totals$`variance-covariance`, cal_formula = ~ POSTSTRATUM, # Specify the post-stratification variable calfun = survey::cal.linear # This option is required for post-stratification ) #> Selection of replicate columns whose control totals will be perturbed will be done at random. #> For tips on reproducible selection, see `help('calibrate_to_estimate')` # Post-stratify the design using the two samples poststrat_design_opsomer_erciulescu <- calibrate_to_sample( primary_rep_design = nr_adjusted_design, control_rep_design = pums_rep_design, cal_formula = ~ POSTSTRATUM, # Specify the post-stratification variable calfun = survey::cal.linear # This option is required for post-stratification ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')` estimates_by_design <- svyby_repwts( rep_designs = list( \"calibrate_to_estimate()\" = poststrat_design_fuller, \"calibrate_to_sample()\" = poststrat_design_opsomer_erciulescu ), FUN = svymean, formula = ~ VAX_STATUS + RACE_ETHNICITY + SEX + EDUC_ATTAINMENT ) t(estimates_by_design[,-1]) |> knitr::kable()"},{"path":"https://bschneidr.github.io/svrep/articles/sample-based-calibration.html","id":"reproducibility","dir":"Articles","previous_headings":"","what":"Reproducibility","title":"Calibrating to Estimated Control Totals","text":"The calibration methods calibrate_to_estimate() and calibrate_to_sample() involve one element of randomization: determining which of the columns of replicate weights will be assigned to a given perturbation of the control totals. With the method of Fuller (1998) used by calibrate_to_estimate(), if the control totals are a vector of dimension \(p\), then \(p\) columns of replicate weights are calibrated to \(p\) different vectors of perturbed control totals, formed using the \(p\) scaled eigenvectors from a spectral decomposition of the control totals’ variance-covariance matrix (sorted in order from the largest to smallest eigenvalues). To control which columns of replicate weights will be calibrated to each set of perturbed control totals, we can use the function argument col_selection. The calibrated survey design object contains an element named perturbed_control_cols which indicates the columns that were calibrated to the perturbed control totals; this can be useful to save and then use as an input to col_selection to ensure reproducibility. For calibrate_to_sample(), matching is done between the columns of replicate weights in the primary survey and the columns of replicate weights in the control survey. This matching is done at random unless the user specifies otherwise using the argument control_col_matches. In the Louisville Vaccination Survey, the primary survey has 1,000 replicates while the control survey has 80 columns. We can match the 80 columns to the 1,000 replicates by specifying 1,000 values consisting of NA or integers between 1 and 80. The calibrated survey design object contains an element named control_column_matches which indicates, for each control survey replicate, the primary survey replicate column it was matched to.","code":"# Randomly select which columns will be assigned to each set of perturbed control totals dimension_of_control_totals <- length(poststratification_totals$estimate) columns_to_perturb <- sample(x = 1:ncol(nr_adjusted_design$repweights), size = dimension_of_control_totals) print(columns_to_perturb) #> [1] 258 355 489 325 764 697 894 903 760 917 768 33 401 465 403 799 # Perform the calibration poststratified_design <- calibrate_to_estimate( rep_design = nr_adjusted_design, estimate = poststratification_totals$estimate, vcov_estimate = poststratification_totals$`variance-covariance`, cal_formula = ~ POSTSTRATUM, calfun = survey::cal.linear, col_selection = columns_to_perturb # Specified for reproducibility ) poststratified_design$perturbed_control_cols #> NULL # Randomly match the primary replicates to control replicates set.seed(1999) column_matching <- rep(NA, times = ncol(nr_adjusted_design$repweights)) column_matching[sample(x = 1:1000, size = 80)] <- 1:80 str(column_matching) #> int [1:1000] NA NA NA 34 NA NA NA 68 NA NA ... # Perform the calibration poststratified_design <- calibrate_to_sample( primary_rep_design = nr_adjusted_design, control_rep_design = pums_rep_design, cal_formula = ~ POSTSTRATUM, calfun = survey::cal.linear, control_col_matches = column_matching ) str(poststratified_design$control_column_matches) #> int [1:1000] NA NA NA 34 NA NA NA 68 NA NA ..."},{"path":[]},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"two-phase-sampling-vs--multistage-sampling","dir":"Articles","previous_headings":"","what":"Two-phase Sampling vs. Multistage Sampling","title":"Replication Methods for Two-phase Sampling","text":"Two-phase sampling (also known as “double sampling”) is a common feature of surveys. In a two-phase sample, a large first-phase sample is selected, and then a smaller second-phase sample is selected from the first-phase sample. Multistage cluster sampling is a special case of two-phase sampling, where the second-phase sample of secondary sampling units (SSUs) is selected from a first-phase sample of primary sampling units (PSUs). In the specific case of multistage sampling, the second-phase sampling of SSUs must sample at least one SSU from within each PSU and must sample independently across PSUs (in other words, each PSU is treated as a stratum for the second-phase sampling). Two-phase sampling in general has no such restrictions: the second-phase sample design can be arbitrary, and some primary sampling units might not appear at all in the second-phase sample.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"applications-of-two-phase-sampling","dir":"Articles","previous_headings":"","what":"Applications of Two-Phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"The flexibility of two-phase sampling can be quite valuable, and for this reason two-phase samples are commonly used in practice.
We highlight two common applications of two-phase sampling here. First, any given survey conducted using an online panel is necessarily a two-phase sample, where the panel recruitment represents the first phase of the sampling process and requesting panelists to participate in a specific survey represents the second phase of sampling. Often, the recruitment sampling is quite complex (e.g., three-stage stratified cluster sampling), while the sampling of panelists for a given survey is conducted using simple random sampling or stratified simple random sampling from the list of panelists. Second, statistical agencies can often reduce the cost of a small survey by drawing its sample from the respondents to a larger survey that’s already been conducted. For example, the U.S. Census Bureau conducts the National Survey of College Graduates (NSCG) by sampling households that responded to the American Community Survey (ACS). Similarly, the National Study of Caregiving (NSOC) is conducted by sampling respondents to the National Health and Aging Trends Study (NHATS). The information from the first-phase sample is useful for both the design and analysis of the second-phase sample. From a design standpoint, information collected in the first-phase sample can be used to stratify units or assign unequal sampling probabilities for the second-phase sampling, which can result in more precise estimates relative to using simple random sampling. From an analysis standpoint, information collected in the first-phase sample can also be used to improve estimators, by using raking, post-stratification, or generalized regression (GREG) to calibrate the small second-phase sample to the large first-phase sample.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"replicate-variance-estimation-with-the-svrep-package","dir":"Articles","previous_headings":"","what":"Replicate Variance Estimation with the ‘svrep’ Package","title":"Replication Methods for Two-phase Sampling","text":"In this vignette, we’ll show how to use the generalized bootstrap to estimate sampling variances for estimates based on two-phase sample designs. While other types of replication such as the jackknife or balanced repeated replication (BRR) can theoretically be used, the ‘svrep’ package implements only two replication methods for two-phase designs: the generalized bootstrap and Fay’s generalized replication method. In theory, other replication methods can be used for two-phase samples, but their applicability is much more limited.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"overview-of-the-generalized-bootstrap","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Overview of the Generalized Bootstrap","title":"Replication Methods for Two-phase Sampling","text":"The basic idea of the generalized bootstrap is to “mimic” a target variance estimator for population totals, where the target variance estimator appropriate for a particular sampling design can be written as a quadratic form. For example, the generalized bootstrap can mimic the Horvitz-Thompson estimator or the usual variance estimator used for simple random sampling. To be precise, by “mimic” we mean that the generalized bootstrap variance estimate for a population total on average exactly matches the variance estimate produced by the target variance estimator. In order to mimic a target variance estimator, we specify the target variance estimator for the population total \(\hat{Y}=\sum_{i=1}^{n}(y_i/\pi_i)\) as a quadratic form. That is, we specify the variance estimator \(v(\hat{Y})\) as \(v(\hat{Y})=\sum_{i=1}^{n}\sum_{j=1}^{n} \sigma_{ij}(w_iy_i)(w_jy_j)\), for some set of values \(\sigma_{ij}, \space i,j \in \\{1,\dots,n\\}\). In matrix notation, we write \(v(\hat{Y})=\breve{y}^{\prime}\Sigma\breve{y}\), where \(\Sigma\) is a symmetric, positive semi-definite matrix of dimension \(n \times n\) whose element \(ij\) equals \(\sigma_{ij}\), and \(\breve{y}\) is the vector whose \(i\)-th element is \(w_iy_i\).
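To make the quadratic-form idea concrete, here is a minimal R sketch (illustrative only, not code from the package) verifying that the usual SRSWOR variance estimator for an estimated total can be written as \(\breve{y}^{\prime}\Sigma\breve{y}\):

```r
# A minimal sketch (not from the package): write the textbook SRSWOR
# variance estimator for an estimated total as a quadratic form.
n <- 5; N <- 50
set.seed(1)
y <- rnorm(n)
wts <- rep(N / n, n)   # sampling weights, w_i = 1/pi_i
y_breve <- wts * y     # vector whose i-th element is w_i * y_i

f <- n / N             # sampling fraction
# Quadratic form matrix for the usual SRSWOR estimator:
# diagonal entries equal (1 - f), off-diagonal entries equal -(1 - f)/(n - 1)
Sigma <- (1 - f) * (n / (n - 1)) * (diag(n) - matrix(1, n, n) / n)

quad_form_est <- as.numeric(t(y_breve) %*% Sigma %*% y_breve)
textbook_est  <- N^2 * (1 - f) * var(y) / n   # N^2 (1 - f) s^2 / n
all.equal(quad_form_est, textbook_est)        # TRUE
```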
When using the generalized bootstrap, the difficult part of the variance estimation process is simply identifying the quadratic form. Once the quadratic form is written down, it is easy to create replicate weights using the generalized bootstrap. Fortunately, the ‘svrep’ package can automatically identify the appropriate quadratic form to use for the variance estimators of many single-phase and two-phase sample designs. The user simply needs to supply the necessary data, describe the survey design, and select the target variance estimator to use for each phase of sampling. For a broad overview of the generalized survey bootstrap and its use in the ‘svrep’ package, the reader is encouraged to read the ‘svrep’ package vignette titled “Bootstrap Methods for Surveys”. For a thorough overview of the theory of the generalized survey bootstrap, Beaumont and Patak (2012) provide a clear introduction as well as several useful suggestions for implementation in practice. The present vignette simply describes how the application of the generalized bootstrap to two-phase samples can be implemented in the ‘svrep’ package.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"creating-example-data","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Creating Example Data","title":"Replication Methods for Two-phase Sampling","text":"For the example here, we create a two-phase survey design: the first phase is a stratified multistage sample, where the first-stage sample of PSUs is selected using unequal probability sampling without replacement (PPSWOR) and the second-stage sample is selected using simple random sampling without replacement (SRSWOR). The second-phase sample is a simple random sample without replacement from the first-phase sample. This type of design is fairly typical for a survey conducted using an online panel, where the panel recruitment uses a complex design but the sampling of panelists for a given survey uses simple random sampling from the panelists. The particular dataset we’ll use comes from the Public Libraries Survey (PLS), an annual survey of public libraries in the U.S., with data from FY2020.","code":"data('library_multistage_sample', package = 'svrep') # Load first-phase sample twophase_sample <- library_multistage_sample # Select second-phase sample set.seed(2020) twophase_sample[['SECOND_PHASE_SELECTION']] <- sampling::srswor( n = 100, N = nrow(twophase_sample) ) |> as.logical()"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"describing-the-two-phase-survey-design","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Describing the Two-phase Survey Design","title":"Replication Methods for Two-phase Sampling","text":"Next, we use the ‘survey’ package’s function twophase() to describe the sample design at each phase, in terms of stratification, clustering, probabilities, and population sizes. Note the use of list() for the arguments, where the first element of each list describes the first phase of sampling and the second element of each list describes the second phase of sampling.","code":"# Declare survey design twophase_design <- twophase( method = \"full\", data = twophase_sample, # Identify the subset of first-phase elements # which were selected into the second-phase sample subset = ~ SECOND_PHASE_SELECTION, # Describe clusters, probabilities, and population sizes # at each phase of sampling id = list(~ PSU_ID + SSU_ID, ~ 1), probs = list(~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB, NULL), fpc = list(~ PSU_POP_SIZE + SSU_POP_SIZE, NULL) )"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"creating-generalized-bootstrap-replicates","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Creating Generalized Bootstrap Replicates","title":"Replication Methods for Two-phase Sampling","text":"Once the two-phase design has been described, we can use the as_gen_boot_design() function to create generalized bootstrap replicate weights. This requires us to specify the desired number of replicates and a target variance estimator for each phase of sampling. Note that different target variance estimators may be used for each phase, since each phase might have a different design. The result is a replicate survey design object that can be used for estimation with the usual functions from the ‘survey’ and ‘srvyr’ packages. When using as_gen_boot_design() for two-phase designs, it’s useful to know that we will often see a warning message about needing to approximate the first-phase variance estimator’s quadratic form. As we can see in the output below, the function emitted such a warning message. The generalized bootstrap works by mimicking a variance estimator, and this requires that the variance estimator can be represented as a positive semidefinite quadratic form. For two-phase designs, however, it is often the case that the usual variance estimator cannot be represented exactly as a positive semidefinite quadratic form. In these cases, Beaumont and Patak (2012) suggest approximating the actual quadratic form matrix with a similar positive semidefinite matrix. This approximation in general should never lead to underestimation of the variance, and Beaumont and Patak (2012) argue that it should only produce a small overestimate of the variance in practice. Section 5 of this vignette provides more details about this approximation.","code":"# Obtain generalized bootstrap replicates # based on: # - The phase 1 estimator is the usual variance estimator # for stratified multistage simple random sampling # - The phase 2 estimator is the usual variance estimator # for single-stage simple random sampling twophase_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( \"Phase 1\" = \"Stratified Multistage SRS\", \"Phase 2\" = \"Ultimate Cluster\" ), replicates = 1000 ) twophase_boot_design |> svymean(x = ~ LIBRARIA, na.rm = TRUE) #> mean SE #> LIBRARIA 7.6044 1.8419 twophase_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( \"Phase 1\" = \"Stratified Multistage SRS\", \"Phase 2\" = \"Ultimate Cluster\" ) ) #> Warning in as_gen_boot_design.twophase2(design = twophase_design, #> variance_estimator = list(`Phase 1` = \"Stratified Multistage SRS\", : The sample #> quadratic form matrix for this design and variance estimator is not positive #> semidefinite. It will be approximated by the nearest positive semidefinite #> matrix."},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"create-replicates-using-fays-generalized-replication-method","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Create Replicates Using Fay’s Generalized Replication Method","title":"Replication Methods for Two-phase Sampling","text":"Instead of the generalized bootstrap, we can use Fay’s generalized replication method. The R code looks almost exactly the same as for the generalized bootstrap. The key difference from a programming standpoint is the use of the argument max_replicates to specify the maximum number of replicates that can be created. If the function determines that fewer than max_replicates are needed to obtain a fully-efficient variance estimator, then the actual number of replicates created will be less than max_replicates.","code":"twophase_genrep_design <- as_fays_gen_rep_design( design = twophase_design, variance_estimator = list( \"Phase 1\" = \"Stratified Multistage SRS\", \"Phase 2\" = \"Ultimate Cluster\" ), max_replicates = 500 ) #> Warning in as_fays_gen_rep_design.twophase2(design = twophase_design, #> variance_estimator = list(`Phase 1` = \"Stratified Multistage SRS\", : The sample #> quadratic form matrix for this design and variance estimator is not positive #> semidefinite. It will be approximated by the nearest positive semidefinite #> matrix."},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"calibrating-second-phase-weights-to-first-phase-estimates","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Calibrating Second-phase Weights to First-phase Estimates","title":"Replication Methods for Two-phase Sampling","text":"In two-phase sampling, it can be helpful to calibrate the weights of the small second-phase sample using estimates produced from the larger, more reliable first-phase sample. The main reason to do this is to produce more precise estimates for variables measured in the second-phase sample, and this calibration is effective when the calibration variables are associated with the second-phase variables of interest. This calibration also has the nice property that it forces the second-phase estimates for the calibration variables to match the first-phase estimates, thus improving the consistency of the two sets of estimates. Calibrating the weights of the second-phase sample is straightforward and can be done using the usual software and methods. However, care is needed to ensure that the resulting variance estimates appropriately reflect the fact that we are calibrating to estimates rather than to known population values. This is fairly easy when replication methods are used for variance estimation, but it requires the use of the appropriate functions in the ‘svrep’ package. Section 4.3.1 of this memo discusses the theory for replicate variance estimation with two-phase calibration, which is based on more detailed treatments of the topic in Fuller (1998) and Lohr (2022). The general process of using the ‘svrep’ package to calibrate a second-phase sample to first-phase estimates involves ensuring that the replicate weights are adjusted appropriately for the purpose of variance estimation. There are two useful functions in the ‘svrep’ package for this purpose, which we present as “Option 1” and “Option 2” in the following overview.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"preliminaries","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package > Calibrating Second-phase Weights to First-phase Estimates","what":"Preliminaries","title":"Replication Methods for Two-phase Sampling","text":"Ensure the calibration variables have no missing values. First, we need to ensure that the variables we want to use for calibration have no missing values in either the first-phase or second-phase sample, so some imputation might be necessary.
(If you haven’t already) Create replicate weights for the second-phase sample. Before calibration, we need to create replicate weights for the second-phase sample that appropriately reflect the sampling variance of the entire two-phase design. Since we already did this earlier in the document, we’ll simply repeat that code here.","code":"# Impute missing values (if necessary) twophase_sample <- twophase_sample |> mutate( TOTCIR = ifelse( is.na(TOTCIR), stats::weighted.mean(TOTCIR, na.rm = TRUE, w = 1/SAMPLING_PROB), TOTCIR ), TOTSTAFF = ifelse( is.na(TOTSTAFF), stats::weighted.mean(TOTSTAFF, na.rm = TRUE, w = 1/SAMPLING_PROB), TOTSTAFF ) ) # Describe the two-phase survey design twophase_design <- twophase( method = \"full\", data = twophase_sample, # Identify the subset of first-phase elements # which were selected into the second-phase sample subset = ~ SECOND_PHASE_SELECTION, # Describe clusters, probabilities, and population sizes # at each phase of sampling id = list(~ PSU_ID + SSU_ID, ~ 1), probs = list(~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB, NULL), fpc = list(~ PSU_POP_SIZE + SSU_POP_SIZE, NULL) ) # Create replicate weights for the second-phase sample # (meant to reflect variance of the entire two-phase design) twophase_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( \"Phase 1\" = \"Stratified Multistage SRS\", \"Phase 2\" = \"Ultimate Cluster\" ), replicates = 1000, mse = TRUE )"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"option-1-calibrate-to-a-set-of-estimates-and-their-variance-covariance-matrix","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package > Calibrating Second-phase Weights to First-phase Estimates","what":"Option 1: Calibrate to a set of estimates and their variance-covariance matrix","title":"Replication Methods for Two-phase Sampling","text":"With this approach, we use data from the first-phase sample to produce estimated totals, which we then use for calibrating the second-phase sample. To ensure that the calibration of the second-phase sample appropriately reflects the variance of the first-phase estimated totals, we also need to estimate the variance of the first-phase totals. There are many ways to estimate the first-phase variance, but for convenience we’ll use the generalized bootstrap. Once we’ve estimated the first-phase totals, we can use the function calibrate_to_estimate() to calibrate the two-phase survey design object to the first-phase totals. This function is discussed in detail in the vignette titled “Sample-based Calibration”, and the underlying method is described in Fuller (1998). Let’s examine the results of the calibration. First, we’ll check that the calibrated second-phase estimates match the first-phase estimates. Next, we’ll inspect an estimate for a variable that wasn’t used in the calibration.","code":"# Extract a survey design object representing the first phase sample first_phase_design <- twophase_design$phase1$full # Create replicate weights for the first-phase sample first_phase_gen_boot <- as_gen_boot_design( design = first_phase_design, variance_estimator = \"Stratified Multistage SRS\", replicates = 1000 ) # Estimate first-phase totals and their sampling-covariance first_phase_estimates <- svytotal( x = ~ TOTCIR + TOTSTAFF, design = first_phase_gen_boot ) first_phase_totals <- coef(first_phase_estimates) first_phase_vcov <- vcov(first_phase_estimates) print(first_phase_totals) #> TOTCIR TOTSTAFF #> 1648795905.4 152846.6 print(first_phase_vcov) #> TOTCIR TOTSTAFF #> TOTCIR 6.606150e+16 5.853993e+12 #> TOTSTAFF 5.853993e+12 5.747174e+08 #> attr(,\"means\") #> [1] 1648121469.6 152702.4 calibrated_twophase_design <- calibrate_to_estimate( rep_design = twophase_boot_design, # Specify the variables in the data to use for calibration cal_formula = ~ TOTCIR + TOTSTAFF, # Supply the first-phase estimates and their variance estimate = first_phase_totals, vcov_estimate = first_phase_vcov, ) #> Selection of replicate columns whose control totals will be perturbed will be done at random. #> For tips on reproducible selection, see `help('calibrate_to_estimate')` # Display second-phase estimates for calibration variables svytotal( x = ~ TOTCIR + TOTSTAFF, design = calibrated_twophase_design ) #> total SE #> TOTCIR 1648795905 257024311 #> TOTSTAFF 152847 23973 # Display the original first-phase estimates (which are identical!) print(first_phase_estimates) #> total SE #> TOTCIR 1648795905 257024311 #> TOTSTAFF 152847 23973 # Inspect calibrated second-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = calibrated_twophase_design ) #> total SE #> LIBRARIA 57355 12308 # Compare to uncalibrated second-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = twophase_boot_design ) #> total SE #> LIBRARIA 54368 12039 # Compare to first-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = first_phase_gen_boot ) #> total SE #> LIBRARIA 55696 9171.3"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"option-2-calibrate-to-independently-generated-first-phase-replicates","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package > Calibrating Second-phase Weights to First-phase Estimates","what":"Option 2: Calibrate to independently-generated first-phase replicates","title":"Replication Methods for Two-phase Sampling","text":"If the data from the first-phase sample are available and replicate weights can be created for the first-phase sample, then an arguably better method is available to handle the calibration. We can simply produce replicate estimates of the first-phase totals using each first-phase replicate, and then we can calibrate each second-phase replicate to one of the first-phase replicate totals. To do this, we first create replicate weights for the first-phase design, using the generalized bootstrap (or some other replication method). Once we’ve created the first-phase replicates, we can use the function calibrate_to_sample() to calibrate the two-phase survey design object to the replicate estimates created using the first-phase replicate design. This function is discussed in detail in the vignette titled “Sample-based Calibration”. See Section 4.3.1 of this vignette for the underlying theory, which is based on Fuller (1998) and Opsomer and Erciulescu (2021).1 Let’s examine the results of the calibration. First, we’ll check that the calibrated second-phase estimates match the first-phase estimates.
As expected, the variance estimate for the calibrated second-phase estimate matches the variance estimate for the first-phase estimate, allowing for a small tolerance for numeric differences. Next, we’ll inspect an estimate for a variable that wasn’t used in the calibration.","code":"# Extract a survey design object representing the first phase sample first_phase_design <- twophase_design$phase1$full # Create replicate weights for the first-phase sample first_phase_gen_boot <- as_gen_boot_design( design = first_phase_design, variance_estimator = \"Stratified Multistage SRS\", replicates = 1000 ) calibrated_twophase_design <- calibrate_to_sample( primary_rep_design = twophase_boot_design, # Supply the first-phase replicate design control_rep_design = first_phase_gen_boot, # Specify the variables in the data to use for calibration cal_formula = ~ TOTCIR + TOTSTAFF ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')` # Display second-phase estimates for calibration variables calibrated_ests <- svytotal( x = ~ TOTCIR + TOTSTAFF, design = calibrated_twophase_design ) print(calibrated_ests) #> total SE #> TOTCIR 1648795905 242527993 #> TOTSTAFF 152847 22856 # Display the original first-phase estimates (which are identical!) first_phase_ests <- svytotal( x = ~ TOTCIR + TOTSTAFF, design = first_phase_gen_boot ) print(first_phase_ests) #> total SE #> TOTCIR 1648795905 242515035 #> TOTSTAFF 152847 22854 ratio_of_variances <- vcov(calibrated_ests)/vcov(first_phase_ests) ratio_of_variances #> TOTCIR TOTSTAFF #> TOTCIR 1.0001069 0.9998445 #> TOTSTAFF 0.9998445 1.0002008 #> attr(,\"means\") #> TOTCIR TOTSTAFF #> 1648795905.4 152846.6 # Inspect calibrated second-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = calibrated_twophase_design ) #> total SE #> LIBRARIA 57355 11958 # Compare to uncalibrated second-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = twophase_boot_design ) #> total SE #> LIBRARIA 54368 12039 # Compare to first-phase estimate svytotal( x = ~ LIBRARIA, na.rm = TRUE, design = first_phase_gen_boot ) #> total SE #> LIBRARIA 55696 8876.4"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"ratio-estimation","dir":"Articles","previous_headings":"Replicate Variance Estimation with the ‘svrep’ Package","what":"Ratio Estimation","title":"Replication Methods for Two-phase Sampling","text":"A special case of calibration commonly used for two-phase samples is ratio estimation. Whether we use the function calibrate_to_sample() or calibrate_to_estimate(), the syntax is similar. Note that for ratio estimation, the calibration formula includes -1 to ensure that ratio estimation is used instead of regression estimation. This is similar to how, when fitting a regression model in R, we use lm(y ~ -1 + x) to fit a linear model without an intercept. Specifying the parameter variance = 1 indicates that the working model used for calibration is homoskedastic, so the same adjustment factor is applied to every case’s weights. This can be seen if we compare the adjusted weights to the unadjusted weights. Note that the adjustment factor for the weights is simply the ratio of the first-phase estimated total to the second-phase estimated total.","code":"ratio_calib_design <- calibrate_to_sample( primary_rep_design = twophase_boot_design, # Supply the first-phase replicate design control_rep_design = first_phase_gen_boot, # Specify the GREG formula. # For ratio estimation, we add `-1` to the formula # (i.e., we remove the intercept from the working model) # and specify only a single variable cal_formula = ~ -1 + TOTSTAFF, variance = 1 ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')` ratio_adjusted_weights <- weights(ratio_calib_design, type = \"sampling\") unadjusted_weights <- weights(twophase_boot_design, type = \"sampling\") adjustment_factors <- ratio_adjusted_weights/unadjusted_weights head(adjustment_factors) #> 1 3 5 7 10 13 #> 1.090189 1.090189 1.090189 1.090189 1.090189 1.090189 phase1_total <- svytotal( x = ~ TOTSTAFF, first_phase_design ) |> coef() phase2_total <- svytotal( x = ~ TOTSTAFF, twophase_boot_design ) |> coef() phase1_total/phase2_total #> TOTSTAFF #> 1.090189"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"design-based-estimators-for-two-phase-sampling","dir":"Articles","previous_headings":"","what":"Design-based Estimators for Two-phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"In this section, we first describe the double expansion estimator (DEE), which produces unbiased estimates for two-phase samples by using information about the sampling design at both phases. Next, we describe calibration estimators, which adjust the weights of the double-expansion estimator so that sampling variances can be reduced by using information from the first-phase sample. For each estimator, we’ll examine the theoretical sampling variance as well as approaches for estimating the variance using replication methods. The interested reader is encouraged to consult chapter 9.3 of Särndal, Swensson, and Wretman (1992) or chapter 12 of Lohr (2022) for a more detailed discussion of two-phase sampling.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"notation","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling","what":"Notation","title":"Replication Methods for Two-phase Sampling","text":"We use the following notation to denote the samples and sample sizes.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"notation-for-samples-and-sample-size","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > Notation","what":"Notation for Samples and Sample Size","title":"Replication Methods for Two-phase Sampling","text":"\[ \begin{aligned} s_a &: \text{the set of units in the first-phase sample} \\ s_b &: \text{the set of units in the second-phase sample} \\ & \space \space \space \text{Note that }s_b \text{ is a subset of } s_a \\ n_a &: \text{the number of units in }s_a \\ n_b &: \text{the number of units in }s_b \\ \end{aligned} \]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"notation-for-probabilities-and-weights","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > Notation","what":"Notation for Probabilities and Weights","title":"Replication Methods for Two-phase Sampling","text":"We use the following notation to denote the inclusion probability for a unit, at each phase: \[ \begin{aligned} \pi^{(a)}_{i} &: \text{the probability that unit } i \text{ is included in } s_a \\ \pi^{(b|s_a)}_{i} &: \text{the conditional probability that unit } i \text{ is included in } s_b, \\ & \text{ given the realized first-phase sample }s_a \\ \pi_i &: \text{the } \textbf{unconditional} \text{ probability that unit } i \text{ is included in }s_b \\ \end{aligned} \] In practice, the probability \(\pi_i\) is prohibitively difficult to calculate, since it requires us to figure out \(\pi^{(b|s_a)}_{i}\) for every possible first-phase sample \(s_a\), not just the particular \(s_a\) that was actually selected. So instead, we define the useful quantity \(\pi^{*}_i\), which depends only on the particular first-phase sample \(s_a\) that was actually selected.
\[ \pi_i^{*} := \pi^{(b|s_a)}_{i} \times \pi^{(a)}_{i} \] For variance estimation, it’s also necessary to consider the joint inclusion probability (sometimes referred to as a “second-order probability”), which is simply the probability that a pair of units \(i\) and \(j\) are both included in the sample. \[ \begin{aligned} \pi^{(a)}_{ij} &: \text{the probability that units } i \text{ and } j \text{ are both included in } s_a \\ \pi^{(b|s_a)}_{ij} &: \text{the conditional probability that units } i \text{ and } j \text{ are both included in } s_b, \\ & \text{ given the realized first-phase sample }s_a \\ \end{aligned} \] We also define the quantity \(\pi^{*}_{ij}\), similar to \(\pi^{*}_i\). \[ \pi_{ij}^{*} := \pi^{(b|s_a)}_{ij} \times \pi^{(a)}_{ij} \] The probabilities \(\pi_{i}^{*}\) are the values used to define the sampling weights for the survey. \[ \begin{aligned} w^{(a)}_i &:= 1/\pi^{(a)}_i \\ w^{(b|s_a)}_i &:= 1/\pi^{(b|s_a)}_{i} \\ w^{*}_i &:= 1/\pi^{*}_i = w^{(b|s_a)}_i \times w^{(a)}_i \end{aligned} \]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"the-double-expansion-estimator","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling","what":"The Double Expansion Estimator","title":"Replication Methods for Two-phase Sampling","text":"Suppose we wish to estimate the population total \(Y\), using the observed values \(y_i\) from the second-phase sample, \(s_b\). Särndal, Swensson, and Wretman (1992) show that we can produce an unbiased estimate of \(Y\) using the second-phase sample \(s_b\), as follows: \[ \begin{aligned} \hat{Y}^{(b)} &= \sum_{i=1}^{n_{(b)}} w^{*}_i \times y_i \\ &= \sum_{i=1}^{n_{(b)}} w^{(b|s_a)}_i \times w^{(a)}_i \times y_i \end{aligned} \] This estimator is dubbed the “double expansion estimator”, using the sampling jargon in which weighting a sample value \(y_i\) is referred to as “expanding” \(y_i\) from the sample to the population. The name “double expansion” is used because the weight \(w^{*}_i\) can be thought of as first using the weight \(w^{(b|s_a)}_i\) to “expand” the quantity \(y_i\) and then using the weight \(w^{(a)}_i\) to expand the quantity \(w^{(b|s_a)}_i \times y_i\).","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"variance-of-the-double-expansion-estimator","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > The Double Expansion Estimator","what":"Variance of the Double Expansion Estimator","title":"Replication Methods for Two-phase Sampling","text":"The sampling variance of the double expansion estimator is the sum of two different components. \[ \begin{aligned} V\left(\hat{Y}^{(b)}\right) &= V\left(\hat{Y}^{(a)}\right)+E\left(V\left[\hat{Y}^{(b)} \mid s_a \right]\right) \\ \\ \text{where: }& \hat{Y}^{(a)} = \sum_{i=1}^{n_{(a)}} w^{(a)}_i \times y_i \\ \text{and }& V\left[\hat{Y}^{(b)} \mid s_a \right] \text{ is the variance of } \hat{Y}^{(b)} \\ &\text{ across the samples } s_b \\ &\text{ that can be drawn from a given } s_a \end{aligned} \] The first component is the variance of the estimate \(\hat{Y}^{(a)}\) we would obtain if we had used the entire first-phase sample \(s_a\) for estimation, rather than using only the subset \(s_b\). The second component is the additional variance caused by using the subset \(s_b\) instead of \(s_a\). It is equal to the expected value (across samples \(s_a\)) of the conditional variance of \(\hat{Y}^{(b)}\) across samples \(s_b\) (where the conditioning is on a given first-phase sample \(s_a\)).","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"estimating-the-variance-of-the-double-expansion-estimator","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > The Double Expansion Estimator > Variance of the Double Expansion Estimator","what":"Estimating the Variance of the Double Expansion Estimator","title":"Replication Methods for Two-phase Sampling","text":"Both of these variance components can be estimated using only the values \(y_i\) observed in \(s_b\). For the second component, we simply estimate \(V\left[\hat{Y}^{(b)} \mid s_a \right]\), which is an unbiased estimate of its expectation, \(E\left(V\left[\hat{Y}^{(b)} \mid s_a \right]\right)\). Thus, the variance estimate for the double expansion estimator takes the following form: \[ \hat{V}\left(\hat{Y}^{(b)}\right) = \hat{V}\left[\hat{Y}^{(a)} \right] + \hat{V}\left[\hat{Y}^{(b)} \mid s_a \right] \]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"estimating-the-second-phase-variance-component","dir":"Articles","previous_headings":"","what":"Replication Methods for Two-phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"For estimating \(\hat{V}\left[\hat{Y}^{(b)} \mid s_a \right]\), we simply choose a variance estimator for the second-phase design, taking the first-phase sample as given. We assume that this variance estimator can be written as a quadratic form. \[ \begin{aligned} \hat{V}\left[\hat{Y}^{(b)} \mid s_a \right] &= \sum_{i=1}^{n_b} \sum_{j=1}^{n_b} \sigma^{(b)}_{ij} (w^{*}_i y_i) (w^{*}_j y_j) \\ \end{aligned} \] For the Horvitz-Thompson estimator, for instance, we would use \(\sigma^{(b)}_{ij}=\left(1 - \frac{\pi^{b|s_a}_i\pi^{b|s_a}_j}{\pi^{b|s_a}_{ij}}\right)\). This quadratic form can also be written in matrix notation: \[ \begin{aligned} \hat{V}\left[\hat{Y}^{(b)} \mid s_a \right] &= {(W^{*} y)}^{\prime} \Sigma_b {(W^{*} y)} \\ \text{where }& \Sigma_b \text{ is an } n_b \times n_b \text{ symmetric matrix} \\ & \text{ whose entry } ij \text{ equals } \sigma^{(b)}_{ij} \\ \text{and } & W^{*} \text{ is an } n_b \times n_b \text{ diagonal matrix} \\ & \text{ whose entry } ii \text{ equals } w^{*}_i \\ & y \text{ is the } n_b \times 1 \text{ vector of values} \\ & \text{of the variable of interest} \end{aligned} \]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"estimating-the-first-phase-variance-component","dir":"Articles","previous_headings":"","what":"Replication Methods for Two-phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"Estimating the first variance component, \(V\left(\hat{Y}^{(a)}\right)\), is slightly trickier. First, we need to choose a variance estimator appropriate to the first-phase design, which we would use if the \(y_i\) were observed for the entire sample \(s_a\). We’ll denote this variance estimator as \(\tilde{V}\left[\hat{Y}^{(a)}\right]\). \[ \begin{aligned} \tilde{V}\left[\hat{Y}^{(a)} \right] &= \sum_{i=1}^{n_a} \sum_{j=1}^{n_a} \sigma^{(a)}_{ij} (w^{(a)}_i y_i) (w^{(a)}_j y_j) \\ \end{aligned} \] In matrix notation, we can write: \[ \begin{aligned} \tilde{V}\left[\hat{Y}^{(a)} \right] &= {(W^{(a)} y)}^{\prime} (\Sigma_{a} ) {(W^{(a)} y)} \\ \text{where }& \Sigma_{a} \text{ is an } n_a \times n_a \text{ symmetric matrix} \\ & \text{ whose entry } ij \text{ equals } \sigma^{(a)}_{ij} \\ \text{and } & W^{(a)} \text{ is an } n_a \times n_a \text{ diagonal matrix} \\ & \text{ whose entry } ii \text{ equals } w^{(a)}_i \end{aligned} \] However, since we’re working with the subsample \(s_b\) instead of \(s_a\), we need to estimate \(\tilde{V}\left[\hat{Y}^{(a)} \right]\) using only the data from \(s_b\). We can use the second-phase joint inclusion probabilities \(\pi^{(b \mid s_a)}_{ij}\) to produce an unbiased estimate of \(\tilde{V}\left[\hat{Y}^{(a)} \right]\) using the data from \(s_b\). \[ \begin{aligned} \hat{V}\left[\hat{Y}^{(a)} \right] &= \sum_{i=1}^{n_b} \sum_{j=1}^{n_b} \frac{1}{\pi^{(b \mid s_a)}_{ij}} \sigma^{(a)}_{ij} (w^{(a)}_i y_i) (w^{(a)}_j y_j) \\ \end{aligned} \] This can also be written in matrix notation: \[ \begin{aligned} \hat{V}\left[\hat{Y}^{(a)} \right] &= {(W^{(a)} y)}^{\prime} (\Sigma_{a^{\prime}} \circ D_b ) {(W^{(a)} y)} \\ \text{where }& \Sigma_{a^{\prime}} \text{ is an } n_b \times n_b \text{ symmetric matrix} \\ & \text{ whose entry } ij \text{ equals } \sigma^{(a)}_{ij} \\ \text{and } & W^{(a)} \text{ is an } n_b \times n_b \text{ diagonal matrix} \\ & \text{ whose entry } ii \text{ equals } w^{(a)}_i \\ \text{and }& D_b \text{ is an } n_b \times n_b \text{ symmetric matrix} \\ & \text{ whose entry } ij \text{ equals } \frac{1}{\pi^{(b \mid s_a)}_{ij}}\\ \end{aligned} \] As a sidenote, the matrix \(D_b\) is likely the source of any warning messages you’ll see about a two-phase variance estimator not being positive semidefinite.2","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"combining-the-two-estimated-variance-components","dir":"Articles","previous_headings":"","what":"Replication Methods for Two-phase Sampling","title":"Replication Methods for Two-phase Sampling","text":"Putting the two estimated variance components together, we thus obtain the following unbiased variance estimator for the double expansion estimator.
\\[ \\begin{aligned} \\hat{V}\\left(\\hat{Y}^{(b)}\\right) &= \\hat{V}\\left(\\hat{Y}^{()}\\right)+\\hat{V}\\left[\\hat{Y}^{(b)} \\mid s_a \\right] \\\\ &= \\sum_{=1}^{n_b} \\sum_{=1}^{n_b} \\frac{1}{\\pi^{(b \\mid s_a)}_{ij}} \\sigma^{()}_{ij} (w^{()}_i y_i) (w^{()}_i y_j) \\\\ &+ \\sum_{=1}^{n_b} \\sum_{=1}^{n_b} \\sigma^{(b)}_{ij} (w^{*}_i y_i) (w^{*}_j y_j) \\\\ \\end{aligned} \\] matrix notation, can write follows: \\[ \\begin{aligned} \\hat{V}\\left(\\hat{Y}^{(b)}\\right) &= \\hat{V}\\left(\\hat{Y}^{()}\\right)+\\hat{V}\\left[\\hat{Y}^{(b)} \\mid s_a \\right] \\\\ &= {(W^{()} y)}^{\\prime} (\\Sigma_{^{\\prime}} \\circ D_b ) {(W^{()} y)} \\\\ &+ {(W^{*} y)}^{\\prime} \\Sigma_b {(W^{*} y)} \\\\ \\end{aligned} \\] quadratic forms additive \\(W^{*}=W^{()}W^{(b \\mid s_a)}\\), can compactly write estimator follows: \\[ \\begin{aligned} \\hat{V}\\left(\\hat{Y}^{(b)}\\right) &= (W^{*}y)^{\\prime} \\Sigma_{ab} (W^{*}y) \\\\ \\text{} & \\\\ \\Sigma_{ab} &= {W^{(b)}}^{-1} (\\Sigma_{^{\\prime}} \\circ D_b ) {W^{(b)}}^{-1} + \\Sigma_b \\\\ \\text{} & W^{(b)} \\text{ } n_b \\times n_b \\text{ diagonal matrix} \\\\ & \\text{ entry } ii \\text{ equal } w^{(b \\mid s_a)}_i \\end{aligned} \\] ‘svrep’ package, \\(\\Sigma_{ab}\\) can constructed inputs \\(\\Sigma_{^{\\prime}}\\), \\(\\Sigma_b\\), \\((1/D_b)\\), using function make_twophase_quad_form(). matrix notation useful understanding replication methods variance estimation two-phase samples. unbiased replication variance estimator two-phase samples generate set adjustment factors sets replicate weights expectation \\(\\mathbf{1}_{n_b}\\) variance-covariance matrix \\(\\boldsymbol{\\Sigma}_{ab}\\). generalized bootstrap generating draws multivariate normal distribution parameters. specific combinations simple first-phase second-phase designs, jackknife BRR methods developed accomplish goal (see Lohr (2022) examples). 
The generalized bootstrap, however, is much easier to use for the complex designs actually encountered in applied settings, and it also enjoys other advantages.3","code":"set.seed(2022) y <- rnorm(n = 100) # Select first phase sample, SRS without replacement phase_1_sample_indicators <- sampling::srswor(n = 50, N = 100) |> as.logical() phase_1_sample <- y[phase_1_sample_indicators] # Make variance estimator for first-phase variance component Sigma_a <- make_quad_form_matrix( variance_estimator = \"Ultimate Cluster\", cluster_ids = as.matrix(1:50), strata_ids = rep(1, times = 50) |> as.matrix(), strata_pop_sizes = rep(100, times = 50) |> as.matrix() ) # Select second phase sample, SRS without replacement phase_2_sample_indicators <- sampling::srswor(n = 5, N = 50) |> as.logical() phase_2_sample <- phase_1_sample[phase_2_sample_indicators] # Estimate two-phase variance Sigma_a_prime <- Sigma_a[phase_2_sample_indicators, phase_2_sample_indicators] phase_2_joint_probs <- outer(rep(5/50, times = 5), rep(4/49, times = 5)) diag(phase_2_joint_probs) <- rep(5/50, times = 5) Sigma_b <- make_quad_form_matrix( variance_estimator = \"Ultimate Cluster\", cluster_ids = as.matrix(1:5), strata_ids = rep(1, times = 5) |> as.matrix(), strata_pop_sizes = rep(50, times = 5) |> as.matrix() ) sigma_ab <- make_twophase_quad_form( sigma_1 = Sigma_a_prime, sigma_2 = Sigma_b, phase_2_joint_probs = phase_2_joint_probs ) wts <- rep( (50/100)^(-1) * (5/50)^(-1), times = 5 ) W_star <- diag(wts) W_star_y <- W_star %*% phase_2_sample t(W_star_y) %*% sigma_ab %*% (W_star_y) #> 1 x 1 Matrix of class \"dgeMatrix\" #> [,1] #> [1,] 2182.221 # Since both phases are SRS without replacement, # variance estimate for a total should be similar to the following 5 * var(W_star_y) #> [,1] #> [1,] 2297.075"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"calibration-estimators","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling","what":"Calibration Estimators","title":"Replication Methods for Two-phase Sampling","text":"This section describes the calibration estimators (raking, post-stratification, and ratio estimators) commonly used for two-phase designs. For a more detailed treatment of these estimators, see Chapter 11 of Lohr (2022) or Chapter 6 of Särndal, Swensson, and Wretman (1992). In two-phase sampling, it can be helpful to calibrate the weights of the small second-phase sample \(s_b\) so that its estimates for variables \(x_1, \dots, x_p\) measured in both phases match the estimates produced using the larger, more reliable sample \(s_a\). For a variable \(y\) measured only in the second-phase sample, this can lead to more precise estimates when the calibration variables \(x_1, \dots, x_p\) are associated with \(y\). When generalized regression (GREG) is used, the two-phase GREG estimator can be written as follows: \[ \hat{Y}^{(b)}_{\text{GREG}} = \hat{Y}^{(b)} + \left(\hat{\mathbf{X}}^{(a)} - \hat{\mathbf{X}}^{(b)}\right)\hat{\mathbf{B}}^{(b)} \] where \(\hat{\mathbf{X}}^{(a)}\) is the \(p\)-length vector of estimated population totals for the variables \(x_1, \dots, x_p\) estimated using the first-phase data, \(\hat{\mathbf{X}}^{(b)}\) is the vector of estimated population totals using the second-phase data, and \(\hat{\mathbf{B}}^{(b)}\) is estimated using the following: \[ \hat{\mathbf{B}}^{(b)} = \left(\sum_{i=1}^{n_{(b)}} w^{*}_i \frac{1}{\sigma_i^2} \mathbf{x}_i \mathbf{x}_i^T\right)^{-1} \sum_{i=1}^{n_{(b)}} w^{*}_i \frac{1}{\sigma_i^2} \mathbf{x}_i y_i \] where the constants \(\sigma_i\) are chosen based on the specific type of calibration desired.4 The GREG estimator can also be expressed as a weighted estimator based on modified weights \(\tilde{w}^{*}_i := g_i w^{*}_i\), with the modification factors \(g_i\) suitably chosen for the specific method of calibration used (post-stratification, raking, etc.) \[ \begin{aligned} \hat{Y}^{(b)}_{\text{GREG}} &= \sum_{i=1}^{n_{(b)}} \tilde{w}^{*}_i y_i = \sum_{i=1}^{n_{(b)}} (g_i w^{*}_i) y_i \end{aligned} \] The modification factors \(g_i\) (commonly referred to as “g-weights”) can be expressed as: \[ g_i = 1+ \left(\hat{\mathbf{X}}^{(a)} - \hat{\mathbf{X}}^{(b)}\right)^{\prime} \left(\sum_{j=1}^{n_{(b)}} w^{*}_j \frac{1}{\sigma_j^2} \mathbf{x}_j \mathbf{x}_j^T\right)^{-1} \frac{1}{\sigma_i^2} \mathbf{x}_i \] The calibrated second-phase weights \(\tilde{w}^{*}_i = g_i w^{*}_i\) of the GREG estimator ensure that the second-phase estimates for the variables \(x_1, \dots, x_p\) match the first-phase estimates. \[ \sum_{i=1}^{n_{(b)}} \tilde{w}^{*}_ix_i = \sum_{i=1}^{n_{(a)}} w^{(a)}_ix_i \]","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"variance-of-the-calibration-estimator","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > Calibration Estimators","what":"Variance of the Calibration Estimator","title":"Replication Methods for Two-phase Sampling","text":"If we assume that the second-phase calibration estimator \(\hat{Y}_{\mathrm{GREG}}^{(b)}\) is unbiased for the first-phase estimate \(\hat{Y}^{(a)}\) (or at least that this is approximately the case), then we can decompose the calibration estimator’s variance into a first-phase component and a second-phase component as follows: \[ \begin{aligned} V\left(\hat{Y}_{\mathrm{GREG}}^{(b)}\right) &= V\left[E\left(\hat{Y}_{\mathrm{GREG}}^{(b)} \mid \mathbf{Z}\right)\right]+E\left[V\left(\hat{Y}_{\mathrm{GREG}}^{(b)} \mid \mathbf{Z}\right)\right] \\ &= V\left[\hat{Y}^{(a)}\right]+E\left[V\left(\hat{Y}_{\mathrm{GREG}}^{(b)} \mid \mathbf{Z}\right)\right] \end{aligned} \] The first term is the first-phase variance component and the second term is the second-phase variance component. Using the second-phase sample, the variance of the calibration estimator can thus be estimated unbiasedly with the following estimator: \[ V\left(\hat{Y}_{\mathrm{GREG}}^{(b)}\right) =\hat{V}\left[\hat{Y}^{(a)}\right] + \hat{V}\left[\hat{E}^{(b)} \mid s_a\right] \] where \(\hat{E}^{(b)} = \sum_{i=1}^{n_{(b)}} w^{*}_ie_i\) and \(e_i= y_i - \mathbf{x}^{\prime}_i\hat{\mathbf{B}}^{(b)}\) is the “residual” from the GREG model.
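To illustrate the g-weight formula above, here is a minimal R sketch with made-up data (all object names are illustrative, and we take \(\sigma_i^2 = 1\)); it checks that the calibrated weights reproduce the first-phase totals:

```r
# A minimal sketch (illustrative names, made-up data): linear-calibration
# g-weights with sigma_i^2 = 1.
set.seed(1)
n_b <- 20
x <- cbind(1, runif(n_b))           # calibration variables (intercept + one x)
w_star <- rep(10, n_b)              # second-phase weights w*_i

X_hat_b <- colSums(w_star * x)      # second-phase estimated totals of x
X_hat_a <- X_hat_b * c(1.05, 0.98)  # pretend first-phase estimates differ a bit

# g_i = 1 + (X_hat_a - X_hat_b)' (sum_j w*_j x_j x_j')^{-1} x_i
xtx_inv <- solve(t(x) %*% (w_star * x))
g <- 1 + drop(x %*% xtx_inv %*% (X_hat_a - X_hat_b))

# The calibrated weights reproduce the first-phase totals:
colSums((g * w_star) * x)           # matches X_hat_a
X_hat_a
```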
This is the same variance estimator we saw earlier for the uncalibrated estimator, \(\hat{Y}^{(b)}\), except that the second-phase component for the GREG estimator uses \(\hat{E}^{(b)}\) in place of \(\hat{Y}^{(b)}\): \[ \hat{V}\left(\hat{Y}^{(b)}\right) = \hat{V}\left[\hat{Y}^{(a)} \right] + \hat{V}\left[\hat{Y}^{(b)} \mid s_a \right] \] This decomposition is useful for understanding the theoretical variance of the calibration estimator and how it can be estimated in general.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"replication-variance-estimation","dir":"Articles","previous_headings":"Design-based Estimators for Two-phase Sampling > Calibration Estimators","what":"Replication Variance Estimation","title":"Replication Methods for Two-phase Sampling","text":"For variance estimation using replication methods, another (approximate) decomposition proves useful. Fuller (1998) decomposes the two-phase calibration estimator’s variance as follows. \[ V\left(\hat{Y}_{\mathrm{GREG}}^{(b)}\right) \approx E \left[ V \left( \tilde{E}^{(b)} \mid s_a \right) \right] + \mathbf{B}^{\prime} \mathbf{V}\left(\hat{\mathbf{X}}^{(a)}\right)\mathbf{B} \] where \(\mathbf{B}\) is the finite-population version of \(\hat{\mathbf{B}}^{(b)}\) that we would calculate with data from the entire population rather than just the second-phase sample \(s_b\), and \(\tilde{E}^{(b)}=\sum_{i=1}^{n_{(b)}} w^{*}_i\left(y_i - \mathbf{x}_i^{\prime}\mathbf{B}\right)\) is the weighted sum of second-phase residuals based on using \(\mathbf{B}\). This decomposition of the variance suggests the following estimator: \[ \hat{V}\left(\hat{Y}_{\mathrm{GREG}}^{(b)}\right) := \hat{V} \left( \hat{E}^{(b)} \mid s_a \right) + (\hat{\mathbf{B}}^{(b)})^{\prime} \hat{\mathbf{V}}\left(\hat{\mathbf{X}}^{(a)}\right)(\hat{\mathbf{B}}^{(b)}) \] The first component is estimated using the second-phase data and a conditional variance estimator for the second-phase design (taking the selected first-phase sample as given). The second component depends on the first-phase estimates \(\hat{\mathbf{X}}^{(a)}\) as well as the first-phase variance estimate \(\hat{V}(\hat{\mathbf{X}}^{(a)})\) and the values of \(\hat{\mathbf{B}}^{(b)}\) used for the calibration. Fuller (1998) proposed a replication-based version of this estimator. To describe this estimator, first suppose that we have developed two-phase replicate weights appropriate for the double-expansion estimator. \[ \begin{aligned} \hat{V}\left(\hat{Y}^{(b)}\right) &= K_{(b)}\sum_{r=1}^{R_{(b)}} \left( \hat{Y}^{(b)}_{(r)} - \hat{Y}^{(b)} \right)^2 \\ \text{where }& \hat{Y}^{(b)}_{(r)}= \sum_{i=1}^{n_{(b)}}w_{r,i} y_i \\ & \text{is the }r\text{-th} \text{ replicate estimate} \\ & \text{from the second-phase sample } \\ \text{and }& K_{(b)}\text{ is a constant specific to the} \\ &\text{replication method} \end{aligned} \] Now suppose that a \(k\)-length vector of estimated first-phase totals, \(\hat{\mathbf{X}}^{(a)}\), is used for calibrating the second-phase weights. Also suppose that these estimated totals have an estimated variance-covariance matrix, denoted \(\hat{\mathbf{V}}\left(\hat{\mathbf{X}}^{(a)}\right)\), which is a \(k \times k\) matrix. We can decompose this variance-covariance matrix as follows: \[ \hat{\mathbf{V}}\left(\hat{\mathbf{X}}^{(a)}\right) = K_{(b)} \sum_{i=1}^{R_{(b)}} \boldsymbol{\delta}_i^{\prime} \boldsymbol{\delta}_i \] where each \(\boldsymbol{\delta}_i\) is a vector of dimension \(k\), and \(K_{(b)}\) is the constant mentioned earlier. There are multiple ways to make this decomposition. Two particularly useful methods are to either use the eigendecomposition, as suggested by Fuller (1998), or to instead use the replicate estimates from the first-phase survey, as suggested by Opsomer and Erciulescu (2021).
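As a small illustration of the eigendecomposition approach (assuming \(K_{(b)}=1\) for simplicity; this is not the package’s internal code), the vectors \(\boldsymbol{\delta}_r\) can be formed as eigenvectors scaled by the square roots of the eigenvalues, so that their outer products sum back to the variance-covariance matrix:

```r
# A minimal sketch: form perturbation vectors delta_r from the
# eigendecomposition of an estimated variance-covariance matrix V,
# so that sum_r delta_r delta_r' = V (here with K_(b) = 1).
V <- matrix(c(4, 1,
              1, 2), nrow = 2)   # placeholder 2 x 2 vcov of first-phase totals

eig <- eigen(V, symmetric = TRUE)
k <- ncol(V)
# Each column of `delta` is one vector delta_r:
# an eigenvector scaled by the square root of its eigenvalue
delta <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)), k)

# Reconstruction check: the outer products of the delta_r sum back to V
delta %*% t(delta)   # equals V
```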
Fuller demonstrates that we can obtain a reasonable variance estimator for the two-phase calibration estimator by using the \(R_{(b)}\) vectors \(\boldsymbol{\delta}_{r}\) to form \(R_{(b)}\) different control totals, which are used as calibration targets for the \(R_{(b)}\) second-phase replicates. In other words, we simply calibrate the \(r\)-th set of replicate weights to the \(r\)-th control total \(\hat{\mathbf{X}}^{(a)} + \boldsymbol{\delta}_{r}\). Crucially, the order of the vectors \(\boldsymbol{\delta}_{r}\) should be totally random, so that the vectors \(\boldsymbol{\delta}_{r}\) are independent of the sets of replicate weights \(\mathbf{w}_{r}\). Fuller (1998) shows that calibrating the second-phase replicates to random calibration targets as described above results in a variance estimator that is consistent for the variance of the two-phase calibration estimator. This is the underlying estimator described by the R code from earlier in this vignette that used the functions calibrate_to_estimate() and calibrate_to_sample(). The essential difference between the two functions is how they form the vectors \(\boldsymbol{\delta}_r\). The function calibrate_to_estimate() forms the vectors \(\boldsymbol{\delta}_{r}\) using the eigen-decomposition of a specified variance-covariance matrix. In contrast, the function calibrate_to_sample() forms the vectors \(\boldsymbol{\delta}_{r}\) using the replicate estimates from the first-phase sample.","code":"# Print first phase estimates and their variance-covariance print(first_phase_totals) #> TOTCIR TOTSTAFF #> 1648795905.4 152846.6 print(first_phase_vcov) #> TOTCIR TOTSTAFF #> TOTCIR 6.606150e+16 5.853993e+12 #> TOTSTAFF 5.853993e+12 5.747174e+08 #> attr(,\"means\") #> [1] 1648121469.6 152702.4 # Calibrate the two-phase replicate design # to the totals estimated from the first-phase sample calibrated_twophase_design <- calibrate_to_estimate( rep_design = twophase_boot_design, # Specify the variables in the data to use for calibration cal_formula = ~ TOTCIR + TOTSTAFF, # Supply the first-phase estimates and their variance estimate = first_phase_totals, vcov_estimate = first_phase_vcov, ) #> Selection of replicate columns whose control totals will be perturbed will be done at random. #> For tips on reproducible selection, see `help('calibrate_to_estimate')` calibrated_twophase_design <- calibrate_to_sample( primary_rep_design = twophase_boot_design, # Supply the first-phase replicate design control_rep_design = first_phase_gen_boot, # Specify the variables in the data to use for calibration cal_formula = ~ TOTCIR + TOTSTAFF ) #> Matching between primary and control replicates will be done at random. #> For tips on reproducible matching, see `help('calibrate_to_sample')`"},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"ensuring-the-variance-estimator-is-positive-semidefinite","dir":"Articles","previous_headings":"","what":"Ensuring the Variance Estimator is Positive Semidefinite","title":"Replication Methods for Two-phase Sampling","text":"If you’ve made it this far in the vignette, you’re probably now well aware that variance estimators for two-phase designs often do not have the positive semidefinite quadratic form we’d like them to have. Instead, they’re usually close to, but not quite, a positive semidefinite quadratic form, owing to the difficulty of estimating the first-phase variance component.5 One solution for handling a quadratic form matrix \(\Sigma_{ab}\) that is not positive semidefinite is to approximate it by \(\tilde{\Sigma}_{ab} = \Gamma \Lambda^{*} \Gamma^{\prime}\), where \(\Gamma\) is the matrix of eigenvectors of \(\Sigma_{ab}\), \(\Lambda\) is the diagonal matrix of eigenvalues of \(\Sigma_{ab}\), and \(\Lambda^{*}\) is an updated version of \(\Lambda\) whose negative eigenvalues have been replaced with \(0\).
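A minimal sketch of this eigenvalue adjustment (the package’s get_nearest_psd_matrix(), discussed below, implements this idea; the helper here is a simplified stand-in):

```r
# A minimal sketch (not the package's internal code): replace negative
# eigenvalues with zero and rebuild the matrix.
nearest_psd <- function(Sigma) {
  eig <- eigen(Sigma, symmetric = TRUE)
  Lambda_star <- pmax(eig$values, 0)   # clamp negative eigenvalues to 0
  eig$vectors %*% diag(Lambda_star, nrow(Sigma)) %*% t(eig$vectors)
}

# Example with a symmetric matrix that is not positive semidefinite
Sigma_ab <- matrix(c( 2, -3,
                     -3,  2), nrow = 2)
eigen(Sigma_ab)$values               # one negative eigenvalue
eigen(nearest_psd(Sigma_ab))$values  # all non-negative
```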
This solution was suggested by Beaumont and Patak (2012) as a general-purpose approach for implementing the generalized bootstrap when the target variance estimator it’s mimicking isn’t positive semidefinite. Beaumont and Patak (2012) argue that using \\(\\tilde{\\Sigma}_{ab}\\) instead of \\(\\Sigma_{ab}\\) should only result in a small amount of overestimation.","code":""},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"usage-with-the-generalized-bootstrap","dir":"Articles","previous_headings":"Ensuring the Variance Estimator is Positive Semidefinite","what":"Usage with the Generalized Bootstrap","title":"Replication Methods for Two-phase Sampling","text":"When the function as_gen_boot_design() is used to create generalized bootstrap replicate weights, it will warn you if the target variance estimator is not positive semidefinite and will let you know that it will therefore approximate the target variance estimator using the method described above.","code":"gen_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( 'Phase 1' = \"Ultimate Cluster\", 'Phase 2' = \"Ultimate Cluster\" ) ) #> Warning in as_gen_boot_design.twophase2(design = twophase_design, #> variance_estimator = list(`Phase 1` = \"Ultimate Cluster\", : The sample #> quadratic form matrix for this design and variance estimator is not positive #> semidefinite. It will be approximated by the nearest positive semidefinite #> matrix."},{"path":"https://bschneidr.github.io/svrep/articles/two-phase-sampling.html","id":"helper-functions-for-ensuring-an-estimator-is-positive-semidefinite","dir":"Articles","previous_headings":"Ensuring the Variance Estimator is Positive Semidefinite","what":"Helper Functions for Ensuring an Estimator is Positive Semidefinite","title":"Replication Methods for Two-phase Sampling","text":"The ‘svrep’ package has two functions which can be helpful when dealing with matrices that we hope are positive semidefinite but which might not be. The function is_psd_matrix() simply checks whether a matrix is positive semidefinite. It works by estimating the matrix’s eigenvalues and determining whether any of them are negative. If a matrix isn’t positive semidefinite (but is at least symmetric), the function get_nearest_psd_matrix() will implement the approximation method described earlier. Approximating the quadratic form with one that is positive semidefinite leads to a similar (but slightly larger) estimated standard error. For the example two-phase design based on the library survey used earlier, we can see that the approximation results in a standard error estimate only slightly larger than the standard error estimate based on the quadratic form that wasn’t quite positive semidefinite.","code":"twophase_quad_form_matrix <- get_design_quad_form( design = twophase_design, variance_estimator = list( 'Phase 1' = \"Ultimate Cluster\", 'Phase 2' = \"Ultimate Cluster\" ) ) twophase_quad_form_matrix |> is_psd_matrix() #> [1] FALSE approx_quad_form <- get_nearest_psd_matrix(twophase_quad_form_matrix) # Extract weights and a single variable from the second-phase sample ## NOTE: To get second-phase data, ## we use `my_design$phase1$sample$variables`.
## To get first-phase data, ## we use `my_design$phase1$full$variables`. wts <- weights(twophase_design, type = \"sampling\") y <- twophase_design$phase1$sample$variables$TOTSTAFF wtd_y <- as.matrix(wts * y) # Estimate standard errors std_error <- as.numeric( t(wtd_y) %*% twophase_quad_form_matrix %*% wtd_y ) |> sqrt() approx_std_error <- as.numeric( t(wtd_y) %*% approx_quad_form %*% wtd_y ) |> sqrt() print(approx_std_error) #> [1] 20498.68 print(std_error) #> [1] 19765.59 approx_std_error / std_error #> [1] 1.037089"},{"path":[]},{"path":"https://bschneidr.github.io/svrep/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Ben Schneider. Author, maintainer.","code":""},{"path":"https://bschneidr.github.io/svrep/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Schneider, B. (2023). \"svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights\". R package version 0.6.0.","code":"@Misc{, author = {Benjamin Schneider}, year = {2023}, title = {svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights}, note = {R package version 0.6.0}, url = {https://CRAN.R-project.org/package=svrep}, }"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"svrep","dir":"","previous_headings":"","what":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"The svrep package provides methods for creating, updating, and analyzing replicate weights for surveys. Functions from svrep can be used to implement adjustments to replicate designs (e.g. nonresponse weighting class adjustments) and to analyze the effect of such adjustments on the replicate weights and on estimates of interest. It also facilitates the creation of bootstrap and generalized bootstrap replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"You can install the released version of svrep from CRAN with: You can install the development version from GitHub with:","code":"install.packages(\"svrep\") # install.packages(\"devtools\") devtools::install_github(\"bschneidr/svrep\")"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"When using the ‘svrep’ package, please make sure to cite it in any resulting publications. This is appreciated by the package maintainer and helps to incentivize ongoing development, maintenance, and support. Schneider B. (2023). “svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights”. R package version 0.6.0. When using the ‘svrep’ package, please also cite the ‘survey’ package and R itself, since they are essential to the use of ‘svrep’. Call citation('svrep'), citation('survey'), and citation('base') for information on how to generate BibTex entries for citing these packages as well as R.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/index.html","id":"creating-replicate-weights","dir":"","previous_headings":"Example usage","what":"Creating replicate weights","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"Suppose we have data from a survey selected using a complex sampling method such as cluster sampling. To represent the complex survey design, we can create a survey design object using the survey package. To help us estimate sampling variances, we can create bootstrap replicate weights.
The function as_bootstrap_design() creates bootstrap replicate weights appropriate for common complex sampling designs, using bootstrapping methods from the ‘survey’ package as well as additional methods such as the Rao-Wu-Yue-Beaumont method (a generalization of the Rao-Wu bootstrap). For especially complex survey designs (e.g., systematic samples), the generalized survey bootstrap can be used. For relatively simple designs, we can also use the random-groups jackknife.","code":"library(survey) library(svrep) data(api, package = \"survey\") set.seed(2021) # Create a survey design object for a sample # selected using a single-stage cluster sample without replacement dclus1 <- svydesign(data = apiclus1, id = ~dnum, weights = ~pw, fpc = ~fpc) # Create replicate-weights survey design orig_rep_design <- as_bootstrap_design(dclus1, replicates = 500, type = \"Rao-Wu-Yue-Beaumont\") print(orig_rep_design) #> Call: as_bootstrap_design(dclus1, replicates = 500, type = \"Rao-Wu-Yue-Beaumont\") #> Survey bootstrap with 500 replicates. # Load example data for a stratified systematic sample data('library_stsys_sample', package = 'svrep') # First, ensure data are sorted in same order as was used in sampling library_stsys_sample <- library_stsys_sample[ order(library_stsys_sample$SAMPLING_SORT_ORDER), ] # Create a survey design object design_obj <- svydesign( data = library_stsys_sample, strata = ~ SAMPLING_STRATUM, ids = ~ 1, fpc = ~ STRATUM_POP_SIZE ) # Convert to generalized bootstrap replicate design gen_boot_design_sd2 <- as_gen_boot_design( design = design_obj, variance_estimator = \"SD2\", replicates = 500 ) #> For `variance_estimator='SD2', assumes rows of data are sorted in the same order used in sampling. # Create random-group jackknife replicates # for a single-stage survey with many first-stage sampling units rand_grp_jk_design <- apisrs |> svydesign(data = _, ids = ~ 1, weights = ~ pw) |> as_random_group_jackknife_design( replicates = 20 )"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"adjusting-for-non-response-or-unknown-eligibility","dir":"","previous_headings":"Example usage","what":"Adjusting for non-response or unknown eligibility","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"In social surveys, unit nonresponse is extremely common. It is also somewhat common for respondent cases to be classified as “ineligible” for the survey based on their response. In general, sampled cases are typically classified as “respondents”, “nonrespondents”, “ineligible cases”, or “unknown eligibility” cases. It is common practice to adjust weights for nonresponse and for sampled cases whose eligibility for the survey is unknown. The most common form of adjustment is “weight redistribution”: for example, the weights of nonrespondents are reduced to zero, and the weights of respondents are correspondingly increased so that the total weight in the sample is unchanged. In order to account for these adjustments when estimating variances for survey statistics, the adjustments are repeated separately for each set of replicate weights. This process can be easily implemented using the redistribute_weights() function. By supplying column names to the by argument of redistribute_weights(), adjustments are conducted separately in different groups.
This can be used to conduct nonresponse weighting class adjustments.","code":"# Create variable giving response status orig_rep_design$variables[['response_status']] <- sample( x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), prob = c(0.6, 0.2, 0.1, 0.1), size = nrow(orig_rep_design), replace = TRUE ) table(orig_rep_design$variables$response_status) #> #> Ineligible Nonrespondent Respondent Unknown eligibility #> 16 32 119 16 # Adjust weights for unknown eligibility ue_adjusted_design <- redistribute_weights( design = orig_rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\") ) nr_adjusted_design <- redistribute_weights( design = ue_adjusted_design, reduce_if = response_status == \"Nonrespondent\", increase_if = response_status == \"Respondent\", by = c(\"stype\") )"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"comparing-estimates-from-different-sets-of-weights","dir":"","previous_headings":"Example usage","what":"Comparing estimates from different sets of weights","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"In order to assess whether weighting adjustments have an impact on the estimates we care about, we often want to compare the estimates from the different sets of weights. The function svyby_repwts() makes it easy to compare estimates from different sets of weights. We can even test for differences in estimates between two sets of weights and calculate confidence intervals for their difference.","code":"# Estimate overall means (and their standard errors) from each design overall_estimates <- svyby_repwts( rep_designs = list('original' = orig_rep_design, 'nonresponse-adjusted' = nr_adjusted_design), formula = ~ api00, FUN = svymean ) print(overall_estimates, row.names = FALSE) #> Design_Name api00 se #> nonresponse-adjusted 641.2030 25.54368 #> original 644.1694 23.06284 # Estimate domain means (and their standard errors) from each design domain_estimates <- svyby_repwts( rep_designs = list('original' = orig_rep_design, 'nonresponse-adjusted' = nr_adjusted_design), formula = ~ api00, by = ~ stype, FUN = svymean ) print(domain_estimates, row.names = FALSE) #> Design_Name stype api00 se #> nonresponse-adjusted E 649.9188 25.56366 #> original E 648.8681 22.31347 #> nonresponse-adjusted H 603.5390 45.26079 #> original H 618.5714 37.39448 #> nonresponse-adjusted M 616.3260 36.27983 #> original M 631.4400 31.03957 estimates <- svyby_repwts( rep_designs = list('original' = orig_rep_design, 'nonresponse-adjusted' = nr_adjusted_design), formula = ~ api00, FUN = svymean ) vcov(estimates) #> nonresponse-adjusted original #> nonresponse-adjusted 652.4793 585.5253 #> original 585.5253 531.8947 diff_between_ests <- svycontrast(stat = estimates, contrasts = list( \"Original vs. Adjusted\" = c(-1,1) )) print(diff_between_ests) #> contrast SE #> Original vs. Adjusted 2.9664 3.6501 confint(diff_between_ests) #> 2.5 % 97.5 % #> Original vs. Adjusted -4.187705 10.12056"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"diagnosing-potential-issues-with-weights","dir":"","previous_headings":"Example usage","what":"Diagnosing potential issues with weights","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"When adjusting replicate weights, several diagnostics can be used to ensure that the adjustments were carried out correctly and that they will do more good than harm. The function summarize_rep_weights() helps with this by allowing you to quickly summarize the replicate weights. For example, when carrying out nonresponse adjustments, we might want to verify that the weights for nonrespondents have been set to zero in each replicate.
We can use summarize_rep_weights() to compare summary statistics for each replicate, and we can use its by argument to group the summaries by one or more variables. At the end of the adjustment process, we can inspect the number of rows and columns and examine the variability of the weights across replicates.","code":"summarize_rep_weights( rep_design = nr_adjusted_design, type = 'specific', by = \"response_status\" ) |> subset(Rep_Column %in% 1:2) #> response_status Rep_Column N N_NONZERO SUM MEAN CV #> 1 Ineligible 1 16 16 608.1360 38.00850 1.2415437 #> 2 Ineligible 2 16 16 739.2634 46.20397 0.7578107 #> 501 Nonrespondent 1 32 0 0.0000 0.00000 NaN #> 502 Nonrespondent 2 32 0 0.0000 0.00000 NaN #> 1001 Respondent 1 119 119 6236.0577 52.40385 1.0431318 #> 1002 Respondent 2 119 119 6426.4544 54.00382 0.8345243 #> 1501 Unknown eligibility 1 16 0 0.0000 0.00000 NaN #> 1502 Unknown eligibility 2 16 0 0.0000 0.00000 NaN #> MIN MAX #> 1 0.5632079 120.38814 #> 2 0.5422029 77.44622 #> 501 0.0000000 0.00000 #> 502 0.0000000 0.00000 #> 1001 0.6072282 151.10496 #> 1002 0.5971008 102.40567 #> 1501 0.0000000 0.00000 #> 1502 0.0000000 0.00000 nr_adjusted_design |> subset(response_status == \"Respondent\") |> summarize_rep_weights( type = 'overall' ) #> nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 119 500 29 30 5625.555 1257.982 0.5305136 367.826"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"sample-based-calibration","dir":"","previous_headings":"Example usage","what":"Sample-based calibration","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"When we rake or poststratify to estimated control totals rather than to “true” population values, we may need to account for the variance of the estimated control totals to ensure that calibrated estimates appropriately reflect the sampling error of both the primary survey of interest and the survey from which the control totals were estimated. The ‘svrep’ package provides two functions which accomplish this. The function calibrate_to_estimate() requires the user to supply a vector of control totals and its variance-covariance matrix, while the function calibrate_to_sample() requires the user to supply a dataset with replicate weights to use for estimating control totals and their sampling variance. As an example, suppose we have a survey measuring vaccination status of adults in Louisville, Kentucky. For variance estimation, we use 100 bootstrap replicates. To reduce nonresponse bias and coverage error in the survey, we can rake the survey to population totals for demographic groups estimated by the Census Bureau in the American Community Survey (ACS). To estimate the population totals for raking purposes, we can use ACS microdata with replicate weights. We can see that the distribution of race/ethnicity among respondents differs from the distribution of race/ethnicity in the ACS benchmarks. There are two options for calibrating the sample to the control totals from the benchmark survey. With the first approach, we supply the point estimates and their variance-covariance matrix to the function calibrate_to_estimate(). With the second approach, we supply the control survey’s replicate design to calibrate_to_sample().
After calibration, we can see that the estimated vaccination rate has decreased, and the estimated standard error of the estimated vaccination rate has increased.","code":"data(\"lou_vax_survey\") # Load example data lou_vax_survey <- svydesign(ids = ~ 1, weights = ~ SAMPLING_WEIGHT, data = lou_vax_survey) |> as_bootstrap_design(replicates = 100, mse = TRUE) # Adjust for nonresponse lou_vax_survey <- lou_vax_survey |> redistribute_weights( reduce_if = RESPONSE_STATUS == \"Nonrespondent\", increase_if = RESPONSE_STATUS == \"Respondent\" ) |> subset(RESPONSE_STATUS == \"Respondent\") # Load microdata to use for estimating control totals data(\"lou_pums_microdata\") acs_benchmark_survey <- survey::svrepdesign( data = lou_pums_microdata, variables = ~ UNIQUE_ID + AGE + SEX + RACE_ETHNICITY + EDUC_ATTAINMENT, weights = ~ PWGTP, repweights = \"PWGTP\\\\d{1,2}\", type = \"successive-difference\", mse = TRUE ) # Compare demographic estimates from the two data sources estimate_comparisons <- data.frame( 'Vax_Survey' = svymean(x = ~ RACE_ETHNICITY, design = lou_vax_survey) |> coef(), 'ACS_Benchmark' = svymean(x = ~ RACE_ETHNICITY, design = acs_benchmark_survey) |> coef() ) rownames(estimate_comparisons) <- gsub(x = rownames(estimate_comparisons), \"RACE_ETHNICITY\", \"\") print(estimate_comparisons) #> Vax_Survey #> Black or African American alone, not Hispanic or Latino 0.16932271 #> Hispanic or Latino 0.03386454 #> Other Race, not Hispanic or Latino 0.05776892 #> White alone, not Hispanic or Latino 0.73904382 #> ACS_Benchmark #> Black or African American alone, not Hispanic or Latino 0.19949824 #> Hispanic or Latino 0.04525039 #> Other Race, not Hispanic or Latino 0.04630955 #> White alone, not Hispanic or Latino 0.70894182 # Estimate control totals and their variance-covariance matrix control_totals <- svymean(x = ~ RACE_ETHNICITY + EDUC_ATTAINMENT, design = acs_benchmark_survey) point_estimates <- coef(control_totals) vcov_estimates <- vcov(control_totals) # Calibrate the vaccination survey to the estimated control totals vax_survey_raked_to_estimates <- calibrate_to_estimate( rep_design = lou_vax_survey, estimate = point_estimates, vcov_estimate = vcov_estimates, cal_formula = ~ RACE_ETHNICITY + EDUC_ATTAINMENT, calfun = survey::cal.raking ) vax_survey_raked_to_acs_sample <- calibrate_to_sample( primary_rep_design = lou_vax_survey, control_rep_design = acs_benchmark_survey, cal_formula = ~ RACE_ETHNICITY + EDUC_ATTAINMENT, calfun = survey::cal.raking ) # Compare the two sets of estimates svyby_repwts( rep_design = list( 'NR-adjusted' = lou_vax_survey, 'Raked to estimate' = vax_survey_raked_to_estimates, 'Raked to sample' = vax_survey_raked_to_acs_sample ), formula = ~ VAX_STATUS, FUN = svymean, keep.names = FALSE ) #> Design_Name VAX_STATUSUnvaccinated VAX_STATUSVaccinated se1 #> 1 NR-adjusted 0.4621514 0.5378486 0.01863299 #> 2 Raked to estimate 0.4732623 0.5267377 0.01895171 #> 3 Raked to sample 0.4732623 0.5267377 0.01893093 #> se2 #> 1 0.01863299 #> 2 0.01895171 #> 3 0.01893093"},{"path":"https://bschneidr.github.io/svrep/index.html","id":"saving-results-to-a-data-file","dir":"","previous_headings":"Example usage","what":"Saving results to a data file","title":"Tools for Creating, Updating, and Analyzing Survey Replicate Weights","text":"Once we’re satisfied with the weights, we can create a data frame with the analysis variables and columns for the final full-sample weights and replicate weights.
This format is easy to export to data files that can be loaded into R or other software later.","code":"data_frame_with_final_weights <- vax_survey_raked_to_estimates |> as_data_frame_with_weights( full_wgt_name = \"RAKED_WGT\", rep_wgt_prefix = \"RAKED_REP_WGT_\" ) # Preview first 10 column names colnames(data_frame_with_final_weights) |> head(10) #> [1] \"RESPONSE_STATUS\" \"RACE_ETHNICITY\" \"SEX\" \"EDUC_ATTAINMENT\" #> [5] \"VAX_STATUS\" \"SAMPLING_WEIGHT\" \"RAKED_WGT\" \"RAKED_REP_WGT_1\" #> [9] \"RAKED_REP_WGT_2\" \"RAKED_REP_WGT_3\" # Write the data to a CSV file write.csv( x = data_frame_with_final_weights, file = \"survey-data_with-updated-weights.csv\" )"},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":null,"dir":"Reference","previous_headings":"","what":"Add inactive replicates to a survey design object — add_inactive_replicates","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"Adds inactive replicates to a survey design object. An inactive replicate is a replicate that does not contribute to variance estimates but adds to the matrix of replicate weights so that the matrix has the desired number of columns. The new replicates' values are simply equal to the full-sample weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"","code":"add_inactive_replicates(design, n_total, n_to_add, location = \"last\")"},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"design A survey design object, created with either the survey or srvyr packages. n_total The total number of replicates that the result should contain. If the design already contains n_total replicates (or more), then no update will be made. n_to_add The number of additional replicates to add. You can only use the n_total argument or the n_to_add argument, not both. location Either \"first\", \"last\" (the default), or \"random\". Specifies where the columns of new replicates should be located in the matrix of replicate weights. Use \"first\" to place the new replicates first (i.e., in the leftmost part of the matrix), or \"last\" to place the new replicates last (i.e., in the rightmost part of the matrix). Use \"random\" to intersperse the new replicates in random column locations in the matrix; the original replicates will still be in their original order.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"An updated survey design object, where the number of columns of replicate weights has potentially increased. The increase only happens if the user specifies the n_to_add argument instead of n_total, or if the user specifies n_total and the design has fewer than n_total columns of replicate weights already.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"Inactive replicates are also sometimes referred to as \"dead replicates\", for example in Ash (2014). The purpose of adding inactive replicates is to increase the number of columns of replicate weights without impacting variance estimates.
This can be useful, for example, when combining data from a survey across multiple years, where different years use different numbers of replicates but a consistent number of replicates is desired in the combined data file. Suppose the initial replicate design has \\(L\\) replicates, with respective constants \\(c_k\\) for \\(k=1,\\dots,L\\) used to estimate variance with the formula $$v_{R} = \\sum_{k=1}^L c_k\\left(\\hat{T}_y^{(k)}-\\hat{T}_y\\right)^2$$ where \\(\\hat{T}_y\\) is the estimate produced using the full-sample weights and \\(\\hat{T}_y^{(k)}\\) is the estimate from replicate \\(k\\). Inactive replicates are simply replicates that are exactly equal to the full sample: that is, replicate \\(k\\) is called \"inactive\" if its vector of replicate weights exactly equals the full-sample weights. In this case, when using the formula above to estimate variances, these replicates contribute nothing to the variance estimate. If the analyst instead uses a variant of the formula above where the full-sample estimate \\(\\hat{T}_y\\) is replaced by the average replicate estimate (i.e., \\(L^{-1}\\sum_{k=1}^{L}\\hat{T}_y^{(k)}\\)), then variance estimates will differ before vs. after adding inactive replicates. For this reason, we strongly recommend explicitly specifying mse=TRUE when creating a replicate design object with R functions such as svrepdesign(), as_bootstrap_design(), etc. If working with an already existing replicate design, you can update the mse option to TRUE simply by using code such as my_design$mse <- TRUE.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"Ash, S. (2014). \"Using successive difference replication for estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/add_inactive_replicates.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add inactive replicates to a survey design object — add_inactive_replicates","text":"","code":"library(survey) #> Loading required package: grid #> Loading required package: Matrix #> Loading required package: survival #> #> Attaching package: ‘survey’ #> The following object is masked from ‘package:graphics’: #> #> dotchart set.seed(2023) # Create an example survey design object sample_data <- data.frame( PSU = c(1,2,3) ) survey_design <- svydesign( data = sample_data, ids = ~ PSU, weights = ~ 1 ) rep_design <- survey_design |> as.svrepdesign(type = \"JK1\", mse = TRUE) # Inspect replicates before subsampling rep_design |> weights(type = \"analysis\") #> [,1] [,2] [,3] #> [1,] 0.0 1.5 1.5 #> [2,] 1.5 0.0 1.5 #> [3,] 1.5 1.5 0.0 # Inspect replicates after adding inactive replicates rep_design |> add_inactive_replicates(n_total = 5, location = \"first\") |> weights(type = \"analysis\") #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1 1 0.0 1.5 1.5 #> [2,] 1 1 1.5 0.0 1.5 #> [3,] 1 1 1.5 1.5 0.0 rep_design |> add_inactive_replicates(n_to_add = 2, location = \"last\") |> weights(type = \"analysis\") #> [,1] [,2] [,3] [,4] [,5] #> [1,] 0.0 1.5 1.5 1 1 #> [2,] 1.5 0.0 1.5 1 1 #> [3,] 1.5 1.5 0.0 1 1 rep_design |> add_inactive_replicates(n_to_add = 5, location = \"random\") |> weights(type = \"analysis\") #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] #> [1,] 1 1 1 0.0 1 1.5 1 1.5 #> [2,] 1 1 1 1.5 1 0.0 1 1.5 #> [3,] 1 1 1 1.5 1 1.5 1 0.0"}
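As a quick check of the claim that inactive replicates contribute nothing to variance estimates (under mse = TRUE), the following sketch compares standard errors before and after padding; it assumes the rep_design object from the example above is still in scope:

# With mse = TRUE, the inactive replicates add only zero-valued squared
# deviations, so these two standard errors should be identical.
padded_design <- add_inactive_replicates(rep_design, n_total = 10)
SE(svymean(x = ~ PSU, design = rep_design))
SE(svymean(x = ~ PSU, design = padded_design))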
as_bootstrap_design","text":"Converts survey design object replicate design object replicate weights formed using bootstrap method. Supports stratified, cluster samples one stages sampling. stage sampling, either simple random sampling (without replacement) unequal probability sampling (without replacement) may used.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"","code":"as_bootstrap_design( design, type = \"Rao-Wu-Yue-Beaumont\", replicates = 500, compress = TRUE, mse = getOption(\"survey.replicates.mse\"), samp_method_by_stage = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"design survey design object created using 'survey' ('srvyr') package, class 'survey.design' 'svyimputationList'. type type bootstrap use, chosen based applicability sampling method used survey. available types following: \"Rao-Wu-Yue-Beaumont\" (default): bootstrap method Beaumont Émond (2022), generalization Rao-Wu-Yue bootstrap, applicable wide variety designs, including single-stage multistage stratified designs. design may different sampling methods used different stages. stage sampling may potentially PPS (.e., use unequal probabilities), without replacement, may potentially use Poisson sampling. stratum fixed sample size \\(n\\) sampling units, resampling replicate resamples \\((n-1)\\) sampling units replacement. \"Rao-Wu\": basic Rao-Wu \\((n-1)\\) bootstrap method, applicable single-stage designs multistage designs first-stage sampling fractions small (can thus ignored). Accommodates stratified designs. sampling within stratum must simple random sampling without replacement, although first-stage sampling effectively treated sampling without replacement. \"Preston\": Preston's multistage rescaled bootstrap, applicable single-stage designs multistage designs arbitrary sampling fractions. Accommodates stratified designs. sampling within stratum must simple random sampling without replacement. \"Canty-Davison\": Canty-Davison bootstrap, applicable single-stage designs, arbitrary sampling fractions. Accommodates stratified designs. sampling stratum must simple random sampling without replacement. replicates Number bootstrap replicates (large possible, given computer memory/storage limitations). commonly-recommended default 500. compress Use compressed representation replicate weights matrix. reduces computer memory required represent replicate weights impact estimates. mse TRUE, compute variances sums squares around point estimate full-sample weights, FALSE, compute variances sums squares around mean estimate replicate weights. samp_method_by_stage (Optional). default, function automatically determine sampling method used stage. However, argument can used ensure correct sampling method identified stage. Accepts vector length equal number stages sampling. 
Each element should be one of the following: \"SRSWOR\" - Simple random sampling, without replacement; \"SRSWR\" - Simple random sampling, with replacement; \"PPSWOR\" - Unequal probabilities of selection, without replacement; \"PPSWR\" - Unequal probabilities of selection, with replacement; \"Poisson\" - Poisson sampling: each sampling unit is selected into the sample at most once, with potentially different probabilities of inclusion for each sampling unit.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"A replicate design object, with class svyrep.design, which can be used with the usual functions, such as svymean() or svyglm(). Use weights(..., type = 'analysis') to extract the matrix of replicate weights. Use as_data_frame_with_weights() to convert the design object to a data frame with columns for the full-sample and replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"Beaumont, J.-F.; Émond, N. (2022). \"A Bootstrap Variance Estimation Method for Multistage Sampling and Two-Phase Sampling When Poisson Sampling Is Used at the Second Phase.\" Stats, 5: 339–357. https://doi.org/10.3390/stats5020019 Canty, A.J.; Davison, A.C. (1999). \"Resampling-based variance estimation for labour force surveys.\" The Statistician, 48: 379-391. Preston, J. (2009). \"Rescaled bootstrap for stratified multistage sampling.\" Survey Methodology, 35(2): 227-234. Rao, J.N.K.; Wu, C.F.J.; Yue, K. (1992). \"Some recent work on resampling methods for complex surveys.\" Survey Methodology, 18: 209–217.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/as_bootstrap_design.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a bootstrap replicate design — as_bootstrap_design","text":"","code":"library(survey) # Example 1: A multistage sample with two stages of SRSWOR ## Load an example dataset from a multistage sample, with two stages of SRSWOR data(\"mu284\", package = 'survey') multistage_srswor_design <- svydesign(data = mu284, ids = ~ id1 + id2, fpc = ~ n1 + n2) ## Convert the survey design object to a bootstrap design set.seed(2022) bootstrap_rep_design <- as_bootstrap_design(multistage_srswor_design, replicates = 500) ## Compare std. error estimates from bootstrap versus linearization 
data.frame( 'Statistic' = c('total', 'mean', 'median'), 'SE (bootstrap)' = c(SE(svytotal(x = ~ y1, design = bootstrap_rep_design)), SE(svymean(x = ~ y1, design = bootstrap_rep_design)), SE(svyquantile(x = ~ y1, quantile = 0.5, design = bootstrap_rep_design))), 'SE (linearization)' = c(SE(svytotal(x = ~ y1, design = multistage_srswor_design)), SE(svymean(x = ~ y1, design = multistage_srswor_design)), SE(svyquantile(x = ~ y1, quantile = 0.5, design = multistage_srswor_design))), check.names = FALSE ) #> Statistic SE (bootstrap) SE (linearization) #> 1 total 2311.130145 2274.254701 #> 2 mean 2.449955 2.273653 #> 3 median 2.331234 2.521210 # Example 2: A multistage-sample, # first stage selected with unequal probabilities without replacement # second stage selected with simple random sampling without replacement data(\"library_multistage_sample\", package = \"svrep\") multistage_pps <- svydesign(data = library_multistage_sample, ids = ~ PSU_ID + SSU_ID, fpc = ~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB, pps = \"brewer\") bootstrap_rep_design <- as_bootstrap_design( multistage_pps, replicates = 500, samp_method_by_stage = c(\"PPSWOR\", \"SRSWOR\") ) ## Compare std. error estimates from bootstrap versus linearization data.frame( 'Statistic' = c('total', 'mean'), 'SE (bootstrap)' = c( SE(svytotal(x = ~ TOTCIR, na.rm = TRUE, design = bootstrap_rep_design)), SE(svymean(x = ~ TOTCIR, na.rm = TRUE, design = bootstrap_rep_design))), 'SE (linearization)' = c( SE(svytotal(x = ~ TOTCIR, na.rm = TRUE, design = multistage_pps)), SE(svymean(x = ~ TOTCIR, na.rm = TRUE, design = multistage_pps))), check.names = FALSE ) #> Statistic SE (bootstrap) SE (linearization) #> 1 total 266151536.55 255100437.38 #> 2 mean 45762.71 42544.16"},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"Convert a survey design object to a data frame with weights stored as columns","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"","code":"as_data_frame_with_weights( design, full_wgt_name = \"FULL_SAMPLE_WGT\", rep_wgt_prefix = \"REP_WGT_\", vars_to_keep = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"design A survey design object, created with either the survey or srvyr packages. full_wgt_name The column name to use for the full-sample weights. rep_wgt_prefix For replicate design objects, the prefix to use for the column names of the replicate weights. The column names are created by appending the replicate number after this prefix. vars_to_keep By default, all variables in the data will be kept.
To select only a subset of the non-weight variables, you can supply a character vector of variable names to keep.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"A data frame, with new columns containing the weights from the survey design object.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_data_frame_with_weights.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a data frame with weights stored as columns — as_data_frame_with_weights","text":"","code":"data(\"lou_vax_survey\", package = 'svrep') library(survey) # Create a survey design object survey_design <- svydesign(data = lou_vax_survey, weights = ~ SAMPLING_WEIGHT, ids = ~ 1) rep_survey_design <- as.svrepdesign(survey_design, type = \"boot\", replicates = 10) # Adjust the weights for nonresponse nr_adjusted_design <- redistribute_weights( design = rep_survey_design, reduce_if = RESPONSE_STATUS == \"Nonrespondent\", increase_if = RESPONSE_STATUS == \"Respondent\", by = c(\"RACE_ETHNICITY\", \"EDUC_ATTAINMENT\") ) # Save the survey design object as a data frame nr_adjusted_data <- as_data_frame_with_weights( nr_adjusted_design, full_wgt_name = \"NR_ADJUSTED_WGT\", rep_wgt_prefix = \"NR_ADJUSTED_REP_WGT_\" ) head(nr_adjusted_data) #> RESPONSE_STATUS RACE_ETHNICITY #> 1 Nonrespondent White alone, not Hispanic or Latino #> 2 Nonrespondent Black or African American alone, not Hispanic or Latino #> 3 Respondent White alone, not Hispanic or Latino #> 4 Nonrespondent White alone, not Hispanic or Latino #> 5 Nonrespondent White alone, not Hispanic or Latino #> 6 Respondent White alone, not Hispanic or Latino #> SEX EDUC_ATTAINMENT VAX_STATUS SAMPLING_WEIGHT NR_ADJUSTED_WGT #> 1 Female Less than high school 596.702 0.000 #> 2 Female High school or beyond 596.702 0.000 #> 3 Female Less than high school Vaccinated 596.702 1223.239 #> 4 Female Less than high school 596.702 0.000 #> 5 Female High school or beyond 596.702 0.000 #> 6 Female High school or beyond Vaccinated 596.702 1059.068 #> NR_ADJUSTED_REP_WGT_1 NR_ADJUSTED_REP_WGT_2 NR_ADJUSTED_REP_WGT_3 #> 1 0 0.000 0 #> 2 0 0.000 0 #> 3 0 2572.449 0 #> 4 0 0.000 0 #> 5 0 0.000 0 #> 6 0 0.000 0 #> NR_ADJUSTED_REP_WGT_4 NR_ADJUSTED_REP_WGT_5 NR_ADJUSTED_REP_WGT_6 #> 1 0.000 0.000 0.000 #> 2 0.000 0.000 0.000 #> 3 1260.888 0.000 0.000 #> 4 0.000 0.000 0.000 #> 5 0.000 0.000 0.000 #> 6 2058.492 3243.364 1056.924 #> NR_ADJUSTED_REP_WGT_7 NR_ADJUSTED_REP_WGT_8 NR_ADJUSTED_REP_WGT_9 #> 1 0 0.000 0 #> 2 0 0.000 0 #> 3 0 1219.633 0 #> 4 0 0.000 0 #> 5 0 0.000 0 #> 6 0 1024.285 0 #> NR_ADJUSTED_REP_WGT_10 #> 1 0.000 #> 2 0.000 #> 3 1202.584 #> 4 0.000 #> 5 0.000 #> 6 2074.098 # Check the column names of the result colnames(nr_adjusted_data) #> [1] \"RESPONSE_STATUS\" \"RACE_ETHNICITY\" \"SEX\" #> [4] \"EDUC_ATTAINMENT\" \"VAX_STATUS\" \"SAMPLING_WEIGHT\" #> [7] \"NR_ADJUSTED_WGT\" \"NR_ADJUSTED_REP_WGT_1\" \"NR_ADJUSTED_REP_WGT_2\" #> [10] \"NR_ADJUSTED_REP_WGT_3\" \"NR_ADJUSTED_REP_WGT_4\" \"NR_ADJUSTED_REP_WGT_5\" #> [13] \"NR_ADJUSTED_REP_WGT_6\" \"NR_ADJUSTED_REP_WGT_7\" \"NR_ADJUSTED_REP_WGT_8\" #> [16] \"NR_ADJUSTED_REP_WGT_9\" \"NR_ADJUSTED_REP_WGT_10\""}
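As a small illustration of the vars_to_keep argument (reusing the nr_adjusted_design object from the example above), the following sketch keeps only two analysis variables alongside the weight columns:

# Keep only RACE_ETHNICITY and VAX_STATUS, plus the weight columns
slim_data <- as_data_frame_with_weights(
  nr_adjusted_design,
  full_wgt_name  = "NR_ADJUSTED_WGT",
  rep_wgt_prefix = "NR_ADJUSTED_REP_WGT_",
  vars_to_keep   = c("RACE_ETHNICITY", "VAX_STATUS")
)
colnames(slim_data)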
,{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"Converts a survey design object to a replicate design object with replicate weights formed using the generalized replication method of Fay (1989). The generalized replication method forms replicate weights from a textbook variance estimator, provided that the variance estimator can be represented as a quadratic form whose matrix is positive semidefinite (this covers a large class of variance estimators).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"","code":"as_fays_gen_rep_design( design, variance_estimator = NULL, aux_var_names = NULL, max_replicates = 500, balanced = TRUE, psd_option = \"warn\", mse = TRUE, compress = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"design A survey design object created using the 'survey' (or 'srvyr') package, with class 'survey.design' or 'svyimputationList'. variance_estimator The name of the variance estimator whose quadratic form matrix should be created. See variance-estimators for a detailed description of each variance estimator. Options include: \"Yates-Grundy\": The Yates-Grundy variance estimator based on first-order and second-order inclusion probabilities. \"Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on first-order and second-order inclusion probabilities. \"Poisson Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on assuming Poisson sampling, with first-order inclusion probabilities inferred from the sampling probabilities in the survey design object. \"Stratified Multistage SRS\": The usual stratified multistage variance estimator based on estimating the variance of cluster totals within strata at each stage. \"Ultimate Cluster\": The usual variance estimator based on estimating the variance of first-stage cluster totals within first-stage strata. \"Deville-1\": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as \"Deville 1\". \"Deville-2\": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as \"Deville 2\". \"Deville-Tille\": A variance estimator useful for balanced sampling designs, proposed by Deville and Tillé (2005). \"SD1\": The non-circular successive-differences variance estimator described by Ash (2014), sometimes used for variance estimation for systematic sampling. \"SD2\": The circular successive-differences variance estimator described by Ash (2014). This estimator is the basis of the \"successive-differences replication\" estimator commonly used for variance estimation for systematic sampling. aux_var_names (Only used if variance_estimator = \"Deville-Tille\"). A vector of the names of auxiliary variables used in sampling. max_replicates The maximum number of replicates to allow (as large as possible, given computer memory/storage limitations). A commonly-recommended default is 500. If the number of replicates needed for a balanced, fully-efficient estimator is less than max_replicates, then only the number of replicates needed will be created. If more replicates are needed than max_replicates, then the full number of replicates needed will be created, but only a random subsample of them will be retained.
balanced If balanced=TRUE, the replicates will all contribute equally to variance estimates, but the number of replicates needed may slightly increase. psd_option Either \"warn\" (the default) or \"error\". This option specifies what should happen if the target variance estimator's quadratic form matrix is not positive semidefinite. This can occasionally happen, particularly for two-phase designs. If psd_option=\"error\", then an error message will be displayed. If psd_option=\"warn\", then a warning message will be displayed, and the quadratic form matrix will be approximated by the most similar positive semidefinite matrix. This approximation was suggested by Beaumont and Patak (2012), who note that it is conservative in the sense of producing overestimates of variance. Beaumont and Patak (2012) argue that this overestimation is expected to be small in magnitude. See get_nearest_psd_matrix for details of the approximation. mse If TRUE (the default), compute variances from sums of squares around the point estimate from the full-sample weights; if FALSE, compute variances from sums of squares around the mean estimate from the replicate weights. For Fay's generalized replication method, setting mse = FALSE can potentially lead to large underestimates of variance. compress This reduces the computer memory required to represent the replicate weights and has no impact on estimates.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"A replicate design object, with class svyrep.design, which can be used with the usual functions, such as svymean() or svyglm(). Use weights(..., type = 'analysis') to extract the matrix of replicate weights. Use as_data_frame_with_weights() to convert the design object to a data frame with columns for the full-sample and replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"See Fay (1989) for a full description of this replication method, and see the documentation for make_fays_gen_rep_factors for implementation details. See variance-estimators for a description of each variance estimator available for use with this function. Use rescale_reps to eliminate negative adjustment factors (a short sketch of its use follows the examples below).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"two-phase-designs","dir":"Reference","previous_headings":"","what":"Two-Phase Designs","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"For a two-phase design, variance_estimator should be a list of variance estimators' names, with two elements, such as list('Ultimate Cluster', 'Poisson Horvitz-Thompson'). For two-phase designs, only the following estimators may be used for the second phase: \"Ultimate Cluster\", \"Stratified Multistage SRS\", and \"Poisson Horvitz-Thompson\". For statistical details on the handling of two-phase designs, see the documentation for make_twophase_quad_form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"The generalized replication method was first proposed by Fay (1984). Fay (1989) refined the generalized replication method to produce \"balanced\" replicates, in the sense that each replicate contributes equally to variance estimates. 
An advantage of balanced replicates is that one can still obtain a reasonable variance estimate using only a random subset of the replicates. - Ash, S. (2014). \"Using successive difference replication for estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59. - Deville, J.‐C., and Tillé, Y. (2005). \"Variance approximation under balanced sampling.\" Journal of Statistical Planning and Inference, 128, 569–591. - Dippo, Cathryn, Robert Fay, and David Morganstein. 1984. “Computing Variances from Complex Samples with Replicate Weights.” In, 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf. - Fay, Robert. 1984. “Some Properties of Estimates of Variance Based on Replication Methods.” In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_095.pdf. - Fay, Robert. 1989. “Theory and Application of Replicate Weighting for Variance Calculations.” In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1989_033.pdf - Matei, Alina, and Yves Tillé. (2005). “Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size.” Journal of Official Statistics, 21(4):543–70.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/as_fays_gen_rep_design.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a replication design\nusing Fay's generalized replication method — as_fays_gen_rep_design","text":"","code":"if (FALSE) { library(survey) ## Load an example systematic sample ---- data('library_stsys_sample', package = 'svrep') ## First, ensure data are sorted in same order as was used in sampling library_stsys_sample <- library_stsys_sample[ order(library_stsys_sample$SAMPLING_SORT_ORDER), ] ## Create a survey design object design_obj <- svydesign( data = library_stsys_sample, strata = ~ SAMPLING_STRATUM, ids = ~ 1, fpc = ~ STRATUM_POP_SIZE ) ## Convert to generalized replicate design gen_rep_design_sd2 <- as_fays_gen_rep_design( design = design_obj, variance_estimator = \"SD2\", max_replicates = 250, mse = TRUE ) svytotal(x = ~ TOTSTAFF, na.rm = TRUE, design = gen_rep_design_sd2) }"}
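As mentioned in the Statistical Details above, the rescale_reps() helper can be applied to the resulting design if the generated adjustment factors include negative values. A minimal sketch, assuming the gen_rep_design_sd2 object from the example above and using rescale_reps() with its default arguments:

# Rescale the replicate adjustment factors so that none are negative;
# per the package documentation, the rescaling method also adjusts the
# variance scale constant so that variance estimates are unaffected.
rescaled_design <- rescale_reps(gen_rep_design_sd2)
min(weights(rescaled_design, type = "analysis"))  # should be >= 0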
,{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"Converts a survey design object to a replicate design object with replicate weights formed using the generalized bootstrap method. The generalized survey bootstrap is a method for forming bootstrap replicate weights from a textbook variance estimator, provided that the variance estimator can be represented as a quadratic form whose matrix is positive semidefinite (this covers a large class of variance estimators).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"","code":"as_gen_boot_design( design, variance_estimator = NULL, aux_var_names = NULL, replicates = 500, tau = \"auto\", exact_vcov = FALSE, psd_option = \"warn\", mse = getOption(\"survey.replicates.mse\"), compress = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"design A survey design object created using the 'survey' (or 'srvyr') package, with class 'survey.design' or 'svyimputationList'. variance_estimator The name of the variance estimator whose quadratic form matrix should be created. See variance-estimators for a detailed description of each variance estimator. Options include: \"Yates-Grundy\": The Yates-Grundy variance estimator based on first-order and second-order inclusion probabilities. \"Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on first-order and second-order inclusion probabilities. \"Poisson Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on assuming Poisson sampling, with first-order inclusion probabilities inferred from the sampling probabilities in the survey design object. \"Stratified Multistage SRS\": The usual stratified multistage variance estimator based on estimating the variance of cluster totals within strata at each stage. \"Ultimate Cluster\": The usual variance estimator based on estimating the variance of first-stage cluster totals within first-stage strata. \"Deville-1\": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as \"Deville 1\". \"Deville-2\": A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as \"Deville 2\". \"Deville-Tille\": A variance estimator useful for balanced sampling designs, proposed by Deville and Tillé (2005). \"SD1\": The non-circular successive-differences variance estimator described by Ash (2014), sometimes used for variance estimation for systematic sampling. \"SD2\": The circular successive-differences variance estimator described by Ash (2014). This estimator is the basis of the \"successive-differences replication\" estimator commonly used for variance estimation for systematic sampling. aux_var_names (Only used if variance_estimator = \"Deville-Tille\"). A vector of the names of auxiliary variables used in sampling. replicates Number of bootstrap replicates (as large as possible, given computer memory/storage limitations). A commonly-recommended default is 500. tau Either \"auto\", or a single number. This is the rescaling constant used to avoid negative weights through the transformation \\(\\frac{w + \\tau - 1}{\\tau}\\), where \\(w\\) is the original weight and \\(\\tau\\) is the rescaling constant tau. If tau=\"auto\", the rescaling factor is determined automatically as follows: if all of the adjustment factors are nonnegative, then tau is set equal to 1; otherwise, tau is set to the smallest value needed to rescale the adjustment factors such that they are all at least 0.01. exact_vcov If exact_vcov=TRUE, the replicate factors will be generated such that variance estimates for totals exactly match the results of the target variance estimator. This requires that num_replicates exceeds the rank of Sigma. The replicate factors are generated by applying PCA-whitening to a collection of draws from a multivariate Normal distribution, then applying a coloring transformation to the whitened collection of draws (see the sketch below). psd_option Either \"warn\" (the default) or \"error\". This option specifies what should happen if the target variance estimator's quadratic form matrix is not positive semidefinite. This can occasionally happen, particularly for two-phase designs. If psd_option=\"error\", then an error message will be displayed. If psd_option=\"warn\", then a warning message will be displayed, and the quadratic form matrix will be approximated by the most similar positive semidefinite matrix. This approximation was suggested by Beaumont and Patak (2012), who note that it is conservative in the sense of producing overestimates of variance. Beaumont and Patak (2012) argue that this overestimation is expected to be small in magnitude. See get_nearest_psd_matrix for details of the approximation. mse If TRUE, compute variances from sums of squares around the point estimate from the full-sample weights; if FALSE, compute variances from sums of squares around the mean estimate from the replicate weights. compress This reduces the computer memory required to represent the replicate weights and has no impact on estimates.","code":""}
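The following toy sketch illustrates the whitening-and-coloring idea behind exact_vcov (an illustration only; the package's internals may differ, and the small target matrix used here is purely hypothetical). After whitening, the empirical covariance of the draws is exactly the identity; the coloring step then makes it exactly Sigma:

# Toy example with a small, arbitrary positive definite target matrix
n <- 5; B <- 50
Sigma <- 0.5 * diag(n) + 0.5                 # hypothetical target quadratic form
Z <- matrix(rnorm(B * n), nrow = B)          # raw Normal draws, one row per replicate
Z <- scale(Z, center = TRUE, scale = FALSE)  # mean-center each column
W <- Z %*% solve(chol(crossprod(Z) / B))     # whiten: crossprod(W)/B = identity
A <- 1 + W %*% chol(Sigma)                   # color: crossprod(A - 1)/B = Sigma
max(abs(crossprod(A - 1) / B - Sigma))       # ~ 0, i.e. an exact match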
,{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"A replicate design object, with class svyrep.design, which can be used with the usual functions, such as svymean() or svyglm(). Use weights(..., type = 'analysis') to extract the matrix of replicate weights. Use as_data_frame_with_weights() to convert the design object to a data frame with columns for the full-sample and replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"Let \\(v(\\hat{T}_y)\\) be the textbook variance estimator for an estimated population total \\(\\hat{T}_y\\) of some variable \\(y\\). The base weight for case \\(i\\) in our sample is \\(w_i\\), and we let \\(\\breve{y}_i\\) denote the weighted value \\(w_iy_i\\). Suppose we can represent our textbook variance estimator as a quadratic form: \\(v(\\hat{T}_y) = \\breve{y}\\Sigma\\breve{y}^T\\), for some \\(n \\times n\\) matrix \\(\\Sigma\\). The only constraint on \\(\\Sigma\\) is that, for our sample, it must be symmetric and positive semidefinite. The bootstrapping process creates \\(B\\) sets of replicate weights, where the \\(b\\)-th set of replicate weights is a vector of length \\(n\\) denoted \\(\\mathbf{a}^{(b)}\\), whose \\(k\\)-th value is denoted \\(a_k^{(b)}\\). This yields \\(B\\) replicate estimates of the population total, \\(\\hat{T}_y^{*(b)}=\\sum_{k \\in s} a_k^{(b)} \\breve{y}_k\\), for \\(b=1, \\ldots, B\\), which can be used to estimate sampling variance. $$ v_B\\left(\\hat{T}_y\\right)=\\frac{\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2}{B} $$ This bootstrap variance estimator can be written as a quadratic form: $$ v_B\\left(\\hat{T}_y\\right) =\\mathbf{\\breve{y}}^{\\prime}\\Sigma_B \\mathbf{\\breve{y}} $$ where $$ \\boldsymbol{\\Sigma}_B = \\frac{\\sum_{b=1}^B\\left(\\mathbf{a}^{(b)}-\\mathbf{1}_n\\right)\\left(\\mathbf{a}^{(b)}-\\mathbf{1}_n\\right)^{\\prime}}{B} $$ Note that if the vector of adjustment factors \\(\\mathbf{a}^{(b)}\\) has expectation \\(\\mathbf{1}_n\\) and variance-covariance matrix \\(\\boldsymbol{\\Sigma}\\), then the bootstrap expectation is \\(E_{*}\\left( \\boldsymbol{\\Sigma}_B \\right) = \\boldsymbol{\\Sigma}\\). 
Since the bootstrap process takes the sample values \\(\\breve{y}\\) as fixed, the bootstrap expectation of the variance estimator is \\(E_{*} \\left( \\mathbf{\\breve{y}}^{\\prime}\\Sigma_B \\mathbf{\\breve{y}}\\right)= \\mathbf{\\breve{y}}^{\\prime}\\Sigma \\mathbf{\\breve{y}}\\). Thus, we can produce a bootstrap variance estimator with the same expectation as the textbook variance estimator simply by randomly generating \\(\\mathbf{a}^{(b)}\\) from a distribution meeting the following two conditions: Condition 1: \\(\\quad \\mathbf{E}_*(\\mathbf{a})=\\mathbf{1}_n\\) Condition 2: \\(\\quad \\mathbf{E}_*\\left(\\mathbf{a}-\\mathbf{1}_n\\right)\\left(\\mathbf{a}-\\mathbf{1}_n\\right)^{\\prime}=\\mathbf{\\Sigma}\\) While there are multiple ways to generate adjustment factors satisfying these conditions, the simplest general method is to simulate from a multivariate normal distribution: \\(\\mathbf{a} \\sim MVN(\\mathbf{1}_n, \\boldsymbol{\\Sigma})\\). This is the method used by this function.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"details-on-rescaling-to-avoid-negative-adjustment-factors","dir":"Reference","previous_headings":"","what":"Details on Rescaling to Avoid Negative Adjustment Factors","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"Let \\(\\mathbf{A} = \\left[ \\mathbf{a}^{(1)} \\cdots \\mathbf{a}^{(b)} \\cdots \\mathbf{a}^{(B)} \\right]\\) denote the \\((n \\times B)\\) matrix of bootstrap adjustment factors. To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors \\(\\mathbf{A}^S\\) by rescaling each adjustment factor \\(a_k^{(b)}\\) as follows: $$ a_k^{S,(b)} = \\frac{a_k^{(b)} + \\tau - 1}{\\tau} $$ where \\(\\tau \\geq 1\\) and \\(\\tau \\geq 1 - a_k^{(b)}\\) for all \\(k\\) in \\(\\left\\{ 1,\\ldots,n \\right\\}\\) and all \\(b\\) in \\(\\left\\{1, \\ldots, B\\right\\}\\). The value of \\(\\tau\\) can be set based on the realized adjustment factor matrix \\(\\mathbf{A}\\), or by choosing \\(\\tau\\) prior to generating the adjustment factor matrix \\(\\mathbf{A}\\) so that \\(\\tau\\) is likely to be large enough to prevent negative bootstrap weights. If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used in estimating the variance with the bootstrap replicates, which becomes \\(\\frac{\\tau^2}{B}\\) instead of \\(\\frac{1}{B}\\). $$ \\textbf{Prior to rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{1}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2 $$ $$ \\textbf{After rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{\\tau^2}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{S*(b)}-\\hat{T}_y\\right)^2 $$ When sharing a dataset that uses rescaled weights from a generalized survey bootstrap, the documentation for the dataset should instruct the user to use the replication scale factor \\(\\frac{\\tau^2}{B}\\) rather than \\(\\frac{1}{B}\\) when estimating sampling variances.","code":""}
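A minimal sketch of generating and rescaling adjustment factors as just described (an illustration only, not the package's internal code; Sigma is assumed to be a positive semidefinite quadratic form matrix, e.g. obtained from get_design_quad_form()):

# Draw B vectors of adjustment factors a ~ MVN(1_n, Sigma),
# then rescale so that all factors are nonnegative.
n <- nrow(Sigma)
B <- 500
A <- t(MASS::mvrnorm(n = B, mu = rep(1, n), Sigma = Sigma))  # n x B matrix
tau <- max(1, 1 - min(A))  # simplified version of the tau = "auto" rule
A_rescaled <- (A + tau - 1) / tau
min(A_rescaled)  # >= 0
# When estimating variances from these replicates, the scale factor
# becomes tau^2 / B instead of 1 / B.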
{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"two-phase-designs","dir":"Reference","previous_headings":"","what":"Two-Phase Designs","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"For a two-phase design, variance_estimator should be a list of variance estimators' names, with two elements, such as list('Ultimate Cluster', 'Poisson Horvitz-Thompson'). In the case of two-phase designs, the following estimators may be used for the second phase: \"Ultimate Cluster\", \"Stratified Multistage SRS\", \"Poisson Horvitz-Thompson\". For statistical details on the handling of two-phase designs, see the documentation for make_twophase_quad_form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"The generalized survey bootstrap was first proposed by Bertail and Combris (1997). See Beaumont and Patak (2012) for a clear overview of the generalized survey bootstrap. The generalized survey bootstrap represents one strategy for forming replication variance estimators in the general framework proposed by Fay (1984) and Dippo, Fay, and Morganstein (1984). - Ash, S. (2014). \"Using successive difference replication for estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59. - Bellhouse, D.R. (1985). \"Computing Methods for Variance Estimation in Complex Surveys.\" Journal of Official Statistics, Vol. 1, No. 3. - Beaumont, Jean-François, and Zdenek Patak. (2012). \"On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling.\" International Statistical Review, 80(1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x. - Bertail, and Combris. (1997). \"Bootstrap Généralisé d'un Sondage.\" Annales d'Économie et de Statistique, No. 46: 49. https://doi.org/10.2307/20076068. - Deville, J.-C., and Tillé, Y. (2005). \"Variance approximation under balanced sampling.\" Journal of Statistical Planning and Inference, 128, 569–591. - Dippo, Cathryn, Robert Fay, and David Morganstein. (1984). \"Computing Variances from Complex Samples with Replicate Weights.\" In, 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf. - Fay, Robert. (1984). \"Some Properties of Estimates of Variance Based on Replication Methods.\" In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_095.pdf. - Matei, Alina, and Yves Tillé. (2005).
\"Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size.\" Journal of Official Statistics, 21(4): 543–70.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/as_gen_boot_design.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a generalized bootstrap replicate design — as_gen_boot_design","text":"","code":"if (FALSE) { library(survey) # Example 1: Bootstrap based on the Yates-Grundy estimator ---- set.seed(2014) data('election', package = 'survey') ## Create survey design object pps_design_yg <- svydesign( data = election_pps, id = ~1, fpc = ~p, pps = ppsmat(election_jointprob), variance = \"YG\" ) ## Convert to generalized bootstrap replicate design gen_boot_design_yg <- pps_design_yg |> as_gen_boot_design(variance_estimator = \"Yates-Grundy\", replicates = 1000, tau = \"auto\") svytotal(x = ~ Bush + Kerry, design = pps_design_yg) svytotal(x = ~ Bush + Kerry, design = gen_boot_design_yg) # Example 2: Bootstrap based on the successive-difference estimator ---- data('library_stsys_sample', package = 'svrep') ## First, ensure data are sorted in same order as was used in sampling library_stsys_sample <- library_stsys_sample[ order(library_stsys_sample$SAMPLING_SORT_ORDER), ] ## Create a survey design object design_obj <- svydesign( data = library_stsys_sample, strata = ~ SAMPLING_STRATUM, ids = ~ 1, fpc = ~ STRATUM_POP_SIZE ) ## Convert to generalized bootstrap replicate design gen_boot_design_sd2 <- as_gen_boot_design( design = design_obj, variance_estimator = \"SD2\", replicates = 2000 ) ## Estimate sampling variances svytotal(x = ~ TOTSTAFF, na.rm = TRUE, design = gen_boot_design_sd2) svytotal(x = ~ TOTSTAFF, na.rm = TRUE, design = design_obj) # Example 3: Two-phase sample ---- # -- First stage is stratified systematic sampling, # -- second stage is response/nonresponse modeled as Poisson sampling nonresponse_model <- glm( data = library_stsys_sample, family = quasibinomial('logit'), formula = I(RESPONSE_STATUS == \"Survey Respondent\") ~ 1, weights = 1/library_stsys_sample$SAMPLING_PROB ) library_stsys_sample[['RESPONSE_PROPENSITY']] <- predict( nonresponse_model, newdata = library_stsys_sample, type = \"response\" ) twophase_design <- twophase( data = library_stsys_sample, # Identify cases included in second phase sample subset = ~ I(RESPONSE_STATUS == \"Survey Respondent\"), strata = list(~ SAMPLING_STRATUM, NULL), id = list(~ 1, ~ 1), probs = list(NULL, ~ RESPONSE_PROPENSITY), fpc = list(~ STRATUM_POP_SIZE, NULL) ) twophase_boot_design <- as_gen_boot_design( design = twophase_design, variance_estimator = list( \"SD2\", \"Poisson Horvitz-Thompson\" ) ) svytotal(x = ~ LIBRARIA, design = twophase_boot_design) }"},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"Forms a specified number of jackknife replicates based on grouping primary sampling units (PSUs) into random, (approximately) equal-sized groups.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a survey design object to a
random-groups jackknife design — as_random_group_jackknife_design","text":"","code":"as_random_group_jackknife_design( design, replicates = 50, var_strat = NULL, var_strat_frac = NULL, sort_var = NULL, adj_method = \"variance-stratum-psus\", scale_method = \"variance-stratum-psus\", group_var_name = \".random_group\", compress = TRUE, mse = getOption(\"survey.replicates.mse\") )"},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"design A survey design object created using the 'survey' (or 'srvyr') package, with class 'survey.design' or 'svyimputationList'. replicates The number of replicates to create for each variance stratum. The total number of replicates created is the number of variance strata times replicates. Every design stratum must have at least as many primary sampling units (PSUs) as replicates. var_strat Specifies the name of a variable in the data that defines the variance strata to use for the grouped jackknife. If var_strat = NULL, then there is effectively only one variance stratum. var_strat_frac Specifies the sampling fraction to use for finite population corrections in each value of var_strat. Can use either a single number or a variable in the data corresponding to var_strat. sort_var (Optional) Specifies the name of a variable in the data which should be used to sort the data before assigning random groups. If a variable is specified for var_strat, then the sorting will happen within values of that variable. adj_method Specifies how to calculate the replicate weight adjustment factor. Available options for adj_method include: \"variance-stratum-psus\" (the default) The replicate weight adjustment for a unit is based on the number of PSUs in its variance stratum. \"variance-units\" The replicate weight adjustment for a unit is based on the number of variance units in its variance stratum. See the section \"Adjustment and Scale Methods\" for details. scale_method Specifies how to calculate the scale factor for each replicate. Available options for scale_method include: \"variance-stratum-psus\" The scale factor for a variance unit is based on its number of PSUs compared to the number of PSUs in its variance stratum. \"variance-units\" The scale factor for a variance unit is based on the number of variance units in its variance stratum. See the section \"Adjustment and Scale Methods\" for details. group_var_name (Optional) The name of a new variable created to save identifiers for which random group each PSU was grouped into for the purpose of forming replicates. Specify group_var_name = NULL to avoid creating the variable in the data. compress Use a compressed representation of the replicate weights matrix. This reduces the computer memory required to represent the replicate weights and has no impact on estimates. mse If TRUE, compute variances from sums of squares around the point estimate from the full-sample weights; if FALSE, compute variances from sums of squares around the mean estimate from the replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"A replicate design object, with class svyrep.design, which can be used with the usual functions, such as svymean() or svyglm(). Use weights(..., type = 'analysis') to extract the matrix of replicate weights.
Use as_data_frame_with_weights() to convert the design object to a data frame with columns for the full-sample and replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"formation-of-random-groups","dir":"Reference","previous_headings":"","what":"Formation of Random Groups","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"Within each value of VAR_STRAT, the data are sorted by first-stage sampling strata, and then the PSUs in each stratum are randomly arranged. Groups are formed by serially placing PSUs into each group. The first PSU in the VAR_STRAT is placed into the first group, the second PSU into the second group, and so on. Once a PSU has been assigned to the last group, the process begins again by assigning the next PSU to the first group, the PSU after that to the second group, and so on. The random group that each observation is assigned to can be saved as a variable in the data, using the function argument group_var_name.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"adjustment-and-scale-methods","dir":"Reference","previous_headings":"","what":"Adjustment and Scale Methods","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"A jackknife replication variance estimator based on \\(R\\) replicates takes the following form: $$ v(\\hat{\\theta}) = \\sum_{r=1}^{R} (1 - f_r) \\times c_r \\times \\left(\\hat{\\theta}_r - \\hat{\\theta}\\right)^2 $$ where \\(r\\) indexes one of the \\(R\\) sets of replicate weights, \\(c_r\\) is a corresponding scale factor for the \\(r\\)-th replicate, and \\(1 - f_r\\) is an optional finite population correction factor that can potentially differ across variance strata. To form the replicate weights, the PSUs are divided into \\(\\tilde{H}\\) variance strata, where the \\(\\tilde{h}\\)-th variance stratum contains \\(G_{\\tilde{h}}\\) random groups. The number of replicates \\(R\\) equals the total number of random groups across all variance strata: \\(R = \\sum_{\\tilde{h}}^{\\tilde{H}} G_{\\tilde{h}}\\). In other words, each replicate corresponds to one of the random groups from one of the variance strata. The weights for replicate \\(r\\), corresponding to random group \\(g\\) within variance stratum \\(\\tilde{h}\\), are defined as follows. If case \\(i\\) is not in variance stratum \\(\\tilde{h}\\), then \\(w_{i}^{(r)} = w_i\\). If case \\(i\\) is in variance stratum \\(\\tilde{h}\\) but not in random group \\(g\\), then \\(w_{i}^{(r)} = a_{\\tilde{h}g} w_i\\). Otherwise, if case \\(i\\) is in random group \\(g\\) of variance stratum \\(\\tilde{h}\\), then \\(w_{i}^{(r)} = 0\\). The R function argument adj_method determines how the adjustment factor \\(a_{\\tilde{h} g}\\) is calculated. When adj_method = \"variance-units\", then \\(a_{\\tilde{h} g}\\) is calculated based on \\(G_{\\tilde{h}}\\), the number of random groups in variance stratum \\(\\tilde{h}\\). When adj_method = \"variance-stratum-psus\", then \\(a_{\\tilde{h} g}\\) is calculated based on \\(n_{\\tilde{h}g}\\), the number of PSUs in random group \\(g\\) in variance stratum \\(\\tilde{h}\\), as well as \\(n_{\\tilde{h}}\\), the total number of PSUs in variance stratum \\(\\tilde{h}\\). When adj_method = \"variance-units\", then: $$a_{\\tilde{h}g} = \\frac{G_{\\tilde{h}}}{G_{\\tilde{h}} - 1}$$ When adj_method = \"variance-stratum-psus\", then: $$a_{\\tilde{h}g} = \\frac{n_{\\tilde{h}}}{n_{\\tilde{h}} - n_{\\tilde{h}g}}$$ The scale factor \\(c_r\\) for replicate \\(r\\), corresponding to random group \\(g\\) within variance stratum \\(\\tilde{h}\\), is calculated according to the function argument scale_method.
scale_method = \"variance-units\", : $$c_r = \\frac{G_{\\tilde{h}} - 1}{G_{\\tilde{h}}}$$ scale_method = \"variance-stratum-psus\", : $$c_r = \\frac{n_{\\tilde{h}} - n_{\\tilde{h}g}}{n_{\\tilde{h}}}$$ sampling fraction \\(f_r\\) used finite population correction \\(1 - f_r\\) default assumed equal 0. However, user can supply sampling fraction variance stratum using argument var_strat_frac. variance units variance stratum differing numbers PSUs, combination adj_method = \"variance-stratum-psus\" scale_method = \"variance-units\" recommended Valliant, Brick, Dever (2008), corresponding method \"GJ2\". random-groups jackknife method often referred \"DAGJK\" corresponds options var_strat = NULL, adj_method = \"variance-units\", scale_method = \"variance-units\". DAGJK method yield upwardly-biased variance estimates totals total number PSUs multiple total number replicates (Valliant, Brick, Dever 2008).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"See Section 15.5 Valliant, Dever, Kreuter (2018) introduction grouped jackknife guidelines creating random groups. - Valliant, R., Dever, J., Kreuter, F. (2018). \"Practical Tools Designing Weighting Survey Samples, 2nd edition.\" New York: Springer. See Valliant, Brick, Dever (2008) statistical details related adj_method scale_method arguments. - Valliant, Richard, Michael Brick, Jill Dever. 2008. \"Weight Adjustments Grouped Jackknife Variance Estimator.\" Journal Official Statistics. 24: 469–88. See Chapter 4 Wolter (2007) additional details jackknife, including method based random groups. - Wolter, Kirk. 2007. \"Introduction Variance Estimation.\" New York, NY: Springer New York. https://doi.org/10.1007/978-0-387-35099-8.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/as_random_group_jackknife_design.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a survey design object to a random-groups jackknife design — as_random_group_jackknife_design","text":"","code":"library(survey) # Load example data data('api', package = 'survey') api_strat_design <- svydesign( data = apistrat, id = ~ 1, strata = ~stype, weights = ~pw ) # Create a random-groups jackknife design jk_design <- as_random_group_jackknife_design( api_strat_design, replicates = 15 ) print(jk_design) #> Call: as_random_group_jackknife_design(api_strat_design, replicates = 15) #> with 15 replicates."},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":null,"dir":"Reference","previous_headings":"","what":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"Calibrate weights primary survey match estimated totals control survey, using adjustments replicate weights account variance estimated control totals. adjustments replicate weights conducted using method proposed Fuller (1998). 
{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":null,"dir":"Reference","previous_headings":"","what":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"Calibrate the weights of a primary survey to match estimated totals from a control survey, using adjustments to the replicate weights to account for the variance of the estimated control totals. The adjustments to the replicate weights are conducted using the method proposed by Fuller (1998). This method can be used to implement general calibration as well as post-stratification or raking specifically (see the details for the calfun parameter).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"","code":"calibrate_to_estimate( rep_design, estimate, vcov_estimate, cal_formula, calfun = survey::cal.linear, bounds = list(lower = -Inf, upper = Inf), verbose = FALSE, maxit = 50, epsilon = 1e-07, variance = NULL, col_selection = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"rep_design A replicate design object for the primary survey, created with either the survey or srvyr packages. estimate A vector of estimated control totals. The names of the entries must match the names from calling svytotal(x = cal_formula, design = rep_design). vcov_estimate A variance-covariance matrix for the estimated control totals. The column names and row names must match the names of estimate. cal_formula A formula listing the variables to use for calibration. All of these variables must be included in rep_design. calfun A calibration function from the survey package, such as cal.linear, cal.raking, or cal.logit. Use cal.linear for ordinary post-stratification, and cal.raking for raking. See calibrate for additional details. bounds Parameter passed to grake for calibration. See calibrate for details. verbose Parameter passed to grake for calibration. See calibrate for details. maxit Parameter passed to grake for calibration. See calibrate for details. epsilon Parameter passed to grake for calibration. After calibration, the absolute difference between each calibration target and the calibrated estimate will be no larger than epsilon times (1 plus the absolute value of the target). See calibrate for details. variance Parameter passed to grake for calibration. See calibrate for details. col_selection Optional parameter to determine which replicate columns will have their control totals perturbed. If supplied, col_selection must be an integer vector with length equal to the length of estimate.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"A replicate design object, with the full-sample weights calibrated to the totals from estimate, and the replicate weights adjusted to account for the variance of the control totals.
The element col_selection indicates, for each replicate column of the calibrated primary survey, which column of replicate weights it was matched to from the control survey.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"With the Fuller method, each of k randomly-selected replicate columns from the primary survey is calibrated to control totals formed by perturbing the k-dimensional vector of estimated control totals, using a spectral decomposition of the variance-covariance matrix of the estimated control totals. The other replicate columns are simply calibrated to the unperturbed control totals. Because the set of replicate columns whose control totals are perturbed should be random, there are multiple ways to ensure that the matching is reproducible. The user can either call set.seed before using the function, or supply a vector of randomly-selected column indices to the argument col_selection.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"syntax-for-common-types-of-calibration","dir":"Reference","previous_headings":"","what":"Syntax for Common Types of Calibration","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"For ratio estimation with an auxiliary variable X, use the following options: - cal_formula = ~ -1 + X - variance = 1, - cal.fun = survey::cal.linear For post-stratification, use the following option: - cal.fun = survey::cal.linear For raking, use the following option: - cal.fun = survey::cal.raking","code":""},
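The spectral-decomposition step of the Fuller method can be sketched as follows. This is only a schematic outline with hypothetical totals and a hypothetical variance-covariance matrix; the exact scaling that calibrate_to_estimate() applies to each perturbation also depends on the design's replication scale factors, which is omitted here.

# Schematic sketch of perturbing control totals via a spectral decomposition
estimate <- c(stypeE = 4421, stypeH = 755, enroll = 3811472)  # hypothetical totals
vcov_est <- diag(c(120, 40, 90000)^2)                         # hypothetical vcov
eig <- eigen(vcov_est, symmetric = TRUE)
# One perturbation vector per eigenvalue: sqrt(lambda_j) times eigenvector j
perturbations <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)))
# Each of the k randomly-selected replicate columns is calibrated to a
# perturbed set of totals; all other columns use `estimate` unchanged.
perturbed_totals_col1 <- estimate + perturbations[, 1]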
\"Replication variance estimation sample-based calibration.\" Survey Methodology, 47: 265-277.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_estimate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_estimate","text":"","code":"if (FALSE) { # Load example data for primary survey ---- suppressPackageStartupMessages(library(survey)) data(api) primary_survey <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) |> as.svrepdesign(type = \"JK1\") # Load example data for control survey ---- control_survey <- svydesign(id = ~ 1, fpc = ~fpc, data = apisrs) |> as.svrepdesign(type = \"JK1\") # Estimate control totals ---- estimated_controls <- svytotal(x = ~ stype + enroll, design = control_survey) control_point_estimates <- coef(estimated_controls) control_vcov_estimate <- vcov(estimated_controls) # Calibrate totals for one categorical variable and one numeric ---- calibrated_rep_design <- calibrate_to_estimate( rep_design = primary_survey, estimate = control_point_estimates, vcov_estimate = control_vcov_estimate, cal_formula = ~ stype + enroll ) # Inspect estimates before and after calibration ---- ##_ For the calibration variables, estimates and standard errors ##_ from calibrated design will match those of the control survey svytotal(x = ~ stype + enroll, design = primary_survey) svytotal(x = ~ stype + enroll, design = control_survey) svytotal(x = ~ stype + enroll, design = calibrated_rep_design) ##_ Estimates from other variables will be changed as well svymean(x = ~ api00 + api99, design = primary_survey) svymean(x = ~ api00 + api99, design = control_survey) svymean(x = ~ api00 + api99, design = calibrated_rep_design) # Inspect weights before and after calibration ---- summarize_rep_weights(primary_survey, type = 'overall') summarize_rep_weights(calibrated_rep_design, type = 'overall') # For reproducibility, specify which columns are randomly selected for Fuller method ---- column_selection <- calibrated_rep_design$col_selection print(column_selection) calibrated_rep_design <- calibrate_to_estimate( rep_design = primary_survey, estimate = control_point_estimates, vcov_estimate = control_vcov_estimate, cal_formula = ~ stype + enroll, col_selection = column_selection ) }"},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":null,"dir":"Reference","previous_headings":"","what":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"Calibrate weights primary survey match estimated totals control survey, using adjustments replicate weights account variance estimated control totals. adjustments replicate weights conducted using method proposed Opsomer Erciulescu (2021). 
This method can be used to implement general calibration as well as post-stratification or raking specifically (see the details for the calfun parameter).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"","code":"calibrate_to_sample( primary_rep_design, control_rep_design, cal_formula, calfun = survey::cal.linear, bounds = list(lower = -Inf, upper = Inf), verbose = FALSE, maxit = 50, epsilon = 1e-07, variance = NULL, control_col_matches = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"primary_rep_design A replicate design object for the primary survey, created with either the survey or srvyr packages. control_rep_design A replicate design object for the control survey. cal_formula A formula listing the variables to use for calibration. All of these variables must be included in both primary_rep_design and control_rep_design. calfun A calibration function from the survey package, such as cal.linear, cal.raking, or cal.logit. Use cal.linear for ordinary post-stratification, and cal.raking for raking. See calibrate for additional details. bounds Parameter passed to grake for calibration. See calibrate for details. verbose Parameter passed to grake for calibration. See calibrate for details. maxit Parameter passed to grake for calibration. See calibrate for details. epsilon Parameter passed to grake for calibration. After calibration, the absolute difference between each calibration target and the calibrated estimate will be no larger than epsilon times (1 plus the absolute value of the target). See calibrate for details. variance Parameter passed to grake for calibration. See calibrate for details. control_col_matches Optional parameter to specify which control survey replicate is matched to each primary survey replicate. If the \\(i\\)-th entry of control_col_matches equals \\(k\\), then replicate \\(i\\) of primary_rep_design is matched to replicate \\(k\\) of control_rep_design. Entries of NA denote a primary survey replicate not matched to any control survey replicate. If this parameter is not used, the matching is done at random.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"A replicate design object, with the full-sample weights calibrated to the totals from control_rep_design, and the replicate weights adjusted to account for the variance of the control totals. If primary_rep_design had fewer columns of replicate weights than control_rep_design, then the number of replicate columns and the length of rscales will be increased by a multiple k, and the scale will be updated by dividing by k. The element control_column_matches indicates, for each replicate column of the calibrated primary survey, which column of replicate weights it was matched to from the control survey. Columns which were not matched to a control survey replicate column are indicated by NA.
The element degf will be set to match that of the primary survey, to ensure that the degrees of freedom are not erroneously inflated by potential increases in the number of columns of replicate weights.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"With the Opsomer-Erciulescu method, each column of replicate weights from the control survey is randomly matched to a column of replicate weights from the primary survey, and then the column from the primary survey is calibrated to control totals estimated by perturbing the control sample's full-sample estimates using the estimates from the matched column of replicate weights from the control survey. If there are fewer columns of replicate weights in the control survey than in the primary survey, then not all primary replicate columns will be matched to a replicate column from the control survey. If there are more columns of replicate weights in the control survey than in the primary survey, then the columns of replicate weights in the primary survey will be duplicated k times, where k is the smallest positive integer such that the resulting number of columns of replicate weights for the primary survey is greater than or equal to the number of columns of replicate weights in the control survey. Because the replicate columns of the control survey are matched at random to the primary survey replicate columns, there are multiple ways to ensure that the matching is reproducible. The user can either call set.seed before using the function, or supply a mapping to the argument control_col_matches.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"syntax-for-common-types-of-calibration","dir":"Reference","previous_headings":"","what":"Syntax for Common Types of Calibration","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"For ratio estimation with an auxiliary variable X, use the following options: - cal_formula = ~ -1 + X - variance = 1, - cal.fun = survey::cal.linear For post-stratification, use the following option: - cal.fun = survey::cal.linear For raking, use the following option: - cal.fun = survey::cal.raking","code":""},
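The column-matching logic described above can be sketched with hypothetical replicate counts. Here the control survey has more replicates than the primary survey, so the primary columns are duplicated k times before a random matching is drawn; this mirrors the description of control_col_matches but is not the package's internal code.

B_primary <- 15  # replicate columns in the primary survey (hypothetical)
B_control <- 40  # replicate columns in the control survey (hypothetical)
# Smallest positive integer k with k * B_primary >= B_control
k <- ceiling(B_control / B_primary)    # k = 3
n_cols <- k * B_primary                # 45 primary columns after duplication
# (The design's overall scale is divided by k to compensate for duplication.)
set.seed(1)
control_col_matches <- rep(NA_integer_, n_cols)
control_col_matches[sample(n_cols, B_control)] <- seq_len(B_control)
# NA entries are primary replicates calibrated to the unperturbed totals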
\"Replication variance estimation sample-based calibration.\" Survey Methodology, 47: 265-277.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/calibrate_to_sample.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calibrate weights from a primary survey to estimated totals from a control survey,\nwith replicate-weight adjustments that account for variance of the control totals — calibrate_to_sample","text":"","code":"if (FALSE) { # Load example data for primary survey ---- suppressPackageStartupMessages(library(survey)) data(api) primary_survey <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) |> as.svrepdesign(type = \"JK1\") # Load example data for control survey ---- control_survey <- svydesign(id = ~ 1, fpc = ~fpc, data = apisrs) |> as.svrepdesign(type = \"JK1\") # Calibrate totals for one categorical variable and one numeric ---- calibrated_rep_design <- calibrate_to_sample( primary_rep_design = primary_survey, control_rep_design = control_survey, cal_formula = ~ stype + enroll, ) # Inspect estimates before and after calibration ---- ##_ For the calibration variables, estimates and standard errors ##_ from calibrated design will match those of the control survey svytotal(x = ~ stype + enroll, design = primary_survey) svytotal(x = ~ stype + enroll, design = control_survey) svytotal(x = ~ stype + enroll, design = calibrated_rep_design) ##_ Estimates from other variables will be changed as well svymean(x = ~ api00 + api99, design = primary_survey) svymean(x = ~ api00 + api99, design = control_survey) svymean(x = ~ api00 + api99, design = calibrated_rep_design) # Inspect weights before and after calibration ---- summarize_rep_weights(primary_survey, type = 'overall') summarize_rep_weights(calibrated_rep_design, type = 'overall') # For reproducibility, specify how to match replicates between surveys ---- column_matching <- calibrated_rep_design$control_col_matches print(column_matching) calibrated_rep_design <- calibrate_to_sample( primary_rep_design = primary_survey, control_rep_design = control_survey, cal_formula = ~ stype + enroll, control_col_matches = column_matching ) }"},{"path":"https://bschneidr.github.io/svrep/reference/compress_design.html","id":null,"dir":"Reference","previous_headings":"","what":"Produce a compressed representation of a survey design object — compress_design","title":"Produce a compressed representation of a survey design object — compress_design","text":"Produce compressed representation survey design object","code":""},{"path":"https://bschneidr.github.io/svrep/reference/compress_design.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Produce a compressed representation of a survey design object — compress_design","text":"","code":"compress_design(design, vars_to_keep = NULL)"},{"path":"https://bschneidr.github.io/svrep/reference/compress_design.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Produce a compressed representation of a survey design object — compress_design","text":"design survey design object vars_to_keep (Optional) character vector variables design keep compressed design. default, none variables retained.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/compress_design.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Produce a compressed representation of a survey design object — compress_design","text":"list two elements. 
{"path":"https://bschneidr.github.io/svrep/reference/distribute_matrix_across_clusters.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","title":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","text":"Turns a cluster-level matrix into an element-level matrix by suitably duplicating the rows or columns of the matrix.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/distribute_matrix_across_clusters.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","text":"","code":"distribute_matrix_across_clusters( cluster_level_matrix, cluster_ids, rows = TRUE, cols = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/distribute_matrix_across_clusters.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","text":"cluster_level_matrix A square matrix, whose number of rows/columns matches the number of clusters. cluster_ids A vector of cluster identifiers. If rows=TRUE, the number of unique elements of cluster_ids must match the number of rows of cluster_level_matrix. If cols=TRUE, the number of unique elements of cluster_ids must match the number of columns of cluster_level_matrix. rows Whether to duplicate the rows of cluster_level_matrix for elements in the same cluster. cols Whether to duplicate the columns of cluster_level_matrix for elements in the same cluster.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/distribute_matrix_across_clusters.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Helper function to turn a cluster-level matrix into an element-level matrix\nby duplicating rows or columns of the matrix — distribute_matrix_across_clusters","text":"The input cluster_level_matrix, with its rows/columns duplicated so that the number of rows (if rows=TRUE) or columns (if cols=TRUE) equals the length of cluster_ids.","code":""},
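The duplication described in the Value above can be expressed with simple matrix indexing. A minimal sketch under the assumption that entry (i, j) of the result equals the cluster-level entry for the clusters containing elements i and j (toy inputs, not the package's internal code):

cluster_level_matrix <- matrix(c(1, 2, 2, 4), nrow = 2)  # 2 clusters
cluster_ids <- c("A", "A", "B")                          # 3 elements in 2 clusters
idx <- match(cluster_ids, unique(cluster_ids))           # element-to-cluster index
element_level_matrix <- cluster_level_matrix[idx, idx]   # duplicate rows and columns
dim(element_level_matrix)                                # 3 x 3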
{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"This function estimates the number of bootstrap replicates needed to reduce the simulation error of a bootstrap variance estimator to a target level, where \"simulation error\" is defined as error caused by using only a finite number of bootstrap replicates, and this simulation error is measured as a simulation coefficient of variation (\"simulation CV\").","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"","code":"estimate_boot_reps_for_target_cv(svrepstat, target_cv = 0.05)"},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"svrepstat An estimate obtained from a bootstrap replicate survey design object, with a function such as svymean(..., return.replicates = TRUE) or withReplicates(..., return.replicates = TRUE). target_cv A numeric value (or vector of numeric values) between 0 and 1. This is the target simulation CV for the bootstrap variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"A data frame with one row for each value of target_cv. The column TARGET_CV gives the target coefficient of variation. The column MAX_REPS gives the maximum number of replicates needed for all of the statistics included in svrepstat. The remaining columns give the number of replicates needed for each statistic.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"suggested-usage","dir":"Reference","previous_headings":"","what":"Suggested Usage","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"- Step 1: Determine the largest acceptable level of simulation error for key survey estimates, where the level of simulation error is measured in terms of the simulation CV. We refer to this as the \"target CV.\" A conventional value for the target CV is 5%. - Step 2: Estimate key statistics of interest using a large number of bootstrap replicates (such as 5,000) and save the estimates from each bootstrap replicate. This can be conveniently done using a function from the survey package such as svymean(..., return.replicates = TRUE) or withReplicates(..., return.replicates = TRUE). - Step 3: Use the function estimate_boot_reps_for_target_cv() to estimate the minimum number of bootstrap replicates needed to attain the target CV.","code":""},
{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"Unlike replication methods such as the jackknife or balanced repeated replication, the bootstrap variance estimator's precision can always be improved by using a larger number of replicates, as the use of only a finite number of bootstrap replicates introduces simulation error into the variance estimation process. Simulation error can be measured as a \"simulation coefficient of variation\" (CV), which is the ratio of the standard error of the bootstrap estimator to the expectation of the bootstrap estimator, where this expectation and standard error are evaluated with respect to the bootstrapping process given the selected sample. For a statistic \\(\\hat{\\theta}\\), the simulation CV of the bootstrap variance estimator \\(v_{B}(\\hat{\\theta})\\) based on \\(B\\) replicate estimates \\(\\hat{\\theta}^{\\star}_1,\\dots,\\hat{\\theta}^{\\star}_B\\) is defined as follows: $$ CV_{\\star}(v_{B}(\\hat{\\theta})) = \\frac{\\sqrt{var_{\\star}(v_B(\\hat{\\theta}))}}{E_{\\star}(v_B(\\hat{\\theta}))} = \\frac{CV_{\\star}(E_2)}{\\sqrt{B}} $$ where $$ E_2 = (\\hat{\\theta}^{\\star} - \\hat{\\theta})^2 $$ $$ CV_{\\star}(E_2) = \\frac{\\sqrt{var_{\\star}(E_2)}}{E_{\\star}(E_2)} $$ and \\(var_{\\star}\\) and \\(E_{\\star}\\) are evaluated with respect to the bootstrapping process, given the selected sample. The simulation CV, denoted \\(CV_{\\star}(v_{B}(\\hat{\\theta}))\\), is estimated for a given number of replicates \\(B\\) by estimating \\(CV_{\\star}(E_2)\\) using the observed values and then dividing by \\(\\sqrt{B}\\). If the bootstrap errors are assumed to be normally distributed, then \\(CV_{\\star}(E_2)=\\sqrt{2}\\), and so \\(CV_{\\star}(v_{B}(\\hat{\\theta}))\\) would not need to be estimated. Using the observed replicate estimates to estimate the simulation CV instead of assuming normality allows the simulation CV to be used for a wide array of bootstrap methods.","code":""},
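The estimation recipe in the last paragraph can be re-implemented in a few lines. This sketch computes the simulation CV from a vector of replicate estimates and inverts the relationship CV = CV*(E_2)/sqrt(B) to get the replicates needed for a target CV; estimate_boot_reps_for_target_cv() may differ in implementation details (for example, how the point estimate is centered).

# Minimal sketch: simulation CV from B replicate estimates theta_star
# around a point estimate theta_hat (both hypothetical inputs)
sim_cv_from_replicates <- function(theta_star, theta_hat) {
  B  <- length(theta_star)
  E2 <- (theta_star - theta_hat)^2     # squared bootstrap errors
  cv_E2 <- sd(E2) / mean(E2)           # estimate of CV*(E_2)
  list(sim_cv = cv_E2 / sqrt(B),       # estimated simulation CV of v_B
       # Replicates needed for a target CV: B >= (CV*(E_2) / target)^2
       reps_for_target = function(target_cv) ceiling((cv_E2 / target_cv)^2))
}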
{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"See Section 3.3 and Section 8 of Beaumont and Patak (2012) for details and an example where the simulation CV is used to determine the number of bootstrap replicates needed for various alternative bootstrap methods in an empirical illustration. Beaumont, J.-F. and Z. Patak. (2012), \"On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling.\" International Statistical Review, 80: 127–148. doi:10.1111/j.1751-5823.2011.00166.x.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_reps_for_target_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Estimate the number of bootstrap replicates needed to reduce the bootstrap simulation error to a target level — estimate_boot_reps_for_target_cv","text":"","code":"if (FALSE) { set.seed(2022) # Create an example bootstrap survey design object ---- library(survey) data('api', package = 'survey') boot_design <- svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc) |> svrep::as_bootstrap_design(replicates = 5000) # Calculate estimates of interest and retain estimates from each replicate ---- estimated_means_and_proportions <- svymean(x = ~ api00 + api99 + stype, design = boot_design, return.replicates = TRUE) custom_statistic <- withReplicates(design = boot_design, return.replicates = TRUE, theta = function(wts, data) { numerator <- sum(data$api00 * wts) denominator <- sum(data$api99 * wts) statistic <- numerator/denominator return(statistic) }) # Determine minimum number of bootstrap replicates needed to obtain given simulation CVs ---- estimate_boot_reps_for_target_cv( svrepstat = estimated_means_and_proportions, target_cv = c(0.01, 0.05, 0.10) ) estimate_boot_reps_for_target_cv( svrepstat = custom_statistic, target_cv = c(0.01, 0.05, 0.10) ) }"},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"Estimates the bootstrap simulation error, expressed as a \"simulation coefficient of variation\" (CV).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"","code":"estimate_boot_sim_cv(svrepstat)"},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"svrepstat An estimate obtained from a bootstrap replicate survey design object, with a function such as svymean(..., return.replicates = TRUE) or withReplicates(..., return.replicates = TRUE).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"A data frame with one row for each statistic. The column STATISTIC gives the name of the statistic. The column SIMULATION_CV gives the estimated simulation CV of the statistic. The column N_REPLICATES gives the number of bootstrap replicates.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"Unlike replication methods such as the jackknife or balanced repeated replication, the bootstrap variance estimator's precision can always be improved by using a larger number of replicates, as the use of only a finite number of bootstrap replicates introduces simulation error into the variance estimation process.
Simulation error can be measured as a \"simulation coefficient of variation\" (CV), which is the ratio of the standard error of the bootstrap estimator to the expectation of the bootstrap estimator, where this expectation and standard error are evaluated with respect to the bootstrapping process given the selected sample. For a statistic \\(\\hat{\\theta}\\), the simulation CV of the bootstrap variance estimator \\(v_{B}(\\hat{\\theta})\\) based on \\(B\\) replicate estimates \\(\\hat{\\theta}^{\\star}_1,\\dots,\\hat{\\theta}^{\\star}_B\\) is defined as follows: $$ CV_{\\star}(v_{B}(\\hat{\\theta})) = \\frac{\\sqrt{var_{\\star}(v_B(\\hat{\\theta}))}}{E_{\\star}(v_B(\\hat{\\theta}))} = \\frac{CV_{\\star}(E_2)}{\\sqrt{B}} $$ where $$ E_2 = (\\hat{\\theta}^{\\star} - \\hat{\\theta})^2 $$ $$ CV_{\\star}(E_2) = \\frac{\\sqrt{var_{\\star}(E_2)}}{E_{\\star}(E_2)} $$ and \\(var_{\\star}\\) and \\(E_{\\star}\\) are evaluated with respect to the bootstrapping process, given the selected sample. The simulation CV, denoted \\(CV_{\\star}(v_{B}(\\hat{\\theta}))\\), is estimated for a given number of replicates \\(B\\) by estimating \\(CV_{\\star}(E_2)\\) using the observed values and then dividing by \\(\\sqrt{B}\\). If the bootstrap errors are assumed to be normally distributed, then \\(CV_{\\star}(E_2)=\\sqrt{2}\\), and so \\(CV_{\\star}(v_{B}(\\hat{\\theta}))\\) would not need to be estimated. Using the observed replicate estimates to estimate the simulation CV instead of assuming normality allows the simulation CV to be used for a wide array of bootstrap methods.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"See Section 3.3 and Section 8 of Beaumont and Patak (2012) for details and an example where the simulation CV is used to determine the number of bootstrap replicates needed for various alternative bootstrap methods in an empirical illustration. Beaumont, J.-F. and Z. Patak. (2012), \"On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling.\" International Statistical Review, 80: 127–148.
doi:10.1111/j.1751-5823.2011.00166.x.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/estimate_boot_sim_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Estimate the bootstrap simulation error — estimate_boot_sim_cv","text":"","code":"if (FALSE) { set.seed(2022) # Create an example bootstrap survey design object ---- library(survey) data('api', package = 'survey') boot_design <- svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc) |> svrep::as_bootstrap_design(replicates = 5000) # Calculate estimates of interest and retain estimates from each replicate ---- estimated_means_and_proportions <- svymean(x = ~ api00 + api99 + stype, design = boot_design, return.replicates = TRUE) custom_statistic <- withReplicates(design = boot_design, return.replicates = TRUE, theta = function(wts, data) { numerator <- sum(data$api00 * wts) denominator <- sum(data$api99 * wts) statistic <- numerator/denominator return(statistic) }) # Estimate simulation CV of bootstrap estimates ---- estimate_boot_sim_cv( svrepstat = estimated_means_and_proportions ) estimate_boot_sim_cv( svrepstat = custom_statistic ) }"},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":null,"dir":"Reference","previous_headings":"","what":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"Determines the quadratic form matrix of a specified variance estimator, by parsing the information stored in a survey design object created using the 'survey' package.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"","code":"get_design_quad_form( design, variance_estimator, ensure_psd = FALSE, aux_var_names = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"design A survey design object created using the 'survey' (or 'srvyr') package, with class 'survey.design' or 'svyimputationList'. Also accepts two-phase design objects with class 'twophase2'; see the section titled \"Two-Phase Designs\" for more information about the handling of two-phase designs. variance_estimator The name of the variance estimator whose quadratic form matrix should be created. See the section \"Variance Estimators\" below. Options include: \"Yates-Grundy\": The Yates-Grundy variance estimator based on first-order and second-order inclusion probabilities. \"Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on first-order and second-order inclusion probabilities. \"Poisson Horvitz-Thompson\": The Horvitz-Thompson variance estimator based on assuming Poisson sampling, with first-order inclusion probabilities inferred from the sampling probabilities in the survey design object. \"Stratified Multistage SRS\": The usual stratified multistage variance estimator based on estimating the variance of cluster totals within strata at each stage. \"Ultimate Cluster\": The usual variance estimator based on estimating the variance of first-stage cluster totals within first-stage strata.
\"Deville-1\": variance estimator unequal-probability sampling without replacement, described Matei Tillé (2005) \"Deville 1\". \"Deville-2\": variance estimator unequal-probability sampling without replacement, described Matei Tillé (2005) \"Deville 2\". \"Deville-Tille\": variance estimator useful balanced sampling designs, proposed Deville Tillé (2005). \"SD1\": non-circular successive-differences variance estimator described Ash (2014), sometimes used variance estimation systematic sampling. \"SD2\": circular successive-differences variance estimator described Ash (2014). estimator basis \"successive-differences replication\" estimator commonly used variance estimation systematic sampling. ensure_psd TRUE (default), ensures result positive semidefinite matrix. necessary quadratic form used input replication methods generalized bootstrap. mathematical details, please see documentation function get_nearest_psd_matrix(). approximation method discussed Beaumont Patak (2012) context forming replicate weights two-phase samples. authors argue approximation lead small overestimation variance. aux_var_names required variance_estimator = \"Deville-Tille\". character vector variable names auxiliary variables used Breidt Chauvet (2011) variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"matrix representing quadratic form specified variance estimator, based extracting information clustering, stratification, selection probabilities survey design object.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"variance-estimators","dir":"Reference","previous_headings":"","what":"Variance Estimators","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"See variance-estimators description variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"two-phase-designs","dir":"Reference","previous_headings":"","what":"Two-Phase Designs","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"two-phase design, variance_estimator list variance estimators' names, two elements, list('Ultimate Cluster', 'Poisson Horvitz-Thompson'). two-phase designs, following estimators may used second phase: \"Ultimate Cluster\" \"Stratified Multistage SRS\" \"Poisson Horvitz-Thompson\" statistical details handling two-phase designs, see documentation make_twophase_quad_form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"- Ash, S. (2014). \"Using successive difference replication estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59. - Beaumont, Jean-François, Zdenek Patak. (2012). \"Generalized Bootstrap Sample Surveys Special Attention Poisson Sampling: Generalized Bootstrap Sample Surveys.\" International Statistical Review 80 (1): 127–48. - Bellhouse, D.R. (1985). 
\"Computing Methods Variance Estimation Complex Surveys.\" Journal Official Statistics, Vol.1, .3. - Deville, J.‐C., Tillé, Y. (2005). \"Variance approximation balanced sampling.\" Journal Statistical Planning Inference, 128, 569–591. - Särndal, C.-E., Swensson, B., & Wretman, J. (1992). \"Model Assisted Survey Sampling.\" Springer New York.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_design_quad_form.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Determine the quadratic form matrix of a variance estimator for a survey design object — get_design_quad_form","text":"","code":"if (FALSE) { # Example 1: Quadratic form for successive-difference variance estimator ---- data('library_stsys_sample', package = 'svrep') ## First, ensure data are sorted in same order as was used in sampling library_stsys_sample <- library_stsys_sample[ order(library_stsys_sample$SAMPLING_SORT_ORDER), ] ## Create a survey design object design_obj <- svydesign( data = library_stsys_sample, strata = ~ SAMPLING_STRATUM, ids = ~ 1, fpc = ~ STRATUM_POP_SIZE ) ## Obtain quadratic form quad_form_matrix <- get_design_quad_form( design = design_obj, variance_estimator = \"SD2\" ) ## Estimate variance of estimated population total y <- design_obj$variables$LIBRARIA wts <- weights(design_obj, type = 'sampling') y_wtd <- as.matrix(y) * wts y_wtd[is.na(y_wtd)] <- 0 pop_total <- sum(y_wtd) var_est <- t(y_wtd) %*% quad_form_matrix %*% y_wtd std_error <- sqrt(var_est) print(pop_total); print(std_error) # Compare to estimate from assuming SRS svytotal(x = ~ LIBRARIA, na.rm = TRUE, design = design_obj) # Example 2: Two-phase design (second phase is nonresponse) ---- ## Estimate response propensities, separately by stratum library_stsys_sample[['RESPONSE_PROB']] <- svyglm( design = design_obj, formula = I(RESPONSE_STATUS == \"Survey Respondent\") ~ SAMPLING_STRATUM, family = quasibinomial('logistic') ) |> predict(type = 'response') ## Create a survey design object, ## where nonresponse is treated as a second phase of sampling twophase_design <- twophase( data = library_stsys_sample, strata = list(~ SAMPLING_STRATUM, NULL), id = list(~ 1, ~ 1), fpc = list(~ STRATUM_POP_SIZE, NULL), probs = list(NULL, ~ RESPONSE_PROB), subset = ~ I(RESPONSE_STATUS == \"Survey Respondent\") ) ## Obtain quadratic form for the two-phase variance estimator, ## where first phase variance contribution estimated ## using the successive differences estimator ## and second phase variance contribution estimated ## using the Horvitz-Thompson estimator ## (with joint probabilities based on assumption of Poisson sampling) get_design_quad_form( design = twophase_design, variance_estimator = list( \"SD2\", \"Poisson Horvitz-Thompson\" ) ) }"},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"Approximates symmetric, real matrix nearest positive semidefinite matrix Frobenius norm, using method Higham (1988). real, symmetric matrix, equivalent \"zeroing \" negative eigenvalues. 
See \"Details\" section information.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"","code":"get_nearest_psd_matrix(X)"},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"X symmetric, real matrix missing values.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"nearest positive semidefinite matrix dimension X.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"Let \\(\\) denote symmetric, real matrix positive semidefinite. can form spectral decomposition \\(=\\Gamma \\Lambda \\Gamma^{\\prime}\\), \\(\\Lambda\\) diagonal matrix whose entries eigenvalues \\(\\). method Higham (1988) approximate \\(\\) \\(\\tilde{} = \\Gamma \\Lambda_{+} \\Gamma^{\\prime}\\), \\(ii\\)-th entry \\(\\Lambda_{+}\\) \\(\\max(\\Lambda_{ii}, 0)\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"- Higham, N. J. (1988). \"Computing nearest symmetric positive semidefinite matrix.\" Linear Algebra Applications, 103, 103–118.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. 
,{"path":"https://bschneidr.github.io/svrep/reference/get_nearest_psd_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Approximates a symmetric, real matrix by the nearest positive\nsemidefinite matrix. — get_nearest_psd_matrix","text":"","code":"X <- matrix( c(2, 5, 5, 5, 2, 5, 5, 5, 2), nrow = 3, byrow = TRUE ) get_nearest_psd_matrix(X) #> [,1] [,2] [,3] #> [1,] 4 4 4 #> [2,] 4 4 4 #> [3,] 4 4 4"},{"path":"https://bschneidr.github.io/svrep/reference/getvars.html","id":null,"dir":"Reference","previous_headings":"","what":"Get variables from a database — getvars","title":"Get variables from a database — getvars","text":"A database helper function, copied from the 'survey' package.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/getvars.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get variables from a database — getvars","text":"","code":"getvars( formula, dbconnection, tables, db.only = TRUE, updates = NULL, subset = NULL )"},{"path":"https://bschneidr.github.io/svrep/reference/getvars.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get variables from a database — getvars","text":"formula Either a formula or a character vector giving the names of variables dbconnection A database connection tables Name(s) of the table(s) to pull from db.only Unclear parameter, inherited from the 'survey' package updates Updates to potentially make subset Optional indices to which to subset the data before returning the result","code":""},{"path":"https://bschneidr.github.io/svrep/reference/getvars.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get variables from a database — getvars","text":"A data frame","code":""},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","text":"Computes the matrix of joint inclusion probabilities from the quadratic form of a Horvitz-Thompson variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","text":"","code":"ht_matrix_to_joint_probs(ht_quad_form)"},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","text":"ht_quad_form A matrix of the quadratic form representing a Horvitz-Thompson variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. — ht_matrix_to_joint_probs","text":"A matrix of joint inclusion probabilities","code":""},{"path":"https://bschneidr.github.io/svrep/reference/ht_matrix_to_joint_probs.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute the matrix of joint inclusion probabilities\nfrom the quadratic form of a Horvitz-Thompson variance estimator. 
— ht_matrix_to_joint_probs","text":"The quadratic form matrix of the Horvitz-Thompson variance estimator has \\(ij\\)-th entry equal to \\((1-\\frac{\\pi_i \\pi_j}{\\pi_{ij}})\\). The matrix of joint probabilities has \\(ij\\)-th entry equal to \\(\\pi_{ij}\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Check whether a matrix is positive semidefinite — is_psd_matrix","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"Checks whether a matrix is positive semidefinite, based on checking that it is symmetric and has no negative eigenvalues.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"","code":"is_psd_matrix(X, tolerance = sqrt(.Machine$double.eps))"},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"X A matrix with no missing or infinite values. tolerance Tolerance controlling whether a tiny computed eigenvalue will actually be considered negative. Computed negative eigenvalues will only be considered negative if they are less than -abs(tolerance * max(eigen(X)$values)). A small nonzero tolerance is recommended since eigenvalues are nearly always computed with some floating-point error.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"A logical value. TRUE if the matrix is deemed positive semidefinite. FALSE otherwise (including if X is not symmetric).","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/is_psd_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check whether a matrix is positive semidefinite — is_psd_matrix","text":"","code":"X <- matrix( c(2, 5, 5, 5, 2, 5, 5, 5, 2), nrow = 3, byrow = TRUE ) is_psd_matrix(X) #> [1] FALSE eigen(X)$values #> [1] 12 -3 -3"}
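Returning briefly to ht_matrix_to_joint_probs(): a small sketch of the inversion implied by the Details above. Since the quadratic form has entries \(1 - \pi_i \pi_j / \pi_{ij}\), the joint probabilities can be recovered as \(\pi_{ij} = \pi_i \pi_j / (1 - \sigma_{ij})\). The toy probabilities here are made up for illustration.

pi_i  <- c(0.5, 0.4)
pi_ij <- matrix(c(0.5, 0.25, 0.25, 0.4), nrow = 2)  # diagonal holds first-order probabilities
ht_quad_form <- 1 - outer(pi_i, pi_i) / pi_ij       # quadratic form of the HT estimator
recovered    <- outer(pi_i, pi_i) / (1 - ht_quad_form)
all.equal(recovered, pi_ij)                         # TRUE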
,{"path":"https://bschneidr.github.io/svrep/reference/libraries.html","id":null,"dir":"Reference","previous_headings":"","what":"Public Libraries Survey (PLS): A Census of U.S. Public Libraries in FY2020 — libraries","title":"Public Libraries Survey (PLS): A Census of U.S. Public Libraries in FY2020 — libraries","text":"Data taken from a complete census of public libraries in the United States in FY2020 (April 2020 to March 2021). The Public Libraries Survey (PLS) is an annual census of public libraries in the U.S., including all public libraries identified by state library administrative agencies in the 50 states, the District of Columbia, and the outlying territories of American Samoa, Guam, the Northern Mariana Islands, and the U.S. Virgin Islands (Puerto Rico did not participate in FY2020). The primary dataset, library_census, represents the full microdata from the census. The datasets library_multistage_sample and library_stsys_sample are samples drawn from library_census using different sampling methods.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/libraries.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Public Libraries Survey (PLS): A Census of U.S. Public Libraries in FY2020 — libraries","text":"","code":"data(library_census) data(library_multistage_sample) data(library_stsys_sample)"},{"path":"https://bschneidr.github.io/svrep/reference/libraries.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Public Libraries Survey (PLS): A Census of U.S. Public Libraries in FY2020 — libraries","text":"Library Census (library_census): This dataset includes 9,245 records (one per library) and 23 variables. Each column has a variable label, accessible by using the function var_label() from the 'labelled' package or by simply calling attr(x, 'label') on a given column. These data include a subset of the variables included in the public-use data published by the PLS, specifically from the Public Library System Data File. Particularly relevant variables include: Identifier variables and survey response status: FSCSKEY: A unique identifier for libraries. LIBNAME: The name of the library. RESPONSE_STATUS: Response status for the Public Library Survey: indicates whether the library was a respondent, a nonrespondent, or was closed. Numeric summaries: TOTCIR: Total circulation VISITS: Total visitors REGBOR: Total number of registered users TOTSTAFF: Total staff (measured in full-time equivalent staff) LIBRARIA: Total librarians (measured in full-time equivalent staff) TOTOPEXP: Total operating expenses TOTINCM: Total income BRANLIB: Number of library branches CENTLIB: Number of central library locations Location: LONGITUD: Geocoded longitude (WGS84 CRS) LATITUD: Geocoded latitude (WGS84 CRS) STABR: Two-letter state abbreviation CBSA: Five-digit identifier for a core-based statistical area (CBSA) MICROF: Flag for a metropolitan or micropolitan statistical area Library Multistage Sample (library_multistage_sample): These data represent a two-stage sample (of PSUs and SSUs), where the first stage sample was selected using unequal probability sampling without replacement (PPSWOR) and the second stage sample was selected using simple random sampling without replacement (SRSWOR). Includes the same variables as library_census, with additional design variables. PSU_ID: A unique identifier for primary sampling units SSU_ID: A unique identifier for secondary sampling units SAMPLING_PROB: Overall inclusion probability PSU_SAMPLING_PROB: Inclusion probability for the PSU SSU_SAMPLING_PROB: Inclusion probability for the SSU PSU_POP_SIZE: The number of PSUs in the population SSU_POP_SIZE: The number of population SSUs within the PSU Library Stratified Systematic Sample (library_stsys_sample): These data represent a stratified systematic sample. Includes the same variables as library_census, with additional design variables. SAMPLING_STRATUM: Unique identifier for sampling strata STRATUM_POP_SIZE: The population size in the stratum SAMPLING_SORT_ORDER: The sort order used for selecting the random systematic sample SAMPLING_PROB: Overall inclusion probability","code":""},{"path":"https://bschneidr.github.io/svrep/reference/libraries.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Public Libraries Survey (PLS): A Census of U.S. Public Libraries in FY2020 — libraries","text":"Pelczar, M., Soffronoff, J., Nielsen, E., Li, J., & Mabile, S. (2022). Data File Documentation: Public Libraries in the United States, Fiscal Year 2020. Institute of Museum and Library Services: Washington, D.C.","code":""}
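A brief usage sketch (not part of the package documentation) showing how the design variables listed above can declare the two-stage design with the 'survey' package; using pps = "brewer" as an approximation for the PPSWOR first stage is an assumption here.

library(survey)
data('library_multistage_sample', package = 'svrep')
twostage_design <- svydesign(
  data = library_multistage_sample,
  ids  = ~ PSU_ID + SSU_ID,
  fpc  = ~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB,
  pps  = "brewer"  # approximation for the PPSWOR first stage
)
svytotal(x = ~ TOTCIR, design = twostage_design, na.rm = TRUE)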
,{"path":"https://bschneidr.github.io/svrep/reference/lou_pums_microdata.html","id":null,"dir":"Reference","previous_headings":"","what":"ACS PUMS Data for Louisville — lou_pums_microdata","title":"ACS PUMS Data for Louisville — lou_pums_microdata","text":"Person-level microdata from the American Community Survey (ACS) 2015-2019 public-use microdata sample (PUMS) data for Louisville, KY. This microdata sample represents adults (persons aged 18 or over) in Louisville, KY. These data include replicate weights to use for variance estimation.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_pums_microdata.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ACS PUMS Data for Louisville — lou_pums_microdata","text":"","code":"data(lou_pums_microdata)"},{"path":"https://bschneidr.github.io/svrep/reference/lou_pums_microdata.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"ACS PUMS Data for Louisville — lou_pums_microdata","text":"A data frame with 80 rows and 85 variables UNIQUE_ID: Unique identifier for records AGE: Age in years (copied from the AGEP variable in the ACS microdata) RACE_ETHNICITY: Race and Hispanic/Latino ethnicity, derived from the RAC1P and HISP variables in the ACS microdata and collapsed to a smaller number of categories. SEX: Male or Female EDUC_ATTAINMENT: Highest level of education attained ('Less than high school' or 'High school or beyond'), derived from the SCHL variable in the ACS microdata and collapsed to a smaller number of categories. PWGTP: Weights for the full sample PWGTP1-PWGTP80: 80 columns of replicate weights created using the Successive Differences Replication (SDR) method.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_pums_microdata.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"ACS PUMS Data for Louisville — lou_pums_microdata","text":"","code":"if (FALSE) { data(lou_pums_microdata) # Prepare the data for analysis with the survey package library(survey) lou_pums_rep_design <- survey::svrepdesign( data = lou_pums_microdata, variables = ~ UNIQUE_ID + AGE + SEX + RACE_ETHNICITY + EDUC_ATTAINMENT, weights = ~ PWGTP, repweights = \"PWGTP\\\\d{1,2}\", type = \"successive-difference\", mse = TRUE ) # Estimate population proportions svymean(~ SEX, design = lou_pums_rep_design) }"},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey.html","id":null,"dir":"Reference","previous_headings":"","what":"Louisville Vaccination Survey — lou_vax_survey","title":"Louisville Vaccination Survey — lou_vax_survey","text":"A survey measuring Covid-19 vaccination status and a handful of demographic variables, based on a simple random sample of 1,000 residents of Louisville, Kentucky, with an approximately 50% response rate. These data were created using simulation.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Louisville Vaccination Survey — lou_vax_survey","text":"","code":"data(lou_vax_survey)"},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Louisville Vaccination Survey — lou_vax_survey","text":"A data frame with 1,000 rows and 6 variables RESPONSE_STATUS Response status to the survey ('Respondent' or 'Nonrespondent') RACE_ETHNICITY Race and Hispanic/Latino ethnicity, derived from the RAC1P and HISP variables in the ACS microdata and collapsed to a smaller number of categories. SEX Male or Female EDUC_ATTAINMENT Highest level of education attained ('Less than high school' or 'High school or beyond'), derived from the SCHL variable in the ACS microdata and collapsed to a smaller number of categories. 
VAX_STATUS Covid-19 vaccination status ('Vaccinated' or 'Unvaccinated') SAMPLING_WEIGHT Sampling weight: equal for all cases, since the data come from a simple random sample","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey_control_totals.html","id":null,"dir":"Reference","previous_headings":"","what":"Control totals for the Louisville Vaccination Survey — lou_vax_survey_control_totals","title":"Control totals for the Louisville Vaccination Survey — lou_vax_survey_control_totals","text":"Control totals to use for raking or post-stratification of the Louisville Vaccination Survey data. Control totals are population size estimates from the ACS 2015-2019 5-year Public Use Microdata Sample (PUMS) for specific demographic categories among adults in Jefferson County, KY. These data were created using simulation.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey_control_totals.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Control totals for the Louisville Vaccination Survey — lou_vax_survey_control_totals","text":"","code":"data(lou_vax_survey_control_totals)"}
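A hedged sketch of how these control totals might feed into sample-based calibration with svrep's calibrate_to_estimate(); 'vax_rep_design' stands in for a hypothetical replicate design built from lou_vax_survey, and the exact calibration formula is an assumption for illustration.

data(lou_vax_survey_control_totals)
totals <- lou_vax_survey_control_totals$raking
raked_design <- calibrate_to_estimate(
  rep_design    = vax_rep_design,                  # hypothetical replicate design
  estimate      = totals$estimates,
  vcov_estimate = totals[["variance-covariance"]],
  cal_formula   = ~ RACE_ETHNICITY + SEX + EDUC_ATTAINMENT
)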
,{"path":"https://bschneidr.github.io/svrep/reference/lou_vax_survey_control_totals.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Control totals for the Louisville Vaccination Survey — lou_vax_survey_control_totals","text":"A nested list object with two lists, poststratification and raking, each of which contains two elements: estimates and variance-covariance. poststratification Control totals for the combination of RACE_ETHNICITY, SEX, and EDUC_ATTAINMENT. estimates: A numeric vector of estimated population totals. variance-covariance: A variance-covariance matrix for the estimated population totals. raking Separate control totals for each of RACE_ETHNICITY, SEX, and EDUC_ATTAINMENT. estimates: A numeric vector of estimated population totals. variance-covariance: A variance-covariance matrix for the estimated population totals.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"Creates the quadratic form matrix for a variance estimator for balanced samples, proposed by Deville and Tillé (2005).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"","code":"make_deville_tille_matrix(probs, aux_vars)"},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"probs A vector of first-order inclusion probabilities aux_vars A matrix of auxiliary variables, with the number of rows matching the number of elements of probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"A symmetric matrix whose dimension matches the length of probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_deville_tille_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a quadratic form's matrix\nfor a Deville-Tillé variance estimator for balanced samples — make_deville_tille_matrix","text":"See Section 6.8 of Tillé (2020) for more detail on this estimator, including an explanation of its quadratic form. See Deville and Tillé (2005) for the results of a simulation study comparing this and alternative estimators for balanced sampling. The estimator can be written as follows: $$ v(\\hat{Y})=\\sum_{k \\in S} \\frac{c_k}{\\pi_k^2}\\left(y_k-\\hat{y}_k^*\\right)^2, $$ where $$ \\hat{y}_k^*=\\mathbf{z}_k^{\\top}\\left(\\sum_{\\ell \\in S} c_{\\ell} \\frac{\\mathbf{z}_{\\ell} \\mathbf{z}_{\\ell}^{\\prime}}{\\pi_{\\ell}^2}\\right)^{-1} \\sum_{\\ell \\in S} c_{\\ell} \\frac{\\mathbf{z}_{\\ell} y_{\\ell}}{\\pi_{\\ell}^2} $$ and \\(\\mathbf{z}_k\\) denotes the vector of auxiliary variables for observation \\(k\\) included in sample \\(S\\), with inclusion probability \\(\\pi_k\\). The value \\(c_k\\) is set to \\(\\frac{n}{n-q}(1-\\pi_k)\\), where \\(n\\) is the number of observations and \\(q\\) is the number of auxiliary variables. See Li, Chen, and Krenzke (2014) for an example of this estimator's use as the basis for a generalized replication estimator. See Breidt and Chauvet (2011) for a discussion of alternative simulation-based estimators for the specific application of variance estimation for balanced samples selected using the cube method.","code":""}
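A toy sketch of the constants just defined, with made-up probabilities and auxiliary variables; the point is only to show how \(c_k = \frac{n}{n-q}(1-\pi_k)\) and the function's inputs line up.

library(svrep)
probs    <- c(0.2, 0.4, 0.6, 0.8)                 # toy first-order inclusion probabilities
aux_vars <- cbind(intercept = 1, size = c(10, 20, 30, 40))
n <- length(probs); q <- ncol(aux_vars)
c_k <- (n / (n - q)) * (1 - probs)                # the constants from the Details above
Sigma <- make_deville_tille_matrix(probs = probs, aux_vars = aux_vars)
dim(Sigma)                                        # 4 x 4, matching length(probs)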
\"Improved variance estimation balanced samples drawn via cube method.\" Journal Statistical Planning Inference, 141, 411-425. - Deville, J.‐C., Tillé, Y. (2005). \"Variance approximation balanced sampling.\" Journal Statistical Planning Inference, 128, 569–591. - Li, J., Chen, S., Krenzke, T. (2014). \"Replication Variance Estimation Balanced Sampling: Application PIAAC Study.\" Proceedings Survey Research Methods Section, 2014: 985–994. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf. - Tillé, Y. (2020). \"Sampling estimation finite populations.\" (. Hekimi, Trans.). Wiley.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":null,"dir":"Reference","previous_headings":"","what":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"Generate matrix replication factors using Fay's generalized replication method. method yields fully efficient variance estimator sufficient number replicates used.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"","code":"make_fays_gen_rep_factors( Sigma, max_replicates = Matrix::rankMatrix(Sigma) + 4, balanced = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"Sigma quadratic form matrix corresponding target variance estimator. Must positive semidefinite. max_replicates maximum number replicates allow. function attempt create minimum number replicates needed produce fully-efficient variance estimator. replicates needed max_replicates, full number replicates needed created, random subsample retained. balanced balanced=TRUE, replicates contribute equally variance estimates, number replicates needed may slightly increase.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"matrix replicate factors, number rows matching number rows Sigma number columns less equal max_replicates. calculate variance estimates using factors, use overall scale factor given calling attr(x, \"scale\") result.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"See Fay (1989) full explanation Fay's generalized replication method. documentation provides brief overview. Let \\(\\boldsymbol{\\Sigma}\\) quadratic form matrix target variance estimator, assumed positive semidefinite. 
Suppose the rank of \\(\\boldsymbol{\\Sigma}\\) is \\(k\\), so that \\(\\boldsymbol{\\Sigma}\\) can be represented by the spectral decomposition of its \\(k\\) eigenvectors and eigenvalues, where the \\(r\\)-th eigenvector and eigenvalue are denoted \\(\\mathbf{v}_{(r)}\\) and \\(\\lambda_r\\), respectively. $$ \\boldsymbol{\\Sigma} = \\sum_{r=1}^k \\lambda_r \\mathbf{v}_{(r)} \\mathbf{v^{\\prime}}_{(r)} $$ If balanced = FALSE, let \\(\\mathbf{H}\\) denote the identity matrix with \\(k' = k\\) rows/columns. If balanced = TRUE, let \\(\\mathbf{H}\\) be a Hadamard matrix (with all entries equal to \\(1\\) or \\(-1\\)), of order \\(k^{\\prime} \\geq k\\). Let \\(\\mathbf{H}_{mr}\\) denote the entry in row \\(m\\) and column \\(r\\) of \\(\\mathbf{H}\\). Then \\(k^{\\prime}\\) replicates are formed as follows. Let \\(r\\) denote a given replicate, with \\(r = 1, ..., k^{\\prime}\\), and let \\(c\\) denote some positive constant (yet to be specified). The \\(r\\)-th replicate adjustment factor \\(\\mathbf{f}_{r}\\) is formed as: $$ \\mathbf{f}_{r} = 1 + c \\sum_{m=1}^k H_{m r} \\lambda_{(m)}^{\\frac{1}{2}} \\mathbf{v}_{(m)} $$ If balanced = FALSE, then \\(c = 1\\). If balanced = TRUE, then \\(c = \\frac{1}{\\sqrt{k^{\\prime}}}\\). If any of the replicates are negative, you can use rescale_reps, which recalculates the replicate factors with a smaller value of \\(c\\). If all \\(k^{\\prime}\\) replicates are used, then variance estimates are calculated as: $$ v_{rep}\\left(\\hat{T}_y\\right) = \\sum_{r=1}^{k^{\\prime}}\\left(\\hat{T}_y^{*(r)}-\\hat{T}_y\\right)^2 $$ For population totals, this replication variance estimator will exactly match the target variance estimator if the number of replicates \\(k^{\\prime}\\) matches the rank of \\(\\Sigma\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"the-number-of-replicates","dir":"Reference","previous_headings":"","what":"The Number of Replicates","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"If balanced=TRUE, the number of replicates created may need to increase slightly. This is due to the fact that a Hadamard matrix of order \\(k^{\\prime} \\geq k\\) is used to balance the replicates, and it may be necessary to use an order \\(k^{\\prime} > k\\). If the number of replicates \\(k^{\\prime}\\) is too large for practical purposes, then one can simply retain a random subset of \\(R\\) of the \\(k^{\\prime}\\) replicates. In this case, variances are calculated as follows: $$ v_{rep}\\left(\\hat{T}_y\\right) = \\frac{k^{\\prime}}{R} \\sum_{r=1}^{R}\\left(\\hat{T}_y^{*(r)}-\\hat{T}_y\\right)^2 $$ This is what happens if max_replicates is less than the matrix rank of Sigma: a random subset of the created replicates will be retained. Subsampling replicates is only recommended when using balanced=TRUE, since in that case every replicate contributes equally to variance estimates. If balanced=FALSE, then randomly subsampling replicates is valid but may produce large variation in variance estimates, since the replicates in that case may vary greatly in their contribution to variance estimates.","code":""}
,{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"If balanced=TRUE, a Hadamard matrix is used as described above. The Hadamard matrix is deterministically created using the function hadamard() from the 'survey' package. However, the order of its rows/columns is randomly permuted before forming replicates. In general, the column-ordering of the replicate weights is random. To ensure exact reproducibility, it is recommended to call set.seed() before using this function.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"Fay, Robert. 1989. \"Theory and Application of Replicate Weighting for Variance Calculations.\" In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1989_033.pdf","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_fays_gen_rep_factors.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Form replication factors using Fay's generalized replication method — make_fays_gen_rep_factors","text":"","code":"if (FALSE) { library(survey) # Load an example dataset that uses unequal probability sampling ---- data('election', package = 'survey') # Create matrix to represent the Horvitz-Thompson estimator as a quadratic form ---- n <- nrow(election_pps) pi <- election_jointprob horvitz_thompson_matrix <- matrix(nrow = n, ncol = n) for (i in seq_len(n)) { for (j in seq_len(n)) { horvitz_thompson_matrix[i,j] <- 1 - (pi[i,i] * pi[j,j])/pi[i,j] } } ## Equivalently: horvitz_thompson_matrix <- make_quad_form_matrix( variance_estimator = \"Horvitz-Thompson\", joint_probs = election_jointprob ) # Make generalized replication adjustment factors ---- adjustment_factors <- make_fays_gen_rep_factors( Sigma = horvitz_thompson_matrix, max_replicates = 50 ) attr(adjustment_factors, 'scale') # Compute the Horvitz-Thompson estimate and the replication estimate ht_estimate <- svydesign(data = election_pps, ids = ~ 1, prob = diag(election_jointprob), pps = ppsmat(election_jointprob)) |> svytotal(x = ~ Kerry) rep_estimate <- svrepdesign( data = election_pps, weights = ~ wt, repweights = adjustment_factors, combined.weights = FALSE, scale = attr(adjustment_factors, 'scale'), rscales = rep(1, times = ncol(adjustment_factors)), type = \"other\", mse = TRUE ) |> svytotal(x = ~ Kerry) SE(rep_estimate) SE(ht_estimate) SE(rep_estimate) / SE(ht_estimate) }"},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":null,"dir":"Reference","previous_headings":"","what":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"Creates replicate factors for the generalized survey bootstrap method. The generalized survey bootstrap is a method for forming bootstrap replicate weights from a textbook variance estimator, provided that the variance estimator can be represented as a quadratic form whose matrix is positive semidefinite (this covers a large class of variance estimators).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"","code":"make_gen_boot_factors(Sigma, num_replicates, tau = \"auto\", exact_vcov = FALSE)"},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"Sigma The matrix of the quadratic form used to represent the variance estimator. 
Must be positive semidefinite. num_replicates The number of bootstrap replicates to create. tau Either \"auto\", or a single number. This is the rescaling constant used to avoid negative weights through the transformation \\(\\frac{w + \\tau - 1}{\\tau}\\), where \\(w\\) is the original weight and \\(\\tau\\) is the rescaling constant tau. If tau=\"auto\", the rescaling factor is determined automatically as follows: if all of the adjustment factors are nonnegative, then tau is set equal to 1; otherwise, tau is set to the smallest value needed to rescale the adjustment factors such that they are all at least 0.01. exact_vcov If exact_vcov=TRUE, the replicate factors will be generated such that their variance-covariance matrix exactly matches the target variance estimator's quadratic form (within numeric precision). This is desirable because it causes variance estimates for totals to closely match the values from the target variance estimator. It requires that num_replicates exceeds the rank of Sigma. The replicate factors are generated by applying PCA-whitening to a collection of draws from a multivariate Normal distribution, and then applying a coloring transformation to the whitened collection of draws.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"A matrix with the same number of rows as Sigma, and the number of columns equal to num_replicates. The object has an attribute named tau, which can be retrieved by calling attr(x, 'tau') on the object. The value tau is a rescaling factor used to avoid negative weights. In addition, the object has attributes named scale and rscales which can be passed directly to svrepdesign. Note that the value of scale is \\(\\tau^2/B\\), while the value of rscales is a vector of length \\(B\\), with every entry equal to \\(1\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"Let \\(v( \\hat{T_y})\\) be the textbook variance estimator for an estimated population total \\(\\hat{T}_y\\) of some variable \\(y\\). The base weight for case \\(i\\) in our sample is \\(w_i\\), and we let \\(\\breve{y}_i\\) denote the weighted value \\(w_iy_i\\). Suppose we can represent our textbook variance estimator as a quadratic form: \\(v(\\hat{T}_y) = \\breve{y}\\Sigma\\breve{y}^T\\), for some \\(n \\times n\\) matrix \\(\\Sigma\\). The only constraint on \\(\\Sigma\\) is that, for our sample, it must be symmetric and positive semidefinite. The bootstrapping process creates \\(B\\) sets of replicate weights, where the \\(b\\)-th set of replicate weights is a vector of length \\(n\\) denoted \\(\\mathbf{a}^{(b)}\\), whose \\(k\\)-th value is denoted \\(a_k^{(b)}\\). This yields \\(B\\) replicate estimates of the population total, \\(\\hat{T}_y^{*(b)}=\\sum_{k \\in s} a_k^{(b)} \\breve{y}_k\\), for \\(b=1, \\ldots B\\), which can be used to estimate sampling variance. $$ v_B\\left(\\hat{T}_y\\right)=\\frac{\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2}{B} $$ This bootstrap variance estimator can be written as a quadratic form: $$ v_B\\left(\\hat{T}_y\\right) =\\mathbf{\\breve{y}}^{\\prime}\\Sigma_B \\mathbf{\\breve{y}} $$ where $$ \\boldsymbol{\\Sigma}_B = \\frac{\\sum_{b=1}^B\\left(\\mathbf{a}^{(b)}-\\mathbf{1}_n\\right)\\left(\\mathbf{a}^{(b)}-\\mathbf{1}_n\\right)^{\\prime}}{B} $$ Note that if the vector of adjustment factors \\(\\mathbf{a}^{(b)}\\) has expectation \\(\\mathbf{1}_n\\) and variance-covariance matrix \\(\\boldsymbol{\\Sigma}\\), then we have the bootstrap expectation \\(E_{*}\\left( \\boldsymbol{\\Sigma}_B \\right) = \\boldsymbol{\\Sigma}\\). 
Since the bootstrap process takes the sample values \\(\\breve{y}\\) as fixed, the bootstrap expectation of the variance estimator is \\(E_{*} \\left( \\mathbf{\\breve{y}}^{\\prime}\\Sigma_B \\mathbf{\\breve{y}}\\right)= \\mathbf{\\breve{y}}^{\\prime}\\Sigma \\mathbf{\\breve{y}}\\). Thus, we can produce a bootstrap variance estimator with the same expectation as the textbook variance estimator simply by randomly generating \\(\\mathbf{a}^{(b)}\\) from a distribution with the following two conditions: Condition 1: \\(\\quad \\mathbf{E}_*(\\mathbf{a})=\\mathbf{1}_n\\) Condition 2: \\(\\quad \\mathbf{E}_*\\left(\\mathbf{a}-\\mathbf{1}_n\\right)\\left(\\mathbf{a}-\\mathbf{1}_n\\right)^{\\prime}=\\mathbf{\\Sigma}\\) While there are multiple ways to generate adjustment factors satisfying these conditions, the simplest general method is to simulate from a multivariate normal distribution: \\(\\mathbf{a} \\sim MVN(\\mathbf{1}_n, \\boldsymbol{\\Sigma})\\). This is the method used by this function.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"details-on-rescaling-to-avoid-negative-adjustment-factors","dir":"Reference","previous_headings":"","what":"Details on Rescaling to Avoid Negative Adjustment Factors","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"Let \\(\\mathbf{A} = \\left[ \\mathbf{a}^{(1)} \\cdots \\mathbf{a}^{(b)} \\cdots \\mathbf{a}^{(B)} \\right]\\) denote the \\((n \\times B)\\) matrix of bootstrap adjustment factors. To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors \\(\\mathbf{A}^S\\) by rescaling each adjustment factor \\(a_k^{(b)}\\) as follows: $$ a_k^{S,(b)} = \\frac{a_k^{(b)} + \\tau - 1}{\\tau} $$ where \\(\\tau \\geq 1 - a_k^{(b)} \\geq 1\\) for all \\(k\\) in \\(\\left\\{ 1,\\ldots,n \\right\\}\\) and all \\(b\\) in \\(\\left\\{1, \\ldots, B\\right\\}\\). The value of \\(\\tau\\) can be set based on the realized adjustment factor matrix \\(\\mathbf{A}\\), or by choosing \\(\\tau\\) prior to generating the adjustment factor matrix \\(\\mathbf{A}\\) so that \\(\\tau\\) is likely to be large enough to prevent negative bootstrap weights. If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used in estimating variances with the bootstrap replicates, which becomes \\(\\frac{\\tau^2}{B}\\) instead of \\(\\frac{1}{B}\\). $$ \\textbf{Prior to rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{1}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2 $$ $$ \\textbf{After rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{\\tau^2}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{S*(b)}-\\hat{T}_y\\right)^2 $$ When sharing a dataset that uses rescaled weights from a generalized survey bootstrap, the documentation for the dataset should instruct the user to use the replication scale factor \\(\\frac{\\tau^2}{B}\\) rather than \\(\\frac{1}{B}\\) when estimating sampling variances.","code":""}
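A minimal numeric sketch of the rescaling above; 'A' is a hypothetical matrix of generalized bootstrap adjustment factors containing some negative entries.

set.seed(2024)
A <- matrix(rnorm(4 * 200, mean = 1, sd = 0.8), nrow = 4)   # hypothetical factors
tau <- max(1, 1 - min(A))          # smallest tau that makes every factor nonnegative
A_rescaled <- (A + tau - 1) / tau
min(A_rescaled) >= 0               # TRUE
scale_factor <- tau^2 / ncol(A)    # use in place of 1/B when estimating variances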
,{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"The generalized survey bootstrap was first proposed by Bertail and Combris (1997). See Beaumont and Patak (2012) for a clear overview of the generalized survey bootstrap. The generalized survey bootstrap represents one strategy for forming replication variance estimators in the general framework proposed by Fay (1984) and Dippo, Fay, and Morganstein (1984). - Beaumont, Jean-François, and Zdenek Patak. 2012. “Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling: Generalized Bootstrap for Sample Surveys.” International Statistical Review 80 (1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x. - Bertail and Combris. 1997. “Bootstrap Généralisé d’un Sondage.” [Generalized Bootstrap of a Survey.] Annales d’Économie et de Statistique, no. 46: 49. https://doi.org/10.2307/20076068. - Dippo, Cathryn, Robert Fay, and David Morganstein. 1984. “Computing Variances from Complex Samples with Replicate Weights.” In, 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf. - Fay, Robert. 1984. “Some Properties of Estimates of Variance Based on Replication Methods.” In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_095.pdf.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_gen_boot_factors.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Creates replicate factors for the generalized survey bootstrap — make_gen_boot_factors","text":"","code":"if (FALSE) { library(survey) # Load an example dataset that uses unequal probability sampling ---- data('election', package = 'survey') # Create matrix to represent the Horvitz-Thompson estimator as a quadratic form ---- n <- nrow(election_pps) pi <- election_jointprob horvitz_thompson_matrix <- matrix(nrow = n, ncol = n) for (i in seq_len(n)) { for (j in seq_len(n)) { horvitz_thompson_matrix[i,j] <- 1 - (pi[i,i] * pi[j,j])/pi[i,j] } } ## Equivalently: horvitz_thompson_matrix <- make_quad_form_matrix( variance_estimator = \"Horvitz-Thompson\", joint_probs = election_jointprob ) # Make generalized bootstrap adjustment factors ---- bootstrap_adjustment_factors <- make_gen_boot_factors( Sigma = horvitz_thompson_matrix, num_replicates = 80, tau = 'auto' ) # Determine replication scale factor for variance estimation ---- tau <- attr(bootstrap_adjustment_factors, 'tau') B <- ncol(bootstrap_adjustment_factors) replication_scaling_constant <- tau^2 / B # Create a replicate design object ---- election_pps_bootstrap_design <- svrepdesign( data = election_pps, weights = 1 / diag(election_jointprob), repweights = bootstrap_adjustment_factors, combined.weights = FALSE, type = \"other\", scale = attr(bootstrap_adjustment_factors, 'scale'), rscales = attr(bootstrap_adjustment_factors, 'rscales') ) # Compare estimates to Horvitz-Thompson estimator ---- election_pps_ht_design <- svydesign( id = ~1, fpc = ~p, data = election_pps, pps = ppsmat(election_jointprob), variance = \"HT\" ) svytotal(x = ~ Bush + Kerry, design = election_pps_bootstrap_design) svytotal(x = ~ Bush + Kerry, design = election_pps_ht_design) }"},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"Several variance estimators exist for designs that use unequal probability sampling without replacement (i.e., PPSWOR). For these designs, variance estimation tends to be more accurate when using an approximation estimator that uses the first-order inclusion probabilities (i.e., the basic sampling weights) and ignores the joint inclusion probabilities. 
This function returns the matrix of the quadratic form used to represent such variance estimators.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"","code":"make_ppswor_approx_matrix(probs, method = \"Deville-1\")"},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"probs A vector of first-order inclusion probabilities method A string specifying the approximation method to use. See the \"Details\" section below. Options include: \"Deville-1\", \"Deville-2\"","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"A symmetric matrix whose dimension matches the length of probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"These variance estimators have been shown to be effective for designs that use a fixed sample size with a high-entropy sampling method. This includes most PPSWOR sampling methods, though unequal-probability systematic sampling is an important exception. These variance estimators generally take the following form: $$ \\hat{v}(\\hat{Y}) = \\sum_{i=1}^{n} c_i (\\breve{y}_i - \\frac{1}{\\sum_{k=1}^{n}c_k}\\sum_{k=1}^{n}c_k \\breve{y}_k)^2 $$ where \\(\\breve{y}_i = y_i/\\pi_i\\) is the weighted value of the variable of interest, and the \\(c_i\\) are constants that depend on the approximation method used. The matrix of the quadratic form, denoted \\(\\Sigma\\), has its \\(ij\\)-th entry defined as follows: $$ \\sigma_{ii} = c_i (1 - \\frac{c_i}{\\sum_{k=1}^{n}c_k}) \\textit{ if } i = j \\\\ \\sigma_{ij}=\\frac{-c_i c_j}{\\sum_{k=1}^{n}c_k} \\textit{ if } i \\neq j \\\\ $$ If \\(\\pi_{i} = 1\\) for every unit, then \\(\\sigma_{ij}=0\\) for all \\(i,j\\). If there is only one sampling unit, then \\(\\sigma_{11}=0\\); that is, the unit is treated as if it was sampled with certainty. The constants \\(c_i\\) are defined for each approximation method as follows, with the names taken directly from Matei and Tillé (2005). \"Deville-1\": $$c_i=\\left(1-\\pi_i\\right) \\frac{n}{n-1}$$ \"Deville-2\": $$c_i = (1-\\pi_i) \\left[1 - \\sum_{k=1}^{n} \\left(\\frac{1-\\pi_k}{\\sum_{k=1}^{n}(1-\\pi_k)}\\right)^2 \\right]^{-1}$$ The approximations \"Deville-1\" and \"Deville-2\" have been shown in the simulation studies of Matei and Tillé (2005) to perform much better in terms of MSE compared to the strictly-unbiased Horvitz-Thompson and Yates-Grundy variance estimators. 
In the case of simple random sampling without replacement (SRSWOR), these estimators are identical to the usual Horvitz-Thompson variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_ppswor_approx_matrix.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Create a quadratic form's matrix to represent a variance estimator\nfor PPSWOR designs, based on commonly-used approximations — make_ppswor_approx_matrix","text":"Matei, Alina, and Yves Tillé. 2005. “Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size.” Journal of Official Statistics 21(4):543–70.","code":""}
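A small check (not from the package documentation) that the "Deville-1" constants above reproduce the packaged quadratic form, assuming the function follows the published formulas exactly.

library(svrep)
pi <- c(0.2, 0.4, 0.6, 0.8)                      # toy first-order inclusion probabilities
n <- length(pi)
c_d1 <- (1 - pi) * n / (n - 1)                   # "Deville-1" constants
Sigma_manual <- diag(c_d1) - outer(c_d1, c_d1) / sum(c_d1)
Sigma_pkg <- make_ppswor_approx_matrix(probs = pi, method = "Deville-1")
all.equal(Sigma_manual, Sigma_pkg, check.attributes = FALSE)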
\"SD1\": non-circular successive-differences variance estimator described Ash (2014), sometimes used variance estimation systematic sampling. \"SD2\": circular successive-differences variance estimator described Ash (2014). estimator basis \"successive-differences replication\" estimator commonly used variance estimation systematic sampling. \"Deville-Tille\": estimator Deville Tillé (2005), developed balanced sampling using cube method. probs Required variance_estimator equals \"Deville-1\", \"Deville-2\", \"Breidt-Chauvet\". matrix data frame sampling probabilities. multiple stages sampling, probs can multiple columns, one column level sampling accounted variance estimator. joint_probs used variance_estimator = \"Horvitz-Thompson\" variance_estimator = \"Yates-Grundy\". matrix joint inclusion probabilities. Element [,] matrix first-order inclusion probability unit , element [,j] joint inclusion probability units j. cluster_ids Required unless variance_estimator equals \"Horvitz-Thompson\" \"Yates-Grundy\". matrix data frame cluster IDs. multiple stages sampling, cluster_ids can multiple columns, one column level sampling accounted variance estimator. strata_ids Required variance_estimator equals \"Stratified Multistage SRS\" \"Ultimate Cluster\". matrix data frame strata IDs. multiple stages sampling, strata_ids can multiple columns, one column level sampling accounted variance estimator. strata_pop_sizes Required variance_estimator equals \"Stratified Multistage SRS\", can optionally used variance_estimator equals \"Ultimate Cluster\", \"SD1\", \"SD2\". multiple stages sampling, strata_pop_sizes can multiple columns, one column level sampling accounted variance estimator. sort_order Required variance_estimator equals \"SD1\" \"SD2\". vector orders rows data order used sampling. aux_vars Required variance_estimator equals \"Deville-Tille\". 
A matrix of auxiliary variables.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"A matrix of the quadratic form representing the variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"variance-estimators","dir":"Reference","previous_headings":"","what":"Variance Estimators","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"See variance-estimators for a description of each variance estimator.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"arguments-required-for-each-variance-estimator","dir":"Reference","previous_headings":"","what":"Arguments required for each variance estimator","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"Below are the arguments that are required or optional for each variance estimator.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_quad_form_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Represent a variance estimator as a quadratic form — make_quad_form_matrix","text":"","code":"if (FALSE) { # Example 1: The Horvitz-Thompson Estimator library(survey) data(\"election\", package = \"survey\") ht_quad_form_matrix <- make_quad_form_matrix(variance_estimator = \"Horvitz-Thompson\", joint_probs = election_jointprob) ##_ Produce variance estimate wtd_y <- as.matrix(election_pps$wt * election_pps$Bush) t(wtd_y) %*% ht_quad_form_matrix %*% wtd_y ##_ Compare against result from 'survey' package svytotal(x = ~ Bush, design = svydesign(data=election_pps, variance = \"HT\", pps = ppsmat(election_jointprob), ids = ~ 1, fpc = ~ p)) |> vcov() # Example 2: Stratified multistage Sample ---- data(\"mu284\", package = 'survey') multistage_srswor_design <- svydesign(data = mu284, ids = ~ id1 + id2, fpc = ~ n1 + n2) multistage_srs_quad_form <- make_quad_form_matrix( variance_estimator = \"Stratified Multistage SRS\", cluster_ids = mu284[,c('id1', 'id2')], strata_ids = matrix(1, nrow = nrow(mu284), ncol = 2), strata_pop_sizes = mu284[,c('n1', 'n2')] ) wtd_y <- as.matrix(weights(multistage_srswor_design) * mu284$y1) t(wtd_y) %*% multistage_srs_quad_form %*% wtd_y ##_ Compare against result from 'survey' package svytotal(x = ~ y1, design = multistage_srswor_design) |> vcov() # Example 3: Successive-differences estimator ---- data('library_stsys_sample', package = 'svrep') sd1_quad_form <- make_quad_form_matrix( variance_estimator = 'SD1', cluster_ids = library_stsys_sample[,'FSCSKEY',drop=FALSE], strata_ids = library_stsys_sample[,'SAMPLING_STRATUM',drop=FALSE], strata_pop_sizes = library_stsys_sample[,'STRATUM_POP_SIZE',drop=FALSE], sort_order = library_stsys_sample[['SAMPLING_SORT_ORDER']] ) wtd_y <- as.matrix(library_stsys_sample[['TOTCIR']] / library_stsys_sample$SAMPLING_PROB) wtd_y[is.na(wtd_y)] <- 0 t(wtd_y) %*% sd1_quad_form %*% wtd_y # Example 4: Deville estimators ---- data('library_multistage_sample', package = 'svrep') deville_quad_form <- make_quad_form_matrix( variance_estimator = 'Deville-1', cluster_ids = library_multistage_sample[,c(\"PSU_ID\", \"SSU_ID\")], strata_ids = cbind(rep(1, times = nrow(library_multistage_sample)), library_multistage_sample$PSU_ID), probs = library_multistage_sample[,c(\"PSU_SAMPLING_PROB\", \"SSU_SAMPLING_PROB\")] ) 
}"},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":null,"dir":"Reference","previous_headings":"","what":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"Creates bootstrap replicate weights multistage stratified sample design using method Beaumont Émond (2022), generalization Rao-Wu-Yue bootstrap. design may different sampling methods used different stages. stage sampling may potentially use unequal probabilities (without replacement) may potentially use Poisson sampling.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"","code":"make_rwyb_bootstrap_weights( num_replicates = 100, samp_unit_ids, strata_ids, samp_unit_sel_probs, samp_method_by_stage = rep(\"PPSWOR\", times = ncol(samp_unit_ids)), allow_final_stage_singletons = TRUE, output = \"weights\" )"},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"num_replicates Positive integer giving number bootstrap replicates create samp_unit_ids Matrix data frame sampling unit IDs stage sampling strata_ids Matrix data frame strata IDs sampling unit stage sampling samp_unit_sel_probs Matrix data frame selection probabilities sampling unit stage sampling. samp_method_by_stage vector length equal number stages sampling, corresponding number columns samp_unit_ids. describes method sampling used stage. element one following: \"SRSWOR\" - Simple random sampling, without replacement \"SRSWR\" - Simple random sampling, replacement \"PPSWOR\" - Unequal probabilities selection, without replacement \"PPSWR\" - Unequal probabilities selection, replacement \"Poisson\" - Poisson sampling: sampling unit selected sample , potentially different probabilities inclusion sampling unit. allow_final_stage_singletons Logical value indicating whether allow non-certainty singleton strata final sampling stage (rather throw error message). TRUE, sampling unit non-certainty singleton stratum final-stage adjustment factor calculated selected certainty final stage (.e., adjustment factor 1), final bootstrap weight calculated combining adjustment factor final-stage selection probability. output Either \"weights\" (default) \"factors\". Specifying output = \"factors\" returns matrix replicate adjustment factors can later multiplied full-sample weights produce matrix replicate weights. 
Specifying output = \"weights\" returns matrix replicate weights, full-sample weights inferred using samp_unit_sel_probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"matrix number rows samp_unit_ids number columns equal value argument num_replicates. Specifying output = \"factors\" returns matrix replicate adjustment factors can later multiplied full-sample weights produce matrix replicate weights. Specifying output = \"weights\" returns matrix replicate weights, full-sample weights inferred using samp_unit_sel_probs.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"Beaumont Émond (2022) describe general algorithm forming bootstrap replicate weights multistage stratified samples, based method Rao-Wu-Yue, extensions sampling without replacement use unequal probabilities selection (.e., sampling probability proportional size) well Poisson sampling. methods guaranteed produce nonnegative replicate weights provide design-unbiased design-consistent variance estimates totals, designs sampling uses one following methods: \"SRSWOR\" - Simple random sampling, without replacement \"SRSWR\" - Simple random sampling, replacement \"PPSWR\" - Unequal probabilities selection, replacement \"Poisson\" - Poisson sampling: sampling unit selected sample , potentially different probabilities inclusion sampling unit. designs least one stage's strata sampling without replacement unequal probabilities selection (\"PPSWOR\"), bootstrap method Beaumont Émond (2022) guaranteed produce nonnegative weights, design-unbiased, since method approximates joint selection probabilities needed unbiased estimation. Unless stages use simple random sampling without replacement, resulting bootstrap replicate weights guaranteed strictly positive, may useful calibration analyses domains small sample sizes. stages use simple random sampling without replacement, possible replicate weights zero. survey nonresponse, may useful represent response/nonresponse additional stage sampling, sampling conducted Poisson sampling unit's \"selection probability\" stage response propensity (typically estimated).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"Beaumont, J.-F.; Émond, N. (2022). \"Bootstrap Variance Estimation Method Multistage Sampling Two-Phase Sampling Poisson Sampling Used Second Phase.\" Stats, 5: 339–357. https://doi.org/10.3390/stats5020019 Rao, J.N.K.; Wu, C.F.J.; Yue, K. (1992). \"recent work resampling methods complex surveys.\" Surv. 
,{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_rwyb_bootstrap_weights.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create bootstrap replicate weights for a general survey design,\nusing the Rao-Wu-Yue-Beaumont bootstrap method — make_rwyb_bootstrap_weights","text":"","code":"if (FALSE) { library(survey) # Example 1: A multistage sample with two stages of SRSWOR ## Load an example dataset from a multistage sample, with two stages of SRSWOR data(\"mu284\", package = 'survey') multistage_srswor_design <- svydesign(data = mu284, ids = ~ id1 + id2, fpc = ~ n1 + n2) ## Create bootstrap replicate weights set.seed(2022) bootstrap_replicate_weights <- make_rwyb_bootstrap_weights( num_replicates = 5000, samp_unit_ids = multistage_srswor_design$cluster, strata_ids = multistage_srswor_design$strata, samp_unit_sel_probs = multistage_srswor_design$fpc$sampsize / multistage_srswor_design$fpc$popsize, samp_method_by_stage = c(\"SRSWOR\", \"SRSWOR\") ) ## Create a replicate design object with the survey package bootstrap_rep_design <- svrepdesign( data = multistage_srswor_design$variables, repweights = bootstrap_replicate_weights, weights = weights(multistage_srswor_design, type = \"sampling\"), type = \"bootstrap\" ) ## Compare std. error estimates from bootstrap versus linearization data.frame( 'Statistic' = c('total', 'mean', 'median'), 'SE (bootstrap)' = c(SE(svytotal(x = ~ y1, design = bootstrap_rep_design)), SE(svymean(x = ~ y1, design = bootstrap_rep_design)), SE(svyquantile(x = ~ y1, quantile = 0.5, design = bootstrap_rep_design))), 'SE (linearization)' = c(SE(svytotal(x = ~ y1, design = multistage_srswor_design)), SE(svymean(x = ~ y1, design = multistage_srswor_design)), SE(svyquantile(x = ~ y1, quantile = 0.5, design = multistage_srswor_design))), check.names = FALSE ) # Example 2: A single-stage sample selected with unequal probabilities, without replacement ## Load an example dataset of U.S. counties and states with 2004 Presidential vote counts data(\"election\", package = 'survey') pps_wor_design <- svydesign(data = election_pps, pps = \"overton\", fpc = ~ p, # Inclusion probabilities ids = ~ 1) ## Create bootstrap replicate weights set.seed(2022) bootstrap_replicate_weights <- make_rwyb_bootstrap_weights( num_replicates = 5000, samp_unit_ids = pps_wor_design$cluster, strata_ids = pps_wor_design$strata, samp_unit_sel_probs = pps_wor_design$prob, samp_method_by_stage = c(\"PPSWOR\") ) ## Create a replicate design object with the survey package bootstrap_rep_design <- svrepdesign( data = pps_wor_design$variables, repweights = bootstrap_replicate_weights, weights = weights(pps_wor_design, type = \"sampling\"), type = \"bootstrap\" ) ## Compare std. error estimates from bootstrap versus linearization data.frame( 'Statistic' = c('total', 'mean'), 'SE (bootstrap)' = c(SE(svytotal(x = ~ Bush, design = bootstrap_rep_design)), SE(svymean(x = ~ I(Bush/votes), design = bootstrap_rep_design))), 'SE (Overton\\'s PPS approximation)' = c(SE(svytotal(x = ~ Bush, design = pps_wor_design)), SE(svymean(x = ~ I(Bush/votes), design = pps_wor_design))), check.names = FALSE ) }"},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"A successive-difference variance estimator can be represented as a quadratic form. This function determines the matrix of the quadratic form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"","code":"make_sd_matrix(n, f = 0, type = \"SD1\")"},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"n Number of rows and columns for the matrix. f A single number between 0 and 1, representing the sampling fraction. The default value is 0. type Either \"SD1\" or \"SD2\". See the \"Details\" section for definitions.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"A matrix of dimension n.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"Ash (2014) describes each estimator as follows: $$ \\hat{v}_{SD1}(\\hat{Y}) = (1-f) \\frac{n}{2(n-1)} \\sum_{k=2}^n\\left(\\breve{y}_k-\\breve{y}_{k-1}\\right)^2 $$ $$ \\hat{v}_{SD2}(\\hat{Y}) = \\frac{1}{2}(1-f)\\left[\\sum_{k=2}^n\\left(\\breve{y}_k-\\breve{y}_{k-1}\\right)^2+\\left(\\breve{y}_n-\\breve{y}_1\\right)^2\\right] $$ where \\(\\breve{y}_k\\) is the weighted value \\(y_k/\\pi_k\\) of unit \\(k\\) with selection probability \\(\\pi_k\\), and \\(f\\) is the sampling fraction \\(\\frac{n}{N}\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_sd_matrix.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Create a quadratic form's matrix to represent a successive-difference variance estimator — make_sd_matrix","text":"Ash, S. (2014). \"Using successive difference replication for estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59.","code":""}
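Since this entry has no examples section, a small sketch may help. It assumes only the usage shown above; the manual comparison simply re-applies the SD1 formula from the Details, so agreement is an expectation about the implementation rather than a documented guarantee.

library(svrep)
# Quadratic form matrix for the SD1 estimator with n = 4 and no sampling fraction
Sigma <- make_sd_matrix(n = 4, f = 0, type = "SD1")
y_breve <- c(12, 15, 11, 14)                    # weighted values y_k / pi_k
quad_form <- as.numeric(t(y_breve) %*% Sigma %*% y_breve)
manual <- (4 / (2 * 3)) * sum(diff(y_breve)^2)  # the SD1 formula written out
all.equal(quad_form, manual)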
\"Using successive difference replication estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"usual variance estimator simple random sampling without replacement can represented quadratic form. function determines matrix quadratic form.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"","code":"make_srswor_matrix(n, f = 0)"},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"n Sample size f single number 0 1, representing sampling fraction. Default value 0.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"symmetric matrix dimension n","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_srswor_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a quadratic form's matrix to represent the basic variance estimator\nfor a total under simple random sampling without replacement — make_srswor_matrix","text":"basic variance estimator total simple random sampling without replacement follows: $$ \\hat{v}(\\hat{Y}) = (1 - f)\\frac{n}{n - 1} \\sum_{=1}^{n} (y_i - \\bar{y})^2 $$ \\(f\\) sampling fraction \\(\\frac{n}{N}\\). \\(f=0\\), matrix quadratic form non-diagonal elements equal \\(-(n-1)^{-1}\\), diagonal elements equal \\(1\\). \\(f > 0\\), element multiplied \\((1-f)\\). 
,{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":null,"dir":"Reference","previous_headings":"","what":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"This function combines the quadratic forms from each phase of a two-phase design, so that the combined variance of the entire two-phase sampling design can be estimated.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"","code":"make_twophase_quad_form( sigma_1, sigma_2, phase_2_joint_probs, ensure_psd = TRUE )"},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"sigma_1 The quadratic form for the first-phase variance estimator, subsetted to only include cases selected in the phase-two sample. sigma_2 The quadratic form for the second-phase variance estimator, conditional on the selection of the first-phase sample. phase_2_joint_probs Matrix of conditional joint inclusion probabilities for the second phase, given the selected first-phase sample. ensure_psd If TRUE (the default), ensures that the result is a positive semidefinite matrix. This is necessary if the quadratic form is used as an input for replication methods such as the generalized bootstrap. For details, see the help section entitled \"Ensuring the Result is Positive Semidefinite\".","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"A quadratic form matrix that can be used to estimate the sampling variance from a two-phase sample design.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"The two-phase variance estimator has a quadratic form matrix \\(\\boldsymbol{\\Sigma}_{ab}\\) given by: $$ \\boldsymbol{\\Sigma}_{ab} = {W}^{-1}_b(\\boldsymbol{\\Sigma}_{a^\\prime} \\circ D_b ){W}^{-1}_b + \\boldsymbol{\\Sigma}_b $$ The first term estimates the variance contribution from the first phase of sampling, while the second term estimates the variance contribution from the second phase of sampling. The full quadratic form of the variance estimator is: $$ v(\\hat{t_y}) = \\breve{\\breve{y^{'}}} \\boldsymbol{\\Sigma}_{ab} \\breve{\\breve{y}} $$ where the weighted variable \\(\\breve{\\breve{y}}_k = \\frac{y_k}{\\pi_{ak}\\pi_{bk}}\\) is formed using the first-phase inclusion probability, denoted \\(\\pi_{ak}\\), and the conditional second-phase inclusion probability (given the selected first-phase sample), denoted \\(\\pi_{bk}\\). The notation for this estimator is as follows: \\(n_a\\) denotes the first-phase sample size. \\(n_b\\) denotes the second-phase sample size. \\(\\boldsymbol{\\Sigma}_a\\) denotes the matrix of dimension \\(n_a \\times n_a\\) representing the quadratic form for the variance estimator used for the full first-phase design. \\(\\boldsymbol{\\Sigma}_{a^\\prime}\\) denotes the matrix of dimension \\(n_b \\times n_b\\) formed by subsetting the rows and columns of \\(\\boldsymbol{\\Sigma}_a\\) to only include cases selected in the second-phase sample. \\(\\boldsymbol{\\Sigma}_{b}\\) denotes the matrix of dimension \\(n_b \\times n_b\\) representing the Horvitz-Thompson estimator of variance for the second-phase sample, conditional on the selected first-phase sample. \\(\\boldsymbol{D}_b\\) denotes the \\(n_b \\times n_b\\) matrix of weights formed by the inverses of the second-phase joint inclusion probabilities, with element \\(kl\\) equal to \\(\\pi_{bkl}^{-1}\\), where \\(\\pi_{bkl}\\) is the conditional probability that units \\(k\\) and \\(l\\) are both included in the second-phase sample, given the selected first-phase sample. Note that this matrix is often not positive semidefinite, and so the two-phase variance estimator has a quadratic form which is not necessarily positive semidefinite. \\(\\boldsymbol{W}_b\\) denotes the diagonal \\(n_b \\times n_b\\) matrix whose \\(k\\)-th diagonal entry is the second-phase weight \\(\\pi_{bk}^{-1}\\), where \\(\\pi_{bk}\\) is the conditional probability that unit \\(k\\) is included in the second-phase sample, given the selected first-phase sample.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"ensuring-the-result-is-positive-semidefinite","dir":"Reference","previous_headings":"","what":"Ensuring the Result is Positive Semidefinite","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"Note that the matrix \\((\\boldsymbol{\\Sigma}_{a^\\prime} \\circ D_b )\\) may not be positive semidefinite, since the matrix \\(D_b\\) is not guaranteed to be positive semidefinite. If \\((\\boldsymbol{\\Sigma}_{a^\\prime} \\circ D_b )\\) is found not to be positive semidefinite, then it is approximated by the nearest positive semidefinite matrix in the Frobenius norm, using the method of Higham (1988). This approximation is discussed by Beaumont and Patak (2012) in the context of forming replicate weights for two-phase samples. The authors argue that this approximation should lead to only a small overestimation of variance. Since \\((\\boldsymbol{\\Sigma}_{a^\\prime} \\circ D_b )\\) is a real, symmetric matrix, this approximation is equivalent to \"zeroing out\" its negative eigenvalues. To be more precise, denote \\(A=(\\boldsymbol{\\Sigma}_{a^\\prime} \\circ D_b )\\). Then we can form the spectral decomposition \\(A=\\Gamma \\Lambda \\Gamma^{\\prime}\\), where \\(\\Lambda\\) is the diagonal matrix whose entries are the eigenvalues of \\(A\\). The method of Higham (1988) is to approximate \\(A\\) with \\(\\tilde{A} = \\Gamma \\Lambda_{+} \\Gamma^{\\prime}\\), where the \\(ii\\)-th entry of \\(\\Lambda_{+}\\) is \\(\\max(\\Lambda_{ii}, 0)\\).","code":""}
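The eigenvalue "zeroing" described above can be written in a few lines of base R. This is a generic sketch of the Higham (1988) projection; project_to_psd is a hypothetical helper name, not a function exported by svrep.

project_to_psd <- function(A) {
  # Spectral decomposition of the symmetrized matrix
  eig <- eigen((A + t(A)) / 2, symmetric = TRUE)
  # Replace negative eigenvalues with zero, then reassemble
  lambda_plus <- pmax(eig$values, 0)
  eig$vectors %*% diag(lambda_plus) %*% t(eig$vectors)
}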
\"Model Assisted Survey Sampling.\" Springer New York. Tillé, Y. (2020). \"Sampling estimation finite populations.\" (. Hekimi, Trans.). Wiley.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/make_twophase_quad_form.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Combine quadratic forms from each phase of a two phase design — make_twophase_quad_form","text":"","code":"if (FALSE) { ## ---------------------- Example 1 ------------------------## ## First phase is a stratified multistage sample ## ## Second phase is a simple random sample ## ##----------------------------------------------------------## data('library_multistage_sample', package = 'svrep') # Load first-phase sample twophase_sample <- library_multistage_sample # Select second-phase sample set.seed(2022) twophase_sample[['SECOND_PHASE_SELECTION']] <- sampling::srswor( n = 100, N = nrow(twophase_sample) ) |> as.logical() # Declare survey design twophase_design <- twophase( method = \"full\", data = twophase_sample, # Identify the subset of first-phase elements # which were selected into the second-phase sample subset = ~ SECOND_PHASE_SELECTION, # Describe clusters, probabilities, and population sizes # at each phase of sampling id = list(~ PSU_ID + SSU_ID, ~ 1), probs = list(~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB, NULL), fpc = list(~ PSU_POP_SIZE + SSU_POP_SIZE, NULL) ) # Get quadratic form matrix for the first phase design first_phase_sigma <- get_design_quad_form( design = twophase_design$phase1$full, variance_estimator = \"Stratified Multistage SRS\" ) # Subset to only include cases sampled in second phase first_phase_sigma <- first_phase_sigma[twophase_design$subset, twophase_design$subset] # Get quadratic form matrix for the second-phase design second_phase_sigma <- get_design_quad_form( design = twophase_design$phase2, variance_estimator = \"Ultimate Cluster\" ) # Get second-phase joint probabilities n <- twophase_design$phase2$fpc$sampsize[1,1] N <- twophase_design$phase2$fpc$popsize[1,1] second_phase_joint_probs <- Matrix::Matrix((n/N)*((n-1)/(N-1)), nrow = n, ncol = n) diag(second_phase_joint_probs) <- rep(n/N, times = n) # Get quadratic form for entire two-phase variance estimator twophase_quad_form <- make_twophase_quad_form( sigma_1 = first_phase_sigma, sigma_2 = second_phase_sigma, phase_2_joint_probs = second_phase_joint_probs ) # Use for variance estimation rep_factors <- make_gen_boot_factors( Sigma = twophase_quad_form, num_replicates = 500 ) library(survey) combined_weights <- 1/twophase_design$prob twophase_rep_design <- svrepdesign( data = twophase_sample |> subset(SECOND_PHASE_SELECTION), type = 'other', repweights = rep_factors, weights = combined_weights, combined.weights = FALSE, scale = attr(rep_factors, 'scale'), rscales = attr(rep_factors, 'rscales') ) svymean(x = ~ LIBRARIA, design = twophase_rep_design) ## ---------------------- Example 2 ------------------------## ## First phase is a stratified systematic sample ## ## Second phase is nonresponse, modeled as Poisson sampling ## ##----------------------------------------------------------## data('library_stsys_sample', package = 'svrep') # Determine quadratic form for full first-phase sample variance estimator full_phase1_quad_form <- make_quad_form_matrix( variance_estimator = \"SD2\", cluster_ids = library_stsys_sample[,'FSCSKEY',drop=FALSE], strata_ids = library_stsys_sample[,'SAMPLING_STRATUM',drop=FALSE], strata_pop_sizes = 
library_stsys_sample[,'STRATUM_POP_SIZE',drop=FALSE], sort_order = library_stsys_sample$SAMPLING_SORT_ORDER ) # Identify cases included in phase two sample # (in this example, respondents) phase2_inclusion <- ( library_stsys_sample$RESPONSE_STATUS == \"Survey Respondent\" ) phase2_sample <- library_stsys_sample[phase2_inclusion,] # Estimate response propensities response_propensities <- glm( data = library_stsys_sample, family = quasibinomial('logit'), formula = phase2_inclusion ~ 1, weights = 1/library_stsys_sample$SAMPLING_PROB ) |> predict(type = \"response\", newdata = phase2_sample) # Estimate conditional joint inclusion probabilities for second phase phase2_joint_probs <- outer(response_propensities, response_propensities) diag(phase2_joint_probs) <- response_propensities # Determine quadratic form for variance estimator of second phase # (Horvitz-Thompson estimator for nonresponse modeled as Poisson sampling) phase2_quad_form <- make_quad_form_matrix( variance_estimator = \"Horvitz-Thompson\", joint_probs = phase2_joint_probs ) # Create combined quadratic form for entire design twophase_quad_form <- make_twophase_quad_form( sigma_1 = full_phase1_quad_form[phase2_inclusion, phase2_inclusion], sigma_2 = phase2_quad_form, phase_2_joint_probs = phase2_joint_probs ) combined_weights <- 1/(phase2_sample$SAMPLING_PROB * response_propensities) # Use for variance estimation rep_factors <- make_gen_boot_factors( Sigma = twophase_quad_form, num_replicates = 500 ) library(survey) twophase_rep_design <- svrepdesign( data = phase2_sample, type = 'other', repweights = rep_factors, weights = combined_weights, combined.weights = FALSE, scale = attr(rep_factors, 'scale'), rscales = attr(rep_factors, 'rscales') ) svymean(x = ~ LIBRARIA, design = twophase_rep_design) }"},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":null,"dir":"Reference","previous_headings":"","what":"Redistribute weight from one group to another — redistribute_weights","title":"Redistribute weight from one group to another — redistribute_weights","text":"Redistributes weight from one group to another: for example, from non-respondents to respondents. Redistribution is conducted for the full-sample weights as well as each set of replicate weights. This can be done separately for each combination of a set of grouping variables, for example to implement a nonresponse weighting class adjustment.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Redistribute weight from one group to another — redistribute_weights","text":"","code":"redistribute_weights(design, reduce_if, increase_if, by)"},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Redistribute weight from one group to another — redistribute_weights","text":"design A survey design object, created with either the survey or srvyr packages. reduce_if An expression indicating which cases will have their weights set to zero. Must evaluate to a logical vector with values of TRUE or FALSE. increase_if An expression indicating which cases will have their weights increased. Must evaluate to a logical vector with values of TRUE or FALSE. by (Optional) A character vector with the names of variables used to group the redistribution of weights. For example, if the data include variables named \"stratum\" and \"wt_class\", one could specify by = c(\"stratum\", \"wt_class\").","code":""}
,{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Redistribute weight from one group to another — redistribute_weights","text":"A survey design object, with updated full-sample weights and updated replicate weights. The resulting survey design object always has its value of combined.weights set to TRUE.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Redistribute weight from one group to another — redistribute_weights","text":"See Chapter 2 of Heeringa, West, and Berglund (2017) or Chapter 13 of Valliant, Dever, and Kreuter (2018) for an overview of nonresponse adjustment methods based on redistributing weights. - Heeringa, S., West, B., Berglund, P. (2017). \"Applied Survey Data Analysis, 2nd edition.\" Boca Raton, FL: CRC Press. - Valliant, R., Dever, J., Kreuter, F. (2018). \"Practical Tools for Designing and Weighting Survey Samples, 2nd edition.\" New York: Springer.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/redistribute_weights.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Redistribute weight from one group to another — redistribute_weights","text":"","code":"# Load example data suppressPackageStartupMessages(library(survey)) data(api) dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) dclus1$variables$response_status <- sample(x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), size = nrow(dclus1), replace = TRUE) rep_design <- as.svrepdesign(dclus1) # Adjust weights for cases with unknown eligibility ue_adjusted_design <- redistribute_weights( design = rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\"), by = c(\"stype\") ) # Adjust weights for nonresponse nr_adjusted_design <- redistribute_weights( design = ue_adjusted_design, reduce_if = response_status %in% c(\"Nonrespondent\"), increase_if = response_status == \"Respondent\", by = c(\"stype\") )"}
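As a plain-R illustration of the arithmetic behind this kind of redistribution (a sketch of the idea, not the package's internal code): within each group, the weight removed from the reduced cases is spread proportionally over the increased cases, so the total weight is preserved.

w    <- c(10, 10, 10, 10)            # weights within one weighting class
resp <- c(TRUE, TRUE, FALSE, FALSE)  # respondents absorb the nonrespondents' weight
new_w <- ifelse(resp, w * sum(w) / sum(w[resp]), 0)
all.equal(sum(new_w), sum(w))        # the total weight is unchanged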
,{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":null,"dir":"Reference","previous_headings":"","what":"Rescale replicate factors — rescale_reps","title":"Rescale replicate factors — rescale_reps","text":"Rescale replicate factors. The main application of this rescaling is to ensure that all replicate weights are strictly positive. Note that this rescaling has no impact on variance estimates for totals (or other linear statistics), although variance estimates for nonlinear statistics will be affected by the rescaling.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Rescale replicate factors — rescale_reps","text":"","code":"rescale_reps(x, tau = NULL, min_wgt = 0.01, digits = 2)"},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Rescale replicate factors — rescale_reps","text":"x Either a replicate survey design object, or a numeric matrix of replicate weights. tau Either a single positive number, or NULL. This is the rescaling constant \\(\\tau\\) used in the transformation \\(\\frac{w + \\tau - 1}{\\tau}\\), where \\(w\\) is the original weight. If tau=NULL or tau is left unspecified, then the argument min_wgt is used instead, and in that case, \\(\\tau\\) is automatically set to the smallest value needed to rescale the replicate weights so that they are all at least min_wgt. min_wgt Only used if tau=NULL or if tau is left unspecified. Specifies the minimum acceptable value for the rescaled weights, which is used to automatically determine the value of \\(\\tau\\) used in the transformation \\(\\frac{w + \\tau - 1}{\\tau}\\), where \\(w\\) is the original weight. Must be at least zero and must be less than one. digits Only used if the argument min_wgt is used. Specifies the number of decimal places to use in choosing tau. Using a smaller number of digits is useful for simply producing easier-to-read documentation.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Rescale replicate factors — rescale_reps","text":"If the input is a numeric matrix, returns the rescaled matrix. If the input is a replicate survey design object, returns an updated replicate survey design object. For a replicate survey design object, the results depend on whether the object has a matrix of replicate factors rather than a matrix of replicate weights (the latter being the product of replicate factors and sampling weights). If the design object has combined.weights=FALSE, then the replication factors are adjusted. If the design object has combined.weights=TRUE, then the replicate weights are adjusted. It is strongly recommended to only use this rescaling method for replication factors rather than for the weights. For a replicate survey design object, the scale element of the design object will be updated appropriately, and an element tau will also be added. If the input is a matrix instead of a survey design object, the resulting matrix will have an attribute named tau which can be retrieved using attr(x, 'tau').","code":""},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Rescale replicate factors — rescale_reps","text":"Let \\(\\mathbf{A} = \\left[ \\mathbf{a}^{(1)} \\cdots \\mathbf{a}^{(b)} \\cdots \\mathbf{a}^{(B)} \\right]\\) denote the \\((n \\times B)\\) matrix of replicate adjustment factors. To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors \\(\\mathbf{A}^S\\) by rescaling each adjustment factor \\(a_k^{(b)}\\) as follows: $$ a_k^{S,(b)} = \\frac{a_k^{(b)} + \\tau - 1}{\\tau} $$ where \\(\\tau \\geq 1 - a_k^{(b)} \\geq 1\\) for all \\(k\\) in \\(\\left\\{ 1,\\ldots,n \\right\\}\\) and all \\(b\\) in \\(\\left\\{1, \\ldots, B\\right\\}\\). The value of \\(\\tau\\) can be set based on the realized adjustment factor matrix \\(\\mathbf{A}\\), or by choosing \\(\\tau\\) prior to generating the adjustment factor matrix \\(\\mathbf{A}\\) so that \\(\\tau\\) is likely to be large enough to prevent negative adjustment factors. If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used in estimating the variance with the bootstrap replicates. For example, for bootstrap replicates, the adjustment factor becomes \\(\\frac{\\tau^2}{B}\\) instead of \\(\\frac{1}{B}\\). $$ \\textbf{Prior to rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{1}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{*(b)}-\\hat{T}_y\\right)^2 $$ $$ \\textbf{After rescaling: } v_B\\left(\\hat{T}_y\\right) = \\frac{\\tau^2}{B}\\sum_{b=1}^B\\left(\\hat{T}_y^{S*(b)}-\\hat{T}_y\\right)^2 $$","code":""}
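To see how min_wgt determines \(\tau\): the rescaled weight \(\frac{w + \tau - 1}{\tau}\) is at least \(m\) for every weight \(w\) exactly when \(\tau \geq \frac{1 - \min(w)}{1 - m}\). A sketch of that calculation is below; smallest_tau is a hypothetical helper, and the rounding-up step is an assumption about how the digits argument is used.

smallest_tau <- function(rep_factors, min_wgt = 0.01, digits = 2) {
  tau <- (1 - min(rep_factors)) / (1 - min_wgt)
  ceiling(tau * 10^digits) / 10^digits  # round up so the bound still holds
}
# With a smallest replicate factor of -0.25608700039304 (as in Example 1 below),
# this yields 1.27, matching the 'tau' attribute shown there
smallest_tau(-0.25608700039304)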
,{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Rescale replicate factors — rescale_reps","text":"This method was suggested by Fay (1989) for the specific application of creating replicate factors using his generalized replication method. Beaumont and Patak (2012) provided an extended discussion of this rescaling method in the context of rescaling generalized bootstrap replication factors to avoid negative replicate weights. The notation used in this documentation is taken from Beaumont and Patak (2012). - Beaumont, Jean-François, and Zdenek Patak. 2012. \"On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling: Generalized Bootstrap for Sample Surveys.\" International Statistical Review 80 (1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x. - Fay, Robert. 1989. \"Theory and Application of Replicate Weighting for Variance Calculations.\" In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1989_033.pdf","code":""},{"path":"https://bschneidr.github.io/svrep/reference/rescale_reps.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Rescale replicate factors — rescale_reps","text":"","code":"# Example 1: Rescaling a matrix of replicate weights to avoid negative weights rep_wgts <- matrix( c(1.69742746694909, -0.230761178913411, 1.53333377634192, 0.0495043413294782, 1.81820367441039, 1.13229198793703, 1.62482013925955, 1.0866133494029, 0.28856654131668, 0.581930729719006, 0.91827012312825, 1.49979905894482, 1.26281337410693, 1.99327362761477, -0.25608700039304), nrow = 3, ncol = 5 ) rescaled_wgts <- rescale_reps(rep_wgts, min_wgt = 0.01) print(rep_wgts) #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1.6974275 0.04950434 1.6248201 0.5819307 1.262813 #> [2,] -0.2307612 1.81820367 1.0866133 0.9182701 1.993274 #> [3,] 1.5333338 1.13229199 0.2885665 1.4997991 -0.256087 print(rescaled_wgts) #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1.54915549 0.2515782 1.4919844 0.6708116 1.20693966 #> [2,] 0.03089671 1.6442549 1.0681995 0.9356458 1.78210522 #> [3,] 1.41994786 1.1041669 0.4398162 1.3935426 0.01095512 #> attr(,\"tau\") #> [1] 1.27 # Example 2: Rescaling replicate weights with a specified value of 'tau' rescaled_wgts <- rescale_reps(rep_wgts, tau = 2) print(rescaled_wgts) #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1.3487137 0.5247522 1.3124101 0.7909654 1.1314067 #> [2,] 0.3846194 1.4091018 1.0433067 0.9591351 1.4966368 #> [3,] 1.2666669 1.0661460 0.6442833 1.2498995 0.3719565 #> attr(,\"tau\") #> [1] 2 # Example 3: Rescaling replicate weights of a survey design object set.seed(2023) library(survey) data('mu284', package = 'survey') ## First create a bootstrap design object svy_design_object <- svydesign( data = mu284, ids = ~ id1 + id2, fpc = ~ n1 + n2 ) boot_design <- as_gen_boot_design( design = svy_design_object, variance_estimator = \"Stratified Multistage SRS\", replicates = 5, tau = 1 ) ## Rescale the weights rescaled_boot_design <- boot_design |> rescale_reps(min_wgt = 0.01) boot_wgts <- weights(boot_design, \"analysis\") rescaled_boot_wgts <- weights(rescaled_boot_design, 'analysis') print(boot_wgts) #> REP_1 REP_2 REP_3 REP_4 REP_5 #> [1,] 34.071074 -3.352195 7.031013 35.4547244 18.681422 #> [2,] -3.271131 12.579037 57.474328 9.3992013 25.014379 #> [3,] 12.204302 16.611771 14.029208 6.9869038 -8.727739 #> [4,] 40.124053 62.587721 29.834150 31.6263955 10.057763 #> [5,] 6.857688 48.936835 5.029175 42.1974205 67.126670 #> [6,] 38.866284 -7.883877 6.363613 35.3323662 14.104502 #> [7,] -2.705981 5.310800 51.191780 -18.8838183 34.232137 #> [8,] 23.948409 19.740921 21.950039 0.8683187 -2.397135 #> [9,] 38.102201 56.396306 39.516036 39.6713936 31.130900 #> [10,] 7.987330 41.986885 8.545987 47.8769539 66.314653 #> [11,] 35.747939 -13.746937 9.901870 41.9315736 8.610797 #> [12,] 1.384506 2.579634 50.469377 -26.8411849 19.800463 #> [13,] 22.153736 11.250766 19.117806 0.9281634 -1.226728 #> [14,] 48.183146 68.452257 28.322524 31.3003310 12.972211 #> [15,] 7.066647 
63.713091 11.462660 41.8092991 64.604278 #> attr(,\"tau\") #> [1] 1 #> attr(,\"scale\") #> [1] 0.2 #> attr(,\"rscales\") #> [1] 1 1 1 1 1 print(rescaled_boot_wgts) #> REP_1 REP_2 REP_3 REP_4 REP_5 #> [1,] 25.24027 6.805158 11.92004 25.9218675 17.659157 #> [2,] 11.91898 19.726948 41.84285 18.1605261 25.852732 #> [3,] 14.46846 16.639624 15.36743 11.8983106 4.157107 #> [4,] 34.98722 46.053065 29.91830 30.8011800 20.176238 #> [5,] 15.21725 35.945896 14.31651 32.6259871 44.906406 #> [6,] 27.60244 4.572803 11.59127 25.8615925 15.404516 #> [7,] 12.19738 16.146535 38.74800 4.2280041 30.393500 #> [8,] 20.25373 18.181078 19.26931 8.8842293 7.275631 #> [9,] 33.99123 43.003106 34.68770 34.7642333 30.557093 #> [10,] 15.77373 32.522275 16.04893 35.4237868 44.506397 #> [11,] 26.06631 1.684596 13.33425 29.1124336 12.698258 #> [12,] 14.21240 14.801133 38.39214 0.3081191 23.284300 #> [13,] 19.36966 13.998735 17.87412 8.9137094 7.852187 #> [14,] 38.95721 48.941999 29.17366 30.6405572 21.611926 #> [15,] 15.32019 43.224840 17.48571 32.4347943 43.663848 #> attr(,\"tau\") #> [1] 2.03 #> attr(,\"scale\") #> [1] 0.82418 #> attr(,\"rscales\") #> [1] 1 1 1 1 1"},{"path":"https://bschneidr.github.io/svrep/reference/shift_weight.html","id":null,"dir":"Reference","previous_headings":"","what":"(Internal function) Shift weight from one set of cases to another — shift_weight","title":"(Internal function) Shift weight from one set of cases to another — shift_weight","text":"You most likely want to use redistribute_weights instead. The function shift_weight is internal to the package and is only used \"under the hood.\"","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shift_weight.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"(Internal function) Shift weight from one set of cases to another — shift_weight","text":"","code":"shift_weight(wt_set, is_upweight_case, is_downweight_case)"},{"path":"https://bschneidr.github.io/svrep/reference/shift_weight.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"(Internal function) Shift weight from one set of cases to another — shift_weight","text":"wt_set A numeric vector of weights. is_upweight_case A logical vector indicating cases whose weight should be increased. is_downweight_case A logical vector indicating cases whose weight should be decreased.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shift_weight.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"(Internal function) Shift weight from one set of cases to another — shift_weight","text":"A numeric vector of adjusted weights, with the same length as wt_set.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":null,"dir":"Reference","previous_headings":"","what":"Shuffle the order of replicates in a survey design object — shuffle_replicates","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"Shuffle the order of the replicates in a survey design object. 
In other words, the order of the columns of replicate weights is randomly permuted.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"","code":"shuffle_replicates(design)"},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"design A survey design object, created with either the survey or srvyr packages.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"An updated survey design object, where the order of the replicates has been shuffled (i.e., the order has been randomly permuted).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/shuffle_replicates.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Shuffle the order of replicates in a survey design object — shuffle_replicates","text":"","code":"library(survey) set.seed(2023) # Create an example survey design object sample_data <- data.frame( STRATUM = c(1,1,1,1,2,2,2,2), PSU = c(1,2,3,4,5,6,7,8) ) survey_design <- svydesign( data = sample_data, strata = ~ STRATUM, ids = ~ PSU, weights = ~ 1 ) rep_design <- survey_design |> as_fays_gen_rep_design(variance_estimator = \"Ultimate Cluster\") # Inspect replicates before shuffling rep_design |> getElement(\"repweights\") #> REP_1 REP_2 REP_3 REP_4 REP_5 REP_6 REP_7 #> [1,] 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534 #> [2,] 1.0722540 0.6864786 1.3135214 0.9277460 0.6920437 1.5492236 0.4507764 #> [3,] 0.4135167 1.1689015 0.8310985 1.5864833 1.3507810 1.0668008 0.9331992 #> [4,] 1.1606758 1.4981733 0.5018267 0.8393242 0.6036219 0.7375290 1.2624710 #> [5,] 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 #> [6,] 0.9342068 1.3506702 1.3506702 0.9342068 0.8300909 0.4136276 0.4136276 #> [7,] 1.2618712 0.6028047 0.6028047 1.2618712 0.5024265 1.1614930 1.1614930 #> [8,] 0.4503686 0.6929717 0.6929717 0.4503686 1.3139292 1.0713260 1.0713260 #> REP_8 #> [1,] 0.6464466 #> [2,] 1.3079563 #> [3,] 0.6492190 #> [4,] 1.3963781 #> [5,] 1.3535534 #> [6,] 0.8300909 #> [7,] 0.5024265 #> [8,] 1.3139292 #> attr(,\"scale\") #> [1] 1 #> attr(,\"rscales\") #> [1] 1 1 1 1 1 1 1 1 # Inspect replicates after shuffling rep_design |> shuffle_replicates() |> getElement(\"repweights\") #> REP_5 REP_1 REP_7 REP_8 REP_6 REP_3 REP_2 #> [1,] 1.3535534 1.3535534 1.3535534 0.6464466 0.6464466 1.3535534 0.6464466 #> [2,] 0.6920437 1.0722540 0.4507764 1.3079563 1.5492236 1.3135214 0.6864786 #> [3,] 1.3507810 0.4135167 0.9331992 0.6492190 1.0668008 0.8310985 1.1689015 #> [4,] 0.6036219 1.1606758 1.2624710 1.3963781 0.7375290 0.5018267 1.4981733 #> [5,] 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 #> [6,] 0.8300909 0.9342068 0.4136276 0.8300909 0.4136276 1.3506702 1.3506702 #> [7,] 0.5024265 1.2618712 1.1614930 0.5024265 1.1614930 0.6028047 0.6028047 #> [8,] 1.3139292 0.4503686 1.0713260 1.3139292 1.0713260 0.6929717 0.6929717 #> REP_4 #> [1,] 0.6464466 #> [2,] 0.9277460 #> [3,] 1.5864833 #> [4,] 0.8393242 #> [5,] 1.3535534 #> [6,] 0.9342068 #> [7,] 1.2618712 #> [8,] 0.4503686 #> 
attr(,\"scale\") #> [1] 1 #> attr(,\"rscales\") #> [1] 1 1 1 1 1 1 1 1"},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":null,"dir":"Reference","previous_headings":"","what":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"Stack replicate designs: combine rows data, rows replicate weights, respective full-sample weights. can useful comparing estimates set adjustments made weights. Another delicate application combining sets replicate weights multiple years data survey, although must done carefully based guidance data provider.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"","code":"stack_replicate_designs(..., .id = \"Design_Name\")"},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"... Replicate-weights survey design objects combine. can supplied one two ways. Option 1 - series design objects, example 'adjusted' = adjusted_design, 'orig' = orig_design. Option 2 - list object containing design objects, example list('nr' = nr_adjusted_design, 'ue' = ue_adjusted_design). objects must specifications type, rho, mse, scales, rscales. .id single character value, becomes name new column identifiers created output data link row design taken. labels used identifiers taken named arguments.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"replicate-weights survey design object, class svyrep.design svyrep.stacked. 
The resulting survey design object always has its value of combined.weights set to TRUE.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Stack replicate designs, combining data and weights into a single object — stack_replicate_designs","text":"","code":"# Load example data, creating a replicate design object suppressPackageStartupMessages(library(survey)) data(api) dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) dclus1$variables$response_status <- sample(x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), size = nrow(dclus1), replace = TRUE) orig_rep_design <- as.svrepdesign(dclus1) # Adjust weights for cases with unknown eligibility ue_adjusted_design <- redistribute_weights( design = orig_rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\"), by = c(\"stype\") ) # Adjust weights for nonresponse nr_adjusted_design <- redistribute_weights( design = ue_adjusted_design, reduce_if = response_status %in% c(\"Nonrespondent\"), increase_if = response_status == \"Respondent\", by = c(\"stype\") ) # Stack the three designs, using any of the following syntax options stacked_design <- stack_replicate_designs(orig_rep_design, ue_adjusted_design, nr_adjusted_design, .id = \"which_design\") stacked_design <- stack_replicate_designs('original' = orig_rep_design, 'unknown eligibility adjusted' = ue_adjusted_design, 'nonresponse adjusted' = nr_adjusted_design, .id = \"which_design\") list_of_designs <- list('original' = orig_rep_design, 'unknown eligibility adjusted' = ue_adjusted_design, 'nonresponse adjusted' = nr_adjusted_design) stacked_design <- stack_replicate_designs(list_of_designs, .id = \"which_design\")"},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":null,"dir":"Reference","previous_headings":"","what":"Retain only a random subset of the replicates in a design — subsample_replicates","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"Randomly subsamples the replicates of a survey design object, so as to keep only a subset of them. The scale factor used in estimation is increased to account for the subsampling.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"","code":"subsample_replicates(design, n_reps)"},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"design A survey design object, created with either the survey or srvyr packages. n_reps The number of replicates to keep after subsampling.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"An updated survey design object, where only a random selection of the replicates has been retained. 
The overall 'scale' factor for the design (accessed with design$scale) is increased to account for the sampling of replicates.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"statistical-details","dir":"Reference","previous_headings":"","what":"Statistical Details","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"Suppose the initial replicate design has \\(L\\) replicates, with respective constants \\(c_k\\) for \\(k=1,\\dots,L\\) used to estimate variance with the formula $$v_{R} = \\sum_{k=1}^L c_k\\left(\\hat{T}_y^{(k)}-\\hat{T}_y\\right)^2$$ With subsampling of replicates, \\(L_0\\) of the original \\(L\\) replicates are randomly selected, and then variances are estimated using the formula: $$v_{R} = \\frac{L}{L_0} \\sum_{k=1}^{L_0} c_k\\left(\\hat{T}_y^{(k)}-\\hat{T}_y\\right)^2$$ This subsampling is suggested for certain replicate designs by Fay (1989). Kim and Wu (2013) provide a detailed theoretical justification and also propose alternative methods of subsampling replicates.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"Fay, Robert. 1989. \"Theory and Application of Replicate Weighting for Variance Calculations.\" In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1989_033.pdf Kim, J.K. and Wu, C. 2013. \"Sparse and Efficient Replication Variance Estimation for Complex Surveys.\" Survey Methodology, Statistics Canada, 39(1), 91–120.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/subsample_replicates.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retain only a random subset of the replicates in a design — subsample_replicates","text":"","code":"library(survey) set.seed(2023) # Create an example survey design object sample_data <- data.frame( STRATUM = c(1,1,1,1,2,2,2,2), PSU = c(1,2,3,4,5,6,7,8) ) survey_design <- svydesign( data = sample_data, strata = ~ STRATUM, ids = ~ PSU, weights = ~ 1 ) rep_design <- survey_design |> as_fays_gen_rep_design(variance_estimator = \"Ultimate Cluster\") # Inspect replicates before subsampling rep_design |> getElement(\"repweights\") #> REP_1 REP_2 REP_3 REP_4 REP_5 REP_6 REP_7 #> [1,] 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534 #> [2,] 1.0722540 0.6864786 1.3135214 0.9277460 0.6920437 1.5492236 0.4507764 #> [3,] 0.4135167 1.1689015 0.8310985 1.5864833 1.3507810 1.0668008 0.9331992 #> [4,] 1.1606758 1.4981733 0.5018267 0.8393242 0.6036219 0.7375290 1.2624710 #> [5,] 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 #> [6,] 0.9342068 1.3506702 1.3506702 0.9342068 0.8300909 0.4136276 0.4136276 #> [7,] 1.2618712 0.6028047 0.6028047 1.2618712 0.5024265 1.1614930 1.1614930 #> [8,] 0.4503686 0.6929717 0.6929717 0.4503686 1.3139292 1.0713260 1.0713260 #> REP_8 #> [1,] 0.6464466 #> [2,] 1.3079563 #> [3,] 0.6492190 #> [4,] 1.3963781 #> [5,] 1.3535534 #> [6,] 0.8300909 #> [7,] 0.5024265 #> [8,] 1.3139292 #> attr(,\"scale\") #> [1] 1 #> attr(,\"rscales\") #> [1] 1 1 1 1 1 1 1 1 # Inspect replicates after subsampling rep_design |> subsample_replicates(n_reps = 4) |> getElement(\"repweights\") #> REP_5 REP_1 REP_7 REP_8 #> [1,] 1.3535534 1.3535534 1.3535534 0.6464466 #> [2,] 0.6920437 1.0722540 0.4507764 1.3079563 #> [3,] 1.3507810 0.4135167 0.9331992 0.6492190 #> [4,] 0.6036219 1.1606758 1.2624710 1.3963781 #> [5,] 1.3535534 
1.3535534 1.3535534 1.3535534 #> [6,] 0.8300909 0.9342068 0.4136276 0.8300909 #> [7,] 0.5024265 1.2618712 1.1614930 0.5024265 #> [8,] 1.3139292 0.4503686 1.0713260 1.3139292 #> attr(,\"scale\") #> [1] 4 #> attr(,\"rscales\") #> [1] 1 1 1 1"},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":null,"dir":"Reference","previous_headings":"","what":"Summarize the replicate weights — summarize_rep_weights","title":"Summarize the replicate weights — summarize_rep_weights","text":"Summarize the replicate weights of a design.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Summarize the replicate weights — summarize_rep_weights","text":"","code":"summarize_rep_weights(rep_design, type = \"both\", by)"},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Summarize the replicate weights — summarize_rep_weights","text":"rep_design A replicate design object, created with either the survey or srvyr packages. type The default is \"both\". Use type = \"overall\" for an overall summary of the replicate weights. Use type = \"specific\" for a summary of each column of replicate weights, with each column of replicate weights summarized in a given row of the summary. Use type = \"both\" for a list containing both summaries, in a list with the names \"overall\" and \"specific\". by (Optional) A character vector with the names of variables used to group the summaries.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Summarize the replicate weights — summarize_rep_weights","text":"If type = \"both\" (the default), the result is a list of data frames with names \"overall\" and \"specific\". If type = \"overall\", the result is a data frame providing an overall summary of the replicate weights. The contents of the \"overall\" summary are the following: \"nrows\": Number of rows for the weights \"ncols\": Number of columns of replicate weights \"degf_svy_pkg\": The degrees of freedom according to the survey package in R \"rank\": The matrix rank as determined by a QR decomposition \"avg_wgt_sum\": The average column sum \"sd_wgt_sums\": The standard deviation of the column sums \"min_rep_wgt\": The minimum value of any replicate weight \"max_rep_wgt\": The maximum value of any replicate weight If type = \"specific\", the result is a data frame providing a summary of each column of replicate weights, with each column of replicate weights described in a given row of the data frame. The contents of the \"specific\" summary are the following: \"Rep_Column\": The name of a given column of replicate weights. 
If the columns are unnamed, the column number is used instead. \"N\": The number of entries \"N_NONZERO\": The number of nonzero entries \"SUM\": The sum of the weights \"MEAN\": The average of the weights \"CV\": The coefficient of variation of the weights (standard deviation divided by the mean) \"MIN\": The minimum weight \"MAX\": The maximum weight","code":""},{"path":"https://bschneidr.github.io/svrep/reference/summarize_rep_weights.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Summarize the replicate weights — summarize_rep_weights","text":"","code":"# Load example data suppressPackageStartupMessages(library(survey)) data(api) dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) dclus1$variables$response_status <- sample(x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), size = nrow(dclus1), replace = TRUE) rep_design <- as.svrepdesign(dclus1) # Adjust weights for cases with unknown eligibility ue_adjusted_design <- redistribute_weights( design = rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\"), by = c(\"stype\") ) # Summarize replicate weights summarize_rep_weights(rep_design, type = \"both\") #> $overall #> nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 183 15 14 15 6194 403.1741 0 36.26464 #> #> $specific #> Rep_Column N N_NONZERO SUM MEAN CV MIN MAX #> 1 1 183 172 6237.518 34.08480 0.25358407 0 36.26464 #> 2 2 183 179 6491.370 35.47197 0.14989713 0 36.26464 #> 3 3 183 181 6563.900 35.86830 0.10540606 0 36.26464 #> 4 4 183 170 6164.989 33.68846 0.27729183 0 36.26464 #> 5 5 183 181 6563.900 35.86830 0.10540606 0 36.26464 #> 6 6 183 179 6491.370 35.47197 0.14989713 0 36.26464 #> 7 7 183 179 6491.370 35.47197 0.14989713 0 36.26464 #> 8 8 183 167 6056.195 33.09396 0.31037848 0 36.26464 #> 9 9 183 174 6310.047 34.48113 0.22805336 0 36.26464 #> 10 10 183 149 5403.431 29.52695 0.47900073 0 36.26464 #> 11 11 183 162 5874.872 32.10312 0.36102892 0 36.26464 #> 12 12 183 146 5294.637 28.93244 0.50479412 0 36.26464 #> 13 13 183 170 6164.989 33.68846 0.27729183 0 36.26464 #> 14 14 183 182 6600.164 36.06647 0.07432829 0 36.26464 #> 15 15 183 171 6201.253 33.88663 0.26563324 0 36.26464 #> # Summarize replicate weights by grouping variables summarize_rep_weights(ue_adjusted_design, type = 'overall', by = c(\"response_status\")) #> response_status nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums #> 1 Ineligible 39 15 13 14 1896.620 164.4527 #> 2 Nonrespondent 47 15 14 15 2296.912 133.9403 #> 3 Respondent 41 15 13 14 2000.468 130.3750 #> 4 Unknown eligibility 56 15 -1 0 0.000 0.0000 #> min_rep_wgt max_rep_wgt #> 1 0 56.98729 #> 2 0 56.98729 #> 3 0 56.98729 #> 4 0 0.00000 summarize_rep_weights(ue_adjusted_design, type = 'overall', by = c(\"stype\", \"response_status\")) #> stype response_status nrows ncols degf_svy_pkg rank avg_wgt_sum #> 1 E Ineligible 29 15 7 8 1413.77685 #> 2 H Ineligible 6 15 2 3 283.80822 #> 3 M Ineligible 4 15 3 4 199.03463 #> 4 E Nonrespondent 36 15 12 13 1753.97013 #> 5 H Nonrespondent 2 15 1 2 95.02487 #> 6 M Nonrespondent 9 15 5 6 447.91713 #> 7 E Respondent 35 15 10 11 1706.22048 #> 8 H Respondent 2 15 1 2 95.02487 #> 9 M Respondent 4 15 2 3 199.22315 #> 10 E Unknown eligibility 44 15 -1 0 0.00000 #> 11 H Unknown eligibility 4 15 -1 0 0.00000 #> 12 M Unknown eligibility 8 15 -1 0 0.00000 #> sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 149.65790 0 53.40068 #> 2 40.20252 0 56.98729 #> 3 23.57306 0 56.98729 #> 4 121.79339 0 53.40068 #> 5 
18.18279 0 56.98729 #> 6 46.50443 0 56.98729 #> 7 135.21908 0 53.40068 #> 8 18.18279 0 56.98729 #> 9 32.77474 0 56.98729 #> 10 0.00000 0 0.00000 #> 11 0.00000 0 0.00000 #> 12 0.00000 0 0.00000 # Compare replicate weights rep_wt_summaries <- lapply(list('original' = rep_design, 'adjusted' = ue_adjusted_design), summarize_rep_weights, type = \"overall\") print(rep_wt_summaries) #> $original #> nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 183 15 14 15 6194 403.1741 0 36.26464 #> #> $adjusted #> nrows ncols degf_svy_pkg rank avg_wgt_sum sd_wgt_sums min_rep_wgt max_rep_wgt #> 1 183 15 14 15 6194 403.1741 0 56.98729 #>"},{"path":"https://bschneidr.github.io/svrep/reference/svrep-package.html","id":null,"dir":"Reference","previous_headings":"","what":"svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights — svrep-package","title":"svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights — svrep-package","text":"Provides tools for creating and working with survey replicate weights, extending the functionality of the 'survey' package from Lumley (2004) doi:10.18637/jss.v009.i08 . Implements bootstrap methods for complex surveys, including the generalized survey bootstrap as described by Beaumont and Patak (2012) doi:10.1111/j.1751-5823.2011.00166.x . Methods are provided for applying nonresponse adjustments to both full-sample and replicate weights as described by Rust and Rao (1996) doi:10.1177/096228029600500305 . Implements methods for sample-based calibration described by Opsomer and Erciulescu (2021) https://www150.statcan.gc.ca/n1/pub/12-001-x/2021002/article/00006-eng.htm. Diagnostic functions are included to compare weights and weighted estimates from different sets of replicate weights.","code":""},{"path":[]},{"path":"https://bschneidr.github.io/svrep/reference/svrep-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"svrep: Tools for Creating, Updating, and Analyzing Survey Replicate Weights — svrep-package","text":"Maintainer: Ben Schneider benjamin.julius.schneider@gmail.com (ORCID)","code":""},{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":null,"dir":"Reference","previous_headings":"","what":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"A modified version of the svyby() function from the survey package. 
{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"","code":"svyby_repwts( rep_designs, formula, by, FUN, ..., deff = FALSE, keep.var = TRUE, keep.names = TRUE, verbose = FALSE, vartype = c(\"se\", \"ci\", \"ci\", \"cv\", \"cvpct\", \"var\"), drop.empty.groups = TRUE, return.replicates = FALSE, na.rm.by = FALSE, na.rm.all = FALSE, multicore = getOption(\"survey.multicore\") )"},{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"rep_designs The replicate-weights survey designs to be compared. Supplied either as: a named list of replicate-weights survey design objects, for example list('nr' = nr_adjusted_design, 'ue' = ue_adjusted_design), or a 'stacked' replicate-weights survey design object created by stack_replicate_designs(). The designs must all have the same number of columns of replicate weights and the same type (bootstrap, JKn, etc.) formula A formula specifying the variables to pass to FUN by A formula specifying the factors that define subsets FUN A function taking a formula and a survey design object as its first two arguments. Usually a function from the 'survey' package, such as svytotal or svymean. ... Other arguments to FUN deff A value of TRUE or FALSE, indicating whether design effects should be estimated where possible. keep.var A value of TRUE or FALSE. If FUN returns a svystat object, indicates whether to extract its standard errors. keep.names Define row names based on the subsets verbose If TRUE, print a label for each subset as it is processed. vartype Report variability as one or more of: standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance drop.empty.groups If FALSE, report NA for empty groups; if TRUE, drop them from the output return.replicates If TRUE, return the replicates as the \"replicates\" attribute of the result. This can be useful if you want to produce custom summaries of the estimates from each replicate. na.rm.by If true, omit groups defined by NA values of the by variables na.rm.all If true, check for groups with no non-missing observations for variables defined by the formula and treat these groups as empty multicore Use the multicore package to distribute subsets across multiple processors?","code":""},{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"An object of class \"svyby\": a data frame showing the grouping factors and the results of FUN for each combination of the grouping factors. The first grouping factor always consists of indicators for which replicate design was used for an estimate.","code":""},
{"path":"https://bschneidr.github.io/svrep/reference/svyby_repwts.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compare survey statistics calculated separately from different sets of replicate weights — svyby_repwts","text":"","code":"if (FALSE) { suppressPackageStartupMessages(library(survey)) data(api) dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) dclus1$variables$response_status <- sample(x = c(\"Respondent\", \"Nonrespondent\", \"Ineligible\", \"Unknown eligibility\"), size = nrow(dclus1), replace = TRUE) orig_rep_design <- as.svrepdesign(dclus1) # Adjust weights for cases with unknown eligibility ue_adjusted_design <- redistribute_weights( design = orig_rep_design, reduce_if = response_status %in% c(\"Unknown eligibility\"), increase_if = !response_status %in% c(\"Unknown eligibility\"), by = c(\"stype\") ) # Adjust weights for nonresponse nr_adjusted_design <- redistribute_weights( design = ue_adjusted_design, reduce_if = response_status %in% c(\"Nonrespondent\"), increase_if = response_status == \"Respondent\", by = c(\"stype\") ) # Compare estimates from the three sets of replicate weights list_of_designs <- list('original' = orig_rep_design, 'unknown eligibility adjusted' = ue_adjusted_design, 'nonresponse adjusted' = nr_adjusted_design) ## First, compare overall means for two variables means_by_design <- svyby_repwts(formula = ~ api00 + api99, FUN = svymean, rep_designs = list_of_designs) print(means_by_design) ## Next, compare domain means for two variables domain_means_by_design <- svyby_repwts(formula = ~ api00 + api99, by = ~ stype, FUN = svymean, rep_designs = list_of_designs) print(domain_means_by_design) # Calculate confidence interval for difference between estimates ests_by_design <- svyby_repwts(rep_designs = list('NR-adjusted' = nr_adjusted_design, 'Original' = orig_rep_design), FUN = svymean, formula = ~ api00 + api99) differences_in_estimates <- svycontrast(stat = ests_by_design, contrasts = list( 'Mean of api00: NR-adjusted vs. Original' = c(1,-1,0,0), 'Mean of api99: NR-adjusted vs. Original' = c(0,0,1,-1) )) print(differences_in_estimates) confint(differences_in_estimates, level = 0.95) }"},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":null,"dir":"Reference","previous_headings":"","what":"Variance Estimators — variance-estimators","title":"Variance Estimators — variance-estimators","text":"This help page describes variance estimators which are commonly used for survey samples. These variance estimators can be used as the basis of generalized replication methods, implemented with the functions as_fays_gen_rep_design(), as_gen_boot_design(), make_fays_gen_rep_factors(), and make_gen_boot_factors()","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"shared-notation","dir":"Reference","previous_headings":"","what":"Shared Notation","title":"Variance Estimators — variance-estimators","text":"Let \\(s\\) denote the selected sample of size \\(n\\), with elements \\(i=1,\\dots,n\\). Element \\(i\\) in the sample had probability \\(\\pi_i\\) of being included in the sample. The pair of elements \\(ij\\) was sampled with probability \\(\\pi_{ij}\\). The population total for a variable is denoted \\(Y = \\sum_{i \\in U}y_i\\), and its Horvitz-Thompson estimator \\(\\hat{Y}\\) is denoted \\(\\hat{Y} = \\sum_{i \\in s} y_i/\\pi_i\\). For convenience, we denote \\(\\breve{y}_i = y_i/\\pi_i\\). The true sampling variance of \\(\\hat{Y}\\) is denoted \\(V(\\hat{Y})\\), while an estimator of this sampling variance is denoted \\(v(\\hat{Y})\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"horvitz-thompson","dir":"Reference","previous_headings":"","what":"Horvitz-Thompson","title":"Variance Estimators — variance-estimators","text":"The Horvitz-Thompson variance estimator: $$ v(\\hat{Y}) = \\sum_{i \\in s}\\sum_{j \\in s} (1 - \\frac{\\pi_i \\pi_j}{\\pi_{ij}}) \\frac{y_i}{\\pi_i} \\frac{y_j}{\\pi_j} $$","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"yates-grundy","dir":"Reference","previous_headings":"","what":"Yates-Grundy","title":"Variance Estimators — variance-estimators","text":"The Yates-Grundy variance estimator: $$ v(\\hat{Y}) = -\\frac{1}{2}\\sum_{i \\in s}\\sum_{j \\in s} (1 - \\frac{\\pi_i \\pi_j}{\\pi_{ij}}) (\\frac{y_i}{\\pi_i} - \\frac{y_j}{\\pi_j})^2 $$","code":""},
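{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"editor-note-ht-yg-sketch","dir":"Reference","previous_headings":"","what":"Editor's note (added)","title":"Variance Estimators — variance-estimators","text":"Editor's addition, not from the package documentation: a minimal base-R sketch, with purely hypothetical inclusion probabilities, computing the Horvitz-Thompson and Yates-Grundy estimators exactly as defined above.","code":"# Hypothetical first-order and joint inclusion probabilities, n = 3
pi_i  <- c(0.4, 0.5, 0.6)
pi_ij <- matrix(c(0.40, 0.18, 0.22,
                  0.18, 0.50, 0.28,
                  0.22, 0.28, 0.60),
                nrow = 3)            # diagonal equals pi_i
y <- c(10, 12, 15)
y_breve <- y / pi_i                  # weighted values, y_i / pi_i

# Shared factor (1 - pi_i * pi_j / pi_ij) for every pair (i, j)
idx <- seq_along(y)
fac <- outer(idx, idx, function(i, j) 1 - pi_i[i] * pi_i[j] / pi_ij[cbind(i, j)])

v_HT <- sum(fac * outer(y_breve, y_breve))                                  # Horvitz-Thompson
v_YG <- -0.5 * sum(fac * outer(y_breve, y_breve, function(a, b) (a - b)^2)) # Yates-Grundy"},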
{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"poisson-horvitz-thompson","dir":"Reference","previous_headings":"","what":"Poisson Horvitz-Thompson","title":"Variance Estimators — variance-estimators","text":"The Poisson Horvitz-Thompson variance estimator is simply the Horvitz-Thompson variance estimator, but with \\(\\pi_{ij}=\\pi_i \\times \\pi_j\\), which is the case for Poisson sampling.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"stratified-multistage-srs","dir":"Reference","previous_headings":"","what":"Stratified Multistage SRS","title":"Variance Estimators — variance-estimators","text":"The Stratified Multistage SRS variance estimator is the recursive variance estimator proposed by Bellhouse (1985) and used in the 'survey' package's function svyrecvar. In the case of simple random sampling without replacement (with one or more stages), this estimator exactly matches the Horvitz-Thompson estimator. The estimator can be used for any number of sampling stages. For illustration, we describe its use for two sampling stages. $$ v(\\hat{Y}) = \\hat{V}_1 + \\hat{V}_2 $$ $$ \\hat{V}_1 = \\sum_{h=1}^{H} (1 - \\frac{n_h}{N_h})\\frac{n_h}{n_h - 1} \\sum_{i=1}^{n_h} (y_{hi.} - \\bar{y}_{hi.})^2 $$ $$ \\hat{V}_2 = \\sum_{h=1}^{H} \\frac{n_h}{N_h} \\sum_{i=1}^{n_h}v_{hi}(y_{hi.}) $$ where \\(n_h\\) is the number of sampled clusters in stratum \\(h\\), \\(N_h\\) is the number of population clusters in stratum \\(h\\), \\(y_{hi.}\\) is the weighted cluster total of cluster \\(i\\) in stratum \\(h\\), \\(\\bar{y}_{hi.}\\) is the mean weighted cluster total of stratum \\(h\\) (i.e., \\(\\bar{y}_{hi.} = \\frac{1}{n_h}\\sum_{i=1}^{n_h}y_{hi.}\\)), and \\(v_{hi}(y_{hi.})\\) is the estimated sampling variance of \\(y_{hi.}\\).","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"ultimate-cluster","dir":"Reference","previous_headings":"","what":"Ultimate Cluster","title":"Variance Estimators — variance-estimators","text":"The Ultimate Cluster variance estimator is simply the stratified multistage SRS variance estimator, but ignoring the variances from later stages of sampling. $$ v(\\hat{Y}) = \\hat{V}_1 $$ This is the variance estimator used in the 'survey' package when the user specifies option(survey.ultimate.cluster = TRUE) or uses svyrecvar(..., one.stage = TRUE). When the first-stage sampling fractions are small, analysts often omit the finite population corrections \\((1-\\frac{n_h}{N_h})\\) when using the ultimate cluster estimator.","code":""},
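{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"editor-note-ultimate-cluster-sketch","dir":"Reference","previous_headings":"","what":"Editor's note (added)","title":"Variance Estimators — variance-estimators","text":"Editor's addition, not from the package documentation: a short sketch of how the 'survey' package can be told to use the ultimate cluster estimator, using its standard apiclus2 two-stage example data (assumes data(api) has been loaded, as in the earlier examples).","code":"# Two-stage cluster sample with finite population corrections
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)
SE(svytotal(~api00, dclus2))             # recursive multistage estimator
options(survey.ultimate.cluster = TRUE)  # keep only the first-stage contribution
SE(svytotal(~api00, dclus2))
options(survey.ultimate.cluster = FALSE) # restore the default"},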
{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"sd-and-sd-successive-difference-estimators-","dir":"Reference","previous_headings":"","what":"SD1 and SD2 (Successive Difference Estimators)","title":"Variance Estimators — variance-estimators","text":"The SD1 and SD2 variance estimators are \"successive difference\" estimators sometimes used for systematic sampling designs. Ash (2014) describes each estimator as follows: $$ \\hat{v}_{SD1}(\\hat{Y}) = \\left(1-\\frac{n}{N}\\right) \\frac{n}{2(n-1)} \\sum_{k=2}^n\\left(\\breve{y}_k-\\breve{y}_{k-1}\\right)^2 $$ $$ \\hat{v}_{SD2}(\\hat{Y}) = \\left(1-\\frac{n}{N}\\right) \\frac{1}{2}\\left[\\sum_{k=2}^n\\left(\\breve{y}_k-\\breve{y}_{k-1}\\right)^2+\\left(\\breve{y}_n-\\breve{y}_1\\right)^2\\right] $$ where \\(\\breve{y}_k = y_k/\\pi_k\\) is the weighted value of unit \\(k\\) with selection probability \\(\\pi_k\\). The SD1 estimator is recommended by Wolter (1984). The SD2 estimator is the basis of the successive difference replication estimator commonly used for systematic sampling designs, and it is more conservative. See Ash (2014) for details. For multistage samples, SD1 and SD2 are applied to the clusters at each stage, separately by stratum. For later stages of sampling, the variance estimate from a stratum is multiplied by the product of the sampling fractions from earlier stages of sampling. For example, at a third stage of sampling, the variance estimate from a third-stage stratum is multiplied by \\(\\frac{n_1}{N_1}\\frac{n_2}{N_2}\\), the product of the sampling fractions from the first-stage stratum and the second-stage stratum.","code":""},
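{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"editor-note-sd-sketch","dir":"Reference","previous_headings":"","what":"Editor's note (added)","title":"Variance Estimators — variance-estimators","text":"Editor's addition, not from the package documentation: a minimal base-R sketch, with hypothetical values, computing SD1 and SD2 from the weighted values in their sample order, following the formulas above.","code":"# Hypothetical weighted values from a systematic sample, in sample order
y_breve <- c(25, 30, 28, 35, 32)   # y_k / pi_k
n <- length(y_breve)
N <- 100                           # hypothetical population size
fpc <- 1 - n / N
sq_diffs <- diff(y_breve)^2        # successive squared differences, k = 2, ..., n
v_SD1 <- fpc * (n / (2 * (n - 1))) * sum(sq_diffs)
v_SD2 <- fpc * 0.5 * (sum(sq_diffs) + (y_breve[n] - y_breve[1])^2)"},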
{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"deville-and-deville-","dir":"Reference","previous_headings":"","what":"Deville 1 and Deville 2","title":"Variance Estimators — variance-estimators","text":"The \"Deville-1\" and \"Deville-2\" variance estimators are clearly described in Matei and Tillé (2005), and are intended for designs that use fixed-size, unequal-probability random sampling without replacement. These variance estimators have been shown to be effective for designs that use a fixed sample size with a high-entropy sampling method. This includes most PPSWOR sampling methods, although unequal-probability systematic sampling is an important exception. These variance estimators take the following form: $$ \\hat{v}(\\hat{Y}) = \\sum_{i=1}^{n} c_i (\\breve{y}_i - \\frac{1}{\\sum_{k=1}^{n}c_k}\\sum_{k=1}^{n}c_k \\breve{y}_k)^2 $$ where \\(\\breve{y}_i = y_i/\\pi_i\\) is the weighted value of the variable of interest, and the \\(c_i\\) depend on the method used: \"Deville-1\": $$c_i=\\left(1-\\pi_i\\right) \\frac{n}{n-1}$$ \"Deville-2\": $$c_i = (1-\\pi_i) \\left[1 - \\sum_{k=1}^{n} \\left(\\frac{1-\\pi_k}{\\sum_{k=1}^{n}(1-\\pi_k)}\\right)^2 \\right]^{-1}$$ In the case of simple random sampling without replacement (SRSWOR), these estimators are both identical to the usual stratified multistage SRS estimator (a special case of the Horvitz-Thompson estimator). For multistage samples, \"Deville-1\" and \"Deville-2\" are applied to the clusters at each stage, separately by stratum. For later stages of sampling, the variance estimate from a stratum is multiplied by the product of the sampling probabilities from earlier stages of sampling. For example, at a third stage of sampling, the variance estimate from a third-stage stratum is multiplied by \\(\\pi_1 \\times \\pi_{(2 | 1)}\\), where \\(\\pi_1\\) is the sampling probability of the first-stage unit and \\(\\pi_{(2|1)}\\) is the sampling probability of the second-stage unit within the first-stage unit.","code":""},
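{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"editor-note-deville-sketch","dir":"Reference","previous_headings":"","what":"Editor's note (added)","title":"Variance Estimators — variance-estimators","text":"Editor's addition, not from the package documentation: a minimal base-R sketch, with hypothetical probabilities, computing 'Deville-1' and 'Deville-2' for a single-stage PPSWOR sample. Both estimators share the same weighted-residual form and differ only in the constants c_i.","code":"# Hypothetical single-stage PPSWOR sample
pi_i <- c(0.12, 0.25, 0.40, 0.33)
y <- c(4, 9, 17, 12)
y_breve <- y / pi_i
n <- length(y)

# The two choices of c_i
c_deville_1 <- (1 - pi_i) * n / (n - 1)
c_deville_2 <- (1 - pi_i) / (1 - sum(((1 - pi_i) / sum(1 - pi_i))^2))

# Shared weighted-residual form
deville_var <- function(c_i, y_breve) {
  y_bar <- sum(c_i * y_breve) / sum(c_i)
  sum(c_i * (y_breve - y_bar)^2)
}
deville_var(c_deville_1, y_breve)  # Deville-1
deville_var(c_deville_2, y_breve)  # Deville-2"},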
{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"deville-till-","dir":"Reference","previous_headings":"","what":"Deville-Tillé","title":"Variance Estimators — variance-estimators","text":"See Section 6.8 of Tillé (2020) for detail on this estimator, including an explanation of its quadratic form. See Deville and Tillé (2005) for the results of a simulation study comparing this and alternative estimators for balanced sampling. The estimator can be written as follows: $$ v(\\hat{Y})=\\sum_{k \\in S} \\frac{c_k}{\\pi_k^2}\\left(y_k-\\hat{y}_k^*\\right)^2, $$ where $$ \\hat{y}_k^*=\\mathbf{z}_k^{\\top}\\left(\\sum_{\\ell \\in S} c_{\\ell} \\frac{\\mathbf{z}_{\\ell} \\mathbf{z}_{\\ell}^{\\prime}}{\\pi_{\\ell}^2}\\right)^{-1} \\sum_{\\ell \\in S} c_{\\ell} \\frac{\\mathbf{z}_{\\ell} y_{\\ell}}{\\pi_{\\ell}^2} $$ and \\(\\mathbf{z}_k\\) denotes the vector of auxiliary variables for observation \\(k\\) included in sample \\(S\\), with inclusion probability \\(\\pi_k\\). The value \\(c_k\\) is set to \\(\\frac{n}{n-q}(1-\\pi_k)\\), where \\(n\\) is the number of observations and \\(q\\) is the number of auxiliary variables.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/variance-estimators.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Variance Estimators — variance-estimators","text":"Ash, S. (2014). \"Using successive difference replication for estimating variances.\" Survey Methodology, Statistics Canada, 40(1), 47–59. Bellhouse, D. R. (1985). \"Computing Methods for Variance Estimation in Complex Surveys.\" Journal of Official Statistics, Vol. 1, No. 3. Deville, J.-C., and Tillé, Y. (2005). \"Variance approximation under balanced sampling.\" Journal of Statistical Planning and Inference, 128, 569–591. Tillé, Y. (2020). \"Sampling and estimation from finite populations.\" (I. Hekimi, Trans.). Wiley. Matei, A., and Tillé, Y. (2005). \"Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size.\" Journal of Official Statistics, 21(4), 543–570.","code":""},{"path":"https://bschneidr.github.io/svrep/reference/wls_hat_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Create the \"hat matrix\" for weighted least squares regression","title":"Create the \"hat matrix\" for weighted least squares regression — wls_hat_matrix","text":"Create the \"hat matrix\" for a weighted least squares regression","code":""},{"path":"https://bschneidr.github.io/svrep/reference/wls_hat_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create the \"hat matrix\" for weighted least squares regression — wls_hat_matrix","text":"","code":"wls_hat_matrix(X, w)"},{"path":"https://bschneidr.github.io/svrep/reference/wls_hat_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create the \"hat matrix\" for weighted least squares regression — wls_hat_matrix","text":"X Matrix of predictor variables, with n rows w Vector of weights (all nonnegative), of length n","code":""},{"path":"https://bschneidr.github.io/svrep/reference/wls_hat_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create the \"hat matrix\" for weighted least squares regression — wls_hat_matrix","text":"An \\(n \\times n\\) matrix: the \"hat matrix\" for the WLS regression.","code":""},
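{"path":"https://bschneidr.github.io/svrep/reference/wls_hat_matrix.html","id":"editor-note-wls-hat-sketch","dir":"Reference","previous_headings":"","what":"Editor's note (added)","title":"Create the \"hat matrix\" for weighted least squares regression — wls_hat_matrix","text":"Editor's addition, not from the package documentation: the WLS hat matrix has the textbook form H = X(X'WX)^(-1)X'W with W = diag(w). The base-R sketch below illustrates that form; it is not necessarily the package's exact implementation.","code":"# Textbook WLS hat matrix: H = X (X'WX)^{-1} X'W, with W = diag(w)
wls_hat <- function(X, w) {
  XtW <- t(X * w)                  # same as t(X) %*% diag(w)
  X %*% solve(XtW %*% X, XtW)
}
X <- cbind(1, c(1.2, 2.5, 3.1, 4.8))  # intercept plus one predictor
w <- c(2, 1, 3, 1)                    # nonnegative weights
H <- wls_hat(X, w)
all.equal(as.vector(H %*% X[, 2]), X[, 2])  # H reproduces columns of X"},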
{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-063","dir":"Changelog","previous_headings":"","what":"svrep 0.6.3","title":"svrep 0.6.3","text":"CRAN release: 2023-09-09 Bumped the version number for a CRAN submission. No significant user-facing changes: just updates to unit tests and the rendering of examples/vignettes, due to temporary CRAN check issues with the development version of R.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-062","dir":"Changelog","previous_headings":"","what":"svrep 0.6.2","title":"svrep 0.6.2","text":"Bug fixes: Bumped the version number for a CRAN submission. No significant user-facing changes: just updates to unit tests and the rendering of examples/vignettes, due to temporary CRAN check issues with the development version of R. Changes made specifically for CRAN checks: Removed 12 unmarked UTF-8 strings that were causing a CRAN check note. Removed LaTeX ‘cases’ formatting from the documentation for as_random_group_jackknife_design(), since the old release of MacOS was throwing a LaTeX error when trying to build the manual. This formatting might be restored later once ‘oldrel’ on CRAN increases to 4.3.X.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-061","dir":"Changelog","previous_headings":"","what":"svrep 0.6.1","title":"svrep 0.6.1","text":"CRAN release: 2023-08-30 Added support for Fay’s generalized replication method, specifically the version proposed by Fay (1989): the key functions are as_fays_gen_rep_design() and make_fays_gen_rep_factors(), which are nearly identical to the generalized bootstrap functions as_gen_boot_design() and make_gen_boot_factors(). Added a new variance estimator, \"Deville-Tille\", useful for balanced sampling (including the cube method). Currently, this only works for single-stage designs. The functions as_gen_boot_design() and as_fays_gen_rep_design() have a new argument aux_var_names, meant to be used with the \"Deville-Tille\" variance estimator. Similarly, make_gen_boot_factors() and make_fays_gen_rep_factors() have an argument named aux_vars.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-060","dir":"Changelog","previous_headings":"","what":"svrep 0.6.0","title":"svrep 0.6.0","text":"CRAN release: 2023-07-06 Added the function as_random_group_jackknife_design() to create random-group jackknife replicates. The creation of generalized bootstrap replicates for designs with many observations or degrees of freedom (e.g., stratified cluster samples) is now much faster and more efficient. This is based on using the ‘Matrix’ package–particularly its efficient representation of the sparse matrices that arise for stratified designs–as well as on using a compressed representation of designs that use cluster sampling. Now using the ‘Matrix’ package to improve speed and memory usage for large quadratic forms. This is primarily helpful for making the generalized bootstrap computationally feasible for larger datasets. Better documentation of the bootstrap methods covered by as_bootstrap_design(). The following functions now work for database-backed survey design objects (i.e., objects with class DBIsvydesign): as_data_frame_with_weights(), as_gen_boot_design(), as_bootstrap_design(), redistribute_weights(), calibrate_to_sample(), calibrate_to_estimate(). The function as_data_frame_with_weights() gained an argument vars_to_keep, which allows the user to indicate that they only want to keep a specific list of variables in the data. This can be useful, for example, if you only want to keep the weights and unique identifiers. Minor updates and bug fixes: The function as_bootstrap_design() now throws an informative error message if you supply an invalid value to the type argument. Bug Fix: The “Deville-1” and “Deville-2” estimators threw errors for strata where one or more units were selected with certainty (i.e., with sampling probabilities of 1). This is now fixed. Bug Fix: The function as_gen_boot_design() would sometimes fail to detect that the input design was a PPS design, which caused it to give the user an unnecessary error message.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-051","dir":"Changelog","previous_headings":"","what":"svrep 0.5.1","title":"svrep 0.5.1","text":"CRAN release: 2023-05-17 Added the argument exact_vcov = TRUE to as_gen_boot_design() and make_gen_boot_factors(). This argument forces the generalized bootstrap's variance-covariance estimates for totals to exactly match those of the target variance estimator. In other words, it eliminates bootstrap simulation error for variance estimates of totals. This is similar to how, for simple survey designs, the jackknife and BRR give variance estimates for totals that exactly match the Horvitz-Thompson estimates. Using exact_vcov requires the number of replicates to be strictly greater than the rank of the target variance estimator. Added new variance estimators (“Deville 1” and “Deville 2”) available for use with the generalized bootstrap, which are particularly useful for single-stage PPSWOR designs or multistage designs with one or more stages of PPSWOR sampling. See the updated documentation for as_gen_boot_design() and make_quad_form_matrix(). If the ‘srvyr’ package is loaded, then functions from ‘svrep’ that return survey design objects will always return a tbl_svy if their input was a tbl_svy. This makes it easier to use functions such as summarize() or mutate(). Fixed a bug where as_bootstrap_design() wouldn’t create more than 50 replicates for the Rao-Wu, Preston, and Canty-Davison types.","code":""},
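{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"editor-note-exact-vcov-sketch","dir":"Changelog","previous_headings":"","what":"Editor's note (added)","title":"svrep 0.5.1","text":"Editor's addition, not from the changelog: a short usage sketch of the exact_vcov argument described above, reusing a design like the dclus1 object from the earlier examples; the replicate count shown is illustrative.","code":"# Generalized bootstrap whose variance estimates for totals
# exactly match the target variance estimator
gen_boot_design <- as_gen_boot_design(
  design = dclus1,
  variance_estimator = 'Ultimate Cluster',
  replicates = 100,  # must exceed the rank of the target estimator
  exact_vcov = TRUE
)
svytotal(~api00, gen_boot_design)"},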
{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-050","dir":"Changelog","previous_headings":"","what":"svrep 0.5.0","title":"svrep 0.5.0","text":"CRAN release: 2023-02-07 This release adds extensive new functionality for two-phase designs. A new vignette, “Replication Methods for Two-phase Sampling”, describes the new functionality as well as the underlying statistical methods. The function as_gen_boot_design() can now create generalized bootstrap weights for two-phase survey design objects created with the ‘survey’ package’s twophase() function. The user must specify a list of two variance estimators, one to use for each phase, e.g. list('Stratified Multistage SRS', 'Ultimate Cluster'). The function make_twophase_quad_form() can be used to create the quadratic form for a two-phase variance estimator by combining the quadratic forms from each phase. The helper function get_nearest_psd_matrix() can be used to approximate a quadratic form matrix by the nearest positive semidefinite matrix. This can be particularly useful for two-phase designs, since the double expansion estimator commonly used in practice frequently yields a variance estimator which is not positive semidefinite. The function as_gen_boot_design() has a new argument named psd_option, which controls what will happen if the target variance estimator has a quadratic form matrix which is not positive semi-definite. This can occasionally happen, particularly for two-phase designs. By default, the function will warn the user if the quadratic form is not positive semi-definite and then automatically approximate the matrix by the nearest positive semi-definite matrix. Added a new function get_design_quad_form(), which determines the quadratic form matrix of a specified variance estimator by parsing the information stored in a survey design object created using the ‘survey’ package. Added a new function rescale_reps(), which implements the rescaling of replicate adjustment factors to avoid negative replicate weights. This functionality already existed in as_gen_boot_design() and make_gen_boot_factors(), but it is now implemented with the help of this new function. Added a helper function is_psd_matrix() for checking whether a matrix is positive semi-definite, and added a helper function get_nearest_psd_matrix() for approximating a square matrix by the nearest positive semi-definite matrix. Minor improvements to vignettes, particularly their formatting.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-041","dir":"Changelog","previous_headings":"","what":"svrep 0.4.1","title":"svrep 0.4.1","text":"CRAN release: 2022-12-18 Fixed bug #15, where bootstrap conversion of multistage survey design objects with as_bootstrap_design() would throw an error if the user had manually specified weights in svydesign(). Creation of Rao-Wu-Yue-Beaumont bootstrap replicate weights is now faster and takes less computer memory. Typo fixes for the vignettes.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-040","dir":"Changelog","previous_headings":"","what":"svrep 0.4.0","title":"svrep 0.4.0","text":"CRAN release: 2022-12-11 This release adds several functions for creating bootstrap and generalized bootstrap replicate weights. A new vignette, “Bootstrap methods for surveys”, provides guidance for choosing a bootstrap method and selecting the number of bootstrap replicates to use, along with statistical details and references. Added the function as_bootstrap_design() to convert a survey design object to a replicate design, with replicate weights created using a bootstrap method. This is essentially a specialized version of as.svrepdesign() that supports additional bootstrap methods and has detailed documentation on which bootstrap methods can be used for different types of sampling designs. Added the function as_gen_boot_design() to convert a survey design object to a replicate design, with replicate weights created using the generalized survey bootstrap. The user must supply the name of a target variance estimator (e.g., “Horvitz-Thompson” or “Ultimate Cluster”) used to create the generalized bootstrap factors. See the new vignette for details. Added functions to help choose the number of bootstrap replicates. The function estimate_boot_sim_cv() can be used to estimate the simulation error in a bootstrap estimate caused by using a finite number of bootstrap replicates. The new function estimate_boot_reps_for_target_cv() estimates the number of bootstrap replicates needed to reduce the simulation error to a target level. Added the function make_rwyb_bootstrap_weights(), which creates bootstrap replicate weights for a wide range of survey designs using the method of Rao-Wu-Yue-Beaumont (i.e., Beaumont’s generalization of the Rao-Wu-Yue bootstrap method). This function can be used directly, or users can specify as_bootstrap_design(type = \"Rao-Wu-Yue-Beaumont\"). Added the function make_gen_boot_factors() to create replicate adjustment factors using the generalized survey bootstrap. The key input to make_gen_boot_factors() is the matrix of the quadratic form used to represent a variance estimator. The new function make_quad_form_matrix() can be used to represent a chosen variance estimator as a quadratic form, given information about the sample design. This can be used for stratified multistage SRS designs (with or without replacement), systematic samples, and PPS samples, with or without replacement. Minor Updates and Bug Fixes: When using as_data_frame_with_weights(), ensure the full-sample weight is named \"FULL_SAMPLE_WGT\" if the user doesn't specify something different. In calibrate_to_estimate(), ensure the output names the list of columns of perturbed control columns col_selection instead of perturbed_control_cols, so that the name matches the corresponding function argument, col_selection. Improvements to documentation (formatting tweaks and typo fixes)","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-030","dir":"Changelog","previous_headings":"","what":"svrep 0.3.0","title":"svrep 0.3.0","text":"CRAN release: 2022-07-05 Added a helper function as_data_frame_with_weights() to convert a survey design object into a data frame with columns of weights (full-sample weights and, if applicable, replicate weights). This is useful for saving the data and weights to a data file.
Added an argument to summarize_rep_weights() which allows the specification of one or more grouping variables to use for the summaries (e.g. by = c('stratum', 'response_status') can be used to summarize by response status within each stratum). Added a small vignette, “Nonresponse Adjustments”, to illustrate how to conduct nonresponse adjustments using redistribute_weights(). Minor Updates and Bug Fixes: Internal code update to avoid an annoying but harmless warning message about rho in calibrate_to_estimate(). Bug fix for stack_replicate_designs(), where designs created with as.svrepdesign(..., type = 'mrbbootstrap') or as.svrepdesign(..., type = 'subbootstrap') threw an error.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-020","dir":"Changelog","previous_headings":"","what":"svrep 0.2.0","title":"svrep 0.2.0","text":"CRAN release: 2022-05-12 Added the functions calibrate_to_estimate() and calibrate_to_sample(), for calibrating to estimated control totals with methods that account for the sampling variance of the control totals. For an overview of these functions, please see the new vignette “Calibrating to Estimated Control Totals”. The function calibrate_to_estimate() requires the user to supply a vector of control totals and its variance-covariance matrix. The function applies Fuller’s proposed adjustments to the replicate weights, in which the control totals are varied across replicates by perturbing the control totals using a spectral decomposition of the control totals’ variance-covariance matrix. The function calibrate_to_sample() requires the user to supply a replicate design for the primary survey of interest as well as a replicate design for the control survey used to estimate the control totals for calibration. The function applies Opsomer & Erciulescu’s method of varying the control totals across replicates of the primary survey by matching each primary survey replicate to a replicate from the control survey. Added an example dataset, lou_vax_survey, a simulated survey measuring Covid-19 vaccination status and a handful of demographic variables, based on a simple random sample of 1,000 residents of Louisville, Kentucky with an approximately 50% response rate. An accompanying dataset lou_pums_microdata provides person-level microdata from the American Community Survey (ACS) 2015-2019 public-use microdata sample (PUMS) data for Louisville, KY. The dataset lou_pums_microdata includes replicate weights to use for variance estimation and can be used to generate control totals for lou_vax_survey.","code":""},{"path":"https://bschneidr.github.io/svrep/news/index.html","id":"svrep-010","dir":"Changelog","previous_headings":"","what":"svrep 0.1.0","title":"svrep 0.1.0","text":"CRAN release: 2022-03-30 Initial release of the package.","code":""}]