Feature Suggestion: Improve feasibility of generalized bootstrap for large datasets #20

Open
bschneidr opened this issue Feb 5, 2023 · 1 comment

@bschneidr (Owner) commented Feb 5, 2023

Right now, creating the generalized bootstrap replicate factors requires a lot of memory because:

  1. The algorithm forms the entire quadratic form matrix of the target variance estimator, which is $n \times n$. Even a survey with only 1,000 cases yields a matrix with 1,000,000 entries.
  2. Sampling from the multivariate normal distribution for $B$ replicates produces an $n \times B$ matrix, where $B$ can be quite large.
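To put rough numbers on these two costs, here is a back-of-the-envelope calculation. It assumes both matrices are stored densely as 8-byte doubles (which is how R stores numeric matrices); the helper `matrix_megabytes` and the chosen values of $n$ and $B$ are hypothetical, for illustration only.

```python
BYTES_PER_DOUBLE = 8  # dense numeric matrices hold 64-bit doubles


def matrix_megabytes(rows: int, cols: int) -> float:
    """Approximate size in MB (1 MB = 10**6 bytes) of a dense double-precision matrix."""
    return rows * cols * BYTES_PER_DOUBLE / 1e6


n, B = 1_000, 5_000  # hypothetical sample size and number of replicates
print(matrix_megabytes(n, n))            # n x n quadratic form: 8.0
print(matrix_megabytes(n, B))            # n x B replicate factors: 40.0
print(matrix_megabytes(50_000, 50_000))  # n x n at n = 50,000: 20000.0 (about 20 GB)
```

The $n \times n$ term grows quadratically in $n$, so it dominates quickly as the sample size increases, while the $n \times B$ term grows linearly in each dimension.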

A few ideas for making this more feasible:

  1. Form bootstrap adjustment factors separately by stratum. Units in different strata have zero covariance, so we can get correct results by generating adjustment factors separately for each stratum. Instead of creating one big $n \times n$ covariance matrix, we can create $H$ covariance matrices, one of size $n_h \times n_h$ for each stratum $h = 1, \ldots, H$. Within each of the $H$ strata, the generated columns of replicate factors should be shuffled to ensure independence.
    • Update: (1) is done where possible.
  2. Use quadratic forms whose dimension matches the rank. For example, if the dataset has 10 rows but only three clusters, form a $3 \times 3$ quadratic form, generate adjustment factors for the three clusters, and then expand the matrix of replicate factors so that every unit receives its cluster's factors.
    • Update: (2) is done.
  3. Use an alternative R package that is better able to simulate from high-dimensional covariance matrices. The "mvnfast" package might work well. https://cran.r-project.org/web/packages/mvnfast/mvnfast.pdf
    • Update: May adopt (3) at some point.
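Idea (1) can be sketched as follows. This is a minimal Python illustration, not the package's R code, and it assumes that the generalized bootstrap factors are $1 + z$ with $z \sim \text{MVN}(0, \Sigma_h)$, where $\Sigma_h$ is stratum $h$'s block of the quadratic form matrix. It also assumes each $\Sigma_h$ is positive definite; a singular quadratic form would need an eigendecomposition or pivoted Cholesky instead. The function names `cholesky`, `draw_factors`, and `factors_by_stratum` are hypothetical. The largest matrix ever formed is $n_h \times n_h$ rather than $n \times n$.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L^T == sigma (assumes sigma is positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def draw_factors(sigma, B, rng):
    """Return an n x B matrix whose columns are 1 + z, with z ~ MVN(0, sigma)."""
    L = cholesky(sigma)
    n = len(sigma)
    cols = []
    for _ in range(B):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent standard normals
        cols.append([1.0 + sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(n)])
    return [[cols[b][i] for b in range(B)] for i in range(n)]  # transpose to rows


def factors_by_stratum(sigmas, B, seed=1):
    """Stack per-stratum factor matrices; the implied overall covariance is block diagonal."""
    rng = random.Random(seed)
    rows = []
    for sigma_h in sigmas:
        block = draw_factors(sigma_h, B, rng)
        order = list(range(B))
        rng.shuffle(order)  # shuffle this stratum's replicate columns, as the issue suggests
        rows.extend([[row[b] for b in order] for row in block])
    return rows


# Hypothetical example: two strata with n_1 = 2 and n_2 = 3 units.
sigmas = [
    [[0.10, 0.02], [0.02, 0.10]],
    [[0.10, 0.0, 0.0], [0.0, 0.10, 0.0], [0.0, 0.0, 0.10]],
]
A = factors_by_stratum(sigmas, B=200)
print(len(A), len(A[0]))  # 5 200
```

Because each stratum's block is drawn with fresh random numbers, factors for units in different strata are uncorrelated, which matches the zero cross-stratum covariance described in idea (1).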
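The expansion step in idea (2) amounts to an index lookup: once factors exist for each cluster, every row copies its cluster's row of factors. The Python sketch below uses the issue's own example of 10 rows in three clusters; the function name `expand_cluster_factors`, the cluster assignments, and the factor values are hypothetical, and the cluster-level factors are taken as given rather than simulated.

```python
def expand_cluster_factors(cluster_factors, cluster_of_row):
    """Expand a (clusters x B) factor matrix to a (rows x B) matrix.

    cluster_factors: C lists, each holding the B replicate factors for one cluster.
    cluster_of_row: for each of the n rows, the index (0..C-1) of its cluster.
    """
    return [list(cluster_factors[c]) for c in cluster_of_row]


# 10 rows in 3 clusters, so only a 3 x 3 quadratic form is needed.
cluster_of_row = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
cluster_factors = [  # hypothetical factors for B = 2 replicates
    [1.2, 0.9],
    [0.8, 1.1],
    [1.0, 1.0],
]
unit_factors = expand_cluster_factors(cluster_factors, cluster_of_row)
print(len(unit_factors))  # 10
print(unit_factors[4])    # [0.8, 1.1]
```

Only the small $3 \times 3$ quadratic form and the $3 \times B$ cluster-level draws need to be handled by the multivariate normal sampler; the full $n \times B$ matrix is produced by copying.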
@bschneidr (Owner, Author) commented

Unfortunately, "mvnfast" has no unit tests, and its minimal documentation has a lot of typos, so I don't think it's a good idea to rely on it.
