From 1d63f9572f1feebcda37a59227798efd0a53f3b6 Mon Sep 17 00:00:00 2001
From: guangguangzai
Date: Fri, 24 May 2024 15:23:46 -0400
Subject: [PATCH 1/2] This is a initial version

---
 vignettes/calculate_correlation.Rmd | 120 ++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)
 create mode 100644 vignettes/calculate_correlation.Rmd

diff --git a/vignettes/calculate_correlation.Rmd b/vignettes/calculate_correlation.Rmd
new file mode 100644
index 0000000..45467c2
--- /dev/null
+++ b/vignettes/calculate_correlation.Rmd
@@ -0,0 +1,120 @@
+---
+title: "Correlation Matrix Calculation"
+author: "Chenguang Zhang"
+date: "2024-05-14"
+output: html_document
+---
+
+The weighted parametric group sequential design (WPGSD) approach (Anderson et al., 2022) allows one to take advantage of the known correlation structure in constructing efficacy bounds that control the family-wise error rate (FWER) for a group sequential design. Here, correlation may arise from common observations in nested populations, in overlapping populations, or in the control arm.
+
+## Notation
+
+Suppose that in a group sequential trial there are $m$ elementary null hypotheses $H_i$, $i \in I=\{1,\ldots,m\}$, and there are $K$ analyses. Let $k$ be the index for the interim and final analyses, $k=1,2,\ldots,K$. For any nonempty set $J \subseteq I$, we denote the intersection hypothesis $H_J=\cap_{j \in J}H_j$. We note that $H_I$ is the global null hypothesis.
+
+We assume the plan is for all hypotheses to be tested at each of the $K$ planned analyses if the trial continues to the end for all hypotheses. We further assume that the distribution of the $m \times K$ tests of $m$ individual hypotheses at all $K$ analyses is multivariate normal with a completely known correlation matrix.
+
+Let $Z_{ik}$ be the standardized normal test statistic for hypothesis $i \in I$, analysis $1 \le k \le K$. Let $n_{ik}$ be the number of events collected cumulatively through stage $k$ for hypothesis $i$. Then $n_{i \wedge i',k \wedge k'}$ is the number of events included in both $Z_{ik}$ and $Z_{i'k'}$, for $i, i' \in I$ and $1 \le k, k' \le K$. The key to the parametric tests is to utilize the correlation among the test statistics. The correlation between $Z_{ik}$ and $Z_{i'k'}$ is
+$$Corr(Z_{ik},Z_{i'k'})=\frac{n_{i \wedge i',k \wedge k'}}{\sqrt{n_{ik}*n_{i'k'}}}$$
+
+## Examples
+
+Consider a 2-arm controlled clinical trial with one primary endpoint and 3 patient populations defined by the status of two biomarkers, A and B:
+
+* Biomarker A positive (population 1),
+* Biomarker B positive (population 2),
+* Overall population.
+
+The 3 primary elementary hypotheses are:
+
+* H1: the experimental treatment is superior to the control in population 1
+* H2: the experimental treatment is superior to the control in population 2
+* H3: the experimental treatment is superior to the control in the overall population
+
+Assume an interim analysis (IA) and a final analysis (FA) are planned for the study. The numbers of events are listed below.
+```{r}
+library(dplyr)
+library(tibble)
+library(gt)
+event_tb <- tribble(
+  ~Population, ~"Number of Events in IA", ~"Number of Events in FA",
+  "Population 1", 100,200,
+  "Population 2", 110,220,
+  "Overlap of Population 1 and 2", 80,160,
+  "Overall Population", 225, 450
+)
+event_tb %>%
+  gt() %>%
+  tab_header(title = "Number of events in each population")
+```
+
+### Example 1 - Same Analysis, Different Populations
+Let's consider a simple situation: we want the correlation between the test statistics for population 1 and population 2 at the interim analysis only. Then $k=k'=1$, and to compare $H_{1}$ and $H_{2}$ we take $i=1$ and $i'=2$.
+The correlation is
+$$Corr(Z_{11},Z_{21})=\frac{n_{1 \wedge 2,1 \wedge 1}}{\sqrt{n_{11}*n_{21}}}$$
+The numbers of events are listed below.
+```{r}
+event_tbl <- tribble(
+  ~Population, ~"Number of Events in IA",
+  "Population 1", 100,
+  "Population 2", 110, 
+  "Overlap in population 1 and 2", 80
+)
+event_tbl %>%
+  gt() %>%
+  tab_header(title = "Number of events in each population in example 1")
+```
+The correlation can then be calculated as
+$$Corr(Z_{11},Z_{21})=\frac{80}{\sqrt{100*110}} \approx 0.76$$
+```{r}
+Corr1=80/sqrt(100*110)
+round(Corr1,2)
+```
+
+### Example 2 - Same Population, Different Analyses
+Let's consider another simple situation: a single population, for example population 1, at two different analyses, the interim and the final. Then $i=i'=1$, and to compare IA and FA we take $k=1$ and $k'=2$.
+The correlation is
+$$Corr(Z_{11},Z_{12})=\frac{n_{1 \wedge 1,1 \wedge 2}}{\sqrt{n_{11}*n_{12}}}$$
+The numbers of events are listed below.
+```{r}
+event_tb2 <- tribble(
+  ~Population, ~"Number of Events in IA", ~"Number of Events in FA",
+  "Population 1", 100,200
+)
+event_tb2 %>%
+  gt() %>%
+  tab_header(title = "Number of events at each analysis in example 2")
+```
+Since all 100 IA events are also counted at the FA, the correlation can be calculated as
+$$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}} \approx 0.71$$
+```{r}
+Corr1=100/sqrt(100*200)
+round(Corr1,2)
+```
+### Example 3 - Cross Population, Cross Analysis
+Now consider the correlation between population 1 at the interim analysis and population 2 at the final analysis. For the different populations we take $i=1$ and $i'=2$, and to compare IA and FA we take $k=1$ and $k'=2$.
+The correlation is
+$$Corr(Z_{11},Z_{22})=\frac{n_{1 \wedge 2,1 \wedge 2}}{\sqrt{n_{11}*n_{22}}}$$
+The numbers of events are listed below.
+```{r}
+event_tb3 <- tribble(
+  ~Population, ~"Number of Events in IA", ~"Number of Events in FA",
+  "Population 1", 100,200,
+  "Population 2", 110, 220,
+  "Overlap in population 1 and 2", 80,160
+
+)
+event_tb3 %>%
+  gt() %>%
+  tab_header(title = "Number of events in each population and analysis in example 3")
+```
+The events common to both statistics are the 80 overlap events observed by the IA, so the correlation can be calculated as
+$$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}} \approx 0.54$$
+```{r}
+Corr1=80/sqrt(100*220)
+round(Corr1,2)
+```
+Now we know how to calculate the correlation under different situations. The `generate_corr()` function is built on this logic, so the correlation for each combination can be computed directly with it. See the code below.
+```{r}
+#library(wpgsd)
+
+```
\ No newline at end of file

From fb7eb89151d0a7653658c418d27a50a73047c758 Mon Sep 17 00:00:00 2001
From: guangguangzai
Date: Thu, 6 Jun 2024 18:13:58 +0000
Subject: [PATCH 2/2] Style code (GHA)

---
 vignettes/calculate_correlation.Rmd | 32 ++++++++++++++---------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/vignettes/calculate_correlation.Rmd b/vignettes/calculate_correlation.Rmd
index 45467c2..e51d825 100644
--- a/vignettes/calculate_correlation.Rmd
+++ b/vignettes/calculate_correlation.Rmd
@@ -37,9 +37,9 @@ library(tibble)
 library(gt)
 event_tb <- tribble(
   ~Population, ~"Number of Events in IA", ~"Number of Events in FA",
-  "Population 1", 100,200,
-  "Population 2", 110,220,
-  "Overlap of Population 1 and 2", 80,160,
+  "Population 1", 100, 200,
+  "Population 2", 110, 220,
+  "Overlap of Population 1 and 2", 80, 160,
   "Overall Population", 225, 450
 )
 event_tb %>%
@@ -56,7 +56,7 @@ The numbers of events are listed below.
 event_tbl <- tribble(
   ~Population, ~"Number of Events in IA",
   "Population 1", 100,
-  "Population 2", 110, 
+  "Population 2", 110,
   "Overlap in population 1 and 2", 80
 )
 event_tbl %>%
@@ -66,8 +66,8 @@ event_tbl %>%
 The correlation can then be calculated as
 $$Corr(Z_{11},Z_{21})=\frac{80}{\sqrt{100*110}} \approx 0.76$$
 ```{r}
-Corr1=80/sqrt(100*110)
-round(Corr1,2)
+Corr1 <- 80 / sqrt(100 * 110)
+round(Corr1, 2)
 ```
 
 ### Example 2 - Same Population, Different Analyses
@@ -78,7 +78,7 @@ The numbers of events are listed below.
 ```{r}
 event_tb2 <- tribble(
   ~Population, ~"Number of Events in IA", ~"Number of Events in FA",
-  "Population 1", 100,200
+  "Population 1", 100, 200
 )
 event_tb2 %>%
   gt() %>%
@@ -87,8 +87,8 @@ event_tb2 %>%
 Since all 100 IA events are also counted at the FA, the correlation can be calculated as
 $$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}} \approx 0.71$$
 ```{r}
-Corr1=100/sqrt(100*200)
-round(Corr1,2)
+Corr1 <- 100 / sqrt(100 * 200)
+round(Corr1, 2)
 ```
 ### Example 3 - Cross Population, Cross Analysis
 Now consider the correlation between population 1 at the interim analysis and population 2 at the final analysis. For the different populations we take $i=1$ and $i'=2$, and to compare IA and FA we take $k=1$ and $k'=2$.
@@ -98,10 +98,9 @@ The numbers of events are listed below.
 ```{r}
 event_tb3 <- tribble(
   ~Population, ~"Number of Events in IA", ~"Number of Events in FA",
-  "Population 1", 100,200,
+  "Population 1", 100, 200,
   "Population 2", 110, 220,
-  "Overlap in population 1 and 2", 80,160
-
+  "Overlap in population 1 and 2", 80, 160
 )
 event_tb3 %>%
   gt() %>%
@@ -110,11 +109,10 @@ event_tb3 %>%
 The events common to both statistics are the 80 overlap events observed by the IA, so the correlation can be calculated as
 $$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}} \approx 0.54$$
 ```{r}
-Corr1=80/sqrt(100*220)
-round(Corr1,2)
+Corr1 <- 80 / sqrt(100 * 220)
+round(Corr1, 2)
 ```
 Now we know how to calculate the correlation under different situations. The `generate_corr()` function is built on this logic, so the correlation for each combination can be computed directly with it. See the code below.
 ```{r}
-#library(wpgsd)
-
-```
\ No newline at end of file
+# library(wpgsd)
+```
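
The three worked examples in the vignette all apply the same formula, so they can be collected into one small helper. The sketch below is only an illustration of that formula and is not the package's `generate_corr()`; the name `corr_from_events()` and its arguments `n_common`, `n_a`, and `n_b` are hypothetical, and the event counts are the ones from the example tables.

```r
# Hypothetical helper (not part of wpgsd): correlation between two test
# statistics, given the events each one uses and the events they share.
corr_from_events <- function(n_common, n_a, n_b) {
  # Shared events cannot exceed the events behind either statistic.
  stopifnot(n_common >= 0, n_a > 0, n_b > 0, n_common <= min(n_a, n_b))
  n_common / sqrt(n_a * n_b)
}

# Example 1: populations 1 and 2, both at IA (80 shared events)
corr_ex1 <- corr_from_events(n_common = 80, n_a = 100, n_b = 110)
# Example 2: population 1 at IA and FA (all 100 IA events are shared)
corr_ex2 <- corr_from_events(n_common = 100, n_a = 100, n_b = 200)
# Example 3: population 1 at IA and population 2 at FA (80 shared events)
corr_ex3 <- corr_from_events(n_common = 80, n_a = 100, n_b = 220)

round(c(ex1 = corr_ex1, ex2 = corr_ex2, ex3 = corr_ex3), 2)
```

Rounded to two decimals, the three values agree with the hand calculations in the examples (0.76, 0.71, and 0.54). Passing the shared count explicitly keeps the helper independent of how the overlap is tabulated.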