Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reincorporate Stan estimation in latent variable case #10

Open
robertness opened this issue Jan 21, 2019 · 0 comments
Open

Reincorporate Stan estimation in latent variable case #10

robertness opened this issue Jan 21, 2019 · 0 comments

Comments

@robertness
Copy link
Owner

robertness commented Jan 21, 2019

Some orphaned code...

Resample the training data and corresponding noise inferences to match the 50% PKC on, 50% PKC off as in the proposed experiment.

training <- mutate(training, data = noise$data)
training_pkc_off <- training[training$PKC == 0, ]
training_pkc_on <- training[training$PKC == 1, ]
training_pkc_off_upsampled <- sample_n(training_pkc_off,  nrow(training_pkc_on), replace=TRUE)
training_sim <- rbind(training_pkc_on, training_pkc_off_upsampled)

```{r}
# The Stan model models one row of data only, so must be run iteratively.
# This is slow.  The alternative is to model the data vectors of length N.  
# This would require a technique for enforcing independence between noise terms of each data point in the posterior.
stan_scm_file <- system.file('extdata', 'sachs_scm.stan', package="causaldemon", mustWork=TRUE)
scm_model <- stan_model(file=stan_scm_file)

Technical note: In this data subset, PKC is set by intervention. So for the sake of accuracy, the Stan model doesn't do inference on a noise term for PKC. This is not a huge concern because the proposed experiment will apply an intervention to PKC, and therefore not use that noise term.

noise_terms <- paste0("N_", c("PKC", "PKA", "Raf", "Mek", "Erk"))
get_noise <- function(item){
  noise_posterior <- sampling(
    scm_model,
    data = as.list(item),
    cores = parallel::detectCores(),
    iter = 2000,
    pars = noise_terms,
    verbose=FALSE
  )
  print(i)
  print(summary(noise_posterior)$summary[, 'Rhat'])
  samples <- rstan::extract(noise_posterior, permuted = T)
  log_prob <- samples[['lp__']]
  samples <- map(samples[noise_terms], as.numeric)
  out <- samples[noise_terms] %>%
    as_tibble %>%
    mutate(i = factor(1)) %>% 
    nest(-i) %>%
    select(-i)
  return(out)
}

noise <- tibble()
for(i in 1:nrow(training)){
  item <- training[i, , drop=FALSE]
  noise <- rbind(noise, get_noise(item))
}
save(noise, file=paste0(getwd(), '/scm_noise_inference.Rdata'))

```{r, echo=FALSE}
load(paste0(getwd(), '/scm_noise_inference.Rdata'))

For each observation in the data, the inference algorithm provided samples from the posterior of the noise terms. The following are the noise term samples for the first observation in the data.

str(noise$data[[1]])

scm_sim_result <- training %>%
mutate(data = noise$data) %>% # add the noise samples as a variable to the data
mutate(replicate = 1:n()) %>% # label each row (replicate) with an index
unnest(data) %>% # expand out the samples so each row corresponds to a sample
mutate(Erk_sim = scm_simulation(PKA, N_Raf, N_Erk)) %>% # simulate Erk
group_by(PKA, replicate) %>% # calculate the mean Erk value (Bayes estimate of Erk) for each replicate
summarize(Erk_sim_mean = mean(Erk_sim)) %>%
ungroup %>%
group_by(PKA) %>% # Condition on PKA
summarize(p_Erk_on = mean(Erk_sim_mean)) %>%
mutate(
PKA = ifelse(PKA == 1, 'on', 'off'),
data = "scm_sim"
)
results <- rbind(results, scm_sim_result)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant