
# Stitching Across Forms {#appendix-stitching}

Because we use different forms for different ages, there are sometimes good reasons to combine data across forms to get a broader range of ages in a particular analysis. We call this combination "stitching." This appendix provides some motivation for the practice. 

The simplest stitching method is to use proportion measures derived from each form. As shown in Figure \@ref(fig:appstitch-eng-prod), this naive method does not perform well: there is a clear gap between proportions derived from the two instruments. This gap is presumably generated by the greater number of items on the WS form. Because the WS form includes several hundred more items, many of these will likely be more difficult than those included on the WG form; thus, the proportion for any individual will be lower.

```{r appstitch-eng-prod, fig.cap="Proportion WS and WG production scores plotted by form."}
# Count the word items on each form, per language
num_words <- items %>%
  filter(type == "word") %>%
  group_by(language, form) %>%
  summarise(n = n())

# Attach each form's item count to the administration-level scores
vocab_data <- admins %>%
  select(data_id, language, form, age, sex,
         mom_ed, birth_order, production, comprehension) %>%
  left_join(num_words, by = c("language", "form")) %>%
  mutate(no_production = n - production)

# American English only; `mean` is production as a proportion of the form
eng_prod_data <- vocab_data %>%
  filter(language == "English (American)") %>%
  mutate(mean = production / n)

ggplot(eng_prod_data,
       aes(x = age, y = mean, col = form)) +
  facet_wrap(~language) +
  geom_jitter(width = .4, size = 1, alpha = .1) +
  geom_smooth(se = FALSE) +
  # geom_line(data = ws_prod_preds,
  #           aes(y = pred, col = percentile, group = percentile)) +
  .scale_colour_discrete(name="Instrument") +
  scale_x_continuous(name = "Age (months)",
                     breaks = seq(8, 30, 4),
                     limits = c(8, 30)) +
  scale_y_continuous(breaks = c(0, .5, 1), limits = c(0, 1),
                     name = "Proportion production") +
  theme(legend.position = "top")
```

A second method is to use absolute numbers rather than relative proportions. This method is shown in Figure \@ref(fig:appstitch-eng-prod-absolute). While this method appears better, a gap is still visible between the smoothed means from the two instruments. Here, the vocabulary estimated from the WS instrument is *larger*, presumably because the form affords more total items for parents to check, even if they are on average more difficult.

```{r appstitch-eng-prod-absolute, fig.cap="Total WS and WG production scores plotted by form."}
ggplot(eng_prod_data,
       aes(x = age, y = production, col = form)) +
  facet_wrap(~language) +
  geom_jitter(width = .4, size = 1, alpha = .1) +
  geom_smooth(se = FALSE) +
  # geom_line(data = ws_prod_preds,
  #           aes(y = pred, col = percentile, group = percentile)) +
  .scale_colour_discrete(name="Instrument") +
  scale_x_continuous(name = "Age (months)",
                     breaks = seq(8, 30, 4),
                     limits = c(8, 30)) +
  scale_y_continuous(name = "Total production") +
  theme(legend.position = "top")
```
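
The opposite directions of these two gaps follow from simple arithmetic, which the toy calculation below illustrates. It is a sketch under loud assumptions, not an analysis of Wordbank data: the difficulty values, the deterministic response rule, and the names `short_form`, `long_form`, `score_form`, and `ability` are all hypothetical.

```r
# Hypothetical item difficulties on an arbitrary scale (not Wordbank data).
short_form <- seq(0, 1, length.out = 100)        # 100 easier items
long_form  <- c(short_form,                      # the same 100 items ...
                seq(0.5, 2, length.out = 150))   # ... plus 150 harder ones

# Toy response rule (an assumption): a child produces every item whose
# difficulty is at or below the child's ability.
score_form <- function(ability, difficulty) {
  produced <- difficulty <= ability
  c(total = sum(produced), proportion = mean(produced))
}

ability <- 0.8  # one hypothetical child

short_score <- score_form(ability, short_form)
long_score  <- score_form(ability, long_form)

# One row per form: total items produced and proportion of the form produced
rbind(short = short_score, long = long_score)
```

Under these assumptions the same child produces 80 of 100 items on the short form (proportion 0.80) and 110 of 250 on the long form (proportion 0.44): the longer form lowers the proportion but raises the total. The child's response to any item that appears on both forms is unchanged.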

With these two negative examples in hand, we might ask whether differences between the two forms allow their data to be commensurate at all. For example, a longer form might lead parents to make different choices (e.g., being more liberal in checking items so as to get through the form faster). To address this issue, we examine data from individual items in Figure \@ref(fig:appstitch-example), which shows 25 randomly sampled items from the American English data.

```{r appstitch-example, fig.cap="WS and WG proportion production scores for a set of 25 randomly-sampled examples.", fig.height=9, fig.width=8}

wg_unilemmas <- filter(items, language == "English (American)",
                       form == "WG",
                       !is.na(uni_lemma)) %>%
  pull(uni_lemma)

set.seed(12)
target_items <- sample(wg_unilemmas, 25)  # fixed alternative: c("dog", "table", "run")

eng_wg <- get_instrument_data(language = "English (American)",
                              form = "WG", 
                              iteminfo = TRUE, 
                              items = items %>%
                                filter(language == "English (American)", 
                                       form == "WG", 
                                       uni_lemma %in% target_items)  %>%
                                pull(item_id)) %>%
  filter(uni_lemma %in% target_items)

eng_ws <- get_instrument_data(language = "English (American)",
                              form = "WS", 
                              iteminfo = TRUE, 
                              items = items %>%
                                filter(language == "English (American)", 
                                       form == "WS", 
                                       uni_lemma %in% target_items)  %>%
                                pull(item_id)) %>%
  filter(uni_lemma %in% target_items)

eng_items <- bind_rows(eng_wg, eng_ws) %>%
  left_join(admins) %>%
  group_by(age, form, uni_lemma) %>%
  summarise(produces = mean(value == "produces", 
                            na.rm=TRUE)) %>%
  filter(!is.na(age))

ggplot(eng_items, 
       aes(x = age, y = produces, col = form)) + 
  facet_wrap(~uni_lemma) +
  geom_point(alpha = .5) + 
  # geom_smooth() +
  .scale_colour_discrete(name = "Form") +
  labs(x = "Age (months)", y = "Productive vocabulary (proportion)") +
  theme(legend.position = "top")
```


To a first approximation, production trajectories line up quite nicely, with little or no visible gap between the two instruments. Thus, at least to the tolerances of visual inspection, we conclude that stitching is best accomplished at the level of individual items. Individual item reports seem robust to some of the details of form construction, while form-level proportions and absolute scores are clearly not.
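
One way this conclusion could be turned into a procedure is sketched below: keep only the items that appear on both forms, then pool responses across forms within each age and item. This is an assumption about how item-level stitching might be implemented, not a procedure taken from this appendix. The toy data and the names `toy_responses`, `shared_lemmas`, and `stitched` are hypothetical; the `value == "produces"` coding mirrors the chunk above.

```r
library(dplyr)

# Toy response-level data (made up, not Wordbank data): one row per
# child-by-item response, with "" meaning the item was not checked.
toy_responses <- data.frame(
  data_id   = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4),
  form      = c("WG", "WG", "WG", "WG", "WS", "WS", "WS", "WS", "WS", "WS"),
  age       = c(16, 16, 16, 16, 16, 16, 16, 24, 24, 24),
  uni_lemma = c("dog", "ball", "dog", "ball", "dog", "ball", "jump",
                "dog", "ball", "jump"),
  value     = c("produces", "", "understands", "produces",
                "produces", "produces", "",
                "produces", "produces", "produces")
)

# Items that occur on both forms; only these can be stitched.
shared_lemmas <- toy_responses %>%
  distinct(form, uni_lemma) %>%
  count(uni_lemma, name = "n_forms") %>%
  filter(n_forms == 2) %>%
  pull(uni_lemma)

# Pool responses across forms: proportion of children producing each
# shared item at each age, ignoring which form the response came from.
stitched <- toy_responses %>%
  filter(uni_lemma %in% shared_lemmas) %>%
  group_by(age, uni_lemma) %>%
  summarise(produces = mean(value == "produces"),
            n_children = n_distinct(data_id),
            .groups = "drop")

stitched
```

Here `jump` is dropped because it appears only on the toy WS form, and the WG and WS responses from 16-month-olds are pooled into a single estimate per item.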

 
<!-- In contrast, this procedure does not work for full percentile ranks -- presumably because of the differences in -->


<!-- You can also stitch item by item, this is trickier but better.  -->