pres_stefan.Rmd

---
title: "Fit or Frite?"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    logo: http://52.214.106.253/img/fit36.png
editor_options: 
  chunk_output_type: console
---


```{r include=FALSE}
library(jsonlite)
library(data.table)
library(hms)
library(data.table)
library(plotly)
library(knitr)

knitr::opts_chunk$set(include = FALSE, echo = FALSE)

randomize_name <- function(x){
  n <- length(unique(x))
  r <- randomNames::randomNames(n, which.names = "first", sample.with.replacement = FALSE)
  r[as.integer(as.factor(x))]
}


```
```{r prep ilog}
act <- readRDS("/mnt/s3/ilog/activities.rds")
act[, respondent_id := respondent]
act[, respondent := randomize_name(respondent)]


act[, sporty_type := grepl("sport", type, ignore.case = TRUE)]
act[, sporty_mot  := grepl("(foot)|(bicycle)", type, ignore.case = TRUE)]

act[, time := as.hms(ts_notif)]
act

ppl <- act[, .(
  sports = sum(sporty_type | sporty_mot), 
  answers = sum(type != "Expired"),
  sleep   = sum(type == "Sleeping"),
  days = difftime(max(ts_answer), min(ts_notif), unit = "days")
), 
  by = "respondent"
]

ppl[, fitness := sports/as.numeric(days)]
ppl[, sleepy  := sleep/as.numeric(days)]
ppl[, rigor   := answers/as.numeric(days)]

ppl[, fitlvl := dplyr::case_when(
  fitness == 0 ~ 0,
  fitness <  0.5 ~ 1,
  TRUE ~ 2
)]


ppl <- ppl[rigor > 12 & days >= 1]
act <- act[respondent %in% ppl$respondent]

```


### Mission statement 

```{r include=TRUE, fig.width = 3, fig.height=5}
knitr::include_graphics("http://52.214.106.253/img/intro.svg")
```


### iLog is an interactive time-diary that asks hourly questions

```{r include = TRUE}
knitr::include_graphics("http://52.214.106.253/img/bad_sleep.jpeg")
# /usr/share/nginx/html
```

***
The user has to answer multiple questions related to his activity every hour. 
For long monotonous activities like sleeping, this can become very tedious.


### iLog collects data on sleeping patterns, but the quality of the raw data is questionable

```{r include=TRUE}

data.table::setorderv(ppl, "sleepy")

order <- list(
  categoryorder = "array",
  categoryarray = ppl$respondent[order(ppl$sleepy)]
)


ppl[, fitfct := as.factor(fitlvl)]
levels(ppl$fitfct) <- c("easy-going", "average", "fit")

plot_ly(
  data = ppl,
  legendgroup = ~fitfct
) %>% 
  add_markers(
    x = ~respondent,
    y = ~sleepy,
    type = "scatter",
    color = ~fitfct) %>% 
    layout(xaxis = order, yaxis = list(title = "hours of sleep", range = c(0, 10))) %>% 
    add_segments(x = ~respondent, xend =  max(ppl$respondent), y = 5, yend = 5, showlegend = FALSE
  )

```

***

The average sleeping time of iLog users was 5 hours. This is not realistic and suggests
respondents don't properly fill out sleeping cycle related questions.

### Some of the questions iLog asks could also be answered with sensor data

```{r include = TRUE}
knitr::include_graphics("http://52.214.106.253/img/good_sleep.jpeg")
```

***
Long periods of no activities can easily be identified with sensor
data. No complicated calculations would be necessary to identify
periods of no activity, and this could be done
on the fly by the phone.


### iLog collects a lot of sensor data

```{r sensor meta}

x <- fread("/mnt/s3/ilog/meta.csv")

x[, respondent_id := strtrim(respondent, 4)]
x[, respondent := respondent_id]
x[, respondent := act$respondent[match(x$respondent, act$respondent_id)]]
x[is.na(respondent), respondent := randomize_name(respondent_id)]

data.table::setnames(x, names(x), gsub(".parquet", "", names(x)))
m <- as.matrix(x[, !c("respondent", "respondent_id", "sensors")], rownames = x$respondent)
d <- as.integer(m > 0)
dim(d) <- dim(m)
colnames(d) <- colnames(m)
rownames(d) <- rownames(m)
d <- t(d)
d <- d[order(rowSums(d)), ]
d <- d[, rev(order(colSums(d)))]
colnames(d) <- strtrim(colnames(d), 6)
```
```{r heatmap sensor meta, include=TRUE, echo=FALSE}
plot_ly(
  x = seq_along(colnames(d)), 
  y = rownames(d), 
  z = d, 
  type = "heatmap", 
  showscale = FALSE, 
  colors = c("#66c2a4", "#238b45")
) %>% 
  layout(yaxis = list(dtick = 1))
```

***

Not all sensors are available on all phones, and not all of them
are useful for evaluating sleeping patterns. The following
sensors could be especially useful to identify sleep periods 
and wake-up events:

* accelerometerevent
* screenevent
* orientationevent
* rotationvectorevent
* linearaccelerationevent
* gravityevent
* touchevent
* gyroscopeevent
* headsetplugevent
* airplanemodeevent
* (time)


### Using sensor data to reduce response burden for data on sleeping patterns

```{r}
sleep_events <-
  c(
    "accelerometerevent",
    "screenevent",
    "orientationevent",
    "rotationvectorevent",
    "linearaccelerationevent",
    "gravityevent",
    "touchevent",
    "gyroscopeevent",
    "headsetplugevent",
    "airplanemodeevent"
  )


d[rownames(d) %in% sleep_events] <- d[rownames(d) %in% sleep_events] + 2
```

```{r heatmap sensible sensors, include=TRUE, echo=FALSE}
plot_ly(
  x = seq_along(colnames(d)), 
  y = rownames(d), 
  z = d, 
  type = "heatmap", 
  showscale = FALSE, 
  colors = c("#edf8fb", "#b2e2e2", "#66c2a4", "#238b45")
) %>% 
  layout(yaxis = list(dtick = 1))
```

***

Long periods of inactivy at night indicate sleep

***

```{r include = TRUE, fig.width = 1}
knitr::include_graphics("http://52.214.106.253/img/sleep.svg")
```

### The bigO data set provides self reports and sensor data from phones. 

<div style = "font-size: 20px">

```{r, include = TRUE,ou}
knitr::kable(
  caption = "Main information used for the analysis", 
  data.frame(
    #rmarkdown::paged_table(data.frame(
    Information = c("Geographical", "Physical activity", "Eating habits"),
    Source = c("Cell phone", "Cell phone", "Self reports"),
    Example = c("GPS coordinates", "Number of minutes walked",
                "Warm/cold food")
  ))
knitr::kable(
  caption = "Information not used for the analysis", 
  data.frame(
    #rmarkdown::paged_table(data.frame(
    Information = c("Points of interest","Transportation mode"),
    Source = c("Cell phone","Cell phone"),
    Example = c("School/Home","Bike/bus/train/...")
  ))
```

</div>


### Geographical coordinates allow to link physical activities to countries

```{r include = TRUE, fig.width = 1}
knitr::include_graphics("http://52.214.106.253/img/fitness_cat_country_2.jpg")
```

***

The geographical distribution of fitness among the EU countries is displayed 
with the geofacet package. 

#### Fitness

* **Fitness** of the responent modeled with the number of
  biking, walking and jogging minutes

#### Geographical information

* **Country information** derived from GPS data

### Interactive image gallery for pictures from bigO

<iframe src="http://52.214.106.253/img/trelliscope/" width="100%", height="100%"></iframe>

***

Different eating patterns can be explored via a trelloscope widget. The
filtering can be used to limit the pictures according to several characteristics.

#### Fitness

* **Fitness** of the responent (easygoing, average, fit)
* Median number of **steps** per day

#### Properties of the meal

- Does the meal/drink/snack contain **sugar**?
- Is it **home prepared**?
- What **temperature** does it have?
- **Meal type** (breakfast, lunch, dinner, drink, snack)

#### Other

- When (**hour**, **day**) and where (**country**) was the meal consumed?

### Summary

</br></br></br></br>
<div style="font-size: 25px;  top: 50%;">
* Smart sensors can be used to reduce the response burden in a time
  usage survey.
* Reduced respondent burden can result in better data quality.
* Trellis plots allow to visualize relationships between meal
  choices and related data such fitness of the respondent.
</div>