Pipeline for processing data collected with the GENEActiv accelerometer in adolescent members of the Millennium cohort study.
We use the R package GGIR to analyse the accelerometer data in the conventional way.
The Centre for Longitudinal Studies runs R/applyGGIR.R with argument mode = 1 to extract the raw data for the two specific days on which the accelerometer was worn. In this step the data are exported to .RData files.
The .RData files are received by the Netherlands eScience Center in encrypted zipped folders. Command to extract them (password not included):
find . -name "*.zip" -type f| xargs -I {} 7z x {}
If the RData files are provided across multiple folders then put all of them in one folder:
mkdir raw
find . -name "*.RData" | xargs -I {} mv {} raw
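The one-liner above can trip over file names that contain spaces, and it also picks up files that already sit in raw. A minimal sketch of a more defensive variant follows. The function name collect_rdata and its two arguments are hypothetical and not part of this repository.

```shell
# Hypothetical helper: move every .RData file found below "$1" into folder "$2".
# Write "$2" the way find prints it (e.g. ./raw when "$1" is .), otherwise the
# destination folder is not skipped during the search.
collect_rdata() {
  src="$1"
  dest="$2"
  mkdir -p "$dest"
  # -prune skips the destination folder itself.
  # -print0 / -0 keep file names with spaces intact.
  # mv -n never overwrites, so a file whose name already exists in "$dest"
  # stays where it is and can be inspected by hand.
  find "$src" -path "$dest" -prune -o -type f -name "*.RData" -print0 |
    xargs -0 -I {} mv -n {} "$dest"/
}
```

Called as `collect_rdata . ./raw`, this should gather the files in the same way as the two commands above. Any .RData file still outside raw afterwards points to a duplicate file name.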
The time use diary files and wearcode files are generated by the Centre for Longitudinal Studies. If these are provided in multiple folders then merge them. For example, in R you could use commands like these:
# merge time use diary (tud):
tud = read.csv(paste0(path,"/tud.csv"))
tud2 = read.csv(paste0(path,"/tud2.csv"))
tud3 = merge(tud,tud2,all=TRUE)
write.csv(tud3,paste0(path,"/tud3.csv"),row.names = FALSE)
# merge wearcodes:
wc = read.csv(paste0(path,"/wc1.csv"))
wc2 = read.csv(paste0(path,"/wc2.csv"))
colnames_of_interest = c("Monitor","Day1","Day2","binFile","file","accSmallID")
wc = wc[,colnames_of_interest]
wc2 = wc2[,colnames_of_interest]
wc3 = merge(wc,wc2,all=TRUE)
write.csv(wc3,paste0(path,"/wc3.csv"),row.names = FALSE)
We run applyGGIR.R with mode = c(1,2) and derive from this day-specific reports as well as reports per 10-minute window.
In preparation for the Hidden semi-Markov Models we run R/addheuristics_convert2csv.R to generate csv files with time series of aggregated data. This step also generates an indicator of heuristic classes of behaviour (e.g. bouts of MVPA).
The heuristic categories are:
- sustained inactivity or sleep
- non-bouted inactivity
- >= 30 minute bouts of inactivity
- 10-29 minute bouts of inactivity
- non-bouted light activity
- >= 10 minute bouts of light physical activity (LPA)
- 1-9 minute bouts of LPA
- non-bouted moderate or vigorous physical activity (MVPA)
- >= 10 minute bouts of MVPA
- 1-9 minute bouts of MVPA
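To make the duration thresholds in this list concrete, here is a small sketch that maps an intensity level and a bout duration to one of the category labels. It only illustrates the thresholds. The function name bout_category, the level names and the use of whole minutes are assumptions, and it is not how R/addheuristics_convert2csv.R detects bouts. The category "sustained inactivity or sleep" is left out because duration alone does not define it.

```shell
# Hypothetical helper (not part of the pipeline): label a bout from its
# intensity level (inactivity, lpa or mvpa) and its duration in whole minutes.
# Assumption: pass 0 minutes for time that does not belong to any bout.
bout_category() {
  level="$1"
  minutes="$2"
  case "$level" in
    inactivity)
      if [ "$minutes" -ge 30 ]; then
        echo ">= 30 minute bout of inactivity"
      elif [ "$minutes" -ge 10 ]; then
        echo "10-29 minute bout of inactivity"
      else
        # Assumption: inactivity shorter than 10 minutes counts as non-bouted.
        echo "non-bouted inactivity"
      fi
      ;;
    lpa|mvpa)
      # Upper-case the level name to match the abbreviations in the list.
      name=$(echo "$level" | tr '[:lower:]' '[:upper:]')
      if [ "$minutes" -ge 10 ]; then
        echo ">= 10 minute bout of $name"
      elif [ "$minutes" -ge 1 ]; then
        echo "1-9 minute bout of $name"
      else
        echo "non-bouted $name"
      fi
      ;;
    *)
      echo "unknown level: $level" >&2
      return 1
      ;;
  esac
}
```

For example, `bout_category mvpa 12` prints `>= 10 minute bout of MVPA` and `bout_category inactivity 15` prints `10-29 minute bout of inactivity`.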
We run mergewithID.R to merge in the participant identifier from the wearcodes.csv file and to tidy up the variable list.
Further, we use a Hidden semi-Markov Model for unsupervised, exploratory analysis of the data.
2.1 Train Hidden semi-Markov Model
Follow the steps as outlined here. In summary:
- Make sure you have Python 2.7
- Install the library hsmm4acc
- Adjust the config
- Run the scripts 0_prepare_data.py and 1_HSMM.py