- The sanitizer script requires the dplyr R package to run.
- The script expects to find the UCI HAR Dataset directory in the current working directory.
- The script writes the tidied data to UCI_HAR_tidied.txt.
- Reading the datasets is slow because read.table is used as the file reader (data.table's fread causes a SIGSEGV under Linux).
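Both prerequisites can be checked before running the script. The R code below is a minimal sketch and is not part of the original script. check_prerequisites is a hypothetical helper. It only checks that the directory exists, not that every file inside it is present.

```r
# Hypothetical pre-flight check (not part of the original script):
# returns a character vector describing any missing prerequisite,
# or character(0) when everything needed appears to be in place.
check_prerequisites <- function(data_dir = "UCI HAR Dataset") {
  problems <- character(0)
  # the script needs dplyr; requireNamespace() does not attach the package
  if (!requireNamespace("dplyr", quietly = TRUE)) {
    problems <- c(problems, "R package 'dplyr' is not installed")
  }
  # the dataset directory is expected relative to the working directory
  if (!dir.exists(data_dir)) {
    problems <- c(problems,
                  sprintf("directory '%s' not found in %s", data_dir, getwd()))
  }
  problems
}
```

An empty result means it is reasonable to run the script. Otherwise, print the returned messages.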
How it works
- Read and enrich each dataSet (readEnrichedDataSet function):
    - Read the dataSet (X_test.txt/X_train.txt) with the correct column (variable) names, taken from features.txt.
    - Attach a SUBJECT_ID column (subject_test.txt/subject_train.txt).
    - Attach an ACTIVITY_ID column (y_test.txt/y_train.txt).
    - Attach an ACTIVITY_NAME column with the corresponding activity names (activity_labels.txt).
- Filter out unneeded columns and transform column names (extractMeasuresOfInterest function):
    - Retain columns whose names contain 'std()' or 'mean()' (meanFreq() columns are dropped).
    - std() becomes standard_deviation, and mean() becomes mean_value.
- Group the dataSet by SUBJECT_ID and ACTIVITY_NAME, then calculate the mean of each remaining column within each group.
- Write the processed dataSet to UCI_HAR_tidied.txt (180 observations of 68 variables: SUBJECT_ID, ACTIVITY_NAME and 66 sensor measurements).
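The read-and-enrich step can be sketched in base R as follows. The sketch assumes the standard layout, with features.txt and activity_labels.txt at the top of the dataset directory and the test/ and train/ subdirectories beneath it. read_enriched_sketch is a hypothetical name, and the script's own readEnrichedDataSet may be written differently.

```r
# Hypothetical sketch of the "read and enrich" step for one set
# ("test" or "train"); not the script's actual readEnrichedDataSet.
read_enriched_sketch <- function(base_dir, set = c("test", "train")) {
  set <- match.arg(set)
  # features.txt holds "<index> <feature name>" on each line
  features <- read.table(file.path(base_dir, "features.txt"),
                         col.names = c("index", "name"),
                         stringsAsFactors = FALSE)
  # activity_labels.txt holds "<activity id> <activity name>"
  activities <- read.table(file.path(base_dir, "activity_labels.txt"),
                           col.names = c("id", "name"),
                           stringsAsFactors = FALSE)
  # builds e.g. <base_dir>/test/X_test.txt
  set_file <- function(prefix) {
    file.path(base_dir, set, paste0(prefix, "_", set, ".txt"))
  }
  # check.names = FALSE keeps names such as "tBodyAcc-mean()-X" unmangled
  x <- read.table(set_file("X"), col.names = features$name,
                  check.names = FALSE)
  activity_id <- read.table(set_file("y"))[[1]]
  x$SUBJECT_ID <- read.table(set_file("subject"))[[1]]
  x$ACTIVITY_ID <- activity_id
  # match() keeps the original row order (merge() would reorder rows)
  x$ACTIVITY_NAME <- activities$name[match(activity_id, activities$id)]
  x
}
```

Using match() rather than merge() keeps each activity name aligned with the row it came from. This matters because the subject and activity files are attached purely by row position.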
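The column filter and rename rules can be expressed with literal (fixed = TRUE) matching. Matching the text "mean()" literally is what excludes meanFreq() columns. This is a hypothetical base R sketch, not the script's extractMeasuresOfInterest. Dropping ACTIVITY_ID here is an assumption, based on the final output listing only SUBJECT_ID and ACTIVITY_NAME as identifiers.

```r
# Hypothetical sketch of the filter/rename step (not the script's
# extractMeasuresOfInterest): keeps the identifier columns plus every
# column whose name contains the literal text "mean()" or "std()".
select_measures_sketch <- function(df, keep = c("SUBJECT_ID", "ACTIVITY_NAME")) {
  col_names <- names(df)
  # fixed = TRUE matches "mean()" literally, so "meanFreq()" is not kept
  is_measure <- grepl("mean()", col_names, fixed = TRUE) |
    grepl("std()", col_names, fixed = TRUE)
  # select by position so duplicated feature names elsewhere cannot interfere
  out <- df[, c(match(keep, col_names), which(is_measure)), drop = FALSE]
  # rename the tokens as described: mean() -> mean_value, std() -> standard_deviation
  names(out) <- sub("mean()", "mean_value", names(out), fixed = TRUE)
  names(out) <- sub("std()", "standard_deviation", names(out), fixed = TRUE)
  out
}
```

With a regular expression instead of fixed matching, "mean()" would need escaping, because unescaped parentheses form an empty group and the pattern would also match "meanFreq()".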
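What the grouping and output steps compute can be approximated in base R with aggregate(). The script itself uses dplyr, so treat this only as an illustration. The helper name is hypothetical. The helper assumes its input holds only SUBJECT_ID, ACTIVITY_NAME and numeric measurement columns.

```r
# Hypothetical base R sketch of the group/average/write steps (the script
# itself uses dplyr): one output row per SUBJECT_ID / ACTIVITY_NAME pair.
summarise_and_write_sketch <- function(df, path = "UCI_HAR_tidied.txt") {
  ids <- c("SUBJECT_ID", "ACTIVITY_NAME")
  measures <- setdiff(names(df), ids)
  # mean of every measurement column within each group
  out <- aggregate(df[measures],
                   by = list(SUBJECT_ID = df$SUBJECT_ID,
                             ACTIVITY_NAME = df$ACTIVITY_NAME),
                   FUN = mean)
  # aggregate() puts the grouping columns first; sort rows for readability
  out <- out[order(out$SUBJECT_ID, out$ACTIVITY_NAME), , drop = FALSE]
  rownames(out) <- NULL
  # row.names = FALSE avoids writing an extra unnamed row-name column
  write.table(out, path, row.names = FALSE)
  invisible(out)
}
```

The written file can be read back with read.table(path, header = TRUE, check.names = FALSE), which keeps the measurement names exactly as written.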