
Processing Script

Ben Murray edited this page Jul 6, 2020 · 3 revisions

The Processing Script

v0.2.1

Sort data

  • Sort patient data by patients.id
  • Sort assessment data by (assessments.patient_id, assessments.created_at)
  • Sort test data by (tests.patient_id, tests.id)
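
The sort orders above can be sketched in a few lines of Python. This is only an illustration, not the script's actual implementation: `sort_rows` is a hypothetical helper, rows are assumed to be dictionaries, and timestamps are assumed to be ISO 8601 strings, which sort correctly as plain text.

```python
from operator import itemgetter


def sort_rows(rows, keys):
    """Return a new list of row dicts sorted by the given fields, in order.

    Hypothetical helper; sorted() is stable, so ties keep their input order.
    """
    return sorted(rows, key=itemgetter(*keys))


# Illustrative data only; the field values are made up.
patients = [{"id": "p2"}, {"id": "p1"}]
assessments = [
    {"patient_id": "p2", "created_at": "2020-05-02T09:00:00"},
    {"patient_id": "p1", "created_at": "2020-05-03T08:00:00"},
    {"patient_id": "p1", "created_at": "2020-05-01T20:00:00"},
]

sorted_patients = sort_rows(patients, ["id"])
sorted_assessments = sort_rows(assessments, ["patient_id", "created_at"])
```

Sorting assessments by (patient_id, created_at) puts each patient's assessments in one contiguous, chronologically ordered run, which the later per-day and per-patient steps can rely on.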

Clean patient data

  • Calculate patients.age from patients.year_of_birth as numeric field

  • Calculate patients.age_valid from patients.year_of_birth_valid filter as boolean field

  • Calculate patients.16_to_90_years filter as boolean field

  • Calculate patients.weight_kg_clean from patients.weight_kg as numeric field

  • Calculate patients.40_to_200_kg filter as boolean field

  • Calculate patients.height_cm_clean from patients.height_cm as numeric field

  • Calculate patients.110_to_220_cm filter as boolean field

  • Calculate patients.bmi_clean from patients.bmi as numeric field

  • Calculate patients.15_to_55_bmi filter as boolean field
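
A minimal sketch of the range filters listed above, under several assumptions that the page does not confirm: age is taken as a reference year minus year_of_birth (the `reference_year` parameter is hypothetical), the bounds are treated as inclusive, and a filter value of True means "within range". The `_clean` fields are left out because their derivation is not described here.

```python
def in_range(value, low, high):
    """True when value is present and low <= value <= high (inclusive bounds assumed)."""
    return value is not None and low <= value <= high


def clean_patient(patient, reference_year=2020):
    """Derive age and the range filters for one patient row (a dict).

    reference_year is an assumption; the script may use a different reference date.
    """
    year_of_birth = patient.get("year_of_birth")
    age = reference_year - year_of_birth if year_of_birth is not None else None
    return {
        "age": age,
        "16_to_90_years": in_range(age, 16, 90),
        "40_to_200_kg": in_range(patient.get("weight_kg"), 40, 200),
        "110_to_220_cm": in_range(patient.get("height_cm"), 110, 220),
        "15_to_55_bmi": in_range(patient.get("bmi"), 15, 55),
    }


# Illustrative rows only.
plausible = clean_patient(
    {"year_of_birth": 1980, "weight_kg": 70.0, "height_cm": 175.0, "bmi": 22.9}
)
implausible = clean_patient(
    {"year_of_birth": 2010, "weight_kg": 350.0, "height_cm": None, "bmi": 60.0}
)
```

Keeping the filters as separate boolean fields, rather than dropping rows, leaves the choice of which exclusions to apply to the downstream analysis.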

Clean assessment data

  • Create assessments.assessment_patient_id_fkey, a foreign key of indices into patients, by matching assessments.patient_id to patients.id

  • Create assessments.temperature_c_clean from assessments.temperature and assessments.temperature_unit

  • Create assessments.temperature_35_to_42_inclusive filter as boolean field

  • Create assessments.temperature_modified filter for entries that required cleaning, as boolean field

  • Create assessments.inconsistent_healthy and assessments.inconsistent_not_healthy from assessments.health_status and related symptom fields as boolean fields

  • Generate daily_assessments

    • daily_assessments.id from last assessment for a given day
    • daily_assessments.patient_id from last assessment for a given day
    • daily_assessments.created_at from last assessment for a given day
    • daily_assessments.created_at_day from last assessment for a given day
    • daily_assessments.updated_at from last assessment for a given day
    • daily_assessments.updated_at_day from last assessment for a given day
    • daily_assessments.version: maximum value from patient's assessments in that day
    • daily_assessments.country_code from last assessment for a given day
    • categorical fields: maximum values from patient's assessments in that day
    • numeric fields: maximum values from patient's assessments in that day
    • indexed string fields: non-empty values from patient's assessments in that day, concatenated as a comma-separated list, with escapes if necessary
  • Generate patient-level measures

    • patients.assessment_count from assessments
    • patients.first_assessment_day from assessments
    • patients.last_assessment_day from assessments
    • patients.daily_assessment_count from daily_assessments
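
The daily aggregation and patient-level measures above can be sketched as follows. This is an illustration under stated assumptions, not the script's own code: assessments are dictionaries already sorted by (patient_id, created_at), timestamps are ISO 8601 strings, categorical values are encoded as ordered integers so that the maximum is meaningful, and CSV-style quoting stands in for the unspecified escaping. The field names `fatigue` and `other_symptoms` and all three function names are hypothetical. Fields such as updated_at, version, and country_code would follow the same "last" or "maximum" patterns and are omitted for brevity.

```python
import csv
import io
from itertools import groupby


def join_escaped(values):
    """Join non-empty strings with commas, quoting values that contain commas or quotes."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="").writerow([v for v in values if v])
    return buffer.getvalue()


def day_of(row):
    """Day part of an ISO 8601 timestamp string (format assumed)."""
    return row["created_at"][:10]


def daily_assessments(assessments, max_fields, text_fields):
    """Collapse sorted assessments to one row per patient per day."""
    daily = []
    by_patient_day = groupby(assessments, key=lambda r: (r["patient_id"], day_of(r)))
    for (patient_id, day), group in by_patient_day:
        rows = list(group)
        last = rows[-1]  # input is sorted, so this is the day's last assessment
        out = {
            "id": last["id"],
            "patient_id": patient_id,
            "created_at": last["created_at"],
            "created_at_day": day,
        }
        for field in max_fields:
            out[field] = max(r[field] for r in rows)
        for field in text_fields:
            out[field] = join_escaped(r[field] for r in rows)
        daily.append(out)
    return daily


def patient_measures(assessments, daily):
    """Per-patient counts and first/last assessment days."""
    measures = {}
    for row in assessments:
        day = day_of(row)
        m = measures.setdefault(row["patient_id"], {
            "assessment_count": 0,
            "first_assessment_day": day,
            "last_assessment_day": day,
            "daily_assessment_count": 0,
        })
        m["assessment_count"] += 1
        m["first_assessment_day"] = min(m["first_assessment_day"], day)
        m["last_assessment_day"] = max(m["last_assessment_day"], day)
    for row in daily:
        measures[row["patient_id"]]["daily_assessment_count"] += 1
    return measures


# Illustrative data only.
assessments = [
    {"id": "a1", "patient_id": "p1", "created_at": "2020-05-01T08:00:00",
     "fatigue": 1, "other_symptoms": "rash"},
    {"id": "a2", "patient_id": "p1", "created_at": "2020-05-01T20:00:00",
     "fatigue": 2, "other_symptoms": "sore, dry throat"},
    {"id": "a3", "patient_id": "p1", "created_at": "2020-05-02T09:00:00",
     "fatigue": 0, "other_symptoms": ""},
    {"id": "a4", "patient_id": "p2", "created_at": "2020-05-02T10:00:00",
     "fatigue": 1, "other_symptoms": ""},
]
daily = daily_assessments(assessments, ["fatigue"], ["other_symptoms"])
measures = patient_measures(assessments, daily)
```

Because `groupby` only merges adjacent rows, this sketch depends on the sort step having run first; unsorted input would produce several rows for the same patient and day.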