kartikaye edited this page Apr 1, 2014 · 5 revisions

Talking points:

  1. Remove pre-Q4 2009 data from the vehicle ownership analysis because it is incomplete. Mileage estimates from that period should still be valid, but to keep things simple we can apply the same cutoff to both analyses.
  2. Delete records with mi_per_day = 0, and decide how to handle the extremely high mi_per_day estimates (the data I sent out has a maximum of 657).
  3. Delete records with overlap_pct < 95% for accuracy (or use a higher threshold if necessary).
  4. Fix a training/validation set for modeling.
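
The cleaning steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not the agreed procedure: the `year` and `quarter` fields, the function names, and the 80/20 split are hypothetical; `overlap_pct` is assumed to be stored on a 0-100 scale; and the cap on `mi_per_day` is left as an optional parameter because that decision is still open.

```python
import random


def clean_records(records, min_overlap_pct=95.0, max_mi_per_day=None,
                  start=(2009, 4)):
    """Apply the proposed filters to a list of dict records.

    Assumed (hypothetical) fields: year, quarter.
    Fields named in the talking points: mi_per_day, overlap_pct.
    """
    kept = []
    for r in records:
        # Point 1: drop anything before Q4 2009.
        if (r["year"], r["quarter"]) < start:
            continue
        # Point 2: drop zero-mileage records.
        if r["mi_per_day"] == 0:
            continue
        # Point 2 (undecided): optionally drop very high estimates.
        if max_mi_per_day is not None and r["mi_per_day"] > max_mi_per_day:
            continue
        # Point 3: drop records with low overlap.
        if r["overlap_pct"] < min_overlap_pct:
            continue
        kept.append(r)
    return kept


def fixed_split(records, validation_fraction=0.2, seed=0):
    """Point 4: a reproducible training/validation split (fixed seed)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

Fixing the seed means everyone who runs the split on the same cleaned data gets the same training and validation sets, which is the point of item 4. The `max_mi_per_day` cap stays off by default until a threshold is agreed.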