-
Notifications
You must be signed in to change notification settings - Fork 0
Data Dump v1.0
Evan W. Patton edited this page Apr 20, 2014
·
3 revisions
- massdrivedata-1.0.sql.bz2
- MD5 0f8dc5f709cc92f3ee640b496098249c
- PostgreSQL 9.3
- PostGIS 2.1
- 5.7 GB disk space
- 1 GB RAM (possibly less)
- Converted ERSI Shape file into PostGIS geometry
- Created model_year table to aggregate years into a smaller table
- Created make table to aggregate car manufacturers into a singlular table
- Created model table to aggregate car manufacturer/model pairs into a singular table
- Created vehicle table to differentiate vehicles appearing in more than one Regulation and Emissions records
- Created municipality table from grid_250m_shape table
- Added foreign key relationships between different tables
- Added indices for make/model names, VINs, muni_ids, g250m_ids
- Installation instructions
These instructions have not been fully vetted, so use at your own risk.
- Download Postgres 9.3.4 from EnterpriseDB.
- Download PostGIS 2.1.1 from OSGeo.
- Install Postgres using the installer.
- Install PostGIS using the installer.
- Launch pgAdmin III from the Start Menu.
- Select the PostgreSQL server instance and select "Tools => Connect" from the menu. Enter the password you set during installation.
- From the Edit menu, select New Object > New Database...
- Enter a name for the database, e.g. massdrivedata.
- Set the owner to postgres.
- Close pgAdmin.
- Launch SQL Shell (psql) from the Start Menu.
- Log in to your newly created database.
- Load the SQL file using the \i command: \i 'C:/path/to/downloaded/data.sql'. You might need to put the file into C:/Temp or some similar folder to make Windows let you load it. E.g. \i 'C:\\TEMP\\massdrivedata-1.0.sql'
- To quit, type \q
Using macports:
sudo port install postgresql93 postgresql93-server [email protected]+postgresql93
createdb massdrivedata
bunzip2 -c massdrivedata-1.0.sql.bz2 | psql93 -d massdrivedata
You may also want to install pgAdmin for a GUI for accessing the database.
sudo /bin/sh -c "echo deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -c`-pgdg main >> /etc/apt/sources.list.d/pgdg.list"
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install postgresql-9.3 postgresql-9.3-postgis-2.1
wget http://www.evanpatton.com/projects/massdrive/massdrivedata-1.0.sql.bz2
sudo -u postgres createdb massdrivedata
bunzip2 -c massdrivedata-1.0.sql.bz2 | sudo -u postgresql-9.3 psql -d massdrivedata
- get number of toyotas in dataset
select count(*)
from vehicle join make using (make_id)
where make_name = 'Toyota';
count |
---|
678247 |
- get sorted list of percentage of vehicles by manufacturer
select make_name, count(vin_id), (100.0*cast(count(vin_id) as real)/(select count(*) from vehicle)) as percentage
from vehicle join make using (make_id)
group by make_name
order by percentage desc;
make_name | count | percentage |
---|---|---|
Toyota | 678247 | 15.0526272804304 |
Ford | 590035 | 13.0949004380539 |
Honda | 440999 | 9.78728041265576 |
Chevrolet | 405376 | 8.9966838576975 |
Nissan | 233808 | 5.18900146876119 |
Dodge | 205374 | 4.5579534816831 |
Jeep | 168626 | 3.74238931803585 |
Subaru | 120538 | 2.67515165880353 |
Gmc | 110818 | 2.45943151973063 |
Hyundai | 108722 | 2.41291409056429 |
Volkswagen | 91616 | 2.0332732779119 |
Chrysler | 85022 | 1.88692980084948 |
Volvo | 83531 | 1.85383939680033 |
Buick | 81972 | 1.81923983951487 |
Mercury | 81585 | 1.81065098212586 |
Bmw | 81180 | 1.80166264299782 |
Lexus | 72906 | 1.6180342036265 |
Mazda | 71363 | 1.58378974122017 |
Mercedes | 66898 | 1.48469607651229 |
Pontiac | 64699 | 1.43589272406154 |
Acura | 60344 | 1.3392403366477 |
Unknown | 56289 | 1.24924597821759 |
Saturn | 52733 | 1.17032614133043 |
Cadillac | 46589 | 1.0339697077436 |
Kia | 45094 | 1.00079052997467 |
Audi | 41331 | 0.917276653088726 |
Mitsubishi | 38574 | 0.856089366728231 |
Harley | 36968 | 0.82044671823532 |
Lincoln | 36220 | 0.803846032635883 |
Saab | 33251 | 0.737953739126884 |
Infiniti | 32278 | 0.716359531789647 |
Oldsmobile | 31048 | 0.689061612956347 |
Suzuki | 14894 | 0.330548945612337 |
Plymouth | 14640 | 0.32491181440611 |
Scion | 14026 | 0.311285048419406 |
Isuzu | 13100 | 0.290733932289621 |
Intl | 11780 | 0.261438604761201 |
Land Rover | 10962 | 0.243284379065559 |
Freightliner | 9714 | 0.215586978493235 |
Mini | 9263 | 0.205577741587691 |
Porsche | 8651 | 0.191995362460879 |
Jaguar | 7220 | 0.160236564208478 |
Yamaha | 6815 | 0.15124822508044 |
Geo | 6668 | 0.147985790878412 |
Kawasaki | 6439 | 0.14290349542083 |
Hummer | 3409 | 0.0756574026851387 |
Triumph | 1142 | 0.0253448969980723 |
Workhorse | 857 | 0.0190197694635271 |
Polaris | 787 | 0.017466229367323 |
Smart | 715 | 0.0158683024112274 |
Daewoo | 685 | 0.0152024995128542 |
Eagle | 648 | 0.0143813426048606 |
Kw | 594 | 0.0131828973877889 |
Mgb | 495 | 0.0109857478231574 |
Alfa Romeo | 406 | 0.00901053255798366 |
Ferrari | 399 | 0.00885517854836326 |
Mci | 364 | 0.00807840850026122 |
snip... |
- compute average odometer reading for every municipality for 2011 inspections sorted descending by odometer
select muni_name, avg_odom
from municipality join
-- this inner table could be replaced by a view
( select muni_id, avg(start_odom) as avg_odom
from rae_public
where insp_year = 2011
group by muni_id) as t using (muni_id)
order by avg_odom desc;
muni_name | avg_odom | |
---|---|---|
GOSNOLD | 156637.000000000000 | |
HANCOCK | 109736.098726114650 | |
WHATELY | 107538.352830188679 | |
LAWRENCE | 102574.580867518494 | |
WENDELL | 101330.714975845411 | |
NEW SALEM | 98314.463709677419 | |
HATFIELD | 97568.341673182174 | |
PLAINFIELD | 96087.983146067416 | |
FREETOWN | 94097.785403576607 | |
ASHFIELD | 93419.001600000000 | |
CHARLEMONT | 92678.308080808081 | |
PETERSHAM | 92125.079019073569 | |
BROOKFIELD | 91760.321407624633 | |
LEYDEN | 91558.610972568579 | |
SHELBURNE | 91437.688435374150 | |
PLYMPTON | 90992.056917688266 | |
HAWLEY | 90856.323809523810 | |
WEST BRIDGEWATER | 90798.392344497608 | |
WAREHAM | 90245.044320342205 | |
ROYALSTON | 90236.733333333333 | |
PHILLIPSTON | 90054.643973214286 | |
ORANGE | 89918.500492287496 | |
SHUTESBURY | 89347.190397350993 | |
BUCKLAND | 89312.807106598985 | |
WARWICK | 89261.572164948454 | |
AYER | 88158.032517807371 | |
BROCKTON | 88094.455335740995 | |
NEW BRAINTREE | 88093.619607843137 | |
TEMPLETON | 87865.369844020797 | |
LAKEVILLE | 87664.127798507463 | |
NORTHFIELD | 87625.521702127660 | |
HOLLAND | 87602.257275902212 | |
MIDDLEBOROUGH | 87598.072782819660 | |
HEATH | 87379.780000000000 | |
WARREN | 87235.190740740741 | |
WORTHINGTON | 87230.616519174041 | |
WALES | 87182.145794392523 | |
CONWAY | 87164.664133738602 | |
SUNDERLAND | 87087.528594771242 | |
BRIMFIELD | 86973.943371943372 | |
COLRAIN | 86955.183673469388 | |
GRANVILLE | 86720.506566604128 | |
CUMMINGTON | 86622.181451612903 | |
WILLIAMSBURG | 86476.799342105263 | |
ROCHESTER | 86285.817213114754 | |
BERKLEY | 86230.205332409972 | |
ERVING | 86153.606965174129 | |
PALMER | 86087.850258531541 | |
LEVERETT | 86083.598601398601 | |
WINCHENDON | 86062.786084381939 | |
PEPPERELL | 86052.366598430570 | |
SPRINGFIELD | 86028.343141592920 | |
BERNARDSTON | 85777.680424528302 | |
MONSON | 85742.397087378641 | |
MONTAGUE | 85641.875882571535 | |
AVON | 85631.151141552511 | |
HOLYOKE | 85564.579687500000 | |
snip... |