-
Notifications
You must be signed in to change notification settings - Fork 36
Example: White House Visitor Access Records
According to the White House:
As part of President Obama's commitment to government transparency, we are providing records of White House visitors on an ongoing basis online. In December 2009, we will begin posting all White House visitor records for the period from September 15th onwards under the terms of our new voluntary disclosure policy. 2
If the White House releases all visitor access records, we should be able to find the one that led to:
President Obama sat down with FOX News host Bill O'Reilly on Sunday for a live interview that aired during pre-game coverage of the Super Bowl on local FOX broadcast stations. (02/06/2011) 1
Is Bill O'Reilly any more special, or did he go through the same transparent process that everybody else does?
http://www.whitehouse.gov/briefing-room/disclosures/visitor-records has been providing links to data files describing visits to the White House. Those links have changed over the years, and the corresponding data provided has also changed. To deal with these changes, we cached the data to create the following versions (listed according to date of retrieval). The seven versions of the dataset are available from RPI's SVN.
- 0310
- 0510
- 0810
- 0910
- 2009-2010
- 0511
- 2011-Aug-26
- 2011-Nov-28
This is the earliest version that we saved. We don't know where we got it, except that we promise that we followed the link that was listed at http://www.whitehouse.gov/briefing-room/disclosures/visitor-records. We weren't capturing any provenance.
The headers in the data file are listed below. We're pretty sure that WhiteHouse-WAVES-Key-1209.txt is a copy of the documentation available in March 2010.
NAMELAST
NAMEFIRST
NAMEMID
UIN
BDGNBR
ACCESS_TYPE
TOA
POA
TOD
POD
APPT_MADE_DATE
APPT_START_DATE
APPT_END_DATE
APPT_CANCEL_DATE
Total_People
LAST_UPDATEDBY
POST
LastEntryDate
TERMINAL_SUFFIX
visitee_namelast
visitee_namefirst
MEETING_LOC
MEETING_ROOM
CALLER_NAME_LAST
CALLER_NAME_FIRST
CALLER_ROOM
description
RELEASE_DATE
We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0510.csv on 2010-07-08 and got this.
When comparing the 0310 headers to the 0510 headers, we only see a capitalization/underscore tweak:
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM CALLER_ROOM
description description
RELEASE_DATE | Release Date
Although the data file had the string 0827
, (presumably a month and day), we used 0810
to be consistent with the "convention" that they were following with "monthyear". Once they set a path with the first step, they change it on the second step of the journey!
We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0827.csv on 2010-09-12 and got this.
When comparing the headers from 0510 to the headers of 0810, we see that they reverted to the older CAPS_UNDERSCORE naming for Release Date
and decided that a capital Description
looked nicer. Looks like we didn't save a copy of the header documentation this time; probably because the headers didn't change enough.
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM CALLER_ROOM
description | Description
Release Date | RELEASE_DATE
I really wish I knew who thought, asked, and executed the following thoughts:
- anonymous government entity: "Hey! Let's switch to TABS!"
- anonymous government entity: "And we'll keep the .csv extension!"
No changes to the headers (once you deal with the tabs):
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM CALLER_ROOM
Description Description
RELEASE_DATE RELEASE_DATE
We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-1210.zip on 2010-12-29 and got this, which uncompressed to this.
This appears to be some aggregate of the previous releases.
No header changes from 910:
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM CALLER_ROOM
Description Description
RELEASE_DATE RELEASE_DATE
We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0511.zip on 2011-05-27 and got this, which uncompressed to this.
Government censorship! They removed CALLER_ROOM
in this release.
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM <
Description Description
RELEASE_DATE RELEASE_DATE
On 2011-09-14, http://www.whitehouse.gov/briefing-room/disclosures/visitor-records said:
- "To download Part 1 of the data released in 2011 in its raw format, click here. (.zip of a .csv, 7.4MB)"
- "To download Part 2 of the data released in 2011 in its raw format, click here. (.zip of a .csv, 3.4MB)"
- "To download an explanation of the column headers contained in the raw data file, click here. (.txt, 1.3KB)"
This version was named 2011-Aug-26
for the following reasons:
- The White House is not providing a clear identifier for this version, like it had (implicitly) done in the past (e.g.,
0310
and0511
). Although2011
is the closest thing they provide now, we are concerned that this will not be distinctive enough in the future. - Part 1 (http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0711b.zip) was
Last-Modified
2011-Jul-29. - Part 2 (http://www.whitehouse.gov/sites/default/files/visitors/whitehouse-waves-released-2011_part2.csv_.zip) was
Last-Modified
2011-Aug-26. - The most recent date of the two data files referenced from their page is the best distinctive identifier we have.
Cached versions of the URLs referenced above are available at:
- Part 1 expands to a csv (with a VERY disorienting name!)
- Part 2 expands to a csv
- Documentation for the column headers
The headers of Part 1 and Part 2 are inconsistent:
1 NAMELAST 1 NAMELAST
2 NAMEFIRST 2 NAMEFIRST
3 NAMEMID 3 NAMEMID
4 UIN 4 UIN
5 BDGNBR 5 BDGNBR
6 ACCESS_TYPE 6 ACCESS_TYPE
7 TOA 7 TOA
8 POA 8 POA
9 TOD 9 TOD
10 POD 10 POD
11 APPT_MADE_DATE 11 APPT_MADE_DATE
12 APPT_START_DATE 12 APPT_START_DATE
13 APPT_END_DATE 13 APPT_END_DATE
14 APPT_CANCEL_DATE 14 APPT_CANCEL_DATE
15 Total_People 15 Total_People
16 LAST_UPDATEDBY 16 LAST_UPDATEDBY
17 POST 17 POST
18 LastEntryDate 18 LastEntryDate
19 TERMINAL_SUFFIX 19 TERMINAL_SUFFIX
20 visitee_namelast 20 visitee_namelast
21 visitee_namefirst 21 visitee_namefirst
22 MEETING_LOC 22 MEETING_LOC
23 MEETING_ROOM 23 MEETING_ROOM
24 CALLER_NAME_LAST 24 CALLER_NAME_LAST
25 CALLER_NAME_FIRST 25 CALLER_NAME_FIRST
26 Description | 26 CALLER_ROOM
27 RELEASE_DATE | 27 description
| 28 release_date
Characterizing table completeness of 2011-Aug-26 part 1:
Column:
1 2 Number of values missing:
1234567890123456789012345678 Number of rows (of 447598) with sparseness pattern:
............. ..............| (1) 30
............. ........... ..| (2) 3852
............. ............. | (2) 3826
.. .......... ..............| (2) 8
............. ............ .| (2) 4
.. ........................ | (2) 2
.. ....................... .| (2) 1
......................... .| (2) 1
............. ........... .| (3) 10540
............. ........... . | (3) 9531
.. .......... ............. | (3) 3013
.. .......... ........... ..| (3) 2153
............. ...... ...... | (3) 488
............. ...... .... ..| (3) 268
............. ..... ..... ..| (3) 25
........ ... ..............| (3) 16
.. .......... ............ .| (3) 1
........ ................. | (3) 1
.. .......... ........... . | (4) 8933
.. .......... ........... .| (4) 8125
........ ... ........... ..| (4) 6677
........ ... ............. | (4) 6170
............. ...... .... . | (4) 314
.. .......... ...... ...... | (4) 288
............. ...... .... .| (4) 194
.. .......... ...... .... ..| (4) 99
............. ..... ..... .| (4) 23
.. .......... ..... ..... ..| (4) 4
............. .......... . | (4) 3
.. ..... ................. | (4) 2
.. ..... ............... ..| (4) 2
.. ..... ... ..............| (4) 1
.. ..... ... ............. | (5) 5643
.. ..... ... ........... ..| (5) 3679
........ ... ........... . | (5) 3544
........ ... ........... .| (5) 3544
.... . ..................| (5) 337
........ ... ...... ...... | (5) 263
.. .......... ...... .... . | (5) 235
.. .......... ...... .... .| (5) 150
........ ... ...... .... ..| (5) 129
............. ...... ... .| (5) 45
............. ...... ... . | (5) 15
........ ... ..... ..... ..| (5) 14
........ ... .......... .. | (5) 10
.. .......... .......... . | (5) 6
.. .......... ...... ... . .| (5) 2
.. .......... ..... ..... .| (5) 1
.. ..... ... ........... . | (6) 3434
.. ..... ... ........... .| (6) 2364
.... . ................. | (6) 1224
.... . ............... ..| (6) 648
.. ..... ... ...... ...... | (6) 389
.... . ... ..............| (6) 359
........ ... ...... .... . | (6) 237
.. ..... ... ...... .... ..| (6) 226
.. . . ..................| (6) 223
........ ... ...... .... .| (6) 130
........ ... ..... ..... .| (6) 51
.... . ................ .| (6) 50
.. .......... ...... ... .| (6) 20
........ ... ...... ... ..| (6) 5
.. .......... ...... ... . | (6) 4
........ ... .......... . | (6) 4
.. .......... ......... .| (6) 3
.. ..... ... .......... .. | (6) 1
........ ... ...... ... . .| (6) 1
.... . ... ............. | (7) 148107
.... . ... ........... ..| (7) 97086
.. . . ................. | (7) 867
.. ..... ... ...... .... . | (7) 258
.. . . ............... ..| (7) 170
........ ... ...... ... .| (7) 102
.. ..... ... ...... .... .| (7) 89
.... . ............... .| (7) 74
.... . ............... . | (7) 72
........ ... ...... ... . | (7) 42
.. . . ................ .| (7) 35
.... . .......... ...... | (7) 34
.. ..... ... ..... ..... .| (7) 25
.... . .......... ... ...| (7) 15
.... . ... ............ .| (7) 12
.. . . ... ..............| (7) 7
.... . .......... .... ..| (7) 7
.. . . ... ............. | (8) 36254
.. . . ... ........... ..| (8) 16660
.... . ... ........... . | (8) 11413
.... . ... ........... .| (8) 7928
.... . ... ...... ...... | (8) 5205
.... . ... ...... .... ..| (8) 3302
.... . ... ..... ..... ..| (8) 122
.... . .......... .... . | (8) 119
.. . . ............... . | (8) 107
.. . . ............... .| (8) 59
.. . . ... ............ .| (8) 44
.. ..... ... ...... ... .| (8) 37
.. . . .......... ... ...| (8) 34
.. . . .......... ...... | (8) 29
.. . . .......... .... ..| (8) 14
.... . .......... .... .| (8) 12
.. ..... ... ...... ... . | (8) 6
.... . .......... ... .. | (8) 6
.... . .......... ... . .| (8) 1
.. . . ... ...... ...... | (9) 7386
.. . . ... ........... . | (9) 6338
.. . . ... ........... .| (9) 4875
.. . . ... ...... .... ..| (9) 3215
.... . ... ...... .... . | (9) 1009
.... . ... ...... .... .| (9) 529
.... . ... ..... ..... .| (9) 311
.. . . .......... .... . | (9) 104
.. . . .......... ... .. | (9) 63
.... . ... ...... ... .. | (9) 41
.. . . .......... .... .| (9) 15
.. . . ... ..... ..... ..| (9) 11
.... . ... .......... .| (9) 9
.... . ... .......... . | (9) 5
.... . ......... . .... | (9) 3
.... . .......... ... . | (9) 1
.. . . ... .......... ..| (9) 1
.. . . .............. . | (9) 1
.... . ... ...... ... ..| (9) 1
.. . . ... ...... .... . | (10) 610
.. . . ... ...... .... .| (10) 260
.. . . ... ..... ..... .| (10) 165
.... . ... ...... ... .| (10) 42
.. . . ... ...... ... .. | (10) 41
.... . ... ...... ... . | (10) 24
.. . . ... .......... .| (10) 5
.. . . ......... . .... | (10) 3
.. . . ... .......... . | (10) 2
.. . . .......... ... . | (10) 1
.. ..... ... ..... . . .| (10) 1
.. . . ... ...... ... .| (11) 24
.. . . ... ...... ... . | (11) 7
... .. . ..| (20) 8
... . . .. | (21) 411
... . . ..| (21) 118
.. .. . ..| (21) 47
.. . . .. | (22) 1881
.. . . ..| (22) 86
.. .. . .| (22) 2
.. . . . | (23) 2
.| (27) 1
____________________________ <-- Column always has a value? '|': yes, '_': no.
We can't assume that the last name uniquely identifies the visitee:
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/chad>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/john>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/kevin>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/kim>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/kimberley>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/kojetin>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/r108628c21reference>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/r108629c21reference>
- Does reference by only last name indicate importance? e.g.
visitee/adams/jonathon
(10),visitee/alcantara/joseph
(3), one of thevisitee/alvarez/
(12),visitee/arboleda/edwin
(13),visitee/brewster/bill
(18),visitee/cooper/ryan
(dozens ),visitee/murphy/james
,visitee/potus
(TODO: write the SPARQL query for this)
DATE FAILED: "MEETING" !~ 2 patterns @ :source/whitehouse-gov/dataset/visitor-records/part-1/version/2011-Aug-26/visit_268039 :release_date
DateTimeValueHandler FAILED: "40626.39058" !~ 2 patterns @ :source/whitehouse-gov/dataset/visitor-records/part-1/version/2011-Aug-26/visit_426415 :time_of_arrival
- source/WhiteHouse-WAVES-Released-0611.csv 1/20/2009 to 3/9/2011
NAMELAST
NAMEFIRST
NAMEMID
UIN
BDGNBR
ACCESS_TYPE
TOA
POA
TOD
POD
APPT_MADE_DATE
APPT_START_DATE
APPT_END_DATE
APPT_CANCEL_DATE
Total_People
LAST_UPDATEDBY
POST
LastEntryDate
TERMINAL_SUFFIX
visitee_namelast
visitee_namefirst
MEETING_LOC
MEETING_ROOM
CALLER_NAME_LAST
CALLER_NAME_FIRST
Description
RELEASE_DATE
- source/WhiteHouse-WAVES-Released-1210.csv 1/30/2009 to 9/30/2010
- source/WhiteHouse-WAVES-Released-2011_part2.csv 1/29/2009 to 8/31/2011