Skip to content

Example: White House Visitor Access Records

Timothy Lebo edited this page Feb 14, 2012 · 35 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

According to the White House:

As part of President Obama's commitment to government transparency, we are providing records of White House visitors on an ongoing basis online. In December 2009, we will begin posting all White House visitor records for the period from September 15th onwards under the terms of our new voluntary disclosure policy. 2

If the White House releases all visitor access records, we should be able to find the one that led to:

President Obama sat down with FOX News host Bill O'Reilly on Sunday for a live interview that aired during pre-game coverage of the Super Bowl on local FOX broadcast stations. (02/06/2011) 1

Is Bill O'Reilly any more special, or did he go through the same transparent process that everybody else does?

http://www.whitehouse.gov/briefing-room/disclosures/visitor-records has been providing links to data files describing visits to the White House. Those links have changed over the years, and the corresponding data provided has also changed. To deal with these changes, we cached the data to create the following versions (listed according to date of retrieval). The seven versions of the dataset are available from RPI's SVN.

  • 0310
  • 0510
  • 0810
  • 0910
  • 2009-2010
  • 0511
  • 2011-Aug-26
  • 2011-Nov-28

Version: 0310

This is the earliest version that we saved. We don't know where we got it, except that we promise that we followed the link that was listed at http://www.whitehouse.gov/briefing-room/disclosures/visitor-records. We weren't capturing any provenance.

The headers in the data file are listed below. We're pretty sure that WhiteHouse-WAVES-Key-1209.txt is a copy of the documentation available in March 2010.

NAMELAST
NAMEFIRST
NAMEMID
UIN
BDGNBR
ACCESS_TYPE
TOA
POA
TOD
POD
APPT_MADE_DATE
APPT_START_DATE
APPT_END_DATE
APPT_CANCEL_DATE
Total_People
LAST_UPDATEDBY
POST
LastEntryDate
TERMINAL_SUFFIX
visitee_namelast
visitee_namefirst
MEETING_LOC
MEETING_ROOM
CALLER_NAME_LAST
CALLER_NAME_FIRST
CALLER_ROOM
description
RELEASE_DATE

Version: 0510

We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0510.csv on 2010-07-08 and got this.

When comparing the 0310 headers to the 0510 headers, we only see a capitalization/underscore tweak:

NAMELAST							NAMELAST
NAMEFIRST							NAMEFIRST
NAMEMID								NAMEMID
UIN									UIN
BDGNBR								BDGNBR
ACCESS_TYPE							ACCESS_TYPE
TOA									TOA
POA									POA
TOD									TOD
POD									POD
APPT_MADE_DATE						APPT_MADE_DATE
APPT_START_DATE						APPT_START_DATE
APPT_END_DATE						APPT_END_DATE
APPT_CANCEL_DATE					APPT_CANCEL_DATE
Total_People						Total_People
LAST_UPDATEDBY						LAST_UPDATEDBY
POST								POST
LastEntryDate						LastEntryDate
TERMINAL_SUFFIX						TERMINAL_SUFFIX
visitee_namelast					visitee_namelast
visitee_namefirst					visitee_namefirst
MEETING_LOC							MEETING_LOC
MEETING_ROOM						MEETING_ROOM
CALLER_NAME_LAST					CALLER_NAME_LAST
CALLER_NAME_FIRST					CALLER_NAME_FIRST
CALLER_ROOM							CALLER_ROOM
description							description
RELEASE_DATE				    |	Release Date

Version: 0810

Although the data file had the string 0827, (presumably a month and day), we used 0810 to be consistent with the "convention" that they were following with "monthyear". Once they set a path with the first step, they change it on the second step of the journey!

We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0827.csv on 2010-09-12 and got this.

When comparing the headers from 0510 to the headers of 0810, we see that they reverted to the older CAPS_UNDERSCORE naming for Release Date and decided that a capital Description looked nicer. Looks like we didn't save a copy of the header documentation this time; probably because the headers didn't change enough.

NAMELAST							NAMELAST
NAMEFIRST							NAMEFIRST
NAMEMID								NAMEMID
UIN									UIN
BDGNBR								BDGNBR
ACCESS_TYPE							ACCESS_TYPE
TOA									TOA
POA									POA
TOD									TOD
POD									POD
APPT_MADE_DATE						APPT_MADE_DATE
APPT_START_DATE						APPT_START_DATE
APPT_END_DATE						APPT_END_DATE
APPT_CANCEL_DATE					APPT_CANCEL_DATE
Total_People						Total_People
LAST_UPDATEDBY						LAST_UPDATEDBY
POST								POST
LastEntryDate						LastEntryDate
TERMINAL_SUFFIX						TERMINAL_SUFFIX
visitee_namelast					visitee_namelast
visitee_namefirst					visitee_namefirst
MEETING_LOC							MEETING_LOC
MEETING_ROOM						MEETING_ROOM
CALLER_NAME_LAST					CALLER_NAME_LAST
CALLER_NAME_FIRST					CALLER_NAME_FIRST
CALLER_ROOM							CALLER_ROOM
description					    |	Description
Release Date				    |	RELEASE_DATE

Version: 0910

I really wish I knew who thought, asked, and executed the following thoughts:

No changes to the headers (once you deal with the tabs):

NAMELAST							NAMELAST
NAMEFIRST							NAMEFIRST
NAMEMID								NAMEMID
UIN									UIN
BDGNBR								BDGNBR
ACCESS_TYPE							ACCESS_TYPE
TOA									TOA
POA									POA
TOD									TOD
POD									POD
APPT_MADE_DATE						APPT_MADE_DATE
APPT_START_DATE						APPT_START_DATE
APPT_END_DATE						APPT_END_DATE
APPT_CANCEL_DATE					APPT_CANCEL_DATE
Total_People						Total_People
LAST_UPDATEDBY						LAST_UPDATEDBY
POST								POST
LastEntryDate						LastEntryDate
TERMINAL_SUFFIX						TERMINAL_SUFFIX
visitee_namelast					visitee_namelast
visitee_namefirst					visitee_namefirst
MEETING_LOC							MEETING_LOC
MEETING_ROOM						MEETING_ROOM
CALLER_NAME_LAST					CALLER_NAME_LAST
CALLER_NAME_FIRST					CALLER_NAME_FIRST
CALLER_ROOM							CALLER_ROOM
Description							Description
RELEASE_DATE						RELEASE_DATE

Version: 2009-2010

We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-1210.zip on 2010-12-29 and got this, which uncompressed to this.

This appears to be some aggregate of the previous releases.

No header changes from 910:

NAMELAST							NAMELAST
NAMEFIRST							NAMEFIRST
NAMEMID								NAMEMID
UIN									UIN
BDGNBR								BDGNBR
ACCESS_TYPE							ACCESS_TYPE
TOA									TOA
POA									POA
TOD									TOD
POD									POD
APPT_MADE_DATE						APPT_MADE_DATE
APPT_START_DATE						APPT_START_DATE
APPT_END_DATE						APPT_END_DATE
APPT_CANCEL_DATE					APPT_CANCEL_DATE
Total_People						Total_People
LAST_UPDATEDBY						LAST_UPDATEDBY
POST								POST
LastEntryDate						LastEntryDate
TERMINAL_SUFFIX						TERMINAL_SUFFIX
visitee_namelast					visitee_namelast
visitee_namefirst					visitee_namefirst
MEETING_LOC							MEETING_LOC
MEETING_ROOM						MEETING_ROOM
CALLER_NAME_LAST					CALLER_NAME_LAST
CALLER_NAME_FIRST					CALLER_NAME_FIRST
CALLER_ROOM							CALLER_ROOM
Description							Description
RELEASE_DATE						RELEASE_DATE

Version: 0511

We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0511.zip on 2011-05-27 and got this, which uncompressed to this.

Government censorship! They removed CALLER_ROOM in this release.

NAMELAST							NAMELAST
NAMEFIRST							NAMEFIRST
NAMEMID								NAMEMID
UIN									UIN
BDGNBR								BDGNBR
ACCESS_TYPE							ACCESS_TYPE
TOA									TOA
POA									POA
TOD									TOD
POD									POD
APPT_MADE_DATE						APPT_MADE_DATE
APPT_START_DATE						APPT_START_DATE
APPT_END_DATE						APPT_END_DATE
APPT_CANCEL_DATE					APPT_CANCEL_DATE
Total_People						Total_People
LAST_UPDATEDBY						LAST_UPDATEDBY
POST								POST
LastEntryDate						LastEntryDate
TERMINAL_SUFFIX						TERMINAL_SUFFIX
visitee_namelast					visitee_namelast
visitee_namefirst					visitee_namefirst
MEETING_LOC							MEETING_LOC
MEETING_ROOM						MEETING_ROOM
CALLER_NAME_LAST					CALLER_NAME_LAST
CALLER_NAME_FIRST					CALLER_NAME_FIRST
CALLER_ROOM						    <
Description							Description
RELEASE_DATE						RELEASE_DATE

Version: 2011-Aug-26

On 2011-09-14, http://www.whitehouse.gov/briefing-room/disclosures/visitor-records said:

  • "To download Part 1 of the data released in 2011 in its raw format, click here. (.zip of a .csv, 7.4MB)"
  • "To download Part 2 of the data released in 2011 in its raw format, click here. (.zip of a .csv, 3.4MB)"
  • "To download an explanation of the column headers contained in the raw data file, click here. (.txt, 1.3KB)"

This version was named 2011-Aug-26 for the following reasons:

Cached versions of the URLs referenced above are available at:

The headers of Part 1 and Part 2 are inconsistent:

1 NAMELAST								1 NAMELAST
2 NAMEFIRST								2 NAMEFIRST
3 NAMEMID								3 NAMEMID
4 UIN									4 UIN
5 BDGNBR								5 BDGNBR
6 ACCESS_TYPE							6 ACCESS_TYPE
7 TOA									7 TOA
8 POA									8 POA
9 TOD									9 TOD
10 POD									10 POD
11 APPT_MADE_DATE						11 APPT_MADE_DATE
12 APPT_START_DATE						12 APPT_START_DATE
13 APPT_END_DATE						13 APPT_END_DATE
14 APPT_CANCEL_DATE						14 APPT_CANCEL_DATE
15 Total_People							15 Total_People
16 LAST_UPDATEDBY						16 LAST_UPDATEDBY
17 POST									17 POST
18 LastEntryDate						18 LastEntryDate
19 TERMINAL_SUFFIX						19 TERMINAL_SUFFIX
20 visitee_namelast						20 visitee_namelast
21 visitee_namefirst					21 visitee_namefirst
22 MEETING_LOC							22 MEETING_LOC
23 MEETING_ROOM							23 MEETING_ROOM
24 CALLER_NAME_LAST						24 CALLER_NAME_LAST
25 CALLER_NAME_FIRST					25 CALLER_NAME_FIRST
26 Description					    |	26 CALLER_ROOM
27 RELEASE_DATE					    |	27 description
   								    |	28 release_date

Characterizing table completeness of 2011-Aug-26 part 1:

Column:
         1         2           Number of values missing:
1234567890123456789012345678      Number of rows (of 447598) with sparseness pattern:
............. ..............| (1) 30
............. ........... ..| (2) 3852
............. ............. | (2) 3826
.. .......... ..............| (2) 8
............. ............ .| (2) 4
.. ........................ | (2) 2
.. ....................... .| (2) 1
.........................  .| (2) 1
............. ...........  .| (3) 10540
............. ........... . | (3) 9531
.. .......... ............. | (3) 3013
.. .......... ........... ..| (3) 2153
............. ...... ...... | (3) 488
............. ...... .... ..| (3) 268
............. ..... ..... ..| (3) 25
........  ... ..............| (3) 16
.. .......... ............ .| (3) 1
........  ................. | (3) 1
.. .......... ........... . | (4) 8933
.. .......... ...........  .| (4) 8125
........  ... ........... ..| (4) 6677
........  ... ............. | (4) 6170
............. ...... .... . | (4) 314
.. .......... ...... ...... | (4) 288
............. ...... ....  .| (4) 194
.. .......... ...... .... ..| (4) 99
............. ..... .....  .| (4) 23
.. .......... ..... ..... ..| (4) 4
............. ..........  . | (4) 3
.. .....  ................. | (4) 2
.. .....  ............... ..| (4) 2
.. .....  ... ..............| (4) 1
.. .....  ... ............. | (5) 5643
.. .....  ... ........... ..| (5) 3679
........  ... ........... . | (5) 3544
........  ... ...........  .| (5) 3544
.... .    ..................| (5) 337
........  ... ...... ...... | (5) 263
.. .......... ...... .... . | (5) 235
.. .......... ...... ....  .| (5) 150
........  ... ...... .... ..| (5) 129
............. ...... ...   .| (5) 45
............. ...... ...  . | (5) 15
........  ... ..... ..... ..| (5) 14
........  ... .......... .. | (5) 10
.. .......... ..........  . | (5) 6
.. .......... ...... ... . .| (5) 2
.. .......... ..... .....  .| (5) 1
.. .....  ... ........... . | (6) 3434
.. .....  ... ...........  .| (6) 2364
.... .    ................. | (6) 1224
.... .    ............... ..| (6) 648
.. .....  ... ...... ...... | (6) 389
.... .    ... ..............| (6) 359
........  ... ...... .... . | (6) 237
.. .....  ... ...... .... ..| (6) 226
.. . .    ..................| (6) 223
........  ... ...... ....  .| (6) 130
........  ... ..... .....  .| (6) 51
.... .    ................ .| (6) 50
.. .......... ...... ...   .| (6) 20
........  ... ...... ...  ..| (6) 5
.. .......... ...... ...  . | (6) 4
........  ... ..........  . | (6) 4
.. .......... .........    .| (6) 3
.. .....  ... .......... .. | (6) 1
........  ... ...... ... . .| (6) 1
.... .    ... ............. | (7) 148107
.... .    ... ........... ..| (7) 97086
.. . .    ................. | (7) 867
.. .....  ... ...... .... . | (7) 258
.. . .    ............... ..| (7) 170
........  ... ...... ...   .| (7) 102
.. .....  ... ...... ....  .| (7) 89
.... .    ...............  .| (7) 74
.... .    ............... . | (7) 72
........  ... ...... ...  . | (7) 42
.. . .    ................ .| (7) 35
.... .    .......... ...... | (7) 34
.. .....  ... ..... .....  .| (7) 25
.... .    .......... ... ...| (7) 15
.... .    ... ............ .| (7) 12
.. . .    ... ..............| (7) 7
.... .    .......... .... ..| (7) 7
.. . .    ... ............. | (8) 36254
.. . .    ... ........... ..| (8) 16660
.... .    ... ........... . | (8) 11413
.... .    ... ...........  .| (8) 7928
.... .    ... ...... ...... | (8) 5205
.... .    ... ...... .... ..| (8) 3302
.... .    ... ..... ..... ..| (8) 122
.... .    .......... .... . | (8) 119
.. . .    ............... . | (8) 107
.. . .    ...............  .| (8) 59
.. . .    ... ............ .| (8) 44
.. .....  ... ...... ...   .| (8) 37
.. . .    .......... ... ...| (8) 34
.. . .    .......... ...... | (8) 29
.. . .    .......... .... ..| (8) 14
.... .    .......... ....  .| (8) 12
.. .....  ... ...... ...  . | (8) 6
.... .    .......... ... .. | (8) 6
.... .    .......... ... . .| (8) 1
.. . .    ... ...... ...... | (9) 7386
.. . .    ... ........... . | (9) 6338
.. . .    ... ...........  .| (9) 4875
.. . .    ... ...... .... ..| (9) 3215
.... .    ... ...... .... . | (9) 1009
.... .    ... ...... ....  .| (9) 529
.... .    ... ..... .....  .| (9) 311
.. . .    .......... .... . | (9) 104
.. . .    .......... ... .. | (9) 63
.... .    ... ...... ... .. | (9) 41
.. . .    .......... ....  .| (9) 15
.. . .    ... ..... ..... ..| (9) 11
.... .    ... ..........   .| (9) 9
.... .    ... ..........  . | (9) 5
.... .    .........  . .... | (9) 3
.... .    .......... ...  . | (9) 1
.. . .    ... ..........  ..| (9) 1
.. . .    ..............  . | (9) 1
.... .    ... ...... ...  ..| (9) 1
.. . .    ... ...... .... . | (10) 610
.. . .    ... ...... ....  .| (10) 260
.. . .    ... ..... .....  .| (10) 165
.... .    ... ...... ...   .| (10) 42
.. . .    ... ...... ... .. | (10) 41
.... .    ... ...... ...  . | (10) 24
.. . .    ... ..........   .| (10) 5
.. . .    .........  . .... | (10) 3
.. . .    ... ..........  . | (10) 2
.. . .    .......... ...  . | (10) 1
.. .....  ... .....  . .   .| (10) 1
.. . .    ... ...... ...   .| (11) 24
.. . .    ... ...... ...  . | (11) 7
...        ..        .    ..| (20) 8
...        .         .   .. | (21) 411
...        .         .    ..| (21) 118
..         ..        .    ..| (21) 47
..         .         .   .. | (22) 1881
..         .         .    ..| (22) 86
..         ..        .     .| (22) 2
..         .         .    . | (23) 2
                           .| (27) 1
____________________________  <-- Column always has a value? '|': yes, '_': no.

We can't assume that the last name uniquely identifies the visitee:

<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/chad>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/john>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/kevin>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/kim>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/kimberley>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/kojetin>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/r108628c21reference>
<http://localhost/source/whitehouse-gov/dataset/visitor-records/typed/visitee/harris/r108629c21reference>
  • Does reference by only last name indicate importance? e.g. visitee/adams/jonathon (10), visitee/alcantara/joseph (3), one of the visitee/alvarez/ (12), visitee/arboleda/edwin (13), visitee/brewster/bill (18), visitee/cooper/ryan (dozens ), visitee/murphy/james, visitee/potus (TODO: write the SPARQL query for this)
DATE FAILED: "MEETING" !~ 2 patterns @ :source/whitehouse-gov/dataset/visitor-records/part-1/version/2011-Aug-26/visit_268039 :release_date
DateTimeValueHandler FAILED: "40626.39058" !~ 2 patterns @ :source/whitehouse-gov/dataset/visitor-records/part-1/version/2011-Aug-26/visit_426415 :time_of_arrival

Version: 2011-Nov-28

  • source/WhiteHouse-WAVES-Released-0611.csv 1/20/2009 to 3/9/2011
NAMELAST
NAMEFIRST
NAMEMID
UIN
BDGNBR
ACCESS_TYPE
TOA
POA
TOD
POD
APPT_MADE_DATE
APPT_START_DATE
APPT_END_DATE
APPT_CANCEL_DATE
Total_People
LAST_UPDATEDBY
POST
LastEntryDate
TERMINAL_SUFFIX
visitee_namelast
visitee_namefirst
MEETING_LOC
MEETING_ROOM
CALLER_NAME_LAST
CALLER_NAME_FIRST
Description
RELEASE_DATE
  • source/WhiteHouse-WAVES-Released-1210.csv 1/30/2009 to 9/30/2010
  • source/WhiteHouse-WAVES-Released-2011_part2.csv 1/29/2009 to 8/31/2011
Clone this wiki locally