From a406f5c73bc89c79b12becb1026215ccdbda5bb9 Mon Sep 17 00:00:00 2001
From: R I B Z
Date: Wed, 21 Feb 2024 12:25:31 -0800
Subject: [PATCH] adds first draft of bulk download docs

---
 src/templates/cap-docs-page.js | 62 +++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/src/templates/cap-docs-page.js b/src/templates/cap-docs-page.js
index 1aa0745..f9e079f 100644
--- a/src/templates/cap-docs-page.js
+++ b/src/templates/cap-docs-page.js
@@ -52,9 +52,69 @@ export class CapDocsPage extends LitElement {
 materials included.

Bulk Downloads

+

Downloading

- [We will put some documentation content about bulk downloads here.]
+ You can download data manually from the CAP website, or download
+ selected URLs programmatically.

+

+ To access data, identify the reporter slug and volume number, then visit
+ [URL-TO-COME]/reporter-slug/volume-number.zip. For example, to
+ download the zip for Arkansas Reports (1837-2009), Volume 14, you'd visit
+ [URL-TO-COME]/ark/14.zip. You can identify the reporter slug and volume
+ number by selecting the reporter and volume from the jurisdiction
+ landing page and examining the URL:
+ [URL-TO-COME]/reporter=reporter-slug&volume=volume-number.
+

+

+ Alternatively, you can download with wget, which retries automatically
+ when it encounters a network problem. Here's an example for the same zip
+ mentioned above:
+

wget https://[URL-TO-COME]/ark/14.zip
+
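The URL pattern and retry behaviour described above can also be sketched in Python. This is a minimal illustration, not part of the CAP tooling: the function names `volume_zip_url` and `download_volume`, the retry count, and the wait time are all hypothetical, and the base URL must be supplied by the caller because the real host is still [URL-TO-COME].

```python
import time
import urllib.request


def volume_zip_url(base_url, reporter_slug, volume_number):
    """Build <base_url>/<reporter-slug>/<volume-number>.zip (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/{reporter_slug}/{volume_number}.zip"


def download_volume(base_url, reporter_slug, volume_number, dest,
                    attempts=3, wait_seconds=2.0):
    """Download one volume zip to dest, retrying on network errors.

    attempts and wait_seconds are illustrative defaults, not wget's.
    """
    url = volume_zip_url(base_url, reporter_slug, volume_number)
    for attempt in range(1, attempts + 1):
        try:
            urllib.request.urlretrieve(url, dest)
            return dest
        except OSError:
            # urllib.error.URLError and socket errors are OSError subclasses.
            if attempt == attempts:
                raise
            time.sleep(wait_seconds)
```

For the example above, `download_volume(base_url, "ark", 14, "14.zip")` would request `<base_url>/ark/14.zip`, where `base_url` stands in for the address that replaces [URL-TO-COME].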

+

Data Format

+

+ Bulk data files are provided as zipped directories. Each zip contains
+ three directories: metadata, json, and html.
+ The metadata directory contains two files, VolumeMetadata.json and
+ CasesMetadata.json. The json directory contains all cases for that volume
+ in JSON format. The html directory contains all cases for that volume in
+ HTML format.
+

.
+├── metadata/
+│   ├── VolumeMetadata.json
+│   └── CasesMetadata.json
+├── json/
+│   ├── 0001-01.json
+│   ├── 0002-01.json
+│   └── etc
+└── html/
+    ├── 0001-01.html
+    ├── 0003-01.html
+    └── etc
+						
+
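To check that a downloaded zip matches the layout shown above, you could list its entries grouped by top-level directory. The sketch below uses Python's standard `zipfile` module; the function name `group_by_directory` is hypothetical, and it assumes that metadata, json, and html sit at the root of the archive rather than inside a wrapping folder.

```python
import zipfile
from collections import defaultdict


def group_by_directory(zip_source):
    """Map each top-level directory in the zip to a sorted list of its files.

    zip_source may be a path or a binary file-like object.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top, _, rest = name.partition("/")
            if not rest:
                continue  # skip files that sit at the archive root
            groups[top].append(rest)
    return {directory: sorted(files) for directory, files in groups.items()}
```

Calling `group_by_directory("14.zip")` on a volume zip should return keys for metadata, json, and html if the archive follows the structure described here.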

+

Using Bulk Data

+

+ You can unzip the .zip file with third-party GUI programs like
+ The Unarchiver (Mac) or
+ 7-Zip (Windows), or from the command
+ line with a command like unzip volume-number.zip.
+ Once the directories are unzipped, you can work directly with the
+ files themselves. For example, to read a file from the command line,
+ run:
+

cat json/0001-01.json | less
+

+

+ If you install jq,
+ you can run more sophisticated queries on the JSON files, such as
+ extracting the id of a case:
+

cat json/0001-01.json | jq .id | less
+

+

+ You can also work with the zipped files directly in code, using a library
+ such as zipfile
+ in Python.
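As an illustration of the zipfile approach, the sketch below reads the id of every case in a volume without unzipping it first, which is roughly the Python counterpart of the jq .id query. The function name `case_ids` is hypothetical. It assumes the json directory sits at the root of the archive and that each case file is a JSON object with a top-level id field, as the jq example implies.

```python
import json
import zipfile


def case_ids(zip_source):
    """Return {entry name: case id} for every file under json/ in the zip.

    zip_source may be a path or a binary file-like object.
    """
    ids = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if name.startswith("json/") and name.endswith(".json"):
                # Read the member straight from the archive, no extraction.
                with archive.open(name) as handle:
                    ids[name] = json.load(handle)["id"]
    return ids
```

For example, `case_ids("14.zip")` would return one entry per case file, keyed by paths such as json/0001-01.json.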