adds first draft of bulk download docs
kilbergr committed Feb 21, 2024
1 parent 44dbb69 commit a406f5c
Showing 1 changed file with 61 additions and 1 deletion.
62 changes: 61 additions & 1 deletion src/templates/cap-docs-page.js
@@ -52,9 +52,69 @@ export class CapDocsPage extends LitElement {
materials included.
</p>
<h2 class="c-decoratedHeader" id="bulk-downloads">Bulk Downloads</h2>
<h3>Downloading</h3>
<p>
- [We will put some documentation content about bulk downloads here.]
You can download data manually from the CAP website, or construct the
download URLs yourself and fetch them programmatically.
</p>
<p>
To download a volume, identify its reporter slug and volume number, then
visit [URL-TO-COME]/reporter-slug/volume-number.zip. For example, to
download Volume 14 of Arkansas Reports (1837-2009), visit
[URL-TO-COME]/ark/14.zip. To find a reporter slug and volume number,
select the reporter and volume on the jurisdiction landing page and look
at the resulting URL:
[URL-TO-COME]/reporter=reporter-slug&volume=volume-number.
</p>
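<p>
If you are scripting downloads, the URL pattern above can be built with a
small helper. This is a minimal sketch in Python that assumes only the
reporter-slug/volume-number.zip pattern described above. The function name
<code>volume_zip_url</code> is hypothetical, and because the real host is
still [URL-TO-COME], the base URL is a parameter
(<code>https://example.org</code> below is only a stand-in).
</p>

```python
from urllib.parse import quote


def volume_zip_url(base_url: str, reporter_slug: str, volume_number: int) -> str:
    """Return <base_url>/<reporter_slug>/<volume_number>.zip."""
    base = base_url.rstrip("/")  # tolerate a trailing slash on the base URL
    # quote() percent-encodes unusual characters; plain slugs such as "ark" are unchanged.
    return f"{base}/{quote(reporter_slug)}/{quote(str(volume_number))}.zip"


# The host is a stand-in until the real base URL is published.
print(volume_zip_url("https://example.org", "ark", 14))  # https://example.org/ark/14.zip
```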
<p>
Alternatively, you can download files with <code>wget</code>, which retries
automatically when it encounters a network problem. Here's an example for
the same zip mentioned above:
</p>
<pre>wget https://[URL-TO-COME]/ark/14.zip</pre>
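<p>
Python's standard library has no built-in retry option comparable to
wget's, so a script that downloads many volumes may want its own retry
loop. The sketch below uses <code>urllib.request</code>. The function name
<code>download_with_retries</code>, the attempt count, and the back-off delay
are illustrative assumptions, not part of any CAP tooling. It gives up
immediately on 4xx responses (for example, a 404 from a mistyped slug or
volume number) and retries other failures.
</p>

```python
import shutil
import time
import urllib.error
import urllib.request
from pathlib import Path


def download_with_retries(url: str, dest: Path, attempts: int = 5, delay: float = 2.0) -> Path:
    """Save url to dest, retrying transient failures a bit like wget does."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=60) as response, open(dest, "wb") as out:
                shutil.copyfileobj(response, out)  # stream to disk instead of holding the zip in memory
            return dest
        except urllib.error.HTTPError as err:
            # 4xx errors (e.g. 404 for a mistyped slug or volume) will not fix themselves.
            if err.code < 500 or attempt == attempts:
                raise
        except OSError:
            # URLError (DNS failures, refused connections) and timeouts are OSError subclasses.
            if attempt == attempts:
                raise
        time.sleep(delay * attempt)  # wait a little longer after each failed attempt
    raise AssertionError("unreachable")  # not reached: the last attempt returns or raises


# Usage (replace [URL-TO-COME] with the real host once it is published):
# download_with_retries("https://[URL-TO-COME]/ark/14.zip", Path("ark-14.zip"))
```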
<h3>Data Format</h3>
<p>
Bulk data files are provided as zipped directories. Each zip contains three
directories: <code>metadata</code>, <code>json</code>, and <code>html</code>.
The <code>metadata</code> directory contains <code>VolumeMetadata.json</code>
and <code>CasesMetadata.json</code>. The <code>json</code> directory contains
every case in the volume in JSON format, and the <code>html</code> directory
contains every case in the volume in HTML format.
</p>
<pre>.
├── metadata/
│   ├── VolumeMetadata.json
│   └── CasesMetadata.json
├── json/
│   ├── 0001-01.json
│   ├── 0002-01.json
│   └── ...
└── html/
    ├── 0001-01.html
    ├── 0002-01.html
    └── ...
</pre>
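<p>
To check that a downloaded zip has the layout shown above before processing
it, you can list its members grouped by top-level directory. This is a
minimal sketch using Python's <code>zipfile</code> module, assuming (as the
tree shows) that <code>metadata/</code>, <code>json/</code>, and
<code>html/</code> sit at the root of the archive; <code>list_volume_zip</code>
is a hypothetical helper name.
</p>

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def list_volume_zip(zip_path) -> dict[str, list[str]]:
    """Group the file names in a volume zip by their top-level directory."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries such as "json/"
            path = PurePosixPath(info.filename)  # zip member names use "/" as the separator
            top = path.parts[0] if len(path.parts) > 1 else "."
            groups[top].append(path.name)
    return {directory: sorted(names) for directory, names in groups.items()}


# layout = list_volume_zip("14.zip")
# If the zip matches the tree above, sorted(layout) is ['html', 'json', 'metadata']
# and layout["metadata"] lists CasesMetadata.json and VolumeMetadata.json.
```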
<h3>Using Bulk Data</h3>
<p>
You can unzip the .zip file with a third-party GUI program such as
<a href="https://theunarchiver.com/">The Unarchiver</a> (Mac) or
<a href="https://www.7-zip.org/">7-Zip</a> (Windows), or from the command
line with a command like <code>unzip volume-number.zip</code>.
Once the files are unzipped, you can work with them directly. To read a
file from the command line, run, for example:
</p>
<pre>less json/0001-01.json</pre>
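<p>
If you prefer to stay in Python, the <code>unzip</code> and <code>less</code>
steps above can be approximated as below. This is a sketch: the names
<code>extract_volume</code> and <code>show_case</code> are hypothetical, the
case file path follows the example file name in the tree above, and the
JSON files are assumed to be UTF-8.
</p>

```python
import json
import zipfile
from pathlib import Path


def extract_volume(zip_path, dest_dir) -> Path:
    """Unzip a volume archive into dest_dir (roughly: unzip volume-number.zip -d dest_dir)."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)  # recreates metadata/, json/ and html/ under dest
    return dest


def show_case(case_path) -> str:
    """Return one case file as indented JSON, easier to scan than the raw file."""
    with open(case_path, encoding="utf-8") as f:
        return json.dumps(json.load(f), indent=2, ensure_ascii=False)


# root = extract_volume("14.zip", "ark-14")
# print(show_case(root / "json" / "0001-01.json"))
```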
<p>
If you install <a href="https://stedolan.github.io/jq/download/">jq</a>,
you can run more sophisticated queries on the JSON files, such as
extracting a case's <code>id</code>:
</p>
<pre>jq .id json/0001-01.json</pre>
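<p>
The same <code>jq .id</code> query can be run over every case in an unzipped
volume with a few lines of Python. This sketch assumes each file in
<code>json/</code> holds a single JSON object encoded as UTF-8;
<code>case_ids</code> is a hypothetical name, and files without an
<code>id</code> key map to <code>None</code>, mirroring jq's <code>null</code>.
</p>

```python
import json
from pathlib import Path


def case_ids(json_dir) -> dict[str, object]:
    """Map each case file in json_dir to its top-level "id" (like jq .id per file)."""
    ids = {}
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            # .get returns None when a file has no "id" key, as jq would print null.
            ids[path.name] = json.load(f).get("id")
    return ids


# for name, case_id in case_ids("ark-14/json").items():
#     print(name, case_id)
```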
<p>
You can also read the zipped files directly from code, without unzipping
them first, using libraries such as Python's
<a href="https://docs.python.org/3/library/zipfile.html">zipfile</a> module.
</p>
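<p>
As an illustration of the <code>zipfile</code> approach, the sketch below
parses every case in <code>json/</code> straight out of the archive, with
nothing written to disk. The generator name <code>iter_cases</code> is
hypothetical, and the member filter assumes the directory layout shown in
the Data Format section.
</p>

```python
import json
import zipfile
from pathlib import PurePosixPath


def iter_cases(zip_path):
    """Yield (member name, parsed case) for each json/*.json entry, reading straight from the zip."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            member = PurePosixPath(name)
            if member.parent.name == "json" and member.suffix == ".json":
                with zf.open(name) as f:
                    yield name, json.load(f)  # json.load accepts the binary stream zf.open returns


# for name, case in iter_cases("14.zip"):
#     print(name, case.get("id"))
```

<p>
Because the generator parses one member at a time, a loop over it holds
roughly one case in memory rather than the whole volume.
</p>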
</article>
</main>
<cap-footer></cap-footer>