Includes documentation of the AWS changes which are not under terraform control, as well as a general introduction to the concept.
1 parent 0acaaf4, commit d10c484
Showing 2 changed files with 93 additions and 0 deletions.
@@ -31,4 +31,5 @@ nextstrain.org
routing
infrastructure
terraform
resource-collection
glossary
@@ -0,0 +1,92 @@
===================
Resource Collection
===================

In order for nextstrain.org to handle URLs with ``@YYYY-MM-DD`` identifiers, the
server needs to be aware of which files exist, including past versions.
In the future this data will also be used to list and display all available
resources (and their versions) to the user.
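As a rough illustration of what an ``@YYYY-MM-DD`` identifier looks like in
practice, the sketch below splits a URL path into a resource path and an
optional version date. The function name ``parseVersionedPath``, the returned
object shape and the example paths are assumptions made for this example; this
is not the server's actual routing code.

```javascript
// Hypothetical helper (not part of nextstrain.org): split a URL path such as
// "/some/dataset@2023-01-15" into the resource path and the version date.
function parseVersionedPath(urlPath) {
  const match = urlPath.match(/^(.*)@(\d{4}-\d{2}-\d{2})$/);
  if (!match) {
    // No version identifier present: treat the whole path as the resource.
    return { resourcePath: urlPath, version: null };
  }
  const [, resourcePath, version] = match;

  // Reject strings which have the right shape but are not real calendar days
  // (e.g. 2023-02-30). A real date survives a round trip through Date.
  const parsed = new Date(`${version}T00:00:00Z`);
  const isRealDate =
    !Number.isNaN(parsed.getTime()) && parsed.toISOString().startsWith(version);
  if (!isRealDate) {
    throw new Error(`Invalid version identifier: @${version}`);
  }
  return { resourcePath, version };
}
```

A path without an identifier is returned unchanged with ``version`` set to
``null``, so callers can treat "latest" and "dated" requests uniformly.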

The index is generated by a script, and the resulting JSON file is loaded by the
server at start time. The server can be made to ignore resource collections by
setting the env variable ``RESOURCE_INDEX="false"`` (or the equivalent in a
config file).


Local development
=================

The index creation script can be run locally to produce a local JSON
file -- see ``./resourceIndexer/main.js`` for more details.

To use this file from the server, set the env variable ``RESOURCE_INDEX`` to
point to the (JSON) file.
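The two uses of ``RESOURCE_INDEX`` described in this document (the string
``"false"`` to disable the index, or a path to a local JSON file) could be
interpreted along the following lines. This is only a sketch: the helper names
and the returned object shape are hypothetical, and what the server does when
the variable is unset is not described here, so that case is simply reported
as "no local path given".

```javascript
const fs = require("fs");

// Hypothetical helper: interpret the value of the RESOURCE_INDEX env variable.
function resolveResourceIndex(value) {
  if (value === "false") {
    // Resource collections are ignored by the server.
    return { enabled: false, path: null };
  }
  if (value === undefined || value === "") {
    // Unset: assumption for this sketch only, no local file was requested.
    return { enabled: true, path: null };
  }
  // Anything else is treated as a path to a local JSON index file.
  return { enabled: true, path: value };
}

// Hypothetical helper: read and parse a local (uncompressed) JSON index file.
function loadLocalIndex(filePath) {
  return JSON.parse(fs.readFileSync(filePath, "utf8"));
}
```

For example, ``resolveResourceIndex(process.env.RESOURCE_INDEX)`` could be
called once at start time, followed by ``loadLocalIndex`` when a path is given.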


Automated index generation
==========================

*This section will be updated once the index creation is automated.*

AWS settings necessary for resource collection
==============================================

The index creation, storage and retrieval require certain AWS settings, which
are documented here as most of them are not under terraform control. We use `S3
inventories
<https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html>`__,
which AWS generates daily, to list all the documents in certain buckets (or
bucket prefixes). The index creation script downloads these inventories and
uses them to create an index JSON, which it uploads to S3. The nextstrain.org
server then accesses this JSON from S3.
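To make the "inventory rows in, index JSON out" step more concrete, here is a
minimal sketch which groups inventory rows by object key and by the calendar
day on which each version was last modified. The row shape (``key``,
``lastModified``, ``etag``), the function name ``buildIndex`` and the resulting
index structure are all assumptions for illustration; the real logic lives in
``./resourceIndexer/main.js`` and may differ substantially.

```javascript
// Hypothetical sketch: turn inventory rows into an index keyed by object key,
// then by YYYY-MM-DD date. Each row is assumed to look like
//   { key: "zika.json", lastModified: "2023-01-15T10:00:00.000Z", etag: "abc" }
// with lastModified as a UTC ISO 8601 timestamp.
function buildIndex(rows) {
  const index = {};
  for (const row of rows) {
    // The first 10 characters of an ISO 8601 timestamp are the date.
    const date = row.lastModified.slice(0, 10);
    index[row.key] = index[row.key] || {};
    const existing = index[row.key][date];
    // Keep only the most recent version uploaded on any given day. UTC ISO
    // timestamps in the same format sort correctly as plain strings.
    if (!existing || row.lastModified > existing.lastModified) {
      index[row.key][date] = { lastModified: row.lastModified, etag: row.etag };
    }
  }
  return index;
}
```

Keeping one entry per day mirrors the day-level granularity of the
``@YYYY-MM-DD`` identifiers, but that choice is an assumption of this sketch.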

S3 inventories
--------------

We currently produce daily inventories for the core (s3://nextstrain-data) and
staging (s3://nextstrain-staging) buckets and publish them to the private
s3://nextstrain-inventories bucket. The inventory
configuration can be found in the AWS console for
`core <https://s3.console.aws.amazon.com/s3/management/nextstrain-data/inventory/view?region=us-east-1&id=config-v1>`__
and
`staging <https://s3.console.aws.amazon.com/s3/management/nextstrain-staging/inventory/view?region=us-east-1&id=config-v1>`__.
The config specifies that additional metadata fields for last modified
and ETag are to be included in the inventory. The inventories for core and
staging are published to
s3://nextstrain-inventories/nextstrain-data/config-v1 and
s3://nextstrain-inventories/nextstrain-staging/config-v1, respectively.
The cost of these is minimal (less than $1/bucket/year).
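S3 Inventory is documented as writing each report's ``manifest.json`` under a
timestamped folder (``YYYY-MM-DDTHH-MMZ``) below the destination prefix. Under
that assumption, picking the most recent inventory from a list of object keys
could look like the sketch below. The function name ``latestManifestKey`` and
the example keys are hypothetical, and this is not necessarily how the index
creation script selects a manifest.

```javascript
// Hypothetical helper: given object keys listed from the inventories bucket,
// return the key of the newest manifest.json directly under `prefix`
// (e.g. "nextstrain-data/config-v1"), or null if there is none.
function latestManifestKey(keys, prefix) {
  const timestamped = /^\d{4}-\d{2}-\d{2}T\d{2}-\d{2}Z\/manifest\.json$/;
  const manifests = keys
    .filter((key) => key.startsWith(`${prefix}/`))
    .filter((key) => timestamped.test(key.slice(prefix.length + 1)))
    // All remaining keys share the prefix, and the timestamp format is
    // zero-padded, so lexicographic order equals chronological order.
    .sort();
  return manifests.length > 0 ? manifests[manifests.length - 1] : null;
}
```

The keys themselves would come from a bucket listing restricted to the
relevant prefix; that call is omitted so the sketch stays free of AWS SDK
details.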

A lifecycle rule on the s3://nextstrain-inventories bucket (`console
link <https://s3.console.aws.amazon.com/s3/management/nextstrain-inventories/lifecycle/view?region=us-east-1&id=delete+stale+inventories>`__)
deletes all inventory-related files 30 days after they are created.

Index creation (Inventory access and index upload)
--------------------------------------------------

**Automated index generation**

*This section will be updated once the index creation is automated.*

**Local index generation for development purposes**

For local index generation (e.g. during development) you will need IAM
credentials which can list and get objects from s3://nextstrain-inventories. If
you want finer-grained access for local index creation, you can restrict access
to certain prefixes in that bucket. For instance, ``nextstrain-data/config-v1``
and ``nextstrain-staging/config-v1`` correspond to the core and staging buckets,
respectively.
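One way to express such prefix-restricted read access is an IAM policy
document like the one generated below. This is a sketch rather than the policy
actually in use: the function name ``inventoryReadPolicy`` is hypothetical, and
the exact statements you need may differ depending on how your credentials are
managed.

```javascript
// Hypothetical helper: build an IAM policy document which allows listing and
// getting objects only under the given prefixes of a bucket.
function inventoryReadPolicy(bucket, prefixes) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        // Listing is granted on the bucket itself and narrowed with the
        // s3:prefix condition key.
        Effect: "Allow",
        Action: "s3:ListBucket",
        Resource: `arn:aws:s3:::${bucket}`,
        Condition: {
          StringLike: { "s3:prefix": prefixes.map((p) => `${p}/*`) },
        },
      },
      {
        // Getting objects is granted on object ARNs under each prefix.
        Effect: "Allow",
        Action: "s3:GetObject",
        Resource: prefixes.map((p) => `arn:aws:s3:::${bucket}/${p}/*`),
      },
    ],
  };
}

// Example: read access to the core and staging inventories only.
const examplePolicy = inventoryReadPolicy("nextstrain-inventories", [
  "nextstrain-data/config-v1",
  "nextstrain-staging/config-v1",
]);
```

``JSON.stringify(examplePolicy, null, 2)`` then gives a document which can be
reviewed before being attached to an IAM user or role.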

To upload the index you will need write access for
s3://nextstrain-inventories/resources.json.gz. Note that this is not necessary
if you only need the index for local development (see `Local development`_).


Index access by the server
--------------------------

IAM users ``nextstrain.org`` and ``nextstrain.org-testing``, which are under
terraform control, have read access to
s3://nextstrain-inventories/resources.json.gz via their associated policies.
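
Since the index is stored as ``resources.json.gz``, whatever fetches it has to
decompress it before parsing. A minimal sketch of that decoding step using
Node's built-in ``zlib`` module follows. The function name ``decodeIndex`` is
hypothetical and the download from S3 is deliberately left out, so this is not
a description of the server's actual implementation.

```javascript
const zlib = require("zlib");

// Hypothetical helper: decode the gzipped JSON index from a Buffer holding
// the raw bytes of resources.json.gz (however they were obtained).
function decodeIndex(gzippedBuffer) {
  const json = zlib.gunzipSync(gzippedBuffer).toString("utf8");
  return JSON.parse(json);
}
```

The synchronous ``gunzipSync`` call is acceptable for a one-off load at start
time; a streaming or asynchronous variant may be preferable for large files.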