AWS Lambda pipeline to parse package documentation for rdocumentation.org

datacamp/RDocumentation-lambda-worker

RDocumentation-lambda-worker

Note: Please read the Confluence page that explains the complete architecture of how RDocumentation works.

This repository sets up an AWS Lambda pipeline in Node.js that runs every hour and:

  1. Reads all packages and their versions from CRAN, Bioconductor, and GitHub.
  2. If a package does not already exist in the S3 bucket assets.rdocumentation.org, extracts the package information and sends a job with basic information about the package to the rdocs-r-worker SQS queue.
  3. Jobs in the rdocs-r-worker queue are then processed by the RPackageParser service.
  4. Updates the JSON state files in the S3 bucket.
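The core decision in steps 2 and 4 ("is this package already in S3, and if not, enqueue a job and record it") can be sketched as a pure function. This is a minimal illustration, not the repository's code: the PackageVersion and ParseJob shapes, the packageKey scheme (repoType/name/version), and the idea of tracking state as a sorted list of keys are all hypothetical assumptions, and the real bucket layout and message format may differ. The S3 listing and the SQS call are left out so the logic runs on its own.

```typescript
// Hypothetical shape of one package version discovered upstream.
type RepoType = "cran" | "bioconductor" | "github";

interface PackageVersion {
  name: string;
  version: string;
  repoType: RepoType;
}

// Hypothetical body of a job for the rdocs-r-worker queue.
interface ParseJob {
  name: string;
  version: string;
  repoType: RepoType;
}

// Hypothetical key scheme; the real layout of assets.rdocumentation.org may differ.
function packageKey(pkg: PackageVersion): string {
  return `${pkg.repoType}/${pkg.name}/${pkg.version}`;
}

// Compare the upstream listing against keys already known to exist in S3.
// Returns one job per missing package version and the updated state
// (every known key, sorted so the JSON state file is stable between runs).
function planJobs(
  listed: PackageVersion[],
  existingKeys: ReadonlySet<string>,
): { jobs: ParseJob[]; state: string[] } {
  const seen = new Set<string>(existingKeys);
  const jobs: ParseJob[] = [];
  for (const pkg of listed) {
    const key = packageKey(pkg);
    if (seen.has(key)) continue; // already present, or already queued in this run
    seen.add(key);
    jobs.push({ name: pkg.name, version: pkg.version, repoType: pkg.repoType });
  }
  return { jobs, state: Array.from(seen).sort() };
}

// Usage with hypothetical package names.
const listed: PackageVersion[] = [
  { name: "pkgA", version: "1.0.0", repoType: "cran" },
  { name: "pkgB", version: "1.0.0", repoType: "cran" },
  { name: "pkgB", version: "1.0.0", repoType: "cran" }, // duplicate entry
];
const { jobs, state } = planJobs(listed, new Set(["cran/pkgA/1.0.0"]));
console.log(jobs.map((j) => JSON.stringify(j))); // one job, for pkgB
console.log(JSON.stringify(state));
```

Keeping the diff separate from the AWS calls makes it easy to test. In a deployed Lambda, each job would then be serialized (for example with JSON.stringify) and sent as the message body to the rdocs-r-worker queue, for instance via SendMessageCommand from the AWS SDK for JavaScript v3, and the returned state written back to the bucket.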

Installation (deprecated)

TODO: Replace these instructions, because apex no longer works.

Use the apex command to deploy and invoke the Lambda functions.

Examples:

  • apex deploy unzip
  • apex invoke unzip
  • apex metrics unzip

License

See the LICENSE file for license rights and limitations (MIT).
