Enable TTL on URLs Table #104

Open
ashley-evans opened this issue Feb 3, 2022 · 0 comments

Value Added

Removes redundant information from the URLs table.

Description

Currently, the crawl service only crawls a site if it has not been crawled within the last 48 hours. Once a crawl completes, the URLs it found are sent as part of an event to the service's bus. However, existing crawl items are never removed from the URLs table: some items are overwritten by future crawls, but others persist indefinitely.

This means that we are storing data from old crawls that is no longer required. If the table is updated to use TTL, the recent crawl lambda can simply check whether the document for a given base URL's root path exists and its TTL is in the future, rather than comparing the creation date to the current time.

Acceptance Criteria

AC01

  • Update the URLsTable to use TTL
  • Each item in the URLsTable should be created with a TTL attribute set to two days (48 hours) after document creation
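
A minimal sketch of how the TTL value for an item could be computed, assuming the URLs table is a DynamoDB table (the issue does not name the database; DynamoDB TTL expects a Number attribute holding a Unix epoch timestamp in seconds). The attribute names `BaseURL`, `Pathname`, `CreatedAt` and `TTL`, and the helper `build_url_item`, are hypothetical and not taken from the project.

```python
from datetime import datetime, timedelta, timezone

# Two days, matching the 48-hour recrawl window described in this issue.
TTL_PERIOD = timedelta(days=2)


def build_url_item(base_url: str, pathname: str, created_at: datetime) -> dict:
    """Return a plain dict for a URLs table item, including its TTL.

    All attribute names are hypothetical placeholders.
    """
    expires_at = created_at + TTL_PERIOD
    return {
        "BaseURL": base_url,
        "Pathname": pathname,
        "CreatedAt": created_at.isoformat(),
        # Integer epoch seconds, the format DynamoDB TTL expects.
        "TTL": int(expires_at.timestamp()),
    }


created = datetime(2022, 2, 3, 12, 0, tzinfo=timezone.utc)
item = build_url_item("www.example.com", "/", created)
print(item["TTL"])
```

Passing a timezone-aware `created_at` keeps the epoch conversion independent of the machine's local timezone.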

AC02

  • Update the recent crawl lambda to only return recently crawled if the item for the root path of a given URL exists and its TTL attribute is in the future
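
The check described in the Description could be sketched as below. This assumes a DynamoDB table, where expired items are deleted in the background and can remain readable for some time after their TTL passes, which is why the comparison is kept rather than relying on the item's existence alone. The function name `is_recently_crawled` and the attribute name `TTL` are hypothetical, and fetching the root path item from the table is not shown.

```python
import time
from typing import Optional


def is_recently_crawled(root_item: Optional[dict], now: Optional[int] = None) -> bool:
    """Return True if the root path item exists and its TTL has not passed.

    root_item: the already-fetched item for the base URL's root path,
               or None if no such item exists.
    now:       current time in epoch seconds (injectable for testing).
    """
    if root_item is None:
        return False
    current = int(time.time()) if now is None else now
    ttl = root_item.get("TTL")  # hypothetical attribute name
    # Items without a TTL, or with an expired one, are not treated as recent.
    return ttl is not None and int(ttl) > current
```

Making `now` injectable lets the lambda's unit tests pin the clock instead of depending on wall time.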

AC03

  • All updates must be performed via SAM/CloudFormation template updates
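
As a rough illustration of the template change, the sketch below prints the property fragment that enables TTL on an `AWS::DynamoDB::Table` resource in CloudFormation's JSON syntax. It assumes the URLs table is declared as that resource type and that the TTL attribute is named `TTL`; both are assumptions, and the rest of the table definition (key schema, billing mode) is deliberately omitted.

```python
import json

TTL_ATTRIBUTE = "TTL"  # hypothetical attribute name

# Fragment to merge into the table resource's "Properties" section.
ttl_specification = {
    "TimeToLiveSpecification": {
        "AttributeName": TTL_ATTRIBUTE,
        "Enabled": True,
    }
}

print(json.dumps(ttl_specification, indent=2))
```

The same two keys, `AttributeName` and `Enabled`, apply if the template is written in YAML.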

ashley-evans added the 3 label Feb 3, 2022
ashley-evans removed their assignment Feb 27, 2022