Currently the crawl service will only crawl a site if it has not been crawled within the last 48 hours. Once crawled, the URLs found in that crawl operation are sent as part of an event to the service's bus. However, existing crawl items are never removed from the URLs table, meaning that while some items may be overwritten by future crawls, others may not.
This means that we are storing data related to old crawls that is no longer required. If the table is updated to use TTL, the recent crawl lambda can simply check that the item for a given base URL's root path exists and that its TTL is in the future, rather than having to compare the creation date to the current time.
Acceptance Criteria
AC01
Update the URLsTable to use TTL
Each item on the URLsTable should be created with a TTL attribute that is set to two days following document creation
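A minimal sketch of how the TTL value for AC01 could be computed when an item is written. The attribute names (`BaseUrl`, `Pathname`, `CreatedAt`, `TTL`) and the helper `build_url_item` are hypothetical and not taken from the service's code. It also assumes the TTL is stored as a Unix epoch timestamp in seconds, which is the format DynamoDB TTL expects if the URLsTable is a DynamoDB table.

```python
import time

# Two days in seconds, matching the TTL window described in AC01.
TTL_SECONDS = 2 * 24 * 60 * 60


def build_url_item(base_url, pathname, now=None):
    """Build a URLs table item whose TTL is two days after creation.

    Key and attribute names here are illustrative assumptions only.
    """
    created = int(time.time()) if now is None else int(now)
    return {
        "BaseUrl": base_url,
        "Pathname": pathname,
        "CreatedAt": created,
        # Expiry time as epoch seconds (assumed storage format).
        "TTL": created + TTL_SECONDS,
    }
```

Passing `now` explicitly keeps the helper deterministic in unit tests; in the lambda it would normally be omitted.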
AC02
Update the recent crawl lambda to only return recently crawled if the item for the root path of the given URL exists and its TTL attribute is in the future
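The AC02 check could look roughly like the sketch below. The function name `was_recently_crawled` and the `TTL` attribute name are assumptions for illustration, and `item` stands for whatever the lambda gets back when it looks up the root path (or `None` when nothing is found). The explicit comparison is still worth keeping: if the table is DynamoDB, expired items are deleted in the background and can remain readable for some time after their TTL has passed.

```python
import time


def was_recently_crawled(item, now=None):
    """Return True only if the root-path item exists and has not expired.

    `item` is the looked-up record as a dict, or None if it was not found.
    """
    current = int(time.time()) if now is None else int(now)
    if item is None:
        return False
    ttl = item.get("TTL")
    if ttl is None:
        # Items written before the TTL change carry no expiry; treat as stale.
        return False
    return int(ttl) > current
```

Treating items without a TTL as stale is a design choice in this sketch; it means the first crawl after the change would refresh any legacy items.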
AC03
All updates must be performed via SAM/CloudFormation template updates
Value Added
Removes redundant information from URLs Table.