In the list_runids function we explicitly skip run ids that are archived on AWS Glacier, which means they will never appear in the run manifest. I believe this is the wrong solution, as a customer can restore a particular folder without intending to reprocess it (for example, with a PySpark job).
I propose to:
Add an include_archived option, as in list_runids(include_archived=False), so that folders archived to AWS Glacier can be listed on request
Make list_runids() return objects of a RunId class (rather than plain strings) that can record whether a folder is archived
Add a third possible state to "Add state to run manifests" (#29) to mark that processing of a RunId was explicitly cancelled, something like CancelledAt
This should let the run manifests feature take full control over data processing.
TODO: Decide what to do with other storage classes. Currently we list only folders that have the STANDARD storage class.
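
To make the first two proposals concrete, here is a minimal sketch, not the actual implementation. It assumes the listing is given as plain dicts with "Key" and "StorageClass" fields (the shape of entries in an S3 object listing), that a run folder is the first path segment of the key, and that a folder counts as archived if any of its objects is in an archived storage class. The RunId fields and the ARCHIVED_CLASSES set are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Assumption: storage classes treated as "archived" in this sketch.
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


@dataclass(frozen=True)
class RunId:
    """Hypothetical value object: a run folder plus its archive status."""
    prefix: str
    archived: bool


def list_runids(objects: Iterable[Dict[str, str]],
                include_archived: bool = False) -> List[RunId]:
    """Group listed objects by run folder and flag archived folders.

    `objects` stands in for an S3 listing: dicts with "Key" and
    "StorageClass". Objects without a folder in their key are ignored.
    """
    archived_by_prefix: Dict[str, bool] = {}
    for obj in objects:
        key = obj["Key"]
        if "/" not in key:
            continue  # not inside a run folder
        prefix = key.split("/", 1)[0] + "/"
        is_archived = obj.get("StorageClass", "STANDARD") in ARCHIVED_CLASSES
        # A folder is archived if any of its objects is archived.
        archived_by_prefix[prefix] = (
            archived_by_prefix.get(prefix, False) or is_archived
        )

    run_ids = [RunId(p, a) for p, a in sorted(archived_by_prefix.items())]
    if include_archived:
        return run_ids
    return [r for r in run_ids if not r.archived]
```

With include_archived=False the behaviour matches what we have today (archived folders are skipped), while include_archived=True returns every folder and lets the caller inspect RunId.archived before deciding whether to add it to the run manifest.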