Right now, if you upload a duplicate file, the file is modified on S3 (e.g. its "last modified" timestamp changes). We need to ask some important data-management questions:
- Are we simply "touching" the file, or are we rewriting it?
- What counts as a duplicate on S3? Presumably just the filename, or are we protected by the hash?
- Can we use some temporary data store in S3 that gets cleaned regularly for protection?
- Should we save datasets according to their hash, then rename on download?
We should make sure that a dataset cannot be overwritten if someone uploads a different dataset with the same name. A sketch of one hash-based approach follows below.
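For discussion, here is a minimal sketch of the "store by hash, rename on download" idea with boto3. The bucket name, key prefix, and metadata field are hypothetical placeholders, not anything we have today; the point is just that identical content maps to the same key (so re-uploads are no-ops) and a different dataset with the same filename can never clobber an existing object.

```python
# Sketch of content-addressed dataset storage on S3.
# Assumptions: boto3 is configured; "my-dataset-bucket", the "datasets/" prefix,
# and the "original-filename" metadata field are placeholders for this issue.
import hashlib

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-dataset-bucket"  # hypothetical bucket name


def sha256_of(path: str) -> str:
    """Hash the file contents so identical uploads map to the same key."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_dataset(path: str, filename: str) -> str:
    """Upload under a hash-derived key; skip the write if the object already exists."""
    key = f"datasets/{sha256_of(path)}"
    try:
        s3.head_object(Bucket=BUCKET, Key=key)
        return key  # identical content already stored; nothing is rewritten or touched
    except ClientError as err:
        if err.response["Error"]["Code"] != "404":
            raise
    # Keep the user-facing name as metadata so we can rename on download.
    s3.upload_file(
        path, BUCKET, key,
        ExtraArgs={"Metadata": {"original-filename": filename}},
    )
    return key


def download_dataset(key: str, dest_dir: str = ".") -> str:
    """Download a dataset and restore its original filename from metadata."""
    head = s3.head_object(Bucket=BUCKET, Key=key)
    filename = head["Metadata"].get("original-filename", key.rsplit("/", 1)[-1])
    local_path = f"{dest_dir}/{filename}"
    s3.download_file(BUCKET, key, local_path)
    return local_path
```

With this layout, two different datasets uploaded under the same filename get different keys (their hashes differ), so neither can overwrite the other; the filename only matters at download time.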
We now have a new "versioning" scheme, wherein every dataset has its own unique version that you can download in order to go back in time and see earlier states. This is definitely more advanced usage and is related to this issue, but further thought is going to be required. As such, I'm moving this issue back into the backlog.
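For reference, if that scheme ends up building on S3's native object versioning rather than something custom (an assumption on my part, not a description of what we have), going back in time could look roughly like this:

```python
# Minimal sketch of reading back earlier versions of a dataset object,
# assuming the bucket has S3 versioning enabled. Bucket/key names are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-dataset-bucket"  # hypothetical bucket name


def list_versions(key: str):
    """Return (version_id, last_modified) pairs for one object, newest first."""
    resp = s3.list_object_versions(Bucket=BUCKET, Prefix=key)
    return [
        (v["VersionId"], v["LastModified"])
        for v in resp.get("Versions", [])
        if v["Key"] == key
    ]


def download_version(key: str, version_id: str, dest: str) -> None:
    """Fetch one specific historical version of the object."""
    s3.download_file(BUCKET, key, dest, ExtraArgs={"VersionId": version_id})
```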