Clarified scope of lock_id in InitiateFileUploadRequest #226

glpatcern · 2024-02-14T16:29:06Z

The rationale of this PR is that an InitiateFileUpload request shall only provide an endpoint, for the client to perform the upload as a separate step. Therefore, any lock validation MUST be performed again at the actual upload time, when the lock_id is to be passed again, and as such there's no need to include it in the payload of InitiateFileUplod.

How is Reva/edge dealing with atomically ensuring the lock is valid currently? @wkloucek and/or @micbar can you shed some light? My only guess is that you'd validate the lock again during the TUS upload. In that case, we may just shortcut.

For Reva/master, this is in the makings (at relatively low priority), that's why I realized only now about this.

micbar · 2024-02-14T17:44:33Z

@butonic @dragotin fyi

glpatcern · 2024-02-15T08:09:03Z

From @micbar:

Regarding the lock checking: it should be done during initiateFileUpload and on any later step too.

glpatcern · 2024-02-15T08:11:04Z

Fair enough, after all lock_id is optional and can stay, but the issue remains: the ultimate place where to check is the actual upload.

So possibly a question to @butonic : how is Reva edge validating the lock context on PUT? Which headers do you use/enforce on the clients for them to pass the lock_id? Can we agree on them so to properly implement cs3org/wopiserver#139 ?

glpatcern · 2024-02-16T11:11:04Z

I'm merging this as it's only a clarification in the comments, the semantic does not change.

The details how to enforce locking by the data transfer protocol returned in response are yet to be followed up in cs3org/wopiserver#139

butonic · 2024-02-21T19:45:14Z

So possibly a question to @butonic : how is Reva edge validating the lock context on PUT? Which headers do you use/enforce on the clients for them to pass the lock_id? Can we agree on them so to properly implement cs3org/wopiserver#139 ?

There are no direct PUT uploads to the dataprovider. Uploads MUST start with a call to InitiateFIleUpload. That will start an UploadSession, which also carries the Lock. When the UploadSession is finished, and the postprocessing outcome allows the upload to be registered as a new file revision (or as a new file) the lock will be checked before aquiring the low level file lock on the file metadata.

Upload sessions can always be written to, regardless of the protocol used to transfer the bytes. edge uses asynchronous postprocessing, so we check the lock twice: once in ̀InitiateUpload and once before creating the revision after postprocessing. Since all upload protocols only write to a temporary location, they all check the lock twice, because the lock is not relevant for the byte transfer. Only at the beginning (to prevent unnecessary uploads) and at the end for the actual check.

glpatcern · 2024-02-22T08:22:53Z

Thanks @butonic for these insights. Granted that the dataprovider does not serve direct uploads, we do exactly the same (and we also have an atomic move after uploading to a temp location, though this is an implementation detail). Yet that does not change the question: if I understand correctly, your UploadSession must actually be persisted, along with the InitiateFileUpload.lock_id metadata, such that the lock check in your postprocessing hook can be done - which is the check that ultimately rules, the initial one being an optimization to avoid unnecessary uploads, as you said. Is that the case?

What I implied with my proposal was to "leave the burden" of keeping the state (the lock metadata) on the client side, and enforce that clients pass it on the actual upload (see how I implemented that in the cs3org/wopiserver#139 PR), for the dataprovider to dispatch it to e.g. the UploadSession. Does that make sense?

butonic · 2024-02-22T08:53:21Z

Yes that is the case. UploadSessions are persisted as a separate json file to keep track of the parent id, the name, the nodeid, the lock, any metadata assigned with the upload, the current offfset and full file size ... regarding the implementation detail we use googles renameio package for the atomicity.

I like the idea of using a JWT as the upload id! Unfortunately, the upload session also has to keep track of the postprocessing state. When all bytes have been transferred we start ASYNC postprocessing and may receive a virus detected event or a delete event. Also if postprocessing is interrupted we need to be able to pick up and restart unfinished uploads. We need to keep track of that upload state. Furthermore, I want to be able to show an incoming upload in the web ui. This requires keeping track of state on the server side.

glpatcern · 2024-02-26T15:00:04Z

Unfortunately, the upload session also has to keep track of the postprocessing state. When all bytes have been transferred we start ASYNC postprocessing and may receive a virus detected event or a delete event. Also if postprocessing is interrupted we need to be able to pick up and restart unfinished uploads. We need to keep track of that upload state. Furthermore, I want to be able to show an incoming upload in the web ui. This requires keeping track of state on the server side.

Understood, it appears you have much more state to be kept server-side than we do with Reva master, where actually some events (e.g. a concurrent delete) are handled by the storage, and we don't store anything else within Reva.

Anyway, decorating the data transfer with the lock context via dedicated HTTP headers would not cause any harm - you're relying on the URL returned by the dataprovider to be unguessable, like we do, and you're not going to read that additional payload for enforcing the lock at the end of the transfer, but for the protocol specification I still see it makes sense to not impose a stateful implementation.

Removed lock_id from InitiateFileUploadRequest

ce92aaa

glpatcern requested a review from wkloucek February 14, 2024 16:29

glpatcern requested a review from labkode as a code owner February 14, 2024 16:29

glpatcern mentioned this pull request Feb 14, 2024

cs3: fixed lock and error handling on writefile cs3org/wopiserver#139

Merged

glpatcern changed the title ~~Removed lock_id from InitiateFileUploadRequest~~ Remove lock_id from InitiateFileUploadRequest Feb 14, 2024

glpatcern added a commit to glpatcern/reva that referenced this pull request Feb 14, 2024

Changed strategy following cs3org/cs3apis#226

9a8b893

glpatcern added a commit to glpatcern/reva that referenced this pull request Feb 14, 2024

Changed strategy following cs3org/cs3apis#226

36a072a

glpatcern changed the title ~~Remove lock_id from InitiateFileUploadRequest~~ Clarified scope of lock_id in InitiateFileUploadRequest Feb 15, 2024

glpatcern force-pushed the lock-initupload branch from f9c520f to 1d86380 Compare February 15, 2024 08:57

glpatcern added a commit to glpatcern/reva that referenced this pull request Feb 15, 2024

Changed strategy following cs3org/cs3apis#226

4df12b1

glpatcern force-pushed the lock-initupload branch from 1d86380 to 7c54307 Compare February 16, 2024 11:08

Reinstated lock_id and added comments for clarification

002e7e3

glpatcern force-pushed the lock-initupload branch from 7c54307 to 002e7e3 Compare February 16, 2024 11:08

glpatcern merged commit 6b01495 into cs3org:main Feb 16, 2024
2 checks passed

glpatcern added a commit to glpatcern/reva that referenced this pull request Feb 27, 2024

Changed strategy following cs3org/cs3apis#226

ad3ce4b

This was referenced Mar 5, 2024

Further clarified scope of lock_id #227

Merged

writefile in cs3 interface does not properly support locking cs3org/wopiserver#137

Closed

glpatcern added a commit to glpatcern/reva that referenced this pull request Mar 7, 2024

Changed strategy following cs3org/cs3apis#226

a8e50b8

glpatcern added a commit to glpatcern/reva that referenced this pull request Mar 29, 2024

Changed strategy following cs3org/cs3apis#226

f320cf1

glpatcern added a commit to glpatcern/reva that referenced this pull request Apr 8, 2024

Changed strategy following cs3org/cs3apis#226

c29ce86

glpatcern added a commit to glpatcern/reva that referenced this pull request Apr 15, 2024

Changed strategy following cs3org/cs3apis#226

2130d0c

glpatcern added a commit to glpatcern/reva that referenced this pull request Jul 30, 2024

Changed strategy following cs3org/cs3apis#226

507399b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarified scope of lock_id in InitiateFileUploadRequest #226

Clarified scope of lock_id in InitiateFileUploadRequest #226

glpatcern commented Feb 14, 2024 •

edited

Loading

micbar commented Feb 14, 2024

glpatcern commented Feb 15, 2024

glpatcern commented Feb 15, 2024 •

edited

Loading

glpatcern commented Feb 16, 2024

butonic commented Feb 21, 2024

glpatcern commented Feb 22, 2024

butonic commented Feb 22, 2024

glpatcern commented Feb 26, 2024

Clarified scope of lock_id in InitiateFileUploadRequest #226

Clarified scope of lock_id in InitiateFileUploadRequest #226

Conversation

glpatcern commented Feb 14, 2024 • edited Loading

micbar commented Feb 14, 2024

glpatcern commented Feb 15, 2024

glpatcern commented Feb 15, 2024 • edited Loading

glpatcern commented Feb 16, 2024

butonic commented Feb 21, 2024

glpatcern commented Feb 22, 2024

butonic commented Feb 22, 2024

glpatcern commented Feb 26, 2024

glpatcern commented Feb 14, 2024 •

edited

Loading

glpatcern commented Feb 15, 2024 •

edited

Loading