Replies: 2 comments 5 replies
-
Hey @ibrahima , I've been thinking about these issues for a while too. To be clear, I am not a maintainer of shrine, just another user like you, but a long-time user. I don't really have An Answer, but I've spent a while thinking and working on what you're talking about too, so wanted to respond to share what I know, and agree that you are on the right track and I think basically correct in your understanding. Yes, that mutex in derivatives doesn't help with cross-process stuff. I'm not sure what it's doing there... I guess just to catch the simple case when you are using mutliple threads in a process to do concurrent derivative processing? But you are right it doesn't even handle multi-process derivative processing in bg workers. Shrine does have another mechanism in place to do a kind of "optimistic locking" on changes to underlying files. It wasn't initially documented very well (the fact that ti's defined kind of abstractly depending on DB adapter doesn't help), so some time ago after trying to figure out how it worked, I actually contributed some documentation here: https://shrinerb.com/docs/plugins/atomic_helpers I wrote most of that! Take a look and see what you think? I agree with you that database-level pessimistic locking is a performance risk. Some kind of "optimistic locking" is better... those "atomic helpers" are meant to give you some tools for that... but it does get really confusing quickly. I think the basic answer at this point is that shrine itself does not have a built-in concurrency-safe solution for concurrent derivative generation, you are on your own, although it does have those helpers that could be a start... In my own code (not part of shrine), I tried to build out some additional logic tooling on top for your specific use case of adding derivatives... and I tried to do it as a shrine plugin... I think it works, but it got pretty convoluted and confusing, check out my code here: https://github.com/sciencehistory/kithe/blob/master/lib/shrine/plugins/kithe_persisted_derivatives.rb I do think I solve the problem you are talking about, but I guess I'm not 100% sure of that, and it's pretty wacky. I am not sure how/if shrine could use the postgres jsonb_set operators... I've spent some time thinking through that too. Shrine is already trying to handle multiple different ORMs, then to try to add support for multiple underlying databases too (those operators are pg-specific).... I'm not sure. Another project where I am the maintainer, attr_json is all about using jsonb with ActiveRecord, and I've tried to figure out some way to hack postgres jsonb set operators into ActiveRecord in a generalized/automatic way for partial updates of attributes serialized into json... but have not been able to figure it out! It might be more possible to get shrine to do it with very specific use-cases, but it's hard to get it to play well with ActiveRecord. If you are using ActiveRecord, one solution would be using ActiveRecord's own built-in "optimistic locking" feature. https://api.rubyonrails.org/classes/ActiveRecord/Locking/Optimistic.html. Instead of the db-level pessimistic locking you are doing. It shouldn't be too hard to wire up derivative updating involving catching AR optimistic locking failures and reloading and reapplying changes... but the way the AR feature works, you'd have to turn on Optimistic Locking for the entire model shrine is in, and deal with it generally for that model, which I haven't wanted to do yet, but keep considering it. The idea of shrine using multiple columns or a separate table instead of a json column... the single json column approach is pretty baked into shrine's architecture at this point. I don't think it would be easy to make flexible, or to change, and changing it would involve some pretty serious trade-offs too -- whole new problems with keeping the multiple columns consistent with each other! Although just for derivatives... you certainly could ignore shrine's derivative feature entirely, and just implement your own based on a separate associated table where each row has it's own shrine original file, that serves as a derivative. but just using shrine single-files, that you arrange yourself into columns/tables/associations how you want. I've thought of that, but not done it either. It would also have it's own tricks. i think basically it's a problem that shrine doesn't give you a solution for, but is flexible enough to let you try to solve it in various differnet ways yourself, depending on your specific needs and nature of your use... but it's not that easy! |
Beta Was this translation helpful? Give feedback.
-
There is a guide section showing how to process derivatives concurrently. It relies on database locks, A Postgres-specific operator such as Honestly, storing files as separate records would probably be a great idea, as it would eliminate the need for DB locking. It's something I've admired in Active Storage design and wished I did in Shrine. Unfortunately, at this point it would be too difficult to shoehorn this into Shrine, as a lot of the existing functionality relies on a JSON column. So, if you would like this, I would recommend using Active Storage instead. |
Beta Was this translation helpful? Give feedback.
-
Hi! I have been looking at Shrine for a while and recently joined a project that's using it for image processing. I noticed that in our codebase, we have used database locks when processing derivatives, presumably to guard against concurrent processes (e.g. different background jobs) trying to generate derivatives and producing incorrect data that might say, drop one derivative when updating another if the second writer reads stale data. It seems like the derivatives plugin tries to handle concurrent updates, but it only does so with a Mutex, which if I understand correctly would protect against multiple threads trying to update the same record, but would not help in the case that different worker processes tried to update the same record. However, I feel like the database locking solution is a potential source of future pain, because in a high concurrency situation it could amplify the load on the system as a bunch of writers get stuck waiting for locks.
It seems like at least for Postgres, there is a jsonb_set operator which could handle fine-grained updates to the relevant derivative's field without the potential for dropping updates to other derivatives or other JSON fields. But I don't know if that's something that's available across ActiveRecord backends...
I think in general storing a lot of data on a JSONB field feels kinda risky to me due to possibilities like this, so I'm also wondering if there's a way to store this data in say, a separate table instead. I didn't find anything obvious right now but I haven't really dived into Shrine much yet (besides researching it several years ago for a different project).
I want to stress that I'm just thinking out loud here, and only came across this potential issue about half an hour ago. I don't know if we're just doing something wrong to need to lock these DB records, so it may be a non-issue for most people. (Maybe we should just avoid concurrent derivative processing or something? IDK.) Thanks for humoring me with this discussion!
Edit: After reading some docs, I did find this: https://shrinerb.com/docs/plugins/persistence#atomic-persistence. So maybe we need to be using that instead? Right now we are just calling
record.image_derivatives!
and it seems like we needed to throw a DB lock around that, so it seems like to useatomic_persist
we'd need to go a little lower level?Beta Was this translation helpful? Give feedback.
All reactions